Did students who attended a review session perform better on the test than those who didn’t?
There are several types of t-tests: the one-sample t-test, the independent-samples t-test, and the paired-samples t-test. In this post, we’ll look at the independent-samples t-test, which compares two groups or samples. For example, did students who attended a review session perform better on the test than those who didn’t? The null hypothesis is that there is no difference between the groups, whereas the alternative hypothesis is that the groups differ.
An independent-samples t-test requires a categorical independent variable and a continuous dependent variable that is normally distributed (technically, the residuals must be normally distributed). The two groups should also be independent (i.e., each participant was in only one group) rather than related (i.e., each participant was in both groups).
The test statistic for an independent-samples t-test is, of course, t. It is computed by taking the difference between the means of the two samples and dividing by the pooled standard error. The formula is:
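Written out in standard notation, and assuming the equal-variance (pooled) form, the statistic looks like this. The symbols are our own shorthand: group means \bar{x}_1 and \bar{x}_2, standard deviations s_1 and s_2, and sample sizes n_1 and n_2.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

Under this form, the test has n_1 + n_2 - 2 degrees of freedom.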

There are three ingredients in the t-test formula: the means, the standard deviations, and the sample sizes. As the difference between the means increases, so does the t score (in absolute value). That should make sense, because the farther apart the two means are, the less likely the difference is due to chance. As the standard deviation increases, the t score decreases. This is because the more variability there is, the less confident we are about the accuracy of our sample means. Finally, with everything else held constant, as the sample size increases, so does the t score. This is because the larger our sample size, the more confident we are in the accuracy of our sample means.
The t score is also associated with a p value, which is used to assess statistical significance. The p value tells us how likely we would be to obtain a result at least as extreme as ours by chance, if the null hypothesis were true. The lower the p value, the stronger the evidence against the null hypothesis. (Note that the p value is not the probability that the null hypothesis is true.) Typically, the alpha level, which is the threshold for statistical significance, is set at .05. So, if our p value is below .05, we reject the null hypothesis.
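To see these three effects in numbers, here is a minimal Python sketch that computes a pooled t score from summary statistics. The function name `pooled_t` and all of the input values are hypothetical and chosen only for illustration; they are not taken from the review session dataset.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance (pooled) t statistic from summary statistics."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err

# Hypothetical summary statistics, for illustration only
baseline = pooled_t(80, 10, 20, 75, 10, 20)     # starting point
bigger_gap = pooled_t(85, 10, 20, 75, 10, 20)   # means farther apart
more_spread = pooled_t(80, 20, 20, 75, 20, 20)  # larger standard deviations
more_people = pooled_t(80, 10, 80, 75, 10, 80)  # larger samples

print(f"baseline:    {baseline:.2f}")     # 1.58
print(f"bigger gap:  {bigger_gap:.2f}")   # 3.16
print(f"more spread: {more_spread:.2f}")  # 0.79
print(f"more people: {more_people:.2f}")  # 3.16
```

Doubling the gap between the means doubles t, doubling the standard deviations halves it, and quadrupling the sample sizes doubles it.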
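For the curious, the link between a t score and its p value can be sketched in plain Python by measuring the area in the tails of the t distribution. This is only an illustration under our own assumptions: the helper names `t_pdf` and `two_sided_p` are made up, the test is assumed to be two-sided, and the area is approximated with Simpson's rule. In practice a statistics package does this step for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p value: area in both tails beyond |t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t|
    return max(0.0, 1 - 2 * central_half)

alpha = 0.05
p = two_sided_p(2.5, df=38)  # hypothetical t score and degrees of freedom
print(f"p = {p:.4f}; reject the null hypothesis: {p < alpha}")
```

A quick sanity check: with 10 degrees of freedom, a t score of about 2.228 sits right at the .05 threshold, which matches standard t tables.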
It’s very important to know that you should only use a t-test if your independent variable has two levels (e.g., smokers and non-smokers). If you have more than two levels (e.g., freshman, sophomore, junior, senior), then you must use an ANOVA.
So now that we know what an independent-samples t-test is, let’s look at how to perform one on MagicStat (1.1.3). Let’s look at the research question from earlier: did students who attended a review session perform better on the test than those who didn’t? Our alternative hypothesis is that students who went to the review session will do better on the test.
Let’s take a look at our example dataset, “review_session”, which is also available under “example datasets” in MagicStat.

We’ll begin by selecting the “review session” data file, then clicking “explore”.
As soon as MagicStat loads the file, we see a list of descriptive information on the right-hand side. At the top of the right-hand column are the total number of observations and the first five observations, which give us an overview of the dataset.

Below that, MagicStat automatically identifies each variable as either categorical or numerical and provides a descriptive breakdown including means, standard deviations, minimums, maximums, and quartiles.
Next, we have bar graphs for our categorical variables and histograms for our numerical variables. We can use the histogram to visually inspect each variable to see if it is normally distributed.


Now that we’ve inspected our data, we’re ready to perform the analysis. At the top, on the left-hand side, click “select a model”, and select “independent samples t-test”.

Next, select the independent variable, which is “review attendance”. Since there are only two values (“attended review session” and “did not attend review session”), MagicStat automatically selects those two for the two groups. The dependent variable is “test_score”. Finally, click “Analyze”.

The output shows us the mean and standard deviation for each group. The last table shows the results of the t-test, including the degrees of freedom, the t value, the p value, and the Cohen’s d effect size. In this case, those who attended the review session had a mean test score of 80.58, compared to 75 for those who didn’t attend. The p value is .006, which means that if the null hypothesis were true, and there were no difference between the groups, then we would expect to see a result like this one (or one more extreme) about six times in a thousand. That’s very unlikely, so we’ll reject the null hypothesis and conclude that those who attended the review session did better on the test than those who didn’t. This was accompanied by a Cohen’s d of 0.83, which is conventionally considered a large effect.
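If you want to see where an effect size like this comes from, here is a small Python sketch of Cohen’s d using the pooled standard deviation. We are assuming the pooled-SD version of d, the function name `cohens_d` is our own, and the scores below are made up; they are not the review_session data.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Made-up scores, for illustration only
attended = [82, 75, 88, 79, 85, 77]
did_not_attend = [74, 70, 81, 68, 77, 73]

d = cohens_d(attended, did_not_attend)
print(f"Cohen's d = {d:.2f}")
```

As a rough guide, Cohen’s conventional benchmarks treat a d of about 0.2 as small, 0.5 as medium, and 0.8 as large.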

One of the assumptions of t-tests is homogeneity of variance (also called homoscedasticity), which is just a fancy way of saying that each sample should have a similar variance (e.g., the variance of the treatment condition should be about the same as the variance of the control condition). Another way to think of it is that the treatment shouldn’t change the variance. There are a number of tests that check this assumption, but Levene’s test is perhaps the best known, if only because it’s included by default when running t-tests in SPSS.
Levene’s test includes both a test statistic (W) and a p value. Note that this p value has nothing to do with the p value of your t-test. It is a completely separate test (again, it is measuring whether the variances are equal).
Interpreting Levene’s is similar to hypothesis testing with other inferential tests: if the p value is below your α threshold (typically .05), then you would reject the null hypothesis that the two variances are equal. If you do conclude that the variances are not equal, then MagicStat provides an adjustment that accounts for the violation of this assumption. In the example provided, the p value (0.242) is greater than .05, so we would fail to reject the null hypothesis and proceed as though the two groups have equal variances.
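To make the W statistic less mysterious, here is a minimal Python sketch of how it can be computed. It assumes the classic version of Levene’s test, which measures each score’s absolute distance from its group mean; we don’t know which variant MagicStat uses internally, and the function name `levene_w` and the toy data are our own.

```python
import statistics

def levene_w(*groups, center=statistics.mean):
    """Levene's W statistic for equality of variances across groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group's center
    deviations = []
    for g in groups:
        c = center(g)
        deviations.append([abs(x - c) for x in g])
    group_means = [statistics.mean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total
    # Spread of the group-level mean deviations around the grand mean
    between = sum(
        len(z) * (m - grand_mean) ** 2 for z, m in zip(deviations, group_means)
    )
    # Spread of individual deviations around their own group mean
    within = sum(
        (v - m) ** 2 for z, m in zip(deviations, group_means) for v in z
    )
    return (n_total - k) / (k - 1) * between / within

# Toy data, for illustration only
w = levene_w([1, 2, 3], [2, 4, 6])
print(f"W = {w:.2f}")  # W = 0.80
```

The p value for W comes from an F distribution with k - 1 and N - k degrees of freedom. Passing `center=statistics.median` gives the median-based variant (often called the Brown-Forsythe test), which is more robust when the data are skewed.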
So, let’s see what would happen if we use gender as an independent variable instead of review attendance. In this case, we are curious whether males or females did better on the test.

The p value of 0.575 is much higher than the threshold of .05. Thus, we conclude that there is no significant difference between males and females in test scores.

And that’s how you run an independent-samples t-test on MagicStat.
Written by the MagicStat Team