The Titanic has become one of the most famous ships in history. Tragically, the “unsinkable” ship took about three years to build but sank less than five days after its departure. Its passengers ranged from some of the wealthiest people in the world to middle-class travelers and emigrants.

Have you ever wondered whether there is a significant relationship between class and survival, or between gender and survival? What about other factors, such as the age of the passenger, the number of siblings or spouses on board, the number of parents or children on board, and the cost of the ticket? Which of these best predict or explain whether or not someone survived the Titanic? In this blog, we will try to answer these questions.

This blog contains a step-by-step guide to analyzing the Titanic dataset with two different types of analyses in MagicStat: the Chi-Square Test for Independence and Logistic Regression. The Chi-Square Test for Independence is used to evaluate the relationship between two categorical variables. Logistic regression is used to determine the degree to which a set of independent variables predicts a categorical variable, or outcome. The instructions for the Logistic Regression also show how to run a Pearson Correlation in MagicStat as an assumption check for multicollinearity.

Within the blog, you will find instructions for uploading your data to MagicStat and exploring it. Then you will see a rationale and examples for the two analyses discussed in the blog. Next, you will see step by step instructions for conducting each analysis. The blog also provides step by step instructions for interpreting the results of each analysis. Finally, you will see APA formatted examples demonstrating how to write up the results from each of the analyses.

**Chi-Square Test of Independence**

**1. Uploading and exploring your data.**

Begin the analysis by uploading the Titanic.sav file and pressing Explore. The .sav extension indicates this is an SPSS data file.

**2. Exploring your data**

Once the data is uploaded and you click explore, you should notice on the right side of the screen several data elements. First, you should notice that there are 1309 observations within the data set (See screen shot below).

You should also notice that there are 11 categorical variables, 4 numeric variables, and a table summarizing the numeric variables within the dataset (See screen shots below).

**3. Selecting the right model – Examining relationships between categorical variables.**

Many times, researchers and students are interested in the relationship between two categorical variables. An example of a pair of categorical variables would be home ownership and education. For the most part, people either own or rent their home. As such, there would be two categories for the variable of home ownership – own and rent. Additionally, almost everyone has some level of education (e.g., GED, High School, Some College, College Degree). In this example, the researcher may be interested in determining if there is a relationship between home ownership and education. In other words, is a person more likely to rent or own depending on their level of education?

There are many types of analyses that can be used to determine relationships between variables (e.g., correlation and regression). However, when a researcher is interested in examining the relationship between two categorical variables, the best type of analysis for this situation would be the Chi-Square Test for Independence.

There are two types of Chi-Square analyses. One type is referred to as the Chi-Square for Goodness of Fit. Researchers use the Chi-Square for Goodness of Fit analysis to compare the proportion within a sample (e.g., data you collected) to the proportion within a population (e.g., data that is known). For example, is the proportion of college students who smoke at your university similar to the proportion of college students who smoke in the USA? In this case, we would use the Chi-Square for Goodness of Fit analysis to determine if the two proportions differ. If you are instead interested in examining the relationship between two categorical variables from the same data set, then the Chi-Square Test for Independence would be used to answer this type of research question.

**Model summary – Chi-Square Test for Independence**

- What you need – You will need two categorical variables from the same data set.
- Example research question – is there a relationship between gender and political affiliation?
- Assumptions – Lowest expected frequency in any cell should be 5 or more.

**4. Running the analysis Chi-Square Test for Independence.**

The Titanic Dataset offers several categorical variables that are well suited for running a Chi-Square Test for Independence. The first question we will consider will be: *is there a relationship between class and survival?* To get started, we will need to select a model to analyze our data. Click the button, select a model to analyze your data (See Screen shot below).

Once you have selected the Chi-Square test for Independence, you should see some changes on your screen. There should be an option to select categorical variables for your analysis. See the example below.

In order to run a Chi-Square Test for Independence we will need to select the two categorical variables, in this case the two variables are class (pclass) and survival (survived).

- To enter these variables into the analysis, click – Select a categorical variable – and then select pclass.
- Repeat this step by clicking – Select a categorical variable – and then select survived.
- In the event you do not see these variables, click shift and refresh on your browser to clear your cache and again click – Select a categorical variable – to select the appropriate variable.
- The screen shot provided below demonstrates what the screen should look like once the correct variables are selected. If you are seeing the same thing on your screen go ahead and click analyze.

**5. Your results of Chi-Square Test for Independence**

You should see the following output on your screen after clicking the “Analyze” button.

- The first output is a crosstabulation of the two categorical variables of class and survival. By scanning left to right, you can see that `123` people from 1st class died and `200` from 1st class survived.
- You will also see the output for the Chi-Square results of `pclass` and `survived`.

**6. Interpreting your crosstabulation**

- The first step for interpreting your results is evaluating the crosstabulation for class and survival. We want to ensure we did not violate any of our assumptions so that we can interpret the statistical results of the analysis with confidence. Scan through the table below and ensure that none of the cell counts is less than `5`. For example, the count for the number of people in first class that died equals `123`. There are not any cells with a count smaller than `5`. This is good news; we did not violate any of our assumptions for this analysis and can interpret the results with confidence. Additionally, did you notice any patterns in the data? An examination of the crosstabulation indicated that the number of deaths increased by class – 1st class (*n* = 123), 2nd class (*n* = 158), and 3rd class (*n* = 528). However, an apparent pattern in the counts is not enough on its own to conclude that the increase is significant. We now need to interpret the Chi-Square results.
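The assumption listed in the model summary is about *expected* cell frequencies, which are not the same thing as the observed counts in the crosstabulation. A minimal sketch of how the expected counts could be computed by hand is shown below. The death counts and the 1st class survivor count come from the crosstabulation above; the survivor counts for 2nd and 3rd class (`119` and `181`) are not quoted in this blog and are assumptions based on the 1309-row Titanic data set, so treat the output as illustrative.

```python
# Observed counts per class as (died, survived).
# 123/200, 158 and 528 are quoted in the text; 119 and 181 are assumed values.
observed = {
    "1st": (123, 200),
    "2nd": (158, 119),
    "3rd": (528, 181),
}


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    n_cols = len(next(iter(table.values())))
    row_totals = {label: sum(cells) for label, cells in table.items()}
    col_totals = [sum(cells[j] for cells in table.values()) for j in range(n_cols)]
    grand_total = sum(row_totals.values())
    return {
        label: tuple(row_totals[label] * col_totals[j] / grand_total for j in range(n_cols))
        for label in table
    }


expected = expected_counts(observed)
smallest = min(min(cells) for cells in expected.values())
assumption_met = smallest >= 5  # rule of thumb from the model summary

for label, cells in expected.items():
    print(label, [round(c, 1) for c in cells])
print("smallest expected count:", round(smallest, 1), "| assumption met:", assumption_met)
```

With these counts every expected frequency is far above 5, which agrees with the conclusion reached by scanning the observed counts.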

**7. Interpreting the Chi-Square results of pclass and survived**

- The second step for interpreting your results is evaluating the Chi-Square results of `pclass` and `survived`. There are several data points within these results. The Chi-square stat is the actual statistic that you will report in writing up the analysis. The df is the degrees of freedom within the analysis. In a Chi-Square Test for Independence the df reflects how many cell counts are free to vary once the row and column totals are fixed. The formula for df is `df = (r-1)(c-1)`, with r being rows and c being columns. For class (3 rows) by survival (2 columns), this gives `df = (3-1)(2-1) = 2`. There is one data point within the table below that is relevant to our question of statistical significance: the *p* value (`0.000`). The *p* value is the probability of seeing differences in survival rates between classes at least as large as those observed if class and survival were actually unrelated. In other words, how likely is it that differences this large arose by chance alone? In order to be considered significant, the *p* value needs to be less than `0.05` – in this case our *p* value is displayed as `0.000` (it is rounded to three decimal places), which is less than `0.05`.

- As a result, we can conclude that there are significant differences in the survival rates between groups based on class amongst passengers on the Titanic, since our *p* value is less than `0.05`.

- As a general rule-of-thumb, group differences with *p* values of `< 0.05` are said to be statistically significant. While the word “significant” carries certain connotations in English use, statistical significance can only tell you that a result is unlikely to be caused by chance. It does not tell you that the difference is large or important.
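To make the three reported values less of a black box, here is a minimal sketch of how the Chi-Square statistic, the degrees of freedom, and the *p* value could be computed by hand for class and survival. It uses only the Python standard library and is not a description of how MagicStat computes them internally. As before, the 2nd and 3rd class survivor counts (`119`, `181`) are assumptions that are not quoted in this blog, and the `exp(-x/2)` shortcut for the *p* value is valid only when `df = 2`.

```python
import math

# Rows: 1st, 2nd, 3rd class. Columns: died, survived.
observed = [
    [123, 200],  # quoted in the text
    [158, 119],  # survivor count is an assumption
    [528, 181],  # survivor count is an assumption
]


def chi_square(table):
    """Return the Pearson chi-square statistic and df = (r-1)(c-1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square(observed)
# Closed form for the upper tail of a chi-square distribution with exactly 2 df.
p_value = math.exp(-stat / 2)
print(f"chi-square({df}) = {stat:.2f}, p = {p_value:.3g}")
```

Under these assumed counts the statistic rounds to the same value used in the write up in the next section, and the *p* value is far below `0.001`, which is why it is displayed as `0.000`.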

**8. Writing up your results**

In the event you need to write up the results of a Chi-Square Test for Independence we provided an example to guide your efforts.

- There are several data points you will need for the write up.
- These data points can be found in the Chi-Square Results of pclass and survived.
- You will need the values associated with the Chi-Square stat, the degrees of freedom (df), and the *p* value.

The write up – We are interested in exploring the relationship between class and survival. As such, the main research question was: *is there a relationship between class and survival?*

The results of the analysis indicated that the number of passengers who died varied between the classes considered within the analysis – 1st class (*n* = 123), 2nd class (*n* = 158), and 3rd class (*n* = 528). Additionally, the results of the Chi-Square Test for Independence indicated that there are significant differences in death rates by class: χ²(2) = 127.86, *p* < 0.001. These results mean that the differences in death rates by class are unlikely to be due to chance.

In the above write up the symbol χ² stands for Chi-Square. The (2) represents the degrees of freedom within the analysis. The value of `127.86` is the actual Chi-Square statistic. The *p* < 0.001 indicates that our *p* value is less than 0.001.

Let’s run another Chi-Square Test for Independence to explore the relationship between gender and survival. Follow along with the instructions and examples provided below.

**1. Running another analysis with Chi-Square Test for Independence**

The next question we will consider will be: *is there a relationship between gender and survival?* To get started, we will need to select a model to analyze our data. Click the button, select Chi-Square Test for Independence (See Screen shot below).

Once you have selected the Chi-Square test for Independence, you should see some changes on your screen. There should be an option to select categorical variables for your analysis. See the example below.

In order to run the Chi-Square Test for Independence we need to select the two categorical variables. In this case the two variables are gender (gender) and survival (survived). To enter these variables, click – Select a categorical variable – and select gender first and survived second. The screen shot provided below demonstrates what the screen should look like once the correct variables are selected. If you are seeing the same thing on your screen go ahead and click analyze.

**2. Results – Chi-Square Test for Independence**

You should see the following output on your screen after clicking the “Analyze” button.

- The first output is a crosstabulation of the two categorical variables of gender and survival. By scanning left to right, you can see that `127` women died and `339` women survived.
- You will also see the output for the Chi-Square results of gender and survived.

**3. Interpreting your crosstabulation**

- The first step for interpreting your results is evaluating the crosstabulation for gender and survival. We want to ensure we did not violate any of our assumptions so that we can interpret the statistical results of the analysis with confidence. Scan through the table below and ensure that none of the cell counts is less than `5`. For example, the count for the number of Females that died equals `127`. There are not any cells with a count smaller than `5`. This is good news; we did not violate any of our assumptions for this analysis and can interpret the results with confidence. Additionally, did you notice any patterns in the data? An examination of the crosstabulation indicated that the number of deaths is higher among Males (*n* = 682) compared to Females (*n* = 127). However, an apparent difference in the counts is not enough on its own to conclude that the difference is statistically significant. We now need to interpret the Chi-Square results.

**4. Interpreting the Chi-Square results of gender and survived**

- The second step for interpreting your results is evaluating the Chi-Square results of gender and survived. There are several data points within these results. The Chi-Square stat is the actual statistic that you will report in writing up the analysis. The df is the degrees of freedom within the analysis. In a Chi-Square Test for Independence the `df` reflects how many cell counts are free to vary once the row and column totals are fixed. The formula for df is `df = (r-1)(c-1)`, with r being rows and c being columns. For gender (2 rows) by survival (2 columns), this gives `df = (2-1)(2-1) = 1`. There is one data point within the table below that is relevant to our question of statistical significance: the *p* value (`0.000`). The *p* value is the probability of seeing differences in survival rates between the genders at least as large as those observed if gender and survival were actually unrelated. In other words, how likely is it that differences this large arose by chance alone? In order to be considered significant, the *p* value needs to be less than `0.05` – in this case our *p* value is displayed as `0.000` (it is rounded to three decimal places), which is less than `0.05`.

- As a result, we can conclude that there are significant differences in the survival rates between groups based on gender amongst passengers on the Titanic, since our *p* value is less than `0.05`.
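One detail worth knowing for 2 x 2 tables like this one: some statistics packages apply Yates' continuity correction by default, which gives a slightly smaller Chi-Square statistic. The sketch below computes the statistic both ways using only the Python standard library. The counts `127`, `339`, and `682` come from the crosstabulation above; the male survivor count (`161`) is not quoted in this blog and is an assumption. The `erfc` shortcut for the *p* value is valid only when `df = 1`.

```python
import math

# Rows: female, male. Columns: died, survived.
# 161 (male survivors) is an assumed value; the other counts are quoted in the text.
observed = [[127, 339], [682, 161]]


def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            exp = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - exp)
            if yates:
                diff = max(diff - 0.5, 0.0)  # shrink each deviation by 0.5
            stat += diff ** 2 / exp
    return stat


def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with exactly 1 df."""
    return math.erfc(math.sqrt(stat / 2))


plain = chi_square_2x2(observed)
corrected = chi_square_2x2(observed, yates=True)
print(f"uncorrected: {plain:.2f}, with Yates correction: {corrected:.2f}")
print("p < 0.001:", p_value_df1(plain) < 0.001)
```

Under these assumed counts, the uncorrected statistic is the one that matches the value reported in the write up below, so if another tool gives you a slightly smaller number for the same table, the continuity correction is a likely explanation.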

**5. Writing up your results**

The write up of the results for this Chi-Square Test for Independence is provided below. There are several data points you will need for the write up. These data points can be found in the Chi-Square Results of gender and survived. You will need the values associated with the Chi-Square stat, the degrees of freedom (df), and the *p* value.

The researcher was interested in exploring the relationship between gender and survival. As such, the main research question was: `is there a relationship between gender and survival?`

The results of the analysis indicated that the number of passengers who died varied between the genders considered within the analysis – Males (*n* = 682) compared to Females (*n* = 127). Additionally, the results of the Chi-Square Test for Independence indicated that there are significant differences in death rates by gender: χ²(1) = 365.89, *p* < 0.001. These results mean that the differences in death rates by gender are unlikely to be due to chance.

In the above write up the symbol χ² stands for Chi-Square. The (1) represents the degrees of freedom within the analysis. The value of `365.89` is the actual Chi-Square statistic. The *p* < 0.001 indicates that our *p* value is less than `0.001`.


Many times, researchers are interested in predicting categorical variables or specific outcomes. An example of a categorical variable would be students passing an exam (e.g., pass or fail). In this case, there would be two categories: students either pass or fail the exam. In situations like this, researchers may be interested in determining specific factors that predict or explain categories within a categorical variable of interest. Using our example of passing an exam, researchers may be interested in determining the factors that most influence whether a student passes or fails. Time spent studying, previous exam scores, and time spent working with a tutor are all examples of variables that might influence whether or not a student passes an exam.

**Logistic Regression**

Regression is the main type of analysis that can be used to determine the degree to which a set of variables (e.g., independent variables) explains or predicts scores on an outcome variable (e.g., dependent variable). There are two main types of regression: standard multiple regression and logistic regression. Standard multiple regression is used when a researcher has multiple scale based independent variables and is interested in testing the degree to which these independent variables explain or predict scores on one scale based dependent variable. Logistic regression is similar to standard multiple regression except for one difference: the type of dependent variable. As in standard multiple regression, the researcher has multiple scale based independent variables; however, in logistic regression the dependent variable is categorical with only two categories. If your dependent variable has more than two categories, then a multinomial logistic regression would need to be used. Going back to our example of passing or failing an exam, logistic regression would be the best statistical test for answering these types of research questions. If you are interested in learning more about logistic regression, follow along with the example provided below.

**Model summary – Logistic Regression**

- What you need – You will need several scale based independent variables and one dependent variable that is categorical (i.e., only two categories).
  - Categorical predictors can also be used in logistic regression. For simplicity's sake, we will only use scale based variables.
- Example research question – What factors most predict or explain whether or not someone survived the Titanic?
- Assumptions – There are no assumptions regarding the distribution of scores for the predictor variables within logistic regression. However, logistic regression is sensitive to predictor variables that are highly intercorrelated, a condition referred to as multicollinearity.

**1. Assumption Test – Correlation Analysis**

In order to determine if we have violated the assumption of multicollinearity, we must run a correlation analyzing the degree to which independent variables are correlated with each other. Let’s do this now. The titanic data set should already be uploaded to MagicStat and we will select the Pearson correlation model.

Once we have selected the Pearson Correlation model, we will need to address a few issues.

- *Handling Missing Data?* – When running a Pearson Correlation model, we need to tell MagicStat how to handle missing data. Listwise deletion removes cases if any data is missing. Pairwise deletion includes cases if data is available for analysis. Since we are analyzing the correlation between multiple variables and want to maximize power, we will select pairwise deletion.
- *Selecting variables* – Magicstat.co automatically selects scale based variables within a dataset when running a Pearson Correlation. In this case, there are four variables. Select all of these variables for the analysis.
  - Age – Age of passenger.
  - Sibsp – Number of siblings or spouses on board.
  - Parch – Number of parents or children on board.
  - Fare – Cost of ticket.
- *Analyze* – Click analyze once you have selected your variables.

- *Output*
  - Degrees of Freedom – In this case, we have `1307`, which means the number of data cases included in the analysis was `1307`. That number is not too different from the `1309` cases within the data set. Only two data cases were excluded from the analysis. We need to pay attention to the difference in these numbers when conducting analyses, as large drops in these numbers signal poor data quality or issues in the data.
  - Pearson Correlation (r) – The Pearson Correlation table takes a minute to learn how to interpret. The data in the table mirror each other on either side of the perfect correlations between the same variables. Notice the red and blue triangles below. The numbers in each triangle mirror each other. In this case, what we are interested in is the degree to which the variables are correlated with each other. Variables with correlations above *r* = 0.70 signal a problem with multicollinearity. Since there are no correlations larger than `0.70`, we can assume we have not violated the assumption of multicollinearity and can confidently proceed with the analysis.
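For readers who want to see what pairwise deletion and the `0.70` screening rule amount to, here is a minimal sketch in plain Python. The column names mirror the four Titanic variables, but the numbers are invented for illustration, and the helper `pearson_pairwise` is a hypothetical name, not something MagicStat exposes.

```python
import math
from itertools import combinations

# Hypothetical mini-dataset; None marks a missing value.
data = {
    "age":   [22, 38, None, 35, 54, 2, 27, None],
    "sibsp": [1, 1, 0, 1, 0, 3, 0, 1],
    "parch": [0, 0, 0, 0, 0, 1, 2, 0],
    "fare":  [7.25, 71.28, 7.92, 53.10, 51.86, 21.08, 11.13, None],
}


def pearson_pairwise(x, y):
    """Pearson r using only rows where both values are present (pairwise deletion)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y), n


flagged = []  # pairs of predictors that exceed the 0.70 rule of thumb
for left, right in combinations(data, 2):
    r, n = pearson_pairwise(data[left], data[right])
    if abs(r) > 0.70:
        flagged.append((left, right))
    print(f"{left:>5} vs {right:<5} r = {r:+.2f} (n = {n})")

print("pairs above 0.70:", flagged)
```

Notice that each pair of variables can end up with a different `n` under pairwise deletion, because a row is dropped only for the correlations that involve its missing value. Listwise deletion would instead drop that row from every correlation.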

**2. Running the analysis – Logistic Regression**

The Titanic Dataset offers several independent variables and one important dependent variable that is well suited for running a Logistic Regression. Our research question will be: *what factors predict the likelihood that someone survived the Titanic?* To get started, we will need to select a model to analyze our data. Click the button, select a model to analyze your data (See Screen shot below) and select the Logistic Regression.

Once you have selected the Logistic Regression, you should see some changes on your screen. There should be an option to select categorical independent and dependent variables for your analysis. See the example below.

**Independent Variables** – Let’s select the independent variables for our analysis. We select the following variables: `age` (how old the passenger was), `sibsp` (number of siblings or spouses on board), `parch` (number of parents or children on board), and `fare` (the cost of the individual’s ticket). See the example below. Now that we have selected our independent variables, let’s select our dependent variable.

**Dependent Variable** – We want to select the following variable – `survived`. See the example below. Now that we have selected our dependent variable, let’s review our variables.

**Reviewing the Model** – Let’s review the variables we selected for the model. The independent variables seem correct. We selected `age`, `sibsp`, `parch`, and `fare`. Notice that there is an option to select categorical for each variable. MagicStat allows researchers to include categorical variables in their logistic regressions. None of our variables are categorical. Ensure none of these boxes are checked. The dependent variable is correct – `survived`. Now that we have selected our variables, let’s run the analysis.

**3. Results – Logistic Regression**

You should see the following output on your screen after clicking the “Analyze” button.

- The first output is an indicator of how many missing cases we have in the data. Logistic regression automatically leverages listwise deletion meaning if a case is missing data for any of the variables included in the analysis the data case will be excluded from the analysis.

**4. Interpreting – Overview of logit regression results**

- There are several pieces to this output that need to be interpreted and considered. First, we want to ensure that our dependent variable is correct, and in this case it is: `survived`. We also want to ensure we selected the correct model, and we did, as the model is logit. Next, we need to interpret the model. When running regressions, researchers are building their hypotheses around the idea that a certain set of variables, sometimes referred to as a model, is likely to predict or influence changes in the dependent variable. To this point, the first step in interpreting the results of a regression is to interpret the overall significance of the model. The first output “Overview of logit regression results” contains the data points we need to determine if the model is significant. Those values are the `LLR p-value` and the `Pseudo R-squ`. We want our model to be significant, and in this case it is, *p* = 0.000, as the *p* value is less than `0.05`. However, we need to also consider the amount of variance explained by the model, the `Pseudo R-squ = 0.071`. While there are no hard and fast rules for interpreting the Pseudo R-squ, our model only explained 7% of variance in the categories of our dependent variable. In regression, you can have models that are significant that do not explain much variance, or change, in the dependent variable. In cases where the model is significant and does not explain much variance, there are usually independent variables that are not very strong predictors of the dependent variable. Now we must consider the other pieces of output.
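The two model-level numbers can be related to the model's log-likelihood. The sketch below assumes that `Pseudo R-squ` is McFadden's pseudo R-squared and that the `LLR p-value` comes from a likelihood ratio test against an intercept-only model. That is how the statsmodels `Logit` summary defines fields with these labels, but whether MagicStat computes them the same way is an assumption. The two log-likelihood values are hypothetical, chosen only so that the pseudo R-squared lands near the `0.071` reported above, and the closed-form *p* value is valid only for `df = 4` (our four predictors).

```python
import math


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only)."""
    return 1.0 - ll_model / ll_null


def llr_test_df4(ll_model, ll_null):
    """Likelihood ratio statistic and its p value for exactly 4 predictors."""
    stat = 2.0 * (ll_model - ll_null)
    half = stat / 2.0
    # Upper tail of a chi-square distribution with 4 df: exp(-x/2) * (1 + x/2).
    p_value = math.exp(-half) * (1.0 + half)
    return stat, p_value


ll_null = -700.0   # hypothetical log-likelihood of the intercept-only model
ll_model = -650.3  # hypothetical log-likelihood of the fitted model

r2 = mcfadden_r2(ll_model, ll_null)
llr_stat, llr_p = llr_test_df4(ll_model, ll_null)
print(f"pseudo R-squared = {r2:.3f}")
print(f"LLR statistic = {llr_stat:.1f}, p < 0.001: {llr_p < 0.001}")
```

The example shows how a model can be clearly significant (a tiny LLR *p* value) while improving the log-likelihood by only about 7% over a model with no predictors at all.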

**5. Interpreting – Details of logit regression results**

There are several pieces of information within this table; however, one piece of information is important to understanding which variables are significantly contributing to variance, or differences, in the dependent variable. The column `P > |z|` contains the significance values for each of the independent variables entered into our model. We want to see significant *p* values, *p* < 0.05. A quick scan of the table indicates that each of our variables is significantly contributing to variance, or differences, in the dependent variable. However, one variable is less significant than the others. Which one? Parch, or parents or children on board, is not as significant as the other variables (*p* = 0.038). Now that we know each of our independent variables is significantly contributing to the model, we need to understand the degree to which each of these variables is influencing differences in our dependent variable, whether or not someone survived the Titanic.

Another piece of information within this table that we need to consider is the direction of the coefficient. The column `coeff` contains the directional prediction of the specific independent variables. Negative coefficient values indicate increases in the independent variable predict decreases in the likelihood of a specific outcome, in this case survival. Positive coefficient values indicate increases in the independent variable predict increases in the likelihood of a specific outcome, in this case survival. In this case, increases in age and the number of siblings or spouses on board decrease the likelihood that an individual survived the Titanic.

**6. Interpreting – Odds ratios and 95% Confidence Intervals**

The final table in the output combines information from the previous tables. The Odds ratios and 95% Confidence Intervals table conveniently provides two pieces of information related to our independent variables that we need to interpret our output: the confidence intervals and the odds ratios. When interpreting the confidence intervals and odds ratios we need to consider a couple of issues. The first issue is the confidence intervals: we want to see confidence intervals that do not contain the value of 1. For example, the confidence interval for age (0.97 – 0.99) does not contain the value of 1. This means the odds ratio for age should be considered statistically significant (*p* < 0.05), and we can assume that the odds ratio is more than likely correct. In the event that the confidence interval contains a value of 1, we cannot conclude the odds ratio is statistically significant, which means we cannot rule out that the variable has no effect on the odds of either outcome in the dependent variable.
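An odds ratio is simply the exponential of the logistic regression coefficient, so the two tables can be checked against each other. The sketch below uses the rounded coefficients reported in Table 1 of the write up (age `-0.02`, fare `0.01`, parch `0.18`, sibsp `-0.30`). The standard error used for the confidence interval is hypothetical, because the blog does not report standard errors, and the `1.96` multiplier assumes a normal-approximation 95% interval.

```python
import math

# Rounded coefficients as reported in Table 1 of this blog.
coefficients = {"age": -0.02, "fare": 0.01, "parch": 0.18, "sibsp": -0.30}


def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coef)


def odds_ratio_ci(coef, std_err, z=1.96):
    """Approximate 95% confidence interval for the odds ratio."""
    return math.exp(coef - z * std_err), math.exp(coef + z * std_err)


odds_ratios = {name: round(odds_ratio(b), 2) for name, b in coefficients.items()}
print(odds_ratios)

hypothetical_se = 0.005  # invented standard error for age, for illustration only
low, high = odds_ratio_ci(coefficients["age"], hypothetical_se)
significant = not (low <= 1.0 <= high)  # significant when the interval excludes 1
print(f"age: 95% CI = ({low:.2f}, {high:.2f}), excludes 1: {significant}")
```

Read this way, an odds ratio of `0.98` for age means each additional year of age multiplies the odds of survival by roughly 0.98, a decrease of about 2%.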

**7. Writing up your results**

In the event you need to write up the results of a logistic regression we provided an example to guide your efforts.

- There are several data points you will need for the write up.
- These data points can be found in the first table Overview of Logistic Regression Results.
- You will need the values associated with the *p* value and the `Pseudo R-squ`.
- You also need data points found in the Details of logit regression results.
- You will need the values associated with the *p* value and the coefficients.

The write up – We were interested in determining if specific variables predicted changes in the likelihood that an individual survived the Titanic disaster. The independent variables included in the analysis were: age, number of siblings or spouses on board, number of parents or children on board, and the cost of the individual’s ticket. The dependent variable was whether or not someone survived. We ran a logistic regression to test this research question. The result of the analysis indicated that the model was significant (*p* < 0.05); however, the model only explained `7%` of the variance in the likelihood that an individual died or survived the Titanic disaster.

A closer examination of the results indicated that all of the variables included in the model significantly contributed to the model (See Table 1). The results indicated that as age increased, the likelihood of surviving the Titanic decreased. Additionally, the results indicated that as the number of siblings or spouses on board increased, the likelihood of surviving the Titanic decreased. Lastly, while the cost of the ticket and the number of parents or children on board tested as significant, the confidence intervals indicated that we could not confidently conclude that these variables significantly contributed in either direction to the prediction of whether or not an individual survived the Titanic disaster.

Table 1 – Logistic Regression Results

| Variable | Coefficient | P | 2.5% CI | 97.5% CI | Odds Ratio |
|---|---|---|---|---|---|
| Age | -0.02 | 0.00 | 0.97 | 0.99 | 0.98 |
| Fare | 0.01 | 0.00 | 1.01 | 1.01 | 1.01 |
| Parch | 0.18 | 0.04 | 1.01 | 1.20 | 1.20 |
| Sibsp | -0.30 | 0.00 | 0.63 | 0.74 | 0.74 |

So far all of our analyses have asked questions about the manipulation of a single independent variable. The *t*-test can compare two groups/levels while the ANOVA can ask about the differences between multiple levels. But what do we do when there is more than one independent variable being manipulated within our experiment? What if we want to know how these factors interact with each other to produce our final result? To demonstrate how you could assess these questions we’ll again go through our `Cholesterol.csv` dataset.

To correctly analyze this dataset we use a two-way mixed model ANOVA. It is “two-way” because there are two independent variables being manipulated. It is a “mixed model” because one independent variable, participation time, is within-subjects and the other independent variable, margarine type, is between-subjects.

To run the two-way mixed ANOVA model, apply the following steps:

**1.** Load the `Cholesterol.csv` data at the top of MagicStat (version 1.1.3) and press `Explore` to begin.

**2.** Select a model to analyze your data.

**3.** After choosing the Two-Way Mixed ANOVA (Factorial Between and Within Subjects ANOVA) model you will be asked: is your dataset in long or wide format? This data uses one row per participant, so we should select `wide` and proceed. At this point the left panel of your screen should show the following.

**4.** `Select a between subjects variable`: Since we are using a mixed ANOVA model we need to tell the program which column of our dataset denotes the levels of our between subjects variable. You can look to the data preview in the right panel and see the `Margarine` column labels each participant as having received either margarine type `A` or `B`.

Choose `Margarine` as your between subjects variable.

**5.** **Naming variables**: After specifying a between subjects variable you are asked to name the within subjects variable as well as the dependent measure. Although this step is optional, we *highly recommend* taking a moment to give your variables useful and meaningful names. In the coming steps there will be many charts and tables to consider; having useful labels for our independent and dependent variables helps us keep all of the factors and relevant comparisons straight in our heads.

We chose the label `time` for our independent variable and `chol` for our dependent variable. Avoid using overly long descriptions for these names because long labels will make the resulting charts and tables harder to read. You want to provide the minimum necessary label to remain useful without cluttering the visual space of your figures.

**6.** **Specify levels of within-subjects variable**: The final step before we can run our analysis asks us to choose which columns of our data represent the levels of our `time` within-subjects variable.

In the left gray box select `Before` then click the rightward-facing arrow, `>`, to select it as one of the `time levels`. Repeat this process for the `After4Weeks` and `After8Weeks` labels.

When your display looks like the above image click `Analyze` to see the results of your two-way mixed ANOVA.

The output of a two-way ANOVA can seem daunting so we’ll go through it piece-by-piece. Luckily, we’ve established a strong foundation of understanding by beginning with paired-samples *t*-tests and the one-way repeated measures ANOVA. The concepts we built up there will prove very helpful in tackling this more complex analysis.

The first table of results shown is the breakdown of the sources of variance in our data. As was the case with our one-way ANOVA, each of the rows of interest has sum of squares (`SS`), degrees of freedom (`df`), mean square (`MS`), `F` statistic, and *p value* columns. Here we will not focus on the full calculations, but it is enough to keep in mind that each `F`-value is essentially a ratio of explainable variance over unexplained error variance.

To guide our reading of this table it is best to remember the parameters of the experiment.

- We manipulated participation `time` in our study within subjects
- We manipulated `Margarine` type between subjects
- We want to know whether these two factors interact with each other to affect cholesterol level

For our `time` manipulation we are asking whether the level of the independent variable `time` can explain a statistically significant proportion of the variance in our data. Looking to the `time` row we see the observed `F`-value of `259.49` and the associated *p value* of `0.000`. As per convention, `p < 0.05` counts as statistically significant. This means we can say yes, the level of our independent variable `time` has a statistically significant effect on the mean level of cholesterol.

For the `Margarine` manipulation we are asking a similar question. Does the type of margarine used explain a statistically significant proportion of the variance in our data? Looking to the `Margarine` row we see an observed `F`-value of `1.45` and the associated *p value* of `0.247`. As per convention, `p >= 0.05` is not statistically significant. We fail to reject the null hypothesis that type of margarine has no effect on mean level of cholesterol.

For consideration of the interaction between these two factors we look to the `Margarine X time` row. Here the question is not about the effects of our factors in isolation; instead we are asking whether the effect of one factor depends on the level of the other. This type of effect is most easily understood in the domain of medicine, where we commonly hear it invoked.

Imagine a patient with two underlying health conditions, both requiring medication to manage them. Independently, each of these medications would improve the health of this patient. But if the effects of these medications interact with one another, then the addition of the second medication will change the effectiveness of the intervention.

This interaction can play out in many different ways. Together they could lead to more improvement than would be expected by adding up their independent effects (super-additivity). Their combined effectiveness could be less than would be expected from adding the effects together (sub-additivity). Combining these medications could even be dangerous and detrimental to health (cross-over interaction). The important thing to understand is that an interaction means a particular combination of the levels of our factors can produce its own effect on the result.

When we look to the `Margarine X time` row we see an observed `F`-value of `4.78` and the associated *p value* of `0.015`. Therefore, we reject the null hypothesis that our two independent variables do not interact with one another.

In summary, our table of ANOVA results revealed the following:

- There is a statistically significant effect of `time`
- There is not a statistically significant effect of `Margarine`
- There is a statistically significant interaction of `Margarine x time`

These statistically significant ANOVA results only tell us that all levels do not produce the same results. To know which groups differ and the direction(s) of those differences we look to our descriptive statistics and pairwise comparisons.

Above we are given the `Mean`, standard deviation (`SD`), standard error of the mean (`SEM`), and number of participants (`N`) for each of our 6 experimental conditions.

Mean cholesterol level for participants given margarine `B` decreased across the study. They began at `6.78`, dropped to `6.13` after 4 weeks, and ended the 8 week intervention at `6.07`.

Mean cholesterol level for participants given margarine `A` also decreased across the study. They began at `6.04`, dropped to `5.55` after 4 weeks, and ended the 8 week intervention at `5.49`.
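One descriptive way to look at an interaction in these cell means is as a "difference of differences": how much more (or less) one margarine group changed from baseline than the other. The sketch below uses only the six means reported above; the variable and function names are ours, and this is a description of the means, not the ANOVA interaction test itself.

```python
# Cell means copied from the descriptive statistics above.
cell_means = {
    "A": {"Before": 6.04, "After4Weeks": 5.55, "After8Weeks": 5.49},
    "B": {"Before": 6.78, "After4Weeks": 6.13, "After8Weeks": 6.07},
}
follow_ups = ("After4Weeks", "After8Weeks")


def change_from_baseline(means, level):
    """Change in mean cholesterol relative to the Before measurement."""
    return round(means[level] - means["Before"], 2)


# Change from baseline for each margarine type at each follow-up.
changes = {
    margarine: {level: change_from_baseline(means, level) for level in follow_ups}
    for margarine, means in cell_means.items()
}

# Difference of differences: how much more group B changed than group A.
interaction_contrast = {
    level: round(changes["B"][level] - changes["A"][level], 2) for level in follow_ups
}

print(changes)
print(interaction_contrast)
```

If the two margarine types changed by exactly the same amount at every follow-up, every contrast would be zero and the lines in an interaction plot would be parallel.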

Although we have all of the raw group means in our descriptive statistics, it is often very helpful to visualize the results of our experiment using charts. Both of the charts below are representing the same data obtained from our descriptive statistics. The only difference between the charts is the variable chosen to place on the X-axis. Using multiple representations of the same data is informative because some patterns “pop out” at us more readily in one configuration or another.

In the first chart we see type of `Margarine` on the X-axis and each level of `time` as a separate line. Cholesterol scores are generally lower for the `A` than the `B` margarine groups. ANOVA `F`-table results tell us this between-subjects manipulation of `Margarine` is not statistically significant (*p = 0.247*).

Our second chart shows `time` on the X-axis and type of margarine as separate lines. Here we see the decrease in cholesterol as the study progresses and we also see that this pattern is largely the same for the `A` and `B` margarine groups. Differences between the `Before` and `After4Weeks` groups are large; differences between the `After4Weeks` and `After8Weeks` groups appear small.

With our general understanding of the patterns in our data we can move on to the pairwise post-hoc comparisons. These comparisons will tell us which of our apparent group differences are statistically significant and which are not.

For our purposes, the most important columns are `Group 1`, `Group 2`, `Reject`, and *p value*.

`Group 1` and `Group 2` tell us which two experimental conditions are being compared. `p value` and `Reject` tell us whether the group difference being compared is statistically significant (`Reject = True`).

- Confirming our ANOVA result, we see **no main effect of** `Margarine` type on mean cholesterol level (`p = 0.247`).
- The next two rows show statistically significant simple effects of `time` with the `B`-type margarine. More specifically, for participants given `B`-type margarine there were statistically significant differences between the `Before` and `After4Weeks` groups as well as between the `Before` and `After8Weeks` groups (`p = 0.000` for both comparisons). The next row shows no statistically significant difference between `After4Weeks` and `After8Weeks` groups for participants given `B`-type margarine (`p = 0.294`).
- The last three rows of the `Margarine Post-Hoc Tests` table show the simple effects of `time` for participants given the `A`-type margarine. This pattern is largely the same as was observed for participants given `B`-type margarine. Differences between `Before` and `After4Weeks` as well as differences between `Before` and `After8Weeks` groups were statistically significant for participants given `A`-type margarine (`p = 0.000` for both comparisons). Just as was seen with `B`-type margarine, differences between `After4Weeks` and `After8Weeks` were not statistically significant for participants given `A`-type margarine (`p = 0.060`).

- The first three rows of the `time Post-Hoc Tests` show statistically significant differences between each level of the `time` variable.
  - `Before` vs `After4Weeks` (`p = 0.000`)
  - `Before` vs `After8Weeks` (`p = 0.000`)
  - `After4Weeks` vs `After8Weeks` (`p = 0.004`)

- The last three rows of the table compare groups given margarine `A` to groups given margarine `B` at each level of the `time` variable.
  - `A` vs `B` at `Before` (`p = 0.475`)
  - `A` vs `B` at `After4Weeks` (`p = 0.633`)
  - `A` vs `B` at `After8Weeks` (`p = 0.622`)

- None of these three comparisons rises to the level of statistical significance.

Full analysis of our `Cholesterol.csv` dataset under a two-way mixed ANOVA model shows a statistically significant effect of our `time` intervention.

Participation in this study led to a decrease in mean cholesterol level for all experimental groups. Cholesterol dropped significantly during the first 4 weeks of participation and continued to drop (although less dramatically) given an additional 4 weeks of margarine use.

The type of margarine used by participants did not have a statistically significant effect on the mean cholesterol level. Both appeared similarly effective at decreasing group mean cholesterol levels.

There was a statistically significant interaction observed between `time` and `Margarine`, although none of the post-hoc comparisons shed light upon how this interaction is operating in our study.

It is possible our comparisons of cell means to assess differential effectiveness of margarine types were underpowered due to small sample sizes within each cell (`N = 9`). When comparing `After4Weeks` to `After8Weeks` groups for either margarine `A` or `B` we failed to find significant differences. When using a more powerful test which collapsed over type of `Margarine`, the difference between `After4Weeks` and `After8Weeks` was significant.


Begin your analysis by going to the MagicStat website (version 1.1.3), uploading the dataset `Cholesterol.csv` and pressing `Explore`.

This dataset contains a mixture of between-subjects (type of margarine) and within-subjects (length of intervention) factors. This leaves us room to make many comparisons, but we will begin with the most straightforward comparison of whether participation in these interventions led to a change in cholesterol.

Although there were 2 different types of margarine used we will first begin by asking the simple question: was cholesterol different from the beginning of the experiment until the end of the 8-week program? The appropriate analysis for this is the paired-samples *t*-test.

1. After loading your data click `select a model to analyze your data` and choose `Paired Samples` *t*-test.

2. The next step is to choose variables for groups 1 and 2.

- `select a variable for group 1` and pick `Before`
- `select a variable for group 2` and pick `After8Weeks`

In this comparison, we are ignoring the factor of which margarine the participant was assigned to use and simply asking whether 8 weeks of using margarine in our experiment leads to a difference in cholesterol score at the end. This doesn’t tell us everything we want to know but it gives us an idea of whether our intervention is impacting the outcome we are measuring.

3. Press `Analyze` and you should get the following table of statistical results.

4. **Summary of Stats**: This table contains the statistics for the two groups of interest: `Group 1`, Before, and `Group 2`, After8Weeks. Here we are given the respective group means `6.41` and `5.78` as well as standard deviations (`sd`) and standard errors (`sem`). These values are informative in describing our data but alone they cannot tell us whether the observed differences in groups are statistically significant. For this question we move on to the next table.

5. **Group 1 and Group 2 Stats**: The first three numbers of this table, `df`, *t*, and *p*, are relevant to our question of statistical significance.

*p* is the probability of observing a group difference at least this large if the null hypothesis were true. In other words, how likely is it that the `Before` and `After8Weeks` groups would show this observed `mean diff` by chance alone?

As a general rule of thumb, group differences with *p* values of < 0.05 are said to be statistically significant. While the word “significant” carries certain connotations in English use, this is not the same significance we are talking about with statistical significance. Statistical significance can only tell you a result is unlikely to be caused by chance.
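To make the *t* and `df` values less mysterious, here is a minimal sketch of how a paired-samples *t* statistic is computed from difference scores. The five pairs of scores are made up for illustration and are not taken from `Cholesterol.csv`; the function name is also ours.

```python
import math
import statistics

# Hypothetical cholesterol scores for five participants (not from Cholesterol.csv).
before = [6.4, 6.8, 5.9, 7.1, 6.0]
after = [5.9, 6.1, 5.6, 6.4, 5.7]


def paired_t(x, y):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]  # one difference score per participant
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sem_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / sem_diff, n - 1


t_stat, df = paired_t(before, after)
print(f"t({df}) = {t_stat:.2f}")
```

The larger the mean difference is relative to its standard error, the larger *t* becomes and the smaller the associated *p* value.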

**Cohen’s D**

Statistical reliability/significance is an important tool to keep us from chasing after results or interventions which only appear to have an effect but are actually due to chance. Knowing our effect is non-random is a good start but in many practical use-cases it is even more important to have a tool that lets us measure the size of the effect we’re having.

Initially, the straightforward idea would be to compare the differences between the two group means. Unfortunately, this kind of raw measure would vary wildly by something as simple as changing the unit of measurement. Imagine two groups of experimenters observing the exact same set of subjects but one is measuring them using inches and another using centimeters. Although the differences they’re observing between the control and experimental groups are identical they’d both get different measures of their effect size. This is a silly example because you can convert between inches and centimeters easily but imagine a more complex situation.

Imagine a business is trying to decide which of two training programs to hold for their employees.

- **Program A:** decreases mean employee stress level from 65 to 60
- **Program B:** increases mean employee satisfaction from 65 to 70

Both are a change of 5 points, but there is no clear way to relate stress and job satisfaction. What do you do?

To make a more informed decision, we can look at each program’s effect size using the Cohen’s d statistic. Cohen’s d is so useful because it scales the raw mean difference relative to how much the underlying data already varies. Instead of focusing on the 5 point difference, we ask how much variation is naturally in the data (`sd` or standard deviation) and compare the raw difference relative to natural variation.

More concretely, if the standard deviation of job stress is 15 and the standard deviation of job satisfaction is 5, we can compute Cohen’s d for both these groups.

```
change = 5 # both programs change mean scores by 5 points
stress = change / 15 # Cohen's d of 0.33
satisfaction = change / 5 # Cohen's d of 1.0
```

These effect size statistics tell a very different tale than the raw group differences. The stress reduction program has a small-to-moderate effect size, `0.33`, when compared to the job satisfaction program’s large effect size of `1.0`. If you were torn between which of the two programs to choose, these effect size numbers would be good reason to prefer the job satisfaction program. We can expect more people will be helped and in a bigger way than with the stress reduction program.

Of course, no statistic alone can blindly guide decision making because there is always a question of values outside of the realm of statistics. Maybe stress in the office has been the cause of a lot of recent troubles or has been continually mentioned by employees as in particular need of improvement. Maybe job satisfaction is already so high that you think there would be diminishing returns from improving it further.

Statistics are powerful tools but we must remain thinking and skeptical agents. We empower ourselves when we understand the meaning of our results and we enslave ourselves when we forget to heed those limits.

Returning to our *t*-test results, our comparison of the `Before` and `After8Weeks` groups produces a Cohen’s d of `0.55`. This value is considered a moderate effect size between groups. If you picked scores at random from each of the two groups you’d expect the `After8Weeks` cholesterol level would be lower 65% of the time.
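Both numbers can be roughly reproduced by hand. The sketch below makes two assumptions that are ours and not confirmed by MagicStat: that Cohen’s d is computed with a simple pooled standard deviation, and that scores in each group are approximately normal so the "65% of the time" figure can be obtained from the normal distribution. The group standard deviations (`1.19` and `1.10`) are the values this tutorial reports for `Before` and `After8Weeks` in its descriptive statistics.

```python
import math
from statistics import NormalDist

# Group means from the Summary of Stats table; SDs from the tutorial's descriptives.
mean_before, sd_before = 6.41, 1.19
mean_after, sd_after = 5.78, 1.10

# Assumption: pooled SD for two equally sized groups.
pooled_sd = math.sqrt((sd_before**2 + sd_after**2) / 2)
cohens_d = (mean_before - mean_after) / pooled_sd

# Assumption: normal scores, so P(random After8Weeks score < random Before score)
# equals the standard normal CDF evaluated at d / sqrt(2).
prob_lower = NormalDist().cdf(cohens_d / math.sqrt(2))

print(f"Cohen's d = {cohens_d:.2f}")
print(f"Probability After8Weeks is lower = {prob_lower:.2f}")
```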

General guidance on Cohen’s d is shown in the table below.

| Cohen’s d | Effect Size |
|---|---|
| `<= 0.3` | small |
| `<= 0.5` | small-to-moderate |
| `<= 0.7` | moderate |
| `<= 0.8` | moderate-to-large |
| `> 0.8` | large |


Another method of inspecting the data is to use an Analysis of Variance. As with all analyses, the appropriate model depends on the question you are asking. To begin, we will ask “do the levels of our independent variable have an effect on our dependent variable?”.

First, make sure you’ve uploaded the `Cholesterol.csv` dataset. If you’ve been following along since the paired-samples *t*-test, you can simply go back to the top of the screen on MagicStat (version 1.1.3) and press `Explore` again to begin this analysis.

**1.** Next `select a model to analyze your data`.

For datasets with more than two levels of your independent variable, it is appropriate to use an analysis of variance model to test whether your independent variable can account for the variation in your data. In this case, we are again ignoring the independent variable of type of margarine (A or B) and only looking at the three levels of participation in our study.

Choose the `One-Way Within Subjects ANOVA (One-Way Repeated Measures ANOVA)` model.

It is “One-Way” because there is only a single independent variable (duration of study participation) with three levels. It is a “Within Subjects” or “Repeated Measures” model because each participant in the study appears in all three levels (`Before`, `After4Weeks`, and `After8Weeks`). The levels vary *within each subject* and produce *repeated measures* of each participant.

**2.** After selecting the model you will be asked `Is your dataset long format or wide format?`.

Previews and descriptions of both wide format and long format are shown below.

If you look to the right panel of the MagicStat display, you will see a preview of our dataset and notice that it is following the wide format. Each row corresponds to one participant and contains data for each of the three levels of our independent variable.

Select `wide` and proceed.
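If the distinction between the two formats is unclear, the following sketch converts a couple of wide-format rows into long format. The values and the `ID` column are hypothetical, and the `time` and `chol` labels are just illustrative names for the within-subjects variable and the dependent measure.

```python
# Hypothetical wide-format rows: one row per participant, one column per time level.
wide_rows = [
    {"ID": 1, "Margarine": "B", "Before": 6.4, "After4Weeks": 5.8, "After8Weeks": 5.7},
    {"ID": 2, "Margarine": "A", "Before": 6.8, "After4Weeks": 6.2, "After8Weeks": 6.1},
]
time_levels = ["Before", "After4Weeks", "After8Weeks"]


def wide_to_long(rows, levels):
    """Return one row per participant per level of the within-subjects variable."""
    long_rows = []
    for row in rows:
        for level in levels:
            long_rows.append(
                {
                    "ID": row["ID"],
                    "Margarine": row["Margarine"],
                    "time": level,       # which measurement occasion this row describes
                    "chol": row[level],  # the cholesterol score at that occasion
                }
            )
    return long_rows


long_rows = wide_to_long(wide_rows, time_levels)
for row in long_rows:
    print(row)
```

Two participants measured three times become six long-format rows, which is why the same study looks much "taller" in long format.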

**3.** After specifying our model and data format, next we `Select independent variables` for analysis. Since we are only looking at the single variable (duration of study participation) we choose all three levels from the dropdown box.

**4.** Press `Analyze` and three tables of results will appear beneath the **Output** heading.

**Summary of sources of variance in our data**: When conducting an ANOVA it is important to keep in mind the framework being used to ask our question. In the simplest sense, an ANOVA is asking where the variation in our data is coming from. Do the levels of our independent variables account for an amount of variation in our means over and above what we would expect from pure random chance?

First, imagine a completely naive approach where we throw out all information about how the data was collected. We ignore which subject we are measuring and also ignore which level of our independent variable the observation is coming from. With this impoverished picture all we could do would be to treat the dataset as a single sample and calculate its grand mean and variance. Why any given data point is higher or lower would be a mystery to us and we wouldn’t be able to say anything about whether or not our independent variable had an impact on our dependent variable.

*Breaking it down*

Luckily, in the case of a one-way repeated measures model we are not so naive and are able to parse our variance into three potential sources:

a. **Between Group variation**: Variation due to the levels of our independent variable.

If you think about grouping data points from each of the levels of our independent variable you’d have three groups, each group having its own mean value. We can then compare these mean values to the grand mean of the undifferentiated naive case.

*Question*: Why are the group means different than the grand mean?

*Answer*: The Between Group means are different from our grand mean because of the level of our independent variable they were collected from. This is the logic of how an ANOVA calculation tells us about the impact of our independent variable. If our groupings did not have an impact on our dependent variable then knowing which group an observation comes from would not tell us additional information above the naive case.

b. **Subjects variation**: Variation due to individual differences between participants.

Ignoring group membership, we can instead group data points by the subject they are collected from. In this view of our dataset, each participant would have their own mean score and each of these mean scores would be different from the grand mean. Why? Because there are individual differences between people. This fact is not surprising or of particular relevance to questions about our experiment, but it is useful to pull out the variance of individual differences from the other variation in our study. The statistical power of repeated measures designs is due precisely to our ability to differentiate between group variation and individual differences.

c. **Error**: Variation of unknown or unspecified origin.

After accounting for variation in our data due to **subjects** and **between** group factors, what is left is called “error” or sometimes “residual” variance. This is the stuff in our data that our model is not able to capture. We can know something about how our groupings are affecting outcomes and how individual variation affects results, but what remains unexplained is called error.
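The three sources described above can be computed directly for a small balanced dataset. The sketch below uses made-up scores for four subjects measured at three levels (not the cholesterol data) and checks that the between, subjects, and error sums of squares add up to the total variation around the grand mean.

```python
# Hypothetical scores: each row is one subject, each column is one level of the
# within-subjects variable (for example Before, After4Weeks, After8Weeks).
scores = [
    [6.0, 5.0, 4.0],
    [7.0, 6.5, 6.0],
    [5.0, 4.5, 4.0],
    [8.0, 7.0, 6.5],
]


def mean(values):
    return sum(values) / len(values)


n_subjects = len(scores)
n_levels = len(scores[0])

grand_mean = mean([x for row in scores for x in row])
level_means = [mean([row[j] for row in scores]) for j in range(n_levels)]
subject_means = [mean(row) for row in scores]

# Total variation around the grand mean (the "naive" view of the data).
ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
# a. Between: variation of the level means around the grand mean.
ss_between = n_subjects * sum((m - grand_mean) ** 2 for m in level_means)
# b. Subjects: variation of each subject's mean around the grand mean.
ss_subjects = n_levels * sum((m - grand_mean) ** 2 for m in subject_means)
# c. Error: what is left after removing level and subject effects.
ss_error = sum(
    (scores[i][j] - level_means[j] - subject_means[i] + grand_mean) ** 2
    for i in range(n_subjects)
    for j in range(n_levels)
)

print(f"total={ss_total:.3f} between={ss_between:.3f} "
      f"subjects={ss_subjects:.3f} error={ss_error:.3f}")
```

Because the subjects term is pulled out separately, the error term is smaller than it would be if individual differences were left mixed in with the unexplained variation.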

**The F-statistic**

Mean Square

In the previous section we talked about sources of variation in our data. Those concepts roughly correspond to the values in the `Sum of Square` column. Unhelpfully, `Between`, `Subjects`, and `Error` values are all based on different numbers of observations. To correct for this factor, we divide each `Sum of Square` value by its `Degree of Freedom` to compute a `Mean Square`. We’ll demonstrate below for the `Between` sources of variance row.

```
# Between row calculation
ss = 4.32 # sum of square
df = 2 # degree of freedom
mean_square = ss / df
# Mean Square is 2.16
```

This same process of dividing `Sum of Square` by `Degrees of Freedom` can be performed for the `Error` row and yields a `Mean Square` of `0.01`. Given all these pieces we are finally in position to understand our primary statistic, `F`.

F as a ratio

The `F`-statistic of a One-Way repeated measures ANOVA table is the ratio of `Mean Square Between` over `Mean Square Error`. More meaningfully, we can conceptualize `F` as the ratio of explainable group-based variance over unexplainable error variance. Imagine the error term frozen at a value of `10`; as our explainable variation goes up so does our `F`-statistic.

```
ms_error = 10
F = 1 / ms_error # F = 0.1
F = 10 / ms_error # F = 1
F = 100 / ms_error # F = 10
```

This understanding is what the `F`-statistic is there to tell us about. Do our groups produce different means than the naive “grand mean” we began with? If so, does the observed difference go beyond what we would expect in a case of pure random chance? To answer that last part we look at the `p` or `Significance` value in the rightmost column. As per *t*-tests, the convention is to consider any *p*-value `< 0.05` to be statistically significant. Again, we cannot say how big a difference is or even which groups are different based on `p` alone.

*Post-Hoc tests*

A significant ANOVA result can only tell us that all of our independent variable levels do not produce the same means. To get a more refined picture of our results we look to the next two tables of our output.

Describing Our Data

The table of descriptive statistics is useful for getting a general idea of how your experiment turned out. Reported results include `Mean`, standard deviation (`SD`), standard error (`SEM`), and number of observations (`N`).

Do the direction and spread of group means make sense for your experiment? Imagine an 8-week stress reduction program which is showing an ever increasing amount of stress as participants engage for longer with the program. A significant ANOVA result only tells you there are differences, not that they are in the directions you supposed or even in a pattern that makes sense. What if stress increases for the first 4 weeks of a program but then ends up lower at the end of 8 weeks than at the beginning?

Did people increasingly drop out of your study as time went on? The exact questions will be informed by your designs, but thinking about our descriptives helps us generate hypotheses to explain what happened in our experiment. Statistics are tools to help us answer specific questions about our data, but explaining why those differences are or aren’t there is your job.

For this dataset we see cholesterol began at a mean value of `6.41`, decreased to `5.84` after 4 weeks of margarine use, and decreased slightly more to `5.78` after an additional 4 weeks of margarine use. Standard deviations also decreased as participation in the study proceeded (`1.19` to `1.12` to `1.10`).

**Making Inferences About Our Data**

Finally, we come to our post-hoc inferential statistics. Although our mean group values showed a drop in cholesterol as time in the study increased, we do not know which of those differences is statistically significant. It is possible the largest observed difference of `0.57` between the `Before` and `After4Weeks` group is statistically significant but the smaller `After4Weeks` to `After8Weeks` difference of `0.06` is not.
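The mean differences quoted here come straight from the descriptive statistics. As a small sketch, every pairwise difference between the three reported group means can be listed like this (the dictionary name is ours):

```python
from itertools import combinations

# Group means reported in the descriptive statistics for this dataset.
group_means = {"Before": 6.41, "After4Weeks": 5.84, "After8Weeks": 5.78}

# One mean difference per pair of levels; three levels give three comparisons.
pairwise = {
    (first, second): round(group_means[first] - group_means[second], 2)
    for first, second in combinations(group_means, 2)
}

for (first, second), diff in pairwise.items():
    print(f"{first} - {second}: {diff:.2f}")
```

These raw differences say nothing about statistical significance on their own; that is what the post-hoc tests add.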

Above we see the table of inferential statistics from our analysis. Each row represents a comparison of one group to another. Because we have 3 levels of our independent variable, there are three possible comparisons to make. Each is present in the table of `Post-Hoc Tests`.

The key values in this table are `Mean Difference`, `p value`, and `Reject`. From this we can see all of our groups are producing statistically significant differences from one another. Each of our comparisons is reporting `True` as to whether or not we should reject the null hypothesis and all *p*-values are `<= 0.05`. Combining this with our descriptive statistics we can say the following:

- Mean Cholesterol levels significantly decreased after 4 weeks of margarine use
- Mean Cholesterol levels significantly decreased between 4 weeks and 8 weeks of margarine use
- Participation in our program led to statistically significant decreases in mean levels of cholesterol


For a Pearson correlation, we need two variables. Typically, both variables need to be continuous, normally distributed, and unbounded, like height or age. If a variable is categorical, like profession, or if there are a lot of bounded scores, like a lot of 0s or 100s on a test, it won’t work.

The test score for a Pearson correlation is *r*, which has a range from -1 to +1. The *r* score tells you two things about the relationship between the two variables: the strength and the direction of the relationship. The larger the absolute value of the *r* score, the stronger the relationship. If the number is positive, then the two variables are directly related: as one goes up, the other goes up. If the value is negative, then they are inversely related: as one goes up, the other goes down, and vice-versa.
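To see how *r* captures both strength and direction, here is a minimal sketch that computes a Pearson correlation from scratch. The small lists of numbers are invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the spread of each variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data, not from any MagicStat sample dataset.
x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]   # perfectly direct relationship
falls_with_x = [10, 8, 6, 4, 2]   # perfectly inverse relationship
noisy = [2, 1, 4, 3, 5]           # direct, but not perfect

r_direct = pearson_r(x, rises_with_x)
r_inverse = pearson_r(x, falls_with_x)
r_noisy = pearson_r(x, noisy)

print(r_direct, r_inverse, r_noisy)
```

The sign of each result gives the direction of the relationship, and the absolute value shrinks toward 0 as the points stray further from a straight line.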

It’s important to remember that although a Pearson correlation can identify a relationship between two variables, it cannot (by itself) determine whether there is a causal relationship, let alone which variable is causing the other. Some relationships are clearly the product of a third variable. For example, ice cream sales are positively correlated with drownings. Now, does buying ice cream cause people to drown? Of course not. In reality, a third variable (temperature) is responsible for the relationship between ice cream and drowning: as it gets hotter, people are more likely to eat ice cream and more likely to go swimming.

The *r* score is also associated with a *p* value, which tests for statistical significance. The *p* value assesses how likely we would be to obtain a correlation at least this strong by chance, if the null hypothesis were true. So, the lower the *p* value, the stronger the evidence against the null hypothesis. Typically, our alpha level, the threshold for statistical significance, is set at .05. That is, if our *p* value is below .05, then we reject the null hypothesis.

The *p* value for a Pearson correlation is governed by two things: the strength of the relationship, and the degrees of freedom. The stronger the relationship (either negative or positive), the lower the *p* value. The degrees of freedom for a Pearson correlation is N minus 2, so for a given *r*, the larger your sample size, the more degrees of freedom, and the lower your *p* value.
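The way *r* and the degrees of freedom combine can be sketched numerically. The standard approach converts *r* to a *t* statistic and looks up the two-tailed area under the *t* distribution; the version below approximates that area with Simpson's rule so it needs only the standard library. The function names and the example values of *r* and `df` are illustrative choices of ours, and this is a rough numerical sketch rather than how MagicStat computes its *p* values.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(r, df, steps=2000):
    """Approximate two-tailed p value for a Pearson r with df = N - 2 (|r| < 1)."""
    t = abs(r) * math.sqrt(df / (1 - r * r))  # convert r to a t statistic
    h = t / steps
    # Simpson's rule for the area under the density between 0 and t.
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 2 * (0.5 - area)  # area in both tails beyond |t|


p_reference = two_tailed_p(0.41, 52)   # moderate r, 54 participants
p_small_n = two_tailed_p(0.41, 20)     # same r, fewer participants
p_weak_r = two_tailed_p(0.20, 52)      # weaker r, same sample size

print(p_reference, p_small_n, p_weak_r)
```

Holding *r* fixed, the smaller sample produces a larger *p* value; holding the sample size fixed, the weaker correlation does too.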

So now that we know what a correlation is, let’s look at an example. Let’s say that we want to know whether a person’s IQ is related to their income. We have the following dataset.

Our hypothesis is that smarter people are more skilled and in higher demand, and therefore make more money. However, the relationship between IQ and income isn’t perfect, is it? There’s a lot more that goes into a person’s income than just their IQ: what field they work in, how much experience they have, even where they live. So, it won’t be a perfect relationship between IQ and income, and it probably won’t even be a particularly strong relationship. So, we’ll hypothesize a moderate, positive relationship between IQ and income. In general, we want to have hypotheses that are backed by theory. That way we can avoid “fishing expeditions” which throw variables together randomly. Performing a test without a hypothesis grounded in theory increases the likelihood that any relationship you might find is just due to chance. In the last column, we also have the foot size of each individual. Obviously, we would not hypothesize any relationship between foot size and either IQ or income.

So now that we have our hypothesis, let’s see how to perform a correlation on MagicStat (version 1.1.3).

**1-) Select a data file**

Select your own dataset by clicking the “Choose a data file” button. If you would like to use a sample data file, click “Sample datasets” on the toolbar, save it to your hard drive, then click “Choose a data file” and navigate to where you saved it.

**2-) Explore the dataset**

After you select your dataset, click the “Explore” button. On the right side of the window is information-at-a-glance about your dataset, including variable information, bar graphs, and histograms.

**3-) Choose the “Pearson Correlation” model**

Click “Select a model to analyze your data”, and select “Pearson Correlation” on the dropdown.

**4-) Choose variables**

Click the “Select variables” button, and pick which variables you want to include in the model. Here, we’re selecting “IQ”, “Income” and “Foot_Size”.

**5-) Analyze the dataset**

Finally, click the “Analyze” button.

**Interpreting results**

Now it is time to interpret the results we obtained in the previous steps.

In a Pearson correlation, the degrees of freedom is purely a function of sample size, N minus 2. So, it is 52.

Next is our correlation table. We have a moderate correlation between IQ and Income at .41, as we hypothesized, and no correlation between Foot_Size and IQ or Income.
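If you would like to see what goes into a table like this, the sketch below computes Pearson’s *r* for every pair of variables in plain Python. The numbers are made up for illustration and are not the MagicStat sample dataset, so the output will not match the values reported here; `pearson_r` is our own helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical values, NOT the dataset used in this blog
data = {
    "IQ": [95, 100, 105, 110, 120, 130],
    "Income": [40, 38, 52, 47, 60, 58],
    "Foot_Size": [10, 8, 9, 11, 9, 10],
}

names = list(data)
table = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Print the correlation table, one row per variable
print(" " * 10 + "  ".join(name.rjust(9) for name in names))
for a in names:
    print(a.ljust(10) + "  ".join(f"{table[a][b]:9.2f}" for b in names))
```

Each variable correlates perfectly with itself, so the diagonal is always 1, and the table is symmetric because the correlation between IQ and Income is the same as the correlation between Income and IQ.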

Below that is the *p* value for each relationship. The moderate correlation between IQ and Income has a *p* value of 0.002, which means that if there were no relationship between IQ and Income, we’d expect to see a correlation at least this strong about two times out of a thousand. That’s not very likely! The *p* values for Foot_Size-IQ and Foot_Size-Income are close to 1, which means our data provide no evidence of a relationship between them.

After the correlation table, MagicStat gives us some graphs. First is a correlation heatmap, to show where the strongest relationships are. We can see the moderate relationship between IQ and Income in purple, and the lack of relationship with Foot_Size in blue.

Then, we can select a scatterplot to visualize the relationship and check for outliers. If we look at the IQ-Income scatterplot, there do not seem to be any obvious outliers. With this scatterplot and a theoretical link between IQ and Income, we can feel confident in the relationship we found in our dataset.

**Written by the MagicStat Team**

There are multiple different types of *t*-tests: the one-sample *t*-test, independent-samples *t*-test, and the paired-samples *t*-test. In this blog, we’ll look at the independent-samples *t*-test, where we compare two groups or samples. For example, did students who attended a review session perform better on the test than those who didn’t? The null hypothesis would be that there is no difference between the groups, whereas the alternative hypothesis would say that there would be group differences.

An independent-samples *t*-test requires a categorical independent variable and a continuous dependent variable that is normally distributed (technically, the residuals must be normally distributed). The two groups should also be independent (i.e., each participant was in only one group) rather than related (i.e., each participant was in both groups).

The test statistic for an independent-samples *t*-test is, of course, *t*. *t* is computed by subtracting the mean of one sample from the mean of the other and dividing by the pooled standard error.
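In its standard pooled-variance form, the statistic can be written as follows. The notation is our own assumption: the two sample means are written as x̄ with subscripts 1 and 2, the sample standard deviations as *s*, and the group sizes as *n*.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
$$

The degrees of freedom for this test are the two group sizes added together, minus 2.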

There are three variables in the *t*-test formula: the **means**, the **standard deviations**, and the **sample size**. As the difference between the means increases, so will the *t* score. That should make sense, because the farther apart the two means are, the less likely the difference is due to chance. As the standard deviation increases, the *t* score decreases. This is because the more variability there is, the less confident we are about the accuracy of our sample means. Finally, as the sample size increases, so does the *t* score. This is because the larger our sample size, the more confident we are in the accuracy of our sample means.
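A short Python sketch lets you check these three effects for yourself. It uses the standard pooled-variance *t* formula on summary statistics; the function name and all of the numbers are hypothetical, picked only so that one ingredient changes at a time.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

# Hypothetical baseline: means 80 vs 75, SD 10, 20 people per group
base_t, base_df = pooled_t(80, 10, 20, 75, 10, 20)

# Change one ingredient at a time
wider_gap_t, _ = pooled_t(85, 10, 20, 75, 10, 20)  # bigger mean difference
noisier_t, _ = pooled_t(80, 20, 20, 75, 20, 20)    # bigger standard deviation
bigger_n_t, _ = pooled_t(80, 10, 80, 75, 10, 80)   # bigger sample size

print(f"baseline:          t = {base_t:.2f} (df = {base_df})")
print(f"larger difference: t = {wider_gap_t:.2f}")
print(f"larger SD:         t = {noisier_t:.2f}")
print(f"larger sample:     t = {bigger_n_t:.2f}")
```

Compared with the baseline, the larger mean difference and the larger sample both raise *t*, while the larger standard deviation lowers it.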

The *t* score is also associated with a *p* value, which tests for statistical significance. The *p* value tells us how likely we would be to obtain a result at least this extreme by chance if the null hypothesis were true. The lower the *p* value, the less compatible our data are with the null hypothesis. Typically, the alpha level (the threshold for statistical significance) is set at .05. So, if our *p* value is below .05, then we reject the null hypothesis.

It’s very important to know that you should only use a *t*-test if your independent variable has two levels (e.g., smokers and non-smokers). If you have more than two levels (e.g., freshman, sophomore, junior, senior), then you must use an ANOVA.

So now that we know what an independent-samples *t*-test is, let’s look at how to perform one on MagicStat (1.1.3). Let’s look at the research question from earlier: did students who attended a review session perform better on the test than those who didn’t? Our alternative hypothesis is that students who went to the review session will do better on the test.

Let’s take a look at our example dataset, “review_session”, which is also available on the “example datasets” at MagicStat.

We’ll begin by selecting the “review session” data file, then clicking “explore”.

As soon as MagicStat loads the file, we see a list of descriptive information on the right-hand side. At the top of the right-hand column are the total number of observations and the first five observations, which give us an overview of the dataset.

Below that, MagicStat automatically identifies each variable as either categorical or numerical and provides a descriptive breakdown including means, standard deviations, minimums, maximums, and quartiles.

Next, we have bar graphs for our categorical variables and histograms for our numerical variables. We can use the histogram to visually inspect each variable to see if it is normally distributed.

Now that we’ve inspected our data, we’re ready to perform the analysis. At the top, on the left-hand side, click “select a model”, and select “independent samples *t*-test”.

Next, select the independent variable, which is “review attendance”. Since there are only two values (“attended review session” and “did not attend review session”), MagicStat automatically selects those two for the two groups. The dependent variable is “test_score”. Finally, click “Analyze”.

The output shows us the mean and standard deviation for each group. The last table shows the results of the *t*-test, including the degrees of freedom, the *t* value, *p* value, and Cohen’s *d* effect size. In this case, those who attended the review session had a mean test score of 80.58 compared to 75 for those who didn’t attend. The *p* value is .006, which means that if the null hypothesis were true, and there were no differences between the groups, then we would expect to see this dataset (or one more extreme) about six times in a thousand. That’s very unlikely, so we’ll reject the null hypothesis and conclude that those who attended the review session did better on the test than those who didn’t. This was accompanied by a Cohen’s *d* of 0.83, which is a fairly strong effect.
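For readers curious about where Cohen’s *d* comes from, here is a minimal sketch: it is the difference between the two group means divided by the pooled standard deviation. The scores below are invented for illustration and are not the “review_session” data, so the result will not equal 0.83; `cohens_d` is our own function name.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

# Hypothetical test scores, NOT the sample dataset
attended = [82, 88, 75, 79, 85, 77]
did_not_attend = [74, 70, 81, 72, 78, 69]

d = cohens_d(attended, did_not_attend)
print(f"Cohen's d = {d:.2f}")
```

By Cohen’s widely used rules of thumb, a *d* around 0.2 is considered small, around 0.5 medium, and 0.8 or above large.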

One of the assumptions of *t*-tests is homogeneity of variance (also called homoscedasticity), which is just a fancy way of saying that each sample should have similar variances (e.g., the variance of the treatment condition should be the same as the variance for the control condition). Another way to think of it is that the treatment shouldn’t affect the variance. There are a number of tests that check this assumption, but Levene’s test is perhaps the best known, if only because it’s included by default when running *t*-tests in SPSS.

Levene’s test includes both a test statistic (W) and a *p* value. Note that this *p* value has nothing to do with the *p* value of your *t*-test. It is a completely separate test (again, it is measuring whether the variances are equal).
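Here is a minimal sketch of how the W statistic can be computed by hand. This version measures each score’s absolute distance from its own group mean, which is the classic form of Levene’s test; some software centers on the median instead, and which variant MagicStat uses is not stated here, so treat that choice as an assumption. The function name and the example groups are hypothetical.

```python
import statistics

def levene_w(*groups):
    """Levene's W using absolute deviations from each group's mean.

    Assumes at least two groups and that the deviations are not all identical
    (otherwise the denominator would be zero).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean
    z = [[abs(v - statistics.mean(g)) for v in g] for g in groups]
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups: the first pair has equal spread, the second does not
same_spread = levene_w([1, 2, 3], [11, 12, 13])
different_spread = levene_w([1, 2, 3], [0, 10, 20])
print(f"equal spread:   W = {same_spread:.2f}")
print(f"unequal spread: W = {different_spread:.2f}")
```

W is essentially a one-way ANOVA F statistic run on those absolute deviations, so its *p* value comes from an F distribution with (number of groups minus 1) and (N minus number of groups) degrees of freedom.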

Interpreting Levene’s is similar to hypothesis testing with other inferential tests: if the *p* value is below your α threshold (typically .05), then you would reject the null hypothesis that the two variances are equal. If you do conclude that the variances are not equal, then MagicStat provides an adjustment that accounts for the violation of this assumption. In the example provided, the *p* value (0.242) is greater than .05, so we would fail to reject the null hypothesis and proceed as though the two groups have equal variances.
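The adjustment MagicStat applies is not spelled out here. One common choice for unequal variances is Welch’s *t*-test, which drops the pooled variance and lowers the degrees of freedom. The sketch below shows that approach as an assumption, not as MagicStat’s documented method; the function name and numbers are hypothetical.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t and its Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics
equal_t, equal_df = welch_t(80, 10, 20, 75, 10, 20)     # equal SDs and sizes
unequal_t, unequal_df = welch_t(80, 5, 20, 75, 20, 20)  # very different SDs

print(f"equal variances:   t = {equal_t:.2f}, df = {equal_df:.1f}")
print(f"unequal variances: t = {unequal_t:.2f}, df = {unequal_df:.1f}")
```

When the two groups have the same standard deviation and size, Welch’s degrees of freedom equal the usual total sample size minus 2; the more the variances differ, the further the degrees of freedom fall below that value.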

So, let’s see what would happen if we use gender as an independent variable instead of review attendance. In this case, we are curious whether males or females did better on the test.

The *p* value of 0.575 is much higher than the threshold value of .05. Thus, we conclude that there is no significant difference between males and females on the test scores.

And that’s how you run an independent-samples *t*-test on MagicStat.

**Written by the MagicStat Team**

Welcome to the Merjek Blog, the blog and newsroom for Merjek Inc, a data service company dedicated to solving data problems of people and organizations anytime anywhere.

On this blog, we plan to share articles about data related topics. For example, we will cover topics of data freelancing, data visualization, statistical analysis, data processing, etc. If you are dealing with data and interested in learning something more, then this blog is for you.

We will also share with you news and releases relating to our product(s) and services. We expect that other topics will come up along the way and hope you will find them interesting.

Please feel free to comment on our blog posts. Comments that are offensive, irrelevant, or disrespectful will be removed; for example, your comment will be deleted if you simply come by to advertise an unrelated business. However, we welcome your perspective if you truly want to engage with people and content.

Thanks for visiting this blog. We’re excited to have you here!

The Merjek Blog Team

]]>