The Titanic has become one of the most famous ships in history. Tragically, the “unsinkable” ship took about three years to build but sank less than five days after its departure. Its passengers ranged from some of the wealthiest people in the world to the middle class to emigrants.

Have you ever wondered whether there is a significant relationship between class and survival or gender and survival? What about other factors, such as the age of the passenger, the number of siblings or spouses on board, the number of parents or children on board, and the cost of the ticket? Which of these best predict or explain whether or not someone survived the Titanic? In this blog, we will try to answer these questions.

This blog contains a step by step guide to analyzing the Titanic dataset with two different types of analyses in MagicStat: the Chi-Square Test for Independence and Logistic Regression. The Chi-Square Test for Independence is used to evaluate the relationship between two categorical variables. Logistic regression is used to determine the degree to which a set of independent variables predicts a categorical variable, or outcome. As part of the Logistic Regression instructions, we will also show you how to perform a Pearson Correlation in MagicStat as an assumption check for multicollinearity.

Within the blog, you will find instructions for uploading your data to MagicStat and exploring it. Then you will see a rationale and examples for the two analyses discussed in the blog. Next, you will see step by step instructions for conducting each analysis. The blog also provides step by step instructions for interpreting the results of each analysis. Finally, you will see APA formatted examples demonstrating how to write up the results from each of the analyses.

Chi-Square Test of Independence

1. Uploading and exploring your data.

Begin the analysis by uploading the Titanic.sav file and pressing Explore. The .sav extension indicates this is an SPSS data file.

2. Exploring your data

Once the data is uploaded and you click explore, you should notice on the right side of the screen several data elements. First, you should notice that there are 1309 observations within the data set (See screen shot below).

You should also notice that there are 11 categorical variables, 4 numeric variables, and a table summarizing the numeric variables within the dataset (See screen shots below).

3. Selecting the right model – Examining relationships between categorical variables.

Many times, researchers and students are interested in the relationship between two categorical variables. An example of categorical variables would be home ownership and education. For the most part, people either own or rent their home. As such, there would be two categories for the variable of home ownership – own and rent. Additionally, almost everyone has some degree of education (e.g., GED, High School, Some College, College Degree). In this example, the researcher may be interested in determining if there is a relationship between home ownership and education. In other words, is a person more likely to rent or own depending on their level of education?

There are many types of analyses that can be used to determine relationships between variables (e.g., correlation and regression). However, when a researcher is interested in examining the relationship between two categorical variables, the best type of analysis for this situation would be the Chi-Square Test for Independence.

There are two types of Chi-Square analyses. One type is referred to as the Chi-Square for Goodness of Fit. Researchers use the Chi-Square for Goodness of Fit analysis to compare the proportion within a sample (e.g., data you collected) with the proportion within a population (e.g., data that is known). For example, is the proportion of college students who smoke at your university similar to the proportion of college students who smoke in the USA? In this case, we would use the Chi-Square for Goodness of Fit analysis to determine if the proportion of college students who smoke at your university differs from the proportion of college students who smoke in the USA. If you are instead interested in examining the relationship between two categorical variables from the same data set, then the Chi-Square Test for Independence would be used to answer this type of research question.
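As a rough illustration of the goodness-of-fit idea, the sketch below compares a sample split against a known population proportion. The counts and the 20% population rate are made-up numbers used only for illustration, and the function name is our own. With two categories the test has 1 degree of freedom, so the p value can be computed with the complementary error function from Python's standard library.

```python
import math

def goodness_of_fit_two_categories(observed, expected_props):
    """Chi-square goodness of fit for exactly two categories (df = 1)."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical sample: 30 smokers and 170 non-smokers at one university,
# compared with an assumed population rate of 20% smokers.
chi2, p_value = goodness_of_fit_two_categories([30, 170], [0.20, 0.80])
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

With these made-up numbers the p value is above 0.05, so we would not conclude that the sample proportion differs from the population proportion.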

Model summary – Chi-Square Test for Independence

  • What you need – You will need two categorical variables from the same data set.
  • Example research question – is there a relationship between gender and political affiliation?
  • Assumptions – Lowest expected frequency in any cell should be 5 or more.

4. Running the analysis – Chi-Square Test for Independence

The Titanic Dataset offers several categorical variables that are well suited for running a Chi-Square Test for Independence. The first question we will consider will be: is there a relationship between class and survival? To get started, we will need to select a model to analyze our data. Click the button, select a model to analyze your data (See Screen shot below).

Once you have selected the Chi-Square test for Independence, you should see some changes on your screen. There should be an option to select categorical variables for your analysis. See the example below.

In order to run a Chi-Square Test for Independence we will need to select the two categorical variables. In this case, the two variables are class (pclass) and survival (survived).

  • To enter these variables into the analysis, click – Select a categorical variable – and then select pclass.
  • Repeat this step by clicking – Select a categorical variable – and then select survived.
  • In the event you do not see these variables, hold Shift and refresh your browser to clear your cache, then again click – Select a categorical variable – to select the appropriate variable.
  • The screen shot provided below demonstrates what the screen should look like once the correct variables are selected. If you are seeing the same thing on your screen go ahead and click analyze.

5. Your results of Chi-Square Test for Independence

You should see the following output on your screen after clicking the “Analyze” button.

  • The first output is a crosstabulation of the two categorical variables of class and survival. By scanning left to right, you can see that 123 people from 1st class died and 200 from 1st class survived.
  • You will also see the output for the Chi-Square results of pclass and survived.

6. Interpreting your crosstabulation

  • The first step for interpreting your results is evaluating the crosstabulation for class and survival. We want to ensure we did not violate any of our assumptions so that we can interpret the statistical results of the analysis with confidence. The assumption concerns expected frequencies, but the observed counts give a quick first check: scan through the table below and ensure that none of the cell counts is less than 5. For example, the count for the number of people in first class that died equals 123. There are no cells with a count smaller than 5, and with cells this large the expected frequencies are also well above 5. This is good news; we did not violate any of our assumptions for this analysis and can interpret the results with confidence. Additionally, did you notice any patterns in the data? An examination of the crosstabulation indicated that the number of deaths increased from class to class – 1st class (n = 123), 2nd class (n = 158), and 3rd class (n = 528). However, the crosstabulation alone is not enough to conclude that this pattern is statistically significant. We now need to interpret the Chi-Square results.
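The assumption check and the pattern in the crosstabulation can be sketched in a few lines of Python. The 1st class row and the three death counts come from the crosstabulation described above. The survivor counts for 2nd and 3rd class (119 and 181) are not stated in the text, so treat them as assumed values, chosen here so that the table sums to the 1309 observations. The helper name is our own.

```python
# Observed counts as (died, survived) per class. The 1st-class row and the
# death counts come from the text; the 2nd- and 3rd-class survivor counts
# (119 and 181) are ASSUMED values that make the table sum to 1309.
observed = {
    "1st": (123, 200),
    "2nd": (158, 119),
    "3rd": (528, 181),
}

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    return {
        label: tuple(sum(row) * c / grand_total for c in col_totals)
        for label, row in table.items()
    }

expected = expected_counts(observed)
smallest = min(min(cells) for cells in expected.values())
print(f"smallest expected count: {smallest:.1f}")

# Death rates (not just counts) make the pattern across classes easier to see.
for label, (died, survived) in observed.items():
    rate = died / (died + survived)
    print(f"{label} class: {died} of {died + survived} died ({rate:.1%})")
```

Looking at rates rather than raw counts matters because the classes differ in size; under the assumed counts, the share of passengers who died still rises from 1st to 3rd class.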

7. Interpreting the Chi-Square results of pclass and survived

  • The second step for interpreting your results is evaluating the Chi-Square results of pclass and survived. There are several data points within these results. The Chi-square stat is the actual statistic that you will report in writing up the analysis. The df is the degrees of freedom within the analysis. In a Chi-Square Test for Independence, the df is the number of cells in the crosstabulation that are free to vary once the row and column totals are fixed. The formula for df is df = (r-1)(c-1), with r being the number of rows and c being the number of columns; with three classes and two survival outcomes, df = (3-1)(2-1) = 2. The data point that is relevant to our question of statistical significance is the p value (0.000). The p value is the probability of seeing differences in survival between classes at least as large as the ones in our data if class and survival were actually unrelated. In other words, how likely is it that differences this large are due to chance alone? In order to be considered significant, the p value needs to be less than 0.05; in this case our p value is displayed as 0.000 (that is, less than 0.001), which is less than 0.05.
  • As a result, we can conclude that there are significant differences in the survival rates between classes amongst passengers on the Titanic.
  • As a general rule of thumb, group differences with p values < 0.05 are said to be statistically significant. While the word “significant” carries certain connotations in everyday English, statistical significance only tells you that a result is unlikely to be caused by chance. It does not tell you how large or important the difference is.
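To see where the statistic, the df, and the p value come from, here is a minimal sketch that computes them by hand for the class by survival table. As before, the 2nd and 3rd class survivor counts (119 and 181) are assumed values that are not stated in the text, and the function name is our own. SciPy users could get the same quantities from `scipy.stats.chi2_contingency`; we use only the standard library here.

```python
import math

def chi_square_independence(rows):
    """Return (chi2, df) for a table given as a list of rows of counts."""
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(col_totals)
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows are 1st, 2nd, 3rd class; columns are (died, survived).
# The 2nd- and 3rd-class survivor counts (119, 181) are assumed values.
table = [[123, 200], [158, 119], [528, 181]]
chi2, df = chi_square_independence(table)

# With df = 2 the chi-square upper tail has the closed form exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi-square({df}) = {chi2:.2f}, p = {p_value:.3g}")
```

The p value is far smaller than 0.001, which is why software that rounds to three decimals displays it as 0.000.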

8. Writing up your results

In the event you need to write up the results of a Chi-Square Test for Independence we provided an example to guide your efforts.

  • There are several data points you will need for the write up.
  • These data points can be found in the Chi-Square Results of pclass and survived.
  • You will need the values associated with the Chi-Square stat, the degrees of freedom (df), and the p value.

The write up – We were interested in exploring the relationship between class and survival. As such, the main research question was: is there a relationship between class and survival? The results of the analysis indicated that the number of deaths varied between the classes considered within the analysis – 1st class (n = 123), 2nd class (n = 158), and 3rd class (n = 528). Additionally, the results of the Chi-Square Test for Independence indicated that there are significant differences in death rates by class: χ²(2) = 127.86, p < 0.001. These results mean that the differences in death rates by class are unlikely to be due to chance.

In the above write up the symbol χ² stands for Chi-Square. The (2) represents the degrees of freedom within the analysis. The value of 127.86 is the actual Chi-Square statistic. The p < 0.001 indicates that our p value is less than 0.001.

Let’s run another Chi-Square Test for Independence to explore the relationship between gender and survival. Follow along with the instructions and examples provided below.

1. Running another analysis with Chi-Square Test for Independence

The next question we will consider will be: is there a relationship between gender and survival? To get started, we will need to select a model to analyze our data. Click the button, select Chi-Square Test for Independence (See Screen shot below).

Once you have selected the Chi-Square test for Independence, you should see some changes on your screen. There should be an option to select categorical variables for your analysis. See the example below.

In order to run the Chi-Square Test for Independence we need to select the two categorical variables. In this case, the two variables are gender (gender) and survival (survived). To enter these variables, click – Select a categorical variable – and select gender first and survived second. The screen shot provided below demonstrates what the screen should look like once the correct variables are selected. If you are seeing the same thing on your screen go ahead and click analyze.

2. Results – Chi-Square Test for Independence

You should see the following output on your screen after clicking the “Analyze” button.

  • The first output is a crosstabulation of the two categorical variables of gender and survival. By scanning left to right, you can see that 127 females died and 339 females survived.
  • You will also see the output for the Chi-Square results of gender and survived.

3. Interpreting your crosstabulation

  • The first step for interpreting your results is evaluating the crosstabulation for gender and survival. We want to ensure we did not violate any of our assumptions so that we can interpret the statistical results of the analysis with confidence. The assumption concerns expected frequencies, but the observed counts give a quick first check: scan through the table below and ensure that none of the cell counts is less than 5. For example, the count for the number of Females that died equals 127. There are no cells with a count smaller than 5, and with cells this large the expected frequencies are also well above 5. This is good news; we did not violate any of our assumptions for this analysis and can interpret the results with confidence. Additionally, did you notice any patterns in the data? An examination of the crosstabulation indicated that far more Males died (n = 682) than Females (n = 127). However, the crosstabulation alone is not enough to conclude that the difference is statistically significant. We now need to interpret the Chi-Square results.

4. Interpreting the Chi-Square results of gender and survived

  • The second step for interpreting your results is evaluating the Chi-Square results of gender and survived. There are several data points within these results. The Chi-Square stat is the actual statistic that you will report in writing up the analysis. The df is the degrees of freedom within the analysis. In a Chi-Square Test for Independence, the df is the number of cells in the crosstabulation that are free to vary once the row and column totals are fixed. The formula for df is df = (r-1)(c-1), with r being the number of rows and c being the number of columns; with two genders and two survival outcomes, df = (2-1)(2-1) = 1. The data point that is relevant to our question of statistical significance is the p value (0.000). The p value is the probability of seeing a difference in survival between the genders at least as large as the one in our data if gender and survival were actually unrelated. In other words, how likely is it that a difference this large is due to chance alone? In order to be considered significant, the p value needs to be less than 0.05; in this case our p value is displayed as 0.000 (that is, less than 0.001), which is less than 0.05.
  • As a result, we can conclude that there are significant differences in the survival rates between the genders amongst passengers on the Titanic.
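For a 2 x 2 table such as gender by survival there is a well-known shortcut formula for the Chi-Square statistic, sketched below. The three counts come from the crosstabulation described above, and the number of males who survived is derived from the 1309 observations in the data set. We assume no continuity correction is applied, since that is the version that matches the statistic reported in the write up; the function name is our own.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

total = 1309
female_died, female_survived, male_died = 127, 339, 682
# Derived from the total number of observations rather than read off the table.
male_survived = total - female_died - female_survived - male_died  # 161

chi2 = chi_square_2x2(female_died, female_survived, male_died, male_survived)
# For df = 1, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square(1) = {chi2:.2f}, p < 0.001: {p_value < 0.001}")
```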

5. Writing up your results

The write up of the results for this Chi-Square Test for Independence is provided below. There are several data points you will need for the write up. These data points can be found in the Chi-Square Results of gender and survived. You will need the values associated with the Chi-Square stat, the degrees of freedom (df), and the p value.

The researcher was interested in exploring the relationship between gender and survival. As such, the main research question was: is there a relationship between gender and survival? The results of the analysis indicated that the number of deaths varied between the genders considered within the analysis – Males (n = 682) compared to Females (n = 127). Additionally, the results of the Chi-Square Test for Independence indicated that there are significant differences in death rates by gender: χ²(1) = 365.89, p < 0.001. These results mean that the differences in death rates by gender are unlikely to be due to chance. In the above write up the symbol χ² stands for Chi-Square. The (1) represents the degrees of freedom within the analysis. The value of 365.89 is the actual Chi-Square statistic. The p < 0.001 indicates that our p value is less than 0.001.


Many times, researchers are interested in predicting categorical variables or specific outcomes. An example of a categorical variable would be students passing an exam (e.g., pass or fail). In this case, there would be two categories: students either pass or fail the exam. In situations like this, researchers may be interested in determining specific factors that predict or explain categories within a categorical variable of interest. Using our example of passing an exam, researchers may be interested in determining the factors that most influence whether a student passes or fails an exam. Time spent studying, previous exam scores, and time spent working with a tutor are all examples of variables that might influence whether or not a student passes an exam.

Logistic Regression

Regression is the main type of analysis used to determine the degree to which a set of variables (i.e., independent variables) explains or predicts scores on an outcome variable (i.e., the dependent variable). There are two main types of regression: standard multiple regression and logistic regression. Standard multiple regression is used when a researcher has multiple scale based independent variables and is interested in testing the degree to which they explain or predict scores on one scale based dependent variable. Logistic regression differs in one respect, the type of dependent variable: the researcher still has multiple scale based independent variables, but the dependent variable is categorical with only two categories. If your dependent variable has more than two categories, then a multinomial logistic regression would need to be used. Going back to our example of passing or failing an exam, logistic regression would be the best statistical test for answering this type of research question. If you are interested in learning more about logistic regression, follow along with the example provided below.

  • Model summary – Logistic Regression
    • What you need – You will need several scale based independent variables and one dependent variable that is categorical (i.e., with only two categories).
      • Categorical predictors can also be used in logistic regression. For simplicity's sake, we will only use scale based variables.
    • Example research question – What factors most predict or explain whether or not someone survived the Titanic?
    • Assumptions – There are no assumptions regarding the distribution of scores for the predictor variables within logistic regression. However, logistic regression is sensitive to predictor variables that are highly intercorrelated, a condition referred to as multicollinearity. 

1. Assumption Test – Correlation Analysis

In order to determine whether multicollinearity is a problem, we must run a correlation analysis of the degree to which the independent variables are correlated with each other. Let’s do this now. The Titanic data set should already be uploaded to MagicStat, so we will select the Pearson correlation model.

Once we have selected the Pearson Correlation model, we will need to address a few issues.

  • Handling Missing Data? – When running a Pearson Correlation model, we need to tell MagicStat how to handle missing data. Listwise deletion removes a case from the whole analysis if it is missing data on any variable. Pairwise deletion keeps a case for every pair of variables on which it has data. Since we are analyzing the correlations between multiple variables and want to maximize power, we will select pairwise deletion.
  • Selecting variables – Magicstat.co automatically selects scale based variables within a dataset when running a Pearson Correlation. In this case, there are four variables. Select all of these variables for the analysis.
    • Age – Age of passenger.
    • Sibsp – Number of siblings or spouses on board.
    • Parch – Number of parents or children on board.
    • Fare – Cost of ticket. 
  • Analyze – Click analyze once you have selected your variables.
  • Output
    • Degrees of Freedom – In this case, we have 1307. By the usual convention, the degrees of freedom for a Pearson correlation equal the number of cases minus two (df = n − 2), so a value of 1307 corresponds to 1309 cases, which matches the 1309 cases within the data set. We need to pay attention to the difference between these numbers when conducting analyses, as a df far below n − 2 means many cases were dropped for missing data and can signal poor data quality or issues in the data.
    • Pearson Correlation (r) – The Pearson Correlation table takes a minute to learn how to read. The values mirror each other on either side of the diagonal, where each variable correlates perfectly with itself. Notice the red and blue triangles below; the numbers in each triangle mirror each other. What we are interested in is the degree to which the variables are correlated with each other. Correlations with an absolute value above 0.70 signal a problem with multicollinearity. In this case, no correlation exceeds 0.70, so we can conclude that multicollinearity is not a problem and proceed with the analysis.
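The multicollinearity check above can be sketched in plain Python. The six passengers below are hypothetical values made up for illustration (they are not the Titanic results), None marks a missing value, and the function name is our own. Each correlation uses pairwise deletion, so the number of cases can differ from pair to pair.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation using only the cases where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical values for six passengers; None marks a missing value.
data = {
    "age":   [22, 38, None, 35, 54, 2],
    "sibsp": [1, 1, 0, 1, 0, 3],
    "parch": [0, 0, 0, 0, 0, 1],
    "fare":  [7.25, 71.28, 7.92, 53.10, None, 21.08],
}

flagged = []
for a, b in combinations(data, 2):
    r, n = pearson_r(data[a], data[b])
    print(f"{a} vs {b}: r = {r:.2f} (n = {n}, df = {n - 2})")
    if abs(r) > 0.70:  # the threshold used in this post
        flagged.append((a, b))
print("pairs above the 0.70 threshold:", flagged)
```

Checking the absolute value matters because a strong negative correlation is just as much a sign of multicollinearity as a strong positive one.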

2. Running the analysis – Logistic Regression

The Titanic Dataset offers several independent variables and one important dependent variable that are well suited for running a Logistic Regression. Our research question will be: what factors predict the likelihood that someone survived the Titanic? To get started, we will need to select a model to analyze our data. Click the button, select a model to analyze your data (See Screen shot below) and select the Logistic Regression.

Once you have selected the Logistic Regression, you should see some changes on your screen. There should be an option to select categorical independent and dependent variables for your analysis. See the example below.

Independent Variables – Let’s select the independent variables for our analysis. We select the following variables: age (how old the passenger was), sibsp (number of siblings or spouses on board), parch (number of parents or children on board), and fare (the cost of the individual's ticket). See the example below. Now that we have selected our independent variables, let’s select our dependent variable.

Dependent Variable – We want to select the following variable – survived. See the example below. Now that we have selected our dependent variable, let’s review our variables.

Reviewing the Model – Let’s review the variables we selected for the model. The independent variables seem correct. We selected age, sibsp, parch, and fare. Notice that there is an option to select categorical for each variable. MagicStat allows researchers to include categorical variables in their logistic regressions. None of our variables are categorical. Ensure none of these boxes are checked. The dependent variable is correct – survived. Now that we have selected our variables, let’s run the analysis.

3. Results – Logistic Regression

You should see the following output on your screen after clicking the “Analyze” button.

  • The first output is an indicator of how many missing cases we have in the data. Logistic regression automatically uses listwise deletion, meaning that if a case is missing data for any of the variables included in the analysis, that case will be excluded from the analysis.

4. Interpreting – Overview of logit regression results

  • There are several pieces to this output that need to be interpreted and considered. First, we want to ensure that our dependent variable is correct, and in this case it is: survived. We also want to ensure we selected the correct model, and we did, as the model is logit. Next, we need to interpret the model. When running regressions, researchers build their hypotheses around the idea that a certain set of variables, sometimes referred to as a model, is likely to predict or influence changes in the dependent variable. To this point, the first step in interpreting the results of a regression is to interpret the overall significance of the model. The first output, “Overview of logit regression results”, contains the data points we need to determine if the model is significant. Those values are the LLR p-value and the Pseudo R-squ. We want our model to be significant, and in this case it is: the LLR p-value is displayed as 0.000, which is less than 0.05. However, we also need to consider the amount of variance explained by the model, Pseudo R-squ = 0.071. While there are no hard and fast rules for interpreting the Pseudo R-squ, our model only explained about 7% of the variance in our dependent variable. In regression, you can have models that are significant but do not explain much variance, or change, in the dependent variable. In cases where the model is significant and does not explain much variance, the independent variables are usually not very strong predictors of the dependent variable. Now we must consider the other pieces of output.
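To make the two overview numbers less mysterious, here is a minimal sketch of how they are commonly computed from log-likelihoods. It assumes the Pseudo R-squ value is McFadden's pseudo R-squared and that the LLR p-value comes from a likelihood-ratio test against an intercept-only model; both are assumptions about MagicStat's output, not something the output states. All counts and log-likelihoods below are hypothetical, and the function names are our own.

```python
import math

def null_log_likelihood(n_events, n_non_events):
    """Log-likelihood of an intercept-only model for a binary outcome."""
    n = n_events + n_non_events
    p = n_events / n
    return n_events * math.log(p) + n_non_events * math.log(1 - p)

def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(null)."""
    return 1 - ll_model / ll_null

def llr_p_value(ll_model, ll_null, n_predictors):
    """Likelihood-ratio test p value (closed form for an even number of predictors)."""
    if n_predictors % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = ll_model - ll_null  # half of the LLR statistic 2 * (LL_model - LL_null)
    terms = sum(half ** k / math.factorial(k) for k in range(n_predictors // 2))
    return math.exp(-half) * terms

# Hypothetical numbers: 400 survivors, 600 non-survivors, four predictors,
# and a fitted log-likelihood chosen so the pseudo R-squared is about 0.07.
ll_null = null_log_likelihood(400, 600)
ll_model = ll_null * (1 - 0.07)
print(f"pseudo R-squared = {mcfadden_pseudo_r2(ll_model, ll_null):.3f}")
print(f"LLR p value = {llr_p_value(ll_model, ll_null, 4):.3g}")
```

The sketch shows how a model can be clearly significant while still explaining only a small share of the variation: with a large sample, even a modest improvement in log-likelihood produces a very small p value.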

5. Interpreting – Details of logit regression results

There are several pieces of information within this table; however, one piece of information is important to understanding which variables are significantly contributing to variance, or differences, in the dependent variable. The column P > |z| contains the significance values for each of the independent variables entered into our model. We want to see significant p values, p < 0.05. A quick scan of the table indicates that each of our variables is significantly contributing to variance, or differences, in the dependent variable. However, one variable is less strongly significant than the others. Which one? Parch, the number of parents or children on board, has the largest p value (p = 0.038). Now that we know each of our independent variables is significantly contributing to the model, we need to understand the degree to which each of these variables is influencing differences in our dependent variable, whether or not someone survived the Titanic.

Another piece of information within this table that we need to consider is the direction of the coefficient. The column coeff contains the directional prediction of the specific independent variables. Negative coefficient values indicate that increases in the independent variable predict decreases in the likelihood of a specific outcome, in this case survival. Positive coefficient values indicate that increases in the independent variable predict increases in the likelihood of a specific outcome, in this case survival. In this case, increases in age and in the number of siblings or spouses on board decrease the likelihood that an individual survived the Titanic, while the coefficients for fare and parch are positive.
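A short sketch can show what the sign of a coefficient does to the predicted probability. The coefficients for age and sibsp are the ones reported in Table 1 of this post; the intercept of 0.0 is a placeholder assumption because the text does not report one, and fare and parch are left out to keep the example small. The resulting probabilities are therefore illustrative only, and the function name is our own.

```python
import math

def survival_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b * x))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1 / (1 + math.exp(-linear))

# Coefficients for (age, sibsp) as reported in Table 1; the intercept of 0.0
# is a placeholder ASSUMPTION, so the probabilities are only illustrative.
coefficients = [-0.02, -0.30]
ages = (20, 40, 60)
probabilities = [survival_probability(0.0, coefficients, [age, 0]) for age in ages]
for age, p in zip(ages, probabilities):
    print(f"age {age}, no siblings or spouses: p = {p:.3f}")
```

Because the age coefficient is negative, the predicted probability falls as age rises, whatever the intercept happens to be.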

6. Interpreting – Odds ratios and 95% Confidence Intervals

The final table in the output combines information from the previous tables. The Odds ratios and 95% Confidence Intervals table conveniently provides two pieces of information about our independent variables that we need to interpret our output: the odds ratios and their confidence intervals. An odds ratio below 1 means that increases in the variable lower the odds of survival, and an odds ratio above 1 means they raise the odds. When interpreting the confidence intervals, we want to see intervals that do not contain the value of 1. For example, the confidence interval for age (0.97 – 0.99) does not contain 1. This means the odds ratio for age should be considered statistically significant (p < 0.05), and we can be reasonably confident about the direction of the effect. In the event that the confidence interval contains the value of 1, we cannot conclude the odds ratio is statistically significant, because an odds ratio of 1 means the variable has no effect on the odds of either outcome.
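The link between the coefficients and the odds ratios can be checked directly, since an odds ratio is the exponentiated coefficient. The sketch below uses the coefficients reported in Table 1 of this post. The standard error used for the confidence interval is a hypothetical value, chosen so the interval lands near the 0.97 – 0.99 range reported for age; the function names are our own.

```python
import math

# Coefficients as reported in Table 1 of this post.
coefficients = {"age": -0.02, "fare": 0.01, "parch": 0.18, "sibsp": -0.30}

def odds_ratio(coefficient):
    """An odds ratio is the exponentiated logit coefficient."""
    return math.exp(coefficient)

def odds_ratio_ci(coefficient, std_error, z=1.96):
    """Approximate 95% confidence interval for the odds ratio."""
    return (math.exp(coefficient - z * std_error),
            math.exp(coefficient + z * std_error))

def excludes_one(interval):
    """True when the interval lies entirely below or entirely above 1."""
    lower, upper = interval
    return upper < 1 or lower > 1

for name, b in coefficients.items():
    print(f"{name}: odds ratio = {odds_ratio(b):.2f}")

# The standard error below (0.005) is a HYPOTHETICAL value for illustration.
interval = odds_ratio_ci(-0.02, 0.005)
print(interval, excludes_one(interval))
```

Reading an odds ratio of 0.74 for sibsp, for example, each additional sibling or spouse on board multiplies the odds of survival by about 0.74.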

7. Writing up your results

In the event you need to write up the results of a logistic regression we provided an example to guide your efforts.

  • There are several data points you will need for the write up.
  • These data points can be found in the first table Overview of Logistic Regression Results.
  • You will need the values associated with the p value and the Pseudo R-squ.
  • You also need data points found in the Details of logit regression results.
  • You will need the values associated with the p value and the coefficients.

The write up – We were interested in determining if specific variables predicted changes in the likelihood that an individual survived the Titanic disaster. The independent variables included in the analysis were: age, number of siblings or spouses on board, number of parents or children on board, and the cost of the individual's ticket. The dependent variable was whether or not someone survived. We ran a logistic regression to test this research question. The result of the analysis indicated that the model was significant (p < 0.05); however, the model only explained 7% of the variance in the likelihood that an individual died or survived the Titanic disaster.

A closer examination of the results indicated that all of the variables included in the model significantly contributed to the model (See Table 1). The results indicated that as age increased, the likelihood of surviving the Titanic decreased. Additionally, the results indicated that as the number of siblings or spouses on board increased, the likelihood of surviving the Titanic decreased. Lastly, while the cost of the ticket and the number of parents or children on board tested as significant, the confidence intervals, whose lower bounds sit very close to 1, indicated that we could not confidently conclude that these variables significantly contributed in either direction to the prediction of whether or not an individual survived the Titanic disaster.

Table 1 – Logistic Regression Results

Variable   Coefficient   P      2.5% CI   97.5% CI   Odds Ratio
Age        -0.02         0.00   0.97      0.99       0.98
Fare        0.01         0.00   1.01      1.01       1.01
Parch       0.18         0.04   1.01      1.20       1.20
Sibsp      -0.30         0.00   0.63      0.74       0.74


So far all of our analyses have asked questions about the manipulation of a single independent variable. The t-test can compare two groups/levels while the ANOVA can ask about the differences between multiple levels. But, what do we do when there is more than one independent variable being manipulated within our experiment? What if we want to know how these factors interact with each other to produce our final result? To demonstrate how you could assess these questions we’ll again go through our Cholesterol.csv dataset.

Two-Way Mixed ANOVA

To correctly analyze this dataset we use a two-way mixed model ANOVA. It is “two-way” because there are two independent variables being manipulated. It is a “mixed model” because one independent variable, participation time, is within-subjects and the other independent variable, margarine type, is between-subjects.

To run the two-way mixed ANOVA model, apply the following steps:

1. Load the Cholesterol.csv data at the top of MagicStat (version 1.1.3) and press Explore to begin.

2. Select a model to analyze your data.

3. After choosing the Two-Way Mixed ANOVA (Factorial Between and Within Subjects ANOVA) model you will be asked, Is your dataset long or wide format? This data uses one row per participant, so we should select wide and proceed. At this point the left panel of your screen should show the following.
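If the difference between wide and long format is new to you, the following sketch converts a wide table (one row per participant) into a long one (one row per measurement). The column names Margarine, Before, After4Weeks, and After8Weeks are the ones used in this dataset; the "ID" column name and all of the values are hypothetical, and the function name is our own.

```python
def wide_to_long(rows, id_column, between_column, level_columns,
                 within_name="time", value_name="chol"):
    """Turn one-row-per-participant data into one row per measurement."""
    long_rows = []
    for row in rows:
        for level in level_columns:
            long_rows.append({
                id_column: row[id_column],
                between_column: row[between_column],
                within_name: level,
                value_name: row[level],
            })
    return long_rows

# Two hypothetical participants; the "ID" column name and all values are made up.
wide = [
    {"ID": 1, "Margarine": "A", "Before": 6.4, "After4Weeks": 5.8, "After8Weeks": 5.7},
    {"ID": 2, "Margarine": "B", "Before": 6.8, "After4Weeks": 6.2, "After8Weeks": 6.1},
]
long_format = wide_to_long(wide, "ID", "Margarine",
                           ["Before", "After4Weeks", "After8Weeks"])
for record in long_format:
    print(record)
```

Each participant contributes one row in the wide table and three rows in the long table, one for each level of the within-subjects variable.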

4. Select a between subjects variable

Since we are using a mixed ANOVA model we need to tell the program which column of our dataset denotes the levels of our between subjects variable. You can look to the data preview in the right panel and see the Margarine column labels each participant as having received either margarine type A or B.

(Screen shot: data preview)

Choose Margarine as your between subjects variable.

5. Naming variables: After specifying a between subjects variable you are asked to name the within subjects variable as well as the dependent measure. Although this step is optional, we highly recommend taking a moment to give your variables useful and meaningful names. In the coming steps there will be many charts and tables to consider; having useful labels for our independent and dependent variables helps us keep all of the factors and relevant comparisons straight in our heads.

We chose the label time for my independent variable and chol for my dependent variable. Avoid using overly long descriptions for these names because long labels will make the resulting charts and tables harder to read. You want to provide the minimum necessary label to remain useful without cluttering the visual space of your figures.

name IV and DV

6. Specify levels of within-subjects variable: The final step before we can run our analysis asks us to choose which columns of our data represent the levels of our time within-subjects variable.

pick IV levels

In the left gray box select Before then click the rightward-facing arrow, >, to select it as one of the time levels. Repeat this process for the After4Weeks and After8Weeks labels.

within-subjects levels selected

When your display looks like the above image click Analyze to see the results of your two-way mixed ANOVA.

Results

The output of a two-way ANOVA can seem daunting so we’ll go through it piece-by-piece. Luckily, we’ve established a strong foundation of understanding by beginning with paired-samples t-tests and the one-way repeated measures ANOVA. The concepts we built up there will prove very helpful in tackling this more complex analysis.

ANOVA summary

The first table of results shown is the breakdown of the sources of variance in our data. As was the case with our one-way ANOVA, each of the rows of interest has sum of squares (SS), degrees of freedom (df), mean square (MS), F statistic, and p value columns. Here we will not focus on the full calculations, but it is enough to keep in mind that each F-value is essentially a ratio of explainable variance over unexplained error variance.

To guide our reading of this table it is best to remember the parameters of the experiment.

  • We manipulated participation time in our study within subjects
  • We manipulated Margarine type between subjects
  • We want to know whether these two factors interact with each other to affect cholesterol level

For our time manipulation we are asking whether the level of the independent variable time can explain a statistically significant proportion of the variance in our data. Looking to the time row we see the observed F-value of 259.49 and the associated p value of 0.000. As per convention, this p is said to be statistically significant because it is < 0.05. This means we can say yes, the level of our independent variable time has a statistically significant effect on the mean level of cholesterol.

For the Margarine manipulation we are asking a similar question. Does the type of margarine used explain a statistically significant proportion of the variance in our data? Looking to the Margarine row we see an observed F-value of 1.45 and the associated p value of 0.247. As per convention, this p is not statistically significant because it is >= 0.05. We fail to reject the null hypothesis that type of margarine has no effect on mean level of cholesterol.

Interactions

For consideration of the interaction between these two factors we look to the Margarine X time row. Here the question is not about the effects of our factors in isolation; instead we are asking whether the effect of one factor depends on the level of the other. This type of effect is most easily understood in the domain of medicine where we commonly hear it invoked.

Imagine a patient with two underlying health conditions, both requiring medication to manage them. Independently, each of these medications would improve the health of this patient. But, if the effects of these medications interact with one another then the addition of the second medication will change the effectiveness of the intervention.

This interaction can play out in many different ways. Together they could lead to more improvement than would be expected by adding up their independent effects (super-additivity). Their combined effectiveness could be less than would be expected from adding the effects together (sub-additivity). Combining these medications could even be dangerous and detrimental to health (cross-over interaction). The important thing to understand is that an interaction means a particular combination of the levels of our factors can produce its own effect on the result.

When we look to the Margarine X time row we see an observed F-value of 4.78 and the associated p value of 0.015. Therefore, we reject the null hypothesis that our two independent variables do not interact with one another.

ANOVA Conclusions

In summary, our table of ANOVA results revealed the following:

  1. There is a statistically significant effect of time
  2. There is not a statistically significant effect of Margarine
  3. There is a statistically significant interaction of Margarine x time

These statistically significant ANOVA results only tell us that not all levels produce the same results. To know which groups differ and the direction(s) of those differences we look to our descriptive statistics and pairwise comparisons.
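As a quick recap, the decision rule applied to each row of the ANOVA table can be sketched in a few lines of Python. This is only an illustration of the p < 0.05 convention using the F and p values quoted above; the function name is our own and is not part of MagicStat.

```python
ALPHA = 0.05  # conventional significance threshold used in this post

# (source of variance, F, p) as reported in the ANOVA summary table.
# A p value shown as 0.000 is rounded; it is small, not exactly zero.
anova_rows = [
    ("time", 259.49, 0.000),
    ("Margarine", 1.45, 0.247),
    ("Margarine x time", 4.78, 0.015),
]

def significant_sources(rows, alpha=ALPHA):
    """Return the names of the rows whose p value falls below alpha."""
    return [name for name, _f_value, p_value in rows if p_value < alpha]

print(significant_sources(anova_rows))
```

The two names returned are the time main effect and the Margarine x time interaction, matching the summary list.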

Descriptive Statistics

ANOVA descriptives

Above we are given the Mean, standard deviation (SD), standard error of the mean (SEM), and number of participants (N) for each of our 6 experimental conditions.

Mean cholesterol level for participants given margarine B decreased across the study. They began at 6.78, dropped to 6.13 after 4 weeks, and ended the 8 week intervention at 6.07.

Mean cholesterol level for participants given margarine A also decreased across the study. They began at 6.04, dropped to 5.55 after 4 weeks, and ended the 8 week intervention at 5.49.
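One informal way to get a feel for the interaction is to compare how much each margarine group changed from Before to After8Weeks. The sketch below uses only the cell means quoted above; it is a descriptive "difference of differences", not a significance test, and the helper name is our own.

```python
# Cell means reported in the descriptive statistics table
cell_means = {
    "A": {"Before": 6.04, "After4Weeks": 5.55, "After8Weeks": 5.49},
    "B": {"Before": 6.78, "After4Weeks": 6.13, "After8Weeks": 6.07},
}

def change(means, start, end):
    """Change in mean cholesterol from one time point to another."""
    return round(means[end] - means[start], 2)

drop_a = change(cell_means["A"], "Before", "After8Weeks")
drop_b = change(cell_means["B"], "Before", "After8Weeks")

# If both margarines changed cholesterol identically, this would be 0.
difference_of_differences = round(drop_b - drop_a, 2)

print(drop_a, drop_b, difference_of_differences)
```

Margarine B shows a somewhat larger drop than margarine A over the 8 weeks. An interaction test asks whether a gap like this is larger than chance alone would produce.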

Charts

Although we have all of the raw group means in our descriptive statistics, it is often very helpful to visualize the results of our experiment using charts. Both of the charts below are representing the same data obtained from our descriptive statistics. The only difference between the charts is the variable chosen to place on the X-axis. Using multiple representations of the same data is informative because some patterns “pop out” at us more readily in one configuration or another.

In the first chart we see type of Margarine on the X-axis and each level of time as a separate line. Cholesterol scores are generally lower for the A than the B margarine groups. ANOVA F-table results tell us this between-subjects manipulation of Margarine is not statistically significant (p = 0.247).

Our second chart shows time on the X-axis and type of margarine as separate lines. Here we see the decrease in cholesterol as the study progresses and we also see that this pattern is largely the same for the A and B margarine groups. Differences between the Before and After4Weeks groups are large; differences between the After4Weeks and After8Weeks groups appear small.

Margarine x Time
Margarine x Time

Pairwise Comparisons

With our general understanding of the patterns in our data we can move on to the pairwise post-hoc comparisons. These comparisons will tell us which of our apparent group differences are statistically significant and which are not.

For our purposes, the most important columns are Group 1, Group 2, Reject, and p value.

  • Group 1 and Group 2 tell us which two experimental conditions are being compared.
  • p value and Reject tell us whether the group difference being compared is statistically significant (Reject = True).
Time x Margarine
  • Confirming our ANOVA result, we see no main effect of Margarine type on mean cholesterol level (p = 0.247).
  • The next two rows show statistically significant simple effects of time with the B-type margarine. More specifically, for participants given B-type margarine there were statistically significant differences between the Before and After4Weeks groups as well as between the Before and After8Weeks groups (p = 0.000 for both comparisons). The next row shows no statistically significant difference between After4Weeks and After8Weeks groups for participants given B-type margarine (p = 0.294).
  • The last three rows of the Margarine Post-Hoc Tests table show the simple effects of time for participants given the A-type margarine. This pattern is largely the same as was observed for participants given B-type margarine. Differences between Before and After4Weeks as well as differences between Before and After8Weeks groups were statistically significant for participants given A-type margarine (p = 0.000 for both comparisons). Just as was seen with B-type margarine, differences between After4Weeks and After8Weeks were not statistically significant for participants given A-type margarine (p = 0.060).
posthoc time
  • The first three rows of the time Post-Hoc Tests show statistically significant differences between each level of the time variable.
    • Before vs After4Weeks (p = 0.000)
    • Before vs After8Weeks (p = 0.000)
    • After4Weeks vs After8Weeks (p = 0.004)
  • The last three rows of the table compare groups given margarine A to groups given margarine B at each level of the time variable.
    • A vs B at Before (p = 0.475)
    • A vs B at After4Weeks (p = 0.633)
    • A vs B at After8Weeks (p = 0.622)
  • None of these three comparisons rises to the level of statistical significance.

Conclusions

Full analysis of our Cholesterol.csv dataset under a two-way mixed ANOVA model shows a statistically significant effect of our time intervention.

Participation in this study led to a decrease in mean cholesterol level for all experimental groups. Cholesterol dropped significantly during the first 4 weeks of participation and continued to drop (although less dramatically) given an additional 4 weeks of margarine use.

The type of margarine used by participants did not have a statistically significant effect on the mean cholesterol level. We found no evidence that one type was more effective than the other at decreasing group mean cholesterol levels.

There was a statistically significant interaction observed between time and Margarine although none of the post-hoc comparisons shed light upon how this interaction is operating in our study.

It is possible our comparisons of cell means to assess differential effectiveness of margarine types were underpowered due to small sample sizes within each cell (N = 9). When comparing After4Weeks to After8Weeks groups for either margarine A or B we failed to find significant differences. When using a more powerful test which collapsed over type of Margarine, the difference between After4Weeks and After8Weeks was significant.

Cholesterol is a very important substance in our body for digesting foods, producing hormones, and generating Vitamin D. This blog includes an analysis of a cholesterol dataset retrieved from MASH at The University of Sheffield. This dataset contains a mixture of between-subjects (type of margarine) and within-subjects (length of intervention) factors. This leaves us room to make many comparisons, but we will begin with the most straightforward comparison of whether participation in these interventions leads to a change in cholesterol.

Begin your analysis by going to the MagicStat website (version 1.1.3), uploading the dataset Cholesterol.csv and pressing Explore.

Paired-Samples t-Tests

Although there were 2 different types of margarine used we will first begin by asking the simple question: was cholesterol different from the beginning of the experiment until the end of the 8-week program? The appropriate analysis for this is the paired-samples t-test.

  1. After loading your data click select a model to analyze your data and choose Paired Samples t-test.

2. The next step is to choose variables for groups 1 and 2.

  • select a variable for group 1 and pick Before
  • select a variable for group 2 and pick After8Weeks

In this comparison, we are ignoring the factor of which margarine the participant was assigned to use and simply asking whether 8 weeks of using margarine in our experiment leads to a difference in cholesterol score at the end. This doesn’t tell us everything we want to know but it gives us an idea of whether our intervention is impacting the outcome we are measuring.

3. Press Analyze and you should get the following table of statistical results.

4. Summary of Stats: This table contains the statistics for the two groups of interest: Group 1, Before, and Group 2, After8Weeks. Here we are given the respective group means 6.41 and 5.78 as well as standard deviations (sd) and standard errors (sem). These values are informative in describing our data but alone they cannot tell us whether the observed differences in groups are statistically significant. For this question we move on to the next table.

5. Group 1 and Group 2 Stats: The first three numbers of this table: df, t, and p are relevant to our question of statistical significance. p is the probability of observing a group difference at least as large as ours if the null hypothesis were true. In other words, how likely is it that the Before and After8Weeks groups would show a mean difference this large by chance alone?

As a general rule-of-thumb, p values of < 0.05 are said to be statistically significant group differences. While the word “significant” carries certain connotations in English use, this is not the same significance we are talking about with statistical significance. Statistical significance can only tell you a result is unlikely to be caused by chance. This does not mean a result is big, impactful, or useful as an intervention. This only means a difference is likely not a random difference. To ask about the relevance of a difference we need a different measure.
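For readers curious about where t and df come from, here is a minimal sketch of the textbook paired-samples calculation: t is the mean of the per-participant differences divided by its standard error, and df is the number of pairs minus one. The readings below are made up for illustration and are not taken from Cholesterol.csv; we also stop short of computing p, since that requires the t distribution.

```python
import math
import statistics

# Hypothetical cholesterol readings for five participants (not the real data)
before = [6.4, 6.8, 5.9, 7.1, 6.0]
after_8_weeks = [5.8, 6.1, 5.6, 6.3, 5.7]

def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("paired samples need the same number of observations")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    # Standard error of the mean difference (sample SD uses n - 1)
    sem_diff = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff / sem_diff, n - 1

t_value, df = paired_t(before, after_8_weeks)
print(f"t({df}) = {t_value:.2f}")
```

Because each participant serves as their own comparison, only the differences matter; how much participants vary from one another does not enter the calculation.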

Cohen’s D

Statistical reliability/significance is an important tool to keep us from chasing after results or interventions which only appear to have an effect but are actually due to chance. Knowing our effect is non-random is a good start but in many practical use-cases it is even more important to have a tool that lets us measure the size of the effect we’re having.

Initially, the straightforward idea would be to compare the differences between the two group means. Unfortunately, this kind of raw measure would vary wildly by something as simple as changing the unit of measurement. Imagine two groups of experimenters observing the exact same set of subjects but one is measuring them using inches and another using centimeters. Although the differences they’re observing between the control and experimental groups are identical they’d both get different measures of their effect size. This is a silly example because you can convert between inches and centimeters easily but imagine a more complex situation.

Imagine a business is trying to decide which of two training programs to hold for their employees.

  • Program A: decreases mean employee stress level from 65 to 60
  • Program B: increases mean employee satisfaction from 65 to 70

Both are a change of 5 points but there is no clear way to relate stress and job satisfaction. What do you do?

To make a more informed decision, we can look at each program’s effect size using the Cohen’s d statistic. Cohen’s d is so useful because it scales the raw mean difference relative to how much the underlying data already varies. Instead of focusing on the 5 point difference, we ask how much variation is naturally in the data (sd or standard deviation) and compare the raw difference relative to natural variation.

More concretely, if the standard deviation of job stress is 15 and the standard deviation of job satisfaction is 5 we can compute Cohen’s d for both these groups.

change = 5                  # both programs change mean scores by 5 points

stress = change / 15        # Cohen's d of 0.33
satisfaction = change / 5   # Cohen's d of 1.0

These effect size statistics tell a very different tale than the raw group differences. The stress reduction program has a small-to-moderate effect size, 0.33, when compared to the job satisfaction program’s large effect size of 1.0. If you were torn between which of the two programs to choose these effect size numbers would be good reason to prefer the job satisfaction program. We can expect more people will be helped and in a bigger way than with the stress reduction program.
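The inches-versus-centimeters problem raised earlier can be checked directly. The sketch below uses made-up heights and one common textbook form of Cohen’s d (mean difference divided by the pooled standard deviation); MagicStat may use a different variant, so treat the formula as an assumption.

```python
import statistics

def cohens_d(group_1, group_2):
    """Cohen's d for two groups, using the pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    var1, var2 = statistics.variance(group_1), statistics.variance(group_2)
    pooled_sd = (((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group_1) - statistics.mean(group_2)) / pooled_sd

# Hypothetical heights in inches for an experimental and a control group
experimental_in = [66.0, 68.0, 70.0, 72.0]
control_in = [64.0, 66.0, 68.0, 70.0]

# The same measurements converted to centimeters
experimental_cm = [x * 2.54 for x in experimental_in]
control_cm = [x * 2.54 for x in control_in]

raw_in = statistics.mean(experimental_in) - statistics.mean(control_in)
raw_cm = statistics.mean(experimental_cm) - statistics.mean(control_cm)

print(raw_in, raw_cm)  # the raw differences depend on the unit
print(cohens_d(experimental_in, control_in), cohens_d(experimental_cm, control_cm))
```

The raw difference changes by a factor of 2.54 when the unit changes, while the two Cohen’s d values agree (up to floating-point rounding), because the standard deviation is rescaled by the same factor.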

Of course, no statistic alone can blindly guide decision making because there is always a question of values outside of the realm of statistics. Maybe stress in the office has been the cause of a lot of recent troubles or has been continually mentioned by employees as in particular need of improvement. Maybe job satisfaction is already so high that you think there would be diminishing returns from improving it further.

Statistics are powerful tools but we must remain thinking and skeptical agents. We empower ourselves when we understand the meaning of our results and we enslave ourselves when we forget to heed those limits.

Returning to our t-test results, our comparison of the Before and After8Weeks groups produces a Cohen’s d of 0.55. This value is considered a moderate effect size between groups. If you picked scores at random from each of the two groups you’d expect the After8Weeks cholesterol level to be lower about 65% of the time.
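The 65% figure can be reproduced with a standard conversion from Cohen’s d to the "probability of superiority". The sketch below assumes both groups are normally distributed with equal variances and treats the scores as independent draws, which is a simplification for paired data; the function name is our own.

```python
from statistics import NormalDist

def probability_of_superiority(d):
    """Chance that a random score from one group exceeds a random score
    from the other, assuming normal data with equal variances."""
    return NormalDist().cdf(d / 2 ** 0.5)

# Cohen's d reported for Before vs After8Weeks
print(round(probability_of_superiority(0.55), 2))  # about 0.65
```

A d of 0 gives 0.5 (a coin flip), and larger effect sizes push the probability toward 1.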

General guidance on Cohen’s d is shown in the table below.

Cohen’s d    Effect Size
<= 0.3       small
<= 0.5       small-to-moderate
<= 0.7       moderate
<= 0.8       moderate-to-large
>= 0.8       large
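If you want to apply this guidance to many results at once, the table can be written as a small lookup function. This is a sketch only: the function name is our own, and we chose to treat a value of exactly 0.8 as moderate-to-large since the table lists 0.8 in both of its last two rows.

```python
def effect_size_label(d):
    """Map a Cohen's d value to the rough labels in the guidance table."""
    magnitude = abs(d)  # the sign only encodes the direction of the effect
    if magnitude <= 0.3:
        return "small"
    if magnitude <= 0.5:
        return "small-to-moderate"
    if magnitude <= 0.7:
        return "moderate"
    if magnitude <= 0.8:
        return "moderate-to-large"
    return "large"

# The three effect sizes discussed in this post
for d in (0.33, 0.55, 1.0):
    print(d, effect_size_label(d))
```

These labels are rules of thumb; what counts as a meaningful effect still depends on the field and the question being asked.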

Next page: Analyzing Cholesterol Dataset – Part 2

Previous page: Analyzing Cholesterol Dataset – Part 1

Another method of inspecting the data is to use an Analysis of Variance. As with all analyses, the appropriate model depends on the question you are asking. To begin, we will ask “do the levels of our independent variable have an effect on our dependent variable?”.

One-Way Within-Subjects ANOVA

Firstly, make sure you’ve uploaded the Cholesterol.csv dataset. If you’ve been following along since the paired-samples t-test, you can simply go back to the top of the screen on MagicStat (version 1.1.3) and press Explore again to begin this analysis.

1. Next select a model to analyze your data.

For datasets with more than two levels of your independent variable, it is appropriate to use an analysis of variance model to test whether your independent variable can account for the variation in your data. In this case, we are again ignoring the independent variable of type of margarine (A or B) and only looking at the three levels of participation in our study.

Choose the One-Way Within Subjects ANOVA (One-Way Repeated Measures ANOVA) model.

It is “One-Way” because there is only a single independent variable (duration of study participation) with three levels. It is a “Within Subjects” or “Repeated Measures” model because each participant in the study appears in all three levels (Before, After4Weeks, and After8Weeks). The levels vary within each subject and produce repeated measures of each participant.

2. After selecting the model you will be asked Is your dataset long format or wide format?

Previews and descriptions of both wide format and long format are shown below.

If you look to the right panel of the MagicStat display, you will see a preview of our dataset and notice that it is following the wide format. Each row corresponds to one participant and contains data for each of the three levels of our independent variable.

Select wide and proceed.

3. After specifying our model and data format, next we Select independent variables for analysis. Since we are only looking at the single variable (duration of study participation) we choose all three levels from the dropdown box.

choose independent variables

4. Press Analyze and three tables of results will appear beneath the Output heading.
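If the wide versus long distinction from step 2 is new to you, the sketch below reshapes two made-up participants from wide format (one row per participant) to long format (one row per measurement). The ID column and the values are hypothetical; only the three level names come from our dataset.

```python
# Hypothetical rows in wide format: one row per participant,
# one column per level of the within-subjects variable.
wide_rows = [
    {"ID": 1, "Before": 6.4, "After4Weeks": 5.8, "After8Weeks": 5.7},
    {"ID": 2, "Before": 6.8, "After4Weeks": 6.2, "After8Weeks": 6.1},
]
levels = ["Before", "After4Weeks", "After8Weeks"]

def wide_to_long(rows, id_column, level_columns):
    """Return one record per (participant, level) pair."""
    long_rows = []
    for row in rows:
        for level in level_columns:
            long_rows.append(
                {id_column: row[id_column], "time": level, "chol": row[level]}
            )
    return long_rows

long_rows = wide_to_long(wide_rows, "ID", levels)
for record in long_rows:
    print(record)
```

Two participants with three measurements each become six long-format rows. Both layouts hold the same information; the question in step 2 only tells the program how to read it.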

ANOVA Summary

Summary of sources of variance in our data: When conducting an ANOVA it is important to keep in mind the framework being used to ask our question. In the simplest sense, an ANOVA is asking where the variation in our data is coming from. Do the levels of our independent variables account for an amount of variation in our means over and above what we would expect from pure random chance?

First, imagine a completely naive approach where we throw out all information about how the data was collected. We ignore which subject we are measuring and also ignore which level of our independent variable the observation is coming from. With this impoverished picture all we could do would be to treat the dataset as a single sample and calculate its grand mean and variance. Why any given data point is higher or lower would be a mystery to us and we wouldn’t be able to say anything about whether or not our independent variable had an impact on our dependent variable.

Breaking it down

Luckily, in the case of a one-way repeated measures model we are not so naive and are able to partition our variance into three potential sources:

a. Between Group variation: Variation due to the levels of our independent variable.

If you think about grouping data points from each of the levels of our independent variable you’d have three groups, each group having its own mean value. We can then compare these mean values to the grand mean of the undifferentiated naive case.

Question: Why are the group means different than the grand mean?

Answer: The Between Group means are different from our grand mean because of the level of our independent variable they were collected from. This is the logic of how an ANOVA calculation tells us about the impact of our independent variable. If our groupings did not have an impact on our dependent variable then knowing which group an observation comes from would not tell us additional information above the naive case.

b. Subjects variation: Variation due to individual differences between participants.

Ignoring group membership, we can instead group data points by the subject they are collected from. In this view of our dataset, each participant would have their own mean score and each of these mean scores would be different from the grand mean. Why? Because there are individual differences between people. This fact is not surprising or of particular relevance to questions about our experiment, but it is useful to pull out the variance of individual differences from the other variation in our study. The statistical power of repeated measures designs is due precisely to our ability to separate between-group variation from individual differences.

c. Error: Variation of unknown or unspecified origin.

After accounting for variation in our data due to subjects and between group factors what is left is called “error” or sometimes “residual” variance. This is the stuff in our data that our model is not able to capture. We can know something about how our groupings are affecting outcomes and how individual variation affects results but what remains unexplained is called error.
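To make the three sources concrete, here is a minimal sketch of the standard sum-of-squares partition for a one-way repeated measures design. The scores are made up for illustration (they are not the Cholesterol.csv values) and the function name is our own.

```python
# Hypothetical wide-format scores: one inner list per participant,
# one value per level (Before, After4Weeks, After8Weeks). Not the real data.
scores = [
    [6.4, 5.8, 5.7],
    [6.8, 6.2, 6.1],
    [5.9, 5.6, 5.4],
    [7.1, 6.3, 6.3],
]

def partition_variance(rows):
    """Split the total sum of squares into between, subjects, and error."""
    n_subjects, n_levels = len(rows), len(rows[0])
    all_values = [value for row in rows for value in row]
    grand_mean = sum(all_values) / len(all_values)

    level_means = [sum(row[j] for row in rows) / n_subjects for j in range(n_levels)]
    subject_means = [sum(row) / n_levels for row in rows]

    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    ss_between = n_subjects * sum((m - grand_mean) ** 2 for m in level_means)
    ss_subjects = n_levels * sum((m - grand_mean) ** 2 for m in subject_means)
    # Whatever the levels and subjects cannot account for is error
    ss_error = ss_total - ss_between - ss_subjects
    return {"between": ss_between, "subjects": ss_subjects,
            "error": ss_error, "total": ss_total}

parts = partition_variance(scores)
for source, value in parts.items():
    print(f"{source}: {value:.3f}")
```

Notice that the error term is simply what remains of the total after the between and subjects pieces are removed, which mirrors the description above.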

The F-statistic

ANOVA Summary

Mean Square

In the previous section we talked about sources of variation in our data. Those concepts roughly correspond to the values in the Sum of Square column. Unhelpfully, Between, Subjects, and Error values are all based on different numbers of observations. To correct for this, we divide each Sum of Square value by its Degree of Freedom to compute a Mean Square. I’ll demonstrate below for the Between sources of variance row.

# Between row calculation
ss = 4.32               # sum of square
df = 2                  # degree of freedom
mean_square = ss / df
# Mean Square is 2.16

This same process of dividing Sum of Square by Degrees of Freedom can be performed for the Error row and yields a Mean Square of 0.01. Given all these pieces we are finally in position to understand our primary statistic, F.

F as a ratio

The F-statistic of a One-Way repeated measures ANOVA table is the ratio of Mean Square Between over Mean Square Error. More meaningfully, we can conceptualize F as the ratio of explainable group-based variance over unexplainable error variance. Imagine the error term frozen at a value of 10; as our explainable variation goes up so does our F-statistic.

ms_error = 10

F = 1 / ms_error        # F = 0.1
F = 10 / ms_error       # F = 1
F = 100 / ms_error      # F = 10

This is what the F-statistic is there to tell us about. Do our groups produce different means than the naive “grand mean” we began with? If so, does the observed difference go beyond what we would expect in a case of pure random chance? To answer that last part we look at the p or Significance value in the rightmost column. As per t-tests, the convention is to consider any p-value < 0.05 to be statistically significant. Again, we cannot say how big a difference is or even which groups are different based on p alone, but we can say that not all levels of our independent variable are the same and therefore something is affecting the dependent variable.

Post-Hoc tests

A significant ANOVA result can only tell us that not all of our independent variable levels produce the same means. To get a more refined picture of our results we look to the next two tables of our output.

Describing Our Data

The table of descriptive statistics is useful for getting a general idea of how your experiment turned out. Reported results include Mean, standard deviation (SD), standard error (SEM), and number of observations (N).

Do the direction and spread of group means make sense for your experiment? Imagine an 8-week stress reduction program which is showing an ever increasing amount of stress as participants engage for longer with the program. A significant ANOVA result only tells you there are differences, not that they are in the directions you supposed or even in a pattern that makes sense. What if stress increases for the first 4 weeks of a program but then ends up lower at the end of 8 weeks than at the beginning?

Did people increasingly drop out of your study as time went on? The exact questions will be informed by your designs but thinking about our descriptives helps us generate hypotheses to explain what happened in our experiment. Statistics are tools to help us answer specific questions about our data but explaining why those differences are or aren’t there is your job.

For this dataset we see cholesterol began at a mean value of 6.41, decreased to 5.84 after 4 weeks of margarine use, and decreased slightly more to 5.78 after an additional 4 weeks of margarine use. Standard deviations also decreased as participation in the study proceeded (1.19 to 1.12 to 1.10).

ANOVA Descriptives

Making Inferences About Our Data

Finally, we come to our post-hoc inferential statistics. Although our mean group values showed a drop in cholesterol as time in the study increased we do not know which of those differences is statistically significant. It is possible the largest observed difference of 0.57 between the Before and After4Weeks group is statistically significant but the smaller After4Weeks to After8Weeks difference of 0.06 is not.

ANOVA post-hoc tests

Above we see the table of inferential statistics from our analysis. Each row represents a comparison of one group to another. Because we have 3 levels of our independent variable, there are three possible comparisons to make. Each is present in the table of Post-Hoc Tests.
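The count of three comparisons follows from pairing every level with every other level once. A short sketch using Python’s standard library makes the pairs explicit; the level names are the ones from our dataset.

```python
from itertools import combinations

levels = ["Before", "After4Weeks", "After8Weeks"]

# Every unordered pair of levels gets one row in the post-hoc table.
comparisons = list(combinations(levels, 2))

for group_1, group_2 in comparisons:
    print(f"{group_1} vs {group_2}")

# In general, k levels give k * (k - 1) / 2 comparisons.
k = len(levels)
assert len(comparisons) == k * (k - 1) // 2
```

With more levels the number of comparisons grows quickly (4 levels give 6, 5 levels give 10), which is one reason post-hoc tables get long in larger designs.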

The key values in this table are Mean Difference, p value, and Reject. From this we can see all of our groups are producing statistically significant differences from one another. Each of our comparisons is reporting True as to whether or not we should reject the null hypothesis and all p-values are < 0.05. Combining this with our descriptive statistics we can say the following:

  • Mean Cholesterol levels significantly decreased after 4 weeks of margarine use
  • Mean Cholesterol levels significantly decreased between 4 weeks and 8 weeks of margarine use
  • Participation in our program led to statistically significant decreases in mean levels of cholesterol

Next page: Analyzing Cholesterol Dataset – Part 3
