Analyzing Data
After data have been created, collected, and cleaned, they are finally ready for analysis. Analyzing data is the process of extracting meaning from a dataset by aggregating, finding patterns in, or summarizing the data in order to draw conclusions. One simple place to start is with summary statistics. When we have continuous data, we might identify the mean, median, mode, standard deviation, maximum, or minimum values of a dataset. We might use regression to examine the relationship between two variables. In the case of categorical data, we might use a chi-square test to look for differences between groups. Many of these analyses will lead us to use p-values to judge whether an observation is statistically significant (conventionally, a p-value at or below a significance level, alpha, of .05).

Just as it's important to understand how our data were created, in data analysis we need to understand the nature of the data in order to know which tests we can run and what assumptions and requirements those tests have. For example, in the case of our bubble gum flavor study, we do not have enough responses to reliably test whether the differences between categories are statistically significant. In such cases, we can still analyze the data using simple descriptive statistics. For example, the following tables show the distribution of responses as simple percentages for the different sub-groups.
male (total: 19)
response | peppermint | cinnamon | watermelon | bubblegum | sour patch |
really dislike | 0% | 0% | 0% | 0% | 16% |
dislike | 16% | 11% | 16% | 21% | 16% |
meh | 5% | 5% | 11% | 32% | 5% |
like | 32% | 32% | 21% | 11% | 11% |
really like | 5% | 11% | 11% | 5% | 21% |
can't tell | 42% | 42% | 42% | 32% | 32% |
female (total: 13)
response | peppermint | cinnamon | watermelon | bubblegum | sour patch |
really dislike | 0% | 23% | 0% | 0% | 8% |
dislike | 0% | 15% | 0% | 8% | 23% |
meh | 15% | 0% | 0% | 23% | 0% |
like | 31% | 8% | 23% | 31% | 15% |
really like | 8% | 8% | 23% | 15% | 23% |
can't tell | 46% | 46% | 54% | 23% | 31% |
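To see why a significance test is shaky with this few responses, one rough check is to compute the counts a chi-square test would expect in each cell if gender and peppermint rating were unrelated. A commonly cited rule of thumb asks for expected counts of at least 5 in most cells. The sketch below uses counts back-calculated from the peppermint columns of the male and female tables above, which is an assumption (for instance, 16% of 19 is taken to be 3 people). The function name `expected_counts` is made up for illustration.

```python
# Counts back-calculated from the peppermint column of the male and female
# tables (an assumption: e.g. 16% of 19 male respondents is read as 3 people).
# Order: really dislike, dislike, meh, like, really like, can't tell.
observed = {
    "male": [0, 3, 1, 6, 1, 8],
    "female": [0, 0, 2, 4, 1, 6],
}


def expected_counts(table):
    """Expected cell counts if group and response were independent:
    row total * column total / grand total."""
    row_totals = {group: sum(counts) for group, counts in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(row_totals.values())
    return {
        group: [row_totals[group] * col / grand_total for col in col_totals]
        for group in table
    }


expected = expected_counts(observed)

# Count how many cells fall below the rule-of-thumb threshold of 5.
small_cells = sum(1 for counts in expected.values() for e in counts if e < 5)
total_cells = sum(len(counts) for counts in expected.values())
print(f"{small_cells} of {total_cells} expected counts are below 5")
```

With these reconstructed counts, most of the expected values fall below 5, which is consistent with falling back on descriptive percentages here.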
under 18 (total: 8)
response | peppermint | cinnamon | watermelon | bubblegum | sour patch |
really dislike | 0% | 13% | 0% | 0% | 0% |
dislike | 38% | 25% | 0% | 38% | 25% |
meh | 13% | 0% | 13% | 38% | 13% |
like | 25% | 38% | 50% | 0% | 25% |
really like | 25% | 13% | 25% | 13% | 25% |
can't tell | 0% | 13% | 13% | 13% | 13% |
18-30 (total: 10)
response | peppermint | cinnamon | watermelon | bubblegum | sour patch |
really dislike | 0% | 10% | 0% | 0% | 10% |
dislike | 0% | 0% | 10% | 0% | 10% |
meh | 0% | 0% | 0% | 10% | 0% |
like | 0% | 0% | 0% | 40% | 0% |
really like | 0% | 10% | 0% | 0% | 10% |
can't tell | 100% | 80% | 90% | 50% | 70% |
31-50 (total: 6)
response | peppermint | cinnamon | watermelon | bubblegum | sour patch |
really dislike | 0% | 0% | 0% | 0% | 0% |
dislike | 0% | 17% | 0% | 17% | 33% |
meh | 17% | 0% | 0% | 17% | 0% |
like | 50% | 33% | 17% | 17% | 17% |
really like | 0% | 0% | 50% | 33% | 33% |
can't tell | 33% | 50% | 33% | 17% | 17% |
over 50 (total: 8)
response | peppermint | cinnamon | watermelon | bubblegum | sour patch |
really dislike | 0% | 13% | 0% | 0% | 38% |
dislike | 0% | 13% | 25% | 13% | 13% |
meh | 13% | 13% | 13% | 50% | 0% |
like | 63% | 25% | 25% | 13% | 13% |
really like | 0% | 13% | 0% | 0% | 25% |
can't tell | 25% | 25% | 38% | 25% | 13% |
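Percentages like those in the tables above can be produced from raw survey answers with a simple tally. The following is a minimal sketch, not the method actually used for the study: the list `raw_answers` is hypothetical data chosen so that it reproduces the peppermint column for the 19 male respondents, and `percentage_distribution` is an illustrative helper name.

```python
from collections import Counter

RESPONSES = ["really dislike", "dislike", "meh", "like", "really like", "can't tell"]

# Hypothetical raw answers, constructed to match the male peppermint column
# (19 respondents in total).
raw_answers = (
    ["dislike"] * 3
    + ["meh"] * 1
    + ["like"] * 6
    + ["really like"] * 1
    + ["can't tell"] * 8
)


def percentage_distribution(answers, categories):
    """Return each category's share of the answers as a whole-number percentage."""
    counts = Counter(answers)  # categories with no answers count as 0
    total = len(answers)
    return {category: round(100 * counts[category] / total) for category in categories}


distribution = percentage_distribution(raw_answers, RESPONSES)
for response, pct in distribution.items():
    print(f"{response:<15} {pct}%")
```

Because each value is rounded separately, a column will not always add up to exactly 100%.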
Usually, it's helpful to analyze the data in a number of different ways. After all, simply knowing the average depth of a river might get a person in over their head!
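The river example can be made concrete with a few summary statistics side by side. The depth readings below are invented purely for illustration. The sketch uses Python's standard `statistics` module to show how a mean on its own can hide a deep spot that the median and maximum reveal.

```python
import statistics

# Hypothetical depth readings across a river, in meters (invented data).
depths_m = [0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 3.4]

summary = {
    "mean": statistics.mean(depths_m),
    "median": statistics.median(depths_m),
    "stdev": statistics.stdev(depths_m),  # sample standard deviation
    "min": min(depths_m),
    "max": max(depths_m),
}

for name, value in summary.items():
    print(f"{name:<7} {value:.2f} m")
```

Here the mean is about 1 meter, yet one reading is well over three times that, so the maximum and the spread matter as much as the average.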
Think About It...
What sorts of analyses could you run on the India COVID-19 dataset to answer questions about its spread over time?