Analyzing Data

After creating, collecting, and cleaning data, it's finally ready for analysis.  Analyzing data is the process of extracting meaning from a dataset by aggregating it, finding patterns in it, or summarizing it in order to draw conclusions.

One simple place to start is with summary statistics.  When we have continuous data, we might identify the mean, median, mode, standard deviation, maximum, or minimum values of a dataset.  We might use regression to examine how two variables are related to each other.  In the case of categorical data, we might use a chi-square test to look for differences between groups.  Many of these analyses produce p-values, which we use to judge whether a result is statistically significant: by convention, a result is considered significant when its p-value is at or below a chosen significance level, alpha, which is most often set at .05.

Just as it's important to understand how our data were created, in data analysis we need to understand the nature of the data in order to know which tests we can run and what assumptions and requirements those tests carry.  For example, in our bubble gum flavor study, we do not have enough responses to reliably compare respondents in different categories for statistically significant differences.  In such cases, we can still analyze the data using simple descriptive statistics.  The following tables, for instance, show the distribution of responses as simple percentages for the different sub-groups.

Male (total respondents: 19)
response        peppermint  cinnamon    watermelon  bubblegum   sour patch
really dislike  0%          0%          0%          0%          16%
dislike         16%         11%         16%         21%         16%
meh             5%          5%          11%         32%         5%
like            32%         32%         21%         11%         11%
really like     5%          11%         11%         5%          21%
can't tell      42%         42%         42%         32%         32%

Female (total respondents: 13)
response        peppermint  cinnamon    watermelon  bubblegum   sour patch
really dislike  0%          23%         0%          0%          8%
dislike         0%          15%         0%          8%          23%
meh             15%         0%          0%          23%         0%
like            31%         8%          23%         31%         15%
really like     8%          8%          23%         15%         23%
can't tell      46%         46%         54%         23%         31%

Under 18 (total respondents: 8)
response        peppermint  cinnamon    watermelon  bubblegum   sour patch
really dislike  0%          13%         0%          0%          0%
dislike         38%         25%         0%          38%         25%
meh             13%         0%          13%         38%         13%
like            25%         38%         50%         0%          25%
really like     25%         13%         25%         13%         25%
can't tell      0%          13%         13%         13%         13%

Ages 18-30 (total respondents: 10)
response        peppermint  cinnamon    watermelon  bubblegum   sour patch
really dislike  0%          10%         0%          0%          10%
dislike         0%          0%          10%         0%          10%
meh             0%          0%          0%          10%         0%
like            0%          0%          0%          40%         0%
really like     0%          10%         0%          0%          10%
can't tell      100%        80%         90%         50%         70%

Ages 31-50 (total respondents: 6)
response        peppermint  cinnamon    watermelon  bubblegum   sour patch
really dislike  0%          0%          0%          0%          0%
dislike         0%          17%         0%          17%         33%
meh             17%         0%          0%          17%         0%
like            50%         33%         17%         17%         17%
really like     0%          0%          50%         33%         33%
can't tell      33%         50%         33%         17%         17%

Over 50 (total respondents: 8)
response        peppermint  cinnamon    watermelon  bubblegum   sour patch
really dislike  0%          13%         0%          0%          38%
dislike         0%          13%         25%         13%         13%
meh             13%         13%         13%         50%         0%
like            63%         25%         25%         13%         13%
really like     0%          13%         0%          0%          25%
can't tell      25%         25%         38%         25%         13%
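
To make the percentage calculation concrete, here is a minimal Python sketch that rebuilds one column of the under-18 table (peppermint).  The individual responses are an assumption: the counts (3 dislike, 1 meh, 2 like, 2 really like) are back-calculated from that column's percentages out of 8 respondents, and `percent_distribution` is a hypothetical helper written for this example, not part of any library.

```python
from collections import Counter

# Response categories, in the order used by the tables.
RATINGS = ["really dislike", "dislike", "meh", "like", "really like", "can't tell"]

# Assumed ratings of peppermint from the 8 under-18 respondents. The counts are
# back-calculated from the under-18 table's percentages; these are illustrative
# records, not the study's raw data.
under_18_peppermint = (
    ["dislike"] * 3 + ["meh"] * 1 + ["like"] * 2 + ["really like"] * 2
)


def percent_distribution(ratings):
    """Return {rating: whole-number percent} for one sub-group's ratings of one flavor."""
    total = len(ratings)
    counts = Counter(ratings)  # missing categories count as 0
    # Round half up (12.5 -> 13) to match the tables; Python's built-in round()
    # uses round-half-to-even and would give 12. Because of rounding, a column
    # can total 99% or 101% instead of exactly 100%.
    return {r: int(100 * counts[r] / total + 0.5) for r in RATINGS}


for rating, pct in percent_distribution(under_18_peppermint).items():
    print(f"{rating:<15}{pct}%")
```

Running this prints 38% for dislike, 13% for meh, 25% each for like and really like, and 0% for the other categories, matching the table.  Applying the same function to each sub-group's ratings of each flavor would produce the full set of tables.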

Usually, it's helpful to analyze the data in a number of different ways.  After all, simply knowing the average depth of a river might get a person in over their head!
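
The river-depth warning is easy to demonstrate with summary statistics.  The minimal sketch below uses Python's standard `statistics` module on a short list of depth readings; the readings are invented for illustration and do not come from any real river.

```python
import statistics

# Hypothetical depth readings (in meters) taken at evenly spaced points across
# a river. These numbers are invented for illustration only.
depths_m = [0.3, 0.4, 0.5, 0.5, 0.6, 0.5, 0.4, 3.0, 0.4, 0.5]

# Several summary statistics for the same data, each answering a different question.
summary = {
    "mean": statistics.mean(depths_m),      # average depth
    "median": statistics.median(depths_m),  # middle value when sorted
    "mode": statistics.mode(depths_m),      # most common reading
    "stdev": statistics.stdev(depths_m),    # sample standard deviation (spread)
    "min": min(depths_m),
    "max": max(depths_m),
}

for name, value in summary.items():
    print(f"{name:<7}{value:.2f} m")
```

Here the mean depth is about 0.71 m and the median is 0.5 m, yet the maximum is 3.0 m.  Someone who looked only at the average would miss the one spot deep enough to go over a person's head; reporting the median, spread, and extremes alongside the mean guards against that.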

Think About It...

What sorts of analyses could you run on the India COVID-19 dataset to answer questions about its spread over time?