Analyzing Data

After data have been created, collected, and cleaned, they are finally ready for analysis. Analyzing data is the process of extracting meaning from a dataset by aggregating, finding patterns in, or summarizing the data in order to draw conclusions. One simple place to start is with summary statistics. When we have continuous data, we might identify the mean, median, mode, standard deviation, maximum, or minimum values of a dataset. We might use regression to model the relationship between two variables. In the case of categorical data, we might use a chi-square test to look for differences between groups. Many of these analyses will lead us to use p-values to judge whether an observation is statistically significant (conventionally, when the p-value is less than or equal to a chosen significance level, alpha, often .05).

Just as it's important to understand how our data were created, in data analysis we need to understand the nature of the data in order to know which tests we can run and what the assumptions and requirements of those tests are. For example, in the case of our bubble gum flavor study, we do not have enough responses to reliably test whether responses from different categories differ to a statistically significant degree. In such cases, we can still analyze the data using simple descriptive statistics. For example, the following tables show the distribution of responses as simple percentages for the different sub-groups.

male	TOTAL	19
	peppermint	cinnamon	watermelon	bubblegum	sour patch
really dislike	0%	0%	0%	0%	16%
dislike	16%	11%	16%	21%	16%
meh	5%	5%	11%	32%	5%
like	32%	32%	21%	11%	11%
really like	5%	11%	11%	5%	21%
can't tell	42%	42%	42%	32%	32%

female	TOTAL	13
	peppermint	cinnamon	watermelon	bubblegum	sour patch
really dislike	0%	23%	0%	0%	8%
dislike	0%	15%	0%	8%	23%
meh	15%	0%	0%	23%	0%
like	31%	8%	23%	31%	15%
really like	8%	8%	23%	15%	23%
can't tell	46%	46%	54%	23%	31%

under 18	TOTAL	8
	peppermint	cinnamon	watermelon	bubblegum	sour patch
really dislike	0%	13%	0%	0%	0%
dislike	38%	25%	0%	38%	25%
meh	13%	0%	13%	38%	13%
like	25%	38%	50%	0%	25%
really like	25%	13%	25%	13%	25%
can't tell	0%	13%	13%	13%	13%

18-30	TOTAL	10
	peppermint	cinnamon	watermelon	bubblegum	sour patch
really dislike	0%	10%	0%	0%	10%
dislike	0%	0%	10%	0%	10%
meh	0%	0%	0%	10%	0%
like	0%	0%	0%	40%	0%
really like	0%	10%	0%	0%	10%
can't tell	100%	80%	90%	50%	70%

31-50	TOTAL	6
	peppermint	cinnamon	watermelon	bubblegum	sour patch
really dislike	0%	0%	0%	0%	0%
dislike	0%	17%	0%	17%	33%
meh	17%	0%	0%	17%	0%
like	50%	33%	17%	17%	17%
really like	0%	0%	50%	33%	33%
can't tell	33%	50%	33%	17%	17%

over 50	TOTAL	8
	peppermint	cinnamon	watermelon	bubblegum	sour patch
really dislike	0%	13%	0%	0%	38%
dislike	0%	13%	25%	13%	13%
meh	13%	13%	13%	50%	0%
like	63%	25%	25%	13%	13%
really like	0%	13%	0%	0%	25%
can't tell	25%	25%	38%	25%	13%
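
To see why the sample is too small for a reliable chi-square comparison, the percentages in the male and female tables can be turned back into approximate counts. The sketch below does this for the peppermint column only and then computes the counts we would expect in each cell if rating were unrelated to group. It assumes the widely used rule of thumb that a chi-square test wants expected counts of about 5 or more in most cells; the function name to_counts is made up for this illustration, and the counts are recovered from rounded percentages, so treat them as approximate.

```python
# Percentages for the peppermint column, copied from the male and female tables.
ratings = ["really dislike", "dislike", "meh", "like", "really like", "can't tell"]
male_pct = [0, 16, 5, 32, 5, 42]      # male respondents, total = 19
female_pct = [0, 0, 15, 31, 8, 46]    # female respondents, total = 13
male_n, female_n = 19, 13


def to_counts(percentages, total):
    """Recover approximate whole-number counts from rounded percentages."""
    return [round(p / 100 * total) for p in percentages]


male_counts = to_counts(male_pct, male_n)
female_counts = to_counts(female_pct, female_n)
grand_total = male_n + female_n

# Expected count for each cell if rating were independent of group:
# (row total * column total) / grand total.
expected = []
for m, f in zip(male_counts, female_counts):
    row_total = m + f
    expected.append((row_total * male_n / grand_total,
                     row_total * female_n / grand_total))

# Count how many cells fall below the rule-of-thumb threshold of 5.
small_cells = sum(1 for pair in expected for e in pair if e < 5)
total_cells = 2 * len(ratings)

for rating, m, f in zip(ratings, male_counts, female_counts):
    print(f"{rating:<15} male={m:>2}  female={f:>2}")
print(f"{small_cells} of {total_cells} expected counts are below 5")
```

With these numbers, 9 of the 12 expected counts fall below 5, which is why the tables here stick to simple percentages.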

Usually, it's helpful to analyze the data in a number of different ways. After all, simply knowing the average depth of a river might get a person in over their head!
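
The river example can be made concrete with the summary statistics mentioned earlier. The following is a minimal sketch using Python's built-in statistics module; the depth values are hypothetical numbers invented for illustration, not measurements from any real river.

```python
import statistics

# Hypothetical depth soundings (in metres) taken across a river; not real data.
depths_m = [0.3, 0.4, 0.5, 0.5, 0.6, 0.8, 1.0, 3.5]

summary = {
    "mean": statistics.mean(depths_m),
    "median": statistics.median(depths_m),
    "mode": statistics.mode(depths_m),
    "stdev": statistics.stdev(depths_m),  # sample standard deviation
    "min": min(depths_m),
    "max": max(depths_m),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

In this made-up sample the mean depth is under one metre, yet the deepest point is 3.5 metres. Looking at the maximum and the spread alongside the average tells a very different story than the average alone.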

Think About It...

What sorts of analyses could you run on the India COVID-19 dataset to answer questions about its spread over time?