Creating Data

Before we can work with data, we must first create it. Data creation is the process of generating data from scientific observation. Upon observing a phenomenon, scientists must first decide how to represent it in a way that will allow them to analyze it later. Suppose you wanted to know whether different people like different flavors of gum. How might a scientist go about creating data for this question?

One way to create data for this question is to use open-ended questions. For example, the scientist could ask each individual to taste the gum and then describe their reaction to that taste. The prompt "Describe your reaction to this flavor" then results in the creation of data. This type of question lets individuals describe their reaction in any way they choose, but the way the data is created has implications for how the scientist can analyze it later. A benefit of this approach is that you are likely to get authentic and varied responses. A drawback is that some people may describe the flavor itself rather than their reaction to it.

A different way to create data for this question is to use pre-defined categories. For example, the scientist might ask individuals to choose one of the following five reactions: "really like, like, neither like nor dislike, dislike, or really dislike." A benefit of creating data in this way is that it standardizes the responses so that each one fits into a specific category, which facilitates later interpretation and analysis. A drawback is that it may not capture the full variety of reactions people might have to each flavor of gum.
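To make the contrast concrete, here is a minimal Python sketch of how the two kinds of responses might look once they are recorded. The responses are invented for illustration only, and the names `SCALE` and `tally` are hypothetical. It shows that categorical responses can be counted directly, while open-ended responses cannot be tallied without first being interpreted.

```python
from collections import Counter

# Hypothetical open-ended responses, invented purely for illustration.
open_ended = [
    "It was way too sweet for me",
    "Minty, kind of like toothpaste",  # describes the flavor, not a reaction
    "Loved it!",
]

# The five pre-defined categories from the example above.
SCALE = [
    "really like",
    "like",
    "neither like nor dislike",
    "dislike",
    "really dislike",
]

# Hypothetical categorical responses, also invented for illustration.
categorical = ["like", "really like", "dislike", "like"]


def tally(responses, scale=SCALE):
    """Count responses per category, rejecting anything outside the scale."""
    for response in responses:
        if response not in scale:
            raise ValueError(f"not a valid category: {response!r}")
    counts = Counter(responses)
    # Report every category, including those nobody chose.
    return {category: counts[category] for category in scale}


counts = tally(categorical)
print(counts)
```

Calling `tally(open_ended)` raises a `ValueError`, which mirrors the trade-off described above: free-text answers are richer, but someone has to read and code them before they can be counted.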

In short, the first step of working with data is to create it. The way that a scientist creates data affects everything else they might do with that data thereafter. For example, in one investigation, scientists wanted to track seismic activity. To create data, they chose to place motion sensors ten feet underground in various locations. Creating data in this example involved several decisions. The scientists had to decide how to sense seismic activity, which instruments would help them measure it, where to place the instruments, and which units to use when measuring seismic activity.
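One way to see how these decisions shape the data is to write down what a single recorded observation might look like. The Python sketch below is an assumption for illustration, not the format the scientists actually used: the class name `SeismicReading`, its field names, the choice of millimeters per second as the unit, and all the values except the ten-foot depth are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeismicReading:
    """One observation; each field reflects a data-creation decision."""

    sensor_id: str                # which instrument took the reading
    latitude: float               # where the instrument was placed
    longitude: float
    depth_ft: float               # how far underground the sensor sits
    ground_velocity_mm_s: float   # assumed unit: millimeters per second


# Placeholder values; only the ten-foot depth comes from the example above.
reading = SeismicReading(
    sensor_id="sensor-A",
    latitude=0.0,
    longitude=0.0,
    depth_ft=10.0,
    ground_velocity_mm_s=0.02,
)
print(reading)
```

Had the scientists chosen a different instrument or unit, the fields would change, and so would every analysis built on them.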

 

Think About It...

Suppose you are a public health official in India.  With the COVID-19 crisis, it's important to understand how it's affecting your populace.  Consider the following research question:

Research Question: What trends do we see with COVID-19 in India?

(Enter your ideas in the Google Form below. Then refresh the page to see new responses.)