Collecting Data

Once data is created, it can be collected.  Collecting data refers to the way in which a scientist gathers or puts together data.  For example, suppose that in our bubblegum flavor investigation we chose to create data using the pre-defined Likert response scale.  With that in place, how might we collect the data?  One simple way would be to create a chart with flavors on one axis and reactions on the other, marking each response as a tick mark in the correct box, as in the following table:

                          cinnamon      watermelon        mint          tutti-frutti
really dislike            IIIIIIIIIIII  IIIII             III           IIIIIIIIII
dislike                   IIII          II                II            IIIIIII
neither like nor dislike  IIIII         III               IIIII         IIII
like                      II            IIIIIII           IIIII         II
really like               I             IIIIIIIIIIIIIIII  IIIIIIIIIIII  II
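
The same tally sheet could also be kept electronically.  The following is a minimal Python sketch of that idea, not a prescribed method: the names `record` and `render` are hypothetical, and the three responses at the bottom are made up purely for illustration.

```python
from collections import Counter

FLAVORS = ["cinnamon", "watermelon", "mint", "tutti-frutti"]
REACTIONS = ["really dislike", "dislike", "neither like nor dislike",
             "like", "really like"]


def record(tally, flavor, reaction):
    """Add one tick mark to the box for this flavor and reaction."""
    if flavor not in FLAVORS or reaction not in REACTIONS:
        raise ValueError(f"unknown flavor or reaction: {flavor!r}, {reaction!r}")
    tally[(reaction, flavor)] += 1


def render(tally):
    """Return the tally sheet as text, one row per reaction."""
    lines = []
    for reaction in REACTIONS:
        marks = [f"{flavor}: {'I' * tally[(reaction, flavor)]}"
                 for flavor in FLAVORS]
        lines.append(f"{reaction} | " + " | ".join(marks))
    return "\n".join(lines)


tally = Counter()
# Hypothetical responses, invented for illustration only.
for flavor, reaction in [("mint", "like"),
                         ("mint", "like"),
                         ("cinnamon", "really dislike")]:
    record(tally, flavor, reaction)

print(render(tally))
```

Like the paper version, this sketch stores only a count for each box, so nothing else about the individual respondent is kept.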

An advantage of collecting the information this way is that many responses can be gathered on a single sheet of paper and read at a glance.  The person collecting the information needs only a clipboard and one sheet of paper, so they could quickly test different flavors with passers-by in a mall or some other public space.  A drawback is that it does not let the scientist record other information about each respondent alongside the response.  For example, suppose the scientist wanted to know whether male and female respondents react differently.  In that case, they would need to collect information in a way that captures more data in a single response.  Perhaps an electronic form with drop-downs for gender and preference could be used, and the scientist could gather the information using an iPad.
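
To make the trade-off concrete, here is a minimal Python sketch of collecting one full record per respondent, as an electronic form might.  The field names (`flavor`, `reaction`, `gender`) and the four responses are assumptions made up for illustration; they are not data from the investigation.

```python
from collections import Counter

# One record per respondent (hypothetical data, for illustration only).
responses = [
    {"flavor": "watermelon", "reaction": "really like", "gender": "female"},
    {"flavor": "watermelon", "reaction": "like", "gender": "male"},
    {"flavor": "cinnamon", "reaction": "really dislike", "gender": "male"},
    {"flavor": "watermelon", "reaction": "really like", "gender": "male"},
]

# The simple tally sheet can always be rebuilt from the full records...
tally = Counter((r["reaction"], r["flavor"]) for r in responses)

# ...and so can a breakdown by gender, which the tally sheet alone
# could never provide because it never stored that detail.
by_gender = Counter((r["gender"], r["flavor"], r["reaction"])
                    for r in responses)

print(tally[("really like", "watermelon")])
print(by_gender[("male", "watermelon", "really like")])
```

The design point is that richer records can be summarized into a tally later, but a tally cannot be expanded back into the records.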

Decisions about collecting information answer questions such as: How often?  How much?  What instruments will be used to gather the information?  While data is often created and collected at the same time, data collection ≠ data creation.  Collection refers to how the data is gathered into a single place for later analysis, whereas creation focuses on how the data is generated in the first place.  The Likert-scale question (i.e., really dislike to really like) results in created data, but that data can be collected in a number of different ways.


In today's data-focused world, a lot of data has already been created and collected.  This is especially useful for data that would be difficult or impossible for a single individual to create, particularly in high school (e.g., observations of sunspots over 200 years, tide levels on different coasts, the weight of an atom, or the speeds of different particles).

Even when the data has already been created and collected ahead of time, it's important to understand how that was done so that we can interpret the data appropriately.  For example, in our earlier example about seismic activity, scientists discovered that the sensors had not been buried deep enough to sense the earth's movements.  This flaw in the creation of the data led scientists to incorrect conclusions about the frequency and force of seismic activity in the measured regions.  Thus, even when a scientist uses pre-created or pre-collected data, they need to understand its origins.

 

Think About It...

Suppose you are a public health official in India.  During the COVID-19 crisis, it's important to understand how the disease is affecting your population.  Consider the following research question:

Research Question: What trends do we see with COVID-19 in India?
