In “Responsible Use Of Big Data: Evaluating New Data Sources”, Kelly McGuire offers tips for determining whether a particular data set is useful and worth the effort of analyzing it. Before analyzing any data, you should write down a goal stating what type of information you want to obtain from the data set. You should also think about how the data will affect your argument and how it will be used. McGuire then gives a series of questions to help the analyst understand the characteristics of the data set. The first question is “What is the Data?”, which helps the analyst understand how the data is calculated and how it relates to existing data. It is also important to analyze correlations with existing data in this step. The second question is “How is this data collected?”, which helps the analyst judge how reliable the data is and identify potential sources of error. The final question is “How often is the data updated and how?”, which relates to the amount of time it takes to analyze and store the data. If new data is released as soon as the existing data has been analyzed, the analysis is already obsolete and the analyst has wasted time on data that is now useless. McGuire also points out that having too much data can make it harder to draw conclusions from the relevant data, so to analyze the data most accurately, only relevant data should be accessed and used.
The University of Saskatchewan website identifies and explains the different types of sources and how they should be used. A primary source is a source that conveys new ideas or information. Primary sources include research studies, autobiographies, works of art, and more. A secondary source is based on information from a primary source; examples include biographies, analyses of research studies, and literary analyses. The last type, tertiary sources, draw on a collection of secondary and primary sources and include textbooks, essays, encyclopedias, and almanacs. The website also contains information about scholarly sources versus popular sources, and explains why relying on sources found just by using Google can be a bad idea.
- Source: US Census Bureau
- Who made/published it: This website is called American FactFinder, which is part of the US Census Bureau
- How was it collected: The Census Bureau collected this data by surveying Americans, either by sending people door to door to ask questions or by mailing out surveys.
- How old is it and why is it old?: The data is from 2010, because the Census is only taken every 10 years. However, the chart I used had the monetary values adjusted for inflation to 2014.
- What is the format?: The format is a table with four columns: one with the actual data and three containing the percentages, the margin of error for the actual data, and the margin of error for the percentage data.
- What’s the type?: This is geographical data. This is data only from citizens of the United States.
- How would you need to transform the data to make it useful?: For my purposes, I would not need the margin of error for either the actual data or the percentage data. I would also need to trim a lot of the irrelevant data from the table. Since I was focusing on comparing income, the subcategory of the table I would probably use the most is the “Income and Benefits” section.
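
The transformation described in the last bullet could be sketched roughly as follows in Python. This is only an illustration under assumptions: it supposes the table has been exported as a CSV file with a `Section` column, and the column names, the `trim_table` helper, and every number in the sample are made-up placeholders rather than real Census figures or the actual export layout.

```python
import csv
import io

# Hypothetical sample standing in for the exported table. The column names
# and all numbers are made-up placeholders, not real Census figures.
SAMPLE_CSV = """Section,Subject,Estimate,Estimate Margin of Error,Percent,Percent Margin of Error
Employment Status,In labor force,100,5,50.0,1.0
Income and Benefits,"Less than $10,000",20,3,10.0,0.5
Income and Benefits,"$10,000 to $14,999",30,4,15.0,0.6
Health Insurance Coverage,With coverage,150,6,75.0,1.2
"""


def trim_table(csv_text, section="Income and Benefits"):
    """Keep only rows in `section` and drop both margin-of-error columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    trimmed = []
    for row in reader:
        # Skip every row that belongs to a different section of the table.
        if row["Section"] != section:
            continue
        # Copy over only the columns needed for comparing income.
        trimmed.append({
            "Subject": row["Subject"],
            "Estimate": int(row["Estimate"]),
            "Percent": float(row["Percent"]),
        })
    return trimmed


income_rows = trim_table(SAMPLE_CSV)
for row in income_rows:
    print(row["Subject"], row["Estimate"], row["Percent"])
```

Filtering first and then copying only the needed columns keeps the result small, which matches the earlier point that only relevant data should be accessed and used.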