Thu., Oct. 25, 12:30 pm – 2:00 pm, Mullen Library Instruction Room
When working with your dataset, have you wondered how to remove ‘null’ or ‘N/A’ from fields, handle different spellings of words, or determine whether a field name is ambiguous? When interviewed, many data scientists complain that the most tedious, time-consuming aspect of any project is cleaning and manipulating the data. In this workshop, we will use the open-source tool OpenRefine to clean, manipulate, and refine a dataset before analysis. Because the workshop focuses on saving you time by spotting and avoiding common pitfalls in data preparation, it will include a brief foray into regular expressions. You are welcome to bring your own dataset.
Please RSVP to firstname.lastname@example.org.
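For a feel of the cleanup tasks described above, here is a minimal sketch in plain Python. It is not OpenRefine and not the workshop's material. The list of placeholder strings, the place-name pattern, and the function name clean_value are all assumptions made up for illustration.

```python
import re

# Placeholder strings treated as "missing" (an assumed list; extend for your data).
# The optional group also lets empty or whitespace-only cells match.
MISSING = re.compile(r"^\s*(null|n/?a|none|-)?\s*$", re.IGNORECASE)

# Hypothetical spelling variants, each mapped to one canonical form.
VARIANTS = [
    (re.compile(r"^wash(ington)?,?\s*d\.?\s*c\.?$", re.IGNORECASE), "Washington, DC"),
]


def clean_value(value):
    """Return None for missing placeholders, else a normalized string."""
    if value is None or MISSING.match(value):
        return None
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", value).strip()
    # Replace a known spelling variant with its canonical form.
    for pattern, canonical in VARIANTS:
        if pattern.match(text):
            return canonical
    return text


if __name__ == "__main__":
    for raw in ["N/A", " null ", "washington,  d.c.", "Wash DC", "Baltimore"]:
        print(repr(raw), "->", repr(clean_value(raw)))
```

Because the patterns are anchored with ^ and $, a value such as "Nathan" is not mistaken for the placeholder "NA". Anchoring is the kind of regular-expression detail that matters when cleaning real fields.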
A new title this month at Mullen Library is Mathematics Without Apologies: Portrait of a Problematic Vocation by Michael Harris. No apologies here if your vocation is data – Dr. Data drops first choreographed rap video about predictive analytics. Thanks, Eric Siegel, Ph.D.!
Predictive analytics learns from the data you supply,
and predicts if you will click, buy, lie, or die.
It ain’t astrological – it’s math, it’s methodological.
So better pay attention cause my flow is pedagogical. [Full lyrics]
Valid data and a belief that government is a public good can be motivators in society. The Pew Research Center 2015 report Americans’ Views on Open Government Data documents the not-quite-tipping-point of the value of open data. It seems the jury is still out!
More data is available every day:
DATA.GOV – managed and hosted by the U.S. General Services Administration, Office of Citizen Services and Innovative Technologies
OECD Data – Organisation for Economic Co-operation and Development
World Bank Data – Economic Indicators
DC Open Data – District of Columbia GIS (DC GIS)
Researchers are working toward shared definitions and repositories of data. Data management is an added task that researchers find troublesome.
Researchers at universities are beginning to think beyond the requirement to author a data management plan. Kristen Briney, Data Services Librarian at the University of Wisconsin–Milwaukee, has taught and advised researchers on practical data management, creating data management plans, and working with electronic lab notebooks. Her recently published TedxUMilwaukee Talk Rethinking Research Data asks researchers to go further and publish their data when they publish an article.
“The National Institutes of Health has issued a final NIH Genomic Data Sharing (GDS) policy to promote data sharing as a way to speed the translation of data into knowledge, products and procedures that improve health while protecting the privacy of research participants.” From post NIH issues finalized policy on genomic data sharing
The policy’s implementation is meant to accelerate biomedical discoveries, while safeguarding patient privacy and data sensitivity. Investigators applying for grant funding in January 2015 will need to supply data-sharing plans prior to the start of their research project.
“Everyone is eager to see the incredible deluge of molecular discoveries about disease translated into prevention, diagnostics, and therapeutics for patients,” said Kathy Hudson, Ph.D., NIH deputy director for science, outreach and policy. “The collective knowledge achieved through data sharing benefits researchers and patients alike, but it must be done carefully. The GDS policy outlines the responsibilities of investigators and institutions that are using the data and also encourages researchers to get consent from participants for future unspecified use of their genomic data.”
Along with statistics about the use of dbGaP data, the Nature Genetics report outlines the challenges facing the field, such as the increased volume and complexity of genomic data.
For a link to the GDS Policy see http://gds.nih.gov.