(This is a reposting from the iDigBio Blog, November 2014)

Data Carpentry - Please can we have some more?!

iDigBio and the American Museum of Natural History (AMNH) co-hosted a Data Carpentry Workshop on Monday and Tuesday, September 29–30, 2014. What skills do researchers in the life sciences need today to address current issues facing our planet? How can they make the best use of all the data available to them, now and in the future? To start off our Data Carpentry Workshop, University of Florida (UF) Botany Professor and iDigBio PI Pam Soltis shared her vision and historical perspective on the skills researchers need to make the best use of data, now and going forward. Drawing on her own thorough grounding in statistical methods, Pam highlighted how changes in science and in data require researchers to learn new skills, in her talk "Linking Heterogeneous Data in Biodiversity Studies: the need for data carpentry."

For two intensive, information-filled days of hands-on learning designed for beginners, 31 students tackled improving their spreadsheet skills, learned about the power of Open Refine to clean data and reveal data patterns via facets and clustering algorithms, discovered the power of the shell, found out just how simple it can be to get a dataset from a spreadsheet into a database to make use of structured query language (SQL), and got an introduction to R for data analysis and visualization.

Broadening Participation

Graduate students made up 60% of the participants; the other 40% were university faculty and staff. Nine students participated via Adobe Connect from the AMNH, including students from the City College of New York (CUNY), AMNH - Columbia University, and Hunter College. Three Information Science students from Florida State University (FSU) joined the UF students, faculty, and staff to make 31 participants total.
Across diverse fields, there is a demand for beginner-level courses introducing researchers to up-to-date computational literacy, data literacy, and data management skills. Participants came from Physics, Earth Sciences, Ecology, Zoology, Epidemiology, Botany, Genetics, Engineering, Social Science, Humanities, Tech Support, Public Health, and Information Science.

The Workshop Experience

All available workshop slots at UF and AMNH filled in just 3 days, with four people left on the wait-list at UF. With a student-teacher ratio of 3:1, everyone found someone nearby, ready and willing to assist, if they ran into tricky bits. The iDigBio Data Carpentry Workshop Wiki lists all materials used and topics covered, and includes recordings, notes taken, links to the datasets and materials on GitHub, the participant list, and more. Using Adobe Connect (AC) software and Kevin Love’s know-how, UF and AMNH students met each other virtually to learn together and share problem-solving strategies. We took notes together using a MoPad, with help from our remote assistant from USGS-BISON, Derek Masaki. Thanks Derek! Scenes from the workshop are up on the iDigBio Facebook pages.

Tracy K Teal, Professor at Michigan State University (MSU) in Microbiology and Molecular Genetics, walked us through better spreadsheet skills and the power of the shell. Deb Paul (that’s me) highlighted the importance of quality data and showed how one tool, Open Refine, can be part of your scientific workflow to enhance your data and its fitness-for-use. Matt Collins (iDigBio Systems Administrator) gave us a hands-on, step-by-step introduction to the world of relational databases and SQL. All of these skills led up to an interactive introduction to the scripting language R, taught by Francois Michonneau, PhD candidate (Marine Invertebrates) at UF.
Katja Seltmann, Entomologist and Project Manager for the Tri-Trophic Thematic Collection Network (TTD-TCN), provided instruction at the remote location, AMNH. In addition to our 5 instructors, we also had assistants to make sure no one got too lost or waited too long for help. The workshop depends on assistants to run smoothly. Becoming a Data Carpentry instructor requires attending a Data Carpentry workshop and assisting at one. Several of our assistants are in the process of becoming Data Carpentry certified.

AMNH students report they can’t wait to do this again. All at UF and AMNH are clamoring for more R, eager to pick up where we left off on day two, just as Francois got to the good stuff (in R) with his amazing demonstration of the power of all these skills combined. We’re thinking that Data Carpentry courses, normally two days, need a third day.

A bit on Assessment (more on this in a future post)

For assessment, Data Carpentry courses use not only pre- and post-workshop surveys, but also minute cards. Periodically, after a course module, students are asked to write down one thing they learned and one thing they still find confusing. This immediate feedback provides opportunities for mid-course correction, as well as valuable input for future courses. Some examples of minute card comments from our Data Carpentry workshop…
Our post-workshop survey resulted in an overall workshop grade of A- and many comments indicating the desire for more such focused, hands-on training, targeted at beginners and designed with the biodiversity researcher in mind.

What are some lessons learned at this workshop? Our remote participant strategy seems to have worked well to extend the reach of our workshop beyond UF. Keys to making a remote workshop site (AMNH) successful include having an:
What’s Next?
Please let us know your thoughts. What skills do you need? What else do we need to cover? Got an idea for where to host one of these? Thanks for reading and stay tuned for more Data Carpentry!

If you've made it this far, you might be wondering... Just where did Data Carpentry come from? At the COLLAB-IT meeting in September of 2013, one break-out group turned an idea into action and formed Data Carpentry. The IT groups from NESCent, BEACON, iDigBio, NEON, iPlant, SESYNC, DataONE, and NIMBios shared their observations about the data literacy and computational literacy skills needed across the stakeholders in these overlapping communities. The course content needed to address these skills gaps makes up the Data Carpentry curriculum. Following the Software Carpentry model, Data Carpentry seeks to improve the skills researchers need to collect, manage, and analyze data efficiently. We aim to teach skills that lead to reproducible, sustainable scientific workflows, which in turn produce discoverable, re-usable datasets and reproducible analyses.