News and Updates

A summer learning R to clean up data with the iDigBio portal recordset correction feature

posted Oct 15, 2015, 10:31 AM by Katja Seltmann

(this is a reposting from the iDigBio blog, October 2015)

Heather Appleby, former undergrad intern, Tri-Trophic Thematic Collection Network (TTD-TCN). Katja Seltmann (TTD-TCN), Deb Paul, Alex Thompson, and Matt Collins Eds.

Hi everyone, I'm Heather Appleby. I was an undergraduate intern for the Tri-Trophic Thematic Collection Network (TTD-TCN) for 10 weeks in the summer of 2015. My internship focused on learning the fundamentals of R while producing a useful product to help the TTD-TCN review georeferenced data for over 1 million specimen records. I worked toward these goals in collaboration with iDigBio by using the iDigBio portal recordset correction feature, giving iDigBio feedback about its error reports, and creating publicly available R scripts that simplify the view of those error reports.

One of the major objectives of the Tri-Trophic Thematic Collection Network is to make available one million North American Hemiptera specimen records through the iDigBio data portal. At the American Museum of Natural History (AMNH), the principal database used for this effort is the Arthropod Easy Capture (AEC) database (Arthropod Easy Capture, 2014; Schuh, Hewson-Smith, & Ascher, 2010). In total, a group of 50+ paid digitizers and volunteers at 11 institutions used AEC to capture species and collection event information from specimen labels, contributing 902,000 specimens in the AMNH AEC alone, with a project-wide total reaching 1.3 million records (and still growing).

The specimen databasing procedure for the TTD-TCN is fairly standard. It begins with a group of specimen digitizers, people who are trained in handling specimens and recording information from them, tasked with capturing information from insect labels and entering it into the AEC database. The labels denote species, collector name, collection number, date, and location, and sometimes latitude and longitude coordinates. If coordinates are not provided on the collection label, the specimen record is reviewed a second time by a different group of digitizers to be georeferenced. Georeferencing, the process of assigning latitude and longitude coordinates to a collection site locality string, involves searching for the collection location in GeoLocate web software and Google Earth. Our georeferencing method follows the best practices outlined by iDigBio as well as a "how to" guide specifically designed for the AEC database (see links below). Once coordinates are retrieved, they are entered into the AEC database, along with the previously entered label information for each species and locality. Occasionally records are georeferenced incorrectly during the digitization process, typically because of human error at some point along the way. Latitude and longitude coordinates may be read incorrectly, localities may be written poorly on the label and thus typed incorrectly in the database, or the wrong state, county, or municipality may be applied to a locality string.

Completed specimen records are shared with the iDigBio portal through a Darwin Core Archive. iDigBio detects incorrect locality data by comparing the latitude and longitude coordinates against county, state, or country bounds, and proposes corrections to those records. The latest version of the iDigBio portal (released August 2015) comes with the ability to download its analysis of every provided record set, giving data providers direct feedback about the record sets they are submitting. This new feature categorizes bad entries and their corrections based on the detected errors, making database cleanup more organized and straightforward.

iDigBio error flags

  • The lat/lon point is in the ocean, or in a location that is not considered part of an Exclusive Economic Zone, and should be moved inland.
  • The lat/lon does not correspond to the state. Either the coordinates must be re-georeferenced or the state was entered incorrectly.
  • The lat/lon does not correspond to the country. Either the coordinates must be re-georeferenced or the country was entered incorrectly.
  • The latitude and longitude values are switched. The reversed latitude and longitude should be checked in GeoLocate or Google Maps and re-georeferenced.
  • The longitude sign is flipped. The sign should be switched to negative or positive and checked on GeoLocate or Google Maps and re-georeferenced.
  • The latitude sign is flipped. The sign should be switched to negative or positive and checked on GeoLocate or Google Maps and re-georeferenced.
  • The latitude or longitude may have been copied twice, resulting in the same coordinate for both. The locality should be re-georeferenced.

Each type of flag is packaged up in a downloadable folder. Every folder contains a list of the incorrectly georeferenced records (occurrence_raw.csv) and a list of the iDigBio-corrected records (occurrence.csv). All files are in Comma Separated Value (CSV) format. The uncorrected list (occurrence_raw.csv) is useful for finding "bad localities" in our data. Different flags indicate different kinds of errors (see the flag descriptions above). For example, the flag "rev_geocode_eez" indicates the coordinates represent a location in the ocean, not a likely place to collect an insect. Once a bad locality is identified, the incorrect latitude and longitude values are entered into GeoLocate to verify the error. If it is an error, new coordinates are obtained and the AEC database is updated.
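To make the coordinate flags a little more concrete, here is a minimal Python sketch of how a swapped or sign-flipped coordinate pair might be tested. This is not iDigBio's code and not one of the R scripts from this project. The function names, the candidate labels, and the rough bounding box for New York State are all assumptions made up for illustration; a real check would use actual boundary polygons rather than a box.

```python
def candidate_corrections(lat, lon):
    """Return alternative (lat, lon) pairs for a suspect coordinate.

    The labels are hypothetical names for this sketch, not iDigBio flag names.
    """
    candidates = {
        "lat_lon_swapped": (lon, lat),
        "lon_sign_flipped": (lat, -lon),
        "lat_sign_flipped": (-lat, lon),
    }
    # Drop candidates that are not valid coordinates at all.
    return {
        name: (la, lo)
        for name, (la, lo) in candidates.items()
        if -90 <= la <= 90 and -180 <= lo <= 180
    }


def in_bounding_box(lat, lon, box):
    """Check a point against a (south, west, north, east) box."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


# Rough, illustrative box around New York State (an assumption, not a real boundary).
NEW_YORK_BOX = (40.4, -79.8, 45.1, -71.8)

# A made-up record whose longitude was entered without its minus sign.
suspect_lat, suspect_lon = 40.78, 73.97

matches = [
    name
    for name, (la, lo) in candidate_corrections(suspect_lat, suspect_lon).items()
    if in_bounding_box(la, lo, NEW_YORK_BOX)
]
print(matches)
```

Any candidate that lands inside the expected region is only a suggestion. As described above, the locality still needs to be checked in GeoLocate before the database is updated.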

After correcting a few errors, I came upon a number of duplicate localities in the dataset and some correctly georeferenced entries in the bad locality files. iDigBio files contain all of the Darwin Core fields for all specimens associated with the bad localities, which makes extracting only the locality data from these files time consuming. I realized that it would make sense to use R to reformat the iDigBio output into a list of unique localities with only the locality information fields (i.e. dwc:locality, dwc:county, dwc:decimalLatitude, dwc:decimalLongitude). In addition, I filtered out all localities except US, Mexican, and Canadian localities and made a unique list, removing duplicates. Duplicate localities exist in the raw files because iDigBio outputs specimen data, not specifically locality data, and many specimens may be collected at the same locality. Finally, in an attempt to determine the cause of false positives, I organized the file by country and state, as it appeared that these errors were occurring because the centroid was placed in a different county than the one recorded in the Darwin Core county (dwc:county) field.
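The reformatting for this project was done in R. The sketch below shows the same idea in Python using only the standard library, as an illustration rather than a copy of the project scripts. It assumes the CSV header uses column names such as dwc:country and dwc:stateProvince alongside the four fields named above, and that countries are spelled out as shown; both are assumptions, and the sample rows are invented.

```python
import csv
import io

# Locality columns to keep. dwc:country and dwc:stateProvince are assumed names.
LOCALITY_FIELDS = [
    "dwc:country",
    "dwc:stateProvince",
    "dwc:county",
    "dwc:locality",
    "dwc:decimalLatitude",
    "dwc:decimalLongitude",
]
# Assumed spellings of the three countries of interest.
KEEP_COUNTRIES = {"united states", "mexico", "canada"}


def unique_localities(rows):
    """Reduce specimen rows to unique localities, sorted by country and state."""
    seen = set()
    result = []
    for row in rows:
        country = (row.get("dwc:country") or "").strip().lower()
        if country not in KEEP_COUNTRIES:
            continue
        key = tuple((row.get(field) or "").strip() for field in LOCALITY_FIELDS)
        if key in seen:
            continue  # another specimen from the same locality
        seen.add(key)
        result.append(dict(zip(LOCALITY_FIELDS, key)))
    result.sort(key=lambda r: (r["dwc:country"], r["dwc:stateProvince"]))
    return result


# Invented sample standing in for occurrence_raw.csv.
SAMPLE = """dwc:country,dwc:stateProvince,dwc:county,dwc:locality,dwc:decimalLatitude,dwc:decimalLongitude,dwc:catalogNumber
United States,New York,Suffolk,Montauk,41.04,-71.95,A1
United States,New York,Suffolk,Montauk,41.04,-71.95,A2
Mexico,Baja California,,Ensenada,31.87,-116.60,A3
Brazil,Para,,Belem,-1.46,-48.50,A4
"""

localities = unique_localities(csv.DictReader(io.StringIO(SAMPLE)))
for loc in localities:
    print(loc["dwc:country"], loc["dwc:stateProvince"], loc["dwc:locality"])
```

For a real download, the `io.StringIO(SAMPLE)` stand-in would be replaced by an opened occurrence_raw.csv file.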

Another objective of this project was to provide feedback on the new recordset evaluation feature in iDigBio. While accessing the iDigBio correction records and working with the output to fix our incorrectly georeferenced data, I reported several observations to the iDigBio Advanced Computing and Information Systems (ACIS) team. These observations were used to help refine the iDigBio correction output, and through our correspondence we were able to help identify two bugs in the flagged files. Initially, many of the localities showing up under the reverse longitude sign flag appeared to be false positives. When checked in GeoLocate, the coordinates turned out to be very close to a county or state border. The ACIS team immediately recognized an issue with the designation of boundaries and fixed the problem. We came across another issue while cleaning up the files in the “dwc_stateprovince_replaced” flagged list. All of the georeferenced localities labeled “Baja California Norte” in our database were corrected to “California” in the iDigBio Portal. While “Baja California Norte” is an incorrect state name for the region, we learned that the correction should have been made to “Baja California”.

Note from Katja Seltmann: As a result of Heather’s ten-week summer internship, the TTD-TCN project provided useful feedback that helped to guide development of this important community resource. Her introduction to R quickly proved helpful in her current position at the School of Visual Arts NATLab, where she is collecting NYC soil samples and analyzing them for nutrient content.

Note from Deb Paul: The topic of biodiversity Data Quality (DQ) cannot be over-emphasized. How can publishing data with an aggregator like iDigBio promote data quality? The iDigBio Data Management Interest Group (DMI) and Cyberinfrastructure Working Group (CYWG) collaborated with the TTD-TCN to highlight DQ from the provider and aggregator points of view. Here, Heather shows us how a provider (TTD-TCN) uses the iDigBio DQ flags. Coming up in part two of this series, the CYWG explains just how these DQ flags are generated (Webinar: Improving Data Quality: iDigBio Recordset data cleaning method, tools, and data flags, October 23rd, 2015, 2 PM EST).

Important Links:

Project GitHub:

School of Visual Arts NATLab:

TCN Summary:

iDigBio Portal:


iDigBio georeferencing practices:

Georeferencing how to:


Arthropod Easy Capture. (2014). Arthropod Easy Capture. Retrieved October 14, 2014, from

Schuh, R. T., Hewson-Smith, S., & Ascher, J. S. (2010). Specimen databases: A case study in entomology using Web-based software. American Entomologist, 56, 206–216.

Digitizing the U-M Herbarium collections: tri-trophic update

posted Jul 24, 2015, 7:56 AM by Katja Seltmann   [ updated Jul 24, 2015, 7:58 AM ]

This is a reposting of an article by Richard Rabeler for the University of Michigan, Department of Ecology and Evolutionary Biology news.
Jul 21, 2015

Richard Rabeler, associate research scientist, U-M Herbarium, verifying the determination of a specimen. Image credit: Dale Austin.

The University of Michigan Herbarium has been awarded seven National Science Foundation grants over the past four years. Six of the grants involve Thematic Collections Networks (TCN), which are collaborative projects administered by the Advancing Digitization of Biodiversity Collections (ADBC) project.  

Each TCN is a network of institutions with a strategy for digitizing information that addresses a particular research theme, according to iDigBio. Once digitized, data are easily accessed and available for other research and educational use. The nationwide effort is coordinated by the iDigBio program based at the University of Florida.

Since the first TCN project at the Herbarium (Tri-Trophic TCN) began in January 2012, over 475,000 specimens from the collection have been imaged as part of these projects.  Most of the images, either of the specimen labels or of the specimens themselves, are available online. Another aspect involves digitizing the data about the individual specimens and georeferencing localities.

This is the first in a series of updates on the digitization projects ongoing at the Herbarium. Watch for more in the coming weeks.

Tri-trophic update

The first TCN project at the University of Michigan Herbarium was “Plants, Herbivores, and Parasitoids: A Model System for the study of Tri-trophic Associations”, or, abbreviated, the Tri-trophic TCN. Dr. Richard Rabeler, associate research scientist at the Herbarium, is the principal investigator.

The project goal is to digitize specimen records for 20 families of vascular plants, the Hemiptera (plant bugs) that eat them, and the parasitoid wasps that feed on the Hemiptera; hence the name “tri-trophic.” There are 34 institutions involved, including 15 herbaria and 19 insect collections. The lead institution is the American Museum of Natural History.

Our contribution to the project involved imaging all of our North American and Mexican specimens of Cyperaceae (sedges) and Poaceae (grasses); approximately 116,000 specimens have been imaged.  Data records have been completed where none existed before using a combination of optical character recognition and transcription. Records will be georeferenced prior to the project’s completion in January 2016.

Field to Database (F2DB): field-data collecting trends and 21st century data skills

posted Apr 28, 2015, 1:02 PM by Katja Seltmann

From Deb Paul, @idbdeb (a reposting of the iDigBio blog post)

This 4-day hands-on short course in March investigated current trends in collecting, and focused on best practices and skills development for supporting the collection and sharing of robust, fit-for-research-use data.

What can we do to facilitate stakeholders’ access to quality data? High quality data is generated when data collection is planned before anything gets collected in the field. Fixing data errors “after the fact” is expensive, and gets more expensive the further away we get from the original specimen collecting event. Starting with richer and more standardized data should also mean faster access to the data, for everyone.


This Field to Database (F2DB) course was our third in a series of four biodiversity informatics workshops*, each focusing on different stakeholders’ needs and relevant collections data and computational literacy skills. On our first day and a half, 22 participants heard from several different collectors about their collecting and data management practices and then headed for the field to put them into practice. After this, we spent three days learning more about how to use R for data cleaning, and for data research and visualization. All the course materials, links to necessary software, and workshop recordings are available on the wiki. What follows is an overview of our four days.

F2DB Photos on Facebook.


In the classroom, Charlotte Germain-Aubrey (Botanist, iDigBio PostDoc) and Katja Seltmann (Entomologist, TTD-TCN Project Manager) presented Why a Field-to-Database Biodiversity Informatics Workshop?. They kick-started our specimen data conversation with examples of challenges researchers face when compiling data from museum legacy records from many collections. These include (summarized from slides):

  • standardizing datasets
  • the need to georeference the material
  • transforming lat / lon values to a standard format
  • uncertainty data about any given georeference, often missing
  • assumptions having to be made about some dates due to ambiguous formats
  • taxon name resolution / reconciliation needed to merge datasets
  • learning to manage the resulting very large datasets – very large files

After dealing with these issues, only then is this legacy data fit-for-use. Charlotte showed one example of how plant collections data are being used to model the impact of climate change and hinted at some future research plans to further investigate what is likely to happen to Florida plants when considering species clusters, movement analysis, and sea-level rise change. Katja and Charlotte showed us both the challenges and potential of collections data.
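One of the challenges listed above, transforming lat / lon values to a standard format, lends itself to a short example. The course itself used R; this is a minimal Python sketch that converts a degrees-minutes-seconds string to decimal degrees. The accepted input format (degree sign, straight quote marks, trailing hemisphere letter) and the function name are assumptions for illustration, and legacy labels will contain many variants this pattern does not handle.

```python
import re

# Matches strings like 40° 46' 48" N; minutes and seconds are optional.
DMS_PATTERN = re.compile(
    r"^\s*(\d{1,3})\s*°\s*"             # degrees
    r"(?:(\d{1,2}(?:\.\d+)?)\s*'\s*)?"  # optional minutes
    r'(?:(\d{1,2}(?:\.\d+)?)\s*"\s*)?'  # optional seconds
    r"([NSEWnsew])\s*$"                 # hemisphere letter
)


def dms_to_decimal(text):
    """Convert a degrees-minutes-seconds string to signed decimal degrees."""
    match = DMS_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate format: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = float(degrees) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    # Southern and western hemispheres are negative in decimal degrees.
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return value


print(round(dms_to_decimal("40° 46' 48\" N"), 4))
print(round(dms_to_decimal("73° 58' 12\" W"), 4))
```

Raising an error on anything unrecognized, rather than guessing, keeps ambiguous values visible so a person can review them.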

Emilio Bruna, Ecology Professor at the University of Florida, shared insights into the realities of field work with Let's go to the field! Where the best places are wet, isolated, and without internet. A story of the trials of typical fieldwork. (Hear his talk in this recording). Next up, we heard from Andrew Short (Entomologist, University of Kansas Biodiversity Institute) with Tips and Workflows for Managing Field Data: Field templates, workflow, and planning ahead for better results. And then Grant Godden (Botanist, Rancho Santa Ana Botanic Garden Post Doc) gave us his take on Using Digital Resources to Plan Field Expeditions, offering hints on How to prioritize where you collect? How do you plan a collecting trip? and What kind of resources do you bring in the field? Just back from a recent field trip to Columbia, he also talked about Standards for Collection of Genomic Resources and documenting flower color.

In this recording, you can listen to Mike Webster, Ornithologist at Cornell, talking about Data and metadata standards for biodiversity media: the past, present and future and Emilio Bruna talking about the Top 10 mobile applications every biologist should know about. Are you using apps in the field? Which ones? What apps do you need that don’t yet exist? How have they facilitated your research efforts?

After all these lectures, we moved to the Natural Area Teaching Laboratory for lunch and some field work. Deb Paul (that’s me) gave a quick introduction to some relevant Data Standards to use when collecting / using field data, such as Ecological Metadata Language (EML), Darwin Core (DC), Audubon Core (AC), and the new Global Genome Biodiversity Network (GGBN) genomics data standard.

Then it was time for some hands-on collecting and animal sound recording experiences. Andy and Grant set up two collecting experiences to illustrate the need for prior planning. We learned about the challenges of keeping track of specimen identifiers, how to be sure we know which insect was found on which plant when we get back to the lab, that we need to be careful when using abbreviations, and that writing a good locality description is vital (a georeference is not enough). (See Andy's Sample Field Data Collection sheet and sample field labels). Andy, we look forward to hearing about your upcoming field course at the University of Kansas. Let us know how it goes.

Using a shotgun microphone and a recorder with headset, Mike gave us some hands-on experience capturing the sounds in nature. Have you done this? It’s amazing, and quite challenging to then capture the particular specimen one has been listening to. When trying to use some of the field apps, we also noticed a lot of variability in the georeferences our GPS phone apps returned. What’s your experience? Do you have a favorite GPS app? Have you compared it to a GPS unit?

Upon return to the iDigBio classroom space, we discovered what it’s like to plan for and collect paleontological specimens from Justin Wood's presentation and video. And for marine invertebrates, Francois Michonneau (Zoologist and iDigBio Post Doc) illustrated issues with collecting data and specimens in a marine setting. I think everyone wanted to study marine invertebrates after we saw Francois’ video and heard his talk Efficient workflow from collection to cataloging for marine invertebrates.

Common notions emerged from the lectures, field experiences, and videos, about planning for field data collection and subsequent data research and data management. We included coverage of Symbiota, Specify, Biocode’s Field Information Management System (FIMS), Arthropod Easy Capture (AEC), Silver Biology, and Arctos. Our summary group discussion helped to reveal themes such as:

  • The use of standards such as Darwin Core and Audubon Media to support reproducible research
  • Data Validation – the importance of planning for and creating tidy, standardized data
  • Specimen Identifiers – we need to use them, store and share them
  • Online resources – available to enhance the data, using one’s data skills
  • Publishing – getting the data out there is important
  • Planning ahead - for what data to collect, and how to collect and document it

See custom videos made by remote community participants just for this workshop. Thank you Ed Gilbert (Symbiota), Andy Bentley (Specify), John Deck (FIMS), Amy Smith (KML files), Katja Seltmann (AEC), and Shelley James (Bishop Museum). Using remote participation and their recordings, we were able to cover even more software, methods, tools, and ideas for capturing specimen data than would otherwise have fit in 4 days.

After covering why it’s important to plan ahead for what data to collect, and how we might do that, we switched to hands-on skills that can make collecting, standardizing, and sharing data easier. These skills support best practices for reproducible research over a lifetime. Whether you’re a collection manager or a collector, a botanist or a zoologist, these skills can serve to make your data easier to collect, to keep track of, to query for your research questions, to disseminate, to discover, and to cite! Most of our participants were collectors; a few were collection managers who also collect or work closely with collectors.

So, from day two through day four, our course emphasis shifted to how to use the scripting language R (and RStudio) for data cleaning, standardization, enhancement, and visualization. Francois enticed us with Intro to R. Derek Masaki (Developer, USGS-BISON) gave us a rationale and a workflow using R that supports reproducible research (see participant Rick Levy’s blog post). We needed to learn how to clean, standardize, and transform our data, so Derek put together a hands-on R tutorial using a Bee dataset (from the Smithsonian). Now that we had learned a bit about R, R vectors, dataframes, and functions, we were ready on day 4 to learn about Application Programming Interfaces, affectionately known as APIs. Thanks Matt Collins (iDigBio Systems Administrator) for a fun, interactive introduction to the power of APIs and Using APIs in R.

We had a little extra time, and Francois jumped in to give a brief overview of two topics we don’t usually have time for in beginner courses: GitHub (versioning) and Rmarkdown. See course participant Rick Levy’s blog post to learn more!

To complete the data life-cycle picture, Molly Phillips (iDigBio Information Specialist) stepped in to give us an overview of how collection data gets to iDigBio in her talk Getting your data out there: publishing & standards with iDigBio and Todd Vision from Data Dryad joined us remotely with an in-depth talk about Publishing data on Dryad.

What is compelling from every part of this workshop is that with these 21st century skills, a scientist can do more research, faster, and in a manner that supports reproducibility and collaboration. Scientists recognize they need these skills and are asking for them.

Field To Database Workshop

posted Feb 26, 2015, 2:48 AM by Katja Seltmann

The Field to Database workshop is almost here! It is being held at iDigBio, University of Florida, from March 9 - 12, 2015, and is the third in a series of four biodiversity informatics workshops planned in collaboration with the Tri-Trophic Thematic Collection Network for iDigBio in the upcoming year (2014-2015). The fourth workshop in this series is Sept 15-16, 2015 and focuses on Data Management for Collection Managers. Look to the Field to Database workshop wiki for more information and available online lectures.

Data Carpentry Workshop: an iDigBio and TTD-TCN Collaboration

posted Nov 11, 2014, 11:31 AM by Katja Seltmann

(This is a reposting from the iDigBio Blog, November 2014)

Data Carpentry - Please can we have some more?!

iDigBio and the American Museum of Natural History (AMNH) co-hosted a Data Carpentry Workshop on Monday and Tuesday, September 29 – 30, 2014.

What skills do researchers in the life sciences need to be equipped with today to address current issues facing our planet? How can they make best use of all the data available to them, now, and in the future?

To start off our Data Carpentry Workshop, University of Florida (UF) Botany Professor and iDigBio PI, Pam Soltis, shared her vision and historical perspective on the skills researchers need to make best use of data, now and going forward. From her own thorough grounding in statistical methods, Pam highlighted how changes in science, and data, necessitate the researcher’s need for new skills in her talk: Linking Heterogeneous Data in Biodiversity Studies: the need for data carpentry.

For two intensive, information-filled days of hands-on learning designed for beginners, 31 students tackled improving their spreadsheet skills, learned about the power of Open Refine to clean data and reveal data patterns via facets and clustering algorithms, discovered the power of the shell, found out just how simple it can be to get a dataset from a spreadsheet into a database to make use of structured query language (SQL), and got an introduction to R for data analysis and visualization.
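For readers curious about the spreadsheet-to-database step, here is a minimal sketch of the idea in Python with the built-in sqlite3 module. It is not the workshop's lesson material; the table name, column names, and sample rows are invented for illustration.

```python
import csv
import io
import sqlite3

# Invented data standing in for a spreadsheet exported as CSV.
SAMPLE_CSV = """family,state,specimens
Miridae,New York,120
Miridae,Florida,80
Poaceae,Michigan,200
"""


def load_csv_into_sqlite(csv_text):
    """Load CSV text into an in-memory SQLite table named counts."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE counts (family TEXT, state TEXT, specimens INTEGER)"
    )
    rows = [
        (row["family"], row["state"], int(row["specimens"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    connection.executemany("INSERT INTO counts VALUES (?, ?, ?)", rows)
    connection.commit()
    return connection


connection = load_csv_into_sqlite(SAMPLE_CSV)

# Once the data are in a table, questions become one-line SQL queries.
totals = connection.execute(
    "SELECT family, SUM(specimens) FROM counts GROUP BY family ORDER BY family"
).fetchall()
print(totals)
```

Exporting the spreadsheet as CSV first, as one of the minute cards below suggests, is what makes this kind of import straightforward.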

Broadening Participation.

Graduate students made up 60% of the participants, the other 40% were university faculty and staff. Nine students participated via Adobe Connect from the AMNH, including students from the City College of New York (CUNY), AMNH - Columbia University, and Hunter College. Three Information Science students from Florida State University (FSU) joined the UF students, faculty, and staff to make 31 participants total. Across diverse fields, there is a demand for beginner-level courses introducing researchers to up-to-date computational literacy, data literacy, and data management skills. Disciplines of participants ranged across Physics, Earth Sciences, Ecology, Zoology, Epidemiology, Botany, Genetics, Engineering, Social Science, Humanities, Tech Support, Public Health, and Information Science.

The Workshop Experience.

All available workshop slots at UF and AMNH filled in just 3 days, with four people left on the wait-list at UF. With a student-teacher ratio of 3:1, everyone found someone nearby, ready and willing to assist, if they ran into tricky bits.

The iDigBio Data Carpentry Workshop Wiki reveals all materials used and topics covered, and includes recordings, notes taken, links to the datasets and materials on GitHub, the participant list, and more. Using Adobe Connect (AC) software and Kevin Love’s know-how, UF and AMNH students met each other virtually to learn together and share problem-solving strategies. We took notes together using a MoPad, with help from our remote assistant from USGS-BISON, Derek Masaki. Thanks Derek! Scenes from the workshop are up on the iDigBio Facebook pages.

Tracy K Teal, Professor at Michigan State University (MSU) in Microbiology and Molecular Genetics, walked us through better spreadsheet skills and the power of the shell. Deb Paul (that’s me) highlighted the importance of quality data and showed how one tool, Open Refine, can be part of your scientific workflow to enhance your data and its fitness-for-use. Matt Collins (iDigBio Systems Administrator) provided a hands-on, step-by-step introduction to the world of relational databases and SQL. All of these skills led up to an interactive introduction to the scripting language R, taught by Francois Michonneau, PhD candidate (Marine Invertebrates) at UF. Katja Seltmann, Entomologist and Project Manager for the Tri-Trophic Thematic Collection Network (TTD-TCN), provided instruction in the remote location, AMNH. In addition to our 5 instructors, we also had assistants to make sure no one got too lost or waited too long for help. The workshop depends on assistants to run smoothly. Part of the process of becoming a Data Carpentry instructor requires attending a Data Carpentry workshop and assisting at one. Several of our assistants are in the process of becoming Data Carpentry certified.

AMNH students report they can’t wait to do this again. All at UF and AMNH are clamoring for more R, eager to pick up where we left off on day two, just as Francois got to the good stuff (in R) with his amazing demonstration of the power of all these skills combined. We’re thinking that Data Carpentry courses, normally two days, need a third day.

A bit on Assessment (more on this in a future post).

For assessment, Data Carpentry courses use not only pre and post workshop surveys, but also minute cards. Periodically, after a course module, students are asked to write down one thing they learned, and one thing they still find confusing. This immediate feedback provides mid-course correction opportunities, as well as valuable input for next courses. Some examples of minute card comments from our Data Carpentry workshop…

Something I learned:

  • Be careful with naming files, don’t use spaces
  • Export spreadsheet data as CSV, or perhaps TSV
  • Basic R syntax
  • Never understood cbind() before [now I do]

Something I still find confusing:

  • I have my own versioning schema. Are there standards for versioning?
  • I’m still a bit confused about when to use () and [] in the same line
  • What are the benefits of using R as opposed to SPSS? <excluding cost>
  • Still confused on some terminology – objects vs. variables? Vectors vs. factors?

Our post-workshop survey resulted in an overall workshop grade of A- and many comments indicating the desire for more such focused, hands-on training, targeted at beginners – and designed with the biodiversity researcher in mind. What are some lessons learned at this workshop? Our remote participant strategy seems to have worked well to extend the reach of our workshop beyond UF. Keys to making a remote workshop site (AMNH) successful include having an:

  1. on-site instructor in the remote location who is familiar with all the course materials and the skills being taught
    1. in the event the connection is lost, the remote instructor can carry on with the lessons
  2. instructor, or other individual in the remote location who can troubleshoot the audio / video issues that arise.

What’s Next?

  • Would you like to request a Data Carpentry Workshop? Please send an email to [email protected]
  • Are you interested in becoming a Data Carpentry Instructor? We use the Software Carpentry training course to certify our instructors. Our goal is to cease to be needed because all scientists have the skills they need to manipulate their data. Until then, if you’ve got skills, want to enhance your skills, and the skill set of your colleagues in the biological and paleontological sciences community, please join us.
  • Discussions are just beginning for another Data Carpentry Workshop to be held at FSU in the Spring of 2015 with a remote location to be decided.
  • Note the broader community, across the planet, is converging on ways to define the skills that are needed and the best way to meet the demand for these skills. This includes conversations about how to get these skills into undergraduate and K-12 education so that incoming graduate students have them at the start of their advanced degree programs. For examples of this international convergence, see the upcoming Biodiversity Information Standards (TDWG) 2014 Interest Group / Task Group Meeting: Biodiversity Informatics Curriculum / Teaching and Workshop: Effective Biodiversity Data Management Training descriptions!

Please let us know your thoughts. What skills do you need? What else do we need to cover? Got an idea for where to host one of these?

Thanks for reading and stay tuned for more Data Carpentry!

If you've made it this far, you might be wondering...

Just where did Data Carpentry come from?

From the COLLAB-IT meeting in September of 2013, one break-out group coalesced an idea into action to form Data Carpentry. The IT groups from NESCent, BEACON, iDigBio, NEON, iPlant, SESYNC, DataONE, and NIMBioS shared their observations about data literacy and computational literacy skills needs across the stakeholders in these overlapping communities. The course content needed to address these skills gaps makes up the Data Carpentry curriculum.

Following the Software Carpentry model, Data Carpentry seeks to improve the skills researchers need to collect, manage, and analyze data efficiently. We aim to teach skills that lead to reproducible, sustainable scientific workflows and, in turn, to discoverable, re-useable datasets and reproducible analysis.

iDigBio Collections for the 21st Century Symposium

posted May 5, 2014, 6:53 AM by Katja Seltmann   [ updated May 5, 2014, 6:54 AM ]

This is a reposting of the iDigBio website. You can attend remotely! 

On May 5-6, 2014, iDigBio, in conjunction with the NSC Alliance, will present a symposium themed 'Collections for the 21st Century'. The symposium will emphasize the value of collections data in meeting challenges facing biodiversity and human societies. Digitization of Bio-Specimens has brought a tremendous amount of data on-line for new and exciting uses in research and education. But we as scientists need to take the initiative and demonstrate ways in which the data are being used now, so that policy makers and administrators will provide ongoing support. Digitized data are valuable only if they are widely known to be useful.

The symposium will demonstrate the value of biodiversity, and our natural history collections, to policy makers, administrators and others who use collections data and impact the levels of support for collections. The symposium will feature a full day of talks on May 5 and a half-day of talks on May 6. A workshop or other activities yet to be determined will be held on the afternoon of May 6. Topics of discussion will include uses of taxonomic, spatial, and temporal data on biodiversity to address big-science questions related to human health, climate change, food security, and related issues, as well as more fundamental investigations related to understanding and protecting biodiversity. We will keep those who register informed of our plans as they develop. Attendance will be limited to 80 persons.

Registration for the symposium is free, but all travel-related expenses (e.g., airfare, hotel, meals, ground transport) are the responsibility of each participant.  Information on accommodations at a discounted rate will be provided once you register.

Register for this Symposium

Workshop Wiki:


Remote participation will be available via Adobe Connect:


Start Date: Monday, May 5, 2014 (All day) to Tuesday, May 6, 2014 (All day)
University of Florida

Female Entomologist: Patricia Vaurie (1909 - 1982)

posted Apr 3, 2014, 11:17 AM by Becky Fisher

Patricia Vaurie and her husband were both enthusiastic about natural history. Patricia had an extremely successful career studying beetles while her husband pursued his artistic interest in North American birds. Their trips together provided the AMNH Entomology collection with a breadth of data that is still productive and informative to the field as we continue to digitize her plant bug specimens.

Patricia Wilson was born on September 14, 1909 in Swarthmore, Pennsylvania. While she was still young, her family moved to New York City. By 1920, they lived only a block away from the American Museum of Natural History. Patricia attended high school and later Barnard College, Columbia University. She graduated in 1931 with a degree in English literature.

During World War II she started volunteering as a technical assistant in the Department of Insects and Spiders (now the Department of Invertebrate Zoology). Around this time she met her husband Charles Vaurie. He was a dentist in New York with an avid interest in painting North American birds. Although Patricia focused on the study of beetles, the two of them appreciated their mutual interests in natural history. They were married in 1934.

By 1947 she achieved the title of Assistant and by 1957 became a Research Associate. Patricia published 77 revisionary studies of beetles throughout the course of her work. She received a total of four grants from the National Science Foundation to study Diplotaxis (Scarabaeidae) and Metamasius (Curculionidae). According to her glowing obituary, her colleagues held her in high regard for her meticulous and usable work.

The TTD-TCN project digitizers at the American Museum of Natural History are still benefitting from her enthusiasm for insects and detail. Although Patricia specialized in beetles, she collected a variety of other insects while on trips for the Natural History Museum with her husband. Interestingly, the Specimen Database tells us that most of their work trips took place during the months of July and August in pleasant locations such as the Bahamas, Cuba, New Mexico, Guatemala, the Ruins at Palenque (dark blue peg), and Flagstaff, Arizona. This sounds like an extremely convenient way to skip out on New York City summers. Moreover, these trips were often funded as expeditions. The map to the right represents all of the localities where the Vauries collected plant bug specimens during the D. Rockefeller Mexico Expedition of 1953 (green pegs).

We have been digitizing many species that were collected by her but were later determined by other AMNH entomologists. This tells us that she collected for the greater good of the field even though beetles were clearly her specialty. The current, massive plant bug collection at AMNH does not solely exist because of previous plant bug enthusiasts. Although such specialists have left a huge impact, the enormity of the project is just as dependent on the previous work of enthusiastic entomologists in general.

Although some of these plant bugs were collected without a clear objective, the usefulness of these insects has only increased over time. Take the image of the Josephinus reinhardi (light blue star), for example. It was collected in 1952 by Patricia Vaurie and it sat until 2005, when M. D. Schwartz determined the specimen. Now, in 2014, it is available for digitization and has become a small piece of data in a growing database.



References and Suggested Further Reading:

Herman, Lee H. "Patricia Vaurie: 1909-1982." The Coleopterists Bulletin 36.2 (1982): 453-57. JSTOR. Web. 01 Apr. 2014.

Ratcliffe, Brett. "PATRICIA VAURIE." PATRICIA VAURIE. University of Nebraska-Lincoln State Museum - Division of Entomology, 01 Jan. 1988. Web. 01 Apr. 2014.

Short, Lester L. "In Memoriam: Charles Vaurie." The Auk 93.3 (1976): 620-25. JSTOR. Web. 01 Apr. 2014.

Maps built using and the Tri-Trophic Specimen Database.

Photo of Patricia Vaurie from her obituary in The Coleopterists Bulletin


Article by Becky Fisher: TTTCN Intern and Masters candidate at Columbia University in Museum Anthropology.

Female Entomologist: Grace Olive Wiley (1883 - 1948)

posted Mar 13, 2014, 11:38 AM by Becky Fisher   [ updated Apr 3, 2014, 9:00 AM ]

Grace Olive Wiley is most widely known for her illustrious career as a fearless and controversial snake collector. She frightened and informed her audiences by demonstrating nurturing relationships with her snakes. Snakes and insects have been associated with negative images of hypnotizing demons or filthy infestations. Her closeness with snakes captures attention, to say the least, but I think it also overshadowed her contributions to the field of entomology. This article is an attempt to shed some light on her life, her specimens in the Tri-Trophic TCN project and the field of entomology before she immersed herself in a world of herpetology exhibitionism.

Grace Olive Wiley was born in 1883 and raised on a farm in Chanute, Kansas. She attended the University of Kansas and earned a bachelor's degree in Entomology. She worked as an entomologist at the University at a time when women struggled to gain acceptance in scientific fields. The University of Kansas was unique. By 1867 they had already appointed their first (as well as one of the country's first) female professor, Cynthia A. Smith. In 1922, the Kansas University Science Bulletin put out Wiley's Life History Notes on Two Species of Saldidae (Hemiptera) Found in Kansas. Her enthusiasm in the field was also recognized by her male academic advisors in Kansas. Her professor H. B. Hungerford wrote in The Life History of The Toad Bug, "The live insects supplied by Mrs. Wiley [from her home in Chanute] thus made possible the notes here reported, and I wish to acknowledge my gratitude to her for her kindness".

By 1923, Wiley became the curator of the Minneapolis Public Library’s natural history museum. Although it is now defunct, this position made her one of the first female zoo curators in the world. This was also the year she announced her discovery of a new species of Rheumatobates from Texas in The Canadian Entomologist. She donated her private collection of reptiles which included 150 species and 330 individuals to the zoo and her reputation as a reptile expert took off. She quickly became the first person (man or woman) to successfully breed rattlesnakes in captivity. 

She believed that deadly snakes could be tamed and she refused to use hooks or other safety devices to handle them. Instead, she would gently stroke them and speak to them (snakes are deaf). Her unorthodox methods caused friction with the Zoo’s administrators. They demanded that she stop handling the snakes even though they were her own collection. Although she was never bitten at the Zoo, she was given a choice to either use safety equipment or leave. Wiley left, took her snakes with her and started a new job at the Brookfield Zoo outside of Chicago. This new zoo wanted to display reptiles in a more natural setting. Their displays replicated the snakes' natural habitats and were big enough to hold multiple snakes at once. Before then, it was customary to keep reptiles in separate metal cages without any stimulation. Yet again, Wiley’s habit of leaving the reptiles’ cases open caused problems between herself and the Director. She was fired after 19 venomous snakes escaped.

Again, she packed up and moved. This time to Long Beach, California where she established a roadside zoo relatively close to L. A. Her snakes were featured in a few of the sensational movies of the time such as The Jungle Book, Trade Wind and Cobra Woman. During filming she was always on set and appeared onscreen as a snake charmer in the 1940 film Moon Over Burma. She charged 25 cents to join her, wander her roadside property and handle the snakes. She moved twice because neighbors complained. She had been bitten many times and lost two fingers to her Komodo Dragon.

On July 20, 1948 the renowned freelance journalist Daniel Mannix was visiting her zoo to finish an interview and take some photos. While posing with one of her new Indian Cobras, it bit her on her middle finger. Cobras have short fangs and need to chew on their prey to transfer their venom. Unfortunately, the cobra was able to chew on her finger for 30 seconds before Wiley was able to remove it. She was 64 years old. She calmly put the snake back in its cage and told the journalist to get her snakebite kit. Sadly, the kit was about 20 years old, the syringes were corroded and the serums were broken or evaporated. Wiley fell into a coma, was placed in an ambulance and died 65 minutes later at Long Beach Municipal Hospital. The hospital only carried anti-venom serums for North American snakes.

Before her unexpected death, Wiley had planned to sell her reptile collection to the Griffith Park Zoo. However, her estate was not able to find a buyer. As a result, her exotic collection was auctioned off bit by bit to the highest bidders. Overall, it was worth $3,000. The Indian Cobra that fatally bit Wiley was purchased by a man who displayed it as the “Lady-Killing Cobra” at a tourist spot in Arizona.

Wiley’s career as an entomologist was relatively short but productive. So much of it can be overpowered by her mystical, dangerous and high-profile herpetology career. Yet, we are still reaping the benefits of her work in entomology today. The green pegs on the map represent the various locations from which plant bugs were collected by Grace Olive Wiley. These specimens are located in the AMNH collection, the United States National Museum of Natural History, the University of Massachusetts Museum, the Oregon State Arthropod Collection, the University of Minnesota at St. Paul and the University of Kansas.

References and Suggested Further Reading: 

Maps built using and the Specimen Database

Photo of G. O. Wiley from 

Article by Becky Fisher: TTTCN Intern and Masters candidate at Columbia University in Museum Anthropology.

Female Entomologist: Edith Marion Patch (1876 – 1954)

posted Feb 20, 2014, 1:33 PM by Becky Fisher   [ updated Mar 27, 2014, 10:38 AM ]

While digitizing specimens in the collection, we gloss over thousands of names of collectors worldwide. Although the main intention is to map and study the lives of the insects, we have wondered if we were also mapping the lives of the collectors. This series is an opportunity to use the digitized collection to map the lives of women who have contributed to the American Museum of Natural History collection and the Tri-Trophic TCN project. Who were they? What are their stories?

Like many entomologists at the time, Edith Marion Patch’s first recorded interest was butterflies. In her senior year of high school she wrote an essay about monarchs that won $25.00. With her prize money she purchased the Manual for the Study of Insects written by John Henry Comstock and illustrated by his wife, Anna Comstock. The Comstocks were entomologists at Cornell University whom Patch would later befriend. 

Edith Patch attended the University of Minnesota in 1897 and graduated in 1901 with a Bachelor of Science. Despite her qualifications, she couldn’t find a job in entomology so she took a position teaching English at a high school in Minnesota for two years. Finally, in 1903, she was invited by Dr. Charles D. Woods to organize a Department of Entomology in Orono, Maine. Today UCBs Department contains digitized plant bugs collected by E.M. Patch in Orono, Maine. Three specimens of Cryptomyzus (Cryptomonyzis) ribis and five of Eriosoma ulmi are currently in the database.

E. M. Patch, 1916

Initially, she wasn’t offered a salary and Dr. Woods was “ridiculed for appointing a woman in a man’s field.” To earn a living wage, he arranged for Patch to teach English in the area while she organized the Entomology department. Within a year, Patch had proven herself to her male coworkers, established the department and earned herself a salaried position.

For her master’s degree she attended the University of Maine in 1910. Although a few websites say that Patch earned her PhD from Columbia University, she actually attended Cornell University in 1911 for her doctorate. At Cornell she became colleagues with the Comstocks.

In 1930, Patch became the first female president elected to the Entomological Society of America. She was ahead of her time in the early 1900s. It is said that she warned against the indiscriminate use of pesticides, such as DDT, forty years before Rachel Carson’s Silent Spring (1962) was published. She was concerned about the devastating impact pesticides would have on songbirds, amongst other dangers. She was one of the original environmentalists and advocated for education about the natural world, especially for children. Despite her busy career in entomology, she also published books for children starring accurate insect characters. She retired to her home in Orono, “Braeside,” in 1937 as “Entomologist Emeritus” and lived there until she passed away in 1954.

Check out her children’s literature: Hexapod Stories, Bird Stories, Dame Bug and her Babies and Elm Leaf Curl and Wooly Apple Aphid

References & Suggested Further Reading:

Article by Becky Fisher: TTTCN Intern and Masters candidate at Columbia University in Museum Anthropology.

Crowd Sourcing The New York Botanical Garden Herbarium Plant Database

posted Sep 12, 2013, 10:51 AM by Katja Seltmann   [ updated May 5, 2014, 6:52 AM ]

This is a reposting of the NYBG Plant Talk

Michael Bevans is the Information Manager for Digitization at The New York Botanical Garden Herbarium.

The William and Lynda Steere Herbarium at The New York Botanical Garden houses over 7 million plant specimens gathered from around the world over the course of 250 years.

Plants supply most of the world’s food, fuel, shelter and medicine, and plant specimens help us answer the most critical questions facing our planet. How many species are there and how are they related? What environmental factors control their growth? And how do plants respond to climate change? Now you can help scientists to better understand our planet by transcribing plant specimen labels in our newly released crowd sourcing effort, hosted by the Atlas of Living Australia.

Choose an expedition studying the Oaks of North America or the Plants of the Caribbean. Click the “Start Transcribing” button to log in and get started.

To learn more about the William and Lynda Steere Herbarium, make sure to watch “Treasures of New York: The New York Botanical Garden” this Thursday, June 27th at 10:30pm on Channel Thirteen.
