Tri-Trophic Database Project: Data Cleaning and Data Dissemination Utilizing Discover Life
iDigBio Summit III, November 18-20, 2013
We held a demo at the iDigBio Summit III meeting about how the TTD-TCN utilizes Discover Life services. Below is the documentation of that demo.
Discover Life (http://www.discoverlife.org) is a data portal whose mission is to assemble and share knowledge about biodiversity. The project is located at the University of Georgia, under the direction of Dr. John Pickering. Fundamentally, Discover Life is a data aggregator, with information from more than 108 institutions and the Global Biodiversity Information Facility (GBIF). Discover Life uses this collective data to create 1,268,125 species pages with 624,476 maps. Additionally, DL exports maps to the Encyclopedia of Life, exports records to GBIF, has assimilated over 1.2 million valid taxon names, and holds geographic boundary limits for many of the world's divisions.
Who can participate?
Anyone can participate right now. To sign up for help with data cleaning, email John Pickering: email@example.com
Data Cleaning Services with Discover Life
Discover Life leverages the large amount of collective specimen based information available in its databases for data cleaning efforts. Project datasets are compared with data in Discover Life, analyzing each for differences and possible errors. Two particular Discover Life services heavily utilized by the TTD-TCN are: 1) Locality Data Checking and 2) Taxon Name List Checking.
The general paradigm for all Discover Life services is to “round-trip” your data: data exposed on the web are passed through a comprehensive DL data checker, and the results are returned to you on the web at a different address. Data providers (like the TTD-TCN) expose data through a text file on the web and let DL know of its location. Discover Life picks up this text file from the URL every night and processes those records. The data you provide are compared with the collective information found in the DL databases. The results from the comparison are then returned to the provider in a separate text file, at a different web address, for review at any time.
“round-trip” your data
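The provider side of the round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (record_id, scientific_name, latitude, longitude, note) and the tab-delimited layout are hypothetical, not the format Discover Life actually uses, and the code leaves out posting the file to a web address and downloading the returned report.

```python
import csv
import io

# Hypothetical column layout; the real layout is whatever the provider
# and Discover Life agree on.
FIELDS = ["record_id", "scientific_name", "latitude", "longitude"]


def export_records(records):
    """Serialize specimen records as tab-delimited text, ready to be
    placed at a public web address for nightly pickup."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


def parse_report(report_text):
    """Read a returned report (assumed columns: record_id, note) into a
    dict that maps each flagged record_id to its note."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    return {row["record_id"]: row["note"] for row in reader}


def records_needing_review(records, report_text):
    """Return the local records that appear in the report, each with the
    checker's note attached, so they can be reviewed in the database."""
    flagged = parse_report(report_text)
    return [
        dict(record, note=flagged[record["record_id"]])
        for record in records
        if record["record_id"] in flagged
    ]
```

Once a flagged record is corrected locally and the export is regenerated, it would simply stop appearing in the next report, which matches the nightly cycle described above.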
The Tri-Trophic Database project, as well as many others, utilizes the “round-trip” service in two main ways. The first is for latitude/longitude locality checking. An example output from this Discover Life service can be found here: http://pick18.pick.uga.edu/DB/NCSU/BAD.txt. The output shows a series of coordinates that do not fit within the pixel maps of the world, or the map worldview, DL maintains. Additionally, in the return service, a note is added to the output text describing what might be the source of the error. These localities are then reviewed in the TTD-TCN database, as they are considered suspect until further appraisal. Once records are corrected in the TTD database, the output file made available for DL also reflects this correction, and the record will no longer appear in the BAD.txt file shown above.
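To give a feel for this kind of locality check, here is a minimal Python sketch. It is not Discover Life's method: DL compares coordinates against its own pixel maps, while this example swaps in a simple bounding box as a stand-in. The region name, the box values (a rough, illustrative box around Georgia, USA), and the wording of the notes are all assumptions.

```python
# Approximate, illustrative bounding boxes:
# (min_lat, max_lat, min_lon, max_lon). Not authoritative boundaries.
REGION_BOXES = {
    "Georgia, USA": (30.3, 35.1, -85.7, -80.7),
}


def in_box(lat, lon, box):
    """True if the point lies inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def check_coordinate(lat, lon, region):
    """Return None if the point fits the stated region, otherwise a note
    guessing at the likely source of the error."""
    box = REGION_BOXES[region]
    if in_box(lat, lon, box):
        return None
    if in_box(lat, -lon, box):
        return "longitude sign may be reversed"
    if in_box(-lat, lon, box):
        return "latitude sign may be reversed"
    if in_box(lon, lat, box):
        return "latitude and longitude may be swapped"
    return "coordinates fall outside the stated region"
```

A record whose note is not None would be treated as suspect until someone reviews it, in the same spirit as the entries in BAD.txt.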
A second service, performed in a similar way to the locality cleaning service, is a valid name checker. A list of names that TTD provides to Discover Life is compared with the entire set of 1.2 million valid names from DL. TTD-TCN includes both host plant names and insect names in the clean-up. To augment the DL name lists, we periodically provide them with highly vetted name lists from taxon experts (particularly from the Miridae Catalog). DL then assimilates any updates from the catalog into its services. An example output from a checklist is: http://www.discoverlife.org/nh/cl/US/GA/Clarke/moth.cl
Name resources DL utilizes include, but are not limited to, TROPICOS, Plant Name Index, Catalog of Life, and ITIS. Discover Life also maintains a list of all known synonyms for valid names, which includes misspellings.
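The name check can be pictured as a lookup against a list of valid names plus a table of known synonyms and misspellings. The Python sketch below assumes tiny in-memory lists for illustration; the real DL lists hold over a million names, and the function names and status labels here are hypothetical. The synonym Cyrtolobus helena (now Atymna helena) comes from our own database; the misspelling is made up for the example.

```python
# Tiny illustrative lists; stand-ins for the much larger DL name lists.
VALID_NAMES = {
    "Thelia bimaculata",
    "Cyrtolobus fuscipennis",
    "Atymna helena",
}

# Known synonyms and misspellings, each pointing to its valid name.
SYNONYMS = {
    "Cyrtolobus helena": "Atymna helena",
    "Thelia bimaculta": "Thelia bimaculata",  # invented misspelling
}


def normalize(name):
    """Collapse stray whitespace so that spacing differences in the
    source data do not cause false mismatches."""
    return " ".join(name.split())


def check_name(name):
    """Return (status, valid_name), where status is 'valid', 'synonym',
    or 'unknown'. Unknown names get None as the valid name."""
    cleaned = normalize(name)
    if cleaned in VALID_NAMES:
        return ("valid", cleaned)
    if cleaned in SYNONYMS:
        return ("synonym", SYNONYMS[cleaned])
    return ("unknown", None)
```

Names that come back as "unknown" are the ones worth sending to a taxon expert, and vetted additions can then be folded back into the valid list.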
Transcription of Labels
Discover Life provides a new label transcription service for any natural history collection label.
Discover Life Time Machine (public view)
The transcription service uses full-quality JPG images from providers. Providers can either upload images directly to Discover Life or provide URLs for the images for Discover Life to pick up. Functionality highlights include:
1. Different views for authoritative and nonauthoritative digitizers
2. Ability for providers to hide uploaded data and images from the public. This functionality is important when working with endangered specimens.
3. Providers may include OCR text for parsing by Discover Life into locality, collector, institution and other fields. Alternatively, Discover Life will perform the OCR for providers.
4. Results of transcribed labels are returned to providers as a text file through a web page such as:
TTD-TCN Integration Portal
The display of organism association data, or trophic level integration, is a fundamental product of the Tri-Trophic Database project. Discover Life creates view and discovery pages where the public can explore host - plant - parasitoid data on the web.
Modeling association data across institutions is part of the challenge. We have a proposed list of defined MISC fields for exposing those data and are soliciting input about the data structure. We think one important aspect of host data is well-defined relationships.
Feedback to Providers
Feedback needs to return to providers, and is not edited directly on Discover Life. Comments and corrections about specimens are emailed directly to providers using a simple feedback form. The email and contact information for providers is carefully curated in the DL database.
While digitizing specimens in the collection, we gloss over thousands of
names of collectors worldwide. Although the main intention is to map and study the lives of the insects, we have wondered if we were also mapping the lives of
the collectors. This series is an opportunity to use the digitized collection to
map the lives of women who have contributed to the American Museum of
Natural History collection and the Tri-Trophic TCN project. Who were
they? What are their stories?
Like many entomologists at the time, Edith Marion Patch’s first
recorded interest was butterflies. In her senior year of high school she wrote
an essay about monarchs that won $25.00. With her prize money she purchased the
Manual for the Study of Insects written by John Henry Comstock and
illustrated by his wife, Anna Comstock. The Comstocks were entomologists at
Cornell University whom Patch would later befriend.
Edith Patch attended the University of Minnesota in 1897 and
graduated in 1901 with a Bachelor of Science. Despite her qualifications, she
couldn’t find a job in entomology so she took a position teaching English at a
high school in Minnesota for two years. Finally, in 1903, she was invited by
Dr. Charles D. Woods to organize a Department of Entomology in Orono, Maine. Today
UCBs Department contains digitized plant bugs collected by E.M. Patch in Orono,
Maine. Three specimens of Cryptomyzus
(Cryptomonyzis) ribis and five of Eriosoma
ulmi are currently in the database.
EM Patch 1916 - edithpatch.org
Initially, she wasn’t offered a salary and Dr. Woods was
“ridiculed for appointing a woman in a man’s field” (http://www.edithpatch.org/). To earn a
living wage, he arranged for Patch to teach English in the area while she
organized the Entomology department. Within a year, Patch had proven herself to her male coworkers, established the department and earned herself a salaried position.
For her master's degree she attended the University of Maine in 1910. Although a few websites say that Patch earned her PhD from
Columbia University, she actually attended Cornell University in 1911 for her
doctorate. At Cornell she became colleagues with the Comstocks.
In 1930, Patch
became the first female president elected to the Entomological Society of America. She was ahead of her time in the early 1900s. It is said
that she warned against the indiscriminate use of pesticides, such as DDT, forty
years before Rachel Carson’s Silent Spring (1962) was published. She was concerned about the devastating impact pesticides would have on songbirds amongst other
dangers. She was one of the original environmentalists and advocated for
education of the natural world, especially for children. Despite her busy career
in entomology, she also published books for children starring accurate insect
characters. She retired to her home in Orono, “Braeside,” in 1937 as “Entomologist Emeritus” and lived there until she passed away in 1954.
Check out her children’s literature: Hexapod
Stories, Bird Stories, Dame Bug and her Babies and Elm Leaf Curl and
Wooly Apple Aphid: http://amzn.to/1mf9OAc
The Tri-Trophic Database Thematic Collection Network recently finished up an exciting course about present best practices for specimen-level data management. The two-week Short Course on Biological Specimen Informatics (Specimen Short Course; syllabus and more information: tcn.amnh.org/home/specimen-course) was designed as a first introduction to biological informatics with early-career graduate students in mind. The Specimen Short Course gathered individuals from 18 different institutions across the United States at the Richard Gilder Graduate School (American Museum of Natural History - rggs.amnh.org) in order to specifically address research specimen data capture issues through training, from the field to preserved collections. Instructors for the course were Mike Bevins (Information Manager, NYBG), Christine Johnson (TTD co-PI, AMNH), Rob Naczi (TTD PI, NYBG), Randall Schuh (TTD PI, AMNH), Katja Seltmann (TTD Project Manager, AMNH), Steve Thurston (Image Specialist, AMNH), Melissa Tulig (TTD co-PI, NYBG), and Kim Watson (TTD Project Manager, NYBG).
Unarguably, biological research generates a great deal of specimen-level data. These data can be complex and include familiar collection-level data (the focus of many broad museum digitization efforts) as well as highly specific data depending on the research questions. Researchers also need ready access to all of their data, either through bulk download or by direct database access, in order to perform analyses. Early training for students is mutually beneficial: improved specimen handling techniques facilitate research, and data that are well managed according to community standards allow for greater dissemination of the end products. In order to create a workflow that fits their research needs, participants in the Short Course learned about, and worked on projects with, several tools including Arthropod Easy Capture (sourceforge.net/projects/arthropodeasy), Specify (specifysoftware.org), ScratchPads (scratchpads.eu), SimpleMappr (simplemappr.net) and others. At the same time, students gained valuable expertise mapping datasets to Darwin Core and manipulating data with OpenRefine (openrefine.org), Excel, and MySQL.
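For readers who have not seen it, mapping a dataset to Darwin Core mostly means renaming local column headings to the shared Darwin Core term names. The Python sketch below shows one way this might look. The left-hand column names are hypothetical spreadsheet headings, not taken from any course dataset; the right-hand names (catalogNumber, scientificName, recordedBy, eventDate, decimalLatitude, decimalLongitude) are standard Darwin Core terms.

```python
# Hypothetical local headings mapped to standard Darwin Core terms.
COLUMN_TO_DWC = {
    "Catalog #": "catalogNumber",
    "Species": "scientificName",
    "Collector": "recordedBy",
    "Date": "eventDate",
    "Lat": "decimalLatitude",
    "Long": "decimalLongitude",
}


def to_darwin_core(row):
    """Split one spreadsheet row (a dict of string values) into a dict
    keyed by Darwin Core terms and a dict of columns with no mapping."""
    mapped = {}
    unmapped = {}
    for column, value in row.items():
        term = COLUMN_TO_DWC.get(column)
        if term is None:
            # Keep unmapped columns visible so nothing is silently lost.
            unmapped[column] = value
        else:
            mapped[term] = value.strip()
    return mapped, unmapped
```

Returning the unmapped columns separately is a deliberate choice in this sketch: research datasets often carry project-specific fields, and it is useful to see which ones still need a home.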
Enabling students to manage research was one aspect of the Specimen Short Course. The second was to place these efforts in the context of the larger biodiversity informatics community. The course involved visiting research areas at the American Museum of Natural History and the New York Botanical Garden, allowing participants to gain a sense of what the various workflows at these institutions are like, as well as the collection requirements for vouchering specimens at the end of a research project. These experiences helped participants develop techniques and collecting protocols that they could use in their own research.
Imaging was another important component discussed as a means of data capture. At the New York Botanical Garden, participants in the course got hands-on experience photographing plant specimens. At AMNH, insect specimens were the focus. Participants got to see how high-quality images of small insects are taken, and visited the museum’s imaging lab to learn about the technology at work there. They also learned about how images could be incorporated into their databases to strengthen specimen records. As the course progressed participants began to plan how they will use databasing techniques and other resources discovered through the Specimen Short Course. Each participant brought some of his or her data to the course and began to develop a workflow that best matches their individual research needs. As a culmination of the course, each participant delivered a presentation on how they will continue to incorporate the techniques they had learned into their own research.
By the end of the course, each participant had gained not only a better understanding of specimen informatics techniques, but also a sense of how they could apply these techniques to their own research. The goal of the course was to train students in present best practices for specimen-level data management from the field to preserved collections, and how a specimen management plan can facilitate addressing research questions. The experiences they gained through the course will aid them in producing and making available datasets that will be of great use to them and countless other researchers.
Authors: Jeremy Frank (Short Course Participant) & Katja Seltmann (TTD Project Manager, AMNH)
With the three 17-year species of Magicicada from Brood II emerging this year in the eastern United States, the Staten Island Museum, co-founded by cicada expert William T. Davis (1862 – 1945), is focusing on making the most of this infrequent event. Their current temporary exhibition, "They're Baaack! Return of the 17-year Cicadas," along with planned workshops and nature walks, will inform visitors about these unique bugs in the coming months. This event happens to coincide with our digitization of the cicada collection at the Staten Island Museum, which includes many specimens from previous emergences of Brood II, as well as the other broods of the 13- and 17-year cicadas. Posts on the Staten Island Museum's Tumblr and Blogspot sites offer information about local ecology and news about the museum, including their relation to the TTD.
Post by Alexander Bolesta: Database Assistant at the American Museum of Natural History, and Curatorial Assistant at the Staten Island Museum
One of the exciting aspects of the Tri-Trophic Database project is the cooperation that occurs between different institutions in the name of science. Each of the 30+ museums involved with the project offers a wealth of experience in addition to the diversity that comes from adding their collections to the database. Of these, the Staten Island Museum in New York City offers a cicada (family Cicadidae) collection of about 35,000 specimens: the second largest collection of cicadas in the world. Since cicadas belong to the order Hemiptera, this comprehensive collection is a perfect addition to the Tri-Trophic Database.
Staten Island Museum main entrance. Downloaded
Specimen collection room where the cicadas are housed at the
Staten Island Museum. Photograph by Alexander Bolesta.
The Staten Island Museum, previously the Staten Island Institute of Arts & Sciences, and the Staten Island Association of Arts & Sciences, was founded in 1881 as the Natural Science Association of Staten Island by a group of 14 local naturalists who were concerned that overdevelopment would lead to the destruction of Staten Island’s natural history. By pooling their resources, these environmentalist pioneers were able to put together a collection worthy of drawing public attention, and, in 1908, the museum opened its doors. Associates of the museum have since kept a continuous record of the changing ecosystem and environment on Staten Island, while inspiring the creation of establishments including the Staten Island Zoo, Staten Island Historical Society, Staten Island Greenbelt, and the New York Botanical Garden. To this day, the Staten Island Museum carries out specimen acquisition and field work, in addition to special events like annual bird counts in conjunction with the Audubon Society.
Permanent natural history collection at the Staten Island Museum. Photograph by Alexander Bolesta.
The Staten Island Museum has been amassing a collection of specimens from the areas of natural science, art, and history since the days of the founders, and now boasts a collection of over half of a million specimens and pieces of art. One of the founders, William Thompson Davis, took a particular interest in the insects known as cicadas. Born in 1862 in New Brighton, Staten Island, Davis played a big role in the development of the Natural Science Association of Staten Island and its derivatives, despite being, for the most part, self-taught. His sense of wonder resulting from his study of cicadas can be seen most succinctly in his choice of name for the genus of 13- and 17-year periodical cicadas: Magicicada.
William T. Davis in the field. Downloaded
His efforts over half a century resulted not only in the accumulation of the world’s second largest cicada collection, as noted above, but also in his formally describing over half of the known species of cicada in North America. As a result, the Staten Island Museum’s cicada collection includes many of the type specimens that were used by Davis as the official examples of these species. During his career, Davis published extensively in the Journal of the New York Entomological Society, and even held positions there as treasurer and as delegate to the New York Academy of Sciences. Davis passed away in 1945, leaving behind an impressive list of accomplishments. In 1955, the 51-acre New Springfield Bird Sanctuary, created in 1933 thanks to efforts by Davis and the National Audubon Society, was expanded to 260 acres and renamed the William T. Davis Wildlife Refuge: a fitting dedication to one of New York City’s premier naturalists.
Sample from the William T. Davis Collection of Cicadas.
Photograph by Alexander Bolesta.
Abbott, Mabel. The Life of William T. Davis. Ithaca: Cornell University Press, 1949. Print.
"Collections: Natural Science Collection." Staten Island Museum. N.p. Web. 24 Oct 2012. <http://bit.ly/RV8kwn>
Davis, William. Days afield on Staten Island. New York: L.H. Biglow & Co., 1892. Print. <http://bit.ly/TCLebD>
Davis, William. North American cicadas. 1. New York: Society Quarterly, 1921. eBook. <http://bit.ly/UAqZC2>
"Freshkills Park." Official New York City Web Site. N.p., n.d. Web. 24 Oct 2012. <http://bit.ly/X37UtG>
Journal of the New York Entomological Society. 30. New York: Society Quarterly, 1922. eBook. <http://bit.ly/TXFdL2>
Journal of the New York Entomological Society. 31. New York: Society Quarterly, 1923. eBook. <http://bit.ly/TXFdL2>
Pratt, Jr., George O., G. K. Schneider, and Mathilde P. Weingartner. "The William T. Davis Wildlife Refuge and its Environs." Proceedings of the Staten Island Institute of Arts and Sciences. 24.2 (1969): n. page. Print.
"Staten Island Museum." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc, 09 Oct 2012. Web. 24 Oct 2012. <http://bit.ly/TCLlUp>
"William T. Davis." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc, 03 Feb 2012. Web. 24 Oct 2012. <http://bit.ly/SYVJcZ>
Zelasnic, Laura. "Records of the Herbarium (RG 4) CHARLES ARTHUR HOLLICK RECORDS (1873-1979)." The New York Botanical Garden. N.p., n.d. Web. 24 Oct 2012. <http://bit.ly/TGApqY>
Article by Alexander Bolesta: Database Assistant at the American Museum of Natural History, and Curatorial Assistant at the Staten Island Museum
Here at the University of Minnesota Herbarium (J.F. Bell
Museum of Natural History), we have reached our first milestone – we have
finished photographing our first plant family, the Pinaceae (almost 1200
Anita F. Cholewa, Curator of the UM Herbarium (MIN) and Hannah Conley in front of their digitization light box and workstation.
The Pinaceae will also probably be our most difficult family. Since pines produce thick, bulky cones,
these were often removed from the branches during the collecting stage to make
filing in the museum more efficient, but this meant that cones now had to be reunited
with their branches, with both barcoded and both photographed. Given the thickness of some cones, this
was not always an easy task.
(Pinus lambertiana, sugar pine, branch and cone)
Additionally, spruces and hemlocks have a tendency to lose
their needles upon drying, leaving specimens looking like winter collections of
plant skeletons. Thankfully
most needles are captured during the drying stage and kept in small packets
attached to the specimens, but the flip side required us to remove some of
these needles so they could be photographed with the branch skeletons.
Although a difficult and time-consuming group of plants, the
Pinaceae also included some interesting specimens. Among our Minnesota plants was a collection of Tsuga canadensis (Canadian hemlock) that
consisted of a cross-section through the trunk of the most northwestern and
isolated population in the state.
And because our specimen database includes numerous collections from
national parks and adjacent states these were also photographed. These included historical
collections by Joseph Whipple Congdon from Yosemite, among the earliest
botanists collecting in that region.
(left: Tsuga canadensis, Canadian hemlock; right: Congdon collection of Pinus albicaulis, whitebark pine)
Article by Anita F. Cholewa, Curator of the UM Herbarium (MIN)
During our digitizing work, one name that we come across quite often on specimen tags is L. B. Woodruff, or just the initials L.B.W. This is Lewis Bartholomew Woodruff, a prominent collector from the New York area. Woodruff was born in New York City on January 1, 1868 into a distinguished family of lawyers and politicians; his grandfather, with whom he shared a name, had been a U.S. Circuit Court judge nominated by President Ulysses S. Grant. Previous members of the Woodruff family founded towns in Farmington and Litchfield, Connecticut. Like his father and grandfather before him, Woodruff attended Columbia Law School, and received his degree from New York Law. After being admitted to the New York Bar in 1893, he joined the practice of Hornblower, Byrne, Miller & Potter, where he stayed until 1917. In the meantime, he married his wife, Helen, in 1904.
In 1919, Woodruff was able to concentrate fully on his natural science research and studies. He was considered an expert in both ornithology and entomology. His scientific memberships included the American Ornithologists' Union, the Entomological Society of Ontario, the New York Entomological Society (president in 1918, 1919, and 1920), the Academy of Science of the State of New York, and the Linnean Society of New York (treasurer 1902-1921). Much of his collection, which we are now encountering as part of the Tri-Trophic Thematic Collections project, focused on the fauna of the Atlantic seaboard. Woodruff was also sent by the AMNH on a three-month entomological survey to the Virgin Islands in 1925.
In addition to having a desk at the AMNH, Woodruff contributed to Leng's Catalogue of the Coleoptera of North America and published numerous papers. He is listed in our TCN database as the author of over 15 species of treehoppers, mostly in the genera Cyrtolobus and Ophiderma. The New York State Museum holds the type specimen for Cyrtolobus parvulus, described in 1924 (Jour. N.Y. Ent. Soc. 32:31). One species, Cyrtolobus helena (now known as Atymna helena), was likely named after his wife.
Helen Woodruff died in 1924, and Lewis Bartholomew Woodruff died one year later at the age of 57. He left behind a body of work and a collection that is still being studied by entomologists today. A selection of his publications is listed below.
Many thanks to Dr. Lewis Deitz at North Carolina State University for the information on L.B. Woodruff.
Woodruff, Lewis Bartholomew. 1915. A new membracid from New York. (Homop.). Jour. New York Entomol. Soc. 23:44-47. [Cyrtolobus helena n. sp.] Special Collections call no.: MC 220.226
Woodruff, Lewis Bartholomew. 1919. A review of our local species of the membracid genus Ophiderma Fairm. (Hemipt.-Homop.). Jour. New York Entomol. Soc. 27:249-260. Plate(s): 23. [Key to species of this genus; several n. spp.] Special Collections call no.: MC 220.226
Woodruff, Lewis Bartholomew. 1920. Further notes on the membracid genus Ophiderma Fairm. (Hemip.-Homop.). Jour. New York Entomol. Soc. 28:212-214. [Describes the male of O. grisea.] Special Collections call no.: MC 220.226
Woodruff, Lewis Bartholomew. 1923. Supplementary notes on Ophiderma Fairm. (Hemip.-Homop.). Jour. New York Entomol. Soc. 31:188-190. Special Collections call no.: MC
Woodruff, Lewis Bartholomew. 1924. Critical observations in the membracid genus Cyrtolobus Goding.
If you ever enjoyed collecting insects as a kid, you probably put some in a jar and watched them fight. It seems only in keeping with the rules of the natural kingdom that insects of differing species would try to eat one another. But if you think that the spirit of cooperation is a uniquely human trait, you might be surprised to learn about the relationships between certain treehoppers and ants.
Photo credit: Yon Visell
Pictured above are a treehopper, Thelia bimaculata, and an ant from the collection. This hopper is common in the United States east of the Mississippi River, as well as in Canada. It makes its home on the Black Locust tree, which has been widely planted around the country. Treehoppers like these are a common sight around the TTD-TCN office, but the ant is unusual and intriguing.
These two different insects have been pinned together because of the mutualistic relationship that they share. In mutualism, two organisms cooperate so that they both benefit. This is opposed to parasitism, in which one organism benefits and the other is harmed, or the rarer commensalism, in which one organism benefits and the other is neither benefited nor harmed.
Thelia bimaculata, like many other treehoppers, may use its unusual thorn-like appearance to blend in with its host plant and avoid predators. However, this may not always be enough to protect the treehopper colonies. In this case, they are protected by ants. These ants will surround the treehoppers, eliminating predators on the Black Locust trees that would otherwise eat the colony. With the pressure of predation diminished, the treehoppers are free to feed and reproduce as they please, causing the population to grow.
In exchange for the protection, the treehoppers allow the ants to collect a reward. This reward takes the form of a nutritious sugary substance called honeydew that the hoppers secrete. As anybody who’s been on a picnic can tell you, ants enjoy sweet substances, and they will vigilantly defend the hoppers so that they can harvest honeydew. This also has a secondary benefit for the hoppers, as the excess buildup of honeydew is harmful to them. In this manner, both the treehopper and the ant benefit and have a greater chance of survival than they would alone.
The insect world is fascinating and surprising, and the distinctive partnership between the treehoppers and ants is just one aspect of it. As we continue to digitize the plant bug collections, we will gain a better understanding of their place in the ecosystem.
Treehoppers (Aetalionidae, Melizoderidae, and Membracidae)
Morales, M. A. 2000. Survivorship of an ant-tended membracid as a function of ant recruitment. Oikos 90: 469–476.
This Cyrtolobus fuscipennis specimen from our collections at the AMNH was collected exactly one hundred years ago today! This female was collected from an oak tree by Lewis B. Woodruff in Litchfield, Connecticut, the locality from which the holotype of the species (Van Duzee 1908) was collected.
Please join us on Facebook and discover the process of collection digitization with the digitizers themselves.