Future-proofing 'big data' biological research depends on good digital identifiers - Phys.Org

Over the past decade, the life sciences have changed drastically as data has become larger, more interdependent and natively web-based. In this landscape, the broader scientific research community has struggled to engineer this data for the web so that it remains persistently accessible, reusable and attributable.

Depending on the database involved, an identifier can denote a gene, a genome, a chemical, an organism, a set of experimental data, or even a published article. The usefulness of all these items depends on their identifiers being robust and unique, so that the items can be linked and discovered in perpetuity. The authors of the new PLoS Biology paper point out that the organic way in which most identifiers have arisen threatens that usefulness, and acknowledge that it is difficult to create and sustain persistent identifiers, or web addresses, that do not break and are used consistently.

The paper calls on professionals to do a better job of identifier engineering, following emerging community-developed conventions, so that data can be used more effectively for scientific discovery. It also calls on users to understand these conventions, and the available tooling, well enough to avoid broken links and missed connections.
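To make the idea of an identifier convention more concrete, here is a minimal Python sketch of one common pattern: writing an identifier compactly as `prefix:accession` and expanding it into a full web address through a single lookup table of prefixes, roughly what a resolver service does. The registry contents, the `resolver.example.org` base URL, and the function names `parse_compact_id` and `resolve` are hypothetical and chosen for illustration; they are not taken from the article or from any particular resolver's API.

```python
import re
from typing import Dict, Tuple

# Hypothetical prefix registry: maps a namespace prefix to a URL template.
# The prefixes and the resolver.example.org URLs are illustrative only.
PREFIX_MAP: Dict[str, str] = {
    "taxonomy": "https://resolver.example.org/taxonomy/{accession}",
    "pubmed": "https://resolver.example.org/pubmed/{accession}",
}

# A compact identifier is written "prefix:accession". The prefix starts with a
# letter and cannot contain a colon; the accession is any non-empty run of
# non-whitespace characters (it may itself contain colons).
COMPACT_ID = re.compile(r"^(?P<prefix>[A-Za-z][A-Za-z0-9._]*):(?P<accession>\S+)$")


def parse_compact_id(curie: str) -> Tuple[str, str]:
    """Split 'prefix:accession' into (lowercased prefix, accession)."""
    match = COMPACT_ID.match(curie.strip())
    if match is None:
        raise ValueError(f"not a compact identifier: {curie!r}")
    # Normalise the prefix so 'PubMed:123' and 'pubmed:123' mean the same thing.
    return match.group("prefix").lower(), match.group("accession")


def resolve(curie: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> str:
    """Expand a compact identifier into a full URL using the prefix map."""
    prefix, accession = parse_compact_id(curie)
    if prefix not in prefix_map:
        raise KeyError(f"unregistered prefix: {prefix!r}")
    return prefix_map[prefix].format(accession=accession)


if __name__ == "__main__":
    print(resolve("taxonomy:9606"))
```

Keeping the prefix map in one place means that if a data provider changes its URL scheme, only the template needs updating, while the compact identifiers already cited in papers and datasets stay the same. That separation between a stable identifier and a changeable web address is the kind of persistence the authors argue for.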

"As with plumbing fixtures, the question of how identifiers work should only need to be understood by those that build and maintain them. However, everyone needs to know how identifiers should be used, and this is where convention is important," said lead author McMurry. "Through this work, we hope to encourage all participants in the scholarly ecosystem - including authors, data creators, data integrators, publishers, software developers, and resolvers - to adhere to best practice in order to maximize the utility and impact of life science data."


More information: McMurry JA, Juty N, Blomberg N, Burdett T, Conlin T, Conte N, et al. (2017) Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLoS Biol 15(6): e2001414. doi.org/10.1371/journal.pbio.2001414
