
Introduction to Open Science

Learning Objectives

After this lesson, you should be able to:

  • Explain what Open Science is
  • Explain the components of Open Science
  • Describe the behaviors of Open Science
  • Explain why Open Science matters in education, research, and society
  • Understand the advantages and the challenges to Open Science
  • Identify who the practitioners of Open Science are
  • Understand the underlying Ethos of Open Science

What is Open Science?

If you ask a dozen researchers this question, you will probably get just as many answers.

This means that Open Science isn't necessarily a set of checkboxes you need to tick, but rather a holistic approach to doing science.

Definitions

"Open Science is defined as an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community." - UNESCO Definition

"Open Science is the movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of society, amateur or professional..." - Wikipedia definition

Open and Collaborative Science Network's Open Science Manifesto

Six Pillars of Open Science

Open Access Publications

Open Data

Open Educational Resources

Open Methodology

Open Peer Review

Open Source Software

Wait, how many pillars of Open Science really are there?

Depending on the source, the number ranges from 4 to 8.


Graphic by Foster Open Science

flowchart LR
id1([open science]) --> id3([publishing]) & id4([data]) & id5([open source software])
id3([publishing]) --> id41([access]) & id42([reviews]) & id43([methods]) & id44([educational resources])
id5([open source software]) --> id13([container registries]) & id10([services]) & id101([workflows]) & id12([version control systems])
id12([version control systems]) --> id101([workflows])
id13([container registries]) --> id101([workflows])
id14([public data registries]) --> id101([workflows])
id10([services]) --> id101([workflows])
id44([educational resources]) --> id21([university libraries])
id21([university libraries]) --> id101([workflows])
id22([federal data archives]) --> id101([workflows])
id4([data]) --> id21([university libraries]) & id22([federal data archives]) & id14([public data registries])
id101([workflows]) --> id15([on-premises]) & id16([commercial cloud]) & id17([public cloud])

Mermaid Diagram: Conceptual relationships of Open Science and cyberinfrastructure

🕶 Awesome Lists of Open Science

Awesome lists were started on GitHub by Sindre Sorhus and typically display an "Awesome" badge (https://github.com/sindresorhus/awesome).

(There is even a Searchable Index of Awesome Lists)

We have also created our own Awesome Open Science List, which may be valuable to you.

Open Access Publications


Definitions

"Open access is a publishing model for scholarly communication that makes research information available to readers at no cost, as opposed to the traditional subscription model in which readers have access to scholarly information by paying a subscription (usually via libraries)." -- OpenAccess.nl

Pre-print Services

ASAPbio Pre-Print Server List - a comprehensive list of preprint servers compiled by ASAPbio, a scientist-driven non-profit promoting transparency and innovation in life science communication.

ESSOAr - Earth and Space Science Open Archive, hosted by the American Geophysical Union.

Peer Community In (PCI) - a free recommendation process for scientific preprints based on peer reviews.

OSF.io Preprints partners with numerous community preprint servers known as the "rXivs":

The rXivs

AfricArXiv

AgrirXiv

Arabixiv

arXiv - is a free distribution service and an open-access archive for 2,086,431 scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

BioHackrXiv

bioRxiv - is an open access preprint repository for the biological sciences.

BodoArXiv

EarthArXiv - is an open access preprint repository for the Earth sciences.

ECSarXiv - a free preprint service for electrochemistry and solid state science and technology

EdArXiv - for the education research community

EngrXiv - for the engineering community

EcoEvoRxiv - is an open access preprint repository for Evolutionary and Ecological sciences.

MediArXiv - for Media, Film, & Communication Studies

medRxiv - is an open access preprint repository for Medical sciences.

PaleorXiv - is an open access preprint repository for Paleo Sciences

PsyArXiv - is an open access preprint repository for Psychological sciences.

SocArXiv - is an open access preprint repository for Social sciences.

SportRxiv - is an open access preprint repository for Sports sciences.

Thesis Commons - open theses

Financial Support for Open Access Publishing Fees

There are mechanisms for helping to pay for the additional costs of publishing research as open access:

SciDev.Net

Health InterNetwork Access to Research Initiative (HINARI)

Some institutions offer support for managing publishing costs (check to see if your institution has such support):

University of Arizona Open Access Investment Fund

University of Colorado Boulder Open Access Fund

Max Planck Digital Library - German authors can have OA fees in Springer Nature research journals paid for.

Bibsam Consortium - Swedish authors can have OA fees in Springer Nature research journals paid for.

Open Access Publishing

Major publishers provide open access options for publishing your work:

AAAS

Nature

American Geophysical Union

Commonwealth Scientific and Industrial Research Organisation (CSIRO)

Open Research Europe

Open Data

Open Data are a critical aspect of open science. There are three key attributes of Open Data:

  • Availability and accessibility
  • Reusability
  • Inclusivity
Definitions

“Open data and content can be freely used, modified, and shared by anyone for any purpose” - The Open Definition

"Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike." - Open Data Handbook

Wikipedia definition

DIKW Pyramid

Data are the basis of our understanding of the natural world. The Data-Information-Knowledge-Wisdom (DIKW) pyramid describes how data are refined into information and knowledge.


FAIR & CARE Principles

Wilkinson et al. (2016) established the guidelines to improve the Findability, Accessibility, Interoperability, and Reuse (FAIR) of digital assets for research.

Go-FAIR website
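To make the four FAIR letters more concrete, here is a minimal Python sketch that checks a dataset's metadata record for fields that loosely support each principle. The field names (identifier, access_url, file_format, license, and so on), the set of "open" formats, and the placeholder DOI and URL are our own illustrative assumptions, not an official FAIR metric or metadata schema.

```python
"""Hypothetical FAIR self-check for a dataset metadata record.

The field names and the set of open formats are illustrative assumptions,
not an official FAIR metric or metadata schema.
"""

# Assumed mapping: FAIR principle -> metadata fields that support it.
FAIR_FIELDS = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["file_format"],
    "Reusable": ["license", "creators"],
}

# Assumed examples of open, non-proprietary file formats.
OPEN_FORMATS = {"csv", "json", "netcdf", "geotiff", "parquet"}


def fair_gaps(record: dict) -> dict:
    """Return {principle: [missing or problematic fields]} for a metadata record."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        # A field counts as missing if it is absent or empty.
        missing = [field for field in fields if not record.get(field)]
        fmt = str(record.get("file_format", "")).lower()
        if principle == "Interoperable" and fmt and fmt not in OPEN_FORMATS:
            missing.append("file_format (not an open format)")
        if missing:
            gaps[principle] = missing
    return gaps


# Data "available upon request": no identifier, no access URL, no license.
upon_request = {"title": "Soil moisture plots", "access_url": "", "file_format": "xlsx"}

# The same data deposited in a public repository (placeholder DOI and URL, not real).
deposited = {
    "identifier": "doi:10.0000/placeholder",
    "title": "Soil moisture plots",
    "keywords": ["soil", "moisture"],
    "access_url": "https://repository.example.org/records/placeholder",
    "file_format": "csv",
    "license": "CC-BY-4.0",
    "creators": ["A. Researcher"],
}

print(fair_gaps(upon_request))
print(fair_gaps(deposited))  # → {}
```

Real FAIR assessments look at much more (rich metadata, community vocabularies, provenance); the point of this sketch is only that data "available upon request" fail even a crude check, mostly on Accessibility and Reusability.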

Carroll et al. (2020) established the CARE Principles for Indigenous Data Governance (see the full document).

Indigenous Data Sovereignty Networks

LOD Cloud

The Linked Open Data Cloud shows how datasets are linked to one another, forming the basis of the Semantic Web.


Open Educational Resources


Definitions

"Open Educational Resources (OER) are learning, teaching and research materials in any format and medium that reside in the public domain or are under copyright that have been released under an open license, that permit no-cost access, re-use, re-purpose, adaptation and redistribution by others." - UNESCO

Wikipedia definition

Digital Literacy Organizations

The Carpentries - teaches foundational coding and data science skills to researchers worldwide

edX - Massive Open Online Courses (MOOCs; not all of them open) from universities including the University of California, Berkeley

EveryoneOn - its mission is to unlock opportunity by connecting families in underserved communities to affordable internet service and computers, and delivering digital skills trainings

ConnectHomeUSA - is a movement to bridge the digital divide for HUD-assisted housing residents in the United States under the leadership of national nonprofit EveryoneOn

Global Digital Literacy Council - has dedicated more than 15 years of hard work to the creation and maintenance of worldwide standards in digital literacy

IndigiData - training and engaging tribal undergraduate and graduate students in informatics

National Digital Equity Center - a 501(c)(3) non-profit and nationally recognized organization with a mission to close the digital divide across the United States

National Digital Inclusion Alliance - advances digital equity by supporting community programs and equipping policymakers to act

Net Literacy

Open Educational Resources Commons

Project Pythia is the education working group for Pangeo and is an educational resource for the entire geoscience community

Research Bazaar - is a worldwide festival promoting the digital literacy emerging at the centre of modern research

TechBoomers - is an education and discovery website that provides free tutorials of popular websites and Internet-based services in a manner that is accessible to older adults and other digital technology newcomers

Educational Materials

Teach Together by Greg Wilson

DigitalLearn

Open Methodology


Version control systems such as Git, together with hosting platforms like GitHub and GitLab, are among the foremost ways to share open methods for digital research.

Definitions

"An open methodology is simply one which has been described in sufficient detail to allow other researchers to repeat the work and apply it elsewhere." - Watson (2015)

"Open Methodology refers to opening up methods that are used by researchers to achieve scientific results and making them publicly available." - Open Science Network Austria

Protocols and Bench Techniques

BioProtocol

Current Protocols

Gold Biotechnology Protocol list

JoVE - Journal of Visualized Experiments

Nature Protocols

OpenWetWare

Protocol Exchange

Protocols Online

Protocols

SciGene

Springer Nature Experiments

Open Peer Review


Pros and Cons of Open Peer Review

Definitions

Ross-Hellauer (2017) asks "What is open peer review?" and finds that there is no single agreed-upon definition.

Wikipedia's definition

Open Peer Review Resources

F1000Research - the first open research publishing platform, offering open peer review and rapid publication.

PREreview provides a space for open peer reviews, targeted toward early career researchers.

ASAPbio (Accelerating Science and Publication in biology) - an open peer review resource for biologists and life scientists.

PubPeer - a platform for post-publication peer review.

Sciety - a platform for evaluating preprints.

Open Source Software

Definitions

"Open source software is code that is designed to be publicly accessible—anyone can see, modify, and distribute the code as they see fit. Open source software is developed in a decentralized and collaborative way, relying on peer review and community production." - Red Hat

Open Source Initiative definition

Wikipedia definition

Awesome list

Breakout Discussion

As you already know, being a scientist requires you to wear many hats, and trying to do Open Science is no different.


Bernery et al. (2022) Figure 2: The positive aspects of doing a PhD.

As we mentioned, Open Science is not a set of boxes you need to check off to be "Certified Open", but an intersecting set of philosophies and approaches, all of which occur on some type of spectrum.

To get a feel for how Open Science can be multifaceted and different for each researcher, we will do a short breakout group session to discuss what Open Science means to you.

What does Open Science mean to you?
Which of the pillars of Open Science is nearest to your own heart?

Open Access Publications

Open Data

Open Educational Resources

Open Methodology

Open Peer Review

Open Source Software

Are any of the pillars more important than the others?
Are there any pillars not identified that you think should be considered?

Components of Open Science

One of the most fundamental, and certainly the most publicized component of Open Science is the accessibility of data.

This makes sense - without access to your data, nothing else about your science can be all that open.

While we will devote an entire 10 weeks of this course to data, opening up your data is only one piece of the puzzle.


Gallagher et al. (2020) Box 2: six core principles of Open Science.

This figure demonstrates the multiple intersecting pieces of Open Science, which go beyond simply making data accessible.

While we focus primarily on Open Data, Open Source, and Open Methodology in FOSS, it's worth considering how other parts of the scientific process might be opened up more broadly.

Another component, which spans or at least links together many of the pictured components, might be called Open Process.

In response to the Reproducibility Crisis, many researchers, particularly in fields like psychology, have begun to advocate for preregistration of studies.

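This involves writing out and publishing your entire research plan, from data collection to analysis and publication, before the study begins, to avoid practices like p-hacking (running many analyses until one yields a statistically significant result) or HARKing (Hypothesizing After the Results are Known).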

What preregistration also does is make the process of your work more open, including many of the small decisions and tweaks you make to a project that probably wouldn't make it into a manuscript.

To learn more about preregistration, check out the Open Science Framework (OSF), a project of the Center for Open Science that provides a preregistration platform and other Open Science tools.

As mentioned above, it is worthwhile to think about Open Science not as a set of checkboxes, but rather a holistic approach to doing science.

In that spirit, it can also be useful to think about Open Science as a spectrum, from less to more open.

While you might not achieve some platonic ideal of openness for a variety of reasons, you can still make great progress in moving your science towards the Open end of the spectrum.

In reality, a large scientific project probably consists of multiple spectra; you can move your data towards the open end of the spectrum while your software remains less open, and vice versa.

All this is to say that doing Open Science is not a static set of goals you must achieve; it is a process that grows and changes with your science itself.

One of the biggest challenges of doing science is that you might have to wear many different hats: domain expert, lab manager, statistician, teacher, mentor, grant writer, manuscript author, public speaker... the list goes on.

Doing Open Science is no different, but the list of skills may be even greater, since the goal is now to openly communicate each step of the process to a broader audience.

This also makes teaching Open Science quite challenging: we will cover topics ranging from "soft skills" like project management and internal communications to more technical skills like software management and containers.

We could probably teach an entire workshop on each of these topics, but we don't have the time to do that.

Instead, we will focus on a higher-level look at the landscape of Open Science and introduce you to a wide variety of skills and concepts with the idea that you can go on to find ways to implement them in your own work.

What characteristics might a paper, project, or lab group need to qualify as doing Open Science?
What are some limitations to you, your lab group, or your domain?

WHY do Open Science?

There are many reasons to do Open Science, and presumably one or more of them brought you to this workshop.

Whether you feel an ethical obligation, want to improve the quality of your work, or want to look better to funding agencies, many of the same approaches to Open Science apply.

A chapter by Fecher & Friesike (2014) posits that there are five main schools of thought in Open Science, which represent five underlying motivations:

  1. Democratic school: primarily concerned with making scholarly work freely available to everyone

  2. Pragmatic school: primarily concerned with improving the quality of scholarly work by fostering collaboration and improving critiques

  3. Infrastructure school: primarily focused on the platforms, tools, and services necessary to conduct efficient research, collaboration, and communication

  4. Public school: primarily concerned with societal impact of scholarly work, focusing on engagement with broader public via citizen science, understandable scientific communication, and less formal communication

  5. Measurement school: primarily concerned with the existing focus on journal publications as a means of measuring scholarly output, and focused on developing alternative measurements of scientific impact


Fecher & Friesike (2014), Open Science: One Term, Five Schools of Thought. In Bartling & Friesike (eds.), Opening Science.

While many researchers may be motivated by one or more of these aspects, we will not necessarily focus on any of them in particular. If anything, FOSS may be slightly more in the Infrastructure school, because we aim to give you the tools to do Open Science based on your own underlying motivations.

Let's break out into groups again to discuss some of our motivations for doing Open Science.

What motivates you to do Open Science?
Do you feel that you fall into a particular "school"? If so, which one, and why?
Are there any motivating factors for doing Open Science that don't fit into this framework?

Ethos of Open Science

Doing Open Science requires us to understand the ethics of working with data that do not belong to us, and to recognize that access to such data is a privilege.

We must also anticipate how these data could be re-used in ways contrary to the interests of humanity.

Make sure to involve your Institutional Review Board (IRB) or local ethics committee.

Areas to consider:

Figure: ethics assessment

Source: UK Statistics Authority

"Nothing about us, without us"

For more information (training):

Ethics and Data Access (General Information with BioMedical and Life Sciences Data) includes a legal and ethical checklist lesson for researchers around "FAIR Plus".

The Turing Way, NASA Transform to Open Science, Foster Open Science, The Carpentries, COS

Open Scholarship Grassroots Community Networks

International Open Science Networks

Center for Scientific Collaboration and Community Engagement (CSCCE)

Center for Open Science (COS)

Eclipse Science Working Group

eLife

NumFocus

Open Access Working Group

Open Research Funders Group

Open Science Foundation

Open Science Network

pyOpenSci

rOpenSci

Research Data Alliance (RDA)

The Turing Way

UNESCO Global Open Science Partnership

World Wide Web Consortium (W3C)

US-based Open Science Networks

CI Compass - provides expertise and active support to cyberinfrastructure practitioners at USA NSF Major Facilities in order to accelerate the data lifecycle and ensure the integrity and effectiveness of the cyberinfrastructure upon which research and discovery depend.

Earth Science Information Partners (ESIP) Federation - is a 501(c)(3) nonprofit supported by NASA, NOAA, USGS and 130+ member organizations.

Internet2 - is a community providing cloud solutions, research support, and services tailored for Research and Education.

Minority Serving Cyberinfrastructure Consortium (MS-CC) envisions a transformational partnership to promote advanced cyberinfrastructure (CI) capabilities on the campuses of Historically Black Colleges and Universities (HBCUs), Hispanic-Serving Institutions (HSIs), Tribal Colleges and Universities (TCUs), and other Minority Serving Institutions (MSIs).

NASA Transform to Open Science (TOPS) - coordinates efforts designed to rapidly transform agencies, organizations, and communities for Earth Science

Openscapes - is an approach for doing better science for future us

The Quilt - non-profit regional research and education networks collaborate to develop, deploy and operate advanced cyberinfrastructure that enables innovation in research and education.

Oceania Open Science Networks

New Zealand Open Research Network - New Zealand Open Research Network (NZORN) is a collection of researchers and research-associated workers in New Zealand.

Australia & New Zealand Open Research Network - ANZORN is a network of local networks distributed within Australia and New Zealand.


Self Assessment

True or False: All research papers published in top journals like Science and Nature are Open Access.

False

Major research journals like Science and Nature offer an "Open Access" option when a manuscript is accepted, but they charge the authors an extra fee to make those papers Open Access.

These high article processing charges (APCs) exclude the majority of scientists worldwide, who cannot afford to pay them out of pocket.

This is changing, at least in the United States: in 2022 the White House Office of Science and Technology Policy (OSTP) directed federal agencies to make publications from federally funded research freely available without an embargo, with updated agency policies due by the end of 2025.

True or False: an article states all of the research data used in the experiments "are available upon request from the corresponding author(s)," meaning the data are "Open"

False

In order for research to be open, the data need to be freely available from a digital repository, like Data Dryad, Zenodo.org, or CyVerse.

Data that are 'available upon request' do not meet the FAIR data principles.

True or False: Online Universities and Data Science Boot Camps like UArizona Online, Coursera, Udemy, etc. promote digital literacy and are Open Educational Resources?

False

These examples are fee-based programs, many of them for-profit, which teach data science and computer programming online. Some are offered through public or private universities and grant credit toward a degree or a certificate. Some of these programs are known to be predatory.

The organizations we have listed above are Open Educational Resources - they are free and available to anyone who wants to work with them asynchronously, virtually, or in person.

Using a version control system to host the analysis code and computational notebooks, and including these in your Methods section or Supplementary Materials, is an example of an Open Methodology?

Yes!

Using a VCS like GitHub or GitLab is a great step towards making your research more reproducible.

Ways to improve your open methodology can include documentation of your physical bench work, and even video recordings and step-by-step guides for every part of your project.

You are asked to review a paper for an important journal in your field. The editor asks if you're willing to release your identity to the authors, thereby "signing" your review. Is this an example of "Open Peer Review"?

No

Giving your name to the author(s) of the manuscript does not, by itself, make your review open.

If the journal later publishes your review alongside the final manuscript, then you will have participated in Open Peer Review.

You read a paper where the author(s) wrote their own code and licensed it as "Open Source" software for a specific set of scientific tasks that you want to replicate. When you visit their personal website, you find that the GitHub repository no longer exists (because it is now private). You contact the authors asking for access, but they refuse to share it "due to competing researchers who are seeking to steal their intellectual property." Is the software open source?

No

An author stating that they have given their software a permissive license does not make the software open source if the code itself is not available.

Always make certain there is a LICENSE associated with any software you find on the internet.

In order for the software to be open, it must follow the Open Source Initiative definition
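Part of the advice to always look for a LICENSE can be automated. The Python sketch below assumes you have already cloned or downloaded the repository, and lists top-level files whose names follow common license-file conventions (LICENSE, LICENCE, COPYING, with or without .md/.txt). That list of names is our assumption, not a formal standard, and finding such a file is only a first step: you still need to read it and confirm the license is one the Open Source Initiative has approved.

```python
"""Minimal sketch: list license-like files at the top level of a local repository copy.

The file names checked are common conventions (an assumption, not a standard).
Finding one is only a first step: read it and confirm the license is OSI-approved.
"""
import tempfile
from pathlib import Path

# Common license file names, compared case-insensitively.
LICENSE_NAMES = {
    "license", "license.md", "license.txt",
    "licence", "licence.md", "licence.txt",
    "copying", "copying.md", "copying.txt",
}


def find_license_files(repo_dir: str) -> list:
    """Return the sorted names of top-level files in repo_dir that look like license files."""
    return sorted(
        path.name
        for path in Path(repo_dir).iterdir()
        if path.is_file() and path.name.lower() in LICENSE_NAMES
    )


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for a cloned repository.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "README.md").write_text("# demo\n")
        print(find_license_files(tmp))  # → []
        Path(tmp, "LICENSE").write_text("MIT License\n")
        print(find_license_files(tmp))  # → ['LICENSE']
```

An empty result is a strong hint that, whatever the paper claims, you have no clear permission to reuse the code.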


Last update: 2022-11-09