Outputs

The BY-COVID project ran between 2021 and 2024.

The ultimate aim of the project was to make SARS-CoV-2 and other infectious disease data easier to access, share and analyse, enabling the world to respond more quickly to infectious disease outbreaks. The project also produced specific outputs such as publications and deliverables, including reports and best practice guidelines. These outputs are listed here.

Publications

Dec 2023 | Plass M et al, New Biotechnology (2023)

Provenance of specimen and data – A prerequisite for AI development in computational pathology

In this paper, a framework is presented for recording and publishing provenance information to meet these requirements.

Oct 2023 | Meurisse M et al, BMC Med Res Methodol (2023)

Federated causal inference based on real-world observational data sources: application to a SARS-CoV-2 vaccine effectiveness assessment

The framework provides a systematic approach to address federated cross-national policy-relevant causal research questions based on sensitive population, health and care data in a privacy-preserving and interoperable way.

Oct 2023

Preparedness Data Hub

The work is intended to develop the tools (technical implementation) that allow a preparedness Data Hub to be rapidly deployed and configured for a disease X scenario.

Sep 2023 | Cannas A et al, Microorganisms (2023)

Epidemiological and molecular investigation of the heater–cooler unit (HCU)-related outbreak of invasive Mycobacterium chimaera infection occurred in Italy

Here, we report the results of the epidemiological and molecular investigations conducted in Italy after the alarm raised about this epidemic event.

Sep 2023 | David, Romain, et al. CODATA Data Science Journal 22 (2023)

Umbrella Data Management Plans to integrate FAIR data: lessons from the ISIDORe and BY-COVID consortia for pandemic preparedness

Research data can have enduring value, as long as scientists can use, reuse and combine data sets.

Sep 2023 | David, Romain, et al. CODATA Data Science Journal 22 (2023)

"Be Sustainable", Recommendations for FAIR Resources in Life Sciences research: EOSC-Life's Lessons

Life Science (LS) communities cover multiple scientific domains and carry out a diversity of research, from basic biological studies to applied epidemiological and environmental investigations.

Aug 2023 | Catalano A et al, Viruses (2023)

Antibiotic-Resistant ESKAPE Pathogens and COVID-19: The Pandemic beyond the Pandemic

The aim of this review is to highlight the state of the art of antibacterial resistance worldwide, focusing on the most important pathogens, namely Enterobacterales, Acinetobacter baumannii, and Klebsiella pneumoniae, and their resistance to the most common antibiotics.

July 2023 | Pipek O et al, Preprint (2023)

Systematic detection of co-infection and intra-host recombination in more than 2 million global SARS-CoV-2 samples

Here we present a comprehensive analysis of more than 2 million SARS-CoV-2 raw read datasets submitted to the European COVID-19 Data Portal to identify coinfections and intra-host recombination.

June 2023 | Kemmer et al., Histochemistry and Cell Biology

Building a FAIR image data ecosystem for microscopy communities

We outline a wide range of efforts and solutions currently being developed by the microscopy community to address these challenges on the path towards FAIR bioimaging data.

Apr 2023 | Karki et al., Bioinformatics Advances

Mpox Knowledge Graph: a comprehensive representation embedding chemical entities and associated biology of Mpox

Using Knowledge Graph (KG) representations we have depicted chemical and biological aspects of MPXV.

Sep 2023 | David, Romain et al. Zenodo preprint (2023)

"Be Sustainable", Recommendations for FAIR Resources in Life Sciences research: EOSC-Life's Lessons

Life Science (LS) communities cover multiple scientific domains and carry out a diversity of research, from basic biological studies to applied epidemiological and environmental investigations.

Apr 2023 | Wittner et al., Learning Health Systems

Toward a common standard for data and specimen provenance in life sciences

We present our effort to provide trustworthy machine-actionable documentation of the data lineage and specimens. Experts from the biotechnology and biomedical fields are invited to further contribute to the standard.

Apr 2023 | Chiara et al., Communications Biology

HaploCoV: unsupervised and rapid variant detection

HaploCoV enables the exploration of SARS-CoV-2 genomic diversity through space and time, to identify novel emerging viral variants and prioritize variants of potential epidemiological interest in a rapid and unsupervised manner.

Apr 2023 | Braukmann, Zenodo

Enabling FAIR data in the Dutch SSH community

A presentation about the DANS Data Station Social Sciences and Humanities (SSH), given at the BY-COVID Spring 23 Use Cases Workshop: Integration of socioeconomic data in observational studies on vaccine effectiveness.

Nov 2022 | Conner et al., bioRxiv preprint

Towards increased accuracy and reproducibility in SARS-CoV-2 sequence analysis

We examine the impact of sequencing technologies (Illumina and Oxford Nanopore) and 7 different downstream bioinformatic protocols on SARS-CoV-2 variant calling as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) Tracking Resistance and Coronavirus Evolution (TRACE) initiative.

Nov 2022 | Mentes et al., Scientific Reports

Identification of mutations in SARS-CoV-2 PCR primer regions

We propose an analysis pipeline to discover genomic variations overlapping the target regions of commonly used PCR primer sets. These are in a publicly available format based on a dataset of more than 1.2 million SARS-CoV-2 samples.

Oct 2022 | Soiland-Reyes et al., Research Ideas and Outcomes

Updating Linked Data practices for FAIR Digital Object principles

We believe that by adopting Linked Data principles, we can accelerate FAIR Digital Object (FDO) and start building practical ways to assist scientists in efficiently answering topical questions based on knowledge graphs.

Oct 2022 | Soiland-Reyes et al., Research Ideas and Outcomes

Creating lightweight FAIR Digital Objects with RO-Crate

RO-Crate is a lightweight method to package research outputs along with their metadata, based on Linked Data principles. We present how we have followed the FAIR Digital Object (FDO) recommendations and turned research outcomes into FDOs by publishing RO-Crates on the Web using HTTP.

Oct 2022 | Bisognin et al., Microbiology Spectrum

Investigating M. chimaera contamination in heater-cooler units

We found highly similar genetic and phenotypic profiles of M. chimaera isolated from heater-cooler units (HCU) used during surgery to thermo-regulate patients' body temperature and from the same hospital tap water, suggesting the need for environmental surveillance and associated control measures.

Oct 2022 | Zenodo preprint

The FAIR Cookbook - the essential resource for and by FAIR doers

We present the FAIR Cookbook, its creation and content, its value, use and adoptions, as well as the participatory process and collaborative plans for sustainability.

Oct 2022 | Angewandte Chemie

Comprehensive Fragment Screening of the SARS-CoV-2 Proteome Explores Novel Chemical Space for Drug Development

The international Covid19-NMR consortium have identified binders targeting the RNA genome of SARS-CoV-2. We provide novel structural and chemical space for structure-based drug design against the SARS-CoV-2 proteome.

Aug 2022 | Scientific Data

A lightweight distributed provenance model

We define a lightweight provenance model enabling generation of distributed provenance chains in complex, multi-organizational environments.

July 2022 | Zenodo

COVID-19 vaccine effectiveness assessment - CDM specification

The Common Data Model specification of the BY-COVID project (WP5) on COVID-19 vaccine effectiveness in preventing SARS-CoV-2 infection.

May 2022 | arXiv

Systemic barriers to pathogen-related data sharing

We report results of a study interviewing data professionals working with COVID-19-relevant data types including social media, mobility, viral genome, testing, infection, hospital admission and deaths.

Mar 2022 | Molecular Biology and Evolution

Clusters of unusual mutational changes in Omicron lineage BA.1

We propose that mutations in three clusters interact to mitigate their individual fitness costs and adaptively alter the function of Spike.

Mar 2022 | PLOS Computational Biology

10 Simple Rules for making a software tool workflow-ready

Workflows have become a core part of computational scientific analysis in recent years. This paper presents 10 simple rules for how a software tool can be prepared for workflow use.

Feb 2022 | Research Square preprint

Host genomes for SARS-CoV-2 variant leaked into Antarctic soil

We follow up a report of a contaminated metagenomic sample set from Antarctica containing traces of unique SARS-CoV-2 variants. We identify genetic material from mitochondria of Homo sapiens, green monkey and Chinese hamster, the latter two probably originating from cell lines used for studying viruses.

Jan 2022 | Data Science

Packaging research artefacts with RO-Crate

The aim of this paper is to introduce RO-Crate (an open, community-driven, and lightweight approach to packaging research artefacts along with their metadata in a machine readable manner) and assess it as a strategy for making multiple types of research artefacts FAIR.

Dec 2021 | Research Square preprint

Host genomes for SARS-CoV-2 variant leaked into Antarctic soil

We follow up a report of a contaminated metagenomic sample set from Antarctica containing traces of unique SARS-CoV-2 variants. We identify genetic material from mitochondria of Homo sapiens, green monkey and Chinese hamster, the latter two probably originating from cell lines used for studying viruses.

Dec 2021 | Figshare

The response of the scholarly communication system to the COVID-19 pandemic

This paper analyses how the scholarly communication system – involving the production, evaluation, and dissemination of research outputs – has responded to this crisis, focusing on the period until mid-2021.

Nov 2021 | Zenodo

FAIR, ethical, and coordinated data sharing for COVID-19 response

Data sharing is central to the rapid translation of research into advances in clinical medicine and public health practice. This paper is a review of COVID-19 data sharing platforms and registries.

Sep 2021 | Nature Biotechnology

Ready-to-use public infrastructure for global SARS-CoV-2 monitoring

This paper presents the COVID-19 effort by the Galaxy Project, which pools free worldwide public computational infrastructure, making the analysis of deep sequencing data accessible to anyone while also providing an analytical framework for global pathogen genomic surveillance based on raw sequencing-read data.

Deliverables and Milestones

Outputs are also published on the BY-COVID Zenodo community.


Code | Due | Description | Responsibility
D8.1 | 11/21 | Project Handbook initial release and periodic updates | WP8
D3.1 | 03/22 | Metadata standards | WP3
D7.1 | 03/22 | Dissemination, exploitation and communication Plan | WP7
D8.2.1 | 02/22 | Project Data Management Plan initial release and periodic updates | WP8
D2.1 | 06/22 | Initial data and metadata harmonisation at domain level to enable fast responses to COVID-19 | WP2
D1.1 | 09/22 | Extended workflows | WP1
D3.2 | 09/22 | Implementation of cloud-based, high performance, scalable indexing system | WP3
D8.2.2 | 12/22 | Project Data Management Plan initial release and periodic updates | WP8
D8.1.2 | 03/23 | Project Handbook initial release and periodic updates | WP8
D2.2 | 06/23 | Data Access and Transfer across research domains and jurisdictions | WP2
D1.2 | 09/23 | Preparedness Data Hub | WP1
D3.3.1 | 09/23 | COVID-19 Data Portal | WP3
D7.3 | 09/23 | Report on public engagement activities | WP7
D5.3 | 11/23 | Hot Spot detection, samples data collection and mechanistic analyses | WP5
D1.3 | 03/24 | Tracking and open analytics tools | WP1
D2.3 | 03/24 | Enabling data discovery at source using beacon-like mechanisms | WP2
D4.3 | 03/24 | Provenance model | WP4
D7.2 | 03/24 | Public report showcasing industry value from infectious disease data | WP7
D4.2 | 04/24 | Common analysis environment | WP4
D5.1 | 05/24 | Enriched report viral variants and health outcomes | WP5
D4.1 | 06/24 | Infectious diseases toolkit | WP4
D3.3.2 | 07/24 | COVID-19 Data Portal | WP3
D6.1 | 07/24 | Stakeholder engagement report | WP6
D6.2 | 07/24 | The training efforts report | WP6
D8.2.3 | 07/24 | Project Data Management Plan initial release and periodic updates | WP8
D8.3 | 07/24 | Report on sustainability plans | WP8
D2.4 | 09/24 | Report on data sources discovery and integration for enabling data use and re-use in response to future outbreaks | WP2
D5.2 | 09/24 | Secondary use of vaccine trial data and biosamples | WP5
D8.1.3 | 09/24 | Project Handbook initial release and periodic updates | WP8

Code | Due | Description | Responsibility
M7.1 | 10/21 | Branding and communications guidelines | WP7
M7.2 | 11/21 | Launch of project website | WP7
M8.1 | 11/21 | Project mobilised. All governing boards and WPs established | WP8
M8.2 | 02/22 | DMP approved by the relevant project boards before submission | WP8
M1.1 | 03/22 | First support services in operation | WP1
M2.1 | 03/22 | Identified data sources have been registered in the BY-COVID reference catalogue | WP2
M5.1 | 02/22 | Compiled research questions and requirements Workshop 1 | WP5
M6.1 | 03/22 | Stakeholder engagement (initial scoping and draft monitoring approach) | WP6
M5.4 | 09/22 | FAIR open-source pipeline | WP5
M6.2 | 09/22 | Identified training needs and roadmap | WP6
M7.3 | 09/22 | Industry sector mapping report | WP7
M4.1 | 09/22 | Common analysis environment | WP4
M4.2 | 09/22 | Prototype Infectious diseases toolkit | WP4
M2.2 | 01/23 | Identified the preferred mechanisms for data access and use of Real-world data | WP2
M1.2 | 03/23 | First globally comprehensive data set | WP1
M3.1 | 03/23 | Initial set of resources metadata mapped, indexed, and discoverable in COVID-19 Data Portal | WP3
M5.2 | 03/23 | Compiled research questions and requirements Workshop 2 | WP5
M5.5 | 03/23 | Viral variant and health outcomes | WP5
M5.3 | 03/24 | Compiled research questions and requirements Workshop 3 | WP5
M2.3 | 07/24 | Report on upgrade of clinical trial data and metadata | WP2