Tool: A resource with diverse databases and tools for SARS-CoV-2 (COVID-19) bioinformatics -- Updates
4.1 years ago
Michael ▴ 270

Dear bioinformatics and genomics community,

We are in the process of setting up resources helpful for bioinformatics/genomics analyses related to SARS-CoV-2 (COVID-19).

So far we offer the following databases/tools on our website:

  • Up-to-date Centrifuge database including the SARS-CoV-2 genome (alongside prokaryote, virus and human genomes)
  • EDIT: We added a Kraken2 database with viral genomes plus the human genome, including SARS-CoV-2
  • EDIT: We added a MetaMaps database "human and viral" including 88 SARS-CoV-2 genomes
  • Database of kmers specific to SARS-CoV-2 (from 19 bp to 25 bp), based on 89 SARS-CoV-2 genomes and an extensive check for off-targets
  • Database of kmers found in the 89 (EDIT: Now 153) SARS-CoV-2 genomes examined (but not exclusively in SARS-CoV-2, in contrast to the specific kmers mentioned above)
  • Multiple sequence alignment (MSA) of all public complete SARS-CoV-2 genome assemblies (n=93 EDIT: Now 102)
  • Consensus based on the MSA of 102 complete genomes
  • All complete SARS-CoV-2 genomes as one FASTA file (n=93 EDIT: Now 102). Yes, this is pretty trivial to generate, but some people might not be very familiar with NCBI's databases.
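
To make the difference between the two kmer databases concrete, here is a minimal Python sketch. It is an illustration only and not the pipeline used to build the databases: the function names are hypothetical, "found in the genomes" is assumed to mean "present in every genome examined", reverse complements are ignored, and the off-target check is reduced to plain set subtraction against a few sequences held in memory.

```python
from typing import Iterable, Optional, Set

VALID_BASES = set("ACGT")


def kmers(seq: str, k: int) -> Set[str]:
    """Return all k-mers of seq, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES
    }


def shared_kmers(genomes: Iterable[str], k: int) -> Set[str]:
    """k-mers present in every genome (assumed meaning of 'found in the genomes')."""
    shared: Optional[Set[str]] = None
    for genome in genomes:
        current = kmers(genome, k)
        shared = current if shared is None else shared & current
    return shared if shared is not None else set()


def specific_kmers(shared: Set[str], off_targets: Iterable[str], k: int) -> Set[str]:
    """Drop every shared k-mer that also occurs in any off-target sequence."""
    specific = set(shared)
    for seq in off_targets:
        specific -= kmers(seq, k)
    return specific


if __name__ == "__main__":
    # Toy sequences, not real genomes.
    toy_genomes = ["ACGTACGT", "TACGTAAA"]
    toy_off_targets = ["GGACGTGG"]
    common = shared_kmers(toy_genomes, 4)
    print(sorted(common))
    print(sorted(specific_kmers(common, toy_off_targets, 4)))
```

For real data the sequences would be read from the FASTA file, and the off-target set (human genome, other viruses, prokaryotes) is far too large for this in-memory approach, so a dedicated kmer counter or index would be needed.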

We are currently extending our resources with new databases/tools and at the same time are working on improving the resources mentioned above.

Please visit

Please be aware that this is a work in progress; if something is missing or unfinished, check again in a couple of hours. This is an effort to rapidly provide the community with tools related to SARS-CoV-2 genomics and bioinformatics. We really hope it turns out to be helpful!

If you want to help, please send an email to the contact mentioned on the website.

Tell us what is missing and what you think can be improved.

Best regards,


  • UPDATE1: We have now added a Centrifuge database "human+virus RefSeq" including SARS-CoV-2
  • UPDATE2: We added a Kraken2 database "human+virus RefSeq" including SARS-CoV-2
  • UPDATE3: We added more genomes to the kmer analysis (21st March 2020)
  • UPDATE4: We provide the 100 complete genomes used in the kmer analysis as one FASTA, 93 from NCBI GenBank, 7 from China National GeneBank
  • UPDATE5: We have now added a Centrifuge database "human+PROKARYOTES+virus RefSeq" including SARS-CoV-2
  • UPDATE6: More complete genomes added to MSA and for download
  • UPDATE7: We added a MetaMaps database "human + viral"
  • UPDATE8: We added more genomes to the kmer analysis (n=153) (29th March 2020)
SARS-CoV-2 Corona Databases COVID-19 • 2.6k views

SARS-CoV-2 is the official name of this virus. Correct on your website but not in this post.


Sure, I sometimes shortened it on the resources page, but I will adjust it here.


Thanks for putting this together.

The real shame is that GISAID still locks away the genomic data behind a phony registration window and extremely restrictive licensing.

We need to call on the German government to stop this practice immediately. I am going to make a new post on it.


Cool, thanks for the petition!

I am actually trying to get hold of the GISAID assemblies and integrate them into analyses where the actual sequence is not directly visible. This should be fine, I guess.


Yes, exactly. Mention this on the petition thread.

I have known about the GISAID practice for a while now. Reading about your resource made me realize that you, like many others, are restricted from doing all that you can.

EDIT: You just did. Great!

