Sequence Retrieval of SARS-CoV Complete Genomes
3.4 years ago
pfee418 ▴ 10

Hi guys, I want to retrieve only complete virus genomes of SARS-CoV (not SARS-CoV-2) and its strains. I tried the NCBI Virus database, but the search results mixed SARS-CoV and SARS-CoV-2 genomes together. Is there a way to download only SARS-CoV complete genomes? I'm also open to using databases other than NCBI Virus.

Thank you in advance for all the suggestions and opinions.

alignment genome sarscov • 1.2k views
3.4 years ago
vkkodali_ncbi ★ 3.7k

You can use NCBI Datasets for this. Navigate to the Coronavirus genomes page to use the web-based interface or use the command-line tool as follows:

## for SARS-CoV-2 genomes
datasets download virus genome taxon sars-cov2 --complete-only

## for SARS-CoV genomes (by NCBI Taxonomy ID)
datasets download virus genome taxon 694009 --complete-only

It will download a package containing:

  • genomic.fna (genomic sequences)
  • cds.fna (nucleotide coding sequences)
  • protein.faa (protein sequences)
  • protein.gpff (protein sequence and annotation in GenPept flat file format)
  • protein structures in PDB format
  • data_report.jsonl (data report with viral metadata)
  • virus_dataset.md (README containing details on sequence file data content and other information)
  • dataset_catalog.json (a list of files and file types included in the dataset)

There are additional options that allow you to choose only specific file types for download.
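
If the downloaded package still mixes SARS-CoV and SARS-CoV-2 records (taxid 694009 is, as far as I know, the broader SARS-related coronavirus group, so it may also cover SARS-CoV-2), one option is to filter genomic.fna afterwards by header text. The following is only a minimal sketch: the function name `drop_sars_cov2` is hypothetical, and it assumes that SARS-CoV-2 records can be recognised by "coronavirus 2" or "SARS-CoV-2" in the FASTA header, which should be checked against the actual headers first.

```shell
# Hypothetical helper (not part of the datasets tool): print only the FASTA
# records whose header does NOT look like a SARS-CoV-2 record.
# Assumption: SARS-CoV-2 headers contain "SARS-CoV-2" or "coronavirus 2"
# (case-insensitive). "coronavirus 2" must not be followed by another letter
# or digit, so a name such as "coronavirus 229E" is not matched by mistake.
drop_sars_cov2() {
  awk '
    # On every header line, decide whether to keep the record that follows.
    /^>/ { keep = (tolower($0) !~ /sars-cov-2|coronavirus 2([^0-9a-z]|$)/) }
    # Print the header and its sequence lines only while keep is true.
    keep
  ' "$1"
}
```

For example, `drop_sars_cov2 genomic.fna > sars_cov_only.fna` would write the remaining records to a new file (both file names are placeholders). Matching on header text is fragile, so filtering on the taxonomy information in the data report is likely the safer route when it is available.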


OP does NOT want SARS-CoV-2 genomes. Should your answer change to just sars-cov?


Thank you for noticing that I only want to retrieve SARS-CoV :)


Hi there, thank you for the suggestions and the links. It looks like I have found a way to identify the SARS-CoV genomes through these websites. On the Coronavirus genomes page, the details can be downloaded from the "Taxonomy" section. I downloaded the CSV file, filtered out all the SARS-CoV-2 strain entries, and kept the SARS-CoV strain entries. Thank you, the links are useful.


Yes, I missed that important detail. That said, the same tool can be used for SARS-CoV as well. If that is what the OP is looking for, passing the taxid to the datasets command will do the trick. I will update my response.

3.4 years ago
GenoMax 141k

peifei0418: Since genome sequencing was not as prevalent in the early 2000s, you are not going to find hundreds of genomes of the original coronavirus. There are only 2 entries for Bat Coronaviruses here.


Oh, I see. So this means that there will be very few actual SARS-CoV strain sequences/genomes?


That is likely going to be the case.
