Spike protein sequences for COVID-19 variants
2.4 years ago
offbynull • 0

Are these available online somewhere? I've written up some code to build phylogenetic trees and infer ancestral sequences. I'd like to try them out on real-world data.

COVID COVID-19 • 1.0k views

It is easier to download the proteins from the NCBI Datasets page for SARS-CoV-2. Entire genomes are also available from that page.

2.4 years ago
cfos4698 ★ 1.1k

You could download the genomes from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/?term=txid2697049[Organism:noexp]) and use a tool like Biopython to extract the Spike region. Alternatively, if you are eligible, you can register for a GISAID account; once accepted, you can directly download alignments of the Spike region (nucleotide or amino acid).
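
As a rough illustration of the extraction step, here is a minimal standard-library sketch (Biopython's `SeqIO` and `Seq.translate` could replace the hand-written parsing and codon table). It assumes each genome is collinear with the Wuhan-Hu-1 reference NC_045512.2, where the S CDS sits at 21563..25384 (1-based). Genomes with insertions or deletions before or inside S would need to be aligned to the reference first. The function names are made up for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# S CDS coordinates in the reference NC_045512.2 (1-based, inclusive).
S_START, S_END = 21563, 25384


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def translate(nt):
    """Translate a nucleotide string; codons with ambiguous bases become 'X'."""
    nt = nt.upper().replace("U", "T")
    usable = len(nt) - len(nt) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def extract_spike(genome, start=S_START, end=S_END):
    """Slice the S region by reference coordinates and translate it.

    Assumption: the genome has no indels relative to the reference before `end`.
    """
    cds = genome[start - 1:end]
    return translate(cds).rstrip("*")  # drop the terminal stop codon
```

To use it, open the downloaded genome FASTA, pass the file handle to `read_fasta`, and call `extract_spike` on each sequence. A result far from 1273 amino acids, or one containing an internal `*`, suggests the coordinate assumption does not hold for that genome.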


Sorry, I'm not too familiar with NCBI's interface. Is there a way to download these ~2 million sequences (or some subset) in bulk, or do I have to write a script to download each FASTA individually?


The suggestion from GenoMax above is the easiest way. Go here: https://www.ncbi.nlm.nih.gov/datasets/coronavirus/proteins/ then tick the S gene and click download.
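
Once the protein FASTA is downloaded, a short script can cut it down to a manageable set before building trees. The sketch below is only one possible workflow, not something the download page provides. It keeps sequences close to the reference Spike length (1273 aa), drops those with many ambiguous residues, and collapses identical sequences. The thresholds, output header format, and function names are hypothetical.

```python
REFERENCE_SPIKE_LENGTH = 1273  # amino acids in the Wuhan-Hu-1 Spike protein


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_spikes(records, max_length_diff=20, max_unknown=10):
    """Group headers by identical sequence, skipping low-quality records.

    Both thresholds are arbitrary illustration values, not recommendations.
    """
    groups = {}
    for header, seq in records:
        seq = seq.upper().rstrip("*")
        if abs(len(seq) - REFERENCE_SPIKE_LENGTH) > max_length_diff:
            continue  # truncated or otherwise unexpected length
        if seq.count("X") > max_unknown:
            continue  # too many ambiguous residues
        groups.setdefault(seq, []).append(header)
    return groups


def format_fasta(groups, width=70):
    """Render the grouped sequences as FASTA text, one record per unique sequence."""
    out = []
    for n, (seq, headers) in enumerate(groups.items(), 1):
        first_id = (headers[0].split() or ["unknown"])[0]
        out.append(f">spike_{n} count={len(headers)} example={first_id}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Passing an open file handle to `read_fasta` and the result to `unique_spikes` gives a dictionary of distinct Spike sequences. Because many submitted genomes carry identical Spike proteins, this step may reduce the input considerably before alignment and tree inference.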
