Are SARS-CoV-2 Spike sequences available online somewhere? I've written some code to build phylogenetic trees and infer ancestral sequences, and I'd like to try it out on real-world data.
You could download the genomes from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/?term=txid2697049[Organism:noexp]) and use a tool like Biopython to extract the Spike region. Alternatively, if you are eligible, register for a GISAID account; once you have been accepted, you can directly download alignments of the Spike region (nucleotide or amino acid).
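For the Biopython route, something along these lines might work as a starting point. It is only a sketch: it assumes you downloaded the records in GenBank format (not FASTA, which carries no annotations) and that each record annotates the Spike CDS with the gene qualifier "S", as the reference genome does. The function names `extract_cds` and `spike_from_genbank` and the file name in the usage note are made up for illustration.

```python
def extract_cds(record, gene="S"):
    """Return the nucleotide sequence of the first CDS annotated with `gene`, or None."""
    for feature in record.features:
        # GenBank qualifiers are stored as lists, e.g. {"gene": ["S"]}
        if feature.type == "CDS" and gene in feature.qualifiers.get("gene", []):
            return feature.extract(record.seq)
    return None


def spike_from_genbank(path, gene="S"):
    """Yield (record id, CDS sequence) for each record in a GenBank file that annotates `gene`."""
    from Bio import SeqIO  # imported here so extract_cds stays usable without Biopython

    for record in SeqIO.parse(path, "genbank"):
        cds = extract_cds(record, gene)
        if cds is not None:
            yield record.id, cds
```

You would then loop over `spike_from_genbank("sars_cov_2.gb")` (hypothetical file name) and write the sequences out for alignment. Records that lack the annotation are silently skipped, so check how many you get back.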
The suggestion from GenoMax above is the easiest way. Go to https://www.ncbi.nlm.nih.gov/datasets/coronavirus/proteins/, tick the S gene, and click download.
It is easier to download the proteins from the NCBI Datasets page for SARS-CoV-2. Entire genomes are also available from that page.