What to do with a bunch of Viral Genomes
2.2 years ago
tw617 ▴ 40

I wrote a script to retrieve all of the recorded full-length human lung-infecting coronaviruses in GenBank (Virus Variation database, NCBI). After removing the very small files and those that didn't work, I'm left with ~200 FASTA files to work with. What are some analyses I can run with the new COVID-19 genome? I made a BLAST database and ran some blastn queries, but I'm wondering what bioinformatic analyses are typically run on novel viruses.

My blastn query with word size 7 returned a few hits, but I'm not sure how to interpret these results or what to do with that data.

This is just for fun/educational. I don't have much experience with viruses or comparative genomics.

Thanks

2.2 years ago
GenoMax 115k

Since you selected these strains (which are similar), there is not much point in doing a BLAST analysis. Instead, you can start by doing a multiple sequence alignment of the sequences. If you are serious about learning, there are command-line versions of MAFFT, T-COFFEE, Clustal, and MUSCLE. You will also find web front-ends for these tools (if you search around). A multiple sequence alignment gives you an idea of how the sequences relate to each other. This can then be used to infer possible evolutionary relationships among the genomes.
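If you go the command-line route, here is a minimal sketch in Python of what that workflow could look like: pool the individual FASTA files into one multi-FASTA, align it with MAFFT, then compute a crude pairwise percent identity from the alignment. The directory and file names (`genomes`, `all_genomes.fasta`, `aligned.fasta`) and all function names are placeholders I made up for illustration. `mafft --auto` lets MAFFT choose a strategy itself. Treat the identity number as a first look only; it is not a substitute for a proper phylogenetic analysis.

```python
import shutil
import subprocess
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def combine_fastas(fasta_dir, combined_path, pattern="*.fasta"):
    """Pool every FASTA file in fasta_dir into one multi-FASTA; return record count."""
    combined_path = Path(combined_path)
    records = []
    for path in sorted(Path(fasta_dir).glob(pattern)):
        if path.resolve() == combined_path.resolve():
            continue  # do not re-read our own output file
        records.extend(read_fasta(path))
    with open(combined_path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n{seq}\n")
    return len(records)


def run_mafft(combined_path, aligned_path):
    """Align with MAFFT if it is on PATH; return False if it is not installed."""
    if shutil.which("mafft") is None:
        return False
    with open(aligned_path, "w") as out:
        # MAFFT writes the alignment to stdout and progress messages to stderr.
        subprocess.run(["mafft", "--auto", str(combined_path)], stdout=out, check=True)
    return True


def percent_identity(a, b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Example usage (hypothetical paths, uncomment to run):
# combine_fastas("genomes", "all_genomes.fasta")
# if run_mafft("all_genomes.fasta", "aligned.fasta"):
#     aligned = read_fasta("aligned.fasta")
#     (name_a, seq_a), (name_b, seq_b) = aligned[0], aligned[1]
#     print(name_a, name_b, round(percent_identity(seq_a, seq_b), 2))
```

Aligning ~200 genomes of roughly 30 kb each can take a while, so it may be worth trying a handful of sequences first.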

MEGA is user-friendly software with a GUI for doing the above analysis. It has a pretty good online manual as well.

Thank you very much! I will report back with my findings (if I can figure it out :)).

We have some bioinformatics resources related to SARS-CoV-2 (including an MSA with MUSCLE, as suggested above). You can check them out; maybe you'll find something of interest: https://genexa.ch/sars2-bioinformatics-resources/