Hi all,
I am trying to create one viral reference file with all viral RefSeq genomes known in a human host. Basically something like the file created here but updated for 2021:
Create viral reference
We were interested in exploring all viruses existing in humans. So we first obtained reference genomes of all known and sequenced human viruses obtained from NCBI (as of Sep 2015), and merged them into one file (referred to as the "viral reference file") in fasta file format. Merge all virus fasta file into one big fasta file called viruses.fa
Reference for the above citation: GitHub viGEN tutorial
I'd like to know the appropriate way to merge the files into a single *.fa file. Alternatively, if anyone has recently come across a published reference file like this, that would also help at this point.
All the best
taxID) you are interested in. The .fa files can simply be cat-ed together to make a multi-fasta file.
This part is going to be tricky. Unless you have a clear list of viruses you are interested in, this information may not always be available.
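For anyone who prefers a script over cat, here is a minimal Python sketch of the same merge. The function name merge_fasta is made up for illustration, and it assumes the input files are plain, uncompressed FASTA. The only thing it does beyond cat is add a missing final newline, so the next file's header never ends up glued to the previous sequence line.

```python
def merge_fasta(input_paths, output_path):
    """Concatenate FASTA files into one multi-FASTA file.

    Hypothetical helper, equivalent to `cat a.fa b.fa > out.fa`, except that
    a newline is added after any file that does not end with one.
    Returns the number of FASTA records (header lines) written.
    """
    n_records = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            last_line = "\n"  # treat an empty file as already terminated
            with open(path) as src:
                for line in src:
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last_line = line
            # Guard against a file whose last line lacks a trailing newline.
            if not last_line.endswith("\n"):
                out.write("\n")
    return n_records
```

A call such as merge_fasta(sorted(pathlib.Path("genomes").glob("*.fa")), "viruses.fa") would merge every .fa file in a folder; the folder name "genomes" is just a placeholder.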
Hi GenoMax,
Thanks for the answer.
Which part did you mean is tricky: filtering the viral RefSeq genomes by human host? If so, do you think the following link solves it?
https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10239&host=human
Best