Question: Variant Annotation Tools For Viral Genomes
5.1 years ago, potassiumiodide0990 (United States) wrote:

Hi everyone!

Are there any variant annotation tools for viral genomes (like SeattleSeq is for the whole human genome) that are freely available and easy to use? Can the VCF format be used for viral genomes too?

variant vcftools • 2.4k views
5.1 years ago, Charles Warden (Duarte, CA) wrote:

I think snpEff can work with any genome, and I think it even has some pre-compiled viral references.

That said, I have tried using general amino acid substitution patterns to prioritize among a handful of non-synonymous viral variants, and I found that the least likely substitution was not actually the variant causing the phenotype of interest (as determined by comparing a wider variety of strain sequences). So I'm not sure that snpEff is really going to be the best strategy.
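For what it's worth, the core of what an annotator reports for a coding SNV is easy to sketch. The Python below is only an illustration, not how snpEff is implemented: the function name `annotate_snv` and the toy coding sequence are made up for the example, and it assumes the standard genetic code, a forward-strand CDS with no frameshifts or splicing, and a 1-based position counted within the CDS.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def annotate_snv(cds, pos, alt):
    """Classify a single-base substitution inside a coding sequence.

    cds: coding sequence (DNA, forward strand, starts at codon position 1)
    pos: 1-based position of the substitution within the CDS
    alt: the substituted base
    """
    cds = cds.upper()
    idx = pos - 1
    start = idx - idx % 3               # index of the first base of the codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = idx - start
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop_gained"
    else:
        effect = "missense"
    return {
        "codon": f"{ref_codon}>{alt_codon}",
        "change": f"{ref_aa}{start // 3 + 1}{alt_aa}",
        "effect": effect,
    }


# Toy CDS encoding M-K-V; substitute the middle base of the second codon.
print(annotate_snv("ATGAAAGTG", 5, "G"))
```

This only tells you what changed at the protein level. It says nothing about whether the change matters, which is the limitation described above.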

If you are interested in HSV, I think these are valuable resources to get an idea about natural variation. Otherwise, I'm afraid you'll need to find comparable resources among your own niche of virus research. Either way, you'll have to do some extra work beyond just providing a .vcf file and getting annotation information.
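On the VCF part of the question: as far as I know, nothing in the format is specific to human data. The CHROM column just has to match the sequence name in whatever reference you aligned to. Here is a minimal Python sketch that writes a sites-only VCF for a single viral contig. The helper name `make_vcf`, the contig name, its length, and the variants are all made-up placeholders, not real HIV or HSV data.

```python
def make_vcf(contig, length, variants):
    """Build a minimal sites-only VCF (v4.2) as a string.

    contig:   sequence name, which must match the reference FASTA header
    length:   contig length, recorded in the ##contig header line
    variants: iterable of (pos, ref, alt) with 1-based positions
    """
    lines = [
        "##fileformat=VCFv4.2",
        f"##contig=<ID={contig},length={length}>",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for pos, ref, alt in sorted(variants):
        # "." marks a missing value for ID, QUAL, FILTER and INFO.
        lines.append(f"{contig}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


# Placeholder contig and variants, for illustration only.
print(make_vcf("my_viral_ref", 9000, [(2600, "A", "G"), (2555, "C", "T")]))
```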


Hi there! Actually, I was planning on working with HIV. But thanks for your help... I'll try to get SnpEff working. Also, where do I get the HIV-1 reference genome?

written 5.1 years ago by potassiumiodide0990

You can look for sequences in NCBI Nucleotide. However, I urge you not to use a single reference genome to interpret your results. Otherwise, you run the risk I described above: you may pick the mutant that is predicted to be the most deleterious, but there is a good chance the prediction is not accurate. Comparing variants across multiple strains with and without your phenotype of interest is a much better way of narrowing down your options.
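The multi-strain comparison can be sketched in a few lines of Python. This is only a rough illustration under strong assumptions: the sequences are already aligned to the same length, the phenotype is a simple yes/no label per strain, and the function name `phenotype_associated_sites` and the toy strains are made up. It reports positions where every strain with the phenotype carries the same base and no strain without the phenotype carries it.

```python
def phenotype_associated_sites(aligned, has_phenotype):
    """Find alignment columns that separate strains by phenotype.

    aligned:       dict of strain name -> aligned sequence (equal lengths)
    has_phenotype: dict of strain name -> True/False
    Returns a list of (1-based position, base in phenotype strains,
    sorted bases seen in the other strains).
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    with_pheno = [s for s in aligned if has_phenotype[s]]
    without = [s for s in aligned if not has_phenotype[s]]
    if not with_pheno or not without:
        raise ValueError("need at least one strain in each group")

    hits = []
    for i in range(lengths.pop()):
        pos_bases = {aligned[s][i] for s in with_pheno}
        neg_bases = {aligned[s][i] for s in without}
        # Keep the column only if the phenotype group agrees on one base
        # and that base never appears in the other group.
        if len(pos_bases) == 1 and not (pos_bases & neg_bases):
            hits.append((i + 1, pos_bases.pop(), sorted(neg_bases)))
    return hits


# Toy alignment: strains A and B show the phenotype, C and D do not.
strains = {"A": "ATGCA", "B": "ATGCT", "C": "ACGCA", "D": "ACGCT"}
labels = {"A": True, "B": True, "C": False, "D": False}
print(phenotype_associated_sites(strains, labels))
```

With only a handful of strains, a perfect split like this can easily happen by chance, so treat any hits as candidates to follow up rather than as answers.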

written 5.1 years ago by Charles Warden

Okay... thanks for the suggestion. What if I'm only interested in the HIV-1 reverse transcriptase (RT) nucleotide sequence? I want a reference for it, and then I want to apply variant annotation techniques against that reference to assess my sample set. What would you suggest? Also, where do I get a reference just for this protein? (GenBank doesn't really seem to have one... or maybe I'm searching it wrong.)

modified 5.1 years ago • written 5.1 years ago by potassiumiodide0990

NCBI contains all types of nucleotide sequences (not just genomic - also transcript, EST, cDNA, etc.). You can search "HIV-1 reverse transcriptase" to get a lot of hits, but I think you want something more specific.

You can use the same strategy for NCBI Protein (you can switch between databases using the pull-down tab). However, I think SnpEff will only work with nucleotide sequences (and I'm not sure if it is really well designed for a single gene, either way).
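If you do end up working from a full-genome nucleotide record, one option is to cut the region of interest out yourself and convert genome positions to region-relative ones. A minimal Python sketch follows. The helper names, the toy FASTA, and the coordinates are placeholders I made up; the real start and end for RT would have to come from the feature annotation of whichever reference record you choose.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break                    # stop at the second record
            header = line[1:]
        else:
            chunks.append(line.upper())
    return header, "".join(chunks)


def extract_region(seq, start, end):
    """Return seq[start..end] using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


def to_region_position(genome_pos, start, end):
    """Convert a 1-based genome position to a 1-based region position."""
    if not start <= genome_pos <= end:
        raise ValueError("position is outside the region")
    return genome_pos - start + 1


# Toy record and placeholder coordinates, not real HIV-1 data.
toy_fasta = ">toy_genome\nACGTAC\nGTTTGA\n"
name, genome = read_fasta(toy_fasta)
region = extract_region(genome, 4, 9)
print(name, region, to_region_position(7, 4, 9))
```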

Perhaps something like Base-By-Base is better for finding variants among NCBI nucleotide entries (as opposed to Illumina short-read data, for example).

modified 5.1 years ago • written 5.1 years ago by Charles Warden