Question: How to create a GTF file for influenza virus and build an ANNOVAR database for annotating virus variants?
Asked 3.9 years ago by bioinforesearchquestions (United States):

Dear All,

I am working on influenza virus and would like to annotate its variants using ANNOVAR, but I am having trouble building the database. I have only the 8 genes of the influenza virus, along with their RefSeq IDs. I tried to find a genome ID but couldn't, so I saved the FASTA sequences of all 8 genes in one file and used that file to build the ANNOVAR database. ANNOVAR then throws an error saying that I don't have a refGene.txt file.

Can I create a custom GTF file for influenza virus based on these 8 genes?

Has anyone faced a similar issue?

Can you suggest any other variant annotation tool besides ANNOVAR?

Answer by pld (3.9 years ago, United States):

The problem is that a GFF is basically a list of where genes and other features are located within a given chromosome (or genome, in the case of viruses). You can't make one from CDS sequences alone. You need the genome sequence and the locations of those CDS sequences within it.
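
To make the "list of locations" idea concrete, here is a minimal Python sketch of what one GTF record holds: nine tab-separated columns (sequence name, source, feature type, start, end, score, strand, frame, attributes). The `Feature` class, the `gtf_line` helper, the `segment_1` sequence name, and the coordinates are all placeholders invented for illustration; they are not taken from a real influenza record.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    seqname: str          # sequence ID; should match the FASTA header of the segment
    start: int            # 1-based, inclusive
    end: int              # 1-based, inclusive
    strand: str           # "+" or "-"
    gene_id: str
    transcript_id: str
    feature: str = "CDS"
    source: str = "manual"
    frame: str = "0"


def gtf_line(f: Feature) -> str:
    """Format one feature as a nine-column, tab-separated GTF record."""
    if f.start < 1 or f.end < f.start:
        raise ValueError("GTF coordinates are 1-based with start <= end")
    attributes = f'gene_id "{f.gene_id}"; transcript_id "{f.transcript_id}";'
    columns = [f.seqname, f.source, f.feature, str(f.start), str(f.end),
               ".", f.strand, f.frame, attributes]
    return "\t".join(columns)


# Placeholder values only: the coordinates are made up, not real annotation.
print(gtf_line(Feature("segment_1", 28, 2307, "+", "PB2", "PB2_tx")))
```

The point of the sketch is that every column except the attributes depends on knowing where the CDS sits on the segment, which is why the CDS FASTA alone is not enough.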

http://www.ncbi.nlm.nih.gov/genome/?term=influenza

There are a few genomes here. Some of them have GFFs, or you can use one of the available tools to convert the .gbk file to GTF. For viruses this can be tricky; I've switched to making them by hand, which isn't too hard.
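
As a rough idea of what a .gbk to GTF conversion involves, the sketch below pulls CDS features out of GenBank flat-file text using only the Python standard library. It is not one of the existing converters mentioned above. The function names are made up for this example, and it assumes simple locations (`a..b`, `join(...)`, `complement(...)`) that fit on one line; wrapped locations and mixed-strand joins are not handled.

```python
import re

FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)\s*$")   # feature key + location
GENE_RE = re.compile(r'^ {21}/gene="([^"]+)"')       # /gene qualifier line


def parse_location(loc):
    """Return (strand, [(start, end), ...]) for a simple GenBank location."""
    strand = "-" if "complement(" in loc else "+"
    spans = [(int(a), int(b)) for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", loc)]
    return strand, spans


def cds_to_gtf(seqname, gene, loc, source="genbank"):
    """Build one GTF CDS line per span, tracking the frame across spans."""
    strand, spans = parse_location(loc)
    ordered = spans if strand == "+" else spans[::-1]
    lines, consumed = [], 0
    for start, end in ordered:
        frame = (3 - consumed % 3) % 3   # bases to skip to reach the next codon
        attrs = f'gene_id "{gene}"; transcript_id "{gene}";'
        lines.append("\t".join([seqname, source, "CDS", str(start), str(end),
                                ".", strand, str(frame), attrs]))
        consumed += end - start + 1
    return lines


def genbank_to_gtf(text):
    """Return GTF lines for every CDS feature found in GenBank flat-file text."""
    seqname, loc, gene, in_features, out = None, None, None, False, []

    def flush():
        if loc is not None:
            out.extend(cds_to_gtf(seqname, gene or "unknown", loc))

    for line in text.splitlines():
        if line.startswith("VERSION"):
            seqname = line.split()[1]
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            flush()
            loc, in_features = None, False
        elif in_features:
            m = FEATURE_RE.match(line)
            if m:
                flush()                      # finish the previous CDS, if any
                loc = m.group(2) if m.group(1) == "CDS" else None
                gene = None
            elif loc is not None:
                g = GENE_RE.match(line)
                if g:
                    gene = g.group(1)
    flush()
    return out
```

Calling `genbank_to_gtf(open("segment.gbk").read())` (the file name is hypothetical) would give a list of lines to write out. Using the gene name as the transcript ID is a simplification, and it is worth checking the output against the record by eye, especially for spliced products.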

http://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?go=genomeset

There are genomes here, but they're in FASTA format, so you'd have to do some legwork to create a GFF.
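
Part of that legwork could look like the following Python sketch: given the segment sequences and a CDS sequence (both as FASTA text), find where the CDS matches exactly and report 1-based, inclusive coordinates that could go into a GFF/GTF. The helper names are invented for this example. It assumes the CDS is contiguous, on the forward strand, and uses the same alphabet as the segment (T versus U); spliced products such as M2 or NEP would not match as a single block.

```python
def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}; ID = first word of header."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif line and name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def locate_cds(segments, cds):
    """Return (segment_id, start, end) for every exact match of cds.

    Coordinates are 1-based and inclusive, as GFF/GTF expects.
    """
    cds = cds.upper()
    hits = []
    for seg_id, seq in segments.items():
        pos = seq.find(cds)
        while pos != -1:
            hits.append((seg_id, pos + 1, pos + len(cds)))
            pos = seq.find(cds, pos + 1)
    return hits
```

If a CDS returns no hit, or more than one, that is a sign the coordinates need to be worked out by hand from the GenBank record instead.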

You could also try here; if you're using publicly available data, this might be the best option.

http://www.fludb.org/brc/home.spg?decorator=influenza


Thanks Joe for your suggestions. For example, I am interested in these two strains: Influenza A virus (A/California/07/2009(H1N1)) and (A/Texas/50/2012(H3N2)). From the following link, http://www.ncbi.nlm.nih.gov/genome/10290, I have the following IDs:

NC_026431.1
NC_026432.1
NC_026433.1
NC_026434.1
NC_026435.1
NC_026436.1
NC_026437.1
NC_026438.1

I don't have whole-genome records for these two strains; instead, I have RefSeq IDs for the individual segments. So which fields do I need to capture manually?

(Reply written 3.9 years ago by bioinforesearchquestions)