Question: variant annotation issue
21 months ago, prasundutta87 wrote:

Hi,

I have recently performed a variant annotation using SnpEff in water buffalo. The variants were called with the bcftools call program (multiallelic caller) from the Samtools package and are based on RNAseq reads. The BAM files were produced by aligning the RNAseq reads to the water buffalo genome available on the NCBI Genome website.

SnpEff annotated many variants as downstream gene variants, which does not seem right: the variants were called from RNAseq reads, so I expected them to fall within transcribed regions. Can anyone shed any light on this?

modified 21 months ago by Brian Bushnell • written 21 months ago by prasundutta87

Are you sure you used the same assembly for mapping and annotation?

written 21 months ago by Pierre Lindenbaum
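
One rough way to check the assembly question is to compare the sequence names in the VCF against those in the annotation file that the SnpEff database was built from. The sketch below is only an illustration: the function names and the contig names in the toy records are made up, and it assumes both files are plain tab-separated text with the sequence name in column 1.

```python
def sequence_names(lines):
    """Return the set of names found in column 1 of tab-separated records,
    skipping header/comment lines (works for both VCF and GFF/GTF)."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_annotation(vcf_lines, gff_lines):
    """Sequence names that carry variants but have no annotated features."""
    return sorted(sequence_names(vcf_lines) - sequence_names(gff_lines))


# Toy records with made-up contig names (not real buffalo accessions).
vcf_demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig_1\t100\t.\tA\tG\t50\t.\t.",
    "contig_3\t200\t.\tC\tT\t60\t.\t.",
]
gff_demo = [
    "##gff-version 3",
    "contig_1\tRefSeq\tgene\t1\t5000\t.\t+\t.\tID=gene0",
    "contig_2\tRefSeq\tgene\t1\t3000\t.\t-\t.\tID=gene1",
]
print(missing_from_annotation(vcf_demo, gff_demo))
```

On real files you would pass open file handles instead of the toy lists. A handful of unannotated contigs is normal for a draft assembly; the warning sign would be most or all names failing to match, which usually points to a different assembly version or naming scheme.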

Yes, I did. There is only one draft assembly in the NCBI Genome database for the Mediterranean water buffalo.

written 21 months ago by prasundutta87

I hope I am not missing anything logically. I am not sure what exactly to look for or check in order to find the reason behind it. Any suggestion is welcome.

modified 21 months ago • written 21 months ago by prasundutta87
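
One concrete thing to look at is how the effects are distributed and how far the "downstream" variants sit from the annotated gene end. The sketch below assumes the VCF carries SnpEff's pipe-separated ANN field, with the effect in the 2nd subfield and the distance to the feature in the 15th; that layout is worth confirming against the ANN header line in your own file. The helper names and toy records are made up for illustration.

```python
from collections import Counter


def parse_ann(info):
    """Yield (effect, distance) pairs from the ANN entries of a VCF INFO string."""
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        for ann in entry[len("ANN="):].split(","):
            parts = ann.split("|")
            if len(parts) < 15:
                continue  # not the 16-subfield layout assumed here
            distance = int(parts[14]) if parts[14].isdigit() else None
            # One annotation can combine several effects with '&'.
            for effect in parts[1].split("&"):
                yield effect, distance


def summarise(vcf_lines, effect_of_interest="downstream_gene_variant"):
    """Count every effect and collect distances for one effect of interest."""
    counts, distances = Counter(), []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        for effect, distance in parse_ann(fields[7]):
            counts[effect] += 1
            if effect == effect_of_interest and distance is not None:
                distances.append(distance)
    return counts, distances


def make_ann(effect, distance=""):
    """Build a toy 16-subfield ANN entry (illustration only)."""
    fields = [""] * 16
    fields[0], fields[1], fields[14] = "G", effect, str(distance)
    return "|".join(fields)


demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig_1\t100\t.\tA\tG\t50\t.\tDP=10;ANN=" + make_ann("downstream_gene_variant", 120),
    "contig_1\t900\t.\tA\tG\t50\t.\tANN=" + make_ann("missense_variant")
    + "," + make_ann("downstream_gene_variant", 4300),
    "contig_2\t50\t.\tA\tG\t50\t.\tANN=" + make_ann("3_prime_UTR_variant", 35),
]
counts, distances = summarise(demo)
print(counts.most_common())
print(sorted(distances))
```

If most downstream hits lie within a few hundred bases of the annotated gene end, a plausible (but unproven) explanation is that the 3' UTRs in the draft annotation are missing or too short, so expressed sequence falls just outside the gene model. As far as I know, SnpEff assigns the downstream label within a fixed window past the gene end, and that window can be changed or switched off on the command line; check the SnpEff help output for the exact options.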

You could try annotating with a different annotator and compare results. That could probably help you understand what's going wrong.

written 21 months ago by vakul.mohanty

Yes, I am trying to do that with VEP, but Ensembl does not have the water buffalo genome. I am trying to use their standalone Perl version with a custom setup based on the water buffalo genome and annotation file from the NCBI Genome database.

written 21 months ago by prasundutta87
21 months ago, Brian Bushnell (Walnut Creek, USA) wrote:

Genomes are often higher quality than transcriptomes. If the transcriptome is incomplete (which it always is, for vertebrates) you will get RNA reads mapping to genes that were not annotated. This is a good thing, as it can advance science! You've possibly discovered previously unknown genes.

written 21 months ago by Brian Bushnell

The thing is that I annotated the variants with SnpEff using the -onlyprotein option (meaning: only use protein-coding transcripts). I thought that because of this, non-coding genes and probable genes would be avoided, since I am focusing on only a specific type of gene. I guess the -onlyprotein option did not work correctly, or I misunderstood it.

written 21 months ago by prasundutta87

It seems like you might be confused about the nature of annotation. The water buffalo genome is not complete. The water buffalo transcriptome is even less complete. So, you can map reads to an official water buffalo reference, but the results will not be perfectly correct. It does not matter which variant-caller or annotation software you use - if the reference or annotation is incomplete, you will get incorrect results.

written 21 months ago by Brian Bushnell

Yes, the genome/transcriptome is not complete. What I will do is complement my RNAseq variant calling with variant calls from DNAseq data and hopefully get a consensus.

written 21 months ago by prasundutta87
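
The consensus idea can be sketched as a set intersection on (chromosome, position, ref, alt). This is a minimal illustration rather than a recommended pipeline: it assumes both call sets were made against the same reference and use the same allele representation (for example, both left-aligned and normalized), and the function name and toy records are made up. In practice a dedicated tool such as bcftools isec would be the more usual route.

```python
def variant_keys(vcf_lines):
    """Return a set of (chrom, pos, ref, alt) tuples, one per ALT allele."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        # Multiallelic records list several ALT alleles separated by commas.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


# Toy call sets with made-up contig names and positions.
rna_calls = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig_1\t100\t.\tA\tG\t50\t.\t.",
    "contig_1\t250\t.\tC\tT,G\t40\t.\t.",
    "contig_2\t75\t.\tG\tA\t30\t.\t.",
]
dna_calls = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig_1\t100\t.\tA\tG\t99\t.\t.",
    "contig_1\t250\t.\tC\tT\t80\t.\t.",
    "contig_3\t10\t.\tT\tC\t70\t.\t.",
]

rna, dna = variant_keys(rna_calls), variant_keys(dna_calls)
consensus = sorted(rna & dna)   # supported by both data types
rna_only = sorted(rna - dna)    # candidates for RNA editing, artefacts, or low DNA coverage
print(consensus)
print(rna_only)
```

Note that DNAseq support confirms that a variant is real, but it does not change where the variant falls relative to the gene models, so the downstream labels would still depend on the annotation.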