Question: COSMIC non-coding mutants annotation through ANNOVAR
Santosh Anand wrote (2.1 years ago):

I am trying to annotate both coding and non-coding variants with information from the COSMIC database using ANNOVAR. ANNOVAR doesn't directly support the latest COSMIC release because of licensing issues. Instead, users are directed to build their own ANNOVAR database for COSMIC following these guidelines: http://annovar.openbioinformatics.org/en/latest/user-guide/filter/#cosmic-annotations

I am able to build the database for coding variants by following the guideline, but not the one for non-coding variants. The wording in the manual seems to suggest that this is not possible for non-coding variants:

COSMIC changed their data formats so non-coding mutations are no longer in the MutantExport file, so we can no longer calculate their occurrences in various tumors. COSMIC now provides a CosmicNCV.tsv file, but it is not really that informative as the cancer tissue information is missing from this file.

Is there a way to annotate non-coding COSMIC variants using ANNOVAR?

My failed attempt:

~/utils/annovar/prepare_annovar_user.pl --buildver hg19 -dbtype cosmic <(zcat CosmicNCV.tsv.gz) -vcf <(zcat CosmicNonCodingVariants.vcf.gz) > hg19_cosmicNonCoding80.txt 2> hg19_cosmicNonCoding80.log

Error: COSMIC MutantExport format error: column 17 should be 'Mutation ID'
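The error message suggests the script checks the header of the first input file for a 'Mutation ID' column at a fixed position. A minimal Python sketch for listing the header columns of the gzipped TSV, to see what is actually at that position, might look like the following. The helper names `header_columns` and `find_column` are hypothetical, and the 1-based numbering is an assumption about how the script counts columns.

```python
import gzip


def header_columns(path):
    """Return the header fields of a gzipped TSV as (position, name) pairs.

    Positions are 1-based here; whether ANNOVAR counts the same way is an
    assumption, so check the neighbouring positions as well.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        header = handle.readline().rstrip("\r\n")
    return list(enumerate(header.split("\t"), start=1))


def find_column(path, name):
    """Return the 1-based position of the column called `name`, or None."""
    for position, column in header_columns(path):
        if column == name:
            return position
    return None


# Example use (file name taken from the command above):
#   for position, column in header_columns("CosmicNCV.tsv.gz"):
#       print(position, column, sep="\t")
```

If 'Mutation ID' is missing or sits at a different position in CosmicNCV.tsv, that would be consistent with the manual's note that the non-coding file does not follow the MutantExport layout.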

EDIT: Cross-posted on the ANNOVAR discussion board. I will update if there is any lead. http://annovar.openbioinformatics.org/en/latest/user-guide/filter

modified 2.0 years ago • written 2.1 years ago by Santosh Anand

Hi, just a note: I used ANNOVAR for a long time and then switched to VEP and SnpEff, because I found some exonic variants that ANNOVAR annotated as intronic. I think the problem comes from the database ANNOVAR uses, but I never managed to update it...

modified 2.1 years ago • written 2.1 years ago by Titus

That might be because ANNOVAR used a different transcript. Deciding which transcript to use is extremely tricky, and none of the annotators solves this problem completely. Some use the canonical transcript, others the longest one, and others something else again :-(

written 2.0 years ago by Santosh Anand

If you go with VEP/snpEff + GEMINI you get a somewhat better approach to the transcript issue, although again not a perfect one. snpEff annotates for all transcripts and GEMINI stores them; it then basically picks the transcript with the highest predicted impact. This ends up giving you some false positive hits, with predicted high-impact variants in transcripts that aren't well supported, but at least you don't miss things.

written 2.0 years ago by Dan Gaston

it then basically picks the transcript with the highest predicted impact

based on what?

written 2.0 years ago by Santosh Anand

Most of the time it is based on the longest transcript for poorly annotated genes, if I remember correctly.

written 2.0 years ago by Titus

snpEff translates the nucleotide variant into the protein-level impact (Sequence Ontology terms). GEMINI categorizes those into HIGH, MED, and LOW categories. If more than one transcript has an impact in the same category, I believe it picks the longest transcript. The HIGH, MED, LOW split is basically just binning what you would expect: stop gain, frameshift, and splice donor/acceptor are high; medium is mostly missense mutations; low is synonymous, intronic, etc.

written 2.0 years ago by Dan Gaston
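To make the selection rule described in the comment above concrete, here is a rough Python sketch: bin each transcript's consequence into HIGH/MED/LOW, take the highest bin, and break ties by transcript length. This is only an illustration of the rule as described in this thread, not GEMINI's actual code. The `TranscriptAnnotation` class, the function names, the small term-to-bin table, and the choice to treat unlisted terms as LOW are all assumptions made for the example.

```python
from dataclasses import dataclass

# Illustrative binning only, following the examples given in the thread.
IMPACT_BIN = {
    "stop_gained": "HIGH",
    "frameshift_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "splice_acceptor_variant": "HIGH",
    "missense_variant": "MED",
    "synonymous_variant": "LOW",
    "intron_variant": "LOW",
}
BIN_RANK = {"HIGH": 3, "MED": 2, "LOW": 1}


@dataclass
class TranscriptAnnotation:
    transcript_id: str
    consequence: str  # Sequence Ontology term for this transcript
    length: int       # transcript length, used only as a tie-breaker


def impact_bin(consequence):
    """Map a consequence term to a bin; unlisted terms fall back to LOW (assumption)."""
    return IMPACT_BIN.get(consequence, "LOW")


def pick_transcript(annotations):
    """Pick the annotation in the highest impact bin, longest transcript on ties."""
    if not annotations:
        return None
    return max(
        annotations,
        key=lambda a: (BIN_RANK[impact_bin(a.consequence)], a.length),
    )
```

With this rule, a short, poorly supported transcript carrying a stop gain would win over a long, well-supported transcript carrying a missense change, which is the source of the false positives mentioned earlier in the thread.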

GEMINI categorizes those into HIGH, MED, and LOW categories.

Actually, snpEff does this categorization by itself (known as "Variant impact")!

modified 2.0 years ago • written 2.0 years ago by Santosh Anand

GEMINI re-categorizes based on its own criteria from the Sequence Ontology Terms.

modified 2.0 years ago • written 2.0 years ago by Dan Gaston

Thanks, I'll have a look.

written 2.0 years ago by Santosh Anand