If you annotate a VCF with snpEff, how do you know whether the annotations refer to the alternate or the reference allele?
3.0 years ago
boxate1618 ▴ 60

For example, I run snpEff my_vcf.vcf

where my_vcf.vcf is

#CHROM    POS    REF    ALT
chr1    97828139    G    A

This gives the annotation stop_gained. How do I know that this refers to the presence of the alternate allele A? Does snpEff really look at the ref and alt alleles, or does it just assign the annotation based on position?

annotation snpsift snpeff vep • 2.2k views
If it says stop_gained, the most direct interpretation is that the presence of A creates a stop codon.

Does snpEff really go through and look at ref and alt alleles, or does it just update the annotation based on position?

I am a bit uncertain what you mean here. Why does it matter to you "how" it knows that a stop codon is gained? Are you concerned that in some cases the annotation might be incorrect?

As always, visualize your VCF relative to the annotation in IGV. When zoomed in, and if translation tables are enabled in the view, you can see the wild-type codons in the different frames and immediately validate any statement.
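The manual check described above can also be sketched in code. The following is a minimal illustration, not snpEff's implementation: it classifies a single-base change within one codon using the standard genetic code. The function name `classify_snv` and the example codon TGG are hypothetical, and the codon is assumed to be given on the coding strand.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon, offset, alt_base):
    """Classify the effect of replacing ref_codon[offset] with alt_base.

    Hypothetical helper: ref_codon is on the coding strand, offset is 0-2.
    """
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


# Hypothetical codon: TGG (Trp) with G>A at the third base becomes TGA (stop).
print(classify_snv("TGG", 2, "A"))
# Swapping which allele is treated as the reference changes the label.
print(classify_snv("TGA", 2, "G"))
```

The label depends on which allele is treated as the reference: the same pair of codons gives stop_gained in one direction and stop_lost in the other.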

Istvan,

I think OP's question is if the annotation is based on CHR-POS-REF-ALT or just CHR-POS set of fields. VCF annotation frequently happens by the annotator matching a subset of the four fields that form the quasi composite primary key with the library file (dbSNP VCF or whatever other ROD file), and while designing VCF annotation pipelines, we need to make sure which fields are being matched to the library annotation. I think this is what OP is concerned with. Some digging into snpEff documentation should reveal how they match.
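To make the distinction concrete, here is a small sketch of lookup-based annotation with two different key choices. It illustrates the general concern only and says nothing about how snpEff works internally; the `library` records, their `note` values, and the `annotate` helper are all hypothetical.

```python
# Hypothetical annotation library with two entries at the same position.
library = [
    {"chrom": "chr1", "pos": 97828139, "ref": "G", "alt": "A", "note": "entry_for_A"},
    {"chrom": "chr1", "pos": 97828139, "ref": "G", "alt": "T", "note": "entry_for_T"},
]


def annotate(variant, library, key_fields):
    """Return the notes of all library entries matching variant on key_fields."""
    key = tuple(variant[f] for f in key_fields)
    return [
        entry["note"]
        for entry in library
        if tuple(entry[f] for f in key_fields) == key
    ]


query = {"chrom": "chr1", "pos": 97828139, "ref": "G", "alt": "A"}

# Matching on CHR-POS only picks up the entry for the other allele as well.
print(annotate(query, library, ("chrom", "pos")))
# Matching on CHR-POS-REF-ALT returns only the allele-specific entry.
print(annotate(query, library, ("chrom", "pos", "ref", "alt")))
```

With the position-only key, the G>A query also inherits the annotation recorded for G>T, which is the kind of mismatch a pipeline designer needs to rule out.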

Yes, I am digging now.

To follow up, I am guessing the allele is given in the first subfield of ANN, i.e. the base before the first pipe: https://pcingola.github.io/SnpEff/se_inputoutput/, but I will check with the authors.
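When scanning many records, the allele and effect can be pulled out of the ANN field programmatically. The sketch below assumes the ANN layout described in the linked documentation, where entries are comma-separated and each entry's subfields are pipe-separated, starting with the allele and then the annotation. The `parse_ann` helper and the example INFO string (with placeholder gene and transcript names) are hypothetical.

```python
def parse_ann(info):
    """Return (allele, annotation) pairs from the ANN entry of a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = field[len("ANN="):].split(",")
            # The first two pipe-separated subfields are allele and annotation.
            return [tuple(entry.split("|")[:2]) for entry in entries]
    return []


# Hypothetical, shortened INFO value; real ANN entries carry more subfields.
info = "AC=1;ANN=A|stop_gained|HIGH|GENE1|GENE1|transcript|TX1|protein_coding"
print(parse_ann(info))
```

For a multi-allelic site, each ANN entry names its own allele, so the effect can be tied to a specific ALT rather than to the position alone.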

To clarify, here is another example:

#CHROM    POS    REF    ALT
chr1    97883329    G    A

is annotated as missense_variant. Is the assumption that the alternate allele produces the missense change, or that the reference allele does? I realize I could check this visually, but when scanning dozens or hundreds of variants, as is often the case, it is more important to know the behavior of the software outright.

If it is a missense variant, that means that the two alleles produce different amino acids. Determining which one is the missense mutation would require knowing which one is the ancestral allele and which one is the mutant. Otherwise, either could be considered missense variants of the other depending on which you consider to be the reference.

We provide SNPeff with an annotation, and we provide it with a VCF file. When we do so we implicitly state that the REF column of the VCF matches the annotation that SNPeff operates on.

In that scenario, the variant annotation ought to refer to the effect of the variant. I believe that all information is reported relative to what SNPeff is aware of as being the "reference" and I would be eager to know if that is not the case.

OP here. See my link to the docs above: to me it really looks like they spell out exactly which alt allele the annotation refers to, so my guess is that they match the alt allele at that position. Let me know if you interpret it otherwise.

3.0 years ago
Emily 23k

Yes, it uses your ALT. If you want to test it, make a fake VCF where you change the same base to something that will make it missense or synonymous: you will get a different result.
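One way to set up that test is to generate a small VCF with every possible ALT at the position of interest. This is only a sketch: the `make_test_vcf` helper and the output file name are hypothetical, and the records carry placeholder values in the ID, QUAL, FILTER, and INFO columns.

```python
def make_test_vcf(chrom, pos, ref, alts=("A", "C", "G", "T")):
    """Build a minimal VCF with one record per ALT base that differs from ref."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for alt in alts:
        if alt != ref:
            lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


vcf_text = make_test_vcf("chr1", 97828139, "G")
print(vcf_text)

# To try it, write the text to a file (name is arbitrary) and annotate it
# with snpEff using whichever genome database matches your data:
# with open("fake_alts.vcf", "w") as handle:
#     handle.write(vcf_text)
```

If the annotation depends on the ALT allele, the three records should not all receive the same effect.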
