Interpretation of snpeff results
6.7 years ago
misbahabas ▴ 70

Asslam o Alikum

I used SnpEff for variant annotation of a single gene in multiple species, and I cannot understand how to interpret its results.

##fileformat=VCFv4.1
##INFO=<ID=AB,Number=1,Type=String,Description="Alt Base">
##SnpEffVersion="4.3p (build 2017-06-06 09:55), by Pablo Cingolani"
##SnpEffCmd="SnpEff  GRCh37.75 AMY2B.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  MP-Hsap_AM  MP-Lafr_AM  MP-Cang_AM  MP-Ptro_AM  MA-Phod_AM  MP-Pcoq_AM  MC-Clup_AM  MC-Fcat_AM  MA-Chir_AM  MR-Mmus_AM  MR-Jjac_AM  MR-Hgla_AM  MC-Mnat_AM  MH-Nleu_AM  MC-Oros_AM  MA-Sscr_AM  MP-Ppan_AM  MP-Mmul_AM  MA-Oari_AM  MA-Etel_AM  MC-Ptig_AM  AA-Xtro_AM  MA-Bbub_AM  MC-Lwed_AM  MP-Ggor_AM  MA-Bmut_AM  MA-Bind_AM  
1   2   .   -   T,A .   .   AB;ANN=T|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.2->T||||||,A|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.2->A||||||  .   .   .   T   .   .   .   .   A   .   .   .   A   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
1   4   .   -   T,G .   .   AB;ANN=T|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.4->T||||||,G|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.4->G||||||  .   .   .   T   .   .   .   .   .   .   .   .   G   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
1   5   .   -   T,A .   .   AB;ANN=T|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.5->T||||||,A|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.5->A||||||  .   .   .   T   .   .   .   .   .   .   .   .   A   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
1   6   .   -   G,T .   .   AB;ANN=G|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.6->G||||||,T|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.6->T||||||  .   .   .   G   .   .   .   .   .   .   .   .   T   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .

Please help me with this. I am using SnpEff for the first time and do not understand how to interpret the output.

Thanks

vcf snp annotation variants • 6.7k views

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.


Have you tried the manual?

In particular, the section "Input and Output files" clearly explains the output you get.
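As a starting point for reading the output, the ANN entry can be split into the subfields named in the `##INFO=<ID=ANN,...>` header line of the VCF. The following is only a minimal sketch: the function name `parse_ann` is made up for illustration, and it assumes that annotations are separated by `,` and subfields by `|`, as they are in the records shown in the question.

```python
# Field names copied from the ##INFO=<ID=ANN,...> header line of the VCF.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]


def parse_ann(info):
    """Split the ANN entry of a VCF INFO column into one dict per annotation.

    parse_ann is a hypothetical helper, not part of SnpEff.
    """
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            ann_value = entry[len("ANN="):]
            break
    else:
        # No ANN entry in this INFO column.
        return []
    records = []
    for annotation in ann_value.split(","):
        values = [value.strip() for value in annotation.split("|")]
        # zip stops at the shorter list, so truncated annotations still parse.
        records.append(dict(zip(ANN_FIELDS, values)))
    return records


# INFO column of the first record in the question (CHROM 1, POS 2).
info = ("AB;ANN=T|intergenic_region|MODIFIER|CHR_START-DDX11L1|"
        "CHR_START-ENSG00000223972|intergenic_region|"
        "CHR_START-ENSG00000223972|||n.2->T||||||,"
        "A|intergenic_region|MODIFIER|CHR_START-DDX11L1|"
        "CHR_START-ENSG00000223972|intergenic_region|"
        "CHR_START-ENSG00000223972|||n.2->A||||||")

for record in parse_ann(info):
    print(record["Allele"], record["Annotation"], record["Annotation_Impact"],
          record["Gene_Name"], record["HGVS.c"])
```

Read this way, each ALT allele gets its own annotation: here both `T` and `A` are reported as `intergenic_region` with impact `MODIFIER`, the lowest of SnpEff's impact categories (HIGH, MODERATE, LOW, MODIFIER).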


Thanks for the reply.

I read the manual, but I cannot understand the warnings in the VCF file, such as:

WARNING_REF_DOES_NOT_MATCH_GENOME

I aligned the AMY2B genes of different species against the human AMY2B gene and then annotated the variants using the human reference database, but I get warnings in the VCF file:

1   139242  .   -   C,A,T   .   .   AB;ANN=C|missense_variant|MODERATE|AL627309.1|ENSG00000237683|transcript|ENST00000423372|protein_coding|1/2|c.68->G|p.Pro23Arg|138/2661|68/780|23/259||**WARNING_TRANSCRIPT_NO_START_CODON**&**WARNING_REF_DOES_NOT_MATCH_GENOME**,A|missense_variant|MODERATE|AL627309.1|ENSG00000237683|transcript|ENST00000423372|protein_coding|1/2|c.68->T|p.Pro23Leu|138/2661|68/780|23/259||WARNING_TRANSCRIPT_NO_START_CODON&**WARNING_REF_DOES_NOT_MATCH_GENOME**,T|missense_variant|MODERATE|AL627309.1|ENSG00000237683|transcript|ENST00000423372|protein_coding|1/2|c.68->A|p.Pro23Gln|138/2661|68/780|23/259||**WARNING_TRANSCRIPT_NO_START_CODON**&WARNING_REF_DOES_NOT_MATCH_GENOME,C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.139242->C||||||,A|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.139242->A||||||,T|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.139242->T||||||,C|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149|processed_pseudogene||n.-3347->G|||||3347|,A|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|transcript|ENST00000494149|processed_pseudogene||n.-3347->T|||||3347

The manual says: "This happens when your data was aligned to a different reference genome than the one used to create SnpEff's database. If there are many of these warnings, it's a strong indicator that the data doesn't match and all the annotations will be garbage (because you are using the wrong database)."

But I used the human gene as the reference in the multiple sequence alignment and then used the human database for annotation. Please give me any idea about this.
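The condition described in the quoted manual text can be checked by hand: look up the reference base at each POS and compare it with the REF column. The sketch below only illustrates that comparison and is not SnpEff's own implementation; the function name and the toy sequence are invented for the example.

```python
def find_ref_mismatches(reference, records):
    """Return the records whose REF allele differs from the reference at POS.

    reference: sequence of one chromosome as a plain string
    records:   iterable of (pos, ref) tuples, pos being 1-based as in VCF
    find_ref_mismatches is a hypothetical helper, not part of SnpEff.
    """
    mismatches = []
    for pos, ref in records:
        # VCF positions are 1-based, Python string indices are 0-based.
        observed = reference[pos - 1:pos - 1 + len(ref)]
        if observed.upper() != ref.upper():
            mismatches.append((pos, ref, observed))
    return mismatches


# Toy data, invented for illustration only (not real genome sequence).
toy_reference = "ACGTACGTAC"
toy_records = [(1, "A"), (2, "C"), (3, "T"), (5, "AC"), (9, "G")]

for pos, ref, observed in find_ref_mismatches(toy_reference, toy_records):
    print(f"POS {pos}: VCF REF={ref}, reference has {observed}")
```

If most records end up in the mismatch list, the coordinates in the VCF probably do not refer to the genome that the annotation database was built from.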


So, now your question is becoming way more specific and you also shared more information on how you obtained the data. That is very important because 12 hours ago this question was not about WARNING_REF_DOES_NOT_MATCH_GENOME.

The warning wasn't even shown in your example! You need to understand that you have to give us information in order for us to answer your question accurately.

So you aligned against a single human gene in a multiple sequence alignment and then used the genome-wide SnpEff database. That is indeed a mismatch between your VCF and the human database. I assume this question follows on from this one: "How find variants between 26 genes sequence". Do I guess correctly that you have FASTA sequences of multiple species and want to check the differences?

As you can see, the vcf states that these variants are on "chromosome 1, position 2,4,5,6", which is obviously not the case. That's not the location of the gene. I'm not sure how to fix this. One approach I can think of is to align your sequences to the human reference genome using LAST, attempt variant calling on that and use the resulting vcf for annotation using SnpEff.
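To make the coordinate problem concrete: if POS in such a VCF counts columns of the alignment (an assumption about how the file was produced), each column would first have to be translated into a genomic position before a genome-wide database can be used. The sketch below shows the idea under strong assumptions: the human row of the alignment is one contiguous stretch of genomic sequence on the forward strand, and its genomic start coordinate is known. The function name, the toy row and the start value are all hypothetical, and this is a different route from the realignment approach suggested above.

```python
def column_to_genomic(aligned_reference, gene_start):
    """Map 1-based alignment columns to genomic positions.

    aligned_reference: the human row of the alignment, gaps written as '-'
    gene_start:        genomic position of the first human base (1-based)
    Returns a dict {column: genomic position}; gap columns are left out
    because they have no counterpart in the human sequence.
    """
    mapping = {}
    genomic = gene_start
    for column, base in enumerate(aligned_reference, start=1):
        if base != "-":
            mapping[column] = genomic
            genomic += 1
    return mapping


# Toy alignment row and start coordinate, both invented for illustration.
toy_row = "AC--GT-A"
toy_start = 1000
print(column_to_genomic(toy_row, toy_start))
```

Columns where the human row has a gap get no genomic position, so variants at those columns would need separate handling. Sequences that are spliced (mRNA or CDS) or that lie on the reverse strand break the assumptions of this sketch.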

But please be more specific when asking questions.


Thanks, that's helpful.

Yes, I have FASTA sequences of multiple species and want to check the differences. I used MAFFT for the alignment and snp-sites to extract the variants (VCF). I then annotated this VCF with SnpEff, but after annotation it gives warnings like the ones above:

1   90510   .   G   A   .   .   AB;ANN=A|downstream_gene_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000477740|lincRNA||n.*1720C>T|||||1720|,A|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000466430|lincRNA|4/4|n.1533C>T||||||WARNING_REF_DOES_NOT_MATCH_GENOME,A|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.8|ENSG00000239945|transcript|ENST00000495576|lincRNA|1/2|n.596C>T||||||WARNING_REF_DOES_NOT_MATCH_GENOME    .   .   .   .   .   A   A   .   .   .   .   A   .   .   A   .   A   .   .   .   A   .   .   A   A   .   A   A
1   90511   .   A   G,T .   .   AB;ANN=G|downstream_gene_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000477740|lincRNA||n.*1719T>C|||||1719|,T|downstream_gene_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000477740|lincRNA||n.*1719T>A|||||1719|,G|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000466430|lincRNA|4/4|n.1532T>C||||||WARNING_REF_DOES_NOT_MATCH_GENOME,T|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000466430|lincRNA|4/4|n.1532T>A||||||WARNING_REF_DOES_NOT_MATCH_GENOME,G|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.8|ENSG00000239945|transcript|ENST00000495576|lincRNA|1/2|n.595T>C||||||WARNING_REF_DOES_NOT_MATCH_GENOME,T|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.8|ENSG00000239945|transcript|ENST00000495576|lincRNA|1/2|n.595T>A||||||WARNING_REF_DOES_NOT_MATCH_GENOME  .   .   .   .   .   .   G   .   T   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
1   90512   .   G   A   .   .   AB;ANN=A|downstream_gene_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000477740|lincRNA||n.*1718C>T|||||1718|,A|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000466430|lincRNA|4/4|n.1531C>T||||||WARNING_REF_DOES_NOT_MATCH_GENOME,A|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.8|ENSG00000239945|transcript|ENST00000495576|lincRNA|1/2|n.594C>T||||||WARNING_REF_DOES_NOT_MATCH_GENOME    .   .   .   .   .   .   A   A   A   A   A   A   .   A   .   A   A   .   .   A   A   A   A   .   .   .   .   .
1   90513   .   T   A,C .   .   AB;ANN=A|downstream_gene_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000477740|lincRNA||n.*1717A>T|||||1717|,C|downstream_gene_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000477740|lincRNA||n.*1717A>G|||||1717|,A|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000466430|lincRNA|4/4|n.1530A>T||||||WARNING_REF_DOES_NOT_MATCH_GENOME,C|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.7|ENSG00000238009|transcript|ENST00000466430|lincRNA|4/4|n.1530A>G||||||WARNING_REF_DOES_NOT_MATCH_GENOME,A|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.8|ENSG00000239945|transcript|ENST00000495576|lincRNA|1/2|n.593A>T||||||WARNING_REF_DOES_NOT_MATCH_GENOME,C|non_coding_transcript_exon_variant|MODIFIER|RP11-34P13.8|ENSG00000239945|transcript|ENST00000495576|lincRNA|1/2|n.593A>G||||||WARNING_REF_DOES_NOT_MATCH_GENOME  .   .   .   .   .   .   .   .   .   .   A   .   .   .   .   .   C   .   .   .   .   .   C   .   .   .   .   .

LAST is a local aligner; I want a multiple sequence alignment.

6.6 years ago

If it helps, here is the output of vcf2table for your variants:

>>1/2/N (n. 1)
 Variant
 +--------+-------+
 | Key    | Value |
 +--------+-------+
 | CHROM  | 1     |
 | POS    | 2     |
 | end    | 2     |
 | ID     | .     |
 | REF    | N     |
 | ALT    | T,A   |
 | QUAL   |       |
 | FILTER |       |
 | Type   | SNP   |
 +--------+-------+
 ANN
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
 | SO                | Allele | Impact   | GeneName          | GeneId                    | FeatureType       | FeatureId                 | HGVsc  |
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
 | intergenic_region | T      | MODIFIER | CHR_START-DDX11L1 | CHR_START-ENSG00000223972 | intergenic_region | CHR_START-ENSG00000223972 | n.2->T |
 | intergenic_region | A      | MODIFIER | CHR_START-DDX11L1 | CHR_START-ENSG00000223972 | intergenic_region | CHR_START-ENSG00000223972 | n.2->A |
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
<<1/2/N (n. 1)


>>1/4/N (n. 2)
 Variant
 +--------+-------+
 | Key    | Value |
 +--------+-------+
 | CHROM  | 1     |
 | POS    | 4     |
 | end    | 4     |
 | ID     | .     |
 | REF    | N     |
 | ALT    | T,G   |
 | QUAL   |       |
 | FILTER |       |
 | Type   | SNP   |
 +--------+-------+
 ANN
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
 | SO                | Allele | Impact   | GeneName          | GeneId                    | FeatureType       | FeatureId                 | HGVsc  |
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
 | intergenic_region | T      | MODIFIER | CHR_START-DDX11L1 | CHR_START-ENSG00000223972 | intergenic_region | CHR_START-ENSG00000223972 | n.4->T |
 | intergenic_region | G      | MODIFIER | CHR_START-DDX11L1 | CHR_START-ENSG00000223972 | intergenic_region | CHR_START-ENSG00000223972 | n.4->G |
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
<<1/4/N (n. 2)


>>1/5/N (n. 3)
 Variant
 +--------+-------+
 | Key    | Value |
 +--------+-------+
 | CHROM  | 1     |
 | POS    | 5     |
 | end    | 5     |
 | ID     | .     |
 | REF    | N     |
 | ALT    | T,A   |
 | QUAL   |       |
 | FILTER |       |
 | Type   | SNP   |
 +--------+-------+
 ANN
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
 | SO                | Allele | Impact   | GeneName          | GeneId                    | FeatureType       | FeatureId                 | HGVsc  |
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
 | intergenic_region | T      | MODIFIER | CHR_START-DDX11L1 | CHR_START-ENSG00000223972 | intergenic_region | CHR_START-ENSG00000223972 | n.5->T |
 | intergenic_region | A      | MODIFIER | CHR_START-DDX11L1 | CHR_START-ENSG00000223972 | intergenic_region | CHR_START-ENSG00000223972 | n.5->A |
 +-------------------+--------+----------+-------------------+---------------------------+-------------------+---------------------------+--------+
<<1/5/N (n. 3)

(...)
