Question: SnpSift extractFields: how does the ANN[*].XX field extraction work?
Posted 4.8 years ago by ZheFrench:

I use SnpEff 4.2 to annotate a VCF file, and then I use 'SnpSift extractFields' to transform the VCF into a text file, as explained here.

I'm seeing a strange behavior that I don't understand.

My code:

        # Annotate with SnpEff (JVM options such as -Xmx4G go before -jar)
        ${JAVA7}/java -Xmx4G -jar ${SNPEFF}snpEff.jar -c ${SNPEFF}snpEff.config -noStats -t ${SNPEFF_VERSION} ${NAME_INPUT_PATH_TO_FILE} > ${SAMPLE_FILE}.5.vcf

        # Extract fields to a tab-separated file: -s "," joins multiple values of one field, -e "." fills empty fields
        ${JAVA7}/java -jar ${SNPSIFT}SnpSift.jar extractFields -s "," -e "." ${SAMPLE_FILE}.5.vcf \
            "CHROM" "POS" "ID" "REF" "ALT" "FILTER" "AF" "AC" "DP" "MQ" "ANN[*].ALLELE" "ANN[*].EFFECT" "ANN[*].IMPACT" > ${SAMPLE_FILE}.snpeff.txt

I get what look like duplicates in the ANN fields. If I use ANN[0] instead, I get no duplicates, but then I lose the other information, and I don't understand what the extra values correspond to. In the first line below, both effects are the same, but in the second line the third effect differs from the first two. Moreover, several effects are sometimes joined by "&": why aren't all the effects joined by "&"?

chr1 955597 rs115173026 G T PASS 0.500 1 12 60.0 T,T synonymous_variant,synonymous_variant LOW,LOW
chr1 1267325 rs200330269 G GC PASS 0.500 1 206 59.86 GC,GC,GC downstream_gene_variant,downstream_gene_variant,intron_variant MODIFIER,MODIFIER,MODIFIER
chr1 987200 rs9803031 C T PASS 1.00 2 55 60.0 T,T splice_region_variant&intron_variant,splice_region_variant&intron_variant LOW,LOW
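To see what the repeated values correspond to, it can help to look at the raw ANN value in the INFO column of the annotated VCF. Below is a small bash/awk sketch, assuming the standard ANN layout: entries separated by ",", subfields of each entry separated by "|", with the allele, effect, impact and feature ID (e.g. transcript) in subfields 1, 2, 3 and 7. The helper name `ann_entries` and the example ANN string (GENE1, TX1, TX2 and the HGVS values) are hypothetical, made up for illustration.

```bash
# Sketch: print one line per ANN entry of a raw ANN INFO value.
# Assumed ANN layout: entries separated by ",", subfields by "|":
#   1 Allele | 2 Annotation (effect) | 3 Impact | 4 Gene_Name | 5 Gene_ID |
#   6 Feature_Type | 7 Feature_ID | 8 Transcript_BioType | 9 Rank | 10 HGVS.c | ...
# ann_entries is a hypothetical helper name.
ann_entries() {
  printf '%s\n' "$1" | awk '{
    n = split($0, entries, ",")           # one element per annotation entry
    for (i = 1; i <= n; i++) {
      split(entries[i], f, "|")           # subfields of this entry
      # allele, effect (may hold several terms joined by "&"), impact, feature ID
      printf "%s\t%s\t%s\t%s\n", f[1], f[2], f[3], f[7]
    }
  }'
}

# Hypothetical ANN value with two entries (same allele, two made-up transcripts):
ann='T|splice_region_variant&intron_variant|LOW|GENE1|GENE1|transcript|TX1|protein_coding|3/10|c.100+5C>T|||||,T|splice_region_variant&intron_variant|LOW|GENE1|GENE1|transcript|TX2|protein_coding|2/9|c.50+5C>T|||||'
ann_entries "$ann"
```

Under that layout, each comma-separated entry is a separate annotation (one per allele and feature), while "&" only joins several effect terms that apply to the same entry. That is why some values contain "&" and the others are simply separated by ",".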


I don't understand why there are multiple ANN[*].ALLELE values. For example, in the first line they are identical (and so are the effects).

I'd like to have a clean, readable text file to open in Excel (one piece of information per column for each variant), and understanding how this works would be great :)
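If one row per variant is what you want in Excel, one option is to drop the repeated values inside the ANN columns. Below is a minimal bash/awk sketch, assuming tab-separated extractFields output with the three ANN[*] columns in positions 11 to 13, as in the command above; `dedup_ann_columns` is a hypothetical helper name. Note the trade-off: once duplicates are removed, the i-th EFFECT no longer lines up with the i-th IMPACT, and you can no longer tell which transcript each value came from.

```bash
# Sketch: keep only the first occurrence of each value inside the
# comma-separated ANN[*] columns, e.g. "T,T" -> "T".
# Assumptions: tab-separated extractFields output; ANN[*] columns are 11 to 13.
# dedup_ann_columns is a hypothetical helper name.
dedup_ann_columns() {
  awk -F'\t' -v OFS='\t' -v first=11 -v last=13 '{
    for (c = first; c <= last && c <= NF; c++) {
      n = split($c, vals, ",")
      split("", seen)                     # reset the per-column lookup table
      out = ""
      for (i = 1; i <= n; i++) {
        if (!(vals[i] in seen)) {         # first time this value appears
          seen[vals[i]] = 1
          out = (out == "" ? vals[i] : out "," vals[i])
        }
      }
      $c = out                            # rebuilds the row with tab separators
    }
    print
  }'
}
```

Usage: `dedup_ann_columns < "${SAMPLE_FILE}.snpeff.txt" > "${SAMPLE_FILE}.snpeff.dedup.txt"`. For the second line above, this would give "downstream_gene_variant,intron_variant" in the EFFECT column but a single "MODIFIER" in the IMPACT column.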



Tags: snpeff, annotation, vcf
Question modified 4.8 years ago by Biostar.

Maybe that is because a gene can have multiple transcripts, and each transcript gets its own annotation?

E.g., relative to transcript 1, rs9803031 could be a splice region variant; relative to transcript 2, it could be in the intron, etc.

Reply written 4.8 years ago by Sam (modified 10 months ago by RamRS).
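Following the multiple-transcripts explanation, another way to get a clean table without losing information is one row per ANN entry, so that the i-th ALLELE, EFFECT and IMPACT values share a row. Below is a bash/awk sketch, assuming tab-separated extractFields output whose ANN[*] columns run from column 11 to the last column and all hold the same number of comma-separated values; `explode_ann_columns` is a hypothetical helper name. Adding fields such as "ANN[*].GENE" and "ANN[*].FEATUREID" to the extractFields list (names as listed in the SnpSift documentation; worth checking against your version) would then show the gene and transcript next to each effect. The SnpEff distribution also ships a script, scripts/vcfEffOnePerLine.pl, described in the SnpSift documentation for a similar one-annotation-per-line layout before extractFields.

```bash
# Sketch: expand each extractFields row into one row per ANN entry.
# Assumptions: tab-separated input; ANN[*] columns run from column 11 to the
# last column; every ANN column has the same number of comma-separated values.
# explode_ann_columns is a hypothetical helper name.
explode_ann_columns() {
  awk -F'\t' -v OFS='\t' -v first=11 '
    NF < first { print; next }            # no ANN columns: pass the row through
    {
      n = split($first, tmp, ",")         # number of ANN entries on this row
      if (n == 0) { print; next }
      for (i = 1; i <= n; i++) {
        line = ""
        for (c = 1; c <= NF; c++) {
          v = $c
          if (c >= first) {               # ANN column: take its i-th value
            split($c, vals, ",")
            v = vals[i]
          }
          line = (c == 1 ? v : line OFS v)
        }
        print line
      }
    }'
}
```

Usage: `explode_ann_columns < "${SAMPLE_FILE}.snpeff.txt" > "${SAMPLE_FILE}.snpeff.per_ann.txt"`. The second line of the question would become three rows, the third one carrying "intron_variant".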

Yep, the multiple transcripts are the key :) Thanks!

Reply written 4.8 years ago by ZheFrench.