snpEff output gets zero results for synonymous SNPs
2.5 years ago

I have a VCF file that was annotated with snpEff. I want to extract the synonymous (and non-synonymous) sites from it, but when I run

grep -c "SYNONYMOUS" snpEffoutput.vcf


I get 0 results.

Does this mean there is something wrong with my file?

snpEff synonymous • 880 views

Can you show the command line you used, and part of your output file snpEffoutput.vcf?

Best


Try: grep -c 'synonymous_variant' snpEffoutput.vcf?


Yes, this gives me 81949 results. But again, when I do

grep -c 'non_synonymous' snpEffoutput.vcf I still get 0 results.
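A count of 0 only says that the literal string does not occur in the file, since grep -c counts matching lines. A rough way to see which effect terms are actually present is to tally the second sub-field of every ANN entry. This is a sketch that assumes the newer ANN= annotation format (not the older EFF= field); the function name and the demo records are made up for illustration.

```shell
# Hypothetical helper: tally effect terms found in the ANN field of a VCF.
# Assumes INFO is column 8 and annotations look like ANN=allele|effect|impact|...
tally_effects() {
    grep -v '^#' "$1" \
        | cut -f8 \
        | tr ';' '\n' \
        | grep '^ANN=' \
        | sed 's/^ANN=//' \
        | tr ',' '\n' \
        | cut -d'|' -f2 \
        | tr '&' '\n' \
        | sort | uniq -c | sort -rn
}

# Tiny invented VCF, only to exercise the function (not real data).
demo_vcf=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t.\t.\tANN=G|missense_variant|MODERATE|geneA\n'
    printf 'chr1\t200\t.\tC\tT\t.\t.\tANN=T|synonymous_variant|LOW|geneA,T|upstream_gene_variant|MODIFIER|geneB\n'
    printf 'chr1\t300\t.\tG\tA\t.\t.\tDP=10;ANN=A|synonymous_variant|LOW|geneB\n'
    printf 'chr1\t400\t.\tT\tC\t.\t.\tANN=C|splice_region_variant&synonymous_variant|LOW|geneC\n'
} > "$demo_vcf"

tally_effects "$demo_vcf"
```

Unlike grep -c, this counts annotations rather than lines, so a variant annotated against several transcripts contributes more than once.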


You can add -csvStats when running snpEff: java -jar snpEff.jar eff -csvStats snpEffoutput.csv snpEff_database snpEffinput.vcf > snpEffoutput.vcf. There will be a section counting each effect in the csv:

$ grep -A24 "# Count by effects" snpEffoutput.csv
# Count by effects

Type , Count , Percent
3_prime_UTR_variant , 92 , 0.147019%
5_prime_UTR_premature_start_codon_gain_variant , 9 , 0.014382%
5_prime_UTR_variant , 54 , 0.086294%
conservative_inframe_deletion , 9 , 0.014382%
conservative_inframe_insertion , 13 , 0.020774%
disruptive_inframe_deletion , 13 , 0.020774%
disruptive_inframe_insertion , 7 , 0.011186%
downstream_gene_variant , 15885 , 25.384726%
frameshift_variant , 167 , 0.266871%
initiator_codon_variant , 1 , 0.001598%
intergenic_region , 22782 , 36.406347%
intron_variant , 5008 , 8.00294%
missense_variant , 1667 , 2.663918%
splice_acceptor_variant , 12 , 0.019176%
splice_donor_variant , 8 , 0.012784%
splice_region_variant , 163 , 0.260479%
start_lost , 4 , 0.006392%
stop_gained , 28 , 0.044745%
stop_lost , 10 , 0.01598%
stop_retained_variant , 2 , 0.003196%
synonymous_variant , 1192 , 1.904853%
upstream_gene_variant , 15451 , 24.69118%
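If you only need one number out of that section, something like the following might do. It is a sketch that assumes the stats file uses the " , " separated layout shown above; effect_count is a made-up helper name and the demo file is a trimmed, partly invented copy of the output.

```shell
# Hypothetical helper: print the count for one effect from the
# "# Count by effects" section of a snpEff -csvStats file.
# Usage: effect_count stats.csv effect_name
effect_count() {
    awk -F',' -v effect="$2" '
        /^# Count by effects/ { in_section = 1; next }   # section starts here
        in_section && /^#/    { in_section = 0 }          # next section header ends it
        in_section {
            name = $1; count = $2
            gsub(/[ \t]/, "", name)                       # strip padding around fields
            gsub(/[ \t]/, "", count)
            if (name == effect) { print count; exit }
        }
    ' "$1"
}

# Small demo file; the second section is invented to show the section boundary.
demo_csv=$(mktemp)
cat > "$demo_csv" <<'EOF'
# Count by effects

Type , Count , Percent
missense_variant , 1667 , 2.663918%
synonymous_variant , 1192 , 1.904853%

# Count by genomic region

Type , Count , Percent
EXON , 10 , 1%
EOF

effect_count "$demo_csv" missense_variant     # prints 1667
effect_count "$demo_csv" synonymous_variant   # prints 1192
```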


Thank you very much, that is very helpful. But still, how do I get the non_synonymous ones?

Furthermore, I assume that the synonymous_variant are in the coding region?


Please check out the "Effect prediction details" section under "Input & output files" in the snpEff documentation. Starting from version 4.0, VCF output uses SO terms by default, so the classic "NON_SYNONYMOUS_CODING" is now reported as "missense_variant", "initiator_codon_variant", and "stop_retained_variant". If you add -classic when running snpEff, you can still count them with grep -c 'NON_SYNONYMOUS'.
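Without re-running snpEff, the same kind of count can be approximated on SO-term output by matching the three terms listed above. This is a minimal sketch: count_nonsyn is a made-up helper name, the demo records are invented, and like any grep -c it counts matching lines, not individual annotations.

```shell
# Hypothetical helper: count non-header VCF lines mentioning any of the SO terms
# that replaced the classic NON_SYNONYMOUS effects.
count_nonsyn() {
    grep -v '^#' "$1" \
        | grep -c -E 'missense_variant|initiator_codon_variant|stop_retained_variant'
}

# Tiny invented VCF, only to exercise the function (not real data).
demo_vcf=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t.\t.\tANN=G|missense_variant|MODERATE|geneA\n'
    printf 'chr1\t200\t.\tC\tT\t.\t.\tANN=T|synonymous_variant|LOW|geneA\n'
    printf 'chr1\t300\t.\tG\tA\t.\t.\tANN=A|stop_retained_variant|LOW|geneB\n'
} > "$demo_vcf"

nonsyn_count=$(count_nonsyn "$demo_vcf")
echo "$nonsyn_count"   # prints 2
```

Dropping the header lines first keeps any ## lines that happen to mention an effect name out of the count.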

Hope it helps.


You mean the command line to create the snpEffoutput.vcf file?


Yes, but as @SMK said, you can do grep -c 'synonymous_variant' snpEffoutput.vcf