How to find out codon changes in non-synonymous and synonymous SNPs
8.0 years ago
Ric ▴ 390

I ran snpEff and now have the annotated VCF file.

How can I find the most common codon changes, e.g. CCG (Proline) to CCA (Proline), and the number of times each one occurs (e.g. 300), for non-synonymous and synonymous SNPs?

8.0 years ago

Can you paste a line from your snpEff output? The snpEff output file that I have can be parsed easily with an awk one-liner:

grep "SYNONYMOUS"  input.snpeff |  awk '{split($0,a,"|"); print a[3]}' | awk '{split($0,b,"/"); print b[1],"\t",b[2]}'

produces the following result:

tAt      tTt
cTt      cGt
ggT      ggC
Cga      Aga
acG      acT
acA      acT

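grep "SYNONYMOUS" matches both synonymous and non-synonymous SNPs, because the string SYNONYMOUS also occurs inside the non-synonymous effect names. You can then count the output, for example by piping it through sort | uniq -c | sort -rn. Is this what you need?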
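If you would rather get the counts split by class in one pass, here is a rough Python sketch of the same idea. It assumes the older snpEff layout that the awk one-liner relies on, i.e. EFF=EFFECT_NAME(impact|functional_class|old_codon/new_codon|...), and it only looks at the first SYNONYMOUS-type effect on each line. The function name count_codon_changes is made up for this example.

```python
import re
from collections import Counter

# Assumption: legacy snpEff "EFF=" layout, e.g.
#   EFF=SYNONYMOUS_CODING(LOW|SILENT|ccG/ccA|P10|...)
# Captures the effect name and everything between the parentheses.
EFFECT_RE = re.compile(r"(\w*SYNONYMOUS\w*)\(([^)]*)\)")


def count_codon_changes(lines):
    """Count (old_codon, new_codon) pairs, split by effect class."""
    counts = {"synonymous": Counter(), "non_synonymous": Counter()}
    for line in lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        match = EFFECT_RE.search(line)
        if match is None:
            continue  # no synonymous / non-synonymous effect on this line
        effect, body = match.groups()
        fields = body.split("|")
        # The codon change is assumed to be the third "|"-separated field.
        if len(fields) < 3 or "/" not in fields[2]:
            continue
        old, new = fields[2].split("/", 1)
        key = "non_synonymous" if effect.startswith("NON_") else "synonymous"
        counts[key][(old, new)] += 1
    return counts
```

Typical use would be to open the annotated file, pass the file object to count_codon_changes, and then call counts["synonymous"].most_common(10) and counts["non_synonymous"].most_common(10) to see the most frequent changes with their number of events.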

8.0 years ago
DG 7.2k

Keep in mind that the INFO field holding the snpEff annotations can contain multiple predicted effects, depending on the organism and databases you annotate with. With human data, for instance, you get several annotations wherever multiple transcripts overlap a position, and these can have different impacts.

You can use awk and grep in combination as @Ashutosh recommended. You can also use something like PyVCF to read your VCF file programmatically, although you will still have to parse the INFO field yourself to extract the snpEff effect(s). If you are dealing with model organism data, you could also use a tool like GEMINI to pull out the top-scoring impact per variant for you and store everything in an sqlite3 database, which you can then query to do your counts.
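To illustrate what parsing the INFO field yourself might look like, here is a minimal Python sketch that uses no VCF library. It assumes the older snpEff format, where INFO contains EFF= followed by comma-separated effects of the form NAME(impact|functional_class|codon_change|...); newer snpEff versions write an ANN field with a different layout, so this would need adapting there. The function names parse_eff and codon_changes_per_variant are hypothetical.

```python
def parse_eff(info):
    """Return one dict per effect found in the EFF entry of a VCF INFO string."""
    effects = []
    for entry in info.split(";"):
        if not entry.startswith("EFF="):
            continue
        # Assumption: effects are comma-separated and contain no inner commas.
        for item in entry[len("EFF="):].split(","):
            name, _, rest = item.partition("(")
            fields = rest.rstrip(")").split("|")
            fields += [""] * (3 - len(fields))  # pad short records
            effects.append({
                "effect": name,
                "impact": fields[0],
                "functional_class": fields[1],
                "codon_change": fields[2],
            })
    return effects


def codon_changes_per_variant(info):
    """Unique (effect, codon_change) pairs for one variant.

    Transcripts that report the same change are collapsed into one pair,
    so a variant is not counted once per overlapping transcript.
    """
    return {
        (e["effect"], e["codon_change"])
        for e in parse_eff(info)
        if "/" in e["codon_change"]
    }
```

Whether to collapse transcripts like this, or to count every transcript-level effect separately, depends on the question you are asking; the set is just one defensible choice.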

Quite a few different ways to approach this problem depending on your level of programming comfort and what system you are working in.

