Question: How to find out codon changes in non-synonymous and synonymous SNPs
6.7 years ago, Ric wrote:

I ran snpEff and now have the resulting VCF file.

How can I find the most common codon changes, e.g. CCG (Proline) to CCA (Proline), and the number of times each one occurs (e.g. 300), in non-synonymous and synonymous SNPs?

6.7 years ago, Ashutosh Pandey wrote:

Can you paste a line from your snpEff output? The snpEff output file that I have can be parsed easily with an awk one-liner.

grep "SYNONYMOUS" input.snpeff | awk -F'|' '{split($3, b, "/"); print b[1] "\t" b[2]}'

produces the following result:

tAt      tTt
cTt      cGt
ggT      ggC
Cga      Aga
acG      acT
acA      acT

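grep "SYNONYMOUS" catches both synonymous and non-synonymous SNPs, because the string also occurs inside "NON_SYNONYMOUS". You can then take the output and count each distinct codon change, for example by piping it through sort | uniq -c. Is this what you need?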
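If you would rather do the extraction and the counting in one step, here is a minimal Python sketch of the same idea. It assumes the older snpEff EFF annotation layout, where the codon change (e.g. ccG/ccA) is the third "|"-separated field of the line, exactly as the awk one-liner does. The function name count_codon_changes and the sample lines are made up for illustration, and like the one-liner it only looks at the first effect on each line.

```python
from collections import Counter


def count_codon_changes(lines):
    """Count codon changes, split into synonymous and non-synonymous.

    Assumes the codon change is the third '|'-separated field of each
    line (old snpEff EFF layout); only the first effect is inspected.
    """
    counts = {"SYNONYMOUS": Counter(), "NON_SYNONYMOUS": Counter()}
    for line in lines:
        # Skip VCF header lines and lines without a (non-)synonymous effect.
        if line.startswith("#") or "SYNONYMOUS" not in line:
            continue
        fields = line.split("|")
        if len(fields) < 3 or "/" not in fields[2]:
            continue
        old, new = fields[2].split("/", 1)
        # The text before the first '|' ends with the first effect name.
        kind = "NON_SYNONYMOUS" if "NON_SYNONYMOUS" in fields[0] else "SYNONYMOUS"
        counts[kind][(old.upper(), new.upper())] += 1
    return counts


# Hypothetical annotated lines, for illustration only.
sample = [
    "##fileformat=VCFv4.1",
    "chr1\t100\t.\tG\tA\t50\tPASS\tEFF=SYNONYMOUS_CODING(LOW|SILENT|ccG/ccA|P12|300|GENE1|protein_coding|CODING|TR1|1|1)",
    "chr1\t220\t.\tG\tA\t50\tPASS\tEFF=SYNONYMOUS_CODING(LOW|SILENT|ccG/ccA|P80|300|GENE1|protein_coding|CODING|TR1|2|1)",
    "chr2\t310\t.\tT\tC\t50\tPASS\tEFF=SYNONYMOUS_CODING(LOW|SILENT|ggT/ggC|G7|150|GENE2|protein_coding|CODING|TR2|1|1)",
    "chr2\t400\t.\tT\tC\t50\tPASS\tEFF=NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aTt/aCt|I25T|150|GENE2|protein_coding|CODING|TR2|1|1)",
]

counts = count_codon_changes(sample)
for kind, counter in counts.items():
    for (old, new), n in counter.most_common():
        print(kind, old, "->", new, n)
```

Codons are upper-cased so that the same change counts as one event no matter which base snpEff capitalised; drop the upper() calls if you want to keep the position of the changed base. For a real file, pass the open file handle instead of the sample list.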

6.7 years ago, DG wrote:

Keep in mind that the INFO field holding the snpEff annotations can contain multiple predicted effects per variant, depending on the organism and databases you annotate with. With human data, for instance, you get several annotations because multiple transcripts overlap a position, and these can have different impacts.

You can use awk and grep in combination, as @Ashutosh recommended. You can also use something like PyVCF to parse your VCF file programmatically, although you will have to parse the INFO field yourself to extract the snpEff effect(s). If you are dealing with model organism data, you could also use a tool like GEMINI to pull out the top-scoring impact per variant for you and store everything in an sqlite3 database, which you can then query to do your counts.
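To illustrate what parsing the INFO field yourself might look like, here is a rough sketch using only the Python standard library rather than PyVCF. It assumes the older snpEff layout, EFF=EFFECT(impact|class|codon_change|...),EFFECT(...), and it collapses identical codon changes reported for several transcripts so that each variant contributes a given change only once. That deduplication is one possible choice, not the only sensible one. The helper name codon_changes_per_variant and the INFO string are hypothetical.

```python
import re
from collections import Counter

# One effect entry looks like NAME(field1|field2|...), under the assumed layout.
EFFECT_RE = re.compile(r"([A-Z_0-9]+)\(([^)]*)\)")


def codon_changes_per_variant(info):
    """Return the set of (effect, old_codon, new_codon) in one INFO string."""
    changes = set()
    for entry in info.split(";"):
        if not entry.startswith("EFF="):
            continue
        for effect, body in EFFECT_RE.findall(entry[len("EFF="):]):
            parts = body.split("|")
            # Keep only (non-)synonymous effects that carry a codon change.
            if "SYNONYMOUS" not in effect or len(parts) < 3 or "/" not in parts[2]:
                continue
            old, new = parts[2].split("/", 1)
            changes.add((effect, old.upper(), new.upper()))
    return changes


# Hypothetical INFO value: one variant annotated against three transcripts.
info = (
    "DP=20;EFF="
    "NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aTt/aCt|I25T|300|GENE1|protein_coding|CODING|TR1|1|1),"
    "SYNONYMOUS_CODING(LOW|SILENT|ccT/ccC|P40|280|GENE1|protein_coding|CODING|TR2|1|1),"
    "NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aTt/aCt|I25T|310|GENE1|protein_coding|CODING|TR3|1|1)"
)

result = codon_changes_per_variant(info)

# Accumulate over variants; here there is just the one.
totals = Counter()
totals.update(result)
for (effect, old, new), n in totals.most_common():
    print(effect, old, "->", new, n)
```

In a real script you would call the helper on the eighth tab-separated column of every non-header VCF line and update totals each time. If you want every transcript counted separately, collect the changes in a list instead of a set.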

There are quite a few ways to approach this problem, depending on your level of programming comfort and what system you are working on.
