Question: How to find out codon changes in non-synonymous and synonymous SNPs
Ric (Australia) wrote, 4.8 years ago:

I ran snpEff and have the resulting VCF file.

How can I find the most common codon changes (e.g. CCG (Proline) to CCA (Proline)) and the number of times each one occurs (e.g. 300) among the non-synonymous and synonymous SNPs?

snp effect snpeff snpsift • 2.7k views
Ashutosh Pandey (Philadelphia) wrote, 4.8 years ago:

Can you paste a line from your snpEff output? The snpEff output file that I have can easily be parsed with an awk one-liner:

grep "SYNONYMOUS" input.snpeff | awk '{split($0,a,"|"); print a[3]}' | awk '{split($0,b,"/"); print b[1] "\t" b[2]}'

produces the following result:

tAt      tTt
cTt      cGt
ggT      ggC
Cga      Aga
acG      acT
acA      acT

grep "SYNONYMOUS" matches both synonymous and non-synonymous SNPs, because the pattern is also a substring of the non-synonymous effect names (e.g. NON_SYNONYMOUS_CODING). You can then take the output and do the counting. Is this what you need?
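
For the counting step, one option is the standard sort | uniq -c idiom. The sketch below wraps it in a hypothetical helper, count_codon_changes (the name is made up for illustration and is not part of snpEff). It reads the two-column output of the one-liner above on standard input and prints each codon change with its number of occurrences, most frequent first.

```shell
# Hypothetical helper: tally identical "old<TAB>new" codon lines read from stdin.
count_codon_changes() {
    # sort groups identical lines together, uniq -c prefixes each distinct
    # line with its count, and the final sort orders by that count (largest first).
    # LC_ALL=C keeps the comparison byte-wise, so tAt and taT stay distinct.
    LC_ALL=C sort | LC_ALL=C uniq -c | sort -k1,1nr
}

# Usage, with the file name from the one-liner above:
#   grep "SYNONYMOUS" input.snpeff \
#     | awk '{split($0,a,"|"); print a[3]}' \
#     | awk '{split($0,b,"/"); print b[1] "\t" b[2]}' \
#     | count_codon_changes
```

Note that the one-liner only looks at the first effect on each line, so the counts reflect one annotation per variant.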

 

Dan Gaston (Canada) wrote, 4.8 years ago:

Keep in mind that the INFO field holding the snpEff annotations can contain multiple predicted effects per variant, depending on which organism and databases you annotate with. With human data, for instance, you get several annotations for one position because multiple transcripts overlap it, and the variant can have a different impact on each.

You can use awk and grep in combination, as @Ashutosh recommended. You can also use something like PyVCF to read your VCF file programmatically, although you will have to parse the INFO field yourself to extract the snpEff effect(s). If you are dealing with model organism data, you could also use a tool like GEMINI to pull out the top-scoring impact per variant for you and store everything in an SQLite database, which you can then query to do your counts.
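
Parsing the effects yourself could look like the sketch below. It assumes the classic snpEff EFF= entry in the INFO column, laid out as Effect(Impact|Functional_class|Codon_change|...) with multiple effects separated by commas. That matches the third "|"-separated field the awk one-liner in the earlier answer extracts, but newer snpEff versions write an ANN= entry with a different layout, so check your VCF header first. The function name codon_changes_all_effects is hypothetical.

```shell
# Hypothetical helper: count codon changes per effect type over ALL effects
# listed in the EFF= INFO entry (assumed classic snpEff format, not ANN=).
codon_changes_all_effects() {
    awk -F'\t' '
    /^#/ { next }                                  # skip VCF header lines
    {
        n = split($8, info, ";")                   # column 8 is INFO
        for (i = 1; i <= n; i++) {
            if (info[i] !~ /^EFF=/) continue
            m = split(substr(info[i], 5), eff, ",")    # one entry per effect
            for (j = 1; j <= m; j++) {
                split(eff[j], f, "|")              # f[3] = codon change, e.g. tAt/tTt
                name = substr(f[1], 1, index(f[1], "(") - 1)
                if (name ~ /SYNONYMOUS/ && f[3] != "")
                    print name "\t" f[3]
            }
        }
    }' "$1" | LC_ALL=C sort | LC_ALL=C uniq -c | sort -k1,1nr
}

# Usage (the file name is a placeholder):
#   codon_changes_all_effects snpeff_output.vcf
```

Because every effect is reported, a variant that overlaps several transcripts is counted once per transcript. If you want one count per variant, you would need to pick a single effect first, for example the highest-impact one.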

 

There are quite a few ways to approach this problem, depending on how comfortable you are with programming and what system you are working on.
