How to distinguish synonymous and non-synonymous SNPs from transcriptome sequencing
6.1 years ago
smallfish

I obtained a SNP list (VCF format) from a sample's transcriptome (assembled with Trinity), and ORFs were identified using TransDecoder. How can I classify these SNPs as synonymous or non-synonymous, and how can I calculate Ka/Ks?

6.1 years ago

You can annotate VCFs using SnpEff, but you will first have to build a database for your assembly: http://snpeff.sourceforge.net/SnpEff_manual.html#databases
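A minimal Python sketch of that database step for a Trinity/TransDecoder assembly. It assumes the GFF3 that TransDecoder writes (ORFs on transcript coordinates) plus the Trinity transcript FASTA, and that your SnpEff version reads annotations from data/<genome>/genes.gff and sequences from data/<genome>/sequences.fa. The helper name prepare_snpeff_db, the genome label and all paths are hypothetical.

```python
# Hypothetical helper: lay out a custom SnpEff database for a Trinity assembly.
# Assumes TransDecoder's GFF3 (ORFs on transcript coordinates) and the Trinity
# transcript FASTA; the genome label and every path are placeholders.
from __future__ import annotations

import shutil
from pathlib import Path


def prepare_snpeff_db(genome: str, gff3: Path, fasta: Path,
                      snpeff_dir: Path) -> list[str]:
    """Copy the inputs into data/<genome>/, register the genome in
    snpEff.config, and return the build command (it is not executed here)."""
    db_dir = snpeff_dir / "data" / genome
    db_dir.mkdir(parents=True, exist_ok=True)
    # Assumed layout: annotations in genes.gff, sequences in sequences.fa
    shutil.copyfile(gff3, db_dir / "genes.gff")
    shutil.copyfile(fasta, db_dir / "sequences.fa")

    # Register "<genome>.genome : <name>" once in snpEff.config
    config = snpeff_dir / "snpEff.config"
    text = config.read_text() if config.exists() else ""
    entry = f"{genome}.genome : {genome}"
    if entry not in text.splitlines():
        with config.open("a") as fh:
            if text and not text.endswith("\n"):
                fh.write("\n")
            fh.write(entry + "\n")

    # GFF3-based build command
    return ["java", "-jar", str(snpeff_dir / "snpEff.jar"),
            "build", "-gff3", "-v", genome]
```

Something like subprocess.run(prepare_snpeff_db("trinity_asm", Path("Trinity.fasta.transdecoder.gff3"), Path("Trinity.fasta"), Path("snpEff")), check=True) would then run the build (the TransDecoder file name is an assumption). Recent SnpEff versions may also look for cds.fa and protein.fa to validate the database, or accept -noCheckCds / -noCheckProtein; check the manual for your version. Once built, annotation is run as java -jar snpEff.jar trinity_asm input.vcf > annotated_vcf.vcf.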

This will (in the end) produce a new VCF in which each variant is annotated with its effect and severity (e.g. introduces a stop codon, changes an amino acid, changes nothing). SnpEff's summary report also gives the total counts of non-synonymous (Ka) and synonymous (Ks) changes, if I remember correctly.
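To illustrate what "synonymous" and "non-synonymous" mean at the codon level, here is a small Python sketch (not SnpEff's code). It assumes a forward-strand CDS string, such as a record from TransDecoder's .cds file, a 0-based SNP position within that CDS, and the standard genetic code; classify_snp is a hypothetical helper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TTT, TTC, TTA, TTG, TCT, ... order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position `pos` of a forward-strand CDS."""
    cds = cds.upper()
    offset = pos % 3                  # position within the codon
    start = pos - offset              # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls outside a complete codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"
```

For example, classify_snp("ATGTTTTGG", 5, "C") turns TTT into TTC (both Phe) and returns "synonymous". A stop-to-stop change also comes out as "synonymous" here, whereas SnpEff labels it stop_retained_variant.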

If you want Ka/Ks at a finer scale, this script takes SnpEff's output and produces a Ka/Ks table per gene: https://github.com/MerrimanLab/selectionTools/blob/master/extrascripts/kaks.py
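If you'd rather see what a per-gene tally involves, here is an independent Python sketch (not the linked kaks.py, whose counting rules I haven't checked). It reads a SnpEff-annotated VCF, takes the Gene_Name subfield (4th) of each ANN entry, and counts every VCF record once per gene: as missense if any annotation for that gene is missense_variant, otherwise as synonymous if any is synonymous_variant. tally_from_vcf is a hypothetical name.

```python
from collections import defaultdict


def tally_from_vcf(lines):
    """Return {gene: {"missense": int, "synonymous": int}} from VCF text lines."""
    counts = defaultdict(lambda: {"missense": 0, "synonymous": 0})
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        ann = next((f[4:] for f in info.split(";") if f.startswith("ANN=")), None)
        if ann is None:
            continue
        effects_by_gene = defaultdict(set)
        for entry in ann.split(","):
            # ANN subfields: Allele | Annotation | Annotation_Impact | Gene_Name | ...
            fields = entry.split("|")
            if len(fields) > 3:
                effects_by_gene[fields[3]].update(fields[1].split("&"))
        for gene, effects in effects_by_gene.items():
            if "missense_variant" in effects:
                counts[gene]["missense"] += 1
            elif "synonymous_variant" in effects:
                counts[gene]["synonymous"] += 1
    return dict(counts)
```

Usage would be along the lines of: with open("annotated_vcf.vcf") as fh: counts = tally_from_vcf(fh). These are raw counts, not normalised by the number of non-synonymous and synonymous sites in each gene, so their ratio is only a proxy for Ka/Ks.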


Thanks very much. It's a good idea.


Dear Philipp, I am analysing a VCF that I annotated with SnpEff. It comes from a population of more than 400 closely related individuals (spanning varieties and species, including polyploids). I then used the script you shared to calculate a per-gene Ka/Ks table. The resulting Ka/Ks distribution is very broad, ranging from 0 to the hundreds. Any comments or tips?


Most papers I've seen filter out 'extreme' Ks values; here's a randomly chosen paper that removes Ks > 3: https://onlinelibrary.wiley.com/doi/full/10.1111/mec.15275. Your field probably has a similar precedent you can cite.
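Note that a cutoff such as Ks > 3 is usually applied to Ks expressed as substitutions per synonymous site, where values that high indicate saturation; it does not transfer to raw variant counts. As a minimal sketch, assuming you already have per-gene proportions of differing non-synonymous and synonymous sites (pN, pS), the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) converts them to per-site distances, after which the filter is simple. The function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) from a proportion p of
    differing sites; returns inf for p >= 0.75, where it is undefined."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def drop_saturated(genes, max_ks=3.0):
    """Keep (gene, Ka, Ks, Ka/Ks) for genes with 0 < Ks <= max_ks and finite Ka.
    `genes` is an iterable of (gene, pN, pS) proportions."""
    kept = []
    for gene, pn, ps in genes:
        ka, ks = jukes_cantor(pn), jukes_cantor(ps)
        if 0 < ks <= max_ks and math.isfinite(ka):
            kept.append((gene, ka, ks, ka / ks))
    return kept
```

The correction diverges as p approaches 0.75, which is one reason very divergent genes end up with huge or undefined Ks and get dropped.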


Thanks for your answer. The results look odd to me. After running

grep -E '^#|missense_variant|synonymous_variant' annotated_vcf.vcf > mis_syn.txt
python kaks.py mis_syn.txt > kaks.txt

the resulting kaks.txt gives me these quantiles for the ks column (the ka column is similar). Could the panel simply be too heterogeneous? I am working with cereals.

0%   25%   50%   75%  100% 
0     6    42   178   13976 

while for ka/(ks+1) I get:

  0%          25%          50%          75%         100% 
   0          0.17        1.07          5.28        2192 
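Part of the spread may come from using raw counts: Ka/Ks proper divides the non-synonymous and synonymous counts by the numbers of non-synonymous and synonymous sites in the gene. Below is a Python sketch of that normalisation following the Nei-Gojobori site-counting idea, under stated assumptions: a forward-strand CDS, the standard code, changes from a sense codon to a stop codon counted as non-synonymous (implementations differ on this), and n_nonsyn / n_syn being counts of distinct segregating sites in the gene. Function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TTT, TTC, TTA, TTG, TCT, ... order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one codon: each of the 9 possible single-base
    changes contributes 1/3 if it keeps the amino acid."""
    aa = CODON_TABLE[codon]
    same = 0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODON_TABLE[mutant] == aa:
                    same += 1
    return same / 3


def pn_ps(cds: str, n_nonsyn: int, n_syn: int) -> float:
    """pN/pS for one gene from counts of distinct non-synonymous and
    synonymous segregating sites, normalised by the sites in the CDS."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # skip stop codons and codons with ambiguous bases
    codons = [c for c in codons if CODON_TABLE.get(c, "*") != "*"]
    s_sites = sum(synonymous_sites(c) for c in codons)
    n_sites = 3 * len(codons) - s_sites
    if n_syn == 0 or s_sites == 0:
        raise ValueError("pS is zero or undefined; pN/pS cannot be computed")
    return (n_nonsyn / n_sites) / (n_syn / s_sites)
```

Genes with no synonymous variants raise an error here; adding a pseudocount to the denominator, as ka/(ks+1) does, avoids the division by zero but makes small genes hard to compare.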

I get off-the-chart dN/dS values for a few genes (>20) when running your script on a VCF from 20K Plasmodium genomes (pf7K).

Gene Coordinate ka ks ka_div_ks_plus1
PF3D7_0930300 Pf3D7_09_v3:1201305-1207576 1241737 175042 7.09389692818
PF3D7_1133400 Pf3D7_11_v3:1292966-1296696 1252020 46689 26.8155922039
PF3D7_0424100 Pf3D7_04_v3:1082005-1084464 63592 3690 17.2289352479
PF3D7_0323400 Pf3D7_03_v3:980706-983966 63452 2016 31.458601884
PF3D7_0731500 Pf3D7_07_v3:1357251-1363653 366973 5520 66.4685745336
PF3D7_0423800 Pf3D7_04_v3:1075910-1077829 12070 2031 5.93996062992
PF3D7_0507500 Pf3D7_05_v3:307090-310120 57532 8246 6.97611252577

These genes should have low polymorphism, since they are vaccine candidates, as stated here. Did I do something wrong?
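One quick check on the table above: a gene spanning L bp can carry at most 3L distinct single-nucleotide variants (three alternative bases per position), and every row exceeds that. PF3D7_0930300, for instance, spans 6,272 bp (at most 18,816 SNVs) yet has ka = 1,241,737. That suggests the counts tally something other than distinct variant sites, for example one count per sample or per annotation; this is a guess, since I haven't checked what kaks.py sums. The Python sketch below (hypothetical helper names; coordinates assumed 1-based and inclusive) flags such rows and confirms that the last column equals ka / (ks + 1).

```python
def parse_span(coord: str) -> int:
    """Length in bp of a 'contig:start-end' coordinate (1-based, inclusive)."""
    _, rng = coord.rsplit(":", 1)
    start, end = (int(x) for x in rng.split("-"))
    return end - start + 1


def check_row(line: str) -> dict:
    """Flag a 'Gene Coordinate ka ks ka_div_ks_plus1' row whose counts exceed
    the number of possible distinct SNVs, and recompute the ratio column."""
    gene, coord, ka, ks, ratio = line.split()
    ka, ks, ratio = int(ka), int(ks), float(ratio)
    span = parse_span(coord)
    return {
        "gene": gene,
        "span_bp": span,
        # at most 3 alternative bases per position
        "exceeds_max_snvs": ka + ks > 3 * span,
        # does the last column equal ka / (ks + 1)?
        "ratio_matches": abs(ka / (ks + 1) - ratio) < 1e-6,
    }
```

Running check_row over each data line of kaks.txt (skipping the header) shows which genes need a closer look at how they were counted.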

5.5 years ago
jfo

Can we use SNPMeta for this? https://www.ncbi.nlm.nih.gov/pubmed/24237904
