I obtained a SNP list (VCF format) from a sample's transcriptome (assembled by Trinity). ORFs were identified using TransDecoder. How do I classify these SNPs, and how can I calculate Ka/Ks?
You can annotate VCFs using SnpEff, but because a de novo assembly has no prebuilt database, you will first have to build one: http://snpeff.sourceforge.net/SnpEff_manual.html#databases
This will (in the end) produce a new VCF in which each SNP is annotated with its effect and severity (introduces a stop codon, changes the amino acid, changes nothing, etc.). SnpEff's report will also give the total number of Ka and Ks, if I remember correctly.
If you want to calculate Ka/Ks on a more detailed scale, use this script: https://github.com/MerrimanLab/selectionTools/blob/master/extrascripts/kaks.py which takes SnpEff's output and produces a per-gene Ka/Ks table.
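To make the per-gene idea concrete, here is a minimal sketch that tallies missense and synonymous annotations per gene from a SnpEff-annotated VCF. It is not the logic of kaks.py, which I have not reproduced here. It assumes the standard pipe-separated `ANN` INFO field (effect in the second subfield, gene name in the fourth); the function name `count_effects_per_gene` and the toy records are made up for illustration. Note that raw counts like these are not normalised by the number of synonymous and non-synonymous sites, so their ratio is only a rough proxy for Ka/Ks.

```python
from collections import defaultdict


def count_effects_per_gene(vcf_lines):
    """Count missense and synonymous variants per gene from SnpEff ANN fields.

    Assumes the standard ANN layout: Allele|Annotation|Impact|Gene_Name|...
    Each VCF record is counted at most once per gene and effect type, even
    if several transcripts (or alleles) carry the same annotation.
    """
    counts = defaultdict(lambda: {"missense": 0, "synonymous": 0})
    wanted = (("missense", "missense_variant"),
              ("synonymous", "synonymous_variant"))
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        ann = next((kv[4:] for kv in fields[7].split(";")
                    if kv.startswith("ANN=")), None)
        if ann is None:
            continue  # record was not annotated
        seen = set()
        for entry in ann.split(","):
            parts = entry.split("|")
            if len(parts) < 4:
                continue
            effects, gene = parts[1].split("&"), parts[3]
            for kind, term in wanted:
                if term in effects and (gene, kind) not in seen:
                    seen.add((gene, kind))
                    counts[gene][kind] += 1
    return dict(counts)


def _record(pos, annotations):
    """Build a toy VCF line; the ANN entries are truncated for brevity."""
    ann = ",".join(f"G|{effect}|.|{gene}|{gene}|transcript|{gene}.t1"
                   for effect, gene in annotations)
    return f"contig1\t{pos}\t.\tA\tG\t50\tPASS\tANN={ann}"


toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    _record(10, [("missense_variant", "geneA")]),
    # Same gene annotated twice (two transcripts): counted once.
    _record(25, [("synonymous_variant", "geneA"),
                 ("synonymous_variant", "geneA")]),
    _record(40, [("missense_variant", "geneB")]),
    _record(55, [("missense_variant&splice_region_variant", "geneB")]),
]

result = count_effects_per_gene(toy_vcf)
for gene, c in sorted(result.items()):
    print(gene, c["missense"], c["synonymous"])
```

On a real file you would pass an open file handle instead of the toy list, since the function only needs an iterable of lines.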
Can we use SNPMeta for this? https://www.ncbi.nlm.nih.gov/pubmed/24237904
Thanks very much. It's a good idea.
Dear Philipp, I am analysing a VCF from a population of more than 400 closely related individuals (spanning varieties and species, including polyploids), which I annotated with SnpEff. I then used the script you shared to calculate a per-gene Ka/Ks table. The resulting Ka/Ks distribution is very broad, ranging from 0 to several hundred. Any comments or tips on this?
Most papers I've seen filter out 'extreme' Ks values. Here's a totally random paper that removes Ks > 3: https://onlinelibrary.wiley.com/doi/full/10.1111/mec.15275 Your niche must have something similar to cite.
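A minimal sketch of that kind of filter, assuming the per-gene table has already been parsed into `(gene, ka, ks)` tuples (I am not assuming any particular column layout for kaks.txt). The `max_ks=3.0` default follows the cutoff mentioned above; the `min_ks` parameter and the function name `filter_kaks` are my own additions for illustration, since a Ks of zero makes the ratio undefined and a Ks close to zero is an easy way to end up with ratios in the hundreds.

```python
def filter_kaks(rows, max_ks=3.0, min_ks=0.0):
    """Drop genes with unusable Ks and return (gene, ka, ks, ka/ks) tuples.

    rows   : iterable of (gene, ka, ks)
    max_ks : genes with Ks above this are treated as saturated and dropped
    min_ks : genes with Ks at or below this are dropped (hypothetical knob;
             the default of 0.0 only avoids division by zero)
    """
    kept = []
    for gene, ka, ks in rows:
        if ks <= min_ks or ks > max_ks:
            continue
        kept.append((gene, ka, ks, ka / ks))
    return kept


# Toy values, not real data.
toy_rows = [
    ("g1", 0.25, 0.5),    # ordinary gene
    ("g2", 0.5, 0.0),     # Ks of zero: ratio undefined
    ("g3", 1.0, 4.2),     # Ks above the cutoff
    ("g4", 0.5, 0.001),   # tiny Ks: the ratio explodes
]

default_filtered = filter_kaks(toy_rows)
strict_filtered = filter_kaks(toy_rows, min_ks=0.01)

for gene, ka, ks, ratio in default_filtered:
    print(gene, ka, ks, round(ratio, 3))
```

With the defaults, `g4` survives with a ratio of about 500, which shows how a handful of genes with near-zero Ks can stretch the distribution. Raising `min_ks` removes it, but any such threshold is a judgment call that should be justified and reported.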
Thanks for your answer. It looks weird to me as the kaks.txt file after running
gives me these quantiles for the Ks column (with similar quantiles for Ka). Is the panel perhaps too heterogeneous? I am working with cereals.
while for ka/ks+1 I get
I get some off-the-chart dN/dS values for a few genes (>20) when using your script on a VCF obtained from about 20K Plasmodium genomes (pf7K).
These genes should have low polymorphism, as they are vaccine candidates, as stated here. Did I do something wrong?