Question: Estimating site information from 1000 Genomes VCF data
spiral01 wrote (2.8 years ago):

Hi. I am working with the Phase 3 1000 Genomes VCF data (available here: http://www.internationalgenome.org/data) and need to estimate the numbers of synonymous and non-synonymous sites.

For my analysis, if I knew what type of site each variant falls at (e.g. a non-synonymous change at a 0-fold site, or a synonymous change at a 2-fold site), I could restrict my analysis to 0-fold and 4-fold sites and simply count those sites and the number of polymorphisms at them.
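The site classification described above can be computed directly from codon sequences. This is a minimal sketch, assuming the standard genetic code and a coding sequence that is already in frame (extracting CDS sequences from the reference genome with a transcript annotation is not shown). A site is called n-fold degenerate when n of the four possible bases at that position encode the same amino acid; a change to a stop codon counts as non-synonymous, and stop codons themselves are skipped. The function names `site_degeneracy` and `count_fold_sites` are hypothetical, not from any library.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def site_degeneracy(codon, pos):
    """Return 0, 2, 3 or 4: the fold degeneracy of position `pos` (0-based) in `codon`."""
    codon = codon.upper()
    aa = CODON_TABLE[codon]
    if aa == "*":
        raise ValueError(f"{codon} is a stop codon")
    # Count how many of the three alternative bases leave the amino acid unchanged.
    synonymous = sum(
        1
        for b in BASES
        if b != codon[pos] and CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa
    )
    # No synonymous alternatives -> 0-fold; otherwise n-fold with n = synonymous + 1.
    return 0 if synonymous == 0 else synonymous + 1


def count_fold_sites(cds):
    """Count 0-, 2-, 3- and 4-fold sites in an in-frame coding sequence."""
    counts = {0: 0, 2: 0, 3: 0, 4: 0}
    cds = cds.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        # Skip codons containing N or other ambiguity codes, and stop codons.
        if codon not in CODON_TABLE or CODON_TABLE[codon] == "*":
            continue
        for pos in range(3):
            counts[site_degeneracy(codon, pos)] += 1
    return counts


print(count_fold_sites("ATGGCT"))  # {0: 5, 2: 0, 3: 0, 4: 1}
```

Summed over the chosen transcripts, the 0-fold and 4-fold totals give the numbers of sites to compare the polymorphism counts against.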

However, I do not have complete codon information. The VCF files provide the reference and alternative alleles but not the codon in which each allele is located, which I would need to determine whether a site is 0-fold, 4-fold, etc. Is there any way of obtaining this information? I know UCSC has this data, but their set of alleles seems to be incomplete compared with the data taken directly from 1000 Genomes.

If this is not possible, I would be grateful for any other suggested methods that might work.

WouterDeCoster (Belgium) wrote (2.8 years ago):

Can't you just annotate the VCF files using VEP or snpEff? That will give you the amino acid substitution and the predicted impact of each variant.
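With VEP's VCF output (the `--vcf` option), the per-transcript annotations are stored in the CSQ INFO field as `|`-separated values, and VEP records the field order in the `##INFO=<ID=CSQ,...>` header line after "Format: ". A minimal sketch of pulling out the Codons value, assuming a VEP-annotated VCF (plain or gzipped); the header line and record in the example are shortened, made-up illustrations, not real 1000 Genomes output. snpEff writes a differently structured ANN field, which this sketch does not handle, and the helper names are hypothetical.

```python
import gzip
import re


def csq_fields(header_lines):
    """Return the CSQ sub-field names from the VEP ##INFO header line."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).split("|")
    raise ValueError("no VEP CSQ header line found")


def parse_csq(info, fields):
    """Yield one dict per CSQ entry (one per allele/transcript) in a VCF INFO string."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            for entry in item[len("CSQ="):].split(","):
                yield dict(zip(fields, entry.split("|")))


def iter_codons(vcf_path):
    """Yield (chrom, pos, ref, alt, codons) for every CSQ entry with a non-empty Codons value."""
    header, fields = [], None
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                header.append(line)
                continue
            if line.startswith("#"):
                # The #CHROM line ends the header, so the CSQ layout is known now.
                fields = csq_fields(header)
                continue
            cols = line.rstrip("\n").split("\t")
            for csq in parse_csq(cols[7], fields):
                if csq.get("Codons"):
                    yield cols[0], int(cols[1]), cols[3], cols[4], csq["Codons"]


# Illustrative, shortened header and INFO string (not real VEP output).
header_line = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
               'annotations from Ensembl VEP. Format: Allele|Consequence|Codons">')
for entry in parse_csq("AC=1;CSQ=T|missense_variant|gCt/gTt,T|intron_variant|",
                       csq_fields([header_line])):
    print(entry)
```

Reading the field order from the header, rather than hard-coding column indices, keeps the parser working if a different set of fields was requested when running VEP.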


Hi, I've just seen that the codon information is included in the annotation, as you suggested. Many thanks, and apologies for the unnecessary question!

(Reply by spiral01, 2.8 years ago.)
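With the Codons value from VEP, the variant's position in the codon can be read off directly, because VEP writes the changed base in upper case and the rest of the codon in lower case (e.g. gCt/gTt). A minimal sketch, assuming single-base substitutions, that convention, and the standard nuclear genetic code (mitochondrial variants would need a different table); `variant_site_class` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def variant_site_class(codons):
    """Classify an SNV from a VEP-style Codons value such as "gCt/gTt".

    Returns (position_in_codon, fold, is_synonymous), or None when the value
    does not describe a single-base change in a sense codon.
    """
    if "/" not in codons:
        return None
    ref, alt = codons.split("/", 1)
    if len(ref) != 3 or len(alt) != 3:
        return None  # indels and frameshifts
    changed = [i for i, base in enumerate(ref) if base.isupper()]
    if len(changed) != 1:
        return None  # multi-base substitutions
    pos = changed[0]
    ref_codon, alt_codon = ref.upper(), alt.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        return None
    ref_aa = CODON_TABLE[ref_codon]
    if ref_aa == "*":
        return None  # variant inside a stop codon
    # Fold degeneracy of the reference site: synonymous alternatives + 1, or 0 if none.
    synonymous = sum(
        1
        for b in BASES
        if b != ref_codon[pos]
        and CODON_TABLE[ref_codon[:pos] + b + ref_codon[pos + 1:]] == ref_aa
    )
    fold = 0 if synonymous == 0 else synonymous + 1
    return pos, fold, CODON_TABLE[alt_codon] == ref_aa


for value in ["gcC/gcT", "Gcc/Acc", "-/A"]:
    print(value, variant_site_class(value))
# gcC/gcT (2, 4, True)
# Gcc/Acc (0, 0, False)
# -/A None
```

Keeping only SNVs with a fold of 0 or 4 gives the polymorphism counts at 0-fold and 4-fold sites. A variant can receive different Codons values for different overlapping transcripts, so one option is to pick a single transcript per gene and use it consistently for both the site counts and the variant counts.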