SNPs in population; vcftools
6.6 years ago
DanielC ▴ 170

Dear ALL,

I have been looking for a way to find out which nsSNPs (identified by rs ID, e.g. rs769971095) occur in which population(s) and, if possible, in which gender. I have come across vcftools, but I am struggling to see how to achieve this with it. If I am right, this information can be extracted from the VCF files at "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp//release/20130502/supporting/functional_annotation" (that is, from the "filtered" and "unfiltered" folders under that link?). Could you please let me know how to approach this? Thank you so much! Please let me know if anything is unclear.

Regards, DK

SNP • 1.5k views
6.6 years ago

ENSEMBL's browser allows you to search for lots of information on SNPs, including allele frequencies in each population group studied by 1000 Genomes: http://grch37.ensembl.org/index.html

Other than that, if you want very comprehensive annotation, you could run a program like ANNOVAR or Variant Effect Predictor on all 1000 Genomes SNPs and short indels. Take a look at my thread here to see how you could download 1000 Genomes in VCF format and then annotate it. My protocol also includes downloading a PED file, which records the gender of each 1000 Genomes sample.

Edit: SNPs don't 'belong' to any particular population. The vast majority have varying allele frequencies across populations, higher in some than in others. A minority of SNPs have 0% frequency in certain groups because they have only been encountered in very isolated population groups.
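To make the PED idea concrete, here is a minimal Python sketch of how carriers of a non-reference allele could be tallied by gender and population. It assumes a tab-delimited PED file with a header row whose columns are named "Individual ID", "Gender" (1 = male, 2 = female) and "Population"; check those names against the file you actually download. The sample IDs and the record below are toy data made up for illustration, not real 1000 Genomes values.

```python
import csv
import io
from collections import Counter


def read_ped(handle, id_col="Individual ID", sex_col="Gender", pop_col="Population"):
    """Map sample ID -> (sex, population) from a tab-delimited PED file with a header.

    The default column names are an assumption about the PED header.
    """
    sex_names = {"1": "male", "2": "female"}
    samples = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        samples[row[id_col]] = (sex_names.get(row[sex_col], "unknown"), row[pop_col])
    return samples


def alt_carriers(header_line, record_line):
    """Return the samples with at least one non-reference allele in one VCF record."""
    sample_ids = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    carriers = []
    for sample_id, call in zip(sample_ids, fields[9:]):
        genotype = call.split(":")[0]  # GT comes first when it is present
        alleles = genotype.replace("|", "/").split("/")
        if any(allele not in ("0", ".") for allele in alleles):
            carriers.append(sample_id)
    return carriers


def count_carriers(ped, carriers):
    """Count carriers per (sex, population); samples missing from the PED are skipped."""
    return Counter(ped[s] for s in carriers if s in ped)


# Toy data, made up for illustration only.
toy_ped = io.StringIO(
    "Family ID\tIndividual ID\tPaternal ID\tMaternal ID\tGender\tPhenotype\tPopulation\n"
    "F1\tS1\t0\t0\t1\t0\tGIH\n"
    "F2\tS2\t0\t0\t2\t0\tPJL\n"
    "F3\tS3\t0\t0\t2\t0\tGBR\n"
)
toy_header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3"
toy_record = "3\t100\trsTEST\tC\tT\t100\tPASS\t.\tGT\t0|1\t0|0\t1|1"

ped = read_ped(toy_ped)
carriers = alt_carriers(toy_header, toy_record)
counts = count_carriers(ped, carriers)
for (sex, population), n in sorted(counts.items()):
    print(population, sex, n)
```

Bear in mind that a tally like this only describes the sampled individuals; it does not mean the SNP 'belongs' to those groups.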


Thanks very much! The link you provided is also helpful. However, could you please share a way, given an rs number such as "rs769971095", to get the allele frequencies for each population in which it occurs, either from the VCF files or from a better source if you know of one? With the help of your post I could then plot the frequencies too. Thanks!


For that particular SNP, it appears that the C allele dominates all populations. The only population where T appears is SAS (South Asian). You can take a look here: http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12625875-12626875;v=rs769971095;vdb=variation;vf=135759093

I got that information by following my link to ENSEMBL (above), searching for your SNP, and then clicking on 'Population Genetics' when the search record appeared.

The SAS group contains:

  • GIH Gujarati Indian from Houston, Texas
  • PJL Punjabi from Lahore, Pakistan
  • BEB Bengali from Bangladesh
  • STU Sri Lankan Tamil from the UK
  • ITU Indian Telugu from the UK

source: http://www.internationalgenome.org/category/phenotype/
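If you would rather read the frequencies straight from a 1000 Genomes Phase 3 VCF, the per-super-population values sit in the INFO column. The sketch below assumes the INFO keys AF, EAS_AF, AMR_AF, AFR_AF, EUR_AF and SAS_AF, and it only works for rsIDs that are actually present in the VCF. The record is a toy line with made-up numbers, not the real entry for any SNP.

```python
def find_rsid(lines, rsid):
    """Return the first VCF data line whose ID column contains rsid, else None."""
    for line in lines:
        if line.startswith("#"):
            continue
        ids = line.split("\t", 3)[2].split(";")  # the ID column may hold several IDs
        if rsid in ids:
            return line
    return None


def population_frequencies(
    record_line,
    keys=("AF", "EAS_AF", "AMR_AF", "AFR_AF", "EUR_AF", "SAS_AF"),
):
    """Extract allele frequencies from the INFO column of one VCF data line.

    Each value is a list because the VCF stores one frequency per ALT allele.
    The default keys are an assumption about how the INFO field is annotated.
    """
    info_column = record_line.rstrip("\n").split("\t")[7]
    info = {}
    for entry in info_column.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return {k: [float(v) for v in info[k].split(",")] for k in keys if k in info}


# Toy VCF content, made up for illustration only.
toy_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "3\t100\trsTEST\tC\tT\t100\tPASS\t"
    "AC=2;AF=0.0004;AN=5008;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.002;VT=SNP"
    "\tGT\t0|0",
]

freqs = population_frequencies(find_rsid(toy_lines, "rsTEST"))
for key, values in freqs.items():
    print(key, values)
```

For a real file, the same functions can be fed from `gzip.open(path, "rt")`, since the downloads are gzip-compressed. Scanning a whole-chromosome VCF line by line is slow, so an indexed lookup by position is probably the better route when you have many SNPs.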


Thanks very much! Based on your suggestion, I think I am getting closer to the results. Could you please comment on the following related queries:

a) I found the population genetics information for two rsIDs: rs559632360 and rs769971095.

For rs769971095, the super-populations shown are: ALL, AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS. For rs559632360, the super-populations shown are: ALL, AFR, AMR, EAS, SAS, EUR.

For rs559632360, it also shows population genetics from "1000 Genomes Project Phase 3" and "gnomAD exomes", along with sub-population information, whereas for rs769971095 it shows only "gnomAD exomes" population genetics.

http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12625875-12626875;v=rs769971095;vdb=variation;vf=135759093

http://grch37.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=3:12632759-12633759;v=rs559632360;vdb=variation;vf=92299087#population_freq_SAS

Does this mean that no "1000 Genomes Project Phase 3" data are available for rs769971095? I would like to know whether these two rsIDs occur in the same population. Can it be said that they share a population and, if so, which one? It would be great to know how to interpret this reasonably.

Also, I need to do this for many rsIDs. Could you please let me know how this process can be automated?

Thanks much! DK


Yes, they appear to generally have similar frequencies across each population. The only difference is that rs559632360 has a higher frequency in Non-Finnish Europeans (NFE), whereas rs769971095 has a higher frequency in SAS (South Asians). I would not read too much into the fact that one was listed under 1000 Genomes Phase 3 whilst the other was not. As far as I know, they are still documenting all variants identified in 1000 Genomes and many may not have even made it to dbSNP yet (or else they have already been identified by other projects, like gnomAD exomes, gnomADe).

Automating this process is not easy! This tool on ENSEMBL's website may be what you need (it outputs all sorts of info, including allele frequencies; check the output options further down the page): http://grch37.ensembl.org/Homo_sapiens/Tools/VEP?db=core

Alternatively, there is this, which allows you to look up SNPs within defined regions.
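One more possibility for a long list of rsIDs is Ensembl's REST API, whose variation endpoint can return population frequencies when called with `pops=1`. The following is only a rough sketch: it assumes the GRCh37 server at grch37.rest.ensembl.org and a JSON response with a "populations" list whose entries carry "population", "allele" and "frequency" fields. The population names in the toy record are illustrative and may differ between Ensembl releases.

```python
import json
import urllib.request

# GRCh37 REST server, chosen to match the grch37.ensembl.org links in this thread.
SERVER = "https://grch37.rest.ensembl.org"


def variation_url(rsid, server=SERVER):
    """Build the URL for one rsID, asking for population frequencies as well."""
    return f"{server}/variation/human/{rsid}?pops=1"


def fetch_variation(rsid, server=SERVER):
    """Download the JSON record for one rsID (needs network access)."""
    request = urllib.request.Request(
        variation_url(rsid, server),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def frequency_table(record, prefix=""):
    """Turn the 'populations' list into {population: {allele: frequency}}.

    Only populations whose name starts with prefix are kept.
    """
    table = {}
    for entry in record.get("populations", []):
        name = entry["population"]
        if name.startswith(prefix):
            table.setdefault(name, {})[entry["allele"]] = entry["frequency"]
    return table


# Toy record shaped like the assumed response; the values are made up.
toy_record = {
    "name": "rsTEST",
    "populations": [
        {"population": "1000GENOMES:phase_3:SAS", "allele": "T", "frequency": 0.002},
        {"population": "1000GENOMES:phase_3:SAS", "allele": "C", "frequency": 0.998},
        {"population": "gnomADe:SAS", "allele": "T", "frequency": 0.001},
    ],
}

table = frequency_table(toy_record, prefix="1000GENOMES:phase_3")
print(table)
```

For real use you would loop over your rsIDs, call `fetch_variation` on each, and pause between requests so as to stay within the server's rate limits.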

Hope that this helps somewhat.
