Question: 1000Genomes population allele frequencies for list of SNPs
8 months ago, sonia.olaechea wrote:

Hi all,

I have a list of more than 100 SNPs (rsXXXXXX) and I would like to obtain the allele frequencies that each of them shows in each of the 1000Genomes populations (if possible, not manually...). Is there any tool, R package, etc. that allows me to download them all at once? I thought I could perhaps obtain those frequencies from the UCSC database or directly from the 1000Genomes database, but I'm open to any suggestions. Thank you very much in advance!

snps 1000genomes • 832 views
8 months ago, chrchang523 (United States) wrote:

Solution 1: The raw variant call data can be downloaded from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ . Once you have those files, the INFO column in each VCF file contains superpopulation allele frequencies, and there are a bunch of tools which can look up the INFO column entry for a particular rsID. The one complication is that, since there's a separate VCF file per chromosome, you may first need to figure out which rsIDs are on which chromosomes.
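A minimal sketch of such a lookup, assuming the per-chromosome files follow the usual VCF layout (ID in column 3, INFO in column 8) and that the superpopulation frequencies appear as INFO keys such as AF, EAS_AF, AMR_AF, AFR_AF, EUR_AF and SAS_AF. The function name lookup_af and the file name rslist.txt are placeholders, not part of any tool. Scanning every chromosome file sidesteps the question of which rsID sits on which chromosome, at the cost of reading all of them.

```shell
# Hypothetical helper: print the allele-frequency entries of the INFO column
# for every rsID listed (one per line) in the file given as $1.
# Uncompressed VCF text is read from stdin.
# Assumes the ID column holds a single rsID per record and the list is not empty.
lookup_af() {
  awk -F'\t' '
    NR == FNR { want[$1] = 1; next }   # first input: the rsID list
    /^#/      { next }                 # skip VCF header lines
    ($3 in want) {
      out = $3 "\t" $1 ":" $2 "\t" $4 ">" $5
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++)
        if (kv[i] ~ /^([A-Z]+_)?AF=/)  # AF=..., EAS_AF=..., EUR_AF=..., etc.
          out = out "\t" kv[i]
      print out
    }
  ' "$1" -
}

# Intended use (not run here; scans every downloaded chromosome file):
#   for f in *.vcf.gz; do gzip -dc "$f" | lookup_af rslist.txt; done > af.tsv
```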

Solution 2: With plink 2.0, http://www.cog-genomics.org/plink/2.0/resources#1kg_phase3 provides a single merged dataset containing all chromosomes (download the boldfaced links, then rename phase3_corrected.psam to all_phase3.psam). The command

plink2 --pfile all_phase3 vzs --extract [your list of rsIDs] --export vcf

can then be used to export a VCF with only the rsIDs you care about; the precomputed superpopulation allele frequencies will be in the INFO column of this freshly generated VCF. You can also define your own populations with --keep and compute allele frequencies on the fly with --freq.
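A rough sketch of the --keep / --freq route, assuming the .psam file has a header line naming its columns and that one of them is called SuperPop (worth checking against your own copy of the file). The helper name superpop_samples and the file names eur.txt and rslist.txt are made up for illustration.

```shell
# Hypothetical helper: list the sample IDs whose SuperPop column equals $2.
# Column positions are looked up from the header line of the .psam file ($1),
# so the exact column order does not matter.
superpop_samples() {
  awk -v pop="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "SuperPop") pop_col = i
        if ($i == "#IID" || $i == "IID") id_col = i
      }
      next
    }
    pop_col && id_col && $pop_col == pop { print $id_col }
  ' "$1"
}

# Intended use (not run here; needs plink2 and the merged dataset):
#   superpop_samples all_phase3.psam EUR > eur.txt
#   plink2 --pfile all_phase3 vzs --keep eur.txt --extract rslist.txt --freq --out eur
```

The same pattern should work for any other label in the sample file by matching on a different column name.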


Sorry, I don't know what might be happening but my plink2 returns that it doesn't recognise the "--pfile" option... I'm launching the command in a folder with the files all_phase3.pgen.zst, all_phase3.psam, all_phase3.pvar.zst and my list of rsIDs... is it possible that it is not recognising some of the files? Or is it more likely a problem of my plink2 installation?

Thanks again :)

written 8 months ago by sonia.olaechea
  1. You need to decompress the .pgen.zst file first; see the instructions at the top of the resources page.

  2. This requires plink 2.0, not 1.9. What do you get when you type "plink2 --version"?

written 8 months ago by chrchang523
8 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
$ cat rslist.txt | while read R ; do wget -q -O - "https://www.ncbi.nlm.nih.gov/snp/${R}?download=frequency" | grep -E '^(#Study|1000Genomes)' | sed "s/^/${R}\t/" ; done

rs25    #Study  Population  Group   Samplesize  Ref Allele  Alt Allele  BioProject ID   BioSample ID
rs25    1000Genomes Global  Study-wide  5008    T=0.485 C=0.515 PRJEB6930   SAMN07490465
rs25    1000Genomes African Sub 1322    T=0.493 C=0.507     SAMN07486022
rs25    1000Genomes East Asian  Sub 1008    T=0.474 C=0.526     SAMN07486024
rs25    1000Genomes Europe  Sub 1006    T=0.521 C=0.479     SAMN07488239
rs25    1000Genomes South Asian Sub 978 T=0.52  C=0.48      SAMN07486027
rs25    1000Genomes American    Sub 694 T=0.38  C=0.62      SAMN07488242
rs26    #Study  Population  Group   Samplesize  Ref Allele  Alt Allele  BioProject ID   BioSample ID
rs26    1000Genomes Global  Study-wide  5008    T=0.335 C=0.665 PRJEB6930   SAMN07490465
rs26    1000Genomes African Sub 1322    T=0.404 C=0.596     SAMN07486022
rs26    1000Genomes East Asian  Sub 1008    T=0.291 C=0.709     SAMN07486024
rs26    1000Genomes Europe  Sub 1006    T=0.341 C=0.659     SAMN07488239
rs26    1000Genomes South Asian Sub 978 T=0.36  C=0.64      SAMN07486027
rs26    1000Genomes American    Sub 694 T=0.22  C=0.78      SAMN07488242
rs27    #Study  Population  Group   Samplesize  Ref Allele  Alt Allele  BioProject ID   BioSample ID
rs27    1000Genomes Global  Study-wide  5008    G=0.283 C=0.717 PRJEB6930   SAMN07490465
rs27    1000Genomes African Sub 1322    G=0.355 C=0.645     SAMN07486022
rs27    1000Genomes East Asian  Sub 1008    G=0.284 C=0.716     SAMN07486024
rs27    1000Genomes Europe  Sub 1006    G=0.261 C=0.739     SAMN07488239
rs27    1000Genomes South Asian Sub 978 G=0.29  C=0.71      SAMN07486027
rs27    1000Genomes American    Sub 694 G=0.16  C=0.84      SAMN07488242
rs28    #Study  Population  Group   Samplesize  Ref Allele  Alt Allele  BioProject ID   BioSample ID
rs28    1000Genomes Global  Study-wide  5008    C=0.517 T=0.483 PRJEB6930   SAMN07490465
rs28    1000Genomes African Sub 1322    C=0.601 T=0.399     SAMN07486022
rs28    1000Genomes East Asian  Sub 1008    C=0.476 T=0.524     SAMN07486024
rs28    1000Genomes Europe  Sub 1006    C=0.517 T=0.483     SAMN07488239
rs28    1000Genomes South Asian Sub 978 C=0.53  T=0.47      SAMN07486027
rs28    1000Genomes American    Sub 694 C=0.40  T=0.60      SAMN07488242

Thank you so much for this solution @Pierre. Could you please tell me how I can write this to a file instead of printing it on my screen? I need to get the frequencies for around 500 variants. Thank you so much!

written 5 weeks ago by dnyanada.gokhale

how can i write this to a file versus printing it on my screen

https://www.tecmint.com/linux-io-input-output-redirection-operators/

written 5 weeks ago by Pierre Lindenbaum
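In short, appending "> frequencies.tsv" after the final "done" of the loop sends its output to a file instead of the terminal. As a sketch, the same pipeline can also be wrapped in a function so that the redirection is applied in one place. The function name fetch_1kg_freqs and the output file names are placeholders, and the dbSNP URL is simply the one used in the answer above.

```shell
# Sketch: the loop from the answer above, wrapped in a function (hypothetical
# name) that reads rsIDs, one per line, from the file given as $1.
fetch_1kg_freqs() {
  while read -r R; do
    [ -n "$R" ] || continue            # skip empty lines in the list
    wget -q -O - "https://www.ncbi.nlm.nih.gov/snp/${R}?download=frequency" |
      grep -E '^(#Study|1000Genomes)' |
      sed "s/^/${R}\t/"                # prefix each line with the rsID
  done < "$1"
}

# Intended use (not run here; needs network access):
#   fetch_1kg_freqs rslist.txt  >  frequencies.tsv   # create or overwrite the file
#   fetch_1kg_freqs more_rs.txt >> frequencies.tsv   # append to the same file
```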