Question: Finding SNPs in 1000genomes populations
Asked 16 months ago by andrew.ghaiyed:

Hi All,

I would like to use 1000 Genomes data to identify SNPs that help distinguish the Japanese and Chinese populations. At the moment I am a little lost among all the programs and software packages available to download, and was wondering if anyone could point me to a straightforward pipeline. I need to be able to "rank" the candidate SNPs in some way, so that I can prioritise them by how well each one separates the two populations.

Thanks in advance!

Tags: gwas, 1000genomes

Thank you very much, Kevin,

I was able to download PLINK and set up a directory, but when I try to download the VCF.gz files I get the following error for every download command:

-bash: wget: command not found

I have tried copy-pasting each line and typing it in manually, but nothing changes the result.

Thanks again,

Andrew

Reply by andrew.ghaiyed, 16 months ago

It's possible that wget is not installed on your computer. You can try sudo apt install wget (you will require administrator rights, and will have to provide a password).

Otherwise, try curl -O instead of wget.
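A small shell sketch of the same idea: use wget when it is installed and fall back to curl otherwise. The function name fetch is a hypothetical helper, not part of either tool; curl's -f, -L and -O options make it fail on HTTP errors, follow redirects and save the file under its remote name, much as wget does by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download one URL with wget if available, else curl.
fetch() {
    local url="$1"
    if command -v wget >/dev/null 2>&1; then
        wget "$url"
    elif command -v curl >/dev/null 2>&1; then
        # -f: fail on HTTP errors, -L: follow redirects, -O: keep remote file name
        curl -fLO "$url"
    else
        echo "fetch: neither wget nor curl is installed" >&2
        return 1
    fi
}
```

Usage: replace each wget <URL> line with fetch <URL>; the same commands then work on machines that only ship curl.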

Reply by Kevin Blighe, 16 months ago

Thanks, Kevin, you have been an incredible help!

I was able to get wget on my computer and have moved a bit further along. I have managed to download the PED file and the reference files, but am struggling to convert the 1000 Genomes files to BCF. My code is as follows (/Volumes/Seagate is the directory on my external hard drive):

    for chr in {1..22} X; do
        # split multi-allelic sites, normalise against the reference, assign missing IDs, write BCF
        bcftools norm -Ou -m-any /Volumes/Seagate/1000Genomes/chr$chr.1kg.phase3.v5.vcf.gz \
            | bcftools norm -Ou -f /Volumes/Seagate/ReferenceMaterial/1000Genomes/human_g1k_v37.fasta \
            | bcftools annotate -Ob -I +'%CHROM:%POS:%POS:%REF:%ALT' \
            > /Volumes/Seagate/1000Genomes/chr$chr.1kg.phase3.v5.bcf ;
        # index the BCF so it can be queried by region
        bcftools index /Volumes/Seagate/1000Genomes/chr$chr.1kg.phase3.v5.bcf ;
    done

but PLINK returns:

[main] Unrecognized command.

Do you know what the most likely cause is?

Thanks again!

Reply by andrew.ghaiyed, 16 months ago

Hey Andrew, are you specifying the --bcf command-line parameter when reading the file into PLINK? For example:

    plink --noweb --bcf chr$chr.1kg.phase3.v5.bcf --keep-allele-order --vcf-idspace-to _ --const-fid --allow-extra-chr 0 --split-x b37 no-fail --make-bed --out /chr$chr.1kg.phase3.v5 ;

Reply by Kevin Blighe, 16 months ago
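For the original goal of ranking SNPs by how well they separate two populations, one common measure (not necessarily the approach used in Kevin's tutorial) is per-variant Fst. The sketch below assumes PLINK 1.9, whose --fst flag estimates Fst for each variant from the clusters listed in a --within file (FID IID cluster), and whose .fst output has the header CHR SNP POS NMISS FST. The keep file is assumed to list only the Japanese and Chinese samples. run_fst and rank_by_fst are hypothetical helper names; check the header of your own .fst file before relying on column 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-variant Fst between the clusters in a --within file.
# Assumes PLINK 1.9; writes "${out}.fst".
run_fst() {
    local bfile="$1" keep="$2" within="$3" out="$4"
    plink --bfile "$bfile" --keep "$keep" --within "$within" --fst --out "$out"
}

# Hypothetical helper: print the top N SNPs from one or more PLINK .fst files,
# highest Fst first, as "SNP CHR POS FST".
# Skips header lines and undefined ("nan") estimates, e.g. monomorphic sites.
rank_by_fst() {
    local top_n="$1"
    shift
    awk 'NF >= 5 && $5 != "FST" && $5 != "nan" && $5 != "-nan" { print $2, $1, $3, $5 }' "$@" \
        | sort -k4,4gr \
        | head -n "$top_n"
}
```

Because rank_by_fst accepts several files, the per-chromosome outputs can be ranked together, e.g. rank_by_fst 100 chr*.fst. Fst only measures allele-frequency differentiation one SNP at a time; SNPs in strong linkage disequilibrium will tend to appear together near the top.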
Answer by Kevin Blighe (Republic of Ireland), 16 months ago:

Hey Andrew,

You could follow my tutorial here: Produce PCA for 1000 Genomes Phase III in VCF format

You will have to adapt it to your own needs; however, even by following this simple example alone, you will be able to identify SNPs in the 1000 Genomes Phase III data that distinguish the major population groups. You should then easily be able to narrow the analysis down to just the Japanese and Chinese groups.
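To narrow the samples down to the Japanese and Chinese groups, a sketch like the following builds a PLINK --keep file and a --within cluster file from the 1000 Genomes sample panel. It assumes a whitespace-delimited panel file with a header line and the columns sample, pop, super_pop, gender (the layout of the Phase 3 integrated_call_samples_v3.20130502.ALL.panel file; check your copy). It writes 0 as the family ID to match the --const-fid option in the PLINK command from the thread. make_pop_files is a hypothetical helper name. JPT is the 1000 Genomes code for Japanese in Tokyo and CHB for Han Chinese in Beijing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write "${prefix}.within" (FID IID cluster) and
# "${prefix}.keep" (FID IID) for samples belonging to two population codes.
# Assumes panel columns: sample pop super_pop gender, with one header line.
make_pop_files() {
    local panel="$1" pop1="$2" pop2="$3" prefix="$4"
    # FID is 0 to match samples imported with --const-fid
    awk -v p1="$pop1" -v p2="$pop2" \
        'NR > 1 && ($2 == p1 || $2 == p2) { print 0, $1, $2 }' "$panel" > "${prefix}.within"
    # --keep only needs the first two columns (FID, IID)
    awk '{ print $1, $2 }' "${prefix}.within" > "${prefix}.keep"
}
```

For example, make_pop_files integrated_call_samples_v3.20130502.ALL.panel JPT CHB jpt_chb writes jpt_chb.within and jpt_chb.keep. Southern Han Chinese (CHS) samples could be included by extending the awk condition.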

I have already built an ethnicity prediction model (non-commercial) from the 1000 Genomes data that has >99.999% sensitivity. I have been using it in a private project to help identify samples whose ethnicity was not reported. It was built in R using glm() and has been cross-validated.

Good luck, Kevin
