Question: Beagle files using the latest 1000 genomes
jamespoweraid2 wrote (4.0 years ago):

Hi,

I would like to build up-to-date Beagle files from the phase 3 1000 Genomes VCF data (2504 unrelated individuals). The Beagle phase 3 reference panel is here:

http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/

It is based on these 1000 Genomes VCF files: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

 

In particular, I am trying to create files like those that were available for the previous release:

ALL.chr1.phase1_release_v2.20101123.filt.bgl.gz         
ALL.chr16.phase1_release_v2.20101123.filt.tabix.gz      
ALL.chr1.phase1_release_v2.20101123.filt.markers        

 

Would I need to use the script here with the BEAGLE utilities?

https://data.broadinstitute.org/srlab/BEAGLE/1kG-beagle-release3/READ_ME_beagle_phase1_v3

 

Thank you for any advice on how to produce these files in Beagle format; it is very much appreciated.

Kamil wrote (4.0 years ago):

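Use the BEAGLE utilities to convert the file format. Here's an example that should get you started. It downloads the bref conversion tool and the chromosome 22 reference panel, converts the bref file to a gzipped VCF, and prints the first variant with each phased genotype written as a pair of alleles: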

# Download the bref conversion utility and the chromosome 22 reference panel.
wget https://faculty.washington.edu/browning/beagle/bref.09Nov15.d2a.jar
wget http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/individual_chromosomes/chr22.1kg.phase3.v5a.bref
# Convert the bref file to a gzipped VCF.
java -jar bref.09Nov15.d2a.jar chr22.1kg.phase3.v5a.bref | gzip > chr22.1kg.phase3.v5a.vcf.gz
# Print the first variant (truncated to 100 characters) as allele pairs; the REF/ALT lookup assumes biallelic sites.
zcat chr22.1kg.phase3.v5a.vcf.gz | grep -v '^#' | head -n1 | cut -c1-100 | perl -ane 'print join("\t",@F[0..4]),"\t"; $i=0; foreach $G (@F[9..$#F]) { @A = split("\\|", $G, 2); print " " if $i++; print $F[3+$A[0]]," ",$F[3+$A[1]]; }; print "\n"'

Output

22    16050115    rs587755077    G    A    G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G
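
If you need `.markers` and `.bgl` style files rather than a one-line preview, the same idea can be extended with awk. This is only a rough sketch, assuming the Beagle 3 layout (markers file: ID, position, allele 1, allele 2; bgl file: an `I id` header line followed by one `M` line per marker with two alleles per sample). The function names `vcf_to_markers` and `vcf_to_bgl`, the file names, and the toy VCF are all hypothetical, and I have not checked the result against what EPIGWAS expects.

```shell
# Toy phased VCF with two samples (hypothetical data, for illustration only).
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> toy.vcf
printf '22\t100\tsnp1\tG\tA\t.\tPASS\t.\tGT\t0|0\t0|1\n' >> toy.vcf
printf '22\t200\tsnp2\tC\tT,G\t.\tPASS\t.\tGT\t1|2\t0|0\n' >> toy.vcf

# Markers file: ID, position, allele 1, allele 2 (assumed Beagle 3 layout).
# Multi-allelic sites are skipped to keep the sketch simple.
vcf_to_markers() {
  awk -F'\t' '/^#/ || $5 ~ /,/ { next } { print $3, $2, $4, $5 }'
}

# bgl file: one "I id" header line, then one "M" line per biallelic marker.
vcf_to_bgl() {
  awk -F'\t' '
    /^##/ { next }
    /^#CHROM/ {
      # Each sample ID appears twice, once per haplotype column.
      printf "I id"
      for (i = 10; i <= NF; i++) printf " %s %s", $i, $i
      printf "\n"
      next
    }
    $5 ~ /,/ { next }
    {
      # Map genotype index 0 to REF and 1 to ALT.
      a[0] = $4; a[1] = $5
      printf "M %s", $3
      for (i = 10; i <= NF; i++) {
        split($i, g, "|")
        printf " %s %s", a[g[1]], a[g[2]]
      }
      printf "\n"
    }'
}

vcf_to_markers < toy.vcf > toy.markers
vcf_to_bgl < toy.vcf > toy.bgl
cat toy.markers toy.bgl
```

On the real data you would pipe the converted VCF through the functions instead, for example `zcat chr22.1kg.phase3.v5a.vcf.gz | vcf_to_bgl | gzip > chr22.bgl.gz`. The sketch does not build a tabix index, and it assumes every genotype is phased (`|` separated).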

Thank you very much Kamil!

Would this give me the same files as running the script here?

https://data.broadinstitute.org/srlab/BEAGLE/1kG-beagle-release3/READ_ME_beagle_phase1_v3

I need the .filt.bgl.gz, .filt.tabix.gz, and .filt.markers files to run EPIGWAS, but built from this release of 1000 Genomes instead:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL*
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/phase1*
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/README*

Thanks again!!

written 4.0 years ago by jamespoweraid2

For filtered variants, you might consider taking the files from the BEAGLE website instead of the 1000 Genomes website. The developer of BEAGLE filtered the variants from 1000 Genomes.
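
As a starting point for fetching the filtered panel for every autosome, something like the following could work. It is a sketch that assumes the other chromosomes follow the same naming pattern as the chr22 file in the example above, which I have not verified; `bref_urls.txt` is just a hypothetical file name.

```shell
# Build the list of per-chromosome bref URLs.
# The naming pattern is assumed from the chr22 example and is not verified.
base=http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/individual_chromosomes
for chr in $(seq 1 22); do
  echo "${base}/chr${chr}.1kg.phase3.v5a.bref"
done > bref_urls.txt

# Review the list first, then download everything with:
#   wget -i bref_urls.txt
cat bref_urls.txt
```

Writing the URLs to a file first lets you check them before downloading anything; each bref file can then be converted to VCF with the bref jar as in the chr22 example.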

written 4.0 years ago by Kamil