Question

Tool:bed_to_tabix: Download the variants from 1,000 Genomes in the regions defined in one or more BED files.

7.4 years ago

Juan Manuel Berros ▴ 120

I wrote this tool to easily get variant genotypes from different populations, using the data from The 1,000 Genomes Project. You just provide one or more BED files and you get a VCF.

I hope it's useful!

--

bed_to_tabix

bed_to_tabix downloads a gzipped VCF file with the genotypes of the 2,504 samples from The 1,000 Genomes Project at the regions defined in one or more BED files. It handles the BED sorting, the merging of multiple BED files, and the parallel download of each chromosome's variants with tabix (you can use HTTP URLs if your FTP traffic is blocked), and then merges the resulting VCFs into a single gzipped VCF. Finally, it cleans up the temporary files, so you end up with a single results file.
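To make the sorting and merging step concrete, here is a minimal Python sketch of the idea. It is not the tool's actual code: the function names `read_bed`, `merge_regions` and `tabix_region` are hypothetical, and the sample regions are made up. It assumes plain three-column BED lines, merges overlapping or touching intervals per chromosome, and converts each merged interval from BED coordinates (0-based start, exclusive end) to the 1-based inclusive `chrom:start-end` form that tabix accepts.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping header lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def merge_regions(regions):
    """Sort the intervals of each chromosome and merge overlapping or touching ones."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        result = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= result[-1][1]:
                # Overlaps (or touches) the previous interval: extend it.
                result[-1][1] = max(result[-1][1], end)
            else:
                result.append([start, end])
        merged[chrom] = [tuple(interval) for interval in result]
    return merged


def tabix_region(chrom, start, end):
    """Convert a BED interval (0-based, end exclusive) to a 1-based inclusive region."""
    return f"{chrom}:{start + 1}-{end}"


# Hypothetical contents of two small BED files.
bed_a = ["1\t100\t200", "1\t150\t300"]
bed_b = ["2\t10\t20", "1\t500\t600"]

merged = merge_regions(read_bed(bed_a) + read_bed(bed_b))
region_strings = [
    tabix_region(chrom, start, end)
    for chrom in sorted(merged)
    for start, end in merged[chrom]
]
print(region_strings)
```

Grouping the merged regions by chromosome is what makes one download per chromosome possible, since each group can be fetched independently and the results concatenated afterwards.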

bed_to_tabix is written in Python, but it can be used as a command line tool without any knowledge of the language.

Installation instructions here: https://github.com/biocodices/bed_to_tabix

Example usage:

# Download the regions in regions1.bed to regions1.vcf.gz
bed_to_tabix --in regions1.bed

# Download the regions in regions1.bed, 10 downloads at a time, to 1kg.vcf
bed_to_tabix --in regions1.bed --threads 10 --unzipped --out 1kg

# Download the regions in both bed files to regions1__regions2.vcf.gz
bed_to_tabix --in regions1.bed --in regions2.bed

# Download from the HTTP URLs in case your traffic to FTP is blocked
bed_to_tabix --in regions1.bed --http

CLI Linux Python 1000Genomes

updated 9 months ago by Ram 43k • written 7.4 years ago by Juan Manuel Berros ▴ 120