How To Filter VCF Files From 1000 Genomes Release V3.2010-11 (Alternative Source)?
11.9 years ago
user56 ▴ 300

I want to use VCF files from WGS to arrive at pharmacogenomics clinical recommendations (relevant to a single patient, not a population).

I decided to use VCF as the standard input format and 1000 Genomes as the test population. I believe the files I need are here: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ (if not, please comment on that).

The problem is that the files are too big. For example, the chromosome 6 data for all populations is 9 GB. The data for all chromosomes would be 80+ GB.

Example of chr6 file: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.chr6.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz

Is there an alternative source that offers the 1000 Genomes data in a different form?

What I would like to do:

  • Use only calls coming from SNPSOURCE=EXOME.
  • Keep only known SNPs (those present in dbSNP).
  • Filter out all indels.
  • Reduce the number of genomes (e.g., 1 patient, or no more than 50 patients).
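The four filters listed above can be sketched in plain Python. This is a minimal illustration, not a replacement for dedicated tools: the function name `filter_vcf` is hypothetical, "known to dbSNP" is assumed to mean the ID column holds an `rs` identifier, and a call is assumed to count as exome-sourced when `EXOME` appears anywhere in the comma-separated `SNPSOURCE` value of the INFO column.

```python
def filter_vcf(lines, keep_samples):
    """Yield VCF lines restricted to exome-sourced, rs-numbered SNPs
    and to the sample columns named in keep_samples."""
    keep = set(keep_samples)
    bases = {"A", "C", "G", "T"}
    columns = None  # indices of the columns to keep, set from the #CHROM line
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns 0-8 are fixed (CHROM..FORMAT); samples start at column 9.
            columns = list(range(9)) + [
                i for i in range(9, len(fields)) if fields[i] in keep
            ]
            yield "\t".join(fields[i] for i in columns)
            continue
        if columns is None:
            raise ValueError("data line seen before the #CHROM header")
        vid, ref, alt, info = fields[2], fields[3], fields[4], fields[7]
        # Assumption: a known SNP carries at least one rs identifier in ID.
        if not any(v.startswith("rs") for v in vid.split(";")):
            continue
        # SNPs only: REF and every ALT allele must be a single base, which
        # also drops indels and symbolic structural-variant alleles.
        if ref not in bases or any(a not in bases for a in alt.split(",")):
            continue
        # Assumption: keep the site if EXOME is one of the SNPSOURCE values.
        source = ""
        for item in info.split(";"):
            if item.startswith("SNPSOURCE="):
                source = item.split("=", 1)[1]
        if "EXOME" not in source.split(","):
            continue
        yield "\t".join(fields[i] for i in columns)
```

The function takes any iterable of text lines, so a downloaded file could be passed in as `gzip.open(path, "rt")` and processed one line at a time without loading the whole file into memory.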

Is the only way to download 80 GB and let it crunch for a long time (and, for me, to improve my Linux knowledge along the way; I am a Windows, SQL and R person)? Any advice is greatly appreciated.

P.S. It seems all genomic data lives in files. I am good with large databases and could do what I need much more easily in a database. After all, a VCF file is like a database table.
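To make the table analogy concrete, here is a minimal sketch that loads the eight fixed VCF columns into SQLite using Python's standard `sqlite3` module. The function name `load_vcf_sites` and the table name `variants` are hypothetical, and the per-sample genotype columns are deliberately left out; with over a thousand samples they would need a different schema.

```python
import sqlite3

def load_vcf_sites(conn, lines):
    """Insert the fixed columns (CHROM..INFO) of each VCF data line
    into a 'variants' table, skipping header and blank lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variants ("
        "chrom TEXT, pos INTEGER, id TEXT, ref TEXT, "
        "alt TEXT, qual TEXT, filter TEXT, info TEXT)"
    )
    rows = (
        line.rstrip("\n").split("\t")[:8]
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    conn.executemany("INSERT INTO variants VALUES (?,?,?,?,?,?,?,?)", rows)
    conn.commit()

def known_snps(conn):
    """Return (pos, id) for rs-numbered, single-base, biallelic sites."""
    query = (
        "SELECT pos, id FROM variants "
        "WHERE id LIKE 'rs%' AND length(ref) = 1 AND length(alt) = 1 "
        "ORDER BY pos"
    )
    return conn.execute(query).fetchall()
```

A connection from `sqlite3.connect(":memory:")` is enough for experimenting; whether this approach beats streaming filters on files of this size is a separate question.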

1000genomes vcf • 3.6k views
11.9 years ago
Laura ★ 1.8k

You could use tabix to stream these files from the FTP site and filter out the sites you don't want. You could also reduce the number of individuals if you wanted to.

We have more info about how to use tabix in our FAQ: http://www.1000genomes.org/faq/how-do-i-get-sub-section-vcf-file
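As a rough sketch of what streaming with tabix might look like when driven from Python: tabix can query a remote bgzip-compressed, indexed file by region, and `-h` includes the header in the output. The helper names below are hypothetical, the region coordinates in the usage note are made up, and the code assumes the `tabix` executable is installed and on the PATH.

```python
import subprocess

def tabix_region_command(url, chrom, start, end):
    """Build a tabix call that prints the VCF header plus one region."""
    return ["tabix", "-h", url, f"{chrom}:{start}-{end}"]

def stream_region(url, chrom, start, end):
    """Yield lines for one region without downloading the whole file.
    Requires tabix to be installed; it fetches the index on first use."""
    cmd = tabix_region_command(url, chrom, start, end)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        proc.wait()
```

For the chromosome 6 file from the question, a call such as `stream_region(url, "6", 31000000, 32000000)` would return only the records in that interval, which can then be filtered line by line.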

11.9 years ago
hershman ▴ 40

I was unable to find exome calls from the 1000 Genomes Project about a month back. One option to avoid downloading the files is to work with them on Amazon.

11.9 years ago
thamathpanda ▴ 40

VCFtools bro

A database would probably be slower fyi.
