Vcf File With Variants And Genotypes For A "Single Individual"
4
1
11.6 years ago
Nasrin ▴ 30

Hi there,

I am working with software that needs a VCF file with variants and genotypes for a single individual and a single chromosome. I searched and found the link below, but that VCF file contains genotypes for many individuals. Please help me download data for just one individual.

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.chr10.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz

Thanks in advance for any advice

vcf snp • 4.8k views
4
11.6 years ago

Use VCFtools:

vcf-query file.vcf.gz chr:start-end -c sample_name

This will give you exactly what you want.
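If VCFtools is not at hand, the idea behind a single-sample extraction can be sketched with plain awk. This is only a minimal illustration, not what VCFtools does internally: `extract_sample` is a hypothetical helper name, it assumes an uncompressed, tab-delimited VCF on stdin, and it simply keeps the nine fixed columns plus the one requested sample column.

```shell
# Hypothetical helper (not part of VCFtools): keep the 9 fixed VCF columns
# plus the column of one sample. Reads an uncompressed VCF from stdin.
extract_sample() {
    awk -F'\t' -v OFS='\t' -v sample="$1" '
        /^##/ { print; next }                 # pass meta-information lines through unchanged
        /^#CHROM/ {                           # header line: find the requested sample column
            for (i = 10; i <= NF; i++) if ($i == sample) col = i
            if (!col) { print "sample not found: " sample > "/dev/stderr"; exit 1 }
        }
        col { print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col }   # header and records, one sample
    '
}

# Usage (file and sample names are placeholders):
#   gzip -dc input.vcf.gz | extract_sample SAMPLE_ID > SAMPLE_ID.vcf
```

Note that this sketch keeps every site, including those where the chosen individual is homozygous reference, and it does not recompute INFO fields such as allele counts, so the dedicated tools remain the better choice for real work.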

2

You can also use vcf-subset -c sample_name to achieve the same end.

This is described in the 1KG FAQ: http://www.1000genomes.org/faq/how-do-i-get-sub-section-vcf-file

3
11.6 years ago
Johan ▴ 890

To extract the genotype information for a single individual, you can use the SelectVariants walker from the Genome Analysis Toolkit. Download the GATK from here: http://www.broadinstitute.org/gatk/download, and then run the walker with a command looking something like this:

java -Xmx2g -jar GenomeAnalysisTK.jar -T SelectVariants -R [your reference fasta] -V [your vcf file] --sample_name [name of your sample] -o [output vcf file]

If you run this over the file that you linked to, you should get all the genotypes for the individual on chr 10. If you have a file containing genotype data from more than one chromosome, you could use the "--select_expressions" option to select genotypes from a specific chromosome. Check out the documentation for more info: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_SelectVariants.html#--select_expressions

0
11.6 years ago
Nasrin ▴ 30

Since the files are very large, is it possible to download the genotypes for just one specific individual?

0
11.6 years ago
user56 ▴ 300

I had the same problem. I eventually used the Complete Genomics public data set, which has only 69 genomes, and the tabix Unix program to extract just part of it. A 78 GB problem turned into a <5 MB file problem.

0

The OP could have used tabix over HTTP to extract a subset of the original file, since the 1000 Genomes data are bgzip-compressed and tabix-indexed. This was not the issue. The OP wanted to extract information for a single sample from a multi-sample file.

0

Yes, the problem is exactly what you mentioned above.
