Question

Using tabix and VCFtools to get CNV/SV frequencies from 1000 Genomes data

12.4 years ago

Ryan D ★ 3.4k

I read this excellent post by Stephen on getting data from 1000 Genomes with tabix, but it does not seem to be working for me. I use tabix to get the data as follows:

tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr22.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz 22:1000000-10000000 > ~/delete.vcf

This produces a VCF file, but the file seems to contain only header lines and no variant records, like so:

##INFO=<ID=SNPSOURCE,Number=.,Type=String,Description="indicates if a snp was called when analysing the low coverage or exome alignment data">
##reference=GRCh37
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  HG00096...

So after running vcftools, obviously I get nothing, since there is no genotype data:

vcftools --vcf ~/delete.vcf --freq --out ~/delete.txt

VCFtools - v0.1.7
(C) Adam Auton 2009

Parameters as interpreted:
        --vcf /home/delahar/delete.vcf
        --freq
        --out /home/delahar/delete.txt

Reading Index file.
File contains 0 entries and 1092 individuals.
Applying Required Filters.
After filtering, kept 1092 out of 1092 Individuals
After filtering, kept 0 out of a possible 0 Sites
Error:No data left for analysis!

I'm guessing this is an issue with the way I'm using tabix. Ultimately I want to get the records that have VT=SV in their INFO column, so extra help on getting those would be greatly appreciated.

Thanks,

Rx

genome tabix vcftools cnv • 5.2k views


I have heard from 2 people that they can't get Tabix to retrieve data from the internet...

ADD REPLY • link 12.4 years ago by Zev.Kronenberg 12k

score 4 · Answer 1 · 2011-12-09

12.4 years ago

Adam ★ 1.0k

Your tabix command returns no data because there are no SNPs in that region of chr22. The first SNPs on chr22 are around the 16 Mb mark. Try:

tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr22.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz 22:1000000-16052250

And see if that returns some data.
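
For the VT=SV part of the question, here is a minimal sketch, assuming the variant type is stored as a `VT=...` key in the INFO column (column 8) of a tab-delimited VCF. The function name `filter_sv` is made up for illustration and is not part of tabix or VCFtools; it keeps the header lines and only those records whose INFO field contains `VT=SV`.

```shell
# Hypothetical helper: read a VCF on stdin, write header lines plus
# records whose INFO column (field 8) contains the key VT=SV.
filter_sv() {
  awk -F'\t' '
    /^#/ { print; next }                    # pass header lines through unchanged
    $8 ~ /(^|;)VT=SV(;|$)/ { print }        # keep records tagged VT=SV in INFO
  '
}

# Possible usage (not run here; assumes the region actually contains variants):
#   tabix -fh <URL of the .vcf.gz> 22:16000000-17000000 | filter_sv > ~/sv_only.vcf
#   vcftools --vcf ~/sv_only.vcf --freq --out ~/sv_only
```

The regular expression anchors `VT=SV` between semicolons (or the ends of the field), so it should not match a longer value that merely starts with `SV`.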


Thanks. I finally figured it out. You were exactly right.

ADD REPLY • link 12.4 years ago by Ryan D ★ 3.4k