Question: Reindex A Vcf File / How To Handle Cosmic Vcf Files
2
5.9 years ago by
secretjess170
Cambridge
secretjess170 wrote:

I have successfully queried 1000 Genomes VCF files before by following tutorials, so I thought I could apply the same approach to the COSMIC database. I can't.

I'm trying to make use of some WGS data from COSMIC. As I don't know the file structure, I ran the following in an attempt to view the headers:

tabix -H ftp://ngs.sanger.ac.uk/production/cosmic/wgs/CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz
[get_local_version] downloading the index file...
[kftp_connect_file] 550 No such file.
[download_from_remote] fail to open remote file.
[tabix] failed to load the index file.

...so then I downloaded the file:

tabix -H CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz 
[tabix] the index file either does not exist or is older than the vcf file. Please reindex.

How do I reindex a VCF file? I tried this (I don't know what I'm doing at all, sorry):

tabix -f CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz 
[tabix] was bgzip used to compress this file? CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz

...and so, in a last-ditch attempt, I tried just running vcftools (I thought this would filter out any mutations not on X):

vcftools --gzvcf CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz --chr X --out test

VCF index is older than VCF file. Will regenerate.
Building new index file.

Reading Index file.
File contains 641910 entries and 0 individuals.
Filtering by chromosome.
(list of chromosomes)
Skipping Remainder.
Keeping 28481 entries on specified chromosomes.
Applying Required Filters.
After filtering, kept 0 out of 0 Individuals
After filtering, kept 28481 out of a possible 28481 Sites

So that didn't work as expected either. I then tried running tabix again, but it still gives me the error that an index file doesn't exist.

Are there any README files or guides that I've just not been able to find? I literally have no idea what I'm doing.

vcf tabix vcftools • 8.0k views
ADD COMMENTlink modified 3.1 years ago by bjlemmer20 • written 5.9 years ago by secretjess170
2

Have a look at the example in the tabix man page. You need to compress the file with bgzip, as the error message says; plain gzip won't work. VCFtools uses a different index format and cannot really achieve random access in a gzip'd VCF.
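
To tell whether a downloaded .vcf.gz was written by bgzip or by plain gzip, one option is to look at the first bytes of the file. The sketch below is only an illustration: `is_bgzf` is a hypothetical helper name, it inspects just the first block, and it assumes the "BC" extra subfield comes first in the gzip header, which is how bgzip writes it.

```shell
# is_bgzf FILE: succeed if FILE starts with a BGZF block header.
# A BGZF block is a gzip member with the FEXTRA flag set:
#   bytes 0-3   = 1f 8b 08 04
#   bytes 12-13 = 42 43 ("BC", the BGZF extra subfield identifier)
is_bgzf() {
    # Dump the first 14 bytes as one continuous hex string
    header=$(head -c 14 "$1" | od -An -v -tx1 | tr -d ' \n')
    case "$header" in
        # 8 hex digits of magic/flags, 16 digits we ignore, then "BC"
        1f8b0804????????????????4243) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_bgzf CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz && echo bgzip || echo "plain gzip"` would print "plain gzip" for a file that tabix refuses to index for this reason.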

ADD REPLYlink written 5.9 years ago by lh331k
5
5.9 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

Roughly, you'll need to do these steps (untested):

gunzip CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz
bgzip CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf
tabix -p vcf CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz
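
If disk space is tight, the same three steps can probably be done without writing the uncompressed VCF to disk. This is a sketch, not something tested on the COSMIC file: `rebgzip_and_index` and the `.bgzf.tmp` suffix are made-up names, and it assumes `bgzip` and `tabix` are on the PATH.

```shell
# rebgzip_and_index FILE.vcf.gz
# Recompress a plain-gzip VCF with bgzip in a stream, then build the .tbi index.
# Caveat: without pipefail, a failure of "gzip -dc" alone is not detected here.
rebgzip_and_index() {
    in=$1
    tmp="$in.bgzf.tmp"    # hypothetical temporary file name
    # Decompress to stdout and recompress as BGZF into the temporary file
    gzip -dc "$in" | bgzip -c > "$tmp" || { rm -f "$tmp"; return 1; }
    # Replace the original, then index it as VCF (-f overwrites a stale index)
    mv "$tmp" "$in" && tabix -f -p vcf "$in"
}
```

Usage would be `rebgzip_and_index CosmicCodingMuts_v64_26032013_noLimit_wgs.vcf.gz`, after which `tabix -H` on the same file should find the new index.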
ADD COMMENTlink written 5.9 years ago by Sean Davis25k

Hi Sean, it seems you know about tabix. Would you mind helping me with the tabix issue I'm having below as well? It's a similar problem...

ADD REPLYlink written 5.7 years ago by Sheila280
0
5.7 years ago by
Sheila280
United States
Sheila280 wrote:

I am having a similar problem, and I'm trying to subset based on regions in the genome that I've predefined in a bed file.

I have a SNP.vcf file provided by Illumina, so as per the instructions above, I bgzipped it:

bgzip SNP.vcf

and got SNP.vcf.gz.

Then when I tried this command:

tabix SNPs.vcf.gz -B testbed.bed

I got this error:

[tabix] the index file either does not exist or is older than the vcf file. Please reindex.

Any advice?

ADD COMMENTlink written 5.7 years ago by Sheila280

You have to create the tabix index first; bgzip only compresses the file.
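
A rough sketch of index-then-query follows. It assumes a tabix build from htslib, where a regions file is passed with `-R` (the `-B` flag in the question appears to come from an older tabix release); `query_bed` is a hypothetical helper name.

```shell
# query_bed VCF.gz REGIONS.bed
# Build the .tbi index when it is missing or older than the VCF,
# then print the header plus all records overlapping the BED regions.
query_bed() {
    vcf=$1
    bed=$2
    if [ ! -f "$vcf.tbi" ] || [ "$vcf" -nt "$vcf.tbi" ]; then
        tabix -f -p vcf "$vcf" || return 1
    fi
    # -h keeps the VCF header; -R restricts output to regions in the file
    tabix -h -R "$bed" "$vcf"
}
```

With the files from the question this would be called as `query_bed SNP.vcf.gz testbed.bed > subset.vcf`, where `subset.vcf` is just an example output name.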

ADD REPLYlink written 5.0 years ago by Sean Davis25k
0
3.1 years ago by
bjlemmer20
bjlemmer20 wrote:

I have an extension of this problem. I have a file:

chr38.vcf.gz

I ran:

tabix chr38.vfc.gz chr38:1-100000

I then ran these lines and got an error:

rm chr38.vcf.gz.tbi
tabix -p vcf chr38.vcf.gz
tabix chr38.vfc.gz chr38:1-100000
[ti_index_load] wrong magic number 
[ti_index_load] fail to load the index: chr38.vcf.gz.tbi
[tabix] failed to load the index file

 

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by bjlemmer20