Generating .vcf.idx files from the command line
12.4 years ago

Hello,

A colleague pointed out this problem with GATK, which generates indexes in memory on the fly if it doesn't find those on disk:

INFO  14:05:14,023 RMDTrackBuilder - Creating Tribble index in memory for file dbsnp_132.vcf
WARN  14:09:31,798 FSLockWithShared - WARNING: Unable to lock file dbsnp_132.vcf.idx (could not open read/write file channel)
WARN  14:09:31,798 RMDTrackBuilder - Unable to write to dbsnp_132.vcf.idx for the index file, creating index in memory only

I've read and gone through these two resources:

http://www.broadinstitute.org/gsa/wiki/index.php/Tribble

http://code.google.com/p/tribble/

But I found it a bit awkward to have to write my own Java class just to generate an index file from a .vcf :-!

VCFTools does not seem to have that functionality either, at least according to their documentation:

http://vcftools.sourceforge.net/docs.html

Does anybody know how to generate those .vcf.idx files in a more straightforward way?

Thanks in advance !

PS: I just couldn't refrain from including this link too :)

http://en.wikipedia.org/wiki/The_Trouble_With_Tribbles

vcf vcftools gatk • 31k views

GATK should generate those indexes on disk as part of processing. It looks like you might not have permission to write to the directory. I don't know of a dedicated indexing function in GATK, but you might be able to run something like 'ValidateVariants' as a lightweight way to make an index as a side effect.


Hi Roman - curious what you set your permissions to for the VCF file? I am running into this same problem and cannot figure it out. Thanks!


Indeed, this was a permissions issue. We intend to set up a shared reference genome repository in our HPC environment using your script:

https://github.com/chapmanb/cloudbiolinux/blob/master/data_fabfile.py

This means that the directories/files there shouldn't be writeable by all users and that's why GATK complains about it.

I'll try your lightweight suggestion and integrate it into data_fabfile.py.

Thanks Brad !


Caddymob, I didn't set the permissions myself; the HPC sysadmins did. All you have to do is grant write permission on the directory named in the error message (chmod ug+w dir).
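
To check up front whether an index can be written next to a VCF, something like the following rough Python sketch may help. The function name idx_status and its return strings are made up for illustration, and it only approximates the situation with a plain permission check; it does not reproduce GATK's actual file-locking logic.

```python
import os


def idx_status(vcf_path):
    """Report whether a .idx file exists, or could be created, next to vcf_path.

    Hypothetical helper: returns "exists", "writable" or "read-only".
    """
    idx_path = vcf_path + ".idx"
    if os.path.exists(idx_path):
        return "exists"
    # Creating a new file requires write and search permission on the directory.
    parent = os.path.dirname(os.path.abspath(vcf_path))
    if os.access(parent, os.W_OK | os.X_OK):
        return "writable"
    return "read-only"
```

If idx_status("dbsnp_132.vcf") comes back as "read-only", the index either has to be generated once by a user who can write to that directory, or the directory permissions need to be changed as described above.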

ADD REPLY
12.0 years ago
Jitendra ▴ 60

Hi, I found IGVTools to be the best for VCF indexing.

igvtools can be run from the command line or from IGV itself (File > Run igvtools...). After launching, choose the Index command and browse to your .vcf file. The index file (.idx) will be created in the same directory as the .vcf file.
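
For scripted use, the command-line form is "igvtools index" followed by the input file. Below is a minimal Python sketch that wraps it, assuming the igvtools launcher is on your PATH; the helper names igvtools_index_command and index_vcf are illustrative only, not part of igvtools.

```python
import shutil
import subprocess


def igvtools_index_command(vcf_path, igvtools="igvtools"):
    """Build the argument list for 'igvtools index <file>'."""
    return [igvtools, "index", vcf_path]


def index_vcf(vcf_path):
    """Run igvtools on vcf_path and return the expected index path.

    Assumes igvtools is installed and the target directory is writable.
    """
    cmd = igvtools_index_command(vcf_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("igvtools was not found on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return vcf_path + ".idx"
```

Calling index_vcf("dbsnp_132.vcf") should then leave dbsnp_132.vcf.idx next to the VCF, provided the directory is writable.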

Thanks! Jitendra

12.4 years ago
Rlong ▴ 340

Depending on what you want to do downstream with these VCFs, you also have the option of using Heng Li's bgzip and tabix. bgzip compresses the file in chunks (blocks), then tabix indexes those blocks and produces a .tbi index file. You can then use tabix on the compressed VCF much like you would use samtools on an indexed BAM.
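
As a rough sketch of that workflow in Python, assuming bgzip and tabix are on your PATH (the helper names below are made up for illustration). Note that the resulting .tbi is a tabix index, not the Tribble .idx that GATK writes.

```python
import shutil
import subprocess


def bgzip_tabix_commands(vcf_path):
    """Return the two commands that compress and then index a VCF."""
    gz_path = vcf_path + ".gz"
    return [
        ["bgzip", vcf_path],             # replaces file.vcf with file.vcf.gz
        ["tabix", "-p", "vcf", gz_path], # writes file.vcf.gz.tbi
    ]


def compress_and_index(vcf_path):
    """Run both commands in order and return the path of the .tbi index."""
    for cmd in bgzip_tabix_commands(vcf_path):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " was not found on PATH")
        subprocess.run(cmd, check=True)
    return vcf_path + ".gz.tbi"
```

Once the index exists, a region can be pulled out with a call such as "tabix file.vcf.gz chr1:1000-2000" (the region here is just an example).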


That's not the use case I was looking for, but it's definitely worth knowing, thanks for your feedback !

