Question: Generating .vcf.idx Files From the Command Line
1
7.4 years ago by
Melbourne
Roman Valls Guimerà510 wrote:

Hello,

A colleague pointed out this problem with GATK, which builds indexes in memory on the fly when it can't find them on disk:

INFO  14:05:14,023 RMDTrackBuilder - Creating Tribble index in memory for file dbsnp_132.vcf
WARN  14:09:31,798 FSLockWithShared - WARNING: Unable to lock file dbsnp_132.vcf.idx (could not open read/write file channel)
WARN  14:09:31,798 RMDTrackBuilder - Unable to write to dbsnp_132.vcf.idx for the index file, creating index in memory only

I've read and gone through these two resources:

http://www.broadinstitute.org/gsa/wiki/index.php/Tribble

http://code.google.com/p/tribble/

But I found it a bit awkward to have to write my own Java class just to generate an index file from a .vcf :-!

VCFtools does not seem to have that functionality either, at least according to its documentation:

http://vcftools.sourceforge.net/docs.html

Does anybody know how to generate those .vcf.idx files in a more straightforward way?

Thanks in advance !

PS: I just couldn't refrain from including this link too :)

http://en.wikipedia.org/wiki/The_Trouble_With_Tribbles

vcf gatk vcftools • 20k views
ADD COMMENTlink modified 7.4 years ago by Jitendra50 • written 7.4 years ago by Roman Valls Guimerà510
1

GATK should generate those indexes on disk as part of processing. It looks like you might not have permission to write to the directory. I don't know of an indexing function in GATK, but you might be able to run something like 'ValidateVariants' as a lightweight way to make an index as a side effect.
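For anyone wanting to script this, here is a minimal sketch of the ValidateVariants idea in Python. It assumes the GATK 2.x/3.x command-line style (`-T`, `-R`, `-V`); older releases bound the VCF differently, so check your version. The jar path, reference name and function names are placeholders of mine, not anything GATK defines.

```python
import subprocess


def validate_variants_cmd(gatk_jar, reference, vcf):
    """Build a ValidateVariants command line (GATK 2.x/3.x style, assumed)."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "ValidateVariants",
        "-R", reference,
        "-V", vcf,
    ]


def index_via_gatk(gatk_jar, reference, vcf):
    """Run GATK and return the path where the index is expected to appear."""
    # check_call raises CalledProcessError if GATK exits non-zero,
    # which also happens when the VCF itself fails validation.
    subprocess.check_call(validate_variants_cmd(gatk_jar, reference, vcf))
    # The index only lands on disk if the directory is writable;
    # otherwise GATK keeps it in memory, as in the log above.
    return vcf + ".idx"
```

Usage would be something like `index_via_gatk("GenomeAnalysisTK.jar", "hg19.fa", "dbsnp_132.vcf")`, run once by a user who can write to the reference directory.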

ADD REPLYlink written 7.4 years ago by Brad Chapman9.4k
1

Hi Roman - curious what you set your permissions to for the VCF file? I am running into this same problem and cannot figure it out. Thanks!

ADD REPLYlink written 7.4 years ago by Caddymob950

Indeed, this was a permissions issue. We intend to set up a shared reference genome repository in our HPC environment using your script:

https://github.com/chapmanb/cloudbiolinux/blob/master/data_fabfile.py

This means the directories and files there shouldn't be writable by all users, which is why GATK complains.

I'll try your lightweight suggestion and integrate it into data_fabfile.py.

Thanks Brad !

ADD REPLYlink written 7.4 years ago by Roman Valls Guimerà510

Caddymob, I didn't set the permissions myself; the HPC sysadmins did. All you have to do is grant write permission on the directory named in the error message (chmod ug+w dir).
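If you want to check this from a script before launching GATK, the following is a rough Python sketch. The function names are mine; `add_user_group_write` mirrors `chmod ug+w dir` and only works if you own the directory (or are root).

```python
import os
import stat


def index_dir_writable(vcf_path):
    """Return True if the current user can create files next to the VCF."""
    directory = os.path.dirname(os.path.abspath(vcf_path))
    # Creating a file needs both write and search (execute) permission.
    return os.access(directory, os.W_OK | os.X_OK)


def add_user_group_write(directory):
    """Equivalent of `chmod ug+w directory`: add write for owner and group."""
    mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, mode | stat.S_IWUSR | stat.S_IWGRP)
```

On a shared reference repository you would probably not loosen permissions for everyone; running the indexing step once as the owning user and leaving the directory read-only afterwards is the other option.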

ADD REPLYlink written 7.3 years ago by Roman Valls Guimerà510
4
7.0 years ago by
Jitendra50
India
Jitendra50 wrote:

Hi, I found igvtools to be the best option for VCF indexing.

igvtools can be run from the command line or from IGV itself (File>Run igvtools...). After launching, choose the Index command and browse to your .vcf file. The index file (.idx) will be created in the same directory as the .vcf file.
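For the command-line route, a small Python wrapper might look like the sketch below. It assumes an `igvtools` launcher is on your PATH and uses its `index` command; the staleness check (rebuild when the .idx is missing or older than the VCF) is my own convention, not something igvtools requires.

```python
import os
import subprocess


def needs_index(vcf):
    """True if the .idx is missing or older than the VCF it describes."""
    idx = vcf + ".idx"
    if not os.path.exists(idx):
        return True
    return os.path.getmtime(idx) < os.path.getmtime(vcf)


def igvtools_index_cmd(vcf, igvtools="igvtools"):
    """Build the `igvtools index <file>` command line."""
    return [igvtools, "index", vcf]


def index_with_igvtools(vcf, igvtools="igvtools"):
    """Create vcf + '.idx' next to the VCF if it is missing or stale."""
    if needs_index(vcf):
        subprocess.check_call(igvtools_index_cmd(vcf, igvtools))
    return vcf + ".idx"
```

Calling `index_with_igvtools("dbsnp_132.vcf")` should leave `dbsnp_132.vcf.idx` in the same directory, provided that directory is writable.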

Thanks! Jitendra

ADD COMMENTlink written 7.0 years ago by Jitendra50
2
7.4 years ago by
Rlong340
US
Rlong340 wrote:

Depending on what you want to do downstream with these VCFs, you also have the option of using Heng Li's bgzip and tabix. bgzip compresses the file in independent blocks, then tabix indexes those blocks by genomic position and produces a .tbi index file. You can then use tabix on the VCF much like you would use samtools on an indexed BAM.
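A rough Python sketch of that workflow, assuming `bgzip` and `tabix` are on your PATH. The function names and the example region are placeholders of mine. Note that plain `bgzip file.vcf` replaces the original file with `file.vcf.gz`, so work on a copy if you still need the uncompressed VCF (for example for a .idx index).

```python
import subprocess


def bgzip_tabix_cmds(vcf):
    """Commands to compress a VCF with bgzip and index it with tabix."""
    gz = vcf + ".gz"
    return [
        ["bgzip", vcf],              # replaces vcf with vcf.gz
        ["tabix", "-p", "vcf", gz],  # writes vcf.gz.tbi
    ]


def region_query_cmd(vcf_gz, region):
    """Command to pull the records overlapping a region, e.g. '1:1000-2000'."""
    return ["tabix", vcf_gz, region]


def compress_and_index(vcf):
    """Run both steps in order; stops at the first failure."""
    for cmd in bgzip_tabix_cmds(vcf):
        subprocess.check_call(cmd)
    return vcf + ".gz"
```

After `compress_and_index("calls.vcf")`, running the command from `region_query_cmd("calls.vcf.gz", "1:1000-2000")` prints only the records in that interval without scanning the whole file.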

ADD COMMENTlink written 7.4 years ago by Rlong340

That's not the use case I was looking for, but it's definitely worth knowing, thanks for your feedback !

ADD REPLYlink written 7.4 years ago by Roman Valls Guimerà510