Question

How to store index files generated by bwa in a different folder from the reference sequence?


Asked 7.1 years ago by retrogenomics

Hi,

I have a folder that contains all my reference genome sequences (in FASTA format, e.g. hg19.fa, hg38.fa, mm10.fa, etc.), and I would like to store the index files generated by each short-read mapper (e.g. bwa, bowtie, ...) in a different folder. My problem is the following:

Is it possible with bwa aln/samse to specify the location of the index? A call to bwa would look like:

bwa aln <ref_genome.fasta> <reads.fastq> | bwa samse <ref_genome.fasta> - <reads.fastq> > mapped_reads.sam

Thanks

Tags: bwa, reference-sequence · 5.5k views

Last updated 11 months ago by Ram.

score 3 · Answer 1 · 2017-03-21


Answered 7.1 years ago by Pierre Lindenbaum

It's not clear why you would want to do that.

Use a symbolic link created with ln -s; see http://stackoverflow.com/a/1951752/58082

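A minimal bash sketch of the symlink approach, assuming one shared reference folder and one index folder per aligner. The function name link_reference and all paths are hypothetical; the only real tools used are mkdir and ln.

```shell
#!/usr/bin/env bash
# Sketch: give an aligner its own index folder that holds only a symlink to
# the shared FASTA, so no second copy of the sequence is made.
# link_reference is a hypothetical helper name.

# link_reference FASTA INDEX_DIR
#   Creates INDEX_DIR if needed, puts a symlink to FASTA inside it and
#   prints the path of that symlink.
link_reference() {
  local fasta=$1 index_dir=$2 abs_fasta name
  [ -f "$fasta" ] || { echo "no such FASTA: $fasta" >&2; return 1; }
  # Absolute target, so the link resolves correctly from inside INDEX_DIR.
  abs_fasta="$(cd "$(dirname "$fasta")" && pwd)/$(basename "$fasta")"
  name=$(basename "$fasta")
  mkdir -p "$index_dir"
  # -f replaces an existing file or link with the same name.
  ln -sf "$abs_fasta" "$index_dir/$name"
  printf '%s\n' "$index_dir/$name"
}
```

Example use (paths hypothetical):
fa=$(link_reference /data/references/hg19.fa /data/indexes/bwa/hg19)
bwa index "$fa"
Because bwa index names the index after the path it is given, the .amb, .ann, .bwt, .pac and .sa files land next to the link rather than next to the original FASTA, and the same link path is what you then pass to bwa aln/samse.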


To avoid having multiple copies of the fasta file.

As suggested, a symbolic link in each index directory pointing back to the original FASTA file would do the trick.

Reply by GenoMax, 7.1 years ago


We have shared reference sequences on our server, but not everyone uses the same short-read aligner, so the indexes can be stored in private directories. In addition, each short-read aligner generates its own index files, and it is somewhat difficult to keep track of what is what. Yes, a symbolic link in each index directory is a good idea. Thanks.

Reply by retrogenomics, 7.1 years ago


"thus the indexes can be stored in private directories"

That does not make sense. You are avoiding multiple copies of the reference but potentially allowing multiple private copies of the index files (which are larger). Keeping all of these in a common, centrally managed location is extremely useful.

I like the organization that iGenomes uses: under a "Sequence" directory, store the sequence as well as a separate directory for each aligner's index that people use at your facility.

Reply by GenoMax, 7.1 years ago
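A bash sketch of that iGenomes-style organization. The Sequence/WholeGenomeFasta and Sequence/BWAIndex folders with a genome.fa base name match the paths quoted later in this thread; the Bowtie2Index folder name, the helper name make_genome_layout and the use of symlinks for every FASTA are assumptions, not the actual iGenomes build procedure.

```shell
#!/usr/bin/env bash
# Sketch of an iGenomes-like tree: ROOT/Sequence/WholeGenomeFasta/genome.fa
# plus one folder per aligner index, each holding a relative link to it.
# make_genome_layout and the Bowtie2Index folder name are assumptions.

# make_genome_layout ROOT FASTA
make_genome_layout() {
  local root=$1 fasta=$2
  local seq="$root/Sequence" abs_fasta aligner
  [ -f "$fasta" ] || { echo "no such FASTA: $fasta" >&2; return 1; }
  abs_fasta="$(cd "$(dirname "$fasta")" && pwd)/$(basename "$fasta")"
  mkdir -p "$seq/WholeGenomeFasta"
  # Link rather than copy, so the shared FASTA is not duplicated.
  ln -sf "$abs_fasta" "$seq/WholeGenomeFasta/genome.fa"
  for aligner in BWAIndex Bowtie2Index; do
    mkdir -p "$seq/$aligner"
    # Relative link: stays valid if the whole ROOT tree is moved.
    ln -sf ../WholeGenomeFasta/genome.fa "$seq/$aligner/genome.fa"
  done
}
```

Example use (paths hypothetical):
make_genome_layout /data/genomes/hg19 /data/references/hg19.fa
bwa index /data/genomes/hg19/Sequence/BWAIndex/genome.fa
bowtie2-build /data/genomes/hg19/Sequence/Bowtie2Index/genome.fa /data/genomes/hg19/Sequence/Bowtie2Index/genome
Each aligner then writes its own files only into its own folder, which keeps track of "what is what".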


I agree. Still, I wonder how much an index depends on the version of the aligner used to build it. Do you think that could be a problem?

Separately, regarding the shared references/indexes: if several users of a common server start mapping reads to the same reference genome at the same time, would that slow things down or cause problems in any way?

Reply by retrogenomics, 7.1 years ago


Aligners rarely change their indexing scheme (the only examples I can think of are when bwa went from 0.6.x to 0.7.x, and STAR may have done so once). A big change like this would generally be well publicized, giving you time to react accordingly.

If the users are hitting the same storage system, then having indexes in one location as opposed to several may not have a significant effect (in terms of time or I/O problems). On a shared server/cluster you likely have a high-performance shared storage solution.

Reply by GenoMax, 7.1 years ago
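On the version question, one low-effort safeguard (an assumption, not something suggested in the thread) is to store the bwa version next to each index when it is built, so a later format change is easy to spot. This sketch relies on bwa printing a "Version: ..." line in the usage banner it shows when run without arguments; record_bwa_version and BWA_VERSION.txt are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: save the version of bwa that built an index alongside the index.
# record_bwa_version and BWA_VERSION.txt are hypothetical names.

# record_bwa_version INDEX_DIR
#   Writes the "Version: ..." line of bwa's usage banner to
#   INDEX_DIR/BWA_VERSION.txt; returns non-zero if no such line is found.
record_bwa_version() {
  local index_dir=$1
  # bwa without arguments prints its usage on stderr and exits non-zero,
  # hence the 2>&1 and the || true before filtering for the version line.
  { bwa 2>&1 || true; } | grep '^Version:' > "$index_dir/BWA_VERSION.txt"
}
```

Example use (path hypothetical): run record_bwa_version /data/indexes/bwa/hg19 right after bwa index, then compare that file with the output of the bwa you are about to map with.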


How do I point bwa to the index on a cluster when using the iGenomes layout, with separate folders for the FASTA genome and the index files? I'm using the following command line:

bwa mem -t 20 Sequence/WholeGenomeFasta/genome.fa sample_1_cleaned.fastq sample_2_cleaned.fastq

but I get the same error:

[E::bwa_idx_load_from_disk] fail to locate the index files

Reply by youki, 6.2 years ago
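The error means bwa did not find index files under the name it was given. A small bash check, assuming the file set that bwa index writes in 0.7.x (PREFIX.amb, .ann, .bwt, .pac and .sa); check_bwa_index is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: report which bwa index files are missing for a given prefix.
# Assumes the bwa 0.7.x file set; check_bwa_index is a hypothetical name.

# check_bwa_index PREFIX
#   Prints one "missing: ..." line per absent file; returns 1 if any is missing.
check_bwa_index() {
  local prefix=$1 ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -e "$prefix.$ext" ]; then
      echo "missing: $prefix.$ext"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_bwa_index Sequence/WholeGenomeFasta/genome.fa lists whatever is absent for that prefix, which is a quick way to see whether the path passed to bwa mem is really an index prefix.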


You don't specify the FASTA file; you specify the base name (prefix) of the genome index. So it should be something like:

bwa mem -t 20 Sequence/BWAIndex/genome.fa sample_1_cleaned.fastq sample_2_cleaned.fastq

Reply by GenoMax, 6.2 years ago
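bwa index also accepts -p to set the index prefix, which keeps the index in a different folder from the FASTA without needing a symlink; that prefix (not the FASTA path) is then what bwa aln, samse and mem expect. A bash sketch; build_bwa_index and the paths are hypothetical, and the index name is derived by stripping a .fa or .fasta suffix.

```shell
#!/usr/bin/env bash
# Sketch: build a bwa index under INDEX_DIR with an explicit prefix while the
# FASTA stays where it is. build_bwa_index is a hypothetical helper name.

# build_bwa_index FASTA INDEX_DIR
#   Runs `bwa index -p INDEX_DIR/NAME FASTA` and prints the prefix to pass
#   to bwa aln/samse/mem.
build_bwa_index() {
  local fasta=$1 index_dir=$2 name prefix
  name=$(basename "$fasta")
  name=${name%.fa}
  name=${name%.fasta}
  prefix="$index_dir/$name"
  # -p only sets a file-name prefix, so the folder must exist beforehand.
  mkdir -p "$index_dir"
  # Send bwa's own output to stderr so stdout carries only the prefix.
  bwa index -p "$prefix" "$fasta" >&2 || return 1
  printf '%s\n' "$prefix"
}
```

Example use (paths hypothetical):
prefix=$(build_bwa_index /data/references/hg19.fa /data/indexes/bwa)
bwa aln "$prefix" reads.fastq > reads.sai
bwa samse "$prefix" reads.sai reads.fastq > mapped_reads.sam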