Question: how to store index files generated by bwa in a different folder than the reference sequence?
3.5 years ago by
France, Nice
retrogenomics20 wrote:

Hi,

I have a folder which contains all my genome reference sequences (in fasta format, e.g. hg19.fa, hg38.fa, mm10.fa, etc.) and I would like to store the index files generated by each short read mapper (e.g. bwa, bowtie, ...) in a different folder. My problem is the following:

Is it possible with bwa aln/samse to specify the location of the index? A call to bwa would look like:

bwa aln <ref_genome.fasta> <reads.fastq> | bwa samse <ref_genome.fasta> - <reads.fastq> > mapped_reads.sam

Thanks

ADD COMMENTlink modified 3.5 years ago by Pierre Lindenbaum130k • written 3.5 years ago by retrogenomics20
3.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum130k wrote:

It's not clear why you would want to do that.

Use a symbolic link with ln -s http://stackoverflow.com/a/1951752/58082

ADD COMMENTlink written 3.5 years ago by Pierre Lindenbaum130k

To avoid having multiple copies of the fasta file.

A symbolic link back to the original fasta file from each of the index directories would do the trick as suggested.
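
The symbolic-link approach could be sketched roughly as follows. This is only an illustration: the directory names and the REF_DIR, INDEX_DIR and GENOME variables are hypothetical placeholders, not paths from this thread. It relies on bwa index writing its files next to the path it is given (here the link), so the shared reference folder is left untouched.

```shell
#!/bin/sh
# Sketch only: REF_DIR, INDEX_DIR and GENOME are hypothetical placeholders.
REF_DIR="${REF_DIR:-$PWD/references}"        # shared folder holding the FASTA files
INDEX_DIR="${INDEX_DIR:-$PWD/indexes/bwa}"   # separate folder for the bwa index
GENOME="${GENOME:-hg19.fa}"

mkdir -p "$INDEX_DIR"

# Link the FASTA into the index folder instead of copying it.
ln -sf "$REF_DIR/$GENOME" "$INDEX_DIR/$GENOME"

# Index through the link; the index files are created in INDEX_DIR.
# Skipped when bwa is not installed or the FASTA file is missing.
if command -v bwa >/dev/null 2>&1 && [ -s "$REF_DIR/$GENOME" ]; then
    bwa index "$INDEX_DIR/$GENOME"
fi
```

Mapping would then use "$INDEX_DIR/$GENOME" as the reference argument to bwa aln/samse.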

ADD REPLYlink written 3.5 years ago by genomax90k

We have shared reference sequences on our server, but not everyone uses the same short read aligner, and thus the indexes can be stored in private directories. In addition, each short read aligner generates its own index files, and it is somewhat difficult to keep track of what is what. Yes, the symbolic link in the index directories is a good idea. Thanks.

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by retrogenomics20

thus the indexes can be stored in private directories

That does not make sense. You are avoiding multiple copies of the reference but potentially allowing multiple private copies of the index files (which are larger). Keeping all of these in a common location (and managed centrally) is extremely useful.

I like the organization that iGenomes comes with. Under a "Sequence" directory, store the sequence as well as separate directories for the indexes of every aligner that people use at your facility.

ADD REPLYlink written 3.5 years ago by genomax90k

I agree. Still, I'm wondering how much an index depends on the version of the aligner used to make it. Do you think it could be a problem?

Separately, regarding shared references and indexes: if several users of a common server start mapping reads against the same reference genome, I wonder whether that would slow things down or cause problems in any way.

ADD REPLYlink written 3.5 years ago by retrogenomics20

Aligners rarely change their indexing scheme (the only examples I can think of are when bwa went from 0.6.x to 0.7.x, and STAR may have done so once). A big change like this would generally be well publicized, so you would have time to react accordingly.

If the users are hitting the same storage system, then having indexes in one location as opposed to several may not make a significant difference (in terms of time or I/O problems). On a shared server/cluster you likely have a high-performance shared storage solution.

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by genomax90k

How do I call bwa on a cluster when using the iGenomes layout, with separate folders for the fasta genome and the index files? I'm using the following command line

bwa mem -t 20 Sequence/WholeGenomeFasta/genome.fa sample_1_cleaned.fastq sample_2_cleaned.fastq

but I get the same error

[E::bwa_idx_load_from_disk] fail to locate the index files

ADD REPLYlink written 2.6 years ago by youki0

You don't specify the fasta file; instead, you use the base name of the genome index. So it should be something like

bwa mem -t 20 Sequence/BWAIndex/genome.fa sample_1_cleaned.fastq sample_2_cleaned.fastq
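
For completeness, an index with that base name can also be built directly from a FASTA stored elsewhere, using the -p (prefix) option of bwa index. A minimal sketch, assuming the folder names from the commands above; the FASTA and PREFIX variable names are made up for illustration, and the output file name is borrowed from the original question:

```shell
#!/bin/sh
# Sketch only: FASTA and PREFIX are illustrative variable names.
FASTA="${FASTA:-Sequence/WholeGenomeFasta/genome.fa}"
PREFIX="${PREFIX:-Sequence/BWAIndex/genome.fa}"   # base name of the index files

mkdir -p "$(dirname "$PREFIX")"

# Skipped when bwa is not installed or the FASTA file is missing.
if command -v bwa >/dev/null 2>&1 && [ -s "$FASTA" ]; then
    # -p sets the prefix of the index files (PREFIX.amb, PREFIX.bwt, PREFIX.sa, ...).
    bwa index -p "$PREFIX" "$FASTA"
    # The aligner is given the same prefix, not the FASTA location.
    bwa mem -t 20 "$PREFIX" sample_1_cleaned.fastq sample_2_cleaned.fastq > mapped_reads.sam
fi
```

The same prefix would be passed to bwa aln and bwa samse in place of the reference FASTA.
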
ADD REPLYlink written 2.6 years ago by genomax90k