Merging BWT indices for BWA
5.5 years ago
rgc255 ▴ 60

Is it possible to merge two indexes created using BWA version 0.7.17 (https://github.com/lh3/bwa)? I need to create many BWA index files that each include a large genome and a variable, smaller bacterial genome. I need to do this many times as part of a pipeline, and each build takes about 40 minutes, even with a large value for the -b parameter (i.e. -b 1000000000000). I'm looking for a way to combine the large reference genome index with the small bacterial genome index, since the large reference genome is fixed and only the bacterial genomes differ from run to run.
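For reference, the baseline workflow described above (concatenate the fixed genome with one bacterial genome, then build a fresh index) might look like the sketch below. It is not a merge, only the rebuild that a merge would replace. The function names `concat_fasta` and `index_command` and all file names are hypothetical; the command is assembled but not run. The `-a`, `-b`, and `-p` options are the ones listed in the `bwa index` usage text.

```python
import os
import shutil


def concat_fasta(inputs, output):
    """Concatenate FASTA files into `output`, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # If the file did not end with a newline, add one so the
                # next file's header starts on its own line.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")


def index_command(fasta, prefix, block_size=10_000_000):
    """Build (but do not run) a `bwa index` command line as a list."""
    return [
        "bwa", "index",
        "-a", "bwtsw",          # BWT construction algorithm
        "-b", str(block_size),  # block size for the bwtsw algorithm
        "-p", str(prefix),      # prefix of the output index files
        str(fasta),
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(" ".join(index_command("combined.fasta", "combined")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the file names match a real pipeline.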

I've come across several different programs that can merge BWT index files, such as https://github.com/holtjma/msbwt, https://github.com/jltsiren/bwt-merge, and https://github.com/felipelouza/egap. However, these programs do not seem to produce BWT files that are in the format required by the BWA read alignment tool.

When I run the command bwa index -a bwtsw genome.fasta, I get five different output files: genome.fasta.bwt, genome.fasta.pac, genome.fasta.ann, genome.fasta.amb, and genome.fasta.sa. Even if msbwt, bwt-merge, egap, etc. could produce BWT files formatted for BWA, I'm not sure how to merge the other file types (i.e. .pac, .ann, .amb, .sa). Does anyone know how to merge multiple BWA indexes?
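A small check like the one below can confirm which of the five index files exist for a given prefix (by default, `bwa index` uses the FASTA file name as the prefix). This is a minimal sketch; `BWA_INDEX_SUFFIXES` and `missing_index_files` are hypothetical names, not part of bwa.

```python
from pathlib import Path

# The five file suffixes written by `bwa index` for a given prefix.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(prefix):
    """Return the expected index files for `prefix` that are not on disk."""
    prefix = str(prefix)
    return [
        prefix + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(prefix + suffix).is_file()
    ]
```

An empty list means all five files are present, so any merge procedure would have to leave that list empty for the combined prefix.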

10/26/18 UPDATE: I learned that, given a BWT index in the format required by bwa, you can generate the remaining files with bwa's bwt2sa and fa2pac commands: bwt2sa builds the .sa file from the .bwt file, and fa2pac builds the .pac, .ann, and .amb files from the FASTA file. For example, if you have genome.fasta and a BWT file named genome.fasta.bwt, you can run these commands: bwa bwt2sa genome.fasta.bwt genome.fasta.sa and bwa fa2pac genome.fasta genome.fasta
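The two commands can be assembled in a small helper, sketched below. Going by the bwa usage text, `bwa bwt2sa` takes `<in.bwt> <out.sa>` and `bwa fa2pac` takes `<in.fasta> [<out.prefix>]`, so fa2pac reads the FASTA file rather than the .bwt file. The helper names `rebuild_commands` and `run_commands` are hypothetical, and whether a BWT merged by another tool would be accepted by bwt2sa is not established in this thread.

```python
import shlex
import subprocess


def rebuild_commands(fasta, prefix=None):
    """Return the bwa commands that regenerate the non-BWT index files.

    Assumes `<prefix>.bwt` already exists in bwa's own format.
    """
    prefix = prefix or fasta
    return [
        # Writes <prefix>.pac, <prefix>.ann and <prefix>.amb from the FASTA.
        ["bwa", "fa2pac", fasta, prefix],
        # Writes the suffix array from the existing BWT file.
        ["bwa", "bwt2sa", prefix + ".bwt", prefix + ".sa"],
    ]


def run_commands(commands, dry_run=True):
    """Return the command lines as strings; execute them unless dry_run."""
    lines = []
    for cmd in commands:
        lines.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return lines
```

With the default `dry_run=True` nothing is executed, so the command lines can be inspected before running them against real files.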

BWA BWT read aligner merge • 2.4k views

Are you sure this is less work than indexing the large genome and all the bacteria together?


Good question. We get the bacterial genomes in batches, so we don't have them all at once. We handle thousands of samples a year, and we can't predict what the bacterial genomes will look like until we see them. I think it may save time and money if we can merge BWTs rather than creating a completely new index every time we have a new bacterial genome.


As far as I know, this is not possible.


I think it must be possible. I know that you can merge bwt files, but I just don't know how to convert those bwt files to the format required by bwa. There's a post on the wiki for the msbwt program that explains how to convert bwt indexes from ropebwt2 format to msbwt format (https://github.com/holtjma/msbwt/wiki/Converting-to-msbwt's-RLE-format). You can then use the msbwt program to merge the bwt files. I was hoping to find a program that will convert merged bwt files produced by msbwt to the ropebwt2 format. I think that bwa can read ropebwt2 bwt indexes.


OK, let me rephrase: I have no doubt it is technically possible. At this stage, though, there is no working option, and I spent a few months investigating. I will keep an eye on this, and I'd be more than happy to learn how it works.


I agree. I haven't found a program yet that can perform this type of BWT conversion, and I've been looking for quite a while. I think I may have to write a program to do it, but I'd rather not if one already exists.


Very healthy thinking. If you succeed and are able to share it, I'd be really interested.
