Question: Merging BWT indices for BWA
3 months ago, rgc255 wrote:

Is it possible to merge two indexes created using BWA version 0.7.17 (https://github.com/lh3/bwa)? I need to create many BWA index files, each covering a large genome plus a variable, smaller bacterial genome. This happens many times as part of a pipeline, and each build takes about 40 minutes, even with a large value for the -b parameter (i.e. -b 1000000000000). Since the large reference genome is fixed and only the bacterial genome differs from run to run, I'm looking for a way to combine the large reference genome index with the small bacterial genome index.

I've come across several different programs that can merge BWT index files, such as https://github.com/holtjma/msbwt, https://github.com/jltsiren/bwt-merge, and https://github.com/felipelouza/egap. However, these programs do not seem to produce BWT files that are in the format required by the BWA read alignment tool.
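To illustrate why a dedicated merge tool is needed at all, here is a toy sketch showing that the BWT of two concatenated sequences is not simply the two BWTs glued together. It uses a naive sorted-rotations BWT with a `$` terminator; the function name and the example sequences are made up for illustration, and this is not BWA's on-disk format or the algorithm any of the tools above use.

```python
def naive_bwt(text):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes '$' does not occur in `text`; it is appended as a terminator
    that sorts before A, C, G and T. Fine for toy inputs only.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    large = "ACGTACGT"   # stand-in for the fixed large genome
    small = "TTGA"       # stand-in for a bacterial genome

    bwt_of_both = naive_bwt(large + small)
    glued = naive_bwt(large) + naive_bwt(small)

    print("BWT of concatenated text:", bwt_of_both)
    print("concatenated BWTs:       ", glued)
    print("identical:", bwt_of_both == glued)
```

The two strings differ because the suffixes of the small sequence interleave with those of the large one in sorted order, which is the part a real merge algorithm has to work out.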

When I run the command bwa index -a bwtsw genome.fasta, I get five different output files: genome.fasta.bwt, genome.fasta.pac, genome.fasta.ann, genome.fasta.amb, and genome.fasta.sa. Even if msbwt, bwt-merge, and egap (etc.) could produce BWT files formatted for BWA, I'm not sure how to merge the other file types (i.e. .pac, .ann, .amb, .sa). Does anyone know how to merge multiple BWA indexes?
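For anyone scripting around this, a minimal sketch of a check that all five files are present for a given index prefix. The helper names are hypothetical; only the five suffixes come from the output described above.

```python
from pathlib import Path

# The five suffixes that `bwa index` writes, as listed in the question.
BWA_INDEX_SUFFIXES = (".bwt", ".pac", ".ann", ".amb", ".sa")


def expected_index_files(prefix):
    """Return the five paths an index with this prefix should consist of."""
    return [Path(str(prefix) + suffix) for suffix in BWA_INDEX_SUFFIXES]


def missing_index_files(prefix):
    """Return the expected index files that are not present on disk."""
    return [p for p in expected_index_files(prefix) if not p.is_file()]


if __name__ == "__main__":
    # "genome.fasta" is just the example prefix used in this thread.
    for path in missing_index_files("genome.fasta"):
        print("missing:", path)
```

A merged index would only be usable by bwa once this list comes back empty for the merged prefix.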

10/26/18 UPDATE: I learned that if you have a bwt index in the format required by bwa, you can generate the .pac, .ann, .amb, and .sa files using bwa's bwt2sa and fa2pac commands. For example, if you have a bwt file named genome.fasta.bwt, you can run bwa bwt2sa genome.fasta.bwt genome.fasta.sa to build the .sa file, and bwa fa2pac genome.fasta genome.fasta to build the .pac, .ann, and .amb files (fa2pac takes the FASTA file and an output prefix, not the .bwt file).
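A rough sketch of how those two steps might be wrapped in a pipeline script. It assumes the usage strings `bwa fa2pac [-f] <in.fasta> [<out.prefix>]` and `bwa bwt2sa <in.bwt> <out.sa>` from bwa 0.7.17, and that a bwa-compatible .bwt already exists at the prefix. The `-f` (forward-only) flag is my assumption about what matches the .pac that bwa index leaves behind, and the function names are hypothetical.

```python
import shutil
import subprocess


def rebuild_commands(fasta, prefix=None):
    """Build the argv lists for regenerating the non-.bwt index files.

    Assumes <prefix>.bwt already exists in bwa's own format.
    The prefix defaults to the FASTA path, as `bwa index` does.
    """
    prefix = prefix or fasta
    return [
        # Writes <prefix>.pac, <prefix>.ann and <prefix>.amb from the FASTA.
        # "-f" is an assumption (forward-only packing); drop it if unsure.
        ["bwa", "fa2pac", "-f", fasta, prefix],
        # Writes <prefix>.sa from the existing <prefix>.bwt.
        ["bwa", "bwt2sa", prefix + ".bwt", prefix + ".sa"],
    ]


def run_rebuild(fasta, prefix=None):
    """Run the commands in order, stopping at the first failure."""
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa not found on PATH")
    for cmd in rebuild_commands(fasta, prefix):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in rebuild_commands("genome.fasta"):
        print(" ".join(cmd))
```

Printing the commands first, as the `__main__` block does, makes it easy to compare them against `bwa fa2pac` and `bwa bwt2sa` usage output before running anything.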

bwa read aligner bwt merge • 269 views

Are you sure this is less work than indexing the large genome and all the bacteria together?

written 3 months ago by swbarnes2

Good question. We get the bacterial genomes in batches, so we don't get them all at once. We handle thousands of samples a year, and we can't predict what the bacterial genomes will look like until we see them. I think it may save time and money if we can merge BWTs rather than creating a completely new index every time we have a new bacterial genome.

written 3 months ago by rgc255

AFAIK this is not possible

written 3 months ago by Carambakaracho

I think it must be possible. I know that you can merge bwt files, but I just don't know how to convert those bwt files to the format required by bwa. There's a post on the wiki for the msbwt program that explains how to convert bwt indexes from ropebwt2 format to msbwt format (https://github.com/holtjma/msbwt/wiki/Converting-to-msbwt's-RLE-format). You can then use the msbwt program to merge the bwt files. I was hoping to find a program that will convert merged bwt files produced by msbwt to the ropebwt2 format. I think that bwa can read ropebwt2 bwt indexes.

written 3 months ago by rgc255

OK, let me rephrase: I have no doubt it is technically possible. At this stage, though, there is no working option, and I've spent a few months investigating. I will keep an eye on this; I'd be more than happy to learn how it works.

written 3 months ago by Carambakaracho

I agree. I haven't found a program yet that can perform this type of bwt conversion, and I've been looking for quite a while. I think I may have to write a program to do it, but I'd rather not if one already exists.

written 3 months ago by rgc255

Very healthy thinking. If you succeed and are able to share it, I'd be really interested.

written 3 months ago by Carambakaracho