Question: Align short reads to large multi reference genome
Sandeep (Manipal, India) wrote, 4.1 years ago:

I have been trying to align approximately 3 million short sequences (17-35 nucleotides long) to a multi-FASTA file of prokaryotic genomes. The reference FASTA is about 10 GB. I used bowtie to build the index, and the resulting index files have the extension .ebwtl (large index).

Now, when I try to align using the command 

bowtie combined -p 12 -l 17 -a -m5 --best ../seq.fastq -v 2 -S test.sam

where combined is the name of the index, I get the error "Could not locate a Bowtie index corresponding to basename "combined"".

I ran the same search with blastn using -task blastn-short, and it took around 5 days to finish. I also tried the SHRiMP aligner, and that throws an error as well.

I have many such query files, and it's not feasible to wait around 5 days for each result. I also looked into a tool called MALT, but it does not have an option to align short queries (the default e-value there is 50).

Any suggestions for getting bowtie to work with the index I have?

PS: I used the cat command to build the multi-FASTA reference file.

 

modified 3.8 years ago by KatjaS • written 4.1 years ago by Sandeep

What does your bowtie2-build command look like? (Are you really using Bowtie, or are you using Bowtie2?)
The multi-FASTA part should not matter, because most references are multi-FASTA, whether it is a genome, a transcriptome, or something else. So I think the problem is either in how the index was built, or in the path where you stored the combined index. It could also be that something went wrong with the cat command you used to create the multi-FASTA.
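One way to rule out a concatenation problem is sketched below. It assumes the usual failure mode: an input file with no trailing newline, so that the next file's header gets glued onto a sequence line. The function name count_glued_headers and the throwaway file names are hypothetical and only serve the demonstration.

```shell
#!/bin/sh
# Count non-header lines that contain a '>' (a header glued onto a sequence line).
count_glued_headers() {
    grep -v '^>' "$1" | grep -c '>' || true
}

# Demonstration with two throwaway files; a.fa deliberately lacks a trailing newline.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '>genomeA\nACGT' > "$tmpdir/a.fa"
printf '>genomeB\nTTGA\n' > "$tmpdir/b.fa"

# Plain cat joins the last line of a.fa with the header of b.fa.
cat "$tmpdir/a.fa" "$tmpdir/b.fa" > "$tmpdir/bad.fasta"
# awk 1 prints every input line followed by a newline, so nothing gets joined.
awk 1 "$tmpdir/a.fa" "$tmpdir/b.fa" > "$tmpdir/good.fasta"

for f in bad good; do
    echo "$f: $(grep -c '^>' "$tmpdir/$f.fasta") header line(s)," \
         "$(count_glued_headers "$tmpdir/$f.fasta") glued header(s)"
done
```

On a real reference, running count_glued_headers combined.fasta and comparing grep -c '^>' combined.fasta against the expected number of sequences would be the equivalent check.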

modified 4.1 years ago • written 4.1 years ago by Lesley Sitter

I used bowtie, not bowtie2. I built the index twice, once using the command

bowtie-build combined.fasta combined 

and also using 

bowtie-build --large-index combined.fasta combined

Both indexes give the same error. Also note that the same FASTA file works with BLAST, but it takes a really long time.

written 4.1 years ago by Sandeep

Okay, the weird thing is that it says it cannot locate the index, so I was expecting something to be wrong with the path, or a spelling error or something... but if you did this in the same folder, that is indeed strange :S
I'm sorry I can't help. I only use Bowtie2, and when I try your command line (albeit with a smaller reference and only 2 read files) with Bowtie2, I don't get an error.

written 4.1 years ago by Lesley Sitter

When the FASTA file is ~4-5 GB, the bowtie index extension remains .ebwt and it works fine. After adding more references, the extension becomes .ebwtl and it fails :(. The problem seems to be with the large index.

written 4.1 years ago by Sandeep

Make sure that the version of bowtie-build and bowtie are the same. Only the most recent version(s?) of bowtie are supposed to support large indexes, so if you happen to be using two different versions then that's likely the problem. In general, I think most people using large indices are using bowtie2, so you might have better luck with that, since it's more likely to work.
 

written 4.1 years ago by Devon Ryan

bowtie is installed in the PATH. Both which bowtie and which bowtie-build point to the same version (bowtie-1.1.1).
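Since which only reports where an executable lives, a direct comparison of the version strings may be a more reliable check. The sketch below assumes that both tools print an x.y.z version number somewhere in their --version output; the helper names tool_version and same_version are hypothetical.

```shell
#!/bin/sh
# Pull the first x.y.z version string out of a tool's --version output.
tool_version() {
    "$1" --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Print both versions and succeed only if they are non-empty and identical.
same_version() {
    v1=$(tool_version "$1")
    v2=$(tool_version "$2")
    echo "$1: ${v1:-not found}"
    echo "$2: ${v2:-not found}"
    [ -n "$v1" ] && [ "$v1" = "$v2" ]
}

if same_version bowtie bowtie-build; then
    echo "versions match"
else
    echo "versions differ or a tool is missing"
fi
```

If the two numbers differ, the aligner and the index builder come from different installations, which is the situation described in the previous reply.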

The reason I did not use bowtie2 is that my query sequences are short (~20 nt), and bowtie2 is supposed to be for longer reads. I am currently downloading the latest version, bowtie 1.1.2, and will check again after building the index. It takes about 10-20 hours to build the index for a 10 GB reference.

written 4.1 years ago by Sandeep

There's no need to rebuild the index. Just ensure that you're in the same directory as the indices.

written 4.1 years ago by Devon Ryan

Yes, I am in the same directory.

written 4.1 years ago by Sandeep

I did that and still get the same error. I even tried the latest version of bowtie and built the index with it. Same problem.

written 4.1 years ago by Sandeep
KatjaS (Sweden) wrote, 3.8 years ago:

When aligning to a large genome (.ebwtl index extension), specify the additional option --large-index. There seems to be a bug in Bowtie that otherwise prevents it from recognising a large genome index.
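Applied to the command from the question, this could look like the sketch below. It assumes the index files sit in the current directory under the basename combined and follow the usual naming (combined.1.ebwtl and so on); the helper large_index_flag is hypothetical. The command is only printed, not run, so it can be inspected first.

```shell
#!/bin/sh
# Print "--large-index" when a large (.ebwtl) index exists for the given basename.
large_index_flag() {
    if [ -e "$1.1.ebwtl" ]; then
        echo "--large-index"
    fi
}

index=combined                      # index basename from the question
flag=$(large_index_flag "$index")   # empty if no .ebwtl files are found

# $flag is left unquoted on purpose so that an empty value adds no argument.
echo bowtie $flag "$index" -p 12 -l 17 -a -m5 --best ../seq.fastq -v 2 -S test.sam
```

When run in the directory that holds the .ebwtl files, the printed command includes --large-index; the remaining arguments are unchanged from the original post.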
 

written 3.8 years ago by KatjaS