Question: Pauda (Bowtie) Exception While Processing Microbial_Refseq
Pavel Senin (Los Alamos, NM) wrote, 6.5 years ago:

Happy new year, folks! I got an exception while trying to build a PAUDA database from NCBI's refseq_microbial.faa:

$pauda-build microbial_refseq.faa microbial_refseq-idx
Start ...
Reading file: microbial_refseq-idx/ref.faa
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (929.5s)
Writing mapping file 1: microbial_refseq-idx/ref.map1
Processing sequences:
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (33764.4s)
Writing PNA file: microbial_refseq-idx/ref.pna
Writing mapping file1: microbial_refseq-idx/ref.map2
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (65.4s)
Total sequences in:  30842910
Total sequences out: 18385103
Time: 34759s
Start ...
bowtie2-build microbial_refseq-idx/ref.pna microbial_refseq-idx/ref
Settings:
  Output files: "microbial_refseq-idx/ref.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  microbial_refseq-idx/ref.pna
Reading reference sizes
Error: Reference sequence has more than 2^32-1 characters!  Please divide the
reference into batches or chunks of about 3.6 billion characters or less each
and index each independently.
  Time reading reference sizes: 00:01:15
Total time for call to driver() for forward index: 00:01:15
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build microbial_refseq-idx/ref.pna microbial_refseq-idx/ref 
Deleting "microbial_refseq-idx/ref.3.bt2" file written during aborted indexing attempt.
Deleting "microbial_refseq-idx/ref.4.bt2" file written during aborted indexing attempt.

Should I just split the input FASTA into a few files, build an index for each, run my searches against each one, and combine the results afterwards?
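
A minimal sketch of the splitting step, in case it helps anyone landing here. It cuts a protein FASTA into batches whose total residue count stays under a chosen budget, without ever breaking a record across files. The function names, the `max_chars` parameter and the `.partNNN.faa` naming are made up for illustration. Note that the limit in the error message applies to the ref.pna file that pauda-build generates, not to the input .faa, so the right budget for the input is an assumption that would need tuning by trial.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line including '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def batch_records(records: Iterable[Record], max_chars: int) -> Iterator[List[Record]]:
    """Group records so each batch holds at most max_chars residues.

    A single record longer than max_chars still gets a batch of its own.
    """
    batch, size = [], 0
    for header, seq in records:
        if batch and size + len(seq) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append((header, seq))
        size += len(seq)
    if batch:
        yield batch


def split_fasta(in_path: str, out_prefix: str, max_chars: int) -> List[str]:
    """Write each batch to <out_prefix>.partNNN.faa and return the paths."""
    paths = []
    with open(in_path) as fh:
        for i, batch in enumerate(batch_records(read_fasta(fh), max_chars), 1):
            path = f"{out_prefix}.part{i:03d}.faa"
            with open(path, "w") as out:
                for header, seq in batch:
                    out.write(f"{header}\n{seq}\n")
            paths.append(path)
    return paths
```

Each resulting part would then go through pauda-build separately, e.g. `split_fasta("microbial_refseq.faa", "microbial_refseq", max_chars=...)` with a budget small enough that bowtie2-build no longer complains about the generated ref.pna.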


That or just use a different aligner. Bowtie doesn't support reference sequences that big.

written 6.5 years ago by Devon Ryan

Do you mean instead of PAUDA, or is there a way to plug a different aligner into it?

written 6.5 years ago by Pavel Senin

I mean plugging a different aligner into it, though I imagine that could be a real pain :( BWA can handle larger genomes, so maybe try to plug that in.

written 6.5 years ago by Devon Ryan
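
For the split-and-combine route raised in the question, the combining step could look roughly like the sketch below: keep, for every read, the highest-scoring hit seen across all per-batch searches. Nothing here assumes PAUDA's real output format. The `(read_id, ref_id, bit_score)` tuples and the `merge_best_hits` name are hypothetical, and the actual output would have to be parsed into that shape first. Bit scores are used for the comparison because E-values depend on database size and so would not be directly comparable between batches.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # hypothetical: (read_id, ref_id, bit_score)


def merge_best_hits(batches: Iterable[Iterable[Hit]]) -> Dict[str, Hit]:
    """Return the best-scoring hit per read across all batches.

    On a tie, the hit encountered first is kept.
    """
    best: Dict[str, Hit] = {}
    for hits in batches:
        for hit in hits:
            read_id, _ref_id, score = hit
            if read_id not in best or score > best[read_id][2]:
                best[read_id] = hit
    return best
```

For example, calling it on the parsed hit lists from two batch searches gives one dictionary keyed by read ID, which can then be written out in whatever format the downstream analysis expects.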