Question: How to divide the reference sequence into batches or chunks
ddzhangzz (90), United States, wrote 4.2 years ago:

I got the following error message from bowtie2-build: "Error: Reference sequence has more than 2^32-1 characters! Please divide the reference into batches or chunks of about 3.6 billion characters or less each and index each independently." Here is the full output:

    $ bowtie2-build -f hg19mm10.fa hg19mm10
    Settings:
      Output files: "hg19mm10.*.bt2"
      Line rate: 6 (line is 64 bytes)
      Lines per side: 1 (side is 64 bytes)
      Offset rate: 4 (one in 16)
      FTable chars: 10
      Strings: unpacked
      Max bucket size: default
      Max bucket size, sqrt multiplier: default
      Max bucket size, len divisor: 4
      Difference-cover sample period: 1024
      Endianness: little
      Actual local endianness: little
      Sanity checking: disabled
      Assertions: disabled
      Random seed: 0
      Sizeofs: void*:8, int:4, long:8, size_t:8
    Input files DNA, FASTA:
      hg19mm10.fa
    Reading reference sizes
    Error: Reference sequence has more than 2^32-1 characters!  Please divide the
    reference into batches or chunks of about 3.6 billion characters or less each
    and index each independently.

How do I divide the reference into batches or chunks as suggested? Has anyone done this before?
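
For illustration, the batching that the error message asks for could be planned roughly as below. This is only a sketch: the function names `sequence_lengths` and `plan_batches` are made up for this example, whole sequences are kept together rather than cut, and the 2^32-1 limit is simply the figure quoted in the error. Each resulting batch would then be written to its own FASTA file and indexed separately.

```python
from typing import Iterable, Iterator, List, Tuple

# Limit quoted in the bowtie2-build error message (assumed to apply per index).
MAX_CHARS = 2**32 - 1


def sequence_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stream FASTA lines and yield (name, length) per sequence."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            fields = line[1:].split()
            # Use the first word of the header as the sequence name.
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def plan_batches(lengths: Iterable[Tuple[str, int]],
                 max_chars: int = MAX_CHARS) -> List[List[str]]:
    """Greedily group sequence names so each batch totals at most max_chars."""
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for name, length in lengths:
        if length > max_chars:
            raise ValueError(f"{name} alone exceeds {max_chars} characters")
        # Start a new batch when adding this sequence would overflow the limit.
        if current and size + length > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(name)
        size += length
    if current:
        batches.append(current)
    return batches
```

With a real file, something like `plan_batches(sequence_lengths(open("hg19mm10.fa")))` would return the groups of sequence names. Note that reads would then have to be aligned against each index in turn and the results reconciled afterwards.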

rna-seq • 1.6k views

Use a more recent version of bowtie2, which supports large indexes.
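
To see whether a given reference actually crosses the limit from the error message, its sequence characters can be counted first. The sketch below is illustrative only: the function names are invented for this example, and the count is an approximation of what bowtie2-build measures. According to the Bowtie 2 manual, recent versions of bowtie2-build choose between a small and a large index based on reference length, and `--large-index` forces the large format.

```python
# Limit quoted in the bowtie2-build error message.
MAX_SMALL_INDEX_CHARS = 2**32 - 1


def count_reference_chars(lines):
    """Count sequence characters in FASTA lines, skipping headers and blanks."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def needs_large_index(total_chars, limit=MAX_SMALL_INDEX_CHARS):
    """Return True when the reference is too long for a 32-bit (small) index."""
    return total_chars > limit
```

For the file in the question this could be called as `needs_large_index(count_reference_chars(open("hg19mm10.fa")))`, which streams the file line by line instead of loading it into memory.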

written 4.2 years ago by Devon Ryan (95k)

Thanks, very helpful!

written 4.2 years ago by ddzhangzz (90)

What is the main goal behind combining hg19 and mm10?

written 4.2 years ago by geek_y (11k)

To build a combined index for human and mouse.

written 4.2 years ago by ddzhangzz (90)

What about regions which are already rather similar/conserved? I don't know about your downstream application, but this sounds like a tricky approach.

written 4.2 years ago by WouterDeCoster (44k)

This is a pretty standard approach for dealing with mixed samples (I assume that's what OP has).
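
One practical detail when building such a combined reference: hg19 and mm10 both use names like chr1, so the sequences need distinguishable names before concatenation. A rough sketch of one way to do that follows. The prefixes `hg19_` and `mm10_` and the function names are hypothetical choices for this example, not a convention the thread specifies.

```python
def prefix_fasta_names(lines, prefix):
    """Yield FASTA lines with `prefix` prepended to every sequence name."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield f">{prefix}{line[1:]}"
        else:
            yield line


def combine_references(sources):
    """Chain several references; `sources` is an iterable of (prefix, lines)."""
    for prefix, lines in sources:
        yield from prefix_fasta_names(lines, prefix)
```

After alignment, the prefix on each read's reference name indicates which species it was assigned to, which makes it straightforward to separate human from mouse reads.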

written 4.2 years ago by Devon Ryan (95k)

Learned something new, but I assume there will be some ambiguity in highly conserved regions.

written 4.2 years ago by WouterDeCoster (44k)