8.0 years ago
ddzhangzz
I got the following error when building a bowtie2 index for a combined human (hg19) and mouse (mm10) reference:
$ bowtie2-build -f hg19mm10.fa hg19mm10
Settings:
Output files: "hg19mm10.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
hg19mm10.fa
Reading reference sizes
Error: Reference sequence has more than 2^32-1 characters! Please divide the
reference into batches or chunks of about 3.6 billion characters or less each
and index each independently.
How do I divide the reference into batches or chunks as the error suggests? Does anyone have experience with this?
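If you are stuck with an older bowtie2, one way to follow the error's advice is to split the FASTA into batches of whole sequences, each holding at most about 3.6 billion characters, and index each batch separately. The sketch below assumes a plain multi-FASTA whose first line is a header; the function name `split_fasta_batches` and the `_N.fa` output naming are hypothetical. It makes two awk passes so that no chromosome has to be buffered in memory. Keep in mind that separate indexes mean each read is aligned to each batch independently, so for a human/mouse mix this is not equivalent to one combined index.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a multi-FASTA into batch files of whole records,
# each holding at most max_chars sequence characters (headers, whitespace and
# CR characters are not counted). A record longer than max_chars still gets
# a batch of its own. Prints the number of batches written.
# Usage: split_fasta_batches input.fa out_prefix [max_chars]
split_fasta_batches() {
  local input="$1" prefix="$2" max_chars="${3:-3600000000}"
  awk -v max="$max_chars" -v prefix="$prefix" '
    # Pass 1 (first read of the file): sequence length of each record.
    NR == FNR {
      if (/^>/) { n++; next }
      line = $0
      gsub(/[ \t\r]/, "", line)
      len[n] += length(line)
      next
    }
    # Pass 2 (second read): at each header, open a new batch if adding this
    # record would push the current batch over max characters.
    /^>/ {
      r++
      if (batch == 0 || (used > 0 && used + len[r] > max)) {
        if (out != "") close(out)
        batch++
        out = prefix "_" batch ".fa"
        used = 0
      }
      used += len[r]
    }
    # Copy every line of the record into the current batch file.
    out != "" { print > out }
    END { print batch + 0 }
  ' "$input" "$input"
}
```

Each resulting `out_prefix_N.fa` can then be indexed on its own with `bowtie2-build`.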
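Use a more recent version of bowtie2, which supports large indexes: bowtie2-build switches to the large index format (.bt2l files) automatically when the reference is too long for a standard one.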
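For scale, 2^32-1 is 4,294,967,295 characters, while hg19 and mm10 are each roughly 3.1 and 2.7 billion characters, so their concatenation is well over the limit even though each genome alone is under it. A rough pre-check is to count the sequence characters in a FASTA and compare them with that limit. The sketch below uses a hypothetical function name, `check_index_size`, and counts every non-header, non-whitespace character (N runs included), so it approximates rather than reproduces bowtie2-build's own accounting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the total number of sequence characters in a
# FASTA file and whether that exceeds 2^32-1 ("large") or not ("small").
# N characters are counted too, so this is only an approximate check.
# Usage: check_index_size ref.fa
check_index_size() {
  awk '
    /^>/ { next }                          # skip header lines
    {
      gsub(/[ \t\r]/, "")                  # ignore whitespace and CR
      total += length($0)
    }
    END {
      limit = 4294967295                   # 2^32 - 1
      # %.0f avoids integer overflow in awks with 32-bit %d
      printf "%.0f %s\n", total, (total > limit ? "large" : "small")
    }
  ' "$1"
}
```

The `--large-index` option of bowtie2-build forces the large format even when a reference is under the limit.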
Thanks, very helpful!
What is the main goal of combining hg19 and mm10?
To build a single combined index for human and mouse.
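The thread doesn't show how hg19mm10.fa was made, but one common way to build a combined reference is to concatenate the two FASTA files. Both hg19 and mm10 use UCSC-style names such as chr1, so prefixing each header with a species tag keeps sequence names unique and makes it easy to tell which genome a read aligned to. The function name `combine_refs` and the `hg19_`/`mm10_` prefixes in this sketch are illustrative assumptions, not a convention required by bowtie2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate two references into one FASTA on stdout,
# prefixing every header with a species tag (e.g. >chr1 becomes >hg19_chr1)
# so that names shared by both genomes stay unique in the combined file.
# Usage: combine_refs hg19.fa hg19 mm10.fa mm10 > hg19mm10.fa
combine_refs() {
  local fa1="$1" tag1="$2" fa2="$3" tag2="$4"
  # Rewrite only header lines; sequence lines are printed unchanged.
  awk -v tag="$tag1" '/^>/ { sub(/^>/, ">" tag "_") } { print }' "$fa1"
  awk -v tag="$tag2" '/^>/ { sub(/^>/, ">" tag "_") } { print }' "$fa2"
}
```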
What about regions which are already rather similar/conserved? I don't know about your downstream application, but this sounds like a tricky approach.
This is a pretty standard approach for dealing with mixed samples (I assume that's what OP has).
Learned something new, but I would expect some ambiguity in highly conserved regions.
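The ambiguity in conserved regions shows up in the alignments themselves: bowtie2 assigns a low MAPQ to reads with equally good hits elsewhere in the index. A rough way to quantify it, assuming the combined reference used species-prefixed names (e.g. hg19_chr1, mm10_chr1) and the reads were aligned with bowtie2 to SAM, is to count primary alignments per species and set low-MAPQ reads aside as ambiguous. The function name `species_counts` and the default MAPQ cutoff of 10 are illustrative assumptions, not a standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally primary alignments in a SAM file by species,
# assuming reference names carry a species prefix such as hg19_ or mm10_.
# Alignments with MAPQ below min_mapq (assumed cutoff) count as "ambiguous".
# Usage: species_counts aln.sam [min_mapq]
species_counts() {
  awk -v minq="${2:-10}" -F '\t' '
    /^@/ { next }                                   # skip SAM header lines
    {
      flag = $2
      # Skip secondary (0x100) and supplementary (0x800) alignments.
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) next
      if (int(flag / 4) % 2) { count["unmapped"]++; next }  # 0x4: unmapped
      if ($5 < minq) { count["ambiguous"]++; next }          # low MAPQ
      split($3, parts, "_")                         # species prefix of RNAME
      count[parts[1]]++
    }
    END { for (k in count) print k, count[k] }
  ' "$1" | LC_ALL=C sort
}
```

Paired-end mates are counted separately here, since each mate has its own SAM line.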