4.3 years ago
UDAY.AGRI123
I am using Opossum to pre-process my RNA-seq BAM files for variant calling, and I got the error below. Could you please help me understand what the error means and how to fix it?
ERROR:
[E::hts_idx_push] Region 536872898..536872913 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
Traceback (most recent call last):
  File "/apps/opossum/0.2/Opossum.py", line 2074, in <module>
    main()
  File "/apps/opossum/0.2/Opossum.py", line 482, in main
    pysam.index(outputfile)
  File "/apps/python/2.7.13/lib/python2.7/site-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam": Numerical result out of range\n'
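The numbers in the error message can be checked with a little arithmetic. In the binning scheme shared by BAI and CSI indexes, each level splits bins 8 ways, so an index whose smallest bin is 2^min_shift bp and which has n_lvls levels addresses positions below 2^(min_shift + 3·n_lvls). BAI fixes min_shift = 14 and n_lvls = 5, i.e. 2^29 = 536,870,912 bp, and the region in the error lies just past that. The sketch below is illustrative only; the function and constant names are hypothetical and not part of htslib or pysam.

```python
# Sketch: span addressable by a BAI/CSI-style binning index.
# Assumption: each level splits bins 8-way (3 bits), as in the SAM/BAM spec.

BAI_MIN_SHIFT = 14  # smallest bin = 2**14 bp (16 kbp)
BAI_N_LVLS = 5      # fixed for BAI


def index_span(min_shift: int, n_lvls: int) -> int:
    """Number of addressable bp: positions 0 .. 2**(min_shift + 3*n_lvls) - 1."""
    return 1 << (min_shift + 3 * n_lvls)


def min_levels_for(pos: int, min_shift: int = BAI_MIN_SHIFT) -> int:
    """Smallest n_lvls (never below the BAI value) whose span contains 0-based pos."""
    n_lvls = BAI_N_LVLS
    while pos >= index_span(min_shift, n_lvls):
        n_lvls += 1
    return n_lvls


if __name__ == "__main__":
    bai_span = index_span(BAI_MIN_SHIFT, BAI_N_LVLS)
    region_end = 536872913  # end coordinate taken from the error message
    print(bai_span)                      # 536870912 (2**29)
    print(region_end >= bai_span)        # True: outside the BAI range
    print(min_levels_for(region_end))    # 6, matching "n_lvls >= 6"
    print(index_span(BAI_MIN_SHIFT, 6))  # 4294967296 (2**32)
```

With n_lvls = 6 the span grows to 2^32 bp (about 4.3 Gb), which is why htslib suggests a CSI index with min_shift = 14 and n_lvls >= 6.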
I am using Opossum version 0.2 (opossum/0.2).
I am working on wheat samples with single-end (SE) reads.
Command used: Opossum.py --BamFile=calm.out.bam --SoftClipsExist=True --ProperlyPaired=False --OutFile=A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam
LOG.OUT is:
Executing command [Opossum.py --BamFile=calm.out.bam --SoftClipsExist=True --ProperlyPaired=False --OutFile=A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam]...
1A
2A
3A
4A
5A
6A
7A
Number of discarded secondary reads: 251942224
Number of discarded unmapped reads: 0
Number of reads whose mate is unmapped: 0
Number of discarded reads whose mate has been mapped to a different chromosome: 0
Number of discarded reads whose mate has been mapped to junk: 0
Number of discarded read pairs that are pointing outwards: 0
Number of discarded reads where read and its mate have been mapped in the same direction: 0
Number of duplicate reads that have been merged into their non-duplicate counterpart: 0
Number of duplicate single reads that have been merged to their non-duplicate counterpart: 52426493
Number of discarded reads containing hard clips: 0
Number of discarded read pairs with base mismatch: 0
Number of discarded read pairs that are not aligned to same exons: 0
Number of discarded reads / read pairs having too low mapping quality (threshold 40): 47340861
Number of merged read pairs: 0
Number of split merged read pairs: 0
Number of independently treated read pairs: 0
Number of individual reads: 8059082
Number of leftover reads: 0 0 0 0 0 0 0
...completed
Thanks in advance
Uday
Some of your chromosomes are too long to be indexed with the default BAI index, which can only address positions up to 2^29 = 536,870,912 bp (your error reports a region at 536,872,898). That is not unusual for plants. Use samtools index -c your.bam instead, which builds a CSI index. Opossum itself seems to have finished fine, though, so you might not even need the index.

Thanks for your reply. As you may know, wheat is derived from three genomes, A, B and D (from three different species, ~17 Gb in total). In my analysis I am using only part of it, the A genome (5,017,140,244 bp, ~5 Gb). Why does Opossum index the output if the index is not needed for downstream steps? Anyway, I would like to know whether I can use the BAM output I got from Opossum for variant calling. Thanks again.