                    5.8 years ago
        UDAY.AGRI123
        
    
I am using Opossum to pre-process my RNA-seq BAM files for variant calling, and I got the error below. Could you please help me understand what the error means and how to fix it?
ERROR:
[E::hts_idx_push] Region 536872898..536872913 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
Traceback (most recent call last):
  File "/apps/opossum/0.2/Opossum.py", line 2074, in <module>
    main()
  File "/apps/opossum/0.2/Opossum.py", line 482, in main
    pysam.index(outputfile)
  File "/apps/python/2.7.13/lib/python2.7/site-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam": Numerical result out of range\n'
I am using opossum/0.2.
I am working on wheat samples with single-end (SE) reads.
Command used: Opossum.py --BamFile=calm.out.bam --SoftClipsExist=True --ProperlyPaired=False --OutFile=A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam
LOG.OUT is:
Executing command [Opossum.py --BamFile=calm.out.bam --SoftClipsExist=True --ProperlyPaired=False --OutFile=A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam]...
1A
2A
3A
4A
5A
6A
7A
Number of discarded secondary reads:  251942224
Number of discarded unmapped reads:  0
Number of reads whose mate is unmapped:  0
Number of discarded reads whose mate has been mapped to a different chromosome:  0
Number of discarded reads whose mate has been mapped to junk:  0
Number of discarded read pairs that are pointing outwards:  0
Number of discarded reads where read and its mate have been mapped in the same direction:  0
Number of duplicate reads that have been merged into their non-duplicate counterpart:  0
Number of duplicate single reads that have been merged to their non-duplicate counterpart:  52426493
Number of discarded reads containing hard clips:  0
Number of discarded read pairs with base mismatch:  0
Number of discarded read pairs that are not aligned to same exons:  0
Number of discarded reads / read pairs having too low mapping quality (threshold 40): 47340861
Number of merged read pairs:  0
Number of split merged read pairs:  0
Number of independently treated read pairs:  0
Number of individual reads:  8059082
Number of leftover reads:  0 0 0 0 0 0 0
  ...completed
Thanks in advance
Uday
The genome is too big to be indexed with the default bai index. That is not unusual for plants. Use samtools index -c your.bam instead. It looks as though Opossum ran fine, though, so it might not even need the index.

Thanks for your reply. As you may know, wheat originated from three genomes, A, B and D (three different species, ~17 Gb in total). In my analysis I am using only one part of the entire genome, the A genome (size 5017140244, ~5 Gb). Why does Opossum index the file if the index is not needed for downstream steps? Anyway, I wanted to know whether I can go ahead with the BAM output I got from Opossum for variant calling. Thanks again.
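To illustrate the limit mentioned in the answer above: a bai index uses min_shift = 14 and 5 levels, so, as far as I understand the format, it can only address coordinates up to 2^(14 + 3*5) = 2^29 = 536870912. The region in the error message ends at 536872913, just past that. Below is a minimal sketch of that arithmetic in plain Python. The function names are made up for illustration and are not part of samtools, pysam or Opossum.

```python
# Sketch of the bai/csi coordinate limit, assuming the binning scheme
# max coordinate = 2 ** (min_shift + 3 * n_lvls).
# All function names here are hypothetical helpers for illustration.

BAI_MIN_SHIFT = 14  # fixed for bai
BAI_N_LVLS = 5      # fixed for bai


def max_indexable(min_shift=BAI_MIN_SHIFT, n_lvls=BAI_N_LVLS):
    """Largest coordinate an index with these parameters can address."""
    return 1 << (min_shift + 3 * n_lvls)


def fits_in_bai(position):
    """True if the coordinate can be stored in a default bai index."""
    return position <= max_indexable()


def min_csi_levels(position, min_shift=BAI_MIN_SHIFT):
    """Smallest n_lvls a csi index needs to address the coordinate."""
    n_lvls = 0
    while max_indexable(min_shift, n_lvls) < position:
        n_lvls += 1
    return n_lvls


if __name__ == "__main__":
    end = 536872913  # end of the region reported in the error message
    print(max_indexable())      # 536870912
    print(fits_in_bai(end))     # False
    print(min_csi_levels(end))  # 6
```

The result of 6 levels matches the hint in the error message (min_shift = 14, n_lvls >= 6). Note that the limit applies to the position on a single chromosome, not to the total genome size, so a ~5 Gb genome can still trigger it if one chromosome is longer than 2^29 bases. To actually build the csi index, samtools index -c your.bam is the command from the answer; from Python, pysam.index("-c", "your.bam") should be equivalent, assuming your pysam version passes the option through to samtools.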