Question

Error during pre-processing of RNA-seq reads for variant calling using Opossum/0.2

4.3 years ago

UDAY.AGRI123 ▴ 20

I am using Opossum to pre-process my RNA-seq BAM files for variant calling and got the errors below. Could you please help me understand what the error means and how to fix it?

ERROR:
[E::hts_idx_push] Region 536872898..536872913 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
Traceback (most recent call last):
  File "/apps/opossum/0.2/Opossum.py", line 2074, in <module>
    main()
  File "/apps/opossum/0.2/Opossum.py", line 482, in main
    pysam.index(outputfile)
  File "/apps/python/2.7.13/lib/python2.7/site-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam": Numerical result out of range\n'

I am using Opossum 0.2.

I am working on wheat samples with single-end (SE) reads.

Command used: Opossum.py --BamFile=calm.out.bam --SoftClipsExist=True --ProperlyPaired=False --OutFile=A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam

LOG.OUT is:
Executing command [Opossum.py --BamFile=calm.out.bam --SoftClipsExist=True --ProperlyPaired=False --OutFile=A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam]...
1A
2A
3A
4A
5A
6A
7A
Number of discarded secondary reads:  251942224
Number of discarded unmapped reads:  0
Number of reads whose mate is unmapped:  0
Number of discarded reads whose mate has been mapped to a different chromosome:  0
Number of discarded reads whose mate has been mapped to junk:  0
Number of discarded read pairs that are pointing outwards:  0
Number of discarded reads where read and its mate have been mapped in the same direction:  0
Number of duplicate reads that have been merged into their non-duplicate counterpart:  0
Number of duplicate single reads that have been merged to their non-duplicate counterpart:  52426493
Number of discarded reads containing hard clips:  0
Number of discarded read pairs with base mismatch:  0
Number of discarded read pairs that are not aligned to same exons:  0
Number of discarded reads / read pairs having too low mapping quality (threshold 40): 47340861
Number of merged read pairs:  0
Number of split merged read pairs:  0
Number of independently treated read pairs:  0
Number of individual reads:  8059082
Number of leftover reads:  0 0 0 0 0 0 0
  ...completed

Thanks in advance

Uday

RNA-Seq software error sequencing SNP

updated 4.3 years ago by ATpoint 82k • written 4.3 years ago by UDAY.AGRI123 ▴ 20

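The genome is too large to be indexed with the default BAI index: a BAI index can only hold positions up to 2^29 (536,870,912) bp on any one reference sequence, and your error reports a region just past that. That is not unusual for plants. Use samtools index -c your.bam instead to build a CSI index. It looks as though Opossum itself ran fine, though, so it might not even need the index.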

4.2 years ago by ATpoint 82k
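To make the numbers in the error message concrete, here is a minimal sketch of the arithmetic behind it. It assumes the standard binning scheme in which an index with a given min_shift and n_lvls covers positions up to 2^(min_shift + 3 * n_lvls), with BAI fixed at min_shift = 14 and 5 levels. The function names and the file name your.bam are placeholders for illustration, not part of Opossum or samtools.

```python
def max_indexable_position(min_shift, n_lvls):
    """Largest coordinate a binning index with these parameters can hold."""
    return 1 << (min_shift + 3 * n_lvls)


# BAI uses min_shift = 14 and 5 levels, so it tops out at 2^29 bp per contig.
BAI_LIMIT = max_indexable_position(14, 5)


def fits_bai(end_position):
    """True if an alignment ending at end_position can be stored in a BAI index."""
    return end_position <= BAI_LIMIT


def csi_index_command(bam_path):
    """Argument list for building a CSI index with 'samtools index -c'."""
    return ["samtools", "index", "-c", bam_path]


# The region end reported in the error message above.
failing_end = 536872913

print(BAI_LIMIT)                      # 536870912
print(fits_bai(failing_end))          # False
print(max_indexable_position(14, 6))  # 4294967296, the n_lvls >= 6 suggestion
print(" ".join(csi_index_command("your.bam")))
```

The argument list from csi_index_command could be handed to subprocess.run on a machine where samtools is installed. The resulting index file ends in .csi rather than .bai.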

Thanks for your reply. As you may know, wheat originated from three genomes, A, B and D (three different species, ~17 Gb in total). In my analysis I am using only part of the entire genome, the A genome (5,017,140,244 bp, ~5 Gb). Why does Opossum index the output if the index is not needed for downstream steps? Anyway, I wanted to know whether I can go ahead with the BAM output I got from Opossum for variant calling. Thanks again.

4.2 years ago by UDAY.AGRI123 ▴ 20
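On the genome-size point: the BAI limit applies to each reference sequence separately, not to the genome total, so a 5 Gb subset can still trip it if any single chromosome is longer than 2^29 bp. Below is a rough sketch of how one might check that from the @SQ lines of a BAM header (as printed by samtools view -H). The helper names are hypothetical, and the lengths in the example header are made-up placeholders, not real wheat chromosome sizes.

```python
BAI_MAX_CONTIG = 2 ** 29  # 536,870,912 bp, the per-contig ceiling of a BAI index


def contig_lengths(sam_header_text):
    """Map contig name -> length, parsed from the @SQ lines of a SAM header."""
    lengths = {}
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields look like TAG:value and are tab-separated.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def too_long_for_bai(lengths):
    """Names of contigs that are longer than a BAI index can cover."""
    return sorted(name for name, n in lengths.items() if n > BAI_MAX_CONTIG)


# Placeholder header: contig names follow the log above, lengths are invented.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:1A\tLN:600000000\n"
    "@SQ\tSN:2A\tLN:400000000\n"
)

print(too_long_for_bai(contig_lengths(example_header)))  # ['1A']
```

If the list comes back non-empty for the real header, a CSI index is the one to build for that BAM.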