Error during pre-processing of RNA-seq reads for variant calling using Opossum/0.2
5.8 years ago
UDAY.AGRI123 ▴ 20

I am using Opossum to pre-process my RNA-seq BAM files for variant calling and got the errors below. Could you please help me understand what the error means and how to fix it?

ERROR:
[E::hts_idx_push] Region 536872898..536872913 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
Traceback (most recent call last):
  File "/apps/opossum/0.2/Opossum.py", line 2074, in <module>
    main()
  File "/apps/opossum/0.2/Opossum.py", line 482, in main
    pysam.index(outputfile)
  File "/apps/python/2.7.13/lib/python2.7/site-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam": Numerical result out of range\n'

I am using opossum/0.2

Working on wheat samples with SE reads

Script used : Opossum.py --BamFile=calm.out.bam --SoftClipsExist=True --ProperlyPaired=False --OutFile=A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam

LOG.OUT is:
Executing command [Opossum.py --BamFile=calm.out.bam --SoftClipsExist=True --ProperlyPaired=False --OutFile=A_genome_mapping_sample_1_secondroundAligned.opossum.output.bam]...
1A
2A
3A
4A
5A
6A
7A
Number of discarded secondary reads:  251942224
Number of discarded unmapped reads:  0
Number of reads whose mate is unmapped:  0
Number of discarded reads whose mate has been mapped to a different chromosome:  0
Number of discarded reads whose mate has been mapped to junk:  0
Number of discarded read pairs that are pointing outwards:  0
Number of discarded reads where read and its mate have been mapped in the same direction:  0
Number of duplicate reads that have been merged into their non-duplicate counterpart:  0
Number of duplicate single reads that have been merged to their non-duplicate counterpart:  52426493
Number of discarded reads containing hard clips:  0
Number of discarded read pairs with base mismatch:  0
Number of discarded read pairs that are not aligned to same exons:  0
Number of discarded reads / read pairs having too low mapping quality (threshold 40): 47340861
Number of merged read pairs:  0
Number of split merged read pairs:  0
Number of independently treated read pairs:  0
Number of individual reads:  8059082
Number of leftover reads:  0 0 0 0 0 0 0
  ...completed

Thanks in advance

Uday


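The genome has chromosomes that are too large to be indexed with the default BAI index, which can only store positions up to 2^29 (536,870,912) bp. That is not unusual for plants. Use samtools index -c your.bam instead to build a CSI index. It looks as though Opossum itself ran fine, though, so you might not even need the index.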
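To see why the region in the error message cannot go into a BAI index, here is a minimal Python sketch of the arithmetic. It assumes the standard binning parameters (BAI is fixed at min_shift = 14 with 5 levels, and each extra level multiplies the addressable range by 8). The function names are hypothetical and are not part of samtools, pysam or Opossum.

```python
# Sketch only: helper names are made up for illustration.
BAI_MIN_SHIFT = 14   # smallest bin spans 2**14 bp
BAI_N_LVLS = 5       # BAI has a fixed number of binning levels

# Largest coordinate a BAI index can address: 2**(14 + 3*5) = 2**29
BAI_MAX_COORD = 1 << (BAI_MIN_SHIFT + 3 * BAI_N_LVLS)


def fits_in_bai(end_coord):
    """True if an alignment ending at end_coord can be stored in a BAI index."""
    return end_coord <= BAI_MAX_COORD


def csi_levels_needed(max_coord, min_shift=14):
    """Smallest n_lvls such that 2**(min_shift + 3*n_lvls) covers max_coord."""
    n_lvls = 0
    while (1 << (min_shift + 3 * n_lvls)) < max_coord:
        n_lvls += 1
    return n_lvls


if __name__ == "__main__":
    region_end = 536872913  # end of the region reported in the error above
    print(BAI_MAX_COORD)                  # 536870912
    print(fits_in_bai(region_end))        # False
    print(csi_levels_needed(region_end))  # 6
```

The reported region ends about 2 kb past the 2^29 limit, which is why samtools suggests a CSI index with n_lvls >= 6. That is the index type samtools index -c builds.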


Thanks for your reply. As you may know, wheat originated from three genomes, A, B and D (from three different species, ~17 Gb in total). In my analysis I am using only part of the entire genome, the A genome (size 5017140244 bp, ~5 Gb). Why is the file being indexed if the index is not needed for downstream steps? In any case, I wanted to know whether I can go ahead with the BAM output I got from Opossum for variant calling. Thanks again.
