Does htseq-count recognize soft clipping generated by STAR?
8.8 years ago
Dejian ★ 1.3k

When I run htseq-count on BAM files generated by STAR, I repeatedly get the same error message (see the examples below). I extracted the corresponding records from the BAM files and found that they all contain soft clipping. I thought htseq-count handled soft clipping correctly (http://www-huber.embl.de/users/anders/HTSeq/doc/alignments.html#cigar-strings). Has anybody else encountered this problem, and if so, how did you solve it?
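
For reference, here is a minimal sketch of how I read soft clipping in a CIGAR string under the SAM specification: `S` consumes read bases but not reference bases, so `66M9S` describes a 75-base read covering 66 reference positions. The helper name `cigar_lengths` is made up for illustration; it is not part of HTSeq or pysam and does not reproduce how either library parses CIGAR strings.

```python
import re

# CIGAR operations that consume query (read) bases and reference bases,
# as defined in the SAM specification.
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")

CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (query_length, reference_length) for a CIGAR string.

    Hypothetical helper for illustration only.
    """
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    query_len = 0
    ref_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in QUERY_OPS:
            query_len += n  # soft-clipped bases (S) are counted here
        if op in REF_OPS:
            ref_len += n    # ...but not here
    return query_len, ref_len


# CIGAR strings taken from the four examples below.
for cigar in ("66M9S", "67M8S", "13S62M", "68M7S"):
    print(cigar, cigar_lengths(cigar))
```

All four CIGAR strings are well formed by this reading, which is why the error surprised me.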

EXAMPLE 1:

Error occured when processing SAM input (record #66220 in file ../SRR1974799.sorted.dedup.bam):
  unsigned byte integer is less than minimum
  [Exception type: OverflowError, raised in csamtools.pyx:2308]

samtools view ../SRR1974799.sorted.dedup.bam | sed -n '66220p'

SRR1974799.1020660.1    147     chr1    1549493 255     66M9S   =       1549429 -130    TGAACAGCAGGTACTCAATCATGAAGAGCTAAGCCTGATTTCATCACGACAGCTGTGAAAGTTGCACCCATGTAC     <FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFAAAAA    RG:Z:SRR1974799 NH:i:1  HI:i:1  jI:B:i,-1       jM:B:c,-1       nM:i:0  AS:i:139

EXAMPLE 2:

Error occured when processing SAM input (record #174801 in file ../SRR1974808.sorted.dedup.bam):
  unsigned byte integer is less than minimum
  [Exception type: OverflowError, raised in csamtools.pyx:2308]

samtools view ../SRR1974808.sorted.dedup.bam | sed -n '174801p'

SRR1974808.1497057.1    83      chr1    40149760        255     67M8S   =       40148296        -1531   CCGTTCTTGTCGAAGGTGCGGAAAGCGTGCTGCGCGAACTTGGAGGCGTCGCCGTAGGGGAAGAACTTGATGTAG    FFFAAFFFFFFFFF7F.FFFFFF7FFF)FFFFFFF<FFF<7FFFFFFF<FFFFAF<FFFFFFFAFFFFFFAA<AA     PG:Z:MarkDuplicates     RG:Z:SRR1974808 NH:i:1  HI:i:1  jI:B:i,-1       jM:B:c,-1     nM:i:0   AS:i:139

EXAMPLE 3:

Error occured when processing SAM input (record #77098 in file ../SRR1974802.sorted.dedup.bam):
  unsigned byte integer is less than minimum
  [Exception type: OverflowError, raised in csamtools.pyx:2308]

samtools view ../SRR1974802.sorted.dedup.bam | sed -n '77098p'

SRR1974802.1214351.1    99      chr1    16045055        255     13S62M  =       16046228        1221    GAGTACATGGGAAGATCACCTGACGCTCTTCCTGACATTGGTGTCCGGGCTAGAGTTCATTCGTTCCGAGCTGGA    A)AAA)AFA.FF)FFF<7.)FFF.F<FFFF..F..F)FA.)F<7FA<F))F<FFFAFF.FFF<F)FA.<FFF7FF     PG:Z:MarkDuplicates     RG:Z:SRR1974802 NH:i:1  HI:i:1  jI:B:i,-1       jM:B:c,-1     nM:i:2   AS:i:103

EXAMPLE 4:

Error occured when processing SAM input (record #153985 in file ../SRR1974806.sorted.dedup.bam):
  unsigned byte integer is less than minimum
  [Exception type: OverflowError, raised in csamtools.pyx:2308]

samtools view ../SRR1974806.sorted.dedup.bam | sed -n '153985p'

SRR1974806.735761.1     99      chr1    45469184        255     68M7S   =       45469375        380     TGTCAGTGTCGATGGCCACGCAGTTGTAGGCCGCATAGCGGAGCTTCTCCTCGCATACCTTGGCACTGGCATAGT    <<AAAFFFFFFFFFFF<)FFFFF<FAFFFAFAFFFFFFFFFFFFAAFF.FAF<<F7<AFFFFFF.<FFFFA7FFA     PG:Z:MarkDuplicates     RG:Z:SRR1974806 NH:i:1  HI:i:1  jI:B:i,-1       jM:B:c,-1     nM:i:0   AS:i:141        XS:A:-
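
Besides soft clipping, all four records also carry the array-typed tags `jI:B:i,-1` and `jM:B:c,-1`. The sketch below simply lists the `B`-type optional fields of a SAM line so they are easy to spot. Whether these tags have anything to do with the error is only a guess on my part; the function name `array_tags` is hypothetical, and SEQ and QUAL in the sample record are replaced by `*` for brevity.

```python
def array_tags(sam_line):
    """Return {tag: (subtype, values)} for each B-type optional field.

    Hypothetical helper for illustration; optional fields start at
    column 12 of a SAM record.
    """
    result = {}
    for field in sam_line.split()[11:]:
        tag, value_type, value = field.split(":", 2)
        if value_type != "B":
            continue
        subtype, *items = value.split(",")
        convert = float if subtype == "f" else int
        result[tag] = (subtype, [convert(item) for item in items])
    return result


# Example 1 from above, with SEQ and QUAL abbreviated to "*".
record = (
    "SRR1974799.1020660.1\t147\tchr1\t1549493\t255\t66M9S\t=\t1549429\t-130"
    "\t*\t*\tRG:Z:SRR1974799\tNH:i:1\tHI:i:1\tjI:B:i,-1\tjM:B:c,-1"
    "\tnM:i:0\tAS:i:139"
)

tags = array_tags(record)
negative = sorted(t for t, (_, values) in tags.items() if min(values) < 0)
print(tags)
print("array tags holding negative values:", negative)
```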
alignment htseq-count RNA-seq STAR • 2.6k views

This is actually a pysam error that I've seen a few others run into (though with different programs). What version of pysam do you have installed and can you try upgrading it?
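
If it helps, here is a small sketch for reporting the installed versions. It assumes Python 3.8 or newer (for `importlib.metadata`) and that the packages are registered under the distribution names `pysam` and `HTSeq`; the helper name `installed_version` is my own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if your install differs.
for name in ("pysam", "HTSeq"):
    print(name, installed_version(name) or "not installed")
```

Upgrading would then typically be something like `pip install --upgrade pysam`, depending on how the package was installed in the first place.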
