Htseq unable to process last read
1
1
Entering edit mode
3.7 years ago
Aspire ▴ 330

Counting reads with htseq-count produces the following error

Error occured when processing input (record #977823 in file <_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>):
header not available in closed files [Exception type: ValueError, raised in libcalignmentfile.pyx:1884]

record #977823 is the last record (see below)

The command used to run htseq was

samtools view -F 2304 -f 1 -bh Samp1_PE.bam | samtools sort -n -O SAM - | htseq-count -s reverse -o Samp1_w_feature_PE.bam -p bam - /Annotation/Genes/masked.gtf >! htseq_pe.txt

The resultant text file is empty. The resultant bam file (Samp1_w_feature_PE) has all reads except for the last one (sorted by name). When counting the number of lines

> samtools view -F 2304 -f 1 -bh Samp1_PE.bam | wc -l

is exactly 977823, so I understand that record #977823 is the last record.

I have not found this error message when googling. Any help?

RNA-Seq htseq-count • 723 views
ADD COMMENT
2
Entering edit mode
3.7 years ago
Aspire ▴ 330

Turns out that the specific parameter which causes this problem is

-o Samp1_w_feature_PE.bam -p bam

Counting paired-end reads AND creating the output in bam format causes an error. None of them separately causes an error. I was able to output bam, when counting single reads, and more importantly, I was able to output to sam format when counting paired end reads. I have converted the sam to bam at a later stage, so this is effectively a successful workaround.

ADD COMMENT

Login before adding your answer.

Traffic: 1298 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6