Question: HTSeq error when reading BAM file
gravatar for Mike
3.3 years ago by
Mike1.4k wrote:


I am running HTseq for read count with STAR output BAM file, but I got following errors:

STAR --genomeDir /databank/igenomes/Homo_sapiens/STAR --readFilesIn R1.fastq --runThreadN 8 --outFileNamePrefix R1a --outSAMtype BAM Unsorted

htseq-count -m union -i gene_name R1aAligned.out.bam hg19.gtf > output_usingbam.counts

Error occured when reading first line of sam file.
  ('SAM line does not contain at least 11 tab-delimited fields.', 'line 1 of file R1aAligned.out.bam')
  [Exception type: ValueError, raised in _HTSeq.pyx:1276]

But no erorr when I use default STAR output (SAM ) file in htseq-count :

STAR --genomeDir /databank/igenomes/Homo_sapiens/STAR --readFilesIn R1.fastq --runThreadN 8 --outFileNamePrefix R1a

htseq-count -m union -i gene_name R1aAligned.out.sam hg19.gtf > output_usingsam.counts

it run successfully.

And I also got error when I sorted SAM file.

Please help where is problem.


rna-seq htseq • 4.0k views
ADD COMMENTlink modified 4 weeks ago by Biostar ♦♦ 20 • written 3.3 years ago by Mike1.4k
gravatar for michael.ante
3.3 years ago by
michael.ante3.6k wrote:

Hi Mike,

according to the HTSeq-count manual, you have to set the -f parameter:

 -f <format>, --format=<format>

     Format of the input data. Possible values are sam (for text SAM files) and bam (for binary BAM files). Default is sam.

By doing so, you also need to specify if your data is either position sorted or name sorted:

-r <order>, --order=<order>

 ...  Use this option, with name or pos for <order> to indicate how the input data has been sorted. The default is name.



ADD COMMENTlink written 3.3 years ago by michael.ante3.6k

Thanks Michael,

my bam file is not sorted, so I used following command, but still got error:

htseq-count -m union -i gene_name -f bam R1aAligned.out.bam hg19.gtf > output_usingbam.counts

htseq-count: error: no such option: -f

ADD REPLYlink written 3.3 years ago by Mike1.4k

Maybe you should update your htseq program.

You might also try the gene counting from STAR itself (it works with STAR 2.5.2). Setting the parameter --quantMode GeneCounts will produce a file which "counts coincide with those produced by htseq-count with default parameters (STAR Manual 2.5.2a)".

ADD REPLYlink written 3.3 years ago by michael.ante3.6k


Updated htseq, but.. still error

100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2215898 GFF lines processed.
Error occured when reading beginning of SAM/BAM file.
  file header is empty (mode='rb') - is it SAM/BAM format?
  [Exception type: ValueError, raised in calignmentfile.pyx:574]
ADD REPLYlink written 3.3 years ago by Mike1.4k

Has your SAM file the correct header (samtools view -SH R1aAligned.out.sam) and does it match the gtf's chromosome-names? What version of STAR are you using and what is the output of the

ADD REPLYlink written 3.3 years ago by michael.ante3.6k

Thanks, now it works,

it was problem in STAR output sam file,

Could you please check, alignment results, Is it OK?

This is file:

                          Started job on |       Sep 22 14:30:25
                     Started mapping on |       Sep 22 14:56:52
                            Finished on |       Sep 22 14:58:27

Mapping speed, Million of reads per hour | 147.32

                  Number of input reads |       3887667
              Average input read length |       72
                            UNIQUE READS:
           Uniquely mapped reads number |       3180832
                Uniquely mapped reads % |       81.82%
                  Average mapped length |       71.33
               Number of splices: Total |       247696
    Number of splices: Annotated (sjdb) |       0
               Number of splices: GT/AG |       242886
               Number of splices: GC/AG |       1339
               Number of splices: AT/AC |       58
       Number of splices: Non-canonical |       3413
              Mismatch rate per base, % |       0.86%
                 Deletion rate per base |       0.01%
                Deletion average length |       1.67
                Insertion rate per base |       0.01%
               Insertion average length |       1.43
                     MULTI-MAPPING READS:
Number of reads mapped to multiple loci |       388865
     % of reads mapped to multiple loci |       10.00%
Number of reads mapped to too many loci |       11128
     % of reads mapped to too many loci |       0.29%
                          UNMAPPED READS:

% of reads unmapped: too many mismatches | 0.00% % of reads unmapped: too short | 7.66% % of reads unmapped: other | 0.23%

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by Mike1.4k

It looks OK; you have ~ 92% mapping rate (82 % uniquely mapped reads + 10% multi-mapping reads). With HTSeq-count only the 82% will be processed.

You can check your raw fastq file for adapter contamination (e.g. with FastQC reporter) and maybe trim if found. You can also check your alignment with RSeQC (gene body coverage; strandedness, read distribution over annotation).

ADD REPLYlink written 3.3 years ago by michael.ante3.6k

Thank you very much... highly appreciated for your nice explaination.

ADD REPLYlink written 3.3 years ago by Mike1.4k

Hi Mike, I am having the same problem. You mentioned it was the problem of STAR output sam file. So how do you solve it? Thanks!

ADD REPLYlink written 2.9 years ago by Megan40
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1341 users visited in the last hour