Question: HTSeq error when reading BAM file
0
gravatar for Mike
16 months ago by
Mike880
UK
Mike880 wrote:

Hi,

I am running HTseq for read count with STAR output BAM file, but I got following errors:

STAR --genomeDir /databank/igenomes/Homo_sapiens/STAR --readFilesIn R1.fastq --runThreadN 8 --outFileNamePrefix R1a --outSAMtype BAM Unsorted

htseq-count -m union -i gene_name R1aAligned.out.bam hg19.gtf > output_usingbam.counts

Error occured when reading first line of sam file.
  ('SAM line does not contain at least 11 tab-delimited fields.', 'line 1 of file R1aAligned.out.bam')
  [Exception type: ValueError, raised in _HTSeq.pyx:1276]

But no erorr when I use default STAR output (SAM ) file in htseq-count :

STAR --genomeDir /databank/igenomes/Homo_sapiens/STAR --readFilesIn R1.fastq --runThreadN 8 --outFileNamePrefix R1a

htseq-count -m union -i gene_name R1aAligned.out.sam hg19.gtf > output_usingsam.counts

it run successfully.

And I also got error when I sorted SAM file.

Please help where is problem.

Thanks

rna-seq htseq • 1.6k views
ADD COMMENTlink modified 16 months ago by michael.ante2.0k • written 16 months ago by Mike880
1
gravatar for michael.ante
16 months ago by
michael.ante2.0k
Austria/Vienna
michael.ante2.0k wrote:

Hi Mike,

according to the HTSeq-count manual, you have to set the -f parameter:

 -f <format>, --format=<format>

     Format of the input data. Possible values are sam (for text SAM files) and bam (for binary BAM files). Default is sam.

By doing so, you also need to specify if your data is either position sorted or name sorted:

-r <order>, --order=<order>

 ...  Use this option, with name or pos for <order> to indicate how the input data has been sorted. The default is name.

Cheers,

Michael

ADD COMMENTlink written 16 months ago by michael.ante2.0k

Thanks Michael,

my bam file is not sorted, so I used following command, but still got error:

htseq-count -m union -i gene_name -f bam R1aAligned.out.bam hg19.gtf > output_usingbam.counts

htseq-count: error: no such option: -f

ADD REPLYlink written 16 months ago by Mike880
3

Maybe you should update your htseq program.

You might also try the gene counting from STAR itself (it works with STAR 2.5.2). Setting the parameter --quantMode GeneCounts will produce a file which "counts coincide with those produced by htseq-count with default parameters (STAR Manual 2.5.2a)".

ADD REPLYlink written 16 months ago by michael.ante2.0k

Thanks,

Updated htseq, but.. still error

100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
800000 GFF lines processed.
900000 GFF lines processed.
1000000 GFF lines processed.
1100000 GFF lines processed.
1200000 GFF lines processed.
1300000 GFF lines processed.
1400000 GFF lines processed.
1500000 GFF lines processed.
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1900000 GFF lines processed.
2000000 GFF lines processed.
2100000 GFF lines processed.
2200000 GFF lines processed.
2215898 GFF lines processed.
Error occured when reading beginning of SAM/BAM file.
  file header is empty (mode='rb') - is it SAM/BAM format?
  [Exception type: ValueError, raised in calignmentfile.pyx:574]
ADD REPLYlink written 16 months ago by Mike880

Has your SAM file the correct header (samtools view -SH R1aAligned.out.sam) and does it match the gtf's chromosome-names? What version of STAR are you using and what is the output of the R1aLog.final.out?

ADD REPLYlink written 16 months ago by michael.ante2.0k

Thanks, now it works,

it was problem in STAR output sam file,

Could you please check, alignment results, Is it OK?

This is R1aLog.final.out file:

                          Started job on |       Sep 22 14:30:25
                     Started mapping on |       Sep 22 14:56:52
                            Finished on |       Sep 22 14:58:27

Mapping speed, Million of reads per hour | 147.32

                  Number of input reads |       3887667
              Average input read length |       72
                            UNIQUE READS:
           Uniquely mapped reads number |       3180832
                Uniquely mapped reads % |       81.82%
                  Average mapped length |       71.33
               Number of splices: Total |       247696
    Number of splices: Annotated (sjdb) |       0
               Number of splices: GT/AG |       242886
               Number of splices: GC/AG |       1339
               Number of splices: AT/AC |       58
       Number of splices: Non-canonical |       3413
              Mismatch rate per base, % |       0.86%
                 Deletion rate per base |       0.01%
                Deletion average length |       1.67
                Insertion rate per base |       0.01%
               Insertion average length |       1.43
                     MULTI-MAPPING READS:
Number of reads mapped to multiple loci |       388865
     % of reads mapped to multiple loci |       10.00%
Number of reads mapped to too many loci |       11128
     % of reads mapped to too many loci |       0.29%
                          UNMAPPED READS:

% of reads unmapped: too many mismatches | 0.00% % of reads unmapped: too short | 7.66% % of reads unmapped: other | 0.23%

ADD REPLYlink modified 16 months ago • written 16 months ago by Mike880
1

It looks OK; you have ~ 92% mapping rate (82 % uniquely mapped reads + 10% multi-mapping reads). With HTSeq-count only the 82% will be processed.

You can check your raw fastq file for adapter contamination (e.g. with FastQC reporter) and maybe trim if found. You can also check your alignment with RSeQC (gene body coverage; strandedness, read distribution over annotation).

ADD REPLYlink written 16 months ago by michael.ante2.0k

Thank you very much... highly appreciated for your nice explaination.

ADD REPLYlink written 16 months ago by Mike880

Hi Mike, I am having the same problem. You mentioned it was the problem of STAR output sam file. So how do you solve it? Thanks!

ADD REPLYlink written 10 months ago by Megan30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1280 users visited in the last hour