Question: htseq-count: error when reading beginning of SAM/BAM file.
0
gravatar for cgias
18 months ago by
cgias0
cgias0 wrote:

Hello,

I've mapped my sample using bowtie2 with the following command:

bowtie2 -p 10 -x /home/carolgs/Cpapaya/bowtieIndex/Cpapaya_113_r.Dec2008.fa.index -1 filtered_PE1.fastq -2 filtered_PE2.fastq -S map.fastq.sam --very-sensitive-local

Now I need to count the reads. I'm trying to run HTSeq-count but it isn't working. I have a .sam file and the annotation file used while mapping is a gff3 without exons information. I'm using this command line:

samtools flagstat map.fastq.sam > flagstat 
samtools view -Sb map.fastq.sam > mapped.bam
samtools sort -o SAM mapped.bam > sorted.sam
python -m HTSeq.scripts.count -s yes -r pos -i ID -m intersection-nonempty -q sorted.sam  /home/carolgs/Cpapaya/annotation/Cpapaya_113_ASGPBv0.4.gene.gff3  >  counts_b.txt

And the error message I get in the log file is

" [Exception type: StopIteration, raised in count.py:126]
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_flagstat_core] Truncated file? Continue anyway.
[samopen] SAM header is present: 5901 sequences.
open: No such file or directory
[bam_sort_core] fail to open file SAM
Warning: No features of type 'exon' found.
Error occured when reading beginning of SAM/BAM file.

I'm not sure about the feature I have to put at -i, I've tried with ID and Name and both of them gives me the same error message. Does anybody have an idea about where is my mistake?

Thanks,

rna-seq htseq-count htseq • 1.8k views
ADD COMMENTlink modified 18 months ago by h.mon28k • written 18 months ago by cgias0

You must be missing header in your BAM file. What does samtools view -H your.bam produce?

ADD REPLYlink written 18 months ago by genomax74k

At the beggining:

@HD VN:1.0  SO:unsorted
@SQ SN:supercontig_0    LN:6239266
@SQ SN:supercontig_1    LN:5418800
@SQ SN:supercontig_2    LN:4646840
@SQ SN:supercontig_3    LN:3706331
@SQ SN:supercontig_4    LN:3897756
@SQ SN:supercontig_5    LN:3349961
@SQ SN:supercontig_6    LN:3337819
@SQ SN:supercontig_7    LN:2695179
@SQ SN:supercontig_8    LN:2727084
@SQ SN:supercontig_9    LN:2795995
@SQ SN:supercontig_10   LN:3250320
@SQ SN:supercontig_11   LN:2969618

And the last lines:

@SQ SN:contig_48203 LN:689
@SQ SN:contig_48208 LN:682
@SQ SN:contig_48211 LN:675
@SQ SN:contig_48213 LN:675
@SQ SN:contig_48225 LN:649
@SQ SN:contig_48241 LN:623
@SQ SN:contig_48249 LN:610
@SQ SN:contig_48254 LN:599
@SQ SN:contig_48276 LN:555
@SQ SN:contig_48289 LN:544
@SQ SN:contig_48297 LN:505
@SQ SN:contig_48300 LN:496
@SQ SN:contig_48349 LN:380
@SQ SN:contig_48372 LN:318
@SQ SN:contig_48381 LN:277
@PG ID:bowtie2  PN:bowtie2  VN:2.2.3    CL:"/opt/bowtie2-2.2.3/bowtie2-align-s --wrapper basic-0 -p 10 -x /home/carolgs/Cpapaya/bowtieIndex/Cpapaya_113_r.Dec2008.fa.index -S map.fastq.sam --very-sensitive-local -1 filtered_PE1.fastq -2 filtered_PE2.fastq"
ADD REPLYlink modified 18 months ago by genomax74k • written 18 months ago by cgias0

That looks good. I see that you had converted your bam file back into SAM for sorting. What does the header on sorted SAM file look like? You could just do head -15 sorted.sam.

There is this additional error in your log: No features of type 'exon' found.

Have you tried to use featureCounts (http://bioinf.wehi.edu.au/featureCounts/ ) with the sorted BAM file? It is fast and easier to use.

ADD REPLYlink modified 18 months ago • written 18 months ago by genomax74k

This head -15 sorted.sam gives me nothing back.

About this additional error, I thought it was because the annotation file I used has no exon information so I shoul tell it to htseq using -i ID, although I'm not sure it's really ID the name and I don't know how to check it.

I'll take a look at this program. Thank you!

ADD REPLYlink written 18 months ago by cgias0

This head -15 sorted.sam gives me nothing back.

Suggesting that the file is just empty, you may want to repeat the sorting step.

ADD REPLYlink written 18 months ago by WouterDeCoster42k

With a reasonable recent version of samtools you can simplify your commands substantially:

instead of

samtools view -Sb map.fastq.sam > mapped.bam
samtools sort -o SAM mapped.bam > sorted.sam

you can use

samtools sort -o sorted.bam map.fastq.sam

which will perform sorting and conversion to bam

ADD REPLYlink modified 18 months ago • written 18 months ago by WouterDeCoster42k

Great tip, I'm gonna improve it to my command. Thank you!

ADD REPLYlink written 18 months ago by cgias0

What version of samtools are you using?

ADD REPLYlink written 18 months ago by genomax74k
1
gravatar for h.mon
18 months ago by
h.mon28k
Brazil
h.mon28k wrote:

You are mapping RNAseq reads to an eukaryotic genome with Bowtie2. This is not a good idea, as Bowtie2 is not designed to align over splice junctions. Use STAR (as you are trying on your other thread) or HISAT2.

Additionally, this command is wrong, as the resulting file should be a bam file:

samtools sort -o SAM mapped.bam > sorted.sam

Finally, STAR has a parameter --quantMode geneCounts, which will output cunts equivalent to those of HTSeq.

edit: the above samtools sort is doubly wrong, with -o you can't redirect standard output. It should be:

samtools sort -o sorted.bam mapped.bam
ADD COMMENTlink modified 18 months ago • written 18 months ago by h.mon28k

I am mapping just a couple of samples with both programs to see which one is better, guess it really is STAR. Thanks for de correction.

I finally got it working

samtools sort mapped.bam sorted samtools view sorted.bam | htseq-count -s yes -r pos -m intersection-nonempty -i pacid - /home/carolgs/Cpapaya/STARindex2/Cpapaya_113_ASGPBv0.4.gene_exons.gff3 > counts_name.txt

ADD REPLYlink modified 18 months ago • written 18 months ago by cgias0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 857 users visited in the last hour