problem with counting reads per gene using HTseq

0

Entering edit mode

7.5 years ago

ashkan ▴ 160

I have bam file from RNA-seq data and using the following commnd to count the number of reads per gene:

python -m HTSeq.scripts.count -f bam -r pos -m intersection-nonempty -s no -a 10 accepted_hits.bam gencode.v19.annotation.gtf  > 2858counts.txt

I have used this command before and got the results, but this time I tried the same command and got 0 values for all genes(which is not possible). do you guys know what the problem is?

RNA-Seq • 2.0k views

ADD COMMENT • link 7.5 years ago by ashkan ▴ 160

0

Entering edit mode

The most common reason is a mismatch in chromosome names. Having said that, look at the BAM file in IGV, pick out a few reads that should be getting counted, and then use the -o option so you can then look those reads up and see why they're note being included.

ADD REPLY • link 7.5 years ago by Devon Ryan 104k

0

Entering edit mode

Try removing -r pos from the command,and then running it. It worked for me.

ADD REPLY • link 7.5 years ago by Ron ★ 1.2k

1

Entering edit mode

That flag tells htseq count about the order of reads, so just randomly setting or removing the flag is not your best guess. From the documentation:

-r <order>, --order=<order> For paired-end data, the alignment have to be sorted either by read name or by alignment position. If your data is not sorted, use the samtools sort function of samtools to sort it. Use this option, with name or pos for <order> to indicate how the input data has been sorted. The default is name.

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

0

Entering edit mode

accepted_hits.bam happens to match the file name produced by tophat2, which produces sorted BAM files by default...

ADD REPLY • link 7.5 years ago by Devon Ryan 104k

Login before adding your answer.