Question

Is it necessary to name-sort single-end reads before htseq-count?

1

8.1 years ago

natsterbug ▴ 10

I have single-end RNA-seq data that I aligned with TopHat2, and I would now like to use htseq-count to generate counts for edgeR. I have read that the accepted_hits.bam file must be sorted by name when using paired-end reads (http://www-huber.embl.de/users/anders/HTSeq/doc/count.html). Is name sorting also needed for single-end reads? And since the default value of the order option is name, do I have to set it to pos if I do not sort by name?

Currently, I am using the command below and receive the following counts:

htseq-count -m intersection-nonempty --format=bam tophat_Kalkaska_control/tophat_K18C/accepted_hits.bam PGSC_DM_V403_genes_strand_filtered.gtf > htseq_counts_control/K18C_counts.txt

htseq RNAseq sam bam • 2.7k views

ADD COMMENT • link updated 8.1 years ago by h.mon 35k • written 8.1 years ago by natsterbug ▴ 10

score 0 · Answer 1 · 2016-03-17

0

8.1 years ago

GouthamAtla 12k

If your data is single-end, set the order option to pos (--order=pos), since your data is coordinate-sorted. This is just to be on the safe side.
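One way to confirm the sort order before choosing the option is to read the SO tag from the BAM header. The snippet below is only a sketch: `sort_order` is a hypothetical helper (not part of samtools or HTSeq), and it assumes samtools is installed and that the aligner recorded an @HD line in the header.

```shell
#!/bin/sh
# Hypothetical helper: read SAM header text on stdin and print the value
# of the SO (sort order) tag from the @HD line, if there is one.
sort_order() {
    awk -F'\t' '$1 == "@HD" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SO:/) { print substr($i, 4); exit }
        }
    }'
}

bam=tophat_Kalkaska_control/tophat_K18C/accepted_hits.bam

# Only query the header when samtools and the BAM file are present.
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    samtools view -H "$bam" | sort_order
fi
```

If this prints coordinate, --order=pos matches the file; if it prints queryname, the default --order=name matches it.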

ADD COMMENT • link 8.1 years ago by GouthamAtla 12k

0

Will do. Thank you so much.

ADD REPLY • link 8.1 years ago by natsterbug ▴ 10

score 0 · Answer 2 · 2016-03-18

0

8.1 years ago

h.mon 35k

It should not make any difference for single-end data, but you can easily test: just run htseq-count with both options, one at a time.

The --order option (-r), which takes name or pos, tells htseq-count how paired-end reads are sorted in the SAM/BAM file, so that it can choose internally how to find read pairs.
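The test suggested above could be sketched roughly as follows. The input paths are the ones from the question. The output file names and the `compare_counts` helper are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTSeq): report whether two count tables match.
compare_counts() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        # Show the differing lines; diff exits non-zero when files differ.
        diff "$1" "$2" || true
    fi
}

bam=tophat_Kalkaska_control/tophat_K18C/accepted_hits.bam
gtf=PGSC_DM_V403_genes_strand_filtered.gtf

# Only run the comparison when the tool and both inputs are present.
if command -v htseq-count >/dev/null 2>&1 && [ -f "$bam" ] && [ -f "$gtf" ]; then
    htseq-count -m intersection-nonempty --format=bam --order=name "$bam" "$gtf" > K18C_counts_name.txt
    htseq-count -m intersection-nonempty --format=bam --order=pos "$bam" "$gtf" > K18C_counts_pos.txt
    compare_counts K18C_counts_name.txt K18C_counts_pos.txt
fi
```

If the two tables come out identical, the order setting has no effect on this single-end dataset.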

ADD COMMENT • link 8.1 years ago by h.mon 35k