problem with counting reads per gene using HTseq
0
0
Entering edit mode
7.5 years ago
ashkan ▴ 160

I have bam file from RNA-seq data and using the following commnd to count the number of reads per gene:

python -m HTSeq.scripts.count -f bam -r pos -m intersection-nonempty -s no -a 10 accepted_hits.bam gencode.v19.annotation.gtf  > 2858counts.txt

I have used this command before and got the results, but this time I tried the same command and got 0 values for all genes(which is not possible). do you guys know what the problem is?

RNA-Seq • 2.0k views
ADD COMMENT
0
Entering edit mode

The most common reason is a mismatch in chromosome names. Having said that, look at the BAM file in IGV, pick out a few reads that should be getting counted, and then use the -o option so you can then look those reads up and see why they're note being included.

ADD REPLY
0
Entering edit mode

Try removing -r pos from the command,and then running it. It worked for me.

ADD REPLY
1
Entering edit mode

That flag tells htseq count about the order of reads, so just randomly setting or removing the flag is not your best guess. From the documentation:

-r <order>, --order=<order> For paired-end data, the alignment have to be sorted either by read name or by alignment position. If your data is not sorted, use the samtools sort function of samtools to sort it. Use this option, with name or pos for <order> to indicate how the input data has been sorted. The default is name.

ADD REPLY
0
Entering edit mode

accepted_hits.bam happens to match the file name produced by tophat2, which produces sorted BAM files by default...

ADD REPLY

Login before adding your answer.

Traffic: 2571 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6