Hello,
How can I get # of reads mapping to CDS/rRNA features or intergenic? I did my alignment with tophat and I have .gtf .gff and .bed annotations of my features.
Thank you,
Adrian
Hello,
How can I get # of reads mapping to CDS/rRNA features or intergenic? I did my alignment with tophat and I have .gtf .gff and .bed annotations of my features.
Thank you,
Adrian
Use HTSeq tool.
python -m HTSeq.scripts.count [options] <alignment_file> <gff_file>
example command line to count reads mapped to the CDS:
python -m HTSeq.scripts.count --type=CDS --idattr=gene_id --mode=union --stranded=yes <sam_file> <gff_file>
Thanks for the useful tool, but it doesn't seem to work with tophat alignemnts:
7584 GFF lines processed.
Warning: No features of type 'exon' found.
Error occured when reading beginning of SAM/BAM file.
('SAM line does not contain at least 11 tab-delimited fields.', 'line 1 of file con1/accepted_hits.bam')
[Exception type: ValueError, raised in _HTSeq.pyx:1276]
Hmm, looks like i need to specify -f bam and install a bam reader.
I can confirm that yes, all you need to do is specify "-f bam" and this error should go away. The error is misleading because it says "SAM/BAM". Many other tools perform this detection behind-the-scenes with no problems, so it surprising that HTSeq does not.
I am not sure (this is just a guess) but the problem may be because you are providing bam file to HTseq as an input. It requires a SAM file. You can make HTseq read from the standard input ("-") using the following command.
samtools view accepted_hits.bam | python -m HTSeq.scripts.count --type=CDS --idattr=gene_id --mode=union --stranded=yes - <gff_file>
Or you can convert your bam file to sam file
samtools view -h input.bam > output.sam and give it to HTSeq.
If this is not the case, would you mind pasting the first few lines from your bam/sam.
You may try https://www.broadinstitute.org/cancer/cga/rna-seqc or http://rseqc.sourceforge.net/