I think it would be good to post this here for future reference. I could not find a topic on this here, and only found one discussion on Biostars.
Picard has a tool named CollectRnaSeqMetrics. It's a very useful program that requires, among other, more obvious things, a file of ribosomal intervals in a SAM-like format: a SAM-style header followed by intervals in 5 fields (chr, begin, end, strand (+ or -), and the actual gene name). Since I mostly deal with the mouse (mm9 assembly) and human (hg19 assembly) genomes, I wanted to find these files or make them myself.
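To make the layout described above concrete, here is a minimal Python sketch that assembles such a file as text. The function name `format_interval_list`, the sequence `chrToy`, and the interval `toy_rRNA` are all made up for illustration. In real use, the header lines would most likely need to match the sequence dictionary of the aligned BAM, so copying them from the BAM header is probably safer than generating them like this.

```python
def format_interval_list(sequences, intervals):
    """Build the text of a SAM-headed interval file.

    sequences: list of (name, length) tuples for the @SQ header lines.
    intervals: list of (chrom, start, end, strand, name) tuples,
               with 1-based inclusive coordinates.
    """
    lines = ["@HD\tVN:1.0\tSO:coordinate"]
    for name, length in sequences:
        lines.append(f"@SQ\tSN:{name}\tLN:{length}")
    for chrom, start, end, strand, name in intervals:
        # The format only allows "+" or "-" in the strand column.
        if strand not in ("+", "-"):
            raise ValueError(f"bad strand {strand!r} for interval {name}")
        lines.append(f"{chrom}\t{start}\t{end}\t{strand}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy data, not real genome coordinates.
    print(format_interval_list([("chrToy", 1000)],
                               [("chrToy", 10, 50, "+", "toy_rRNA")]), end="")
```

The five tab-separated columns after the header correspond to the chr, begin, end, strand, and gene name fields listed above.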
I've tried to make sense of the files at http://www.arb-silva.de/ and just flat-out failed. If someone can tell me how to convert the files available for download there into genomic intervals that correspond to rRNA, I'd be very grateful.
At any rate, I then moved on to the latest version of GENCODE. There are 1587 intervals annotated with the "rRNA" transcript type in v17 of GENCODE. However, I've found that many intervals I had in a previous rRNA interval file (whose origins are mysterious), such as LSU-rRNAs, are absent.
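For reference, pulling the rRNA records out of a GENCODE GTF could look roughly like the sketch below. It assumes the GTF carries a `transcript_type "rRNA"` attribute and a `gene_name` attribute on `transcript` lines, which is how I understand the GENCODE files to be laid out. The function name `rrna_intervals_from_gtf` and the fallback to `transcript_id` are my own choices, not anything Picard or GENCODE prescribes.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def rrna_intervals_from_gtf(lines, feature="transcript", biotype="rRNA"):
    """Return (chrom, start, end, strand, name) tuples for rRNA records.

    lines: any iterable of GTF lines, e.g. an open file handle.
    GTF coordinates are 1-based inclusive, so they are passed through as-is.
    """
    intervals = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue  # keep only the requested feature type
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("transcript_type") != biotype:
            continue
        # Prefer the gene name; fall back to the transcript ID if missing.
        name = attrs.get("gene_name") or attrs.get("transcript_id", "unknown")
        intervals.append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6], name)
        )
    return intervals
```

Each returned tuple maps directly onto one five-field interval line. This only recovers what GENCODE itself labels as rRNA, so it would not bring back the LSU-rRNA entries that are missing from the annotation.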
So, here are my two main questions:
- what should be the ultimate source of information for annotated rDNA intervals?
- what would be such a source for the mouse genome, considering that there's no GENCODE data for mouse?
Thank you for your input.