I would like to screen my RNA-seq data for rRNA contamination. I tried two different methods, which can be summed up as:
- take the human GENCODE GTF, filter for "rRNA", and extract the sequences for the matching coordinates
- download the Rfam FASTA file and filter for "ribosomal_rna" and "homo_sapiens" (example protocol)
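For reference, the two filtering steps above could look roughly like the minimal Python sketch below. The function names are hypothetical. It assumes GENCODE-style `gene_type "..."` attributes, and it assumes Rfam headers carry the species and description as free text, with underscores treated as spaces. It matches `gene_type` exactly, so related types such as `rRNA_pseudogene` or `Mt_rRNA` are only included if you add them to `keep_types`. A plain substring search for "rRNA" would silently pull them in, which is one thing worth checking when two reference sets disagree.

```python
import re
from typing import Iterable, Iterator

# Anchor on the attribute key so e.g. "transcript_type" is never matched.
GENE_TYPE_RE = re.compile(r'(?:^|;\s*)gene_type "([^"]+)"')
GENE_NAME_RE = re.compile(r'(?:^|;\s*)gene_name "([^"]+)"')


def gtf_rrna_to_bed(lines: Iterable[str],
                    keep_types=frozenset({"rRNA"})) -> Iterator[str]:
    """Yield BED6 lines for GTF 'gene' features whose gene_type is in keep_types."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # one interval per gene; exons would be redundant here
        m = GENE_TYPE_RE.search(fields[8])
        if not m or m.group(1) not in keep_types:
            continue
        name_m = GENE_NAME_RE.search(fields[8])
        name = name_m.group(1) if name_m else "."
        # GTF is 1-based inclusive; BED is 0-based half-open.
        start = int(fields[3]) - 1
        end = int(fields[4])
        yield f"{fields[0]}\t{start}\t{end}\t{name}\t0\t{fields[6]}"


def filter_fasta(lines: Iterable[str], keywords: Iterable[str]) -> Iterator[str]:
    """Yield FASTA records whose header contains every keyword.

    Matching is case-insensitive and treats '_' as a space (an assumption
    about how species/descriptions appear in the headers)."""
    wanted = [k.lower().replace("_", " ") for k in keywords]
    keep = False
    for line in lines:
        if line.startswith(">"):
            header = line.lower().replace("_", " ")
            keep = all(k in header for k in wanted)
        if keep:
            yield line.rstrip("\n")
```

The BED output keeps the strand in column 6, so something like `bedtools getfasta -s` can extract strand-aware sequences from the genome FASTA. Gzipped inputs can be streamed with `gzip.open(path, "rt")`.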
I then align my reads against each reference separately. I don't expect them to give very close results, but the difference can be 100-fold. Why is the difference so large? Can I trust either?
One possibility is that the Rfam sequences overestimate the abundance and I am getting a lot of false positives. However, I frequently get an alignment rate of less than 5%, which is very reasonable.
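To see where a large gap between the two references comes from, it may help to break each alignment down by reference sequence instead of comparing only the overall rate. The minimal sketch below (hypothetical function names) reads SAM text and skips secondary (0x100) and supplementary (0x800) records. It counts mapped primary records per reference name (RNAME) and reports the mapped fraction. It assumes the aligner wrote unmapped reads (flag 0x4) into the SAM; not all aligners do by default. For paired-end data it counts each mate separately.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def primary_hits(sam_lines: Iterable[str]) -> Tuple[int, Counter]:
    """Return (number of primary records, Counter of mapped records per RNAME)."""
    total = 0
    per_ref: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        total += 1
        if not flag & 0x4:  # 0x4 = segment unmapped
            per_ref[fields[2]] += 1
    return total, per_ref


def summarize(sam_lines: Iterable[str], top: int = 10) -> Tuple[float, List[Tuple[str, int]]]:
    """Mapped fraction of primary records plus the most-hit reference sequences."""
    total, per_ref = primary_hits(sam_lines)
    rate = sum(per_ref.values()) / total if total else 0.0
    return rate, per_ref.most_common(top)
```

Running `summarize` on the GENCODE-based and Rfam-based alignments of the same sample shows whether the extra Rfam hits are concentrated on a few reference sequences. If they are, check whether those sequences are present in the GENCODE-derived set at all.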