Question: Other RNA removal from RNA-Seq Data
4.7 years ago
VineethVenumadhavan wrote:

What is the best resource for other-RNA sequences if the aim is to remove other-RNA contaminants (if any) from RNA-Seq data? This question has been discussed in different threads, but I am looking for a summarized solution covering all of them.

a) Select all other RNAs from NCBI using a higher-taxonomy filter. For example, if the sample is a plant, generate an other-RNA database from NCBI with the query: all[filter] AND "green plants"[porgn] AND (biomol_trna[PROP] OR biomol_snorna[PROP] OR biomol_snrna[PROP] OR biomol_rrna[PROP] OR biomol_scrna[PROP] OR biomol_crna[PROP])

b) Generate an RNA database from specialized databases such as SILVA, Greengenes, etc. However, I think many of them contain only rRNA sequences.

c) The Rfam database

d) riboPicker
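For option a), the search term can be assembled programmatically so that the taxon and the RNA classes are easy to swap. This is only a sketch: the function name `build_other_rna_query` is hypothetical, and the only field tags used are the ones that already appear in the query above.

```python
def build_other_rna_query(organism, biomol_types):
    """Compose an NCBI Entrez search term for a set of RNA classes.

    organism     -- taxon name used with the [porgn] field, e.g. "green plants"
    biomol_types -- suffixes for biomol_<type>[PROP], e.g. "trna", "rrna"
    """
    # One biomol_<type>[PROP] clause per RNA class, joined with OR.
    props = " OR ".join(f"biomol_{t}[PROP]" for t in biomol_types)
    return f'all[filter] AND "{organism}"[porgn] AND ({props})'


# Reproduces the plant query from option a).
query = build_other_rna_query(
    "green plants",
    ["trna", "snorna", "snrna", "rrna", "scrna", "crna"],
)
print(query)
```

The resulting string can be pasted into the NCBI Nucleotide search box; for another sample type, change the organism argument to the matching higher taxon.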

Pardon me if the question is unclear or irrelevant.

Why not focus on what you want, instead of on what you don't want?

Can't you just align your reads to the genome and use only those that align to your 'real' genes? I assume you want only the coding genes?

Good thought, I appreciate it. I am also concerned about de novo experiments, where we lack a reference genome.

Why not first find out what is contaminating your data? Do a de novo transcript assembly, plot the GC content of the transcripts, and see whether you get multiple peaks. You can also BLAST your transcripts to find out whether there are obvious contaminants.
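The GC-content check could be sketched roughly as follows, assuming the assembled transcripts are available as FASTA. The helper names (`read_fasta`, `gc_percent`, `gc_histogram`) and the 5% bin width are illustrative choices, not part of any particular assembler's output or toolkit.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent(seq):
    """GC percentage over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(records, bin_width=5):
    """Count transcripts per GC bin; keys are the lower edge of each bin."""
    counts = Counter()
    for _, seq in records:
        gc = gc_percent(seq)
        # Put exactly 100% GC into the last bin instead of a bin of its own.
        start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        counts[start] += 1
    return dict(sorted(counts.items()))


# Tiny made-up example; for real data pass open("transcripts.fasta") instead.
example = [">t1 demo", "ATATATAT", "GC", ">t2", "GGCCGGCC", ">t3", "GGCCGGCA"]
hist = gc_histogram(read_fasta(example))
for start, n in hist.items():
    print(f"{start:3d}-{start + 5:3d}% GC: {'#' * n}")
```

On a real assembly, two or more clearly separated peaks in this histogram would suggest a mixture of organisms; the transcripts in the unexpected peak are then good candidates for a BLAST search.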

4.7 years ago
michael.ante wrote:

I would choose whichever repository (TAIR, NCBI, ...) describes my plant best or most closely. Download the species-specific sequences of interest and build a bowtie2 index from them. Afterwards, I'd align the reads and save the unmapped reads, which are then processed further.
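A minimal sketch of that workflow for single-end reads is given below. It only assembles the two command lines; the file names are placeholders, the helper `bowtie2_filter_commands` is hypothetical, and the thread count is an arbitrary default. The options used (`bowtie2-build`, `-x`, `-U`, `--un`, `-S`, `-p`) are standard bowtie2 options.

```python
import shlex


def bowtie2_filter_commands(contaminant_fasta, index_prefix, reads_fastq,
                            unmapped_fastq, threads=4):
    """Return (build, align) command lists for filtering single-end reads."""
    # Step 1: index the downloaded other-RNA sequences.
    build = ["bowtie2-build", contaminant_fasta, index_prefix]
    # Step 2: align the reads; --un writes the reads that did NOT align,
    # i.e. the ones to keep. The alignments themselves are discarded.
    align = [
        "bowtie2",
        "-p", str(threads),
        "-x", index_prefix,
        "-U", reads_fastq,
        "--un", unmapped_fastq,
        "-S", "/dev/null",
    ]
    return build, align


# Placeholder file names, for illustration only.
build_cmd, align_cmd = bowtie2_filter_commands(
    "other_rna.fa", "other_rna_idx", "reads.fastq", "reads.clean.fastq"
)
for cmd in (build_cmd, align_cmd):
    print(shlex.join(cmd))
    # To actually execute: subprocess.run(cmd, check=True)
```

For paired-end data, bowtie2 takes `-1`/`-2` instead of `-U`, and `--un-conc` writes the pairs that did not align concordantly.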

Yes, this is the normal procedure we use to remove other RNA from RNA-Seq data.

