Question: Should I filter rRNA sequences for ribosome profiling with an rRNA database or a GenBank file?
AlicePsyche30 wrote:

Hi, all

I need to analyse some ribosome profiling data. Many papers recommend filtering out rRNA sequences before mapping to the genome. Which approach is better: using an rRNA database, or using only the annotation file (GenBank) for your organism? Many people seem to use the SILVA database: https://www.arb-silva.de/download/arb-files/

For the GenBank method, I plan to download files from Ensembl: ftp://ftp.ensembl.org/pub/release-79/genbank/danio_rerio/ and then use Biopython to pull out all rRNA-related sequences.

What is the usual way to get rid of rRNA reads in ribosome profiling data analysis? Any suggestions are welcome!

modified 2.3 years ago by Charles Plessy2.7k • written 2.3 years ago by AlicePsyche30
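The GenBank route described in the question could be sketched roughly as below. This is a minimal sketch, not a tested pipeline: `looks_like_rrna` and `extract_rrna` are hypothetical helper names, and the assumption that some dumps label rRNA genes as `misc_RNA` or `ncRNA` features with "rRNA" in a qualifier should be checked against the actual files. The qualifier check is limited to RNA feature types so that protein-coding genes with "rRNA" in their product name are not picked up.

```python
RNA_FEATURE_TYPES = {"misc_RNA", "ncRNA"}


def looks_like_rrna(feature_type, qualifiers):
    """Return True if a GenBank feature appears to describe an rRNA."""
    if feature_type == "rRNA":
        return True
    # Assumption: some dumps annotate rRNA genes as generic RNA features
    # and only mention "rRNA" in a qualifier such as /note or /product.
    if feature_type in RNA_FEATURE_TYPES:
        for values in qualifiers.values():
            for value in values:
                text = str(value).lower()
                if "rrna" in text or "ribosomal rna" in text:
                    return True
    return False


def extract_rrna(genbank_handle, fasta_out):
    """Write every rRNA-like feature in a GenBank file to a FASTA file."""
    # Imported here so the helper above works without Biopython installed.
    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord

    hits = []
    for record in SeqIO.parse(genbank_handle, "genbank"):
        for i, feature in enumerate(record.features):
            if looks_like_rrna(feature.type, feature.qualifiers):
                seq = feature.extract(record.seq)
                hits.append(SeqRecord(seq, id=f"{record.id}_rRNA_{i}", description=""))
    # SeqIO.write returns the number of records written.
    return SeqIO.write(hits, fasta_out, "fasta")
```

If the downloaded files are gzip-compressed, they can be opened with `gzip.open(path, "rt")` and the resulting handle passed to `extract_rrna`.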
Devon Ryan92k wrote:

If you happen to be working on an organism that has the full rRNA cassette sequence in its reference genome, then you won't need to additionally align against something from the SILVA database. If not, align against that beforehand.

Note that this is probably not sufficient (at least, it hasn't been for me). You should additionally filter out anything that hits 5S rRNA, 5.8S rRNA, or any rRNA repeat regions that RepeatMasker found. I would do the same for tRNA regions. You may also have issues with some piRNAs, but perhaps you'll have more luck there. Every time I work with a new organism with RiboSeq data I end up having to come up with a slightly modified filtering procedure :(

written 2.3 years ago by Devon Ryan92k
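Collecting the rRNA and tRNA regions that RepeatMasker found might look something like the sketch below. It assumes the standard whitespace-separated RepeatMasker `.out` layout (score, three percentages, sequence name, 1-based begin and end, "left", strand as `+` or `C`, repeat name, class/family); `repeatmasker_to_bed` is a hypothetical helper name, and the column positions should be verified against your own file.

```python
def repeatmasker_to_bed(lines, wanted_classes=("rRNA", "tRNA")):
    """Convert selected RepeatMasker .out rows to BED-style tuples."""
    bed = []
    for line in lines:
        fields = line.split()
        # Header and blank lines are skipped: data rows start with an integer score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        chrom, begin, end = fields[4], int(fields[5]), int(fields[6])
        strand = "-" if fields[8] == "C" else "+"
        name, rep_class = fields[9], fields[10]
        # Class may be written as "class/family"; compare on the class part only.
        if rep_class.split("/")[0] in wanted_classes:
            # .out coordinates are 1-based inclusive; BED is 0-based half-open.
            bed.append((chrom, begin - 1, end, f"{rep_class}:{name}", 0, strand))
    return bed
```

Each tuple can be written out tab-separated to give a BED file of regions to exclude.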

Thanks for your helpful reply! I am working on zebrafish, which does not have a full rRNA cassette sequence in its reference :( If I understand correctly, I should probably merge the SILVA database with the RepeatMasker (rRNA and tRNA) file and then map against that to filter, right?

written 2.3 years ago by AlicePsyche30

Given the answer from Charles Plessy, you can probably just blacklist a bunch of regions. I don't know how you were planning on analysing the data. When I last did something like this, I used deepTools and a bit of Python for the final stuff, so I could trivially blacklist regions. If you plan to use something else, you'll want to make a BED file and reverse intersect with it (bedtools intersect -v).

modified 2.3 years ago • written 2.3 years ago by Devon Ryan92k
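For readers unfamiliar with a "reverse intersect", the idea is to keep only the intervals that overlap nothing in the blacklist, which is what `bedtools intersect -v -a reads.bed -b blacklist.bed` reports. The plain-Python sketch below illustrates the logic only; `remove_blacklisted` is a hypothetical name, coordinates are assumed to be 0-based half-open as in BED, and for real data bedtools will be far faster.

```python
def remove_blacklisted(intervals, blacklist):
    """Keep intervals that overlap no blacklisted region."""
    # Group blacklisted regions by chromosome for quicker lookup.
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in intervals:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            start < b_end and b_start < end
            for b_start, b_end in by_chrom.get(chrom, [])
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept
```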

I plan to map the raw reads to the rRNA regions using bowtie and then map the unmapped reads (FASTQ) to the genome using tophat, as this paper suggested: http://www.nature.com/nature/journal/v503/n7476/full/nature12632.html

By the way, I have tested my data using the RepeatMasker file (rRNA and tRNA), and only 2% of the raw reads mapped to rRNA regions. Is that normal?

I use deepTools a lot when dealing with ChIP-seq data ;) but have never tried it with RNA-related experiments. Maybe it's time... Thanks!

written 2.3 years ago by AlicePsyche30
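The two-step plan above (bowtie against rRNA, then tophat on what did not align) could be assembled along these lines. This is only a sketch under assumptions: the file names and the helper names `rrna_then_genome_commands` and `percent_rrna` are made up for illustration, the positional index argument follows the classic bowtie 1 usage, and tophat expects a Bowtie 2 index unless told otherwise, so the flags should be checked against the installed versions.

```python
def rrna_then_genome_commands(reads_fastq, rrna_index, genome_index, threads=4):
    """Build the two command lines; nothing is executed here."""
    unmapped = "non_rrna.fastq"  # hypothetical output name
    # Step 1: align to rRNA; --un writes the reads that failed to align.
    bowtie_cmd = [
        "bowtie", "-p", str(threads), "-S",
        "--un", unmapped,
        rrna_index, reads_fastq, "rrna_hits.sam",
    ]
    # Step 2: align only the non-rRNA reads to the genome.
    tophat_cmd = [
        "tophat", "-p", str(threads), "-o", "tophat_out",
        genome_index, unmapped,
    ]
    return bowtie_cmd, tophat_cmd


def percent_rrna(total_reads, rrna_reads):
    """Percentage of raw reads that aligned to the rRNA reference."""
    return 100.0 * rrna_reads / total_reads
```

Each list can be passed to `subprocess.run(cmd, check=True)`; comparing the read counts before and after step 1 gives the rRNA percentage discussed in this thread.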

2% is lower than what I got with human/mouse, but that likely just means your library prep was better than what I was given :)

You'll find the --Offset option in bamCoverage useful. I created that for RiboSeq and related datasets so I could quickly check for pausing with a bit of python.

written 2.3 years ago by Devon Ryan92k
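To make the effect of an offset concrete: as I understand the deepTools documentation, `--Offset 1` means the first base of the alignment in read orientation, negative values count from the end of the read, and 0 is not allowed. The function below is my own illustration of that idea, not deepTools code; `offset_position` is a hypothetical name, and alignments are assumed to be ungapped with 0-based half-open coordinates.

```python
def offset_position(start, end, strand, offset):
    """Genomic position of the base at a 1-based offset within a read.

    start/end are 0-based half-open genomic coordinates of an ungapped
    alignment; strand is "+" or "-". Returns None if the offset falls
    outside the read.
    """
    if offset == 0:
        raise ValueError("an offset of 0 is not permitted")
    length = end - start
    # Index along the read, counted from its 5' end.
    idx = offset - 1 if offset > 0 else length + offset
    if not 0 <= idx < length:
        return None
    # On the minus strand the read's 5' end is its rightmost genomic base.
    return start + idx if strand == "+" else end - 1 - idx
```

For a 30-nt read, an offset of 12 is one commonly quoted choice for RiboSeq, but the right value depends on the protocol and read length, so treat it as something to check on your own data.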
Charles Plessy2.7k wrote:

Some years ago I searched for the rRNA sequences in the zebrafish genome version 9. You can find my notes on GitHub (charles-plessy/zebrafish_rRNA). I hope they can be useful to you. Comments are welcome.

written 2.3 years ago by Charles Plessy2.7k

Thanks! It is very helpful!

written 2.3 years ago by AlicePsyche30
h.mon27k wrote:

What was used in the papers you read? That would be a good start.

Or use SortMeRNA or BBDuk; both are mentioned plenty of times on this forum.

written 2.3 years ago by h.mon27k
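BBDuk filters reads by k-mer matching against a reference. The toy sketch below shows the general idea of k-mer-based filtering only; it is not how BBDuk or SortMeRNA is actually implemented, the function names are hypothetical, and it ignores mismatches, quality values, and memory use.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def build_kmer_set(reference_seqs, k):
    """Collect every k-mer from both strands of the reference sequences."""
    kmers = set()
    for seq in reference_seqs:
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                kmers.add(strand[i:i + k])
    return kmers


def is_rrna_read(read, kmers, k):
    """Flag a read if any of its k-mers occurs in the reference set."""
    read = read.upper()
    return any(read[i:i + k] in kmers for i in range(len(read) - k + 1))
```

Reads for which `is_rrna_read` returns True would be set aside, and the rest passed on to genome mapping.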

Oh, the papers do not specify an rRNA database... I searched and found that there may be many options.

Thanks! I am not familiar with rRNA, so I will study the tools you mentioned carefully.

written 2.3 years ago by AlicePsyche30