Use Rfam to remove contaminants from miRNA-seq
3.0 years ago

Dear all, My project is about mastitis in dairy cows, and I am working on miRNA-seq data. I now want to remove contaminants from my miRNA sequencing data. I know Rfam is the standard database for this purpose, but I do not know how to get the Rfam dataset as a FASTA file. I used the following link to get the data: http://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/ But there are nearly a thousand files on this page. Can anyone guide me on how to solve this problem? Best regards, S. Sharifi


Rfam is a database of RNA families. How are you planning to use it to remove contaminants (and what is your definition of a contaminant)? If you used a miRNA-specific kit to prepare your libraries, there should only be miRNA in your data.


I know Rfam is an RNA family database and that it is used to discard other non-coding RNAs (e.g. rRNAs and tRNAs) from the data. The remaining reads will therefore include miRNAs and unmapped sequences, which will be used to search for novel miRNAs.

3.0 years ago

I'm not sure what you are referring to as "contaminants". If you mean cow RNAs that are not miRNAs, then what you want is a complete reference sequence of all RNAs excluding miRNAs. Each of the files at the link you posted contains the sequences of all RNAs in a given family. I don't know how many of them would be relevant to the cow. On the flip side, they would also include miRNA sequences, so if you filtered against them, you would also remove miRNA reads.
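If you do go down the Rfam route, the per-family files can be merged into a single FASTA while leaving out the miRNA families. A minimal sketch, assuming the files have been downloaded to a local directory and are named by family accession (e.g. `RF00001.fa.gz`); the function name `merge_family_fastas` is made up for illustration, and the set of miRNA family accessions to exclude is something you would have to compile yourself from the Rfam family annotations:

```python
import gzip
from pathlib import Path


def merge_family_fastas(fasta_dir, out_path, exclude_families=()):
    """Concatenate per-family .fa.gz files into one plain-text FASTA.

    Families whose accession is in `exclude_families` (for example the
    miRNA families, which the caller must supply) are skipped.
    Returns the list of family accessions that were written.
    """
    exclude = set(exclude_families)
    written = []
    with open(out_path, "wt") as out:
        for path in sorted(Path(fasta_dir).glob("RF*.fa.gz")):
            # Assumed naming scheme: "<accession>.fa.gz"
            family = path.name.split(".")[0]
            if family in exclude:
                continue
            with gzip.open(path, "rt") as handle:
                for line in handle:
                    # Make sure every line ends with a newline so records
                    # from consecutive files never run together.
                    out.write(line if line.endswith("\n") else line + "\n")
            written.append(family)
    return written
```

The merged file still contains sequences from every species in Rfam, so it would need a further species filter if only cow sequences are wanted.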

RNAcentral has a more accessible set of RNA reference sequences, but again I don't know if it includes Bos taurus sequences, and I know that it definitely does include miRNA sequences, which at the very least you'd have to filter out before use.
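Filtering the miRNA entries out of such a reference could look roughly like the sketch below. It assumes each FASTA header carries a free-text description containing the species name and the RNA type, and it relies on keyword matching, which will miss entries with unusual descriptions. The pattern, the function names, and the example headers are all illustrative and are not taken from the actual RNAcentral files:

```python
import re

# Heuristic pattern for miRNA-like descriptions (an assumption, not a standard).
MIRNA_PATTERN = re.compile(r"microRNA|miRNA|\bmiR-?\d|\blet-7", re.IGNORECASE)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def drop_mirna_records(lines, species="Bos taurus"):
    """Keep records for `species` whose header does not look like a miRNA."""
    for header, seq in read_fasta(lines):
        if species in header and not MIRNA_PATTERN.search(header):
            yield header, seq


example = [
    ">URS_A Bos taurus (cattle) bta-miR-21-5p\n", "UAGC\n",
    ">URS_B Bos taurus (cattle) tRNA-Ala\n", "GGGG\n", "CCCC\n",
    ">URS_C Homo sapiens (human) 5S rRNA\n", "ACGU\n",
]
kept = list(drop_mirna_records(example))
```

Filtering on curated RNA-type annotation, where it is available, would be more reliable than matching keywords in the description.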

Why do you want to filter out such "contaminants"? If you map to the genomic sequence, you can just count the reads that DO map to miRNAs. Inflation of counts by reads mapping to miRNAs when they should map elsewhere is likely to be minimal. In fact, I don't really think it would be a major problem even if you mapped directly to the miRNA sequences.
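To make "count the reads that DO map to miRNAs" concrete, here is a minimal pure-Python sketch. It assumes the alignments have already been reduced to `(chrom, start, end)` tuples (1-based, inclusive) and that the miRNA annotation is a GFF3 file whose mature miRNAs have the feature type `miRNA`. Strand is ignored, and a read is counted if it overlaps the annotation at all. Both function names are hypothetical, and in practice a dedicated read-counting tool would normally do this job:

```python
from collections import Counter


def parse_mirna_gff3(lines):
    """Yield (chrom, start, end, name) for every 'miRNA' feature in GFF3 lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "miRNA":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        name = attrs.get("Name", attrs.get("ID", "unknown"))
        yield fields[0], int(fields[3]), int(fields[4]), name


def count_mirna_reads(alignments, mirnas):
    """Count alignments that overlap each annotated miRNA (strand ignored)."""
    by_chrom = {}
    for chrom, start, end, name in mirnas:
        by_chrom.setdefault(chrom, []).append((start, end, name))
    counts = Counter()
    for chrom, start, end in alignments:
        for m_start, m_end, name in by_chrom.get(chrom, []):
            if start <= m_end and end >= m_start:
                counts[name] += 1
    return counts
```

Reads that align elsewhere in the genome (tRNAs, rRNAs, mRNA fragments) simply never get counted, which is why a separate contaminant-removal step is not strictly needed for quantifying known miRNAs.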

We almost always map to RNAcentral, and then look at what is and isn't a miRNA post hoc, but then we work almost exclusively in mouse or human.


Thank you for your reply. I think my explanation in the previous post will help you clarify the issue.


Okay. I now understand why, but I still don't think it's the best idea. Do you have any reason to believe that unmapped sequences are more likely to be novel miRNAs, rather than novel tRNAs or novel snRNAs, or even fragments of novel protein-coding genes?
