I have two miRNA-seq datasets from which the adapters have already been removed. I want to detect differential expression between the two samples (no replicates; I know this is not a good experimental design). The pipeline: 1) remove contaminants such as mRNA, tRNA, snoRNA and so on; 2) select reads 18-26 nt long; 3) map to mature miRNAs downloaded from miRBase v20 using bwa; 4) detect differential expression between the two samples using DESeq. However, I cannot do the first step. Is there an mRNA, tRNA or snoRNA database that can be downloaded online, and is this step run on Linux or Windows? Is this pipeline reasonable, and do you have any further suggestions?
What I recommend:
- Clip adapter sequences
- Map the reads against a complete reference genome and not only against miRBase microRNAs.
- Quantify microRNA expression and use e.g. DESeq to find differentially expressed microRNAs
I don't think you have to select only the reads of length 18-26. After adapter clipping, the read lengths should already fall in that range, because the microRNA-Seq protocol produces fragments of this length anyway.
Also, reads mapping to rRNAs, mRNAs, etc. should be no problem. They map to those regions of the genome and do not interfere with your analysis.
Important: allow multiple hits for each read when mapping! There are multiple copies of some microRNAs in the genome, and if you do not allow multiple hits, the results might be misleading or wrong.
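To illustrate why multiple hits matter at the counting stage, here is a minimal sketch that splits each read evenly across all microRNAs it aligns to. The even split is only one possible convention and is an assumption here, not something prescribed above; the function name and the `(read_id, mirna_name)` input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fractional_counts(hits: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Count reads per microRNA, sharing multi-mapped reads evenly.

    hits: (read_id, mirna_name) pairs, one pair per reported alignment.
    (Hypothetical input format; adapt it to whatever your aligner reports.)
    """
    # Collect the distinct microRNAs each read aligns to.
    targets_per_read = defaultdict(set)
    for read_id, mirna in hits:
        targets_per_read[read_id].add(mirna)

    # Each read contributes a total weight of 1, split over its targets.
    counts: Dict[str, float] = defaultdict(float)
    for targets in targets_per_read.values():
        share = 1.0 / len(targets)
        for mirna in targets:
            counts[mirna] += share
    return dict(counts)
```

Note that DESeq expects integer counts, so fractional values like these would need to be rounded, or the loci of identical mature microRNAs merged, before testing.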
If you are aligning to a genomic reference, then I don't think you need to filter the results per se so much as you need to annotate them appropriately. In fact, a QC plot showing the distribution of reads by gene type is probably useful.
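As a rough sketch of the numbers behind such a QC plot, the snippet below tallies the fraction of reads per gene type. It assumes each aligned read has already been annotated with a gene type label such as "miRNA" or "tRNA"; how that annotation is produced is not covered here, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def biotype_fractions(biotypes: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of reads assigned to each gene type.

    biotypes: one label per aligned read (assumed to be pre-annotated).
    """
    counts = Counter(biotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from the largest class to the smallest.
    return {biotype: n / total for biotype, n in counts.most_common()}
```

The resulting dictionary can be passed to any plotting library as a bar or pie chart.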
If your institution has a license, I think you can do all of these steps in Partek.
I haven't personally tried them out, but I think these are open-source alternatives:
You can use the Rfam database (ftp://ftp.sanger.ac.uk/pub/databases/Rfam/CURRENT), which contains sequences for non-coding RNAs such as rRNA, snRNA, snoRNA, tRNA etc. It also contains a few miRNA sequences, which you should remove from it before your analysis. The Rfam.fasta.gz file contains sequences for all organisms, so you may have to grep the sequences specific to your species.
You can also try miRanalyzer. It maps the reads against Rfam and other ncRNA databases, quantifies known microRNAs and predicts new microRNA candidates.
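A plain grep only returns header lines, so here is a minimal sketch that keeps whole FASTA records whose header contains a keyword. It assumes the species name (or some other identifying string) appears in the header line; check the headers of your Rfam.fasta.gz first, since their exact layout is not described here. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta_by_keyword(
    lines: Iterable[str], keyword: str
) -> Iterator[Tuple[str, str]]:
    """Keep records whose header contains keyword (case-insensitive)."""
    needle = keyword.lower()
    for header, seq in read_fasta(lines):
        if needle in header.lower():
            yield header, seq
```

To use it on the compressed file, open it with `gzip.open("Rfam.fasta.gz", "rt")`, pass the handle as `lines`, and write each kept record back out as `>header` followed by the sequence. The same function with an inverted test can drop the miRNA entries.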
There is an online version: http://bioinfo5.ugr.es/miRanalyzer/miRanalyzer.php
and a stand-alone version: http://bioinfo5.ugr.es/miRanalyzer/standalone.html
Hi everyone, this conversation is very helpful for me, because it discusses the same question I have. I have another question, can anyone help me? How can I exclude sequences that are shorter than 18 nt or longer than 30 nt? Are there any tools for this?
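A length filter like this is simple enough to script. Below is a minimal sketch, assuming the reads are in a well-formed FASTQ file with exactly four lines per record; the function names and the default bounds of 18 and 30 are taken from the question or made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ lines (assumes 4 lines per record)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line is not needed
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = 18, max_len: int = 30
) -> Iterator[FastqRecord]:
    """Keep reads with min_len <= length <= max_len (both inclusive)."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual
```

Each kept record can then be written back out as four lines: the header, the sequence, a `+` line and the quality string.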