Question: How to include biological and technical replicates when using miRDeep2 for miRNA prediction?
7.0 years ago by
Jordan1.2k wrote:


I'm planning to analyze mouse RNA-Seq data. I would like to find miRNAs in it, and I realized that one of the tools for that is miRDeep2. I read through the documentation but could not find any information on how to include biological or technical replicates.

For example, to map a single fastq file using mapper.pl I would use the following command: mapper.pl -e sampleA.fastq -p mm9 -t sampleAmapping.arf

But how do I do this for biological replicates? For example, sampleA has a biological replicate called sampleB. How would I include it in the analysis?

Otherwise, I have mapped all the samples using Tophat2 and the mapping file is in bam format. Is there a way to convert it to .arf format?


mirna • 4.4k views

Alignment failed. mapper.pl config.txt -d -e -h -i -j -l 18 -m -p Mus_musculus.GRCm38.74 -s reads.fa -t reads_vs_genome.arf -v -u -n

mapping reads to genome index
# reads processed: 4676885
# reads with at least one reported alignment: 8353 (0.18%)
# reads that failed to align: 4666847 (99.79%)
# reads with alignments suppressed due to -m: 1685 (0.04%)
Reported 14385 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends
Mapping statistics

#desc    total    mapped    unmapped    %mapped    %unmapped
total: 115234760    224420    115010340    0.002    0.998
CN1: 51022439    14690    51007749    0.000    1.000
CN2: 64212321    209730    64002591    0.003    0.997


sequence1.txt CN1
sequence2.txt CN1
sequence3.txt CL1
sequence4.txt CL1

What is a possible cause of this low alignment rate?

I have tried with -k TCGTATGCCGTCTTCTGCTTGT but the alignment doesn't improve. I'm not sure the adapter sequence is correct.
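One way to check whether the adapter is plausible before remapping is to count how many raw reads contain the start of it. The following is only a minimal sketch: the function name `adapter_fraction` and the 8-base prefix length are illustrative assumptions, not part of miRDeep2, and it assumes plain four-line FASTQ records.

```python
import itertools


def adapter_fraction(fastq_lines, adapter, min_prefix=8):
    """Return the fraction of reads that contain the first `min_prefix`
    bases of `adapter` anywhere in the read (hypothetical helper)."""
    prefix = adapter[:min_prefix]
    total = 0
    hits = 0
    # Assumes 4-line FASTQ records: the sequence is the 2nd line of each record.
    for seq in itertools.islice(fastq_lines, 1, None, 4):
        total += 1
        if prefix in seq:
            hits += 1
    return hits / total if total else 0.0
```

Passing an open file handle for one of the raw FASTQ files together with the adapter string gives a rough fraction. If only a tiny share of reads contain the adapter prefix, the adapter given to -k is probably not the one used in library preparation, which would be consistent with almost no reads mapping.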

modified 12 months ago by _r_am32k • written 6.7 years ago by bob0
7.0 years ago by
IV1.3k wrote:

In miRDeep you can create a config file containing multiple samples.

For instance you can make a config.txt containing the following lines (filename and 3 letter code):

wt1.fastq WT1
wt2.fastq WT2
wt3.fastq WT3
ko1.fastq KO1
ko2.fastq KO2

and you could call it as follows (example): mapper.pl config.txt -d -e -h -i -j -l 17 -m -p genome.index -s reads.fa -t reads_vs_genome.arf -v -o 4 -q

This is the "official" approach, and it is definitely the way to go for quantification and for miRNA differential expression (DE).

You could also make a config.txt only with the WTs and another with the KOs. This depends on your study design actually. I'll paste below the relevant passage from the online help.


PS: the relevant text from the miRDeep2 documentation:

The user has sequencing data from different samples e.g. different cell-types. A config.txt file has to be created in which each line designates file locations and a unique 3 letter code. For instance:

sequencing_data_sample1.fa  sd1
sequencing_data_sample2.fa  sd2
sequencing_data_sample3.fa  sd3

The user wishes then to pool these files and use the generated files reads.fa and reads_vs_genome.arf for the miRDeep2 analysis. mapper.pl config.txt -d -c -i -j -l 18 -m -p genome_index -s reads.fa -t reads_vs_genome.arf

Since the reads_vs_genome.arf still contains the 3 letter code for each read mapped to the genome, the user can later on dissect the contribution of the different samples to a predicted or known miRNA. It can also be used, for example, to define 'high confident' predictions if the results are filtered for miRNAs that have sequencing evidence from at least two samples.
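The "evidence from at least two samples" filter described above can be sketched roughly as follows. This is not a miRDeep2 tool: the function name `sample_support` is hypothetical, and the sketch assumes the usual tab-separated .arf layout (read ID in column 1, genome ID in column 6, genome start and end in columns 8 and 9, strand in column 11) and collapsed read IDs of the form `sd1_0_x5`, where `sd1` is the three-letter code and `x5` is the read count.

```python
from collections import Counter


def sample_support(arf_lines, chrom, start, end, strand):
    """Sum read counts per three-letter sample code for reads that map
    entirely inside chrom:start-end on the given strand (hypothetical helper)."""
    support = Counter()
    for line in arf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed or empty lines
        if fields[5] != chrom or fields[10] != strand:
            continue
        g_start, g_end = int(fields[7]), int(fields[8])
        if g_start < start or g_end > end:
            continue
        # Assumed ID format: <code>_<running number>_x<count>
        code, _, rest = fields[0].partition("_")
        count = int(rest.rsplit("_x", 1)[1]) if "_x" in rest else 1
        support[code] += count
    return support
```

A candidate locus could then be called "high confidence" when `len(sample_support(...)) >= 2`, i.e. when reads from at least two sample codes fall inside it.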


Thanks for the suggestions. Can you explain the reason for using the -b option, which is for the qseq.txt format? The files given are in fastq format, aren't they?

written 7.0 years ago by Jordan1.2k

You're absolutely right. Sorry for that.

I pasted the code from an old file and it was not for fastq data. I pasted a corrected version of the call.

written 7.0 years ago by IV1.3k

My config.txt file looks like this:

QC/TRIM/A1_S1_QC_TRIM.fastq S01 
QC/TRIM/A2_S2_QC_TRIM.fastq S02
QC/TRIM/A3_S3_QC_TRIM.fastq S03

I call the mapper like this: mapper.pl config.txt -d -e -h -m -j -v -n -l 18 -p Amel4.5 -s QC/reads3.fa -t QC/rdvdb3.arf

and get this error:

No reads file in fasta format given

I've called it the same way with each file individually and it works fine: mapper.pl QC/TRIM/A1_S1_QC_TRIM.fastq -e -h -m -j -v -n -l 18 -p Amel4.5 -s QC/A1_S1_col_read2.fa -t QC/A1_S1_rdvdb2.arf

I can't understand what is wrong. Any help appreciated.

modified 12 months ago by _r_am32k • written 3.4 years ago by wcjasper0

Solution: The sequence file and the three-letter code need to be tab-separated.
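For anyone who wants to catch this before running the mapper, a small check along these lines may help. It is just a sketch: `bad_config_lines` is a made-up helper, not something shipped with miRDeep2, and it treats anything other than exactly "file, tab, three-character code" as a problem.

```python
def bad_config_lines(lines):
    """Return the 1-based numbers of config lines that are not exactly
    '<reads file><TAB><3-character code>' (hypothetical helper)."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        # Exactly two tab-separated fields, and a code of exactly 3 characters
        # (so a trailing space after the code is also flagged).
        if len(fields) != 2 or len(fields[1]) != 3:
            bad.append(number)
    return bad
```

Run on the lines of config.txt, a space-separated line such as the first one in the config above would be reported, while a properly tab-separated line would pass.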

written 3.4 years ago by wcjasper0

Also, in case anyone runs into this like I did, the sequence file and the three-letter code need to be tab-separated.

written 3.4 years ago by wcjasper0