Question: How To Include Biological And Technical Replicates While Using Mirdeep2 For Mirna Prediction?
Jordan (Pittsburgh) wrote, 5.8 years ago:

Hi,

I'm planning to analyze mouse RNA-Seq data. I would like to find miRNAs in it, and I realized that one of the tools for that is miRDeep2. I read through the documentation but could not find any information on how to include biological or technical replicates.

For example, to map a single fastq file with mapper.pl I would use the following command:

mapper.pl -e sampleA.fastq -p mm9 -t sampleAmapping.arf

But how do I do it for biological replicates? For example, sampleA has a biological replicate called sampleB. How would I include this in the analysis?

Alternatively, I have already mapped all the samples with TopHat2, and the mappings are in BAM format. Is there a way to convert them to .arf format?

Thanks!

mirna • 3.8k views
modified 4.1 years ago by Biostar • written 5.8 years ago by Jordan

Alignment Failed.

mapper.pl config.txt -d -e -h -i -j -l 18 -m -p Mus_musculus.GRCm38.74 -s reads.fa -t reads_vs_genome.arf -v -u -n

mapping reads to genome index
# reads processed: 4676885
# reads with at least one reported alignment: 8353 (0.18%)
# reads that failed to align: 4666847 (99.79%)
# reads with alignments suppressed due to -m: 1685 (0.04%)
Reported 14385 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends
Mapping statistics

#desc    total        mapped    unmapped     %mapped    %unmapped
total:   115234760    224420    115010340    0.002      0.998
CN1:     51022439     14690     51007749     0.000      1.000
CN2:     64212321     209730    64002591     0.003      0.997

config.txt:

sequence1.txt CN1
sequence2.txt CN1
sequence3.txt CL1
sequence4.txt CL1

What is a possible cause of this low alignment rate?

I have tried with -k TCGTATGCCGTCTTCTGCTTGT, but the alignment doesn't improve. I'm not sure whether the adapter sequence is correct.
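One quick way to check whether an adapter sequence is plausible is to count how many raw reads actually contain the start of it. Below is a minimal sketch of that idea in Python. The function names, the 8-base seed length, and the two example reads are all made up for illustration; nothing here is part of miRDeep2.

```python
def read_fastq_sequences(path, limit=100000):
    """Yield up to `limit` sequences from a plain 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i // 4 >= limit:
                break
            if i % 4 == 1:  # second line of each record is the sequence
                yield line.strip()


def adapter_hit_fraction(reads, adapter, min_overlap=8):
    """Fraction of reads containing the first `min_overlap` bases of `adapter`.

    The seed length is an arbitrary choice for this sketch, not a miRDeep2 setting.
    """
    seed = adapter[:min_overlap].upper()
    reads = [r.upper() for r in reads]
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if seed in r)
    return hits / len(reads)


# Hypothetical reads: one 22-nt insert followed by the adapter, one without it.
example_reads = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCGTCTTCTGCTTGT",
    "ACGTACGTACGTACGTACGTACGTACGTACGTACGT",
]
fraction = adapter_hit_fraction(example_reads, "TCGTATGCCGTCTTCTGCTTGT")
print(f"{fraction:.0%} of reads contain the adapter seed")
```

On real data you would pass `read_fastq_sequences("sequence1.txt")` instead of the example list. If only a small fraction of reads contain the seed, the adapter given to -k is probably not the one used in the library prep.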

modified 10 months ago by h.mon • written 5.4 years ago by bob
IV (USA) wrote, 5.8 years ago:

In miRDeep you can create a config file containing multiple samples.

For instance you can make a config.txt containing the following lines (filename and 3 letter code):

wt1.fastq WT1
wt2.fastq WT2
wt3.fastq WT3
ko1.fastq KO1
ko2.fastq KO2

and you could call mapper.pl as follows (example):

mapper.pl config.txt -d -e -h -i -j -l 17 -m -p genome.index -s reads.fa -t reads_vs_genome.arf -v -o 4 -q

This is the "official" approach, and it is definitely the way to go for quantification and for miRNA differential expression (DE) analysis.
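If you run this for several configs, it can help to build the call in a small script so the options stay documented in one place. This is only a sketch: `build_mapper_command` is a hypothetical helper, the option meanings in the comments are how I read the mapper.pl help text, and the script only prints the command rather than running it.

```python
import shlex


def build_mapper_command(config_path, genome_index, min_length=18, threads=None):
    """Return the mapper.pl argument list for a fastq config file (hypothetical helper)."""
    cmd = [
        "mapper.pl", config_path,
        "-d",                      # input is a config file listing several samples
        "-e",                      # reads are in fastq format
        "-h",                      # parse reads to fasta
        "-i",                      # convert RNA to DNA alphabet
        "-j",                      # drop reads with non-canonical letters
        "-l", str(min_length),     # discard reads shorter than this
        "-m",                      # collapse identical reads
        "-p", genome_index,        # bowtie genome index prefix
        "-s", "reads.fa",          # processed reads output
        "-t", "reads_vs_genome.arf",  # mappings output
        "-v",                      # progress report
    ]
    if threads is not None:
        cmd += ["-o", str(threads)]  # number of bowtie threads
    return cmd


command = build_mapper_command("config.txt", "genome.index", min_length=17, threads=4)
print(shlex.join(command))
```

To actually launch it you could hand `command` to `subprocess.run(command, check=True)`, assuming mapper.pl is on your PATH.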

You could also make a config.txt only with the WTs and another with the KOs. This depends on your study design actually. I'll paste below the relevant passage from the online help.

Cheers,

IV

PS: the relevant text from the miRDeep2 documentation:

The user has sequencing data from different samples e.g. different cell-types. A config.txt file has to be created in which each line designates file locations and a unique 3 letter code. For instance:

sequencing_data_sample1.fa sd1
sequencing_data_sample2.fa sd2
sequencing_data_sample3.fa sd3

The user wishes then to pool these files and use the generated files reads.fa and reads_vs_genome.arf for the miRDeep2 analysis.

mapper.pl config.txt -d -c -i -j -l 18 -m -p genome_index -s reads.fa -t reads_vs_genome.arf

Since the reads_vs_genome.arf still contains the 3 letter code for each read mapped to genome the user can then later on dilute the contribution of the different samples to a predicted or known miRNA. It can also be used for example to define 'high confident' predictions if the results are filtered for miRNAs that have sequencing evidence from at least two samples.
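The "evidence from at least two samples" filter can be sketched roughly as follows. This assumes two things that are not stated in the passage above: that read IDs in the .arf file look like `WT1_0000001_x25` (three-letter code, running number, read count), and that the .arf columns are tab-separated with the genome ID in column 6, the genome start in column 8, and the strand in column 11. Check both against your own file before relying on it. Grouping by exact start position is also a crude stand-in for a real miRNA locus.

```python
from collections import defaultdict
import re

# Assumed read ID layout: <3-letter code>_<running number>_x<read count>
READ_ID = re.compile(r"^(?P<code>[A-Za-z0-9]{3})_\d+_x(?P<count>\d+)$")


def samples_per_locus(arf_lines):
    """Map (genome_id, strand, start) -> {sample_code: summed read count}."""
    support = defaultdict(lambda: defaultdict(int))
    for line in arf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip lines that do not look like ARF records
        match = READ_ID.match(fields[0])
        if not match:
            continue  # read ID does not follow the assumed layout
        locus = (fields[5], fields[10], int(fields[7]))
        support[locus][match["code"]] += int(match["count"])
    return support


# Made-up ARF-like records for illustration only.
example = [
    "WT1_0000001_x25\t22\t1\t22\tnnn\tchr1\t22\t1000\t1021\tnnn\t+\t0\tmmm",
    "WT2_0000007_x10\t22\t1\t22\tnnn\tchr1\t22\t1000\t1021\tnnn\t+\t0\tmmm",
    "KO1_0000003_x4\t21\t1\t21\tnnn\tchr2\t21\t500\t520\tnnn\t-\t0\tmmm",
]
support = samples_per_locus(example)
confident = [locus for locus, by_sample in support.items() if len(by_sample) >= 2]
print(confident)
```

For a real run, pass an open file handle on reads_vs_genome.arf instead of the `example` list.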

modified 5.8 years ago • written 5.8 years ago by IV

Thanks for the suggestions. Can you explain why you used the -b option, which is for qseq.txt format? The files given are in fastq format, aren't they?

written 5.8 years ago by Jordan

You're absolutely right. Sorry for that.

I pasted the code from an old file and it was not for fastq data. I pasted a corrected version of the call.

written 5.8 years ago by IV

Also, in case anyone runs into this like I did: the sequence file name and the three-letter code need to be tab-separated.

written 2.2 years ago by wcjasper
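To avoid typing the config by hand (and accidentally using spaces or a duplicate code), it could be generated with a few lines of Python. A minimal sketch: `write_mapper_config` is a hypothetical helper, and the uniqueness and three-character checks simply follow the "unique 3 letter code" wording quoted from the documentation in the answer above.

```python
import os
import tempfile


def write_mapper_config(samples, path="config.txt"):
    """Write tab-separated '<file>\\t<code>' lines and return them as a list."""
    seen = set()
    lines = []
    for filename, code in samples:
        if len(code) != 3 or not code.isalnum():
            raise ValueError(f"code must be 3 letters/digits: {code!r}")
        if code in seen:
            raise ValueError(f"duplicate sample code: {code!r}")
        seen.add(code)
        lines.append(f"{filename}\t{code}")
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


samples = [
    ("wt1.fastq", "WT1"),
    ("wt2.fastq", "WT2"),
    ("wt3.fastq", "WT3"),
    ("ko1.fastq", "KO1"),
    ("ko2.fastq", "KO2"),
]
# Written to a temporary directory here; use a path next to your fastq files in practice.
config_path = os.path.join(tempfile.mkdtemp(), "config.txt")
written = write_mapper_config(samples, config_path)
print(f"wrote {len(written)} samples to {config_path}")
```

Each replicate gets its own code, so the per-sample contribution stays visible in the pooled output.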