Read counts from miRDeep2 for downstream DE analysis
4.6 years ago
Emilio Marmol ▴ 140

Hello everyone, I would appreciate some help with my data:

I've been analysing some NGS miRNA data in order to do Differential Expression profiling.

I am using miRDeep2, and I'm having trouble turning its results into a proper table to use as input for DE software such as edgeR or DESeq2.

Such tools require an input table with i rows containing the feature IDs (miRNAs in this case) and j columns containing the read counts for each sample in the groups.

I have four groups with 12 samples each.

When analysing the miRDeep2 results, I have two types of .csv files for each of the four groups to look at:

miRNAs_expressed_all_samples_14_09_2016_t_15_05_38.csv

and

result_14_09_2016_t_15_05_38.csv

and so on for the three remaining groups.

According to the first .csv file, some mature miRNAs can come from different precursors located in different regions of the genome. They are the same miRNA, but the output treats them as different entries: the same name appears in two different rows with different read counts, each row corresponding to a different precursor.

However, if I add the -W option to the quantifier.pl step of the miRDeep2 analysis, the result is quite similar: there are still two rows with the same miRNA name, one per precursor, each with a different read count. The difference is that each value is approximately half of the corresponding value obtained without -W, because -W adds 0.5 to the read count instead of 1 when multimapping is detected, as is the case for identical mature miRNAs coming from precursors located in different regions.
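To make the arithmetic concrete, here is a minimal sketch of how I understand the weighting. It is not miRDeep2 code: the function names and the toy reads are hypothetical, and the 1/n rule for a read mapping to n precursors is my assumption, generalised from the 0.5 case described above.

```python
def unweighted_counts(read_hits):
    """Each read adds 1 to every precursor it maps to (no -W behaviour, as I understand it)."""
    totals = {}
    for hits in read_hits:
        for precursor in hits:
            totals[precursor] = totals.get(precursor, 0.0) + 1.0
    return totals


def weighted_counts(read_hits):
    """Each read adds 1/n to each of the n precursors it maps to (assumed -W behaviour)."""
    totals = {}
    for hits in read_hits:
        weight = 1.0 / len(hits)
        for precursor in hits:
            totals[precursor] = totals.get(precursor, 0.0) + weight
    return totals


# Hypothetical toy data: 4 reads map to both precursors, 2 reads only to the first.
reads = [["ssc-let-7-1", "ssc-let-7-2"]] * 4 + [["ssc-let-7-1"]] * 2

unweighted = unweighted_counts(reads)
weighted = weighted_counts(reads)

print(unweighted)
print(weighted)
```

In this toy case the weighted values add up to the number of reads (6), while the unweighted values add up to 10 because every multimapping read is counted twice.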

So, how should I build a proper read count table for DE analysis? Should I sum both absolute read counts and take that as the final read count? Or should I use the weighted values provided by the -W option and sum them (converting to integer) to obtain the final read count? In both cases, with and without -W, I get paired rows with the same mature miRNA coming from different precursors, so I have two rows with the same name; without -W, however, the total read count is about double the total obtained with -W.

Example:

Without -W option:

miRNA_ID    precursor      read_count_sample_1    read_count_sample_2
ssc-let-7   ssc-let-7-1    1450393                1034593
ssc-let-7   ssc-let-7-2    1634574                1200943


With -W option:

miRNA_ID    precursor      read_count_sample_1    read_count_sample_2
ssc-let-7   ssc-let-7-1    654832.23              570432.21
ssc-let-7   ssc-let-7-2    765342.40              647823.78


What kind of matrix should I build for the DE analysis? Should I treat both rows as a single miRNA, summing the values obtained with the -W option and converting the result to an integer? Or should I take the values obtained without -W and add them into a single ssc-let-7 row, i.e. ssc-let-7 (row 1) + ssc-let-7 (row 2)?

This just happens with some of the microRNAs, not with all of them...
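Either option comes down to collapsing the rows that share a miRNA_ID into one row per miRNA. A minimal sketch of that step, using the example values above, is shown here. The function name and the in-memory row layout are hypothetical (they are not the miRDeep2 file format), and the sketch does not settle which of the two inputs is the right one to use. The rounding step is there because DESeq2 expects integer counts.

```python
from collections import defaultdict


def collapse_by_mirna(rows, n_samples):
    """Sum counts over all rows sharing a miRNA ID and round the totals to integers."""
    totals = defaultdict(lambda: [0.0] * n_samples)
    for mirna_id, _precursor, counts in rows:
        for j, count in enumerate(counts):
            totals[mirna_id][j] += count
    # Round only after summing, so fractional -W values are not truncated row by row.
    return {m: [int(round(v)) for v in vals] for m, vals in totals.items()}


# Rows from the example: (miRNA_ID, precursor, [sample_1, sample_2])
without_w = [
    ("ssc-let-7", "ssc-let-7-1", [1450393, 1034593]),
    ("ssc-let-7", "ssc-let-7-2", [1634574, 1200943]),
]
with_w = [
    ("ssc-let-7", "ssc-let-7-1", [654832.23, 570432.21]),
    ("ssc-let-7", "ssc-let-7-2", [765342.40, 647823.78]),
]

matrix_without_w = collapse_by_mirna(without_w, n_samples=2)
matrix_with_w = collapse_by_mirna(with_w, n_samples=2)

print(matrix_without_w)  # {'ssc-let-7': [3084967, 2235536]}
print(matrix_with_w)     # {'ssc-let-7': [1420175, 1218256]}
```

miRNAs that appear in only one row pass through unchanged, so the same function can be applied to the whole table.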

Any help?

mirdeep2 RNA-Seq DE read counts miRNAs • 2.5k views
3.4 years ago

Hi V82masae,

I am having the same issue,

1) Should I use all_samples.csv or result.csv as the source of read counts?

miRNAs_expressed_all_samples_14_09_2016_t_15_05_38.csv

and

result_14_09_2016_t_15_05_38.csv

2) Did you consider both miRNAs or just one?