Question: Read Counts from miRDeep2 for posterior DE analysis
v82masae wrote (2.9 years ago):

Hello everyone, I would appreciate some help with my data:

I've been analysing NGS miRNA data in order to do differential expression (DE) profiling.

I'm using miRDeep2, and I'm having trouble turning its results into a proper table to use as input for DE software such as edgeR or DESeq2.

Such tools require an input table with i rows, one per target ID (miRNAs in this case), and j columns holding the read counts for each sample within the groups.
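To make the expected layout concrete, here is a minimal sketch of such a table built in plain Python. The function name `build_count_matrix`, the sample names, and the counts are all made up for illustration, and filling a missing miRNA with 0 is an assumption, not something miRDeep2 or the DE tools prescribe.

```python
def build_count_matrix(per_sample_counts):
    """Return a table (list of rows): one row per miRNA ID, one column per sample.

    per_sample_counts maps a sample name to a {miRNA_ID: read_count} dict.
    """
    samples = list(per_sample_counts)
    # Union of all miRNA IDs seen in any sample, sorted for a stable row order.
    mirna_ids = sorted({m for counts in per_sample_counts.values() for m in counts})
    header = ["miRNA_ID"] + samples
    # Assumption: a miRNA absent from a sample is recorded as 0 reads.
    rows = [[m] + [per_sample_counts[s].get(m, 0) for s in samples] for m in mirna_ids]
    return [header] + rows


# Hypothetical toy input, not real data.
toy_counts = {
    "sample_1": {"miR-a": 10, "miR-b": 3},
    "sample_2": {"miR-a": 7},
}
matrix = build_count_matrix(toy_counts)
for row in matrix:
    print("\t".join(str(cell) for cell in row))
```

Written out as a tab-separated file, a table of this shape could then be read into R for edgeR or DESeq2.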

I have four groups with 12 samples each.

When analysing the miRDeep2 results, I have two types of .csv files to look at for each of the four groups:

miRNAs_expressed_all_samples_14_09_2016_t_15_05_38.csv

and

result_14_09_2016_t_15_05_38.csv

and so on for the three remaining groups.

According to the first .csv file, some miRNAs can come from different precursors located in different regions of the genome. The mature miRNA is the same, but the output lists it as separate entries: two rows with the same name and different read counts, one for each precursor.

If I add the -W option to the quantifier.pl step of the miRDeep2 analysis, the results are quite similar: there are still two rows with the same miRNA name, one per precursor, each with a different read count. However, each value is approximately half of the corresponding value obtained without -W. This is because -W adds 0.5 to a read count instead of 1 when multimapping is detected, as is the case for identical mature miRNAs coming from different precursors located in different regions.
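The arithmetic behind that halving can be sketched as follows. This is only a toy model, not miRDeep2's actual code: it assumes each read adds 1 to every precursor it maps to without -W, and 1 divided by its number of mapping locations with -W. The function `count_reads` and the precursor names are hypothetical.

```python
from collections import defaultdict


def count_reads(read_hits, weighted):
    """Tally reads per precursor.

    read_hits is a list with one entry per read: the precursors it maps to.
    If weighted is True, each hit adds 1/len(hits) (the assumed -W behaviour);
    otherwise each hit adds a full 1.
    """
    counts = defaultdict(float)
    for precursors in read_hits:
        increment = 1.0 / len(precursors) if weighted else 1.0
        for precursor in precursors:
            counts[precursor] += increment
    return dict(counts)


# Toy reads: two map to both precursors, one maps only to pre-1.
read_hits = [["pre-1", "pre-2"], ["pre-1", "pre-2"], ["pre-1"]]
unweighted = count_reads(read_hits, weighted=False)
weighted = count_reads(read_hits, weighted=True)
print(unweighted)
print(weighted)
```

In this toy model the unweighted counts add up to 5 for only 3 reads, while the weighted counts add up to 3, which matches the roughly twofold difference described here.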

So, to build a proper read count table for DE analysis, should I sum both absolute read counts and take that as the final read count? Or should I sum the divided values provided by the -W option (converting the result to an integer) to obtain the final read count? In both cases, with and without -W, I get paired rows for the same mature miRNA coming from different precursors, so I have two rows with the same name. Without -W, though, the total read count is roughly double the total obtained with -W.

Example:

Without -W option:

miRNA_ID    precursor      read_count_sample_1    read_count_sample_2
ssc-let-7   ssc-let-7-1    1450393                1034593
ssc-let-7   ssc-let-7-2    1634574                1200943

With -W option:

miRNA_ID    precursor      read_count_sample_1    read_count_sample_2
ssc-let-7   ssc-let-7-1    654832.23              570432.21
ssc-let-7   ssc-let-7-2    765342.40              647823.78

For DE analysis, what kind of matrix should I build? Should I treat both rows as a single miRNA, summing the -W values and converting the sum to an integer? Or should I take the values obtained without -W and add them, ssc-let-7 (row 1) + ssc-let-7 (row 2), to form a single ssc-let-7 row?

This just happens with some of the microRNAs, not with all of them...
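To show what the two options would produce, here is a small sketch that collapses rows sharing a miRNA ID by summing their per-sample counts and rounding only after the sum. The helper `collapse_by_mature_id` is a hypothetical name, the input rows are the example values above, and the sketch does not say which option is appropriate for DE analysis.

```python
def collapse_by_mature_id(rows):
    """Sum per-sample counts over all precursor rows that share a miRNA ID.

    rows holds tuples of (miRNA_ID, precursor, count_sample_1, count_sample_2, ...).
    Rounding to an integer happens once, after summing.
    """
    totals = {}
    for mirna_id, _precursor, *counts in rows:
        if mirna_id not in totals:
            totals[mirna_id] = [0.0] * len(counts)
        for i, value in enumerate(counts):
            totals[mirna_id][i] += value
    return {mirna_id: [round(v) for v in sums] for mirna_id, sums in totals.items()}


# Example rows without -W.
unweighted_rows = [
    ("ssc-let-7", "ssc-let-7-1", 1450393, 1034593),
    ("ssc-let-7", "ssc-let-7-2", 1634574, 1200943),
]
# Example rows with -W.
weighted_rows = [
    ("ssc-let-7", "ssc-let-7-1", 654832.23, 570432.21),
    ("ssc-let-7", "ssc-let-7-2", 765342.40, 647823.78),
]

summed_unweighted = collapse_by_mature_id(unweighted_rows)
summed_weighted = collapse_by_mature_id(weighted_rows)
print(summed_unweighted)  # {'ssc-let-7': [3084967, 2235536]}
print(summed_weighted)    # {'ssc-let-7': [1420175, 1218256]}
```

The unweighted sum comes out at roughly twice the weighted sum, so the choice between them changes the counts that the DE software would see for this miRNA.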

Any help?

bioinforesearchquestions (United States) wrote, 21 months ago:

Hi V82masae,

I am having the same issue,

1) Should I take the read counts from all_samples.csv or from result.csv?

miRNAs_expressed_all_samples_14_09_2016_t_15_05_38.csv

and

result_14_09_2016_t_15_05_38.csv

2) Did you consider both miRNAs or just one?
