Question

P-value determination for normalized values



7.3 years ago

rrsowmya

Hi,

I am trying to find differentially expressed microRNAs in my dataset. My experiment has 4 treatments with 3 biological replicates each, and I have already normalized my data. Now I want to perform multiple comparisons on 200 selected miRNAs and obtain a p-value and an adjusted p-value for each, so that I can determine which ones are differentially expressed.

1. Which statistical test should I use? I am not sure whether my data are normally distributed and/or have equal variance.
2. Packages like DESeq ask for raw reads, but that is not useful for me, since the raw reads include junk sRNA sequences that dilute my p-values significantly.
3. Is there a simple way to do this in Excel or R, or with a formula?

Thank you!

RNA-Seq

link updated 7.3 years ago by i.sudbery • written 7.3 years ago by rrsowmya

score 2 · Answer 1 · 2016-12-30

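Your normalised data is not normally distributed, and it definitely does not have equal variance: for read counts the variance grows with the mean, and normalisation does not remove that dependence.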
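To see why equal variance fails, consider the negative-binomial model that DESeq2 and edgeR assume, in which a count with mean μ has variance μ + αμ². The sketch below uses a made-up dispersion α = 0.1 purely for illustration. It shows the variance rising by orders of magnitude with the mean, and that dividing by a size factor only rescales it. So if a treatment shifts a miRNA's mean, its variance shifts too, and an equal-variance test such as a standard t-test or ANOVA is mis-specified.

```python
# Mean-variance relationship of negative-binomial (NB) counts, the model used by
# DESeq2 and edgeR. ALPHA is a hypothetical dispersion, chosen only for illustration.
ALPHA = 0.1


def nb_variance(mu: float, alpha: float = ALPHA) -> float:
    """Variance of an NB count with mean mu and dispersion alpha: mu + alpha * mu**2."""
    return mu + alpha * mu ** 2


def normalized_variance(mu: float, size_factor: float, alpha: float = ALPHA) -> float:
    """Variance of the same count after dividing it by a library size factor.

    Dividing a random variable by a constant s divides its variance by s**2,
    so the dependence on mu is rescaled, not removed.
    """
    return nb_variance(mu, alpha) / size_factor ** 2


if __name__ == "__main__":
    # Raw means of a lowly, moderately and highly expressed miRNA.
    for mu in (10, 100, 1000):
        print(f"mean={mu:>5}  raw var={nb_variance(mu):>9.1f}  "
              f"normalised var (s=2)={normalized_variance(mu, 2.0):>9.1f}")
```

The real dispersion differs per miRNA and is estimated from your replicates; the fixed value here only makes the shape of the relationship visible.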

The best approach for miRNA-seq is definitely one of the count-based packages (DESeq2/edgeR/voom). What these packages take as input is the read count for each miRNA, rather than the raw reads per se. They will not work if you feed them pre-normalised data; the input must be raw counts. So you start with a table containing a row for each miRNA and a column for each sample. If you are unsure how to produce such a table, the Sequence Imp pipeline can perform the necessary steps, including (importantly for miRNAs) combining counts from multiple genomic copies of the same miRNA.
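A minimal sketch of the count-table step, assuming you already have per-sample read counts for each genomic locus and that each read was assigned to exactly one locus (so summing loci cannot double count). The locus-to-mature mapping, sample names and counts are hypothetical toy values, not output of Sequence Imp. Loci missing from the mapping are kept under their own names rather than dropped, so the whole table can go to the count-based package.

```python
from collections import defaultdict

# Hypothetical locus-to-mature mapping for illustration: several genomic copies
# (hsa-mir-9-1/-2/-3) yield the same mature miRNA, so their counts are summed.
LOCUS_TO_MATURE = {
    "hsa-mir-9-1": "hsa-miR-9-5p",
    "hsa-mir-9-2": "hsa-miR-9-5p",
    "hsa-mir-9-3": "hsa-miR-9-5p",
    "hsa-mir-21": "hsa-miR-21-5p",
}


def build_count_table(per_sample_locus_counts):
    """Collapse {sample: {locus: raw count}} into {feature: {sample: count}}.

    Assumes each read was assigned to exactly one locus, so summing loci
    does not double count. Loci absent from LOCUS_TO_MATURE keep their own
    name, so nothing is silently dropped from the table.
    """
    samples = list(per_sample_locus_counts)
    table = defaultdict(lambda: {s: 0 for s in samples})
    for sample, locus_counts in per_sample_locus_counts.items():
        for locus, count in locus_counts.items():
            # Count-based packages need raw integer counts, not normalised values.
            if not isinstance(count, int) or count < 0:
                raise ValueError(f"{sample}/{locus}: expected a raw count, got {count!r}")
            feature = LOCUS_TO_MATURE.get(locus, locus)
            table[feature][sample] += count
    return dict(table), samples


if __name__ == "__main__":
    # Toy counts, made up for illustration.
    toy = {
        "ctrl_1": {"hsa-mir-9-1": 40, "hsa-mir-9-2": 12, "hsa-mir-21": 300},
        "trt_1": {"hsa-mir-9-3": 5, "hsa-mir-21": 410, "snoRNA-x": 7},
    }
    table, samples = build_count_table(toy)
    print("feature", *samples, sep="\t")
    for feature, row in sorted(table.items()):
        print(feature, *(row[s] for s in samples), sep="\t")
```

The printed tab-separated table (features as rows, samples as columns) has the shape DESeq2 and edgeR expect for a count matrix; the integer check rejects already-normalised values early.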

If you have reads aligning to genomic features you are not interested in (like sRNA, or unimportant miRNAs), you could exclude those rows from the table. However, excluding things from the count table can mess up the normalisation, because DESeq estimates each sample's size factor from all the rows in the table. So what I would recommend is performing the DESeq analysis on the whole table, and then subsetting the features you are interested in. To avoid wasting statistical power (i.e. diluting your p-values), you can easily redo the FDR adjustment on just the rows you are interested in, as long as you don't so much as peek at the p-values from the rows you are not interested in.
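Both points can be illustrated with a small sketch. `size_factors` re-implements the median-of-ratios idea DESeq uses (geometric mean per row, then the per-sample median of count/geometric-mean ratios, skipping rows that contain a zero); it is an illustration, not DESeq2's code. On the toy table, dropping one row changes the size factors. `benjamini_hochberg` computes the BH adjustment (the same values as R's `p.adjust(p, method = "BH")`), and `readjust_subset` applies it only to a list of miRNAs fixed in advance. All names and numbers here are hypothetical. In practice you would feed it the unadjusted `pvalue` column from DESeq2's `results()`, not `padj`.

```python
import math
import statistics


def size_factors(table):
    """Median-of-ratios size factors, one per sample (DESeq-style, for illustration).

    table: list of rows (features), each a list of raw counts, one per sample.
    Rows containing a zero are skipped, since their geometric mean is zero.
    """
    n_samples = len(table[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in table:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, count in enumerate(row):
            log_ratios[j].append(math.log(count) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def readjust_subset(pvalues, of_interest):
    """Redo BH on a pre-chosen subset; features with a missing (None) p-value are skipped."""
    kept = [name for name in of_interest if pvalues.get(name) is not None]
    return dict(zip(kept, benjamini_hochberg([pvalues[name] for name in kept])))


if __name__ == "__main__":
    # Toy counts: rows are features, columns are two samples.
    full_table = [[10, 20], [30, 30], [40, 10]]
    print("size factors, all rows:     ", size_factors(full_table))
    print("size factors, last row gone:", size_factors(full_table[:2]))

    # Toy unadjusted p-values for the whole table.
    pvalues = {"miR-a": 0.01, "miR-b": 0.04, "miR-c": 0.03, "miR-d": 0.20,
               "junk-1": 0.50, "junk-2": 0.90}
    # The subset must be fixed before looking at any p-values.
    of_interest = ["miR-a", "miR-b", "miR-c", "miR-d"]
    full_padj = dict(zip(pvalues, benjamini_hochberg(list(pvalues.values()))))
    subset_padj = readjust_subset(pvalues, of_interest)
    for name in of_interest:
        print(f"{name}: padj on all rows = {full_padj[name]:.3f}, on subset = {subset_padj[name]:.3f}")
```

Adjusting over fewer tests gives smaller adjusted p-values (miR-a goes from 0.06 to 0.04 in the toy data), which is the gain described above; it is only valid because the subset is chosen without reference to the p-values.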