Question: p-value determination for normalized values
rrsowmya (United States) wrote, 3.9 years ago:

Hi,

I am trying to find differentially expressed microRNAs in my dataset. My experiment has 4 treatments with 3 biological replicates each, and I have already normalized my data. Now I want to perform multiple comparisons on a selected set of 200 miRNAs and obtain a p-value and an adjusted p-value for each, so that I can determine which ones are differentially expressed.

1. Which statistical test should I use? I am not sure whether my data is normally distributed and/or has equal variance.
2. Packages like DESeq ask for raw reads, but that is not useful for me, since the raw reads include junk sRNA sequences that dilute my p-values significantly.
3. Is there a simple way to do this in Excel or R, or with a formula?

Thank you!

i.sudbery (Sheffield, UK) wrote, 3.9 years ago:

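Your normalised data is not normally distributed and definitely does not have equal variance: with sequencing counts, the variance grows with the expression level, so highly expressed miRNAs vary much more in absolute terms than lowly expressed ones.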

The best approach for miRNA-seq is definitely one of the count-based packages (DESeq2/edgeR/voom). These packages take read counts for each miRNA as input, rather than the raw reads per se. They will not work if you feed them pre-normalised data: the input must be raw counts. So you start with a table containing a row for each miRNA and a column for each sample. If you are unsure how to produce such a table, the Sequence Imp pipeline can perform the necessary steps, including (importantly for miRNAs) combining counts from multiple genomic copies of the same miRNA.
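To make the "one row per miRNA, one column per sample" table concrete, here is a minimal Python sketch of the counting step. It assumes two hypothetical inputs that are not the output format of Sequence Imp or of any particular aligner: for each sample, the genomic loci each read aligned to, and a mapping from locus to mature miRNA name. Each read is counted once per miRNA name, so a read that aligns to several genomic copies of the same miRNA is not double-counted; as a further assumption, reads hitting several *different* miRNAs are simply skipped.

```python
from collections import Counter


def count_table(sample_alignments, locus_to_mirna):
    """Build a raw-count table: rows = miRNAs, columns = samples.

    sample_alignments: {sample: {read_id: [locus, ...]}}  (hypothetical input)
    locus_to_mirna:    {locus: mature miRNA name}          (hypothetical input)
    """
    table = {}
    for sample, reads in sample_alignments.items():
        counts = Counter()
        for loci in reads.values():
            # Collapse genomic copies: map every hit to its miRNA name.
            names = {locus_to_mirna[locus] for locus in loci if locus in locus_to_mirna}
            # Count the read once if all its hits are copies of one miRNA;
            # skip reads that hit no miRNA or several different miRNAs.
            if len(names) == 1:
                counts[names.pop()] += 1
        table[sample] = counts
    samples = list(sample_alignments)
    mirnas = sorted({name for counts in table.values() for name in counts})
    # Counter returns 0 for absent keys, so missing entries become zero counts.
    rows = [[table[s][m] for s in samples] for m in mirnas]
    return mirnas, samples, rows


if __name__ == "__main__":
    loci = {"chr1:100": "miR-1", "chr5:200": "miR-1", "chr2:50": "miR-2"}
    alignments = {
        "S1": {"r1": ["chr1:100", "chr5:200"], "r2": ["chr2:50"], "r3": ["chrX:1"]},
        "S2": {"r1": ["chr5:200"], "r2": ["chr1:100", "chr2:50"]},
    }
    print(count_table(alignments, loci)[2])  # -> [[1, 1], [1, 0]]
```

The resulting integer matrix (untouched by any normalisation) is the kind of input the count-based packages expect.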

If you have reads aligning to genomic features you are not interested in (like other sRNAs, or unimportant miRNAs), you could exclude those rows from the table. However, excluding rows from the count table can distort the normalisation, so what I would recommend is performing the DESeq analysis on the whole table and then subsetting to the features you are interested in. To avoid wasting statistical power (i.e. diluting your p-values), you can easily redo the FDR adjustment on just the rows you are interested in, as long as you don't so much as peek at the p-values from the rows you are not interested in.
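The re-adjustment step amounts to running the Benjamini-Hochberg procedure (the default `pAdjustMethod` in DESeq2's `results()`; in R, `p.adjust(p, method = "BH")` does the same thing) on the unadjusted p-values of the pre-chosen rows only. Below is a minimal Python sketch, assuming you have exported DESeq2's `pvalue` column with row names into a dict; the function names are illustrative, not part of any package.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (same as R's p.adjust(method="BH"))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def readjust_subset(pvalues_by_row, rows_of_interest):
    """Re-run BH on a subset of rows chosen *before* looking at any p-values.

    Rows whose p-value is missing (None, e.g. NA in DESeq2's output) are
    left as None and do not count towards the number of tests.
    """
    rows = [r for r in rows_of_interest if pvalues_by_row.get(r) is not None]
    adjusted = bh_adjust([pvalues_by_row[r] for r in rows])
    result = {r: None for r in rows_of_interest}
    result.update(zip(rows, adjusted))
    return result


if __name__ == "__main__":
    pvals = {"miR-1": 0.01, "miR-2": 0.04, "miR-3": 0.03, "miR-4": 0.20,
             "snoRNA-x": 0.5, "miR-5": None}
    subset = ["miR-1", "miR-2", "miR-3", "miR-4", "miR-5"]
    res = readjust_subset(pvals, subset)
    print({k: None if v is None else round(v, 4) for k, v in res.items()})
    # -> {'miR-1': 0.04, 'miR-2': 0.0533, 'miR-3': 0.0533, 'miR-4': 0.2, 'miR-5': None}
```

The gain comes from the smaller number of tests `m`: the same raw p-value is multiplied by a smaller factor when only the pre-chosen rows are adjusted.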

rrsowmya replied:

Yes, I suppose I could use the p-values estimated by DESeq2 and then do an FDR correction on the subset of sRNAs that are miRNAs. Thanks for the suggestion!

Powered by Biostar version 2.3.0