Help me confirm whether my reads are properly normalized and transformed, or whether I need to redo this
4 months ago
Faith ▴ 40

Hello, can you help me confirm whether or not my reads dataset was normalized properly prior to my work on it? The data source is 40 Arabidopsis florets, with 3 replicates at each of 8 stages of development (columns = stages; each stage has 3 replicates, i.e. 8 × 3 = 24 samples). They were normalized and filtered on the expectation that floral reads should follow the stages of development, whether increasing or decreasing.

Research goal: to see whether there are genes that regulate each other, using WGCNA.
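For context, the downstream analysis I am planning looks roughly like this. This is only a minimal sketch, not code I have run; datExpr and the soft-thresholding power are placeholders I still have to choose properly.

    library(WGCNA)

    # samples in rows, genes in columns (transpose of my expression matrix)
    datExpr <- t(normalized_log2_matrix)

    # choose a soft-thresholding power from the scale-free topology fit
    sft <- pickSoftThreshold(datExpr, powerVector = c(1:10, seq(12, 20, 2)))

    # build signed co-expression modules (power = 12 is only a placeholder)
    net <- blockwiseModules(datExpr, power = 12,
                            TOMType = "signed", networkType = "signed",
                            minModuleSize = 30, numericLabels = TRUE)
    table(net$colors)  # module sizes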

Here is a very short summary of the steps used to transform and normalize the data. The total was ~29K genes; it is now ~19K DEGs, normalized on the log2 scale.

I made heatmaps on subsets of these data; the Euclidean distances between stages and samples are perfectly ordered, and the data follow a sensible decrease/increase across the stages of development. The data don't contain any outliers either. The sketch below shows roughly what this QC looked like.
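For reference, the distance heatmap was along these lines (a sketch; pheatmap and the object name logmat, my normalized log2 matrix with samples in columns, are assumptions on my part):

    library(pheatmap)

    # Euclidean distances between samples on the normalized log2 values
    sample_dist <- dist(t(logmat))
    pheatmap(as.matrix(sample_dist))  # stages/replicates should cluster in order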

  1. R function filterByExpr (edgeR) to keep genes with about 10 read counts or more in a minimum number of samples.
  2. rowSums(cpm > 1) >= 8 to keep genes whose CPM exceeds 1 (i.e. log-CPM > 0) in at least 8 samples.
  3. TMM (trimmed mean of M-values) scaling to correct for composition effects that would otherwise mask biological differences.
  4. PCA and boxplots were made to visualize changes after every step.
  5. splines <- ns(as.numeric(as.factor(samplenames$tissue)), df = 3), i.e. all genes were fitted against a natural cubic spline basis (df = 3) over the stage ordering and filtered on that fit.
  6. Filtered genes were selected at adjusted p = 0.001 (before normalization), with DE contrasts calculated for all possible one-to-one combinations of the stages of development.
  7. Using limma, a contrast of every stage of development against every other stage (all possible pairwise contrasts); these were used for the next step and also to get the DE genes.
  8. Normalization step: voom(x, design, plot = TRUE), giving voom-normalized (log2-CPM) values. A sketch of the whole pipeline follows this list.
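To make the steps concrete, here is a minimal sketch of the pipeline as I understand it, using edgeR/limma; counts and samplenames are placeholders for the raw count matrix and the sample sheet, and I have not re-run this end to end:

    library(edgeR)
    library(limma)
    library(splines)

    x <- DGEList(counts = counts)
    x <- x[filterByExpr(x, group = samplenames$tissue), ,
           keep.lib.sizes = FALSE]                        # step 1
    x <- x[rowSums(cpm(x) > 1) >= 8, ]                    # step 2: CPM > 1 in >= 8 samples
    x <- calcNormFactors(x, method = "TMM")               # step 3: TMM scaling

    # step 5: natural cubic spline basis over the stage ordering
    spl <- ns(as.numeric(as.factor(samplenames$tissue)), df = 3)

    # steps 6-8: one coefficient per stage, voom, then all pairwise contrasts
    stage  <- factor(samplenames$tissue)
    design <- model.matrix(~ 0 + stage)
    colnames(design) <- levels(stage)

    v    <- voom(x, design, plot = TRUE)                  # step 8: log2-CPM + weights
    fit  <- lmFit(v, design)
    prs  <- combn(levels(stage), 2,
                  FUN = function(p) paste(p[2], p[1], sep = "-"))
    cm   <- makeContrasts(contrasts = prs, levels = design)
    fit2 <- eBayes(contrasts.fit(fit, cm))
    degs <- topTable(fit2, number = Inf, p.value = 0.001) # step 6: adjusted-p cutoff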

Questions

  1. Are these steps sufficient for data transformation and normalization with the end goal listed above in mind?

  2. Is log2 good, or should I switch to rlog?

  3. If the problem is the log scale, is it better to repeat the whole pipeline from raw counts, apply the new log scale on top of the currently logged data, or unlog the data and then re-log on the new scale? (The first sketch after this list shows what I mean.)

  4. When doing correlations, like Pearson, should I take the reported p-values, apply FDR correction, and use them to filter the correlations, or do researchers not do that? (See the second sketch after this list.)
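To make question 3 concrete, this is what I mean by the last two options (a sketch; v$E is the voom log2-CPM matrix from the pipeline sketch above, and rlog is DESeq2's transform, which as far as I know needs raw counts, hence my uncertainty):

    # repeating from raw counts with a different transform (rlog works on counts)
    library(DESeq2)
    dds <- DESeqDataSetFromMatrix(countData = counts,
                                  colData = samplenames,
                                  design = ~ tissue)
    rld <- rlog(dds)

    # "unlog then re-log" the current matrix on another scale
    cpm_vals <- 2^v$E            # back to (approximate) CPM
    relogged <- log1p(cpm_vals)  # e.g. natural log(1 + CPM) instead of log2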
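And for question 4, the filtering I had in mind looks like this (a sketch on a small hypothetical subset; cor.test gives the Pearson p-values, which I would then BH-adjust):

    genes <- v$E[1:100, ]                    # hypothetical subset of genes
    prs   <- combn(rownames(genes), 2)       # all gene pairs
    pvals <- apply(prs, 2, function(p)
      cor.test(genes[p[1], ], genes[p[2], ])$p.value)
    fdr   <- p.adjust(pvals, method = "BH")  # FDR across all pairs
    keep  <- fdr < 0.05                      # filter correlations this way?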

That is all I have and am worried about for now. I understand it is quite a lot, but this is for my MSc work and I just wanted to confirm that what my supervisor did prior to my work on the data is enough for me to continue. I had a discussion with them and they said I can double-check with professionals I trust.

Data snippet

Gene_ID     A_1         A_2         A_3         B_1         B_2         B_3         C_1         C_2         C_3         D_1
AT1G30580   7.602072096 7.679844728 7.652590863 7.702691747 7.700970816 7.617229362 7.691860453 7.683066902 7.603116476 7.636158315

WGCNA normalization RNA-seq