Question

How to normalize the effect between groups in bulk RNA-seq?

0

4.0 years ago

bioinforesearchquestions ▴ 370

Hi guys,

I have a bulk RNA-seq dataset with 24 samples in 4 groups/conditions:

Group/Condition1: with tumor and treated with a drug (T-D-P1 and T-D-P2; two populations, each with three replicates)
Group/Condition2: with tumor and no treatment (T-S-P1 and T-S-P2; two populations, each with three replicates)
Group/Condition3: no tumor but treated with a drug (WT-D-P1 and WT-D-P2; two populations, each with three replicates)
Group/Condition4: no tumor and no treatment (WT-P1 and WT-P2; two populations, each with three replicates)

We are interested in normalizing out the effect of the drug. I am therefore looking for a way to "subtract" the known gene expression signature between T-D-P1 and WT-D-P1, and similarly between T-D-P2 and WT-D-P2.

Then I am interested in comparing

the normalized (T-D-P1 minus WT-D-P1) versus T-P1
the normalized (T-D-P2 minus WT-D-P2) versus T-P2
(normalized (T-D-P1 minus WT-D-P1) versus T-P1) versus (normalized (T-D-P2 minus WT-D-P2) versus T-P2)

RNA-Seq Normalize Drug treatment • 1.5k views

4.0 years ago by bioinforesearchquestions ▴ 370

1

Please consider validating your past questions by upvoting comments/accepting answers. This is an appropriate way to show appreciation for the help that you receive here.

4.0 years ago by GenoMax 141k

0

I posted this question first and only then checked my previous queries. Sorry, GenoMax. You are all very generous in giving your valuable time to help us with our queries.

4.0 years ago by bioinforesearchquestions ▴ 370

0

Unless original posters accept answers, it becomes difficult to judge whether the answers provided actually helped solve the problem (mods can accept answers, but we prefer not to, since that is best done by the person who asked the original question). Accepting also gives future visitors to the forum confidence that the answer is correct.

You have many past questions that have answers that fall in this category. Would be great if you could accept valid answers (green check mark) when you find some time.

4.0 years ago by GenoMax 141k

0

Yes, in some of the posts I didn't accept the answers because I thought they didn't completely address my query. From now on I will make sure to do so. Thanks, GenoMax.

4.0 years ago by bioinforesearchquestions ▴ 370

0

You could add a note to that effect (what was not addressed). That is information future visitors would find useful. You don't have to accept the answer if it does not, to a large extent, do what you needed.

4.0 years ago by GenoMax 141k

0

Sure, I will update my posts.

4.0 years ago by bioinforesearchquestions ▴ 370

score 1 · Answer 1 · 2020-04-14

1

4.0 years ago

ATpoint 81k

Sounds like you need a design of Treatment + Tumor + Treatment:Tumor and then have to use the respective contrasts to get the comparisons you want. There is plenty of example code for this in the manuals of e.g. DESeq2 or edgeR. A design without an intercept is probably what you want in order to have full flexibility over the contrasts.
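
As a rough illustration of the two designs mentioned above, here is a minimal base R sketch on a toy sample sheet. The sample sheet, the group labels (T_D, T_S, WT_D, WT_S) and the noise-free toy values are assumptions made up for this example, not the poster's data, and populations are left out to keep it short. It shows that the interaction coefficient of Treatment + Tumor + Treatment:Tumor equals the "difference of differences" contrast in a no-intercept group design.

```r
# Hypothetical sample sheet: 2 tumor states x 2 treatments x 3 replicates
sampleInfo <- expand.grid(
  Replicate = 1:3,
  Treatment = c("Untreated", "Treated"),
  Tumor     = c("No", "Yes")
)
# Set reference levels explicitly so the coefficient names are predictable
sampleInfo$Treatment <- factor(sampleInfo$Treatment, levels = c("Untreated", "Treated"))
sampleInfo$Tumor     <- factor(sampleInfo$Tumor, levels = c("No", "Yes"))

# Design 1: main effects plus interaction (with intercept)
X_int <- model.matrix(~ Treatment + Tumor + Treatment:Tumor, data = sampleInfo)

# Design 2: one coefficient per group, no intercept
sampleInfo$Group <- factor(paste0(
  ifelse(sampleInfo$Tumor == "Yes", "T", "WT"), "_",
  ifelse(sampleInfo$Treatment == "Treated", "D", "S")
))
X_grp <- model.matrix(~ 0 + Group, data = sampleInfo)
colnames(X_grp) <- levels(sampleInfo$Group)  # columns follow the level order

# Contrast (T_D - WT_D) - (T_S - WT_S): drug effect in tumor minus drug effect in WT
contrast <- setNames(numeric(ncol(X_grp)), colnames(X_grp))
contrast[c("T_D", "WT_D", "T_S", "WT_S")] <- c(1, -1, -1, 1)

# Toy noise-free log-expression for one gene with a known interaction of 0.5
y <- 5 +
  1   * (sampleInfo$Treatment == "Treated") +
  2   * (sampleInfo$Tumor == "Yes") +
  0.5 * (sampleInfo$Treatment == "Treated" & sampleInfo$Tumor == "Yes")

fit_int <- lm.fit(X_int, y)
fit_grp <- lm.fit(X_grp, y)

interaction_from_int <- unname(fit_int$coefficients["TreatmentTreated:TumorYes"])
interaction_from_grp <- sum(contrast * fit_grp$coefficients[names(contrast)])

print(colnames(X_int))
print(c(interaction_from_int, interaction_from_grp))
```

Both numbers recover the interaction built into the toy data. In DESeq2 the same idea would apply to the coefficients listed by resultsNames(), with the comparison passed through the contrast argument of results(); which contrast matches a given biological question still has to be worked out for the real design.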

4.0 years ago by ATpoint 81k

0

My collaborator gave me a list of 40 genes and asked me to do the comparisons on those genes. I told him that I would run the analysis on the entire gene list and then filter for those 40 genes. Is there a way to run it on only those 40 genes?

For instance, some genes are highly expressed in both Tumor-Drug-Condition1-Replicate1-2-3 and WT-Drug-Condition1-Replicate1-2-3, so he suspects those genes are influenced by the drug. Therefore we need to normalize between the groups before the comparison.

4.0 years ago by bioinforesearchquestions ▴ 370

1

You are right: do the standard analysis with all genes and then filter the results table for the genes you want. The tools share information across genes for the internal statistics and normalization, so using only 40 genes would most likely produce nonsense results. Normalization is the first step of a DE analysis, but it is handled by the default normalization strategies of e.g. DESeq2 and edgeR. What you are referring to are changes in library composition. Here is a nice video that explains how DESeq2 accounts for this during normalization. There is also a video on edgeR normalization on the same YouTube channel.

DESeq2 normalization
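
To make the library-composition point concrete, here is a small base R sketch of median-of-ratios size factors, the idea behind DESeq2's default normalization. It is a simplified re-implementation for illustration only, not DESeq2's own code, and the count matrix, gene names and the function name median_of_ratios are all made up. Sample B is sequenced twice as deep as sample A, and gene4 is strongly induced in B.

```r
# Toy counts (assumed values): rows are genes, columns are samples
counts <- matrix(
  c(10, 20, 30,    5, 0,    # sample A
    20, 40, 60, 1000, 3),   # sample B: 2x depth, gene4 strongly induced
  ncol = 2,
  dimnames = list(paste0("gene", 1:5), c("A", "B"))
)

# Median-of-ratios size factors (simplified sketch of the DESeq2 approach)
median_of_ratios <- function(counts) {
  log_counts    <- log(counts)
  log_geo_means <- rowMeans(log_counts)   # -Inf if a gene has a zero in any sample
  keep          <- is.finite(log_geo_means)
  apply(log_counts[keep, , drop = FALSE], 2,
        function(s) exp(median(s - log_geo_means[keep])))
}

size_factors <- median_of_ratios(counts)

# Naive alternative: scale by total counts, which gene4 dominates in sample B
total_count_ratio <- unname(colSums(counts)["B"] / colSums(counts)["A"])

# Normalize using ALL genes, then subset to the genes of interest afterwards
normalized        <- sweep(counts, 2, size_factors, "/")
genes_of_interest <- c("gene1", "gene4")  # stand-in for the list of 40 genes

print(size_factors)
print(total_count_ratio)
print(normalized[genes_of_interest, ])
```

The size factors differ by a factor of 2 between the samples, matching the sequencing depth, while the total-count ratio is inflated by gene4. With only a handful of genes, especially ones suspected to respond to the drug, the median would no longer be anchored by unchanged genes, which is one reason to normalize on the full gene set and filter afterwards.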

4.0 years ago by ATpoint 81k

0

I have made a few changes to my original post. As you suggested, I read page 42 of the DESeq2 manual. I have 2 populations in 4 conditions (tumor_drug, tumor alone, wt_drug, wt) as mentioned above, so I prepared the sample metadata with the details below. Is my sample metadata in the correct format for the design formula (design <- as.formula(~ Treatment + Tumor + Treatment:Tumor))?

I tried the following design

design_Tr_Tu <- as.formula(~ Treatment + Tumor + Treatment:Tumor)

ddsObj_Tr_Tu <- DESeqDataSetFromMatrix(countData = rawData, colData = sampleInfo, design = design_Tr_Tu)

ddsObj_Tr_Tu <- DESeq(ddsObj_Tr_Tu)

res_Tr_Tu <- results(ddsObj_Tr_Tu, alpha = 0.05)

resultsNames(ddsObj_Tr_Tu)

[1] "Intercept" "Treatment_Untreated_vs_Treated" "Tumor_Yes_vs_No" "TreatmentUntreated.TumorYes"

Did I execute it correctly?

4.0 years ago by bioinforesearchquestions ▴ 370

0

Hi ATpoint, I also have another kind of analysis.

There are 24 samples in total (12 samples from population 1 and 12 samples from population 2). I put the population 1 samples in a separate data frame, and likewise the corresponding sample info in a separate data frame. Then I used the following

4.0 years ago by bioinforesearchquestions ▴ 370

0

Your comment is incomplete. You also cross-posted this to SE. This is quite a large experiment you have there, and you cannot expect people to plan the statistics for you. I suggest you get in contact with an experienced statistician or bioinformatician at your institute to discuss the analysis in depth. I personally feel it is too extensive for an online community; at least I am reluctant to work my way into the experimental details here. What you need first are the basics of RNA-seq analysis, and the DESeq2 manual is a good starting point. See if you can get in contact with an experienced person, then make an analysis strategy. I say this in your best interest: you want to be able to trust your results, and I know from personal experience that these kinds of analyses have pitfalls when they are done without enough experience.