Question: Differential analysis using cell-line data with replicate information
newbie wrote (6 weeks ago):

I have 8 samples in total: 4 controls and 4 samples overexpressing the FOXCUT gene.

The column data for the 8 samples, including replicate and cell-line information, look like this:

Samples          TYPE                    Replicate   Cell-lines
Cell1_HA1        Control                 1           1
Cell1_HA2        Control                 2           1
Cell1_foxcut11   FOXCUT_OverExpression   1           1
Cell1_foxcut12   FOXCUT_OverExpression   2           1
Cell2_HA1        Control                 3           2
Cell2_HA2        Control                 4           2
Cell2_foxcut11   FOXCUT_OverExpression   3           2
Cell2_foxcut12   FOXCUT_OverExpression   4           2

I have count data for all 8 samples after STAR alignment, and I'm using the edgeR package for differential analysis. This is my first time doing differential analysis on cell-line data with replicate information, and I don't know how to create the design matrix and contrast matrix for comparisons between samples.

I want to run the following differential comparisons:

Cell1_foxcut samples vs Cell1_HA samples
Cell2_foxcut samples vs Cell2_HA samples

Could anyone please help me with how to group the samples, how to create the design matrix, and which coef to specify for each comparison?

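For reference, one common edgeR pattern for within-cell-line comparisons in a single model is to give every cell-line/treatment combination its own group, fit a no-intercept design, and test contrasts between groups. The base-R sketch below only builds the design matrix and the contrast vectors. The column name `CellLine` (standing in for `Cell-lines`), the group labels such as `Cell1.FOXCUT`, and the helper `make_contrast` are assumptions made for illustration.

```r
# Sample information copied from the coldata table in the question.
# "CellLine" is an assumed stand-in for the "Cell-lines" column.
coldata <- data.frame(
  Samples  = c("Cell1_HA1", "Cell1_HA2", "Cell1_foxcut11", "Cell1_foxcut12",
               "Cell2_HA1", "Cell2_HA2", "Cell2_foxcut11", "Cell2_foxcut12"),
  TYPE     = rep(rep(c("Control", "FOXCUT_OverExpression"), each = 2), 2),
  CellLine = rep(c(1, 2), each = 4)
)

# One group per cell-line/treatment combination, e.g. "Cell1.FOXCUT"
treatment <- ifelse(coldata$TYPE == "Control", "Control", "FOXCUT")
group <- factor(paste0("Cell", coldata$CellLine, ".", treatment))

# No-intercept design: each column estimates the mean of one group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Hypothetical helper: contrast vector "plus group minus minus group"
make_contrast <- function(design, plus, minus) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[plus] <- 1
  v[minus] <- -1
  v
}

# FOXCUT vs Control within each cell line
contrast_cell1 <- make_contrast(design, "Cell1.FOXCUT", "Cell1.Control")
contrast_cell2 <- make_contrast(design, "Cell2.FOXCUT", "Cell2.Control")

print(design)
print(rbind(contrast_cell1, contrast_cell2))
```

Assuming `y` is the filtered, normalized DGEList with samples in the same order as `coldata`, these could then go to edgeR as `y <- estimateDisp(y, design)`, `fit <- glmQLFit(y, design)` and `glmQLFTest(fit, contrast = contrast_cell1)`; `limma::makeContrasts(Cell1.FOXCUT - Cell1.Control, levels = design)` builds the same vector. This keeps both cell lines in one object, which one of the replies below cautions against for normalization reasons.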

What does a PCA plot of the whole dataset look like? If your cell lines are considerably different (which is very likely), you are better off performing a separate analysis for each cell line.

(reply by Martombo, 6 weeks ago)

Please check this:

### Differential Analysis
library(edgeR)
group <- factor(coldata$TYPE)  # the column is named TYPE; R column names are case-sensitive
y <- DGEList(data,group = group)
y$samples 

## Filtering: keep genes with CPM > 0.5 in at least as many samples as the smallest group
keep <- rowSums(cpm(y) > 0.5) >= min(table(group))
table(keep)
summary(keep)

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method = "TMM") ##Normalization

# MDS Plot
#The RNA samples can be clustered in two dimensions using multi-dimensional scaling (MDS) plots
pch <- c(0,1,2,15,16,17)
colors <- rep(c("darkgreen", "red", "blue"), 2)
plotMDS(y, col=colors[group], pch=pch[group], labels = colnames(y))
legend("bottomleft", legend=levels(group), pch=pch, col=colors, ncol=2)

The resulting MDS plot: [MDS plot image]

(reply by newbie, 6 weeks ago)

Your samples cluster mainly by cell line rather than by treatment, which is what I would expect for cell lines. Therefore, compare the treatments only within each cell line, not across cell lines, as the cell-line effect is most likely too dominant.

(reply by ATpoint, 6 weeks ago)

Yes, the differential analysis needs to be done within each cell line; I have edited my question. Could you please show me the edgeR syntax for the group, the design matrix and the contrasts? Thank you.

(reply by newbie, 6 weeks ago)

@ATpoint Hi, could you please tell me how to create the design matrix for differential analysis within the same cell line?

Is the code below correct?

library(edgeR)
group <- factor(paste0(coldata$TYPE))
y <- DGEList(data,group = group)
y$samples 

## Filtering 
keep <- rowSums(cpm(y) > 0.5) >= 1

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method = "TMM") ##Normalization

## Create design matrix
design2 <- model.matrix(~ 0 + group + coldata$Replicate + coldata$`Cell-lines`)  # backticks needed: the column name contains "-"

(reply by newbie, 6 weeks ago)
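A quick way to check a candidate design before fitting is to compare its number of columns with its rank. The base-R sketch below assumes `Replicate` and `Cell-lines` are meant as categorical factors (as plain numbers, `model.matrix` would fit them as continuous covariates) and renames `Cell-lines` to `CellLine`. Because replicates 1-2 occur only in cell line 1 and replicates 3-4 only in cell line 2, the cell-line column equals the sum of the replicate-3 and replicate-4 columns, so the design has 6 columns but rank 5. edgeR's GLM fitting functions are expected to reject such a design as not of full rank.

```r
# Sample information from the coldata table in the question;
# "CellLine" is an assumed stand-in for the "Cell-lines" column
coldata <- data.frame(
  TYPE      = rep(rep(c("Control", "FOXCUT_OverExpression"), each = 2), 2),
  Replicate = c(1, 2, 1, 2, 3, 4, 3, 4),
  CellLine  = rep(c(1, 2), each = 4)
)
coldata$group <- factor(coldata$TYPE)

# Same terms as design2, but with Replicate and CellLine treated as factors
design <- model.matrix(~ 0 + group + factor(Replicate) + factor(CellLine),
                       data = coldata)

n_coef <- ncol(design)           # 6 columns
design_rank <- qr(design)$rank   # 5, so one coefficient is not estimable
cat("columns:", n_coef, "rank:", design_rank, "\n")
```

If limma is installed, `limma::nonEstimable(design)` reports which coefficient cannot be estimated.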

I would simply make two separate DGEList objects (y), one per cell line, and then use the design ~ TYPE for each. As the cell lines are probably quite different from each other, having them in one y might distort the normalization factors.

(reply by ATpoint, 6 weeks ago)
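Following the suggestion above, the samples can be split into one set per cell line, each with its own `~ TYPE` design. This is a sketch under assumptions: `counts` is a matrix or data frame whose columns are the 8 samples, `coldata` has already been reordered to match those columns, and the cell-line column is called `CellLine`. The helper names `split_by_cell_line` and `run_edger` are hypothetical, and `run_edger` uses edgeR's `filterByExpr` in place of the fixed CPM cutoff used in this thread.

```r
# Split counts and sample information into one set per cell line,
# each with its own ~ TYPE design (Control is the reference level).
split_by_cell_line <- function(counts, coldata) {
  counts <- as.matrix(counts)
  # Columns of counts must be in the same order as the rows of coldata
  stopifnot(identical(colnames(counts), as.character(coldata$Samples)))
  idx_by_line <- split(seq_len(nrow(coldata)), coldata$CellLine)
  lapply(idx_by_line, function(idx) {
    info <- coldata[idx, , drop = FALSE]
    info$TYPE <- factor(info$TYPE, levels = c("Control", "FOXCUT_OverExpression"))
    list(counts  = counts[, idx, drop = FALSE],
         coldata = info,
         design  = model.matrix(~ TYPE, data = info))
  })
}

# Hypothetical helper: a standard edgeR quasi-likelihood run on one cell line.
# Requires edgeR (Bioconductor); coef = 2 is FOXCUT_OverExpression vs Control.
run_edger <- function(part) {
  y <- edgeR::DGEList(counts = part$counts, group = part$coldata$TYPE)
  keep <- edgeR::filterByExpr(y, part$design)
  y <- y[keep, , keep.lib.sizes = FALSE]
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateDisp(y, part$design)
  fit <- edgeR::glmQLFit(y, part$design)
  edgeR::topTags(edgeR::glmQLFTest(fit, coef = 2), n = Inf)
}
```

For example, `parts <- split_by_cell_line(data, coldata)` followed by `run_edger(parts[["1"]])` would give the FOXCUT vs Control table for cell line 1. Fitting each cell line separately means the normalization factors and dispersions are estimated from that cell line's 4 samples only.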

Could you please explain how to do this? I haven't seen this type of analysis described anywhere, so I don't know how to do it.

(reply by newbie, 6 weeks ago)

Instead of importing all 8 samples into R as one object, simply import the first 4 (cell line 1) as one object and the second 4 (cell line 2) as a second object. Can you show the code you used to import the data into R?

(reply by ATpoint, 6 weeks ago)

Instead of a table, here are the counts for all samples for a subset of genes:

data <- structure(list(Cell1_foxcut12 = c(4L, 8L, 3L, 4L, 7318L, 25317L, 
41L, 0L, 0L, 0L), Cell2_foxcut11 = c(9L, 11L, 2L, 6L, 4959L, 
2621L, 38L, 0L, 0L, 0L), Cell1_foxcut11 = c(0L, 3L, 2L, 0L, 4163L, 
23581L, 33L, 0L, 0L, 0L), Cell2_foxcut12 = c(16L, 13L, 5L, 4L, 
6554L, 3220L, 68L, 12L, 0L, 0L), Cell2_HA1 = c(4L, 17L, 2L, 0L, 
3981L, 2395L, 44L, 0L, 0L, 0L), Cell1_HA1 = c(0L, 9L, 3L, 0L, 
5234L, 25810L, 18L, 0L, 0L, 0L), Cell2_HA2 = c(7L, 11L, 0L, 2L, 
3803L, 2695L, 30L, 0L, 0L, 0L), Cell1_HA2 = c(9L, 9L, 2L, 7L, 
6524L, 25617L, 40L, 0L, 0L, 0L)), row.names = c("5S_rRNA", "7SK", 
"A1BG", "A1BG-AS1", "A1CF", "A2M", "A2M-AS1", "A2ML1", "A2ML1-AS1", 
"A2ML1-AS2"), class = "data.frame")

colnames(data) %in% coldata$Samples
coldata <- coldata[match(colnames(data), coldata$Samples),]
table(coldata$TYPE)

library(edgeR)
group <- factor(paste0(coldata$TYPE))
y <- DGEList(data,group = group)
y$samples 

## Filtering 
keep <- rowSums(cpm(y) > 0.5) >= 1

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method = "TMM") ##Normalization

## Create design matrix
design2 <- model.matrix(~ 0 + group + coldata$Replicate + coldata$`Cell-lines`)  # backticks needed: the column name contains "-"

This is the code I used.

(reply by newbie, 6 weeks ago)
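The `%in%` check above prints one logical value per column, which is easy to overlook. A slightly stricter sketch stops with an error if any count column is missing from `coldata` or if the reordering fails. The column order is copied from the `dput` output above; `count_cols` is a hypothetical name standing in for `colnames(data)`, and `CellLine` stands in for the `Cell-lines` column.

```r
coldata <- data.frame(
  Samples   = c("Cell1_HA1", "Cell1_HA2", "Cell1_foxcut11", "Cell1_foxcut12",
                "Cell2_HA1", "Cell2_HA2", "Cell2_foxcut11", "Cell2_foxcut12"),
  TYPE      = rep(rep(c("Control", "FOXCUT_OverExpression"), each = 2), 2),
  Replicate = c(1, 2, 1, 2, 3, 4, 3, 4),
  CellLine  = rep(c(1, 2), each = 4),
  stringsAsFactors = FALSE
)

# Column order of the count table, as in the dput output
count_cols <- c("Cell1_foxcut12", "Cell2_foxcut11", "Cell1_foxcut11", "Cell2_foxcut12",
                "Cell2_HA1", "Cell1_HA1", "Cell2_HA2", "Cell1_HA2")

# Every count column must have a row in coldata
stopifnot(all(count_cols %in% coldata$Samples))

# Reorder coldata rows to follow the count columns, then confirm the order
coldata <- coldata[match(count_cols, coldata$Samples), ]
rownames(coldata) <- NULL
stopifnot(identical(coldata$Samples, count_cols))

table(coldata$TYPE)
```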

@ATpoint Could you please tell me what is wrong with my code above?

(reply by newbie, 6 weeks ago)
swbarnes2 (United States) answered (6 weeks ago):

Cell1_foxcut11 vs Cell1_HA1

You want to compare one sample to one sample?

Without replicates, you neither need nor can really use sophisticated software. These tools model the variance between replicates, and you don't have any.

I don't think you can do much beyond looking at the very largest differences between your two samples and saying "Yeah, those are probably real".


Sorry, my mistake. It should be:

Cell1_foxcut samples vs Cell1_HA samples
Cell2_foxcut samples vs Cell2_HA samples

(reply by newbie, 6 weeks ago)

You need to do what everyone else does the first time: work through the tutorial examples.

(reply by swbarnes2, 6 weeks ago)