Question: Extract common differentially expressed genes (DEGs) of different data sets. (Microarray Data Analysis)
jeevan92ultimate (India) wrote, 4.6 years ago:

Hi, I have 10 microarray data sets (each related to a disease), which I have already analyzed to get the differentially expressed genes (DEGs) of every individual data set. I want to extract the DEGs common to all the datasets. Is there a tool, R package, or R function that does this?

Every dataset contains the ProbesetIDs, Genenames, GeneSymbols, Entrez IDs, LogFC values, B values, and P values.

Any help would be appreciated. Thanks!

David Fredman (University of Bergen, Norway) wrote, 4.6 years ago:

If I understand correctly, you simply want to find the unique gene identifiers (or probe IDs) that are differentially expressed in all experiments? One simple way to do that in R is (here for three sets):

a = c('gene1','gene3','gene5','gene7','gene9')
b = c('gene3','gene6','gene8','gene9','gene10')
c = c('gene2','gene3','gene4','gene5','gene7','gene9')

Reduce(intersect, list(a,b,c))

[1] "gene3" "gene9"

On a side note, requiring a gene to be significantly differentially expressed in all experiments is a fairly tough threshold. Since experiments typically lack the power to detect every gene that is truly differentially expressed, you are likely to miss some in each set by chance. Alternatively, you could require that a gene is significantly differentially expressed in some experiments and has a fold change in the same direction (or above some meaningful threshold) in the others.
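The relaxed criterion above can be sketched in R roughly as follows. This is a minimal illustration under several assumptions: each per-dataset table holds results for all tested genes (not only the significant ones), the column names `Genesymbol`, `LogFC` and `P` are hypothetical, and the cutoffs `p_cutoff` and `min_sig` are arbitrary example values. Here a gene is kept if it is significant in at least `min_sig` datasets and its logFC has the same sign in every dataset.

```r
# Hypothetical toy input: one full results table per dataset
# (column names and values are made up for illustration)
deg_tables <- list(
  set1 = data.frame(Genesymbol = c("CTP", "DTP", "ETP"),
                    LogFC = c(0.8, 0.7, -0.3), P = c(0.01, 0.20, 0.03),
                    stringsAsFactors = FALSE),
  set2 = data.frame(Genesymbol = c("CTP", "DTP", "ETP"),
                    LogFC = c(0.5, -0.6, -0.4), P = c(0.02, 0.01, 0.30),
                    stringsAsFactors = FALSE),
  set3 = data.frame(Genesymbol = c("CTP", "DTP", "ETP"),
                    LogFC = c(0.9, 0.4, -0.2), P = c(0.04, 0.03, 0.50),
                    stringsAsFactors = FALSE)
)

p_cutoff <- 0.05  # assumed significance threshold
min_sig  <- 2     # assumed: significant in at least this many datasets

# Genes measured in every dataset
genes <- Reduce(intersect, lapply(deg_tables, function(d) d$Genesymbol))

# Gene-by-dataset matrices of logFC and P, aligned on `genes` via match()
lfc <- do.call(cbind, lapply(deg_tables,
                             function(d) d$LogFC[match(genes, d$Genesymbol)]))
pv  <- do.call(cbind, lapply(deg_tables,
                             function(d) d$P[match(genes, d$Genesymbol)]))

n_sig    <- rowSums(pv < p_cutoff)                 # datasets where significant
same_dir <- abs(rowSums(sign(lfc))) == ncol(lfc)   # identical logFC sign everywhere

consistent_genes <- genes[n_sig >= min_sig & same_dir]
print(consistent_genes)
```

With these toy values CTP passes, DTP fails because its fold change flips sign in `set2`, and ETP fails because it is significant in only one dataset.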


True indeed. I hope I will at least get 25-30 genes in common.

(Reply by jeevan92ultimate, 4.6 years ago)
Cytosine (Ljubljana, Slovenia) wrote, 4.6 years ago:

Are you trying to do something like this?

# toy data: three data frames that share a "gene" column
expr <- c(2, 2, 3)
x <- data.frame(gene = c("a", "b", "c"), expr = expr)
y <- data.frame(gene = c("c", "b", "e"), expr = expr)
z <- data.frame(gene = c("c", "e", "d"), expr = expr)

# merge() keeps only rows whose "gene" value occurs in both inputs
temp <- merge(x, y, by = "gene")
temp <- merge(temp, z, by = "gene")

#...
#repeat for all your data frames


Essentially, you merge the data frames one by one on a shared column until all of them are merged. Because merge() performs an inner join by default, only genes present in every data frame survive.

In your case you could match on, e.g., "Genenames".
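To generalize the one-by-one merging to any number of data frames, `Reduce()` can fold `merge()` over a list, just as `Reduce(intersect, ...)` does for plain vectors. A minimal sketch, assuming every table has a `Genesymbol` key column; the toy tables and the `set1`/`set2`/`set3` labels are made up for illustration.

```r
# Hypothetical toy DEG tables; in practice, your ten data frames
d1 <- data.frame(Genesymbol = c("ATP", "BTP", "CTP", "DTP"),
                 LogFC = c(0.2, -0.5, 0.8, 0.7), stringsAsFactors = FALSE)
d2 <- data.frame(Genesymbol = c("CTP", "DTP", "ETP"),
                 LogFC = c(0.1, -0.6, 0.7), stringsAsFactors = FALSE)
d3 <- data.frame(Genesymbol = c("BTP", "CTP", "DTP", "ETP"),
                 LogFC = c(0.3, 0.4, -0.2, 0.5), stringsAsFactors = FALSE)
deg_list <- list(set1 = d1, set2 = d2, set3 = d3)

# Prefix each non-key column with its dataset name so repeated merges
# never produce clashing or duplicated column names
renamed <- Map(function(df, nm) {
  keep <- names(df) != "Genesymbol"
  names(df)[keep] <- paste(names(df)[keep], nm, sep = "_")
  df
}, deg_list, names(deg_list))

# Fold merge() over the list: each step is an inner join on Genesymbol,
# so only genes present in every table remain
common <- Reduce(function(a, b) merge(a, b, by = "Genesymbol"), renamed)
print(common)
```

Renaming first matters because, with more than two tables, the default `.x`/`.y` suffixes would be reused and R would end up with duplicated column names.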



Will try it. Thank you.

(Reply by jeevan92ultimate, 4.6 years ago)

This is really useful. With this method I can also carry along values such as LogFCs and P values together with the gene names. I haven't tried it yet, but it should work. Thank you so much :)

(Reply by jeevan92ultimate, 4.6 years ago)
jeevan92ultimate (India) wrote, 4.6 years ago:

That's so simple; why didn't I think of this? :facepalm:

For example, the 4th column of every dataset holds the Entrez gene ID (I'll do the same with the gene symbol and gene name anyway), so I'll do it as follows:

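# R object names cannot start with a digit, hence D1 rather than 1D
D1 <- as.vector(dataset1[, 4])
D2 <- as.vector(dataset2[, 4])
# ... same for dataset3 to dataset9 ...
D10 <- as.vector(dataset10[, 4])

common <- Reduce(intersect, list(D1, D2, D3, D4, D5, D6, D7, D8, D9, D10))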

Thank you so much :)
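Rather than creating ten variables by hand, the data frames can be collected into a list and processed in one go. A sketch under the assumption that the objects are named `dataset1`, `dataset2`, and so on in the current environment, and that the 4th column holds Entrez IDs; the three small tables and their column names are invented stand-ins.

```r
# Hypothetical stand-ins for dataset1 ... dataset3 (4th column = Entrez ID)
dataset1 <- data.frame(Probe = c("p1", "p2", "p3"),
                       Symbol = c("CTP", "DTP", "ETP"),
                       LogFC = c(0.8, 0.7, -0.3), Entrez = c(101, 102, 103))
dataset2 <- data.frame(Probe = c("p4", "p5"), Symbol = c("DTP", "ETP"),
                       LogFC = c(-0.6, 0.7), Entrez = c(102, 103))
dataset3 <- data.frame(Probe = c("p6", "p7"), Symbol = c("CTP", "ETP"),
                       LogFC = c(0.4, 0.5), Entrez = c(101, 103))

# Fetch the data frames by name; for ten datasets use paste0("dataset", 1:10)
datasets <- mget(paste0("dataset", 1:3), envir = environment())

# Take column 4 from each table and intersect them all
common_entrez <- Reduce(intersect, lapply(datasets, function(d) d[[4]]))
print(common_entrez)
```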


You're welcome ;) The functional nature of R is powerful.

Upvoting and/or accepting useful answers makes the site more efficient, so it is encouraged.

(Reply by David Fredman, 4.6 years ago)

Sorry for this noob question; I've been stuck on one step for the past few days.

I run the following commands:

> dataset1
Probe-ID  Genename  Genesymbol  LogFC
A         ATPoly    ATP          0.2
B         BTPoly    BTP         -0.5
C         CTPoly    CTP          0.8
D         DTPoly    DTP          0.7
E         ETPoly    ETP         -0.3

> dataset2
Probe-ID  Genename  Genesymbol  LogFC
C         CTPoly    CTP          0.1
D         DTPoly    DTP         -0.6
E         ETPoly    ETP          0.7
F         FTPoly    FTP          0.9
G         GTPoly    GTP         -0.2

D1 <- as.vector(dataset1[, 3])
D2 <- as.vector(dataset2[, 3])
AD <- Reduce(intersect, list(D1, D2))

> AD
[1] "CTP" "DTP" "ETP"

With the above commands I only get back the gene symbols common to dataset1 and dataset2.
I couldn't figure out how to retrieve the LogFC values and gene names for those symbols from both datasets. I need something like this:

Genesymbol  Genename  LogFC-dataset1  LogFC-dataset2
CTP         CTPoly     0.8             0.1
DTP         DTPoly     0.7            -0.6
ETP         ETPoly    -0.3             0.7

I think the LogFC values and gene names should be retrieved from dataset1 and dataset2 individually, based on 'AD'.
How can I do this? I tried the merge function but couldn't get it to work. As a hardcore biologist and a beginner in bioinformatics, I find this really confusing.
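The `merge()` route works once both tables are reduced to the columns of interest and the shared key is given in `by`. A minimal sketch using the two toy tables from the question; "Probe-ID" is written `ProbeID` because hyphens are not valid in plain R names, and the `_dataset1`/`_dataset2` suffixes are an arbitrary labelling choice.

```r
# Toy copies of the two tables from the question
dataset1 <- data.frame(
  ProbeID    = c("A", "B", "C", "D", "E"),
  Genename   = c("ATPoly", "BTPoly", "CTPoly", "DTPoly", "ETPoly"),
  Genesymbol = c("ATP", "BTP", "CTP", "DTP", "ETP"),
  LogFC      = c(0.2, -0.5, 0.8, 0.7, -0.3),
  stringsAsFactors = FALSE
)
dataset2 <- data.frame(
  ProbeID    = c("C", "D", "E", "F", "G"),
  Genename   = c("CTPoly", "DTPoly", "ETPoly", "FTPoly", "GTPoly"),
  Genesymbol = c("CTP", "DTP", "ETP", "FTP", "GTP"),
  LogFC      = c(0.1, -0.6, 0.7, 0.9, -0.2),
  stringsAsFactors = FALSE
)

# Keep only the needed columns, join on symbol and name (inner join),
# and label the two LogFC columns by dataset via `suffixes`
keep <- c("Genesymbol", "Genename", "LogFC")
combined <- merge(dataset1[, keep], dataset2[, keep],
                  by = c("Genesymbol", "Genename"),
                  suffixes = c("_dataset1", "_dataset2"))
print(combined)
```

Joining on both `Genesymbol` and `Genename` keeps a single copy of the gene name; the rows come back sorted by the key columns because `merge()` sorts by default.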

(Reply by jeevan92ultimate, 4.6 years ago)

I got it :) The match function did the job:

final <- match(AD, dataset1[, 3])

> final
[1] 3 4 5

# the numbers above are the rows of dataset1 that hold the common genes

> dataset1[final, ]
Probe-ID  Genename  Genesymbol  LogFC
C         CTPoly    CTP          0.8
D         DTPoly    DTP          0.7
E         ETPoly    ETP         -0.3

I can do this for every individual dataset and then combine the results.
Thanks :)
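The same `match()` idea can assemble the combined table directly instead of subsetting each dataset and stitching the pieces together afterwards. A sketch using the toy tables from the question (probe IDs omitted for brevity; the output column names are a hypothetical choice).

```r
# Toy copies of the two tables from the question
dataset1 <- data.frame(
  Genename   = c("ATPoly", "BTPoly", "CTPoly", "DTPoly", "ETPoly"),
  Genesymbol = c("ATP", "BTP", "CTP", "DTP", "ETP"),
  LogFC      = c(0.2, -0.5, 0.8, 0.7, -0.3),
  stringsAsFactors = FALSE
)
dataset2 <- data.frame(
  Genename   = c("CTPoly", "DTPoly", "ETPoly", "FTPoly", "GTPoly"),
  Genesymbol = c("CTP", "DTP", "ETP", "FTP", "GTP"),
  LogFC      = c(0.1, -0.6, 0.7, 0.9, -0.2),
  stringsAsFactors = FALSE
)

AD <- Reduce(intersect, list(dataset1$Genesymbol, dataset2$Genesymbol))

# Row index of each common symbol in each dataset
i1 <- match(AD, dataset1$Genesymbol)
i2 <- match(AD, dataset2$Genesymbol)

final_table <- data.frame(
  Genesymbol     = AD,
  Genename       = dataset1$Genename[i1],
  LogFC_dataset1 = dataset1$LogFC[i1],
  LogFC_dataset2 = dataset2$LogFC[i2],
  stringsAsFactors = FALSE
)
print(final_table)
```

One caveat: `match()` returns only the first matching row, so if a gene symbol occurs more than once in a dataset (several probesets for one gene), the other probesets are silently dropped. Collapsing probesets to one row per gene beforehand avoids that.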


(Reply by jeevan92ultimate, 4.6 years ago)
Powered by Biostar version 2.3.0