Making An Expression Matrix
11.2 years ago
moranr ▴ 290

Hi,

I have 5 data series (GSE): 4 are from the GEO database, and 2 of those have raw CEL files. I want to get all of this information into a single data matrix for analysis, using the raw data where possible. I am very new to this whole area and it is proving difficult. Can anyone offer any help on this?

I can get a series into an expression set from the series matrix files via gsexxxx <- getGEO("GSExxxx", GSEMatrix=TRUE). I think this is correct?! I think I can also read all the CEL files, normalise them, and get them into an expression set using the ReadAffy function and gcrma?

Any help is much appreciated. Sorry if this question doesn't even make sense; as I said, I'm very new to it all!

Thanks, Ray

r bioconductor microarray • 4.9k views

Are your datasets all from the same platform? That will affect if/how you combine them.


Yes, the same platform is being used, as I think it will give a more powerful result.


If you have 5 GSE series and 4 are from GEO, where does the fifth come from? As far as I know, all GSE series come from GEO.


Oh sorry, the 5th is not a GSE, it is just similar. It comes from CA express as downloadable files with supplementary files.

11.2 years ago

It sounds like you understand the details of getting data from GEO and taking .CEL files to an ExpressionSet. Where things are going to get complicated is in getting "all of this information into a single data matrix for analysis". Doing so may not be the best approach, but it is impossible to know without a good deal more background, a level of background that is not easily communicated in a forum or email. Since you say you are relatively new to the whole area, I suggest you find a local bioinformatics collaborator who can work through the data with you.


Hi,

I'm stuck getting all the information into a single data matrix. Could you please help me with how to do that?

Thanks.


Perhaps you could ask a new question and give the details of where you are getting stuck.

11.2 years ago

If it were me, I would try hard to get CEL files for all of them. If they are not in GEO, you might try requesting them directly from the author; I've had about 50% success with this in the past. Then, with all the CEL files, you can use affy, gcrma, and a custom CDF to create one consistently summarized and normalized dataset mapped to gene symbols (see "Retrieving Probe To Gene Ids For Affymetrix Chips In Bioconductor"). The gene-level mapping would also allow you to compare with any datasets for which you don't have raw CEL files. Whether you are able to process everything together or you try to combine differently processed data after the fact, I would be VERY aware of the potential for batch effects.
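
As a rough illustration of the two steps above, here is a minimal R sketch. It is only an outline under stated assumptions: `cel_to_matrix()` and `combine_by_gene()` are hypothetical helper names, the CEL directory and the `custom_cdf` package name are placeholders you would supply yourself, and the two small matrices are made-up toy values standing in for separately processed series. The first helper needs the Bioconductor packages affy, gcrma and Biobase (plus the matching probe package for whichever CDF is used) and is defined but not called here.

```r
# Sketch only: helper names, directory layout, CDF name and toy values are
# illustrative assumptions, not details taken from this thread.

# Read every CEL file in `cel_dir` and process them together with GCRMA.
# `custom_cdf` is the name of an installed custom CDF package (hypothetical),
# or NULL to fall back to the chip's default annotation.
cel_to_matrix <- function(cel_dir, custom_cdf = NULL) {
  raw <- affy::ReadAffy(celfile.path = cel_dir, cdfname = custom_cdf)
  Biobase::exprs(gcrma::gcrma(raw))
}

# Combine gene-level matrices (rows = gene identifiers, columns = samples).
# Only genes present in every dataset are kept, and each sample's dataset of
# origin is recorded so that batch can be inspected or modelled later.
combine_by_gene <- function(matrices) {
  shared <- Reduce(intersect, lapply(matrices, rownames))
  combined <- do.call(
    cbind,
    lapply(matrices, function(m) m[shared, , drop = FALSE])
  )
  batch <- factor(
    rep(names(matrices), vapply(matrices, ncol, integer(1))),
    levels = names(matrices)
  )
  list(expr = combined, batch = batch)
}

# Toy data standing in for two separately processed series.
set_a <- matrix(c(5.1, 7.2, 6.3, 5.0, 7.4, 6.1), nrow = 3,
                dimnames = list(c("TP53", "BRCA1", "EGFR"), c("a1", "a2")))
set_b <- matrix(c(8.0, 6.5, 7.9, 6.7, 8.2, 6.4), nrow = 2,
                dimnames = list(c("EGFR", "TP53"), c("b1", "b2", "b3")))

merged <- combine_by_gene(list(series_a = set_a, series_b = set_b))
print(dim(merged$expr))
print(table(merged$batch))
```

Keeping the `batch` factor next to the matrix is the important part: it lets you colour a clustering or PCA plot by dataset of origin and see whether samples group by series rather than by biology before you trust any combined analysis.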
