Processing gene expression data
4.5 years ago
Natasha ▴ 40

This is a follow up to my previous question.

I would like to implement the following steps, described in the supplementary file of this study, to reproduce Figure 1 of the paper.

We used Affymetrix microarray data from a recent thorough analysis of the mouse and human transcriptomes [1]. We selected all 54 adult mouse non-cancer samples. The raw intensity data were transformed to normalized expression levels with the robust multi-array average (RMA) low-level algorithm [2] implemented in the Bioconductor package [3]. We used standard settings, including perfect match (PM) only, model-based background and quantile normalization across experiments [4]. Similar results were obtained using the Microarray Analysis Suite (MAS5) function followed by log-transformation to calculate expression levels (data not shown).
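To make the "quantile normalization across experiments" step concrete, here is a minimal base-R sketch of quantile normalization on its own. It is only an illustration, not what `affy::rma` runs internally: RMA also performs background correction and median-polish summarization, and the Bioconductor implementation may treat tied values differently from the simple `ties.method = "first"` used here. The function name `quantile_normalize` and the toy matrix are hypothetical.

```r
# Minimal quantile normalization sketch (base R only).
# Rows are probes, columns are arrays/samples.
quantile_normalize <- function(mat) {
  # Sort every column independently
  sorted <- apply(mat, 2, sort)
  # Reference distribution: mean across arrays at each rank
  ref <- rowMeans(sorted)
  # Rank of each value within its own column (ties broken by position)
  ranks <- apply(mat, 2, rank, ties.method = "first")
  # Replace each value by the reference value at the same rank
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(mat)
  out
}

# Toy example: 4 probes measured on 3 arrays
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), ncol = 3,
            dimnames = list(paste0("probe", 1:4), c("A", "B", "C")))
qn <- quantile_normalize(x)
print(qn)
```

After normalization every column has exactly the same set of values (the reference distribution), so differences between arrays are carried only by the ranks of probes within each array.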

The mouse data are available on GEO under accession number GSE1133, in several formats: CDF, CIF, GIN, PSI, SIF, PROBE, TAB and TXT. I am not sure which of these formats contains the raw intensity data needed for the procedure described above.

4.5 years ago
ATpoint 81k

You need the CEL files, listed among the supplementary files at the bottom of the GSE1133 page. Next to GSE1133_RAW.tar, click "Custom" to select only the samples you need. CEL files store the raw intensity values, which can be processed with standard software for normalization. Please search around for how to read and handle CEL files.
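A minimal base-R sketch of the step before normalization: unpack the downloaded archive and collect the CEL file paths. The helper name `list_cel_files` and the directory name `GSE1133_CEL` are hypothetical. GEO archives often contain gzip-compressed files, so the pattern accepts both `.CEL` and `.CEL.gz` names, in any letter case.

```r
# Hypothetical helper: return the paths of all CEL files in a directory,
# matching plain ".CEL" and gzip-compressed ".CEL.gz" names (case-insensitive).
list_cel_files <- function(cel_dir) {
  list.files(cel_dir, pattern = "\\.cel(\\.gz)?$",
             ignore.case = TRUE, full.names = TRUE)
}

# Example usage (assumes GSE1133_RAW.tar was downloaded to the working directory):
# utils::untar("GSE1133_RAW.tar", exdir = "GSE1133_CEL")
# cel_files <- list_cel_files("GSE1133_CEL")
```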

4.5 years ago
c.chakraborty ▴ 170

Don't you have access to the raw .CEL files? Also, which paper is this? Could you please share the link or DOI? I checked, and .CEL files are available for this dataset. If you want to analyse the microarray data for gene expression, you should use the .CEL files. They are bundled in a TAR archive in the supplementary files section.


Many thanks for the response. Yes, the raw CEL files are available here. Figure 1, which I want to reproduce, is in this article (please find the link here). A description of how the figure was created is given in the supplementary material. The figure was created using the data from this study (please find the link here).

In total, 438 GSM samples are listed. I am not sure how to distinguish human from mouse samples (I think they can be filtered by platform ID), or cancerous from normal samples. Any suggestion on which package to use for the RMA normalization described here would be really helpful.


I think every sample prefixed MGM is mouse, and the rest (1B/3A) are human. Simply click the GSM... links; each page tells you the organism. Check whether the pattern I suggested holds for the majority of the samples.


Thank you. It is mentioned that platform GPL1073 (GNF1M) is mouse (GSM18584 to GSM18705) and platform GPL1074 (GNF1H) is human (GSM18706 to GSM18863).

However, I couldn't find the platform ID in the CEL files.
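Since the platform ID is not visible in the file names, one workaround is to select files by the GSM ranges quoted above. A minimal base-R sketch, assuming each CEL file name contains its GSM accession (e.g. `GSM18584.CEL.gz`); the helper names `gsm_number` and `select_gsm_range` are hypothetical.

```r
# Hypothetical helper: extract the number after "GSM" from each file name.
# Returns NA for names that contain no GSM accession.
gsm_number <- function(files) {
  nm <- basename(files)
  hits <- regmatches(nm, regexec("GSM([0-9]+)", nm, ignore.case = TRUE))
  # A match yields the full match plus one capture group (length 2)
  vapply(hits, function(h) if (length(h) == 2) as.integer(h[2]) else NA_integer_,
         integer(1))
}

# Hypothetical helper: keep files whose GSM number lies in [from, to].
# Defaults: GPL1073 (GNF1M, mouse), GSM18584 to GSM18705, as stated in the thread.
select_gsm_range <- function(files, from = 18584L, to = 18705L) {
  n <- gsm_number(files)
  files[!is.na(n) & n >= from & n <= to]
}

files <- c("GSM18584.CEL.gz", "GSM18705.CEL.gz", "GSM18706.CEL.gz", "readme.txt")
mouse_files <- select_gsm_range(files)
human_files <- select_gsm_range(files, from = 18706L, to = 18863L)
```

The selected paths can then be passed to `ReadAffy(filenames = mouse_files)` so that only the mouse arrays are read and normalized together. Note that this range covers every GNF1M array in the series; narrowing it down to the 54 adult non-cancer samples still requires the sample annotations on each GSM page.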


Go to the supplementary file GSE1133_RAW.tar and click "Custom"; it will list all the .CEL files in this dataset. You can download whichever ones you need for your analysis.


Thank you. I am trying to normalize using the following code:

library(affy)        # affy is a Bioconductor package
Data <- ReadAffy()   # reads all .CEL files in the working directory
eset <- rma(Data)    # RMA: background correction, quantile normalization, log2 median-polish summarization

Is this right? I am trying to normalize all mouse samples (i.e., GSM18584 to GSM18705) together.
