Extra probe IDs in microarray raw data (CEL) compared to Series Matrix and SOFT files?
13 months ago

GEO accession: GSE115167; Platform: GPL15207 [PrimeView] Affymetrix Human Gene Expression Array

#Series Matrix File(s)
>library(GEOquery)
>data_matrix <- getGEO(filename="GSE115167_series_matrix.txt.gz")
>dim(exprs(data_matrix))
[1] 49395    12

#SOFT formatted family file(s)
>data_soft<-getGEO(filename="GSE115167_family.soft.gz")
>probesets <- Table(GPLList(data_soft)[[1]])$ID
>data_soft1<- do.call("cbind", lapply(GSMList(data_soft), function(x) {
 tab <- Table(x)
 mymatch <- match(probesets, tab$ID_REF)
 return(tab$VALUE[mymatch])}))
>data_soft1 <- apply(data_soft1, 2, function(x) {
 as.numeric(as.character(x))})
>rownames(data_soft1) <- probesets
>colnames(data_soft1) <- names(GSMList(data_soft))
>dim(data_soft1)
[1] 49395    12

#Raw data (GSE115167_RAW) .CEL files
>library(affy)
>data_raw <- ReadAffy(widget=TRUE)
>data_raw <- rma(data_raw)
>dim(exprs(data_raw))
[1] 49495    12

The raw data contain 100 extra probe IDs (49495 rows vs. 49395). Those 100 extra probe IDs are not found in the
GPL15207 platform.

 >tail(exprs(data_matrix), 2)
 #shows probe ID
 AFFX-TrpnX-5_at
 AFFX-TrpnX-M_at

>tail(exprs(data_raw),2)
#shows probe ID
ERCC-00172-01_at
ERCC-00176-01_at

The probe IDs ERCC-00172-01_at and ERCC-00176-01_at are not found in the GPL15207 platform, and I found 100 such IDs in the raw data. Can anyone tell me what these probe IDs are?

microarray probeID R • 363 views
13 months ago
h.mon 32k

ERCC stands for External RNA Controls Consortium. These probe sets target spike-in controls designed for quality control and/or normalization within and between arrays. See "Revisiting Global Gene Expression Analysis" for an example of their usage.
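
If you want to confirm which probe IDs are extra and drop them before comparing against the series matrix, something along these lines might help. This is only a sketch on toy data: `raw_ids`, `matrix_ids`, `expr_raw`, the ID "11715100_at" and the column name "GSM_example" are placeholders I made up. In your session the ID vectors would presumably be `rownames(exprs(data_raw))` and `rownames(exprs(data_matrix))`. It also assumes the extra IDs all start with "ERCC-", which the code checks rather than takes for granted.

```r
# Toy stand-ins for rownames(exprs(data_raw)) and rownames(exprs(data_matrix)).
# These vectors are hypothetical; substitute the real row names.
raw_ids    <- c("11715100_at", "AFFX-TrpnX-5_at", "AFFX-TrpnX-M_at",
                "ERCC-00172-01_at", "ERCC-00176-01_at")
matrix_ids <- c("11715100_at", "AFFX-TrpnX-5_at", "AFFX-TrpnX-M_at")

# Probe IDs present in the RMA output but absent from the series matrix
extra_ids <- setdiff(raw_ids, matrix_ids)

# Check (rather than assume) that every extra ID carries the ERCC prefix
all_ercc <- all(grepl("^ERCC-", extra_ids))

# Toy expression matrix with one row per raw probe ID (values are arbitrary)
expr_raw <- matrix(as.numeric(seq_along(raw_ids)), ncol = 1,
                   dimnames = list(raw_ids, "GSM_example"))

# Keep only the rows whose probe ID also appears in the series matrix
expr_common <- expr_raw[rownames(expr_raw) %in% matrix_ids, , drop = FALSE]
```

On the real data, `length(extra_ids)` should come out as 100 if the row counts you reported are right, and `all_ercc` tells you whether anything other than ERCC controls is in that set.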
