4.6 years ago
Paul Paodumai
GEO accession: GSE115167; Platform: GPL15207 [PrimeView] Affymetrix Human Gene Expression Array
# Series Matrix File(s)
> library(GEOquery)
> data_matrix <- getGEO(filename = "GSE115167_series_matrix.txt.gz")
>dim(exprs(data_matrix))
[1] 49395 12
# SOFT formatted family file(s)
> data_soft <- getGEO(filename = "GSE115167_family.soft.gz")
> probesets <- Table(GPLList(data_soft)[[1]])$ID
> data_soft1 <- do.call("cbind", lapply(GSMList(data_soft), function(x) {
+   tab <- Table(x)
+   mymatch <- match(probesets, tab$ID_REF)
+   return(tab$VALUE[mymatch])
+ }))
> data_soft1 <- apply(data_soft1, 2, function(x) {
+   as.numeric(as.character(x))
+ })
> rownames(data_soft1) <- probesets
> colnames(data_soft1) <- names(GSMList(data_soft))
>dim(data_soft1)
[1] 49395 12
# Raw data (GSE115167_RAW), .CEL files
> library(affy)
> data_raw <- ReadAffy(widget = TRUE)
> data_raw <- rma(data_raw)
>dim(exprs(data_raw))
[1] 49495 12
The raw data contain 100 extra probe IDs (49495 rows versus 49395), and none of those 100 extra probe IDs is found in the GPL15207 platform.
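One way to list the extra probe IDs explicitly is a set difference on the row names. The following is a minimal sketch, not the exact commands used here: `raw_ids` and `matrix_ids` are hypothetical toy vectors standing in for `rownames(exprs(data_raw))` and `rownames(exprs(data_matrix))`, filled with only the four probe IDs quoted in this post.

```r
# Toy stand-ins (assumption): in the real session these would be
#   raw_ids    <- rownames(exprs(data_raw))
#   matrix_ids <- rownames(exprs(data_matrix))
raw_ids <- c("AFFX-TrpnX-5_at", "AFFX-TrpnX-M_at",
             "ERCC-00172-01_at", "ERCC-00176-01_at")
matrix_ids <- c("AFFX-TrpnX-5_at", "AFFX-TrpnX-M_at")

# Probe IDs present in the RMA output but absent from the series matrix
extra_ids <- setdiff(raw_ids, matrix_ids)
print(extra_ids)

# Group the extra IDs by the text before the first "-" to see
# whether they share a common naming prefix
prefix <- sub("-.*$", "", extra_ids)
print(table(prefix))
```

On the real objects, `length(extra_ids)` should equal the difference in row counts if every series-matrix probe ID is also present in the raw data.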
>tail(exprs(data_matrix), 2)
#shows probe ID
AFFX-TrpnX-5_at
AFFX-TrpnX-M_at
>tail(exprs(data_raw),2)
#shows probe ID
ERCC-00172-01_at
ERCC-00176-01_at
The probe IDs ERCC-00172-01_at and ERCC-00176-01_at are not found in the GPL15207 platform, and I found 100 such IDs in the raw data. Can anyone tell me what these probe IDs are?
Cross-posted: https://bioinformatics.stackexchange.com/questions/11619/why-there-is-extra-probeid-in-microarry-raw-data-cel-as-compare-to-data-in-ser