How to Get Genotype/Variation Information from a GEO GSE Data Set in R
11.1 years ago
fm271 ▴ 20

I am using the 'GEOquery' package to get GSE data. How can I fetch the "genotype/variation" column?

library(GEOquery)  # provides getGEO()
library(Biobase)   # provides pData()
gset <- getGEO('GSE25205')[[1]]
sampleTitles = as.character(pData(gset)$characteristics_ch1.1)

> sampleTitles
 [1] "genotype/variation: Caspase 1 null" "genotype/variation: Caspase 1 null" "genotype/variation: Caspase 1 null"
 [4] "genotype/variation: wild-type"      "genotype/variation: wild-type"      "genotype/variation: wild-type"     
 [7] "genotype/variation: wild-type"      "genotype/variation: ASC1 null"      "genotype/variation: ASC1 null"     
[10] "genotype/variation: ASC1 null"     

mystrsplit = function(x) strsplit(x,split=': ', fixed=TRUE)[[1]][2] 
sampleTitles = unlist(lapply(sampleTitles, mystrsplit))

This works fine, but the column that holds "genotype/variation" is not fixed: within a GSE it may be 'characteristics_ch1', 'characteristics_ch1.2' and so on, and it differs between GSEs. Is there any other way to fetch the above information?

EDIT:

I have the following solution, but if there is some direct function, I would be happy to know that.

# startsWith() is in base R (>= 3.3.0), so gdata is not needed
pdata = pData(gset)
cols = grep("^characteristics_ch", colnames(pdata))
# Find the characteristics column whose values start with "genotype/variation:"
genoVariCol = NA
for (l in seq_along(cols))
  {
  cvals = as.character(pdata[,cols[l]])  # columns may be factors
  if(any(startsWith(cvals, "genotype/variation:"), na.rm=TRUE))
      genoVariCol = cols[l]
  }
genoVariCol # index of the column that holds the "genotype/variation:" values
geo r bioconductor • 3.4k views

Hello,

Can you show me how to extract the required samples by applying multiple conditions?

c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin")

After this step, c1 contains 1 for the matching samples. I also want to require that the cell line is MCF7, so I tried this:

c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin") && (pData(eset)$characteristics_ch1.7=="cell: MCF7")

But it is not working. Can you show me how to do that?
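
A likely cause is the operator: `&&` is meant for single logical values (since R 4.3.0 it raises an error when given longer vectors, and earlier versions only looked at the first element), while `&` compares element by element. In the line above, `as.numeric()` also wraps only the first condition. The sketch below shows the element-wise form on a small made-up data frame; `pdata` and its three rows are hypothetical stand-ins for `pData(eset)`, and only the two column names come from the comment.

```r
# Hypothetical stand-in for pData(eset); the values are invented for illustration.
pdata <- data.frame(
  characteristics_ch1.2 = c("name: metformin", "name: metformin", "name: DMSO"),
  characteristics_ch1.7 = c("cell: MCF7", "cell: PC3", "cell: MCF7"),
  stringsAsFactors = FALSE
)

is_metformin <- pdata$characteristics_ch1.2 == "name: metformin"
is_mcf7 <- pdata$characteristics_ch1.7 == "cell: MCF7"

# `&` works element by element; as.numeric() wraps the combined condition.
c1 <- as.numeric(is_metformin & is_mcf7)
print(c1)
```

With a real ExpressionSet, the same logical vector should also work for subsetting samples, for example `eset[, is_metformin & is_mcf7]`.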


I'd suggest writing a new question with the details, including code and error messages.


I have posted a question, kindly share your views about it.

11.1 years ago

You are correct that NCBI GEO does not have a formal naming convention for columns or a convention for providing computable content in those columns. Your approach of writing code to "discover" information in the files is a good way to go. I'll just note that the GEOmetadb package can be useful for accessing GEO metadata quickly in bulk, but when we load the data into GEOmetadb, we do not attempt to change it or add semantic meaning to it, so you'll have essentially the same challenges.
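
As one way to do that kind of discovery without hard-coding a column name, the sketch below scans every `characteristics_ch*` column and pulls out the value for a given key, sample by sample. `get_characteristic` is a hypothetical helper, not a GEOquery function, and `pdata` here is a small made-up data frame standing in for `pData(gset)`. Newer GEOquery releases reportedly also add already-parsed columns with names like `genotype/variation:ch1`, so it is worth checking `colnames(pData(gset))` first.

```r
# Hypothetical stand-in for pData(gset); real data would come from GEOquery.
pdata <- data.frame(
  title = c("s1", "s2", "s3"),
  characteristics_ch1 = c("tissue: spleen", "tissue: spleen", "tissue: spleen"),
  characteristics_ch1.1 = c("genotype/variation: Caspase 1 null",
                            "genotype/variation: wild-type",
                            "genotype/variation: ASC1 null"),
  stringsAsFactors = FALSE
)

# Return the value stored under `key` for each sample, searching every
# characteristics_ch* column. Samples without the key get NA.
# get_characteristic is a hypothetical helper, not part of GEOquery.
get_characteristic <- function(pdata, key) {
  cols <- grep("^characteristics_ch", colnames(pdata), value = TRUE)
  prefix <- paste0(key, ": ")
  out <- rep(NA_character_, nrow(pdata))
  for (col in cols) {
    vals <- as.character(pdata[[col]])        # columns may be factors
    hit <- !is.na(vals) & startsWith(vals, prefix)
    out[hit] <- substring(vals[hit], nchar(prefix) + 1)
  }
  out
}

genotype <- get_characteristic(pdata, "genotype/variation")
print(genotype)
```

Matching per sample rather than per column is a deliberate choice here, since GEO does not guarantee that a given key sits in the same column for every sample.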


Dear Sean Davis,

Could you reply to my comment above? I think "fm271" is not using Biostar anymore, so I was wondering if you could guide me on this.
