Question: Build the classes to determine which replicate is in control vs. treated condition
3.7 years ago by
Germany
Bioinformatist Newbie230 wrote:

Hello,

I want to analyze the entire Connectivity Map dataset (~120 drugs, ~560 arrays, two chipsets: HG-U133A and HT_HG-U133A). I am reading the series matrix file available on GEO. I want the differential expression for each group of instances that share the same cell line, platform, drug, and drug concentration. To build classes that record which replicate belongs to which condition, I am doing something like this:

data <- getGEO('GSE5258')

eset <- data[[2]] # Taking GPL96 array into an expression set

show(pData(phenoData(eset))[1:2,])

                   title geo_accession                status submission_date last_update_date type channel_count source_name_ch1 organism_ch1         characteristics_ch1 characteristics_ch1.1 characteristics_ch1.2
GSM118720 EC2003090503AA     GSM118720 Public on Sep 27 2006     Jul 06 2006      Sep 18 2006  RNA             1     cmap_well:3 Homo sapiens perturbagen: small molecule       type: treatment       name: metformin
GSM118721 EC2003090502AA     GSM118721 Public on Sep 27 2006     Jul 06 2006      Sep 18 2006  RNA             1     cmap_well:2 Homo sapiens perturbagen: small molecule         type: control            name: null
            characteristics_ch1.3 characteristics_ch1.4 characteristics_ch1.5 characteristics_ch1.6 characteristics_ch1.7 molecule_ch1 label_ch1 taxid_ch1                                    description data_processing platform_id
GSM118720 concentration: .00001 M       vehicle: medium   vehicle_final: null         duration: 6 h            cell: MCF7    total RNA    biotin      9606 MCF7 treated with metformin (.00001 M) for 6 h         MAS 5.0       GPL96
GSM118721     concentration: null       vehicle: medium   vehicle_final: null         duration: 6 h            cell: MCF7    total RNA    biotin      9606             MCF7 with vehicle (medium) for 6 h         MAS 5.0       GPL96

h1=as.numeric(pData(eset)["characteristics_ch1.2"]=="name: metformin") # h1 is a 0/1 indicator: 1 where the drug is metformin, 0 otherwise. This works fine.

h1

[1] 1 0 1 1 1 0 0 0 0 1 0 0 ...... [346] 0

Now I want to apply multiple conditions: where drug == metformin AND cell line == MCF7

c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin" && characteristics_ch1.7=="cell: MCF7")

Error: object 'characteristics_ch1.7' not found

I am unable to apply multiple conditions here, and I am not even sure whether the approach I am following will work. Please share your views on the problem. Thank you.

ADD COMMENTlink modified 3.7 years ago by Devon Ryan89k • written 3.7 years ago by Bioinformatist Newbie230
3.7 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:
c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin" && pData(eset)$characteristics_ch1.7=="cell: MCF7")

The error message was very informative in this case.

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Devon Ryan89k

I already tried this, but then the output is:

 c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin" && pData(eset)$characteristics_ch1.7=="cell: MCF7")

c1
[1] 1
That is not the result I expect. There are 3 samples that are treated with metformin and whose cell line is MCF7, so the result should be 1 for those samples and 0 for every other sample.

In the following case it returns just 'TRUE':

c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin") && (pData(eset)$characteristics_ch1.7=="cell: MCF7")

c1
[1] TRUE

 

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by Bioinformatist Newbie230

What you probably really want to do is something like:

idx <- which(pData(eset)$characteristics_ch1.2=="name: metformin" & pData(eset)$characteristics_ch1.7=="cell: MCF7")

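That will then give you the indices for subsetting. Note that & and && differ in R: & compares element by element and returns one value per sample, whereas && returns a single value, which is why you got just one number back.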
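As a minimal sketch of the difference, the snippet below uses a small made-up data frame `pd` in place of `pData(eset)`; the four rows and their values are invented for illustration, and only the column names come from the thread. In R 4.3.0 and later, giving `&&` an operand longer than one element is an error, while earlier versions used only the first element, so the sketch applies `&&` to single values only.

```r
# Toy stand-in for pData(eset); the rows are invented for illustration.
pd <- data.frame(
  characteristics_ch1.2 = c("name: metformin", "name: null",
                            "name: metformin", "name: metformin"),
  characteristics_ch1.7 = c("cell: MCF7", "cell: MCF7",
                            "cell: HL60", "cell: MCF7"),
  stringsAsFactors = FALSE
)

is_metformin <- pd$characteristics_ch1.2 == "name: metformin"
is_mcf7      <- pd$characteristics_ch1.7 == "cell: MCF7"

# `&` is vectorised: one TRUE/FALSE per sample.
both <- is_metformin & is_mcf7

# which() turns the logical vector into row indices for subsetting.
idx  <- which(both)
hits <- pd[idx, ]

# `&&` belongs in places that need exactly one TRUE/FALSE, such as if().
if (length(idx) > 0 && all(is_mcf7[idx])) {
  message("Found ", length(idx), " metformin-treated MCF7 samples")
}
```

With this toy data, `both` is TRUE for samples 1 and 4 only, so `idx` holds those two positions.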

ADD REPLYlink written 3.7 years ago by Devon Ryan89k

Do you think this is the right approach for creating classes for each drug? I would then use a for loop to iterate over every one of them.

Another thing: I want to add the condition "drug concentration is the same". The concentration is stored in "characteristics_ch1.3", which has values such as the following:

"concentration: .00001 M"

"concentration: null"

"concentration: .001 M"

How can I apply the criteria that the drug concentration must be the same and that the number of instances must be at least 3? I mean that each group should contain at least 3 samples with the same drug, the same drug concentration, the same cell line, and the same platform. Thanks for your help.

ADD REPLYlink written 3.7 years ago by Bioinformatist Newbie230

There are multiple ways to go about this, with the most convenient entirely depending on the exact details of what you're doing. Personally, I would just paste() things together into a factor and then run split() on the dataframe accordingly. That's often a convenient way of creating large numbers of subsets according to multiple criteria.
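One way the paste() and split() idea could look is sketched below. The data frame `pd` is a made-up stand-in for `pData(eset)` with invented rows; only the column names are taken from the output shown in the question. The choice of `" | "` as separator and the minimum group size of 3 are assumptions for illustration, not part of the original suggestion.

```r
# Toy stand-in for pData(eset); the seven samples are invented.
pd <- data.frame(
  characteristics_ch1.2 = c(rep("name: metformin", 4),
                            rep("name: null", 2), "name: metformin"),
  characteristics_ch1.3 = c(rep("concentration: .00001 M", 3),
                            "concentration: .001 M",
                            rep("concentration: null", 2),
                            "concentration: .00001 M"),
  characteristics_ch1.7 = c(rep("cell: MCF7", 6), "cell: HL60"),
  platform_id           = "GPL96",
  row.names             = paste0("S", 1:7),
  stringsAsFactors      = FALSE
)

# Paste drug, concentration, cell line and platform into one grouping factor,
# so "same concentration" never has to be hard-coded.
group <- factor(paste(pd$characteristics_ch1.2, pd$characteristics_ch1.3,
                      pd$characteristics_ch1.7, pd$platform_id, sep = " | "))

# One data frame per unique combination.
groups <- split(pd, group)

# Keep only the combinations with at least 3 samples.
keep <- groups[vapply(groups, nrow, integer(1)) >= 3]

names(keep)           # labels of the surviving groups
rownames(keep[[1]])   # sample names in the first surviving group
```

With this toy data only the metformin / .00001 M / MCF7 / GPL96 group (samples S1 to S3) has three members. The same `group` factor could also serve as the class variable when building a design, since each level corresponds to one drug, concentration, cell line, and platform combination.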

ADD REPLYlink written 3.7 years ago by Devon Ryan89k

As I don't have any R experience, I am finding this difficult. Can you suggest an online tutorial that uses this approach? How do people build the contrast matrix and the design? I have searched a lot, but every time I find only small datasets where you can make the contrast matrix and design manually and don't need to use the phenodata.

ADD REPLYlink written 3.7 years ago by Bioinformatist Newbie230

And the problem with idx is that I only get the indices of the matching samples:

> idx
[1]  1  3  4  5 10

Whereas I want a complete vector covering all of the samples, with 1 for each sample where all conditions are met and 0 for every other sample.

ADD REPLYlink written 3.7 years ago by Bioinformatist Newbie230

So use as.numeric() instead of which().
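As a small illustration of the two return shapes, assuming a made-up logical vector `cond` standing in for the result of the element-wise comparison:

```r
# Made-up result of an element-wise comparison over five samples.
cond <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

idx <- which(cond)        # positions of the TRUE entries only
ind <- as.numeric(cond)   # full-length vector: 1 where TRUE, 0 where FALSE

# An index vector can be turned back into a full-length indicator if needed.
ind_from_idx <- as.numeric(seq_along(cond) %in% idx)
```

Here `idx` has three entries, while `ind` and `ind_from_idx` both have one entry per sample.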

ADD REPLYlink written 3.7 years ago by Devon Ryan89k

How can I apply the following restrictions?

"same drug concentration" and "minimum 3 samples meeting the criteria"

ADD REPLYlink written 3.7 years ago by Bioinformatist Newbie230

I have found a way to enforce 'minimum 3 samples', but I am unable to figure out how to filter on 'same drug concentration' without giving a hard-coded value. Can you tell me how I can do that? Thanks, and sorry for bothering you again.

ADD REPLYlink written 3.7 years ago by Bioinformatist Newbie230

Dear Devon,

Can you shed some light on this problem?

ADD REPLYlink written 3.7 years ago by Bioinformatist Newbie230

Please don't solicit responses to your questions in the comments on other people's questions.

ADD REPLYlink written 3.7 years ago by Sean Davis25k