Question: Methylation 450k data preprocessing using Python
1
gravatar for ciemanek
4.1 years ago by
ciemanek140
The Netherlands/Amsterdam
ciemanek140 wrote:

Hello, I am going to analyze some data from Illumina 450k Methylation array, which are in .idat format. My supervisor wants me to work with Python but it's a new language for me, therefore I'm really lost. I am trying to use rpy2 tool to combine Python and R functions, but it seems to be out of my range. Currently I am working with minfi package and trying to remove some probes, eg.probes for sex chromosomes. In R it is easy job - I am finding indexes of probes which are NOT related to sex chromosomes using command:

keep <- !(featureNames(mSetSqFlt) %in% ann450k$Name[ann450k$chr %in%
                                                      c("chrX","chrY")])

and then I am replacing my old MethylSet with the new one, not containing unwanted probes:

mSet <- mSet[keep,]

But in Python I don't know how to extract only the data of interest from MethylSet. For now I have indexes of probes which I want to keep:

ann450k = minfi.getAnnotation('IlluminaHumanMethylation450kanno.ilmn12.hg19')    
tmp = biobase.featureNames(mSet)

fnames = []
for i in range(len(matches)):
       fnames.append(tmp[i])

names = dollar(ann450k, "Name") #dollar is in dictionary so it's basically chrs=ann450k$Name
chrs = dollar(ann450k, "chr") 

matches = [i for i in range(0,len(names)) if chrs[i]!="chrX" and chrs[i]!="chrY"]

namesNew = []
for i in range(len(matches)):
        namesNew.append(names[matches[i]])

idxs = [i for i, item in enumerate(fnames) if item in namesNew]

If someone could give me a piece of advice on how to process my data, I would be really grateful. Maybe there are some Python tools that would allow me to read and process .idat files? I haven't found any so far. My goal for now is to preprocess them with subset quantile normalization and SWAN algorithms to reconstruct pipeline used by the authors of experiment.

Kind regards, Agata

ADD COMMENTlink modified 4.1 years ago by Laurent Gautier810 • written 4.1 years ago by ciemanek140

Can't you explain to your supervisor that you have to use the right tool for the right job, and that in this case R is far more convenient? Rpy2 can be a pain and generally not worth the trouble.

ADD REPLYlink written 4.1 years ago by WouterDeCoster44k
1
gravatar for Laurent Gautier
4.1 years ago by
Laurent Gautier810 wrote:

Rpy2 lets you blend R and Python. You could start by encapsulating snippets of R into functions that you call from Python. For example:

from rpy2.objects import r
keep_this = r("""
function(mSetSqFlt, mSet) {
  keep <- !(featureNames(mSetSqFlt) %in% ann450k$Name[ann450k$chr %in% c("chrX","chrY")])
  mSet <- mSet[keep,]
}""")

keep_this(msetsqfit, mset)

Once you have this working, and this is needed, you can start porting the top-level block in your function to Python with the same approach.

ADD COMMENTlink written 4.1 years ago by Laurent Gautier810

This method worked fine for me after a little modification:

from rpy2.robjects import r
keep_this = r("""
function(mSetSqFlt, mSet) {
  keep <- !(featureNames(mSetSqFlt) %in% ann450k$Name[ann450k$chr %in% c("chrX","chrY")])
  mSet <- mSet[keep,]
}""")

keep_this(msetsqfit, mset, ann450k)

r should me imported from rpy2.robjects no rpy2.objects and I had to put ann450k object as function input :)

Once again, thank you very much, your response was extremely helpful.

ADD REPLYlink written 4.1 years ago by ciemanek140
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 875 users visited in the last hour