Question

Methylation 450k data preprocessing using Python

1

Entering edit mode

7.7 years ago

ciemanek ▴ 140

Hello, I am going to analyze some data from Illumina 450k Methylation array, which are in .idat format. My supervisor wants me to work with Python but it's a new language for me, therefore I'm really lost. I am trying to use rpy2 tool to combine Python and R functions, but it seems to be out of my range. Currently I am working with minfi package and trying to remove some probes, eg.probes for sex chromosomes. In R it is easy job - I am finding indexes of probes which are NOT related to sex chromosomes using command:

keep <- !(featureNames(mSetSqFlt) %in% ann450k$Name[ann450k$chr %in%
                                                      c("chrX","chrY")])

and then I am replacing my old MethylSet with the new one, not containing unwanted probes:

mSet <- mSet[keep,]

But in Python I don't know how to extract only the data of interest from MethylSet. For now I have indexes of probes which I want to keep:

ann450k = minfi.getAnnotation('IlluminaHumanMethylation450kanno.ilmn12.hg19')    
tmp = biobase.featureNames(mSet)

fnames = []
for i in range(len(matches)):
       fnames.append(tmp[i])

names = dollar(ann450k, "Name") #dollar is in dictionary so it's basically chrs=ann450k$Name
chrs = dollar(ann450k, "chr") 

matches = [i for i in range(0,len(names)) if chrs[i]!="chrX" and chrs[i]!="chrY"]

namesNew = []
for i in range(len(matches)):
        namesNew.append(names[matches[i]])

idxs = [i for i, item in enumerate(fnames) if item in namesNew]

If someone could give me a piece of advice on how to process my data, I would be really grateful. Maybe there are some Python tools that would allow me to read and process .idat files? I haven't found any so far. My goal for now is to preprocess them with subset quantile normalization and SWAN algorithms to reconstruct pipeline used by the authors of experiment.

Kind regards, Agata

methylation python rpy2 R preprocessing • 3.2k views

ADD COMMENT • link updated 7.7 years ago by Laurent Gautier ▴ 810 • written 7.7 years ago by ciemanek ▴ 140

0

Entering edit mode

Can't you explain to your supervisor that you have to use the right tool for the right job, and that in this case R is far more convenient? Rpy2 can be a pain and generally not worth the trouble.

ADD REPLY • link 7.7 years ago by WouterDeCoster 47k

score 1 · Answer 1 · 2016-08-19

1

Entering edit mode

7.7 years ago

Laurent Gautier ▴ 810

Rpy2 lets you blend R and Python. You could start by encapsulating snippets of R into functions that you call from Python. For example:

from rpy2.objects import r
keep_this = r("""
function(mSetSqFlt, mSet) {
  keep <- !(featureNames(mSetSqFlt) %in% ann450k$Name[ann450k$chr %in% c("chrX","chrY")])
  mSet <- mSet[keep,]
}""")

keep_this(msetsqfit, mset)

Once you have this working, and this is needed, you can start porting the top-level block in your function to Python with the same approach.

ADD COMMENT • link 7.7 years ago by Laurent Gautier ▴ 810

0

Entering edit mode

This method worked fine for me after a little modification:

from rpy2.robjects import r
keep_this = r("""
function(mSetSqFlt, mSet) {
  keep <- !(featureNames(mSetSqFlt) %in% ann450k$Name[ann450k$chr %in% c("chrX","chrY")])
  mSet <- mSet[keep,]
}""")

keep_this(msetsqfit, mset, ann450k)

r should me imported from rpy2.robjects no rpy2.objects and I had to put ann450k object as function input :)

Once again, thank you very much, your response was extremely helpful.

ADD REPLY • link 7.7 years ago by ciemanek ▴ 140