Question: QC of Illumina 450K data from GEO with lumi
sea.array wrote, 5.6 years ago:

Dear all,

I am planning to work with several Illumina 450K methylation datasets from the Gene Expression Omnibus (GEO) database. I want to use an R package to do normalization and other QC steps, and to remove batch effects. I've tried lumi, methylumi and minfi.

The problem is that I get errors when reading the files available at GEO with lumi/methylumi/minfi. If I understand correctly, the input file for these packages is the output file of GenomeStudio (the Final Report). However, neither this file nor the original idat files are in GEO.

The files in the GEO database are:
1- one matrix with beta values for all individuals (series matrix)
2- one file with methylated and unmethylated probe signal intensities (in some cases p-values too)
3- RAW data containing: manifest_header_descriptions, csv, bpm files
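For the second kind of file (methylated and unmethylated signal intensities), beta values can be recomputed directly from the two signals. Below is a minimal Python sketch of that arithmetic, not the method any of the packages above uses internally. The column names (`ID_REF`, `<sample> Unmethylated signal`, `<sample> Methylated signal`), the tab separator and the sample name `S1` are assumptions for illustration, since each GEO submitter formats this file differently. The offset of 100 is the conventional Illumina value, and the M-value offset of 1 is likewise an assumed default.

```python
import csv
import io
import math


def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset); offset=100 is the conventional choice."""
    return meth / (meth + unmeth + offset)


def m_value(meth, unmeth, alpha=1.0):
    """M-value = log2((M + alpha) / (U + alpha)); alpha avoids log of zero."""
    return math.log2((meth + alpha) / (unmeth + alpha))


def read_signal_table(handle, sample, sep="\t"):
    """Return {probe_id: (beta, m_value)} for one sample.

    Column names are hypothetical; adjust them to the header of the
    actual GEO signal-intensity file.
    """
    reader = csv.DictReader(handle, delimiter=sep)
    result = {}
    for row in reader:
        unmeth = float(row[f"{sample} Unmethylated signal"])
        meth = float(row[f"{sample} Methylated signal"])
        result[row["ID_REF"]] = (beta_value(meth, unmeth), m_value(meth, unmeth))
    return result


# Tiny made-up table standing in for a downloaded GEO file.
example = (
    "ID_REF\tS1 Unmethylated signal\tS1 Methylated signal\n"
    "cg00000029\t900\t3000\n"
    "cg00000108\t4900\t0\n"
)
table = read_signal_table(io.StringIO(example), "S1")
for probe, (beta, mval) in table.items():
    print(probe, round(beta, 3), round(mval, 3))
```

Comparing the recomputed beta values against the series matrix is a quick way to check whether the deposited matrix was normalized or background-corrected before submission.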

My questions are:
1- How can I convert the files in GEO into the input file for lumi/methylumi/minfi to do the QC steps? Any preferences for packages?
2- In case I have to prepare the input file myself, where can I find a template of the GenomeStudio output file (including the COLOR_CHANNEL column)?
3- How can I combine different GEO datasets to perform a joint QC assessment?

Thank you for your help!

Charles Warden wrote, 5.2 years ago:

I've only tested those programs using .idat files as inputs. ArrayExpress sometimes provides the .idat files (and I think you can also get the .idat files for TCGA samples, which won't be in GEO), but the missing .idat files will be an issue when getting data from GEO.

COHCAP can provide QC stats from the FinalReport file, and you could specify the batches as pairing IDs to correct for batch effects in the statistical analysis (as a 2-way ANOVA, for example). It doesn't really do other types of normalization, although in my opinion the background correction and other normalization techniques within GenomeStudio should be sufficient.
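To specify batches when pooling several GEO series, each sample needs a batch label and all series need a common probe set. Here is a minimal Python sketch of that bookkeeping step only; it is not COHCAP's input format and it does no batch correction itself. The series names `GSE_A` and `GSE_B`, the sample names and the beta values are made up for illustration.

```python
def combine_series(series):
    """Merge several series into one table restricted to shared probes.

    `series` maps a series name to {sample_name: {probe_id: beta}}.
    Returns (probe_ids, matrix, batch) where matrix maps
    "series:sample" to a list of betas ordered like probe_ids, and
    batch maps the same key to its series name (the batch label).
    """
    shared = None
    for samples in series.values():
        for betas in samples.values():
            probes = set(betas)
            shared = probes if shared is None else shared & probes
    probe_ids = sorted(shared) if shared else []

    matrix = {}
    batch = {}
    for name, samples in series.items():
        for sample, betas in samples.items():
            key = f"{name}:{sample}"
            matrix[key] = [betas[p] for p in probe_ids]
            batch[key] = name
    return probe_ids, matrix, batch


# Hypothetical input: two series with partly overlapping probes.
series = {
    "GSE_A": {"A1": {"cg01": 0.10, "cg02": 0.80, "cg03": 0.55}},
    "GSE_B": {
        "B1": {"cg02": 0.75, "cg03": 0.60, "cg04": 0.20},
        "B2": {"cg02": 0.70, "cg03": 0.65, "cg04": 0.25},
    },
}
probe_ids, matrix, batch = combine_series(series)
print(probe_ids)
print(batch)
```

Restricting to shared probes matters because submitters often filter probes before depositing data, so two 450K series rarely list exactly the same IDs.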

You can either run COHCAP as a standalone program or as a Bioconductor package:

http://sourceforge.net/projects/cohcap/

http://bioconductor.org/packages/devel/bioc/html/COHCAP.html

I have a Protocol Exchange listing specifically for analyzing 450k data with the COHCAP Bioconductor package:

http://www.nature.com/protocolexchange/protocols/2965

The only difference is that you'll want to skip the .idat processing instructions and use the FinalReport.txt file in place of the "minfi.txt" file. I believe this should provide the instructions on how to get the FinalReport.txt file in the right format (except you should not export the detection p-values for the Bioconductor package; that should only be used for the standalone version): https://docs.google.com/uc?id=0B1xpw6_kQMKuVm1kS3V6dlJBbmc&export=download&revid=0B1xpw6_kQMKuQ25yS0RDQ3JlYWNKTEk0THRGZmxQQjVGTU40PQ

The Protocol Exchange listing also provides some templates and simple benchmarks for some other tools (ones that can define differentially methylated regions). I feel like there should be some way to import your FinalReport.txt file into minfi, but I can't give you specific instructions right now. If you aren't satisfied with COHCAP, you can also check whether RnBeads accepts the GenomeStudio output as an input format.
