Question: EPIC array annotation
10 months ago by
bruce.moran600 wrote:

Hi all,

I have analysed EPIC array data, but colleagues are concerned that the annotation is massively incomplete: 56% of probes have no UCSC 'relation to CpG island' data, and 25% have no gene annotated. This seems crazy to me, but Illumina have confirmed it. They say a lot of users annotate from public resources instead.
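
For anyone wanting to check percentages like these on their own copy of the manifest, here is a minimal Python sketch. It assumes the manifest is available as CSV with any preamble lines above the column header removed, and that the columns are named `UCSC_RefGene_Name` and `Relation_to_UCSC_CpG_Island`; the probe IDs and gene names below are made up for illustration.

```python
import csv
import io

# Hypothetical four-probe excerpt; a real manifest has far more rows.
# Column names are assumed to match the manifest's own headers.
SAMPLE_MANIFEST = """\
IlmnID,UCSC_RefGene_Name,Relation_to_UCSC_CpG_Island
cg_example_1,GENE_A,Island
cg_example_2,,
cg_example_3,GENE_B,
cg_example_4,GENE_C;GENE_C,N_Shore
"""

def missing_fraction(rows, column):
    """Return the fraction of rows whose `column` is empty or whitespace."""
    rows = list(rows)
    if not rows:
        return 0.0
    blank = sum(1 for row in rows if not (row.get(column) or "").strip())
    return blank / len(rows)

rows = list(csv.DictReader(io.StringIO(SAMPLE_MANIFEST)))
for column in ("Relation_to_UCSC_CpG_Island", "UCSC_RefGene_Name"):
    print(f"{column}: {missing_fraction(rows, column):.0%} blank")
```

To run it on a real file, replace the `io.StringIO(...)` argument with an open file handle.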

I presume others have come across this? Any chance anyone has a full(er) annotation? Or even ideas on where to source/create?

Thanks, Bruce.

methylation epic array
10 months ago by
andrew.j.skelton735.6k wrote:

The annotation that comes with minfi's workflow is generally pretty complete, but as with anything, it has gaps. The manifest that minfi uses is here. Check out the annotation there and see if it's as incomplete as Illumina have suggested to you.

written 10 months ago by andrew.j.skelton735.6k

Thanks Andrew, I checked this out and it uses the 'B2' Illumina annotation, but any 'unannotated' CpGs are designated as 'Open_Sea'. Strange that Illumina just leave the field blank.
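
The difference between a blank field and an 'Open_Sea' label can be illustrated with a short Python sketch. It assumes the only change is relabelling empty values; the exact label string is taken from the description above and may be spelled differently in a given annotation package, and the example values are hypothetical.

```python
from collections import Counter

# Label used for probes with no island relation. The spelling is an
# assumption based on the post above; check it against your annotation.
DEFAULT_LABEL = "Open_Sea"

def fill_open_sea(relation):
    """Return the relation unchanged, or DEFAULT_LABEL if it is blank."""
    value = (relation or "").strip()
    return value if value else DEFAULT_LABEL

# Hypothetical 'relation to CpG island' values, two of them blank.
relations = ["Island", "", "N_Shore", "", "S_Shelf"]
counts = Counter(fill_open_sea(r) for r in relations)
print(counts)
```

Counting the categories before and after the fill shows how many probes were blank in the original manifest.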

For anyone stumbling across this looking for help, I modified this tutorial, the script for which is here (NB this is for hg19).

Ultimately, I used the Illumina B4 annotation, together with the extra 977 probes that were removed for chips starting after 201172200001; you can find info on that here.
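
One way to combine a current manifest with a table of probes that were dropped from it is sketched below in Python. It assumes both tables have already been loaded into dictionaries keyed by probe ID; the function name, probe IDs, and fields are hypothetical, and this is not necessarily how the linked script does it.

```python
def merge_manifests(current, removed):
    """Combine two {probe_id: annotation} tables; `current` wins on clashes."""
    merged = dict(removed)
    merged.update(current)
    return merged

# Hypothetical stand-ins for the current manifest and the removed probes.
current = {
    "cg_example_1": {"gene": "GENE_A"},
    "cg_example_2": {"gene": ""},
}
removed = {
    "cg_example_2": {"gene": "OLD"},
    "cg_example_3": {"gene": "GENE_C"},
}

merged = merge_manifests(current, removed)
print(sorted(merged))
```

Letting the current manifest take precedence means a removed-probe entry is only used when the probe is absent from the newer file.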

written 10 months ago by bruce.moran600

I was trying to find out all options about annotating my data as well - I've got TCGA data that is not annotated yet.

I was unsure about the manifest files some packages used, but did not know how to proceed. Your script seems very helpful, but is it solving your problem? I haven't found the time to read through it, and I'm not that proficient in your coding languages, so it'll take me a bit before I get what's going on. In your question, you say that 25% of CpGs without an annotated gene seems crazy, but the description from Illumina just states that the EPIC array covers CpG islands, genes and enhancers. So it wouldn't surprise me if some CpGs were located in intergenic regions. Or is 25% really that much?

Basically, what I'm asking is: are you satisfied with the results from your own script, and why?

modified 10 months ago • written 10 months ago by mathias.heydt90

Hi Mathias,

After going through the tutorial, I am reasonably happy that the annotation from Illumina is as good as you can get, unless you have a specific hypothesis about the biology of your experiment and so need to annotate some features not included in the Illumina manifest.

I only really looked into this because Illumina tech support stated to me that other groups were using annotations they had made from external resources, and the Illumina manifest/annotation was 'conservative'. I don't think this is really the case after looking into it. I think the Illumina annotation is about as much info as is available for the fields they include.

Basically: I used the Illumina annotation because it is almost the same as the one from my script for the fields in which we are interested. Also, the people I am working with are happy with that.
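
A minimal Python sketch of that kind of comparison, assuming both annotations are held as dictionaries keyed by probe ID and compared on a single field; the function name, field name, and example records are hypothetical.

```python
def agreement(annotation_a, annotation_b, field):
    """Fraction of shared probes whose `field` matches in both annotations."""
    shared = annotation_a.keys() & annotation_b.keys()
    if not shared:
        return 0.0
    same = sum(
        1
        for probe in shared
        if annotation_a[probe].get(field, "") == annotation_b[probe].get(field, "")
    )
    return same / len(shared)

# Hypothetical records standing in for a vendor and a custom annotation.
vendor = {
    "cg_example_1": {"gene": "GENE_A"},
    "cg_example_2": {"gene": ""},
    "cg_example_3": {"gene": "GENE_C"},
}
custom = {
    "cg_example_1": {"gene": "GENE_A"},
    "cg_example_2": {"gene": "GENE_B"},
    "cg_example_3": {"gene": "GENE_C"},
    "cg_example_4": {"gene": "GENE_D"},
}

print(f"gene agreement: {agreement(vendor, custom, 'gene'):.0%}")
```

Only probes present in both annotations are counted, so probes unique to one source do not affect the score.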

written 10 months ago by bruce.moran600
