Homology + Bioconductor
14.0 years ago
Mike Dewar ★ 1.6k

After a few days (3? seems like longer) of trying to patch Bioconductor's annotationTools package I'm now giving up. The various annotation files I'm trying to use have changed in formatting since annotationTools' authors wrote the code. Whoever assembles the Affy annotation files uses different annotations for different arrays (which I guess is fine and makes sense) but that makes mapping from one array to another via a common symbol sort of tricky.

Anyway, I'm now back at step 1, trying to think about mapping human expression data into the same 'space' as mouse expression data so that a tool I have developed using mouse expression can maybe say something about human genes.

My main question is this: Is there a package in Bioconductor, which isn't annotationTools, which will map a bioconductor expression set in one organism, via the NCBI HomoloGene db, onto a bioconductor expression set in another organism?

Assuming this doesn't exist, is there a 'standard' R-bioconductor way of building a mapping of all the probes (or all the genes) from one array to another? I've seen solutions using something called BioMart, but I'm keen not to leave the comfy world of bioconductor. The package biomaRt seems a bit over the top, as I think all I need is a little interface onto the HomoloGene file.

Ideally, what I would like as an output of this code is the expression level of all the orthologous probes from one array, where the orthology is in terms of, say, human->mouse.

Apologies for the lack of correct vocabulary, I feel I should have asked this question using set-builder notation rather than my suddenly lacking bio-english. A useful answer to this question would also be "your question makes no sense for these reasons" (then fill in the reasons).

r bioconductor homology gene orthologues • 6.2k views

I've written up the solution to this problem using biomaRt here: http://mikedewar.wordpress.com/2010/05/14/generating-homologues-using-biomart/ -- turns out biomaRt is awesome and easy to use!


I'd like to emphasise Alexandre's comment below, and point out that annotationTools isn't buggy; it just doesn't handle the arrays I have in front of me! I'm going to edit the post a bit to remove the implication that annotationTools is buggy.

14.0 years ago
Neilfws 49k

I'd also recommend BioMart and/or biomaRt. It's incredibly useful and if you're comfortable in R/Bioconductor, not difficult to learn. Here's a brief introduction.

The basic idea is that you have an object (e.g. a gene), described by some attributes, and you query using those attributes to get more attributes. So, for instance, you can get the probes for a gene given its HGNC symbol. And you can also fetch orthologs for that gene, and probes for those orthologs. Then you just link it all together. That could all be done in R, or you might want to move to a relational database. If you're comfortable using SQL queries, you can stay within R using the sqldf package, which permits SQL-style queries on data frames.
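A rough sketch of that query-and-link idea with biomaRt might look like the following. It is untested here and rests on assumptions: the helper name `human_to_mouse_links` is made up for illustration, and the attribute names (`affy_hg_u133_plus_2`, `mmusculus_homolog_ensembl_gene`) are taken from the Ensembl mart as I understand it and may differ between releases, so check them with `listAttributes()` first. It needs the biomaRt package and network access.

```r
# Hypothetical helper: human probes and mouse orthologs for a set of HGNC symbols.
# Requires the Bioconductor package biomaRt and a connection to Ensembl.
human_to_mouse_links <- function(symbols) {
  human <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")

  # Step 1: query by HGNC symbol to get gene IDs and array probes.
  # The probe attribute name is an assumption; confirm with listAttributes(human).
  probes <- biomaRt::getBM(
    attributes = c("hgnc_symbol", "ensembl_gene_id", "affy_hg_u133_plus_2"),
    filters    = "hgnc_symbol",
    values     = symbols,
    mart       = human
  )

  # Step 2: query by gene ID to get the mouse ortholog of each gene.
  orthologs <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "mmusculus_homolog_ensembl_gene"),
    filters    = "ensembl_gene_id",
    values     = unique(probes$ensembl_gene_id),
    mart       = human
  )

  # Step 3: link the two result tables on the shared gene ID.
  merge(probes, orthologs, by = "ensembl_gene_id")
}

# Example call (needs network access):
# links <- human_to_mouse_links("TP53")
```

Probes for the mouse orthologs would come from a similar `getBM()` call against the mouse dataset, filtered on the mouse gene IDs returned in step 2.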

As Andrew says, getting tabular output from other sources into R is not too hard. Basically you're looking at 5 tables/data frames: human genes (symbol, chromosome, strand, start, end), mouse genes (the same), human-mouse gene mapping, human probes (ID, chromosome, strand, start, end) and mouse probes (the same).
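To make the five-table idea concrete, here is a minimal base R sketch. Everything in it is an assumption for illustration: the toy rows are invented, the column names just follow the list above, and assigning a probe to a gene by "same chromosome and strand, probe interval inside gene interval" is one simple linking rule, not necessarily the one you would want in practice.

```r
# Toy stand-ins for the five tables; all values are invented for illustration.
human_genes <- data.frame(symbol = c("TP53", "BRCA1"), chromosome = c("17", "17"),
                          strand = c(-1, -1), start = c(100, 500), end = c(200, 900))
mouse_genes <- data.frame(symbol = c("Trp53", "Brca1"), chromosome = c("11", "11"),
                          strand = c(1, 1), start = c(1000, 3000), end = c(1500, 3800))
gene_map <- data.frame(human_symbol = c("TP53", "BRCA1"),
                       mouse_symbol = c("Trp53", "Brca1"))
human_probes <- data.frame(id = c("h1", "h2", "h3"), chromosome = c("17", "17", "17"),
                           strand = c(-1, -1, -1),
                           start = c(120, 600, 950), end = c(145, 625, 975))
mouse_probes <- data.frame(id = c("m1", "m2"), chromosome = c("11", "11"),
                           strand = c(1, 1), start = c(1100, 3100), end = c(1125, 3125))

# Assign each probe to the gene whose interval contains it
# (same chromosome and strand). Probes outside every gene are dropped.
probes_to_genes <- function(probes, genes) {
  hits <- merge(probes, genes, by = c("chromosome", "strand"),
                suffixes = c("_probe", "_gene"))
  inside <- hits$start_probe >= hits$start_gene & hits$end_probe <= hits$end_gene
  hits[inside, c("id", "symbol")]
}

human_link <- setNames(probes_to_genes(human_probes, human_genes),
                       c("human_probe", "human_symbol"))
mouse_link <- setNames(probes_to_genes(mouse_probes, mouse_genes),
                       c("mouse_probe", "mouse_symbol"))

# Chain the joins: human probe -> human gene -> mouse gene -> mouse probe.
probe_map <- merge(merge(human_link, gene_map, by = "human_symbol"),
                   mouse_link, by = "mouse_symbol")
print(probe_map)
```

Each row of `probe_map` pairs a human probe with a mouse probe through the gene mapping, so its two probe columns can be used to index the rows of the two expression matrices in matching order. In the toy data, probe `h3` falls outside both genes and so does not appear.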


Thanks Neil. This is really helpful. I guess it's time to break out of R for a while!


I should add that not all probesets are in biomart, hence the possibility of requiring tabular files from other sources.

14.0 years ago
Andrew Su 4.9k

If you're willing to believe Affy's annotation pipelines, they post mappings between different species' chip types on their netaffx page. For example, on the U133 Plus 2.0 page, you can find a link to HG-U133_Plus_2 Orthologs/Homologs...

... and BioMart is awesome for identifier translation. Reasonably easy to learn, so IMHO it's well worth the time to learn to use it.

(Sorry, these suggestions do require you to venture out of the R/bioconductor world, but the output of either option above is easily imported into R...)


Right. I'm quite happy outside bioconductor; it's just that I've spent so much effort trying to get happy inside bioconductor that any deviation now feels like I'm missing something. However, with your encouragement I will embark on BioMart.

Also, whenever anyone mentions Affy data (I've been in the field 4 months now and this has been pretty consistent for those 4 months), they always preface their comments with caveats like "if you're willing to believe". Are these annotation files not good? Do better ones exist?


For a long while, Affy's informatics left a lot to be desired (hence the explosion of all the summarization methods to handle Affy data). So it's fun to needle them based on that history. Having said that, my comment above really reflects the vagaries and assumptions you have to use when annotating sequences, especially orthologs between species. The devil is in the details! But I would happily use those files as an easy starting point...

13.9 years ago

Follow up on annotationTools: There is no bug in annotationTools. Moreover, and in contrast to what was said here, column indices into annotation files are not hardcoded in annotationTools; they can actually be set according to the specific annotation file used. The defaults for column indices are matched to Affymetrix annotation files for 3' Gene Expression arrays (see documentation at http://bioconductor.org/packages/release/bioc/html/annotationTools.html).

However, Affymetrix uses a different annotation file format for Exon and Gene Level arrays (compared to 3' Gene Expression arrays). Mike was interested in mapping from a mouse Gene Level array to a human 3' Gene Expression array. The change in annotation files for Gene Level arrays is such that the functions in annotationTools do not handle these files properly.

In conclusion, regarding mapping across Affymetrix arrays using Affymetrix annotation files, annotationTools handles 3' Gene Expression arrays properly but does not presently handle annotation files for Gene Level arrays. As a result, annotationTools cannot map across these two formats at the moment.

Alex


I'd like to emphasise Alexandre's comment here!


I'd like to emphasise Alexandre's comment here. He is referring to my original posting which implied that annotationTools was buggy, whereas instead it was developed to do a slightly different job to the one I was trying to do! You know what they say about the craftsmen who blame their tools? That they post questions on StackOverflow too quickly!

14.0 years ago
Chris Fields ★ 2.2k

There are also BioC InParanoid mappings for Human and Mouse that help with this. Haven't used them myself, yet.
