I am trying to create a script/pipeline that will perform a form of pathway analysis. The overall idea is to automate the extraction of all genes in a given biological pathway from any one of a few large online resources (KEGG, Reactome and WikiPathways). This is fairly challenging on its own, as each resource is curated differently, so the genes listed in one resource will differ slightly from those in the others.
The real challenge, however, is the next stage of the pipeline, which involves defining a specific "end point" of the pathway. Obviously this isn't exactly how things work, as I doubt many pathways have a specific point where they simply stop, but the idea is that there will be some final gene/protein (or even metabolite) that is the "end" of that pathway.
Doing this manually is easy enough, of course: you just look at the online maps to see which genes are at the ends of the pathways. Automating it, however, currently seems impossible. Specifically, with the resources I mentioned, when you extract the genes from a pathway you lose all sense of order and hierarchy among them.
Ideally, I'd see the output as an ordered list of genes/transcripts, with genes further down the pathway lower on the list.
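To make the goal concrete, here is a minimal sketch of the kind of output I have in mind. It assumes the pathway could somehow be obtained as a list of directed (upstream, downstream) edges, which is exactly the part I can't get from a flat gene list. The gene names and the `order_pathway` helper are made up for illustration, and it uses Python's standard-library `graphlib` (Python 3.9+) for a topological sort. Real pathways often contain feedback loops, which this sketch does not handle.

```python
from graphlib import TopologicalSorter

# Hypothetical toy edges as (upstream, downstream) pairs.
# These names are placeholders, not taken from any real pathway.
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_B", "GENE_D"),
    ("GENE_C", "GENE_D"),
]

def order_pathway(edges):
    """Return (ordered_genes, end_points) for a directed, acyclic edge list."""
    predecessors = {}        # gene -> set of genes directly upstream of it
    has_downstream = set()   # genes that feed into at least one other gene
    for upstream, downstream in edges:
        predecessors.setdefault(upstream, set())
        predecessors.setdefault(downstream, set()).add(upstream)
        has_downstream.add(upstream)
    # static_order() raises graphlib.CycleError if the edges contain a loop.
    ordered = list(TopologicalSorter(predecessors).static_order())
    # "End points" here are simply genes with nothing downstream of them.
    end_points = [gene for gene in ordered if gene not in has_downstream]
    return ordered, end_points

ordered, end_points = order_pathway(edges)
print(ordered)      # upstream genes first, downstream genes last
print(end_points)   # genes with no outgoing edges
```

With these toy edges, GENE_A comes first and GENE_D is the only end point; the relative order of GENE_B and GENE_C is arbitrary, since neither is upstream of the other.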
Does anyone know of a method for doing this? Is there an online resource or package (either for R or maybe Bash/Python) that would allow the genes from a pathway to be ordered?