Anyone Used/Using Pentaho Kettle For Data Integration?
11.7 years ago
Blunders ★ 1.1k

If so, please answer with a review and how you use it.

If you've used another ETL (extract, transform and load) system, please feel free to post.


database • 3.8k views

Istvan, I disagree. I wish there were more BI (i.e. those kinds of tools and thinking) in bioinformatics. You'd have higher-quality science. Most pharma companies use some very high-quality data warehouses (sometimes built from open source tools), and they function far better than most academic research.


I think you will find that so-called "business intelligence" and bioinformatics have little in common.


Istvan Albert: Is Perl the most common method for extracting, transforming and loading data? Not looking to use Pentaho, just Kettle for data integration automation.


ETL sounds like something more applicable to medical informatics. But I imagine some of the bigger repositories do use something like that.

11.4 years ago
Richard Smith ▴ 400

I don't know about ETL tools from other domains but this is the process we go through when loading data into InterMine.

InterMine provides a set of scripts for reading from many standard biological data formats. These read XML, flat files or databases and translate data into the InterMine model (based on Sequence Ontology). You can also add your own sources which provide a script and any new classes/fields you want to add to the data model.

You configure the sources you wish to include in your data warehouse, and each one is loaded in turn. Integration and conflict resolution are both configurable.
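To make the load-in-turn and conflict-resolution idea concrete, here is a minimal Python sketch. It is not InterMine's API or configuration format: the `gene` table, the `build_warehouse` helper, the source names `curated` and `bulk`, and the toy records are all hypothetical. It only shows the general pattern of translating each source into one shared model and letting a configured source priority decide which value wins.

```python
import csv
import io
import sqlite3

# Toy tab-separated inputs standing in for two data sources (made-up records).
BULK = "id\tsymbol\nfbgn0001\tzen-old\nfbgn0002\teve\n"
CURATED = "id\tsymbol\nFBgn0001\tzen\n"


def extract(text):
    """Extract: read tab-separated text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def transform(rows, source):
    """Transform: map source rows onto the shared (gene_id, symbol, source) model."""
    return [(r["id"].strip().upper(), r["symbol"].strip(), source) for r in rows]


def load(conn, records, priority):
    """Load: insert new genes; on conflict keep the value from the higher-priority source."""
    rank = {name: i for i, name in enumerate(priority)}  # lower index wins
    for gene_id, symbol, source in records:
        row = conn.execute(
            "SELECT source FROM gene WHERE gene_id = ?", (gene_id,)
        ).fetchone()
        if row is None:
            conn.execute("INSERT INTO gene VALUES (?, ?, ?)", (gene_id, symbol, source))
        elif rank[source] < rank[row[0]]:
            conn.execute(
                "UPDATE gene SET symbol = ?, source = ? WHERE gene_id = ?",
                (symbol, source, gene_id),
            )


def build_warehouse(sources, priority):
    """Load each configured source in turn into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT, source TEXT)")
    for name, text in sources:
        load(conn, transform(extract(text), name), priority)
    return conn.execute(
        "SELECT gene_id, symbol, source FROM gene ORDER BY gene_id"
    ).fetchall()


if __name__ == "__main__":
    for row in build_warehouse([("bulk", BULK), ("curated", CURATED)], ["curated", "bulk"]):
        print(row)
```

Because the priority list, not the load order, decides conflicts, loading the sources in either order gives the same result in this sketch.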


@Richard Smith: +1 Thanks for sharing. Accepting your post as the answer since it's based on real-world experience. Thanks!

11.7 years ago

Why use business tools for data integration when there are better alternatives such as BioMart, InterMine and DAS?


@Alastair Kerr: As far as I'm able to tell, none of the products you linked to has an ETL feature; they only import files/XML, which is not ETL. What am I missing?


@Alastair Kerr: As far as I'm able to tell, none of the products you linked to has an ETL feature, but they do have biofile/XML import functions, which is not ETL. What am I missing?


Extraction and loading are normally done with wrappers that depend on the data source: each project will have its own associated scripts. The Bio-* projects (e.g. BioRuby, Biopython, BioPerl, BioJava) can take this role via their modules, including transforming the data. Depending on the data source, standardisation is usually achieved via a reference sequence, typically a genome build or a UniProt reference.
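As a rough illustration of such a wrapper, the Python sketch below parses FASTA text, normalises each record to a reference accession, and loads it into SQLite. It uses only the standard library so it runs anywhere; in practice a Bio-* parser (for example Biopython's `SeqIO.parse`) would replace the hand-rolled `parse_fasta`. The `protein` table, the function names, and the toy sequences are assumptions made up for the example.

```python
import sqlite3

# Toy FASTA text (made-up sequences); the first header follows the
# UniProt "db|accession|entry_name" layout, the second is a local identifier.
FASTA = ">sp|P00001|TOY1_HUMAN toy entry\nmkvl\nAAG\n>local_seq_2\nMTT\n"


def parse_fasta(text):
    """Extract: yield (header, sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def to_record(header, sequence):
    """Transform: key each record on its reference accession, uppercase the sequence."""
    first = header.split()[0]
    parts = first.split("|")
    # "db|accession|name" headers expose the accession; otherwise use the raw id.
    accession = parts[1] if len(parts) >= 3 else first
    return accession, sequence.upper(), len(sequence)


def load(conn, records):
    """Load: upsert records into a table keyed on the accession."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS protein "
        "(accession TEXT PRIMARY KEY, sequence TEXT, length INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO protein VALUES (?, ?, ?)", records)


def run(text):
    """Run the three steps against an in-memory database and return the rows."""
    conn = sqlite3.connect(":memory:")
    load(conn, (to_record(h, s) for h, s in parse_fasta(text)))
    return conn.execute(
        "SELECT accession, sequence, length FROM protein ORDER BY accession"
    ).fetchall()


if __name__ == "__main__":
    for row in run(FASTA):
        print(row)
```

Keying the table on the reference accession is what lets records from different sources line up, which is the standardisation step described above.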
