Question: Anyone Used/Using Pentaho Kettle For Data Integration?
1
8.0 years ago, Blunders (1.1k) wrote:

If so, please post a review and describe how you use it.

If you've used another ETL (extract, transform and load) system, please feel free to post.

database • 3.0k views
modified 4 months ago by Biostar ♦♦ (20) • written 8.0 years ago by Blunders (1.1k)
2

Istvan, I disagree. I wish there were more BI (i.e. those kinds of tools and thinking) in bioinformatics. You'd have higher-quality science. Most pharma companies use some very high-quality data warehouses (sometimes built from open source tools), and they function much better than those in most academic research.

written 8.0 years ago by Mndoci (1.2k)

I think you will find that so-called "business intelligence" and bioinformatics have little in common.

written 8.0 years ago by Istvan Albert ♦♦ (78k)

Istvan Albert: Is Perl the most common method for extracting, transforming and loading data? Not looking to use Pentaho, just Kettle for data integration automation.

written 8.0 years ago by Blunders (1.1k)

ETL sounds like something more applicable to medical informatics. But I imagine some of the bigger repositories do use something like that.

written 7.9 years ago by Jeremy Leipzig (18k)
4
7.7 years ago, Richard Smith (400), Cambridge, UK, wrote:

I don't know about ETL tools from other domains but this is the process we go through when loading data into InterMine.

InterMine provides a set of scripts for reading from many standard biological data formats. These read XML, flat files or databases and translate the data into the InterMine model (based on the Sequence Ontology). You can also add your own sources, each of which provides a script plus any new classes/fields you want to add to the data model.

You configure the sources you wish to include in your data warehouse and each one is loaded in turn; integration and conflict resolution are both configurable.
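
To make the "load each source in turn, with configurable conflict resolution" idea concrete, here is a rough sketch in plain Python. It is not InterMine code or InterMine's configuration format: the `integrate` function, the `priority` mapping, the source names and the records are all made up for illustration. It assumes conflicts are settled per field by a ranked list of sources.

```python
# Hypothetical illustration only: none of these names come from InterMine.

def integrate(sources, load_order, priority):
    """Load each source in turn, merging records that share an "id".

    sources:    {source_name: [record_dict, ...]}
    load_order: source names, in the order they are loaded
    priority:   {field: [source names, most trusted first]}
    """
    warehouse = {}  # id -> merged record
    origin = {}     # (id, field) -> source that supplied the current value
    for name in load_order:
        for record in sources[name]:
            key = record["id"]
            merged = warehouse.setdefault(key, {"id": key})
            for field, value in record.items():
                if field == "id":
                    continue
                current = origin.get((key, field))
                ranking = priority.get(field, [])
                if current is None or _wins(name, current, ranking):
                    merged[field] = value
                    origin[(key, field)] = name
    return warehouse


def _wins(new, old, ranking):
    """True if `new` outranks `old`; unranked sources never displace a value."""
    if new not in ranking:
        return False
    if old not in ranking:
        return True
    return ranking.index(new) < ranking.index(old)


# Made-up example data: two sources disagree on the "name" field.
sources = {
    "curated_db": [{"id": "G1", "name": "Protein A", "length": 430}],
    "local_flatfile": [{"id": "G1", "name": "protA", "organism": "E. coli"}],
}
priority = {"name": ["curated_db", "local_flatfile"]}

warehouse = integrate(sources, ["local_flatfile", "curated_db"], priority)
print(warehouse["G1"])
```

Because the winner for a contested field comes from the ranking rather than from load order, loading the two sources in the opposite order gives the same merged record.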

written 7.7 years ago by Richard Smith (400)

@Richard Smith: +1 Thanks for sharing. Accepting your post as the answer since it's based on real-world experience. Thanks!

written 7.7 years ago by Blunders (1.1k)
2
8.0 years ago, Alastair Kerr (5.2k), The University of Edinburgh, UK, wrote:

Why use business tools for data integration when there are better alternatives such as BioMart, InterMine and DAS?

written 8.0 years ago by Alastair Kerr (5.2k)

@Alastair Kerr: As far as I'm able to tell, none of the products you linked to has an ETL feature; they only import files/XML, which is not ETL. What am I missing?

written 8.0 years ago by Blunders (1.1k)

@Alastair Kerr: As far as I'm able to tell, none of the products you linked to has an ETL feature. They do have biofile/XML import functions, but that is not ETL. What am I missing?

written 8.0 years ago by Blunders (1.1k)

Extraction and loading are normally done with wrappers that depend on the data source: each project will have its own associated scripts. The Bio-* projects (e.g. BioRuby, Biopython, BioPerl, BioJava) can take this role via their modules, including transforming the data. Depending on the data source, standardisation is often achieved via a reference sequence, typically a genome build or a UniProt reference.
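
As a rough sketch of what such a wrapper script might look like, here is a tiny extract/transform/load pass using only the Python standard library. The hand-rolled FASTA reader stands in for a Bio-* parser (in practice something like Biopython's `SeqIO.parse` would do the extraction); the sample records, the `sequence` table and the function names are all made up for illustration.

```python
import sqlite3

# Made-up sample input standing in for a real data source.
FASTA_TEXT = """\
>seq1 example record
ACGTAC
GTAA
>seq2 another record
ttgaca
"""


def extract(text):
    """Extract: yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def transform(records):
    """Transform: split the ID from the description and upper-case residues."""
    for header, seq in records:
        seq_id, _, description = header.partition(" ")
        seq = seq.upper()
        yield seq_id, description, seq, len(seq)


def load(rows, conn):
    """Load: write the transformed rows into a (hypothetical) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence "
        "(id TEXT PRIMARY KEY, description TEXT, residues TEXT, length INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO sequence VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(FASTA_TEXT)), conn)
rows = conn.execute("SELECT id, length FROM sequence ORDER BY id").fetchall()
print(rows)
```

Keeping the three stages as separate generators means a different source only needs a new `extract`, while the transform and load steps can be reused.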

written 8.0 years ago by Alastair Kerr (5.2k)