DelimitedReader is a reader that lets you query CSV (Comma-Separated Values) files much like a database.
It lets you set the following conditions:
- The row at which reading starts
- The columns to read
- The valid data pattern for each column, e.g. number, text, non-empty, positive, etc.
- The row at which reading stops
You can use regular expressions in the above settings.
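To illustrate how a per-column validity check based on regular expressions could work, here is a minimal standalone sketch. It is not DelimitedReader's implementation: the class RowPatternSketch and the method isValidRow are hypothetical names, and the rule that a null pattern accepts any value is an assumption made for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not part of DelimitedReader or LeoTask.
class RowPatternSketch {

    // Returns true when every cell fully matches the pattern at the same index.
    // Assumption: a null pattern places no constraint on its column.
    static boolean isValidRow(String[] row, String[] patterns) {
        for (int i = 0; i < patterns.length; i++) {
            if (patterns[i] == null) {
                continue; // no constraint on this column
            }
            // A constrained column that is missing from the row makes the row invalid.
            if (i >= row.length || !Pattern.matches(patterns[i], row[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Column 0: any value; column 1: non-empty text; column 2: positive integer.
        String[] patterns = { null, ".+", "[1-9][0-9]*" };
        System.out.println(isValidRow(new String[] { "mm1", "Leo", "25" }, patterns)); // true
        System.out.println(isValidRow(new String[] { "mm2", "", "18" }, patterns));    // false
        System.out.println(isValidRow(new String[] { "mm3", "Ann", "-4" }, patterns)); // false
    }
}
```

Pattern.matches requires the whole cell to match, so "non-empty" and "positive" can each be written as one short expression per column.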
According to the conditions set, DelimitedReader can either read the valid rows in a CSV file one by one or return all of them together in a dataset.
You can put all your data into a single CSV file and then use DelimitedReader to fetch the sections you need. This is particularly useful when two CSV datasets share related fields: DelimitedReader helps you find the connection between the two datasets and read the related data.
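The idea of connecting two datasets through a shared field can be sketched in plain Java, independent of DelimitedReader. The class LinkedSectionsSketch and its methods findKey and rowsWithKey are hypothetical names, and the in-memory rows are illustrative sample values; this is one simple way to express the two-step lookup, not how the library does it internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of DelimitedReader or LeoTask.
class LinkedSectionsSketch {

    // Returns the value in column keyCol of the first row whose column matchCol
    // equals value, or null when no row matches.
    static String findKey(List<String[]> rows, int matchCol, String value, int keyCol) {
        for (String[] row : rows) {
            if (row[matchCol].equals(value)) {
                return row[keyCol];
            }
        }
        return null;
    }

    // Collects every row whose column keyCol equals key.
    static List<String[]> rowsWithKey(List<String[]> rows, int keyCol, String key) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            if (row[keyCol].equals(key)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // First dataset: id, name. Second dataset: id, date, mood.
        List<String[]> people = Arrays.<String[]>asList(
                new String[] { "mm1", "Leo" },
                new String[] { "mm2", "Emily" });
        List<String[]> visits = Arrays.<String[]>asList(
                new String[] { "mm1", "01-Apr", "Happy" },
                new String[] { "mm2", "03-Apr", "Sad" },
                new String[] { "mm2", "05-Apr", "Happy" });

        // Step 1: resolve the shared key. Step 2: fetch the rows that carry it.
        String id = findKey(people, 1, "Emily", 0);
        for (String[] row : rowsWithKey(visits, 0, id)) {
            System.out.println(String.join(",", row));
        }
    }
}
```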
The reader is part of LeoTask, a lightweight, productive, and reliable MapReduce framework for computational research on a multicore computer.
Here are the source code and usage demo of the DelimitedReader.
----Example usage: Get data from related data sets----------
The CSV file "dreader.csv":
id | name | age |
mm1 | Leo | 25 |
mm2 | Emily | 18 |
id | date | blood pressure | heart rate | mood |
mm1 | 01-Apr | 100 | 50 | Happy |
mm1 | 05-Apr | 120 | 60 | Sad |
mm2 | 01-Apr | 80 | 40 | |
mm2 | 03-Apr | 90 | | Sad |
mm2 | 05-Apr | | 50 | Happy |
The code to get the clinical data for "Emily" from the CSV file:
DelimitedReader dr = new DelimitedReader("dreader.csv");
// Step 1: find the row whose "name" is "Emily" and take its id.
dr.prep(null, new String[] { "id", "name" });
dr.setValidRowPattern(new String[] { null, "Emily" });
String[] row = dr.readValidRow();
String id = row[0];
// Step 2: read all clinical data rows that carry that id.
dr.prep(null, new String[] { "id", "date", "mood", "blood pressure", "heart rate" });
dr.setValidRowPattern(new String[] { id });
DataTable dt = dr.readValidDataTable();
The resulting data table dt:
id | date | mood | blood pressure | heart rate |
mm2 | 01-Apr | | 80 | 40 |
mm2 | 03-Apr | Sad | 90 | |
mm2 | 05-Apr | Happy | | 50 |
Note: DelimitedReader can return the columns in a different order from the one in the file. In this example, "mood" is moved before "blood pressure".
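The column reordering described in the note can be pictured as a lookup of each requested name in the header row. The following is a minimal standalone sketch under that assumption; ColumnOrderSketch, columnIndexes, and project are hypothetical names and do not reflect DelimitedReader's actual code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of DelimitedReader or LeoTask.
class ColumnOrderSketch {

    // Looks up each wanted column name in the header and returns its index.
    static int[] columnIndexes(String[] header, String[] wanted) {
        List<String> names = Arrays.asList(header);
        int[] idx = new int[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            idx[i] = names.indexOf(wanted[i]);
            if (idx[i] < 0) {
                throw new IllegalArgumentException("Unknown column: " + wanted[i]);
            }
        }
        return idx;
    }

    // Copies the cells of one row into the requested column order.
    static String[] project(String[] row, int[] idx) {
        String[] out = new String[idx.length];
        for (int i = 0; i < idx.length; i++) {
            out[i] = row[idx[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] header = { "id", "date", "blood pressure", "heart rate", "mood" };
        String[] wanted = { "id", "date", "mood", "blood pressure", "heart rate" };
        String[] row = { "mm1", "05-Apr", "120", "60", "Sad" };

        int[] idx = columnIndexes(header, wanted);
        System.out.println(Arrays.toString(project(row, idx)));
        // prints [mm1, 05-Apr, Sad, 120, 60]
    }
}
```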
Here is the code to print the obtained data table dt:
log(dt.getColNames());
for (int i = 0, mi = dt.nRows(); i < mi; i++) {
    log(dt.getRow(i));
}
Here is the output:
[id,date,mood,blood pressure,heart rate]
[mm2,01-Apr, ,80,40]
[mm2,03-Apr,Sad,90,]
[mm2,05-Apr,Happy,,50]
--------------------------------------------
For more details, please refer to this more comprehensive demo code.