DelimitedReader is a reader that lets you query CSV (Comma-Separated Values) files much like a database.
It enables you to set the following conditions:
- The row at which reading starts
- The columns to read
- The valid data pattern for each column, e.g. number, text, non-empty, or positive
- The row at which reading stops
You can use regular expressions in the above settings.
According to the conditions set, DelimitedReader can either read the valid rows in a CSV file one by one or return all of them together in a dataset.
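To make the idea of per-column validity patterns more concrete, here is a minimal sketch in plain Java using java.util.regex. The class name, the four regexes, and the isValidRow helper are all assumptions made for illustration; they are not part of DelimitedReader's API, and the exact regex dialect DelimitedReader accepts is not stated here. Treating a null pattern as "no constraint" mirrors how the usage example further down passes null for a column.

```java
import java.util.regex.Pattern;

// Illustrative only: these regexes and this helper are assumptions made for the
// example, not part of DelimitedReader's API.
class ColumnPatternSketch {
    static final String NUMBER = "-?[0-9]+(\\.[0-9]+)?"; // e.g. 25, -3, 1.5
    static final String POSITIVE_INT = "[1-9][0-9]*";     // e.g. 18, but not 0 or -4
    static final String TEXT = "[A-Za-z ]+";              // letters and spaces only
    static final String NON_EMPTY = ".+";                 // at least one character

    /** A row is valid when every non-null pattern fully matches its cell. */
    static boolean isValidRow(String[] row, String[] patterns) {
        for (int i = 0; i < patterns.length; i++) {
            if (patterns[i] == null) {
                continue; // null means "no constraint on this column"
            }
            if (i >= row.length || !Pattern.matches(patterns[i], row[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] patterns = { NON_EMPTY, TEXT, POSITIVE_INT };
        // The first row satisfies all three patterns; the second has an empty age.
        System.out.println(isValidRow(new String[] { "mm1", "Leo", "25" }, patterns)); // true
        System.out.println(isValidRow(new String[] { "mm2", "Emily", "" }, patterns)); // false
    }
}
```

Pattern.matches requires the whole cell to match, so "25a" would be rejected by POSITIVE_INT rather than accepted on a partial match.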
You can put all your data into one single CSV file and then use DelimitedReader to fetch the data sections you need. This is especially useful when two CSV datasets share related fields: DelimitedReader helps you follow the connection between the two datasets and read the related data.
The reader is part of LeoTask, a lightweight, productive, and reliable MapReduce framework for computational research on a multicore computer.
Here are the source code and usage demo of DelimitedReader.
----Example usage: Get data from related data sets----------
The CSV file dreader.csv:
id,name,age
mm1,Leo,25
mm2,Emily,18
id,date,blood pressure,heart rate,mood
mm1,01-Apr,100,50,Happy
mm1,05-Apr,120,60,Sad
mm2,01-Apr,80,40,
mm2,03-Apr,90,,Sad
mm2,05-Apr,,50,Happy
The code to get the clinical data for "Emily" from the CSV file:
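DelimitedReader dr = new DelimitedReader("dreader.csv");
// Step 1: select the "id" and "name" columns and find Emily's row.
dr.prep(null, new String[] { "id", "name" });
// null leaves "id" unconstrained; "name" must match "Emily".
dr.setValidRowPattern(new String[] { null, "Emily" });
String[] row = dr.readValidRow();
String id = row[0];
// Step 2: select the clinical columns and keep only rows with Emily's id.
dr.prep(null, new String[] { "id", "date", "mood", "blood pressure", "heart rate" });
dr.setValidRowPattern(new String[] { id });
DataTable dt = dr.readValidDataTable();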
The resulting data table dt:
id,date,mood,blood pressure,heart rate
mm2,01-Apr,,80,40
mm2,03-Apr,Sad,90,
mm2,05-Apr,Happy,,50
Note: DelimitedReader can return the data columns in whatever order you request. In this example, "mood" is moved ahead of "blood pressure".
Here is the code to print the obtained data table dt:
log(dt.getColNames());
for (int i = 0, mi = dt.nRows(); i < mi; i++) {
    log(dt.getRow(i));
}
Here is the output:
[id,date,mood,blood pressure,heart rate]
[mm2,01-Apr, ,80,40]
[mm2,03-Apr,Sad,90,]
[mm2,05-Apr,Happy,,50]
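For readers who want to see the logic behind this two-step lookup without the library, the following is a rough plain-Java sketch. It is not DelimitedReader's implementation: the class TwoStepLookupSketch and its select and locate methods are hypothetical, the file content is held in memory as comma-separated lines (an assumption about the layout of dreader.csv), and splitting on commas ignores quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch of the lookup logic; NOT DelimitedReader's implementation.
class TwoStepLookupSketch {

    /**
     * Finds the first line holding every name in `columns` (the header), then
     * returns the later rows, reduced to those columns in the requested order,
     * whose cells fully match the non-null regexes in `patterns`.
     */
    static List<String[]> select(List<String> lines, String[] columns, String[] patterns) {
        List<String[]> result = new ArrayList<>();
        int[] index = null; // position of each requested column in the file
        for (String line : lines) {
            String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
            if (index == null) {
                index = locate(cells, columns); // stays null until the header is found
                continue;
            }
            String[] picked = new String[columns.length];
            boolean valid = true;
            for (int i = 0; i < columns.length; i++) {
                picked[i] = index[i] < cells.length ? cells[index[i]] : "";
                if (i < patterns.length && patterns[i] != null
                        && !Pattern.matches(patterns[i], picked[i])) {
                    valid = false;
                }
            }
            if (valid) {
                result.add(picked);
            }
        }
        return result;
    }

    /** Maps each requested column to its index in `header`, or returns null if any is missing. */
    static int[] locate(String[] header, String[] columns) {
        List<String> names = Arrays.asList(header);
        int[] index = new int[columns.length];
        for (int i = 0; i < columns.length; i++) {
            index[i] = names.indexOf(columns[i]);
            if (index[i] < 0) {
                return null;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "id,name,age",
                "mm1,Leo,25",
                "mm2,Emily,18",
                "id,date,blood pressure,heart rate,mood",
                "mm1,01-Apr,100,50,Happy",
                "mm1,05-Apr,120,60,Sad",
                "mm2,01-Apr,80,40,",
                "mm2,03-Apr,90,,Sad",
                "mm2,05-Apr,,50,Happy");

        // Step 1: find Emily's id in the section that has "id" and "name" columns.
        String id = select(lines, new String[] { "id", "name" },
                new String[] { null, "Emily" }).get(0)[0];

        // Step 2: use that id to filter the section with the clinical columns.
        List<String[]> rows = select(lines,
                new String[] { "id", "date", "mood", "blood pressure", "heart rate" },
                new String[] { id });
        for (String[] row : rows) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The sketch scans from the top on every call and has no stop condition, so it only illustrates header matching, column reordering, and per-column filtering; the library's row-range settings are outside its scope.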
For more details, please refer to this more comprehensive demo code.