9.6 years ago
mohinimhetre
I want to save large scientific data (e.g. time series) to a file for future analysis; my files run to multiple GB. I previously stored the data in SQLite, but moved to HDF5 because of the space problem. With HDF5 both the file size and the write time went down, but the read time went up compared to SQLite. So I want to know whether I can improve the read performance. Currently I store the data in an HDF5 file using a struct (which I treat as the table schema), written in chunks.
Here is my code to store and retrieve the data:
struct Signal
{
    public int TimeOffset;
    public float Value;
}

void FunctToWriteDataToHdf5()
{
    long[] dim = new long[1];
    Signal[] data = new Signal[1000000];
    long[] chunkSize = new long[1];
    // add sample data
    for (int i = 0; i < 1000000; i++)
    {
        data[i].TimeOffset = i + 1;
        data[i].Value = i * 10 + 1;
    }
    dim[0] = 1000000;
    // create the file
    fileId = H5F.create("hyperslab.h5", H5F.CreateMode.ACC_TRUNC);
    // size of the Signal struct in bytes
    int size = System.Runtime.InteropServices.Marshal.SizeOf(typeof(Signal));
    // create a new group
    H5GroupId gid = H5G.create(fileId, "Signals");
    // the dataset type is compound: add each struct member at its byte offset
    H5DataTypeId tid = H5T.create(H5T.CreateClass.COMPOUND, size);
    H5T.insert(tid, "TimeOffset", 0, new H5DataTypeId(H5T.H5Type.NATIVE_INT));
    H5T.insert(tid, "Value", 4, new H5DataTypeId(H5T.H5Type.NATIVE_FLOAT));
    // create the dataspace for dataset storage
    H5DataSpaceId filesid = H5S.create_simple(1, dim);
    // dataset-creation property list, used to set the chunk size
    H5PropertyListId plist = H5P.create(H5P.PropertyListClass.DATASET_CREATE);
    chunkSize[0] = 512;
    H5P.setChunk(plist, chunkSize);
    // optional compression
    if (m_cbCompress.IsChecked == true)
    {
        int level = 1;
        // take the compression level from the user
        if (m_txtCompressLevel.Text != "")
            level = Convert.ToInt32(m_txtCompressLevel.Text);
        H5P.setDeflate(plist, level);
    }
    // create the dataset
    H5DataSetId ds = H5D.create(gid, "SignalsDS", tid, filesid,
        new H5PropertyListId(H5P.Template.DEFAULT),
        plist,
        new H5PropertyListId(H5P.Template.DEFAULT));
    // write the signal data
    H5D.write(ds, tid, new H5Array<Signal>(data));
    // close all resources
    H5D.close(ds);
    H5P.close(plist);
    H5T.close(tid);
    H5S.close(filesid);
    H5G.close(gid);
    H5F.close(fileId);
}
void FunctToReadDataFromHdf5()
{
    fileId = H5F.open("hyperslab.h5", H5F.OpenMode.ACC_RDONLY);
    H5GroupId gid = H5G.open(fileId, "Signals");
    H5DataSetId ds = H5D.open(gid, "SignalsDS");
    H5DataTypeId tid = H5D.getType(ds);
    H5DataSpaceId filesid = H5D.getSpace(ds);
    // read the whole dataset in one call
    long[] dim = H5S.getSimpleExtentDims(filesid);
    Signal[] data = new Signal[dim[0]];
    H5D.read(ds, tid, new H5Array<Signal>(data));
    // close all resources
    H5T.close(tid);
    H5D.close(ds);
    H5S.close(filesid);
    H5G.close(gid);
    H5F.close(fileId);
}
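If only a slice of the signal is needed at a time, a partial read via a hyperslab selection avoids copying (and, when compressed, decompressing) the whole dataset. This is only a sketch against the same HDF5DotNet wrapper used above: I'm assuming an H5D.read overload that takes explicit memory and file dataspaces plus a transfer property list, and the offset/count values are illustrative (chunk-aligned reads are cheapest).

```csharp
// Sketch: read records [offset, offset + count) instead of the whole dataset.
// Assumes ds, tid and filesid have been opened as in FunctToReadDataFromHdf5.
long[] offset = new long[] { 4096 };  // first record to read (illustrative)
long[] count  = new long[] { 8192 }; // number of records to read (illustrative)
// select the slab within the file's dataspace
H5S.selectHyperslab(filesid, H5S.SelectOperator.SET, offset, count);
// matching in-memory dataspace for the slice
H5DataSpaceId memsid = H5S.create_simple(1, count);
Signal[] slice = new Signal[count[0]];
H5D.read(ds, tid, memsid, filesid,
    new H5PropertyListId(H5P.Template.DEFAULT),
    new H5Array<Signal>(slice));
H5S.close(memsid);
```

Keeping the read offset and count a multiple of the chunk size means each read touches whole chunks, which is where chunked storage pays off.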
Performance analysis: HDF5 vs SQLite
For 1,000,000 records with two fields (TimeOffset & Value):

Configuration                        File size   Write time    Read time
SQLite                               28 MB       15260 ms      0 ms
HDF5, chunk size 512                 7 MB        78.25 ms      46 ms
HDF5, chunk size 1024                7 MB        27 ms         31.25 ms
HDF5, chunk size 2048                7 MB        20.33 ms      27 ms
HDF5, chunk size 4096                7 MB        20.33 ms      15.265 ms
HDF5, chunk size 8192                7 MB        15.625 ms     15.625 ms
HDF5, chunk 512, compression lvl 1   3 MB        655.6 ms      171.87 ms
HDF5, chunk 1024, compression lvl 1  3 MB        390.625 ms    125 ms
HDF5, chunk 2048, compression lvl 1  3 MB        406.25 ms     109.37 ms
HDF5, chunk 4096, compression lvl 1  3 MB        390.625 ms    109.37 ms