python - Compressing data with HDFStore
I'm new to PyTables and have a question about storing a compressed pandas DataFrame. My current code is:
    import pandas

    # HDF5 file name
    H5name = "C:\\MyDir\\MyHDF.h5"

    # Create the HDF5 file store
    store = pandas.io.pytables.HDFStore(H5name)

    # Write a pandas DataFrame to the created HDF5 file
    myDF.to_hdf(H5name, "myDFname", append=True)

    # Read the pandas DataFrame back from the created HDF5 file
    myDF1 = pandas.io.pytables.read_hdf(H5name, "myDFname")

    # Close the file store
    store.close()

When I checked the size of the created HDF5 file, it (212 KB) was much larger than the original CSV file (58 KB).
So I tried compression by recreating the store with

    store = pandas.io.pytables.HDFStore(H5name, complevel=1)

and the file size did not change. I tried every complevel from 1 to 9 and the size still remained the same.
I also tried creating the store with an explicit compression library:

    store = pandas.io.pytables.HDFStore(H5name, complevel=1, complib="zlib")

But there was no change in compression.
 What could be the problem?  
Also, ideally I would like compression comparable to what R's save function achieves (in my case the 58 KB file was saved as RData at 27 KB). Do I need to do any additional zipping in Python to reduce the size?
Edit: I am using Python 3.3.3 and pandas 0.13.1.
Edit 2: I tried a bigger file: a 487 MB CSV file whose RData size (via R's save function) is 169 MB. For bigger files, I do see compression. Bzip2 gave the best compression, 202 MB (level = 9), and was the slowest to read/write. Blosc compression (level = 9) gave the largest size, 276 MB, but was much faster to write/read.
I'm not sure what R does differently in its save function, but it is both fast and compresses better than any of these compression algorithms.
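The size/speed trade-off described in Edit 2 can be measured in miniature with the standard library. This is only a rough sketch under stated assumptions: the payload is synthetic CSV-like text (not the 487 MB file), the helper name `measure` is made up for illustration, and only zlib and bz2 are compared because Blosc is not part of the standard library. The numbers will differ from those inside an HDF5 file.

```python
import bz2
import time
import zlib

# Synthetic CSV-like payload (an assumption; not the file from the question).
rows = ["%d,%d,%.2f" % (i, i % 50, (i % 1000) * 0.25) for i in range(200000)]
raw = "\n".join(rows).encode("ascii")


def measure(name, compress):
    """Compress the payload once; return (name, compressed size, seconds)."""
    start = time.perf_counter()
    packed = compress(raw)
    elapsed = time.perf_counter() - start
    return name, len(packed), elapsed


results = [
    measure("zlib level 1", lambda data: zlib.compress(data, 1)),
    measure("zlib level 9", lambda data: zlib.compress(data, 9)),
    measure("bz2 level 9", lambda data: bz2.compress(data, 9)),
]

print("raw: %d bytes" % len(raw))
for name, size, elapsed in results:
    print("%-13s %8d bytes  %.3f s" % (name, size, elapsed))
```

Running this on a sample of the real data gives a quick sense of which codec and level are worth trying before rewriting a large HDF5 file.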
 
You have a really small file. HDF5 essentially chunks your data; normally 64 KB is the minimum chunk size. Depending on what the data is, it may not even compress at that size.
You could try msgpack as a simple solution for data of this size. HDF5 is quite efficient for larger sizes and will compress quite well.
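One more thing worth checking: in the question, the DataFrame is written with myDF.to_hdf(H5name, ...), which opens the file by name on its own, so options given to a separately created HDFStore object are not applied to that write. A minimal sketch of passing the compression options to the writing call itself, assuming PyTables is installed; the DataFrame, the helper name `hdf_size` and the file name are made up for illustration:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for myDF: repetitive data, large enough to compress.
df = pd.DataFrame({
    "a": np.arange(200000) % 100,
    "b": np.repeat(np.arange(2000, dtype="float64"), 100),
})


def hdf_size(frame, **kwargs):
    """Write frame to a fresh HDF5 file and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.h5")
        # Compression options are passed to the call that writes the data.
        frame.to_hdf(path, key="myDFname", format="table", **kwargs)
        return os.path.getsize(path)


plain = hdf_size(df)
zlib9 = hdf_size(df, complevel=9, complib="zlib")

print("uncompressed: %d bytes" % plain)
print("zlib level 9: %d bytes" % zlib9)
```

The same complevel and complib arguments accept the other libraries mentioned in the question (bzip2, blosc), provided the PyTables build includes them. On a very small frame the two sizes may be nearly identical, for the chunking reason given above.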