python - Compressing data with HDFStore


I'm new to PyTables and have a question about storing a compressed pandas DataFrame. My current code is:

  import pandas

  # HDF5 file name
  H5name = "C:\\MyDir\\MyHDF.h5"

  # create the HDF5 file store
  store = pandas.io.pytables.HDFStore(H5name)

  # write a pandas DataFrame to the created HDF5 file
  myDF.to_hdf(H5name, "myDFname", append=True)

  # read the pandas DataFrame back from the created HDF5 file
  myDF1 = pandas.io.pytables.read_hdf(H5name, "myDFname")

  # close the file store
  store.close()

When I checked the size of the created HDF5 file, it (212 KB) was much larger than the original CSV file (58 KB). So I tried compression by re-creating the store with

  store = pandas.io.pytables.HDFStore(H5name, complevel=1)

and the file size did not change. I tried all compression levels from 1 to 9 and the size still remained the same.

I also tried specifying the compression library when creating the store:

  # create the HDF5 file store
  store = pandas.io.pytables.HDFStore(H5name, complevel=1, complib="zlib")

But there was no change in compression.

What could be the problem?

Also, ideally I would like compression similar to what R does in its save function (in my case, the 58 KB file was saved as RData at 27 KB). Do I need to do anything additional in Python to reduce the size?

Edit:

I am using Python 3.3.3 and pandas 0.13.1.

Edit 2: I tried a larger file, a 487 MB CSV whose RData size (via R's save function) is 169 MB. For larger files, I do see compression. Bzip2 gave the best compression, 202 MB (level = 9), and was the slowest to read/write. Blosc compression (level = 9) gave the largest size, 276 MB, but was much faster to write/read.

I'm not sure what R does differently in its save function, but it is both equally fast and much more compressed than any of these compression algorithms.
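For anyone who wants to repeat the comparison from Edit 2, here is a minimal sketch. The data is a hypothetical stand-in (the original 487 MB CSV is not available), the column names are made up, and it assumes PyTables is installed with zlib, bzip2 and blosc support. The sizes and timings it prints will depend on the data and the machine, so they are not expected to match the numbers above.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in data; not the asker's file.
rng = np.random.default_rng(0)
n_rows = 200_000
df = pd.DataFrame({
    "id": np.arange(n_rows),
    "group": rng.integers(0, 50, size=n_rows),
    "value": rng.normal(size=n_rows).round(2),
})

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for complib in ["zlib", "bzip2", "blosc"]:
        path = os.path.join(tmp, f"{complib}.h5")
        start = time.perf_counter()
        # format="table" is the layout that append=True uses in the question.
        df.to_hdf(path, key="df", mode="w", format="table",
                  complevel=9, complib=complib)
        elapsed = time.perf_counter() - start
        # Record file size in bytes and write time in seconds.
        results[complib] = (os.path.getsize(path), elapsed)

for complib, (size, elapsed) in results.items():
    print(f"{complib:>6}: {size / 1e6:.2f} MB written in {elapsed:.2f} s")
```

Read times could be measured the same way by timing `pd.read_hdf(path, "df")` inside the loop.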

You have a really small file here. HDF5 basically chunks your data; normally 64 KB is the minimum chunk size. Depending on what the data is, it may not even compress at all at that size.

You can try msgpack as a simple solution for data of this size. HDF5 is quite efficient for larger sizes and will compress quite well.
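To see the effect of file size yourself, a rough sketch like the following writes a small and a large frame with and without compression and prints the resulting file sizes. The helper names `hdf_size` and `make_frame` and the test data are made up for illustration, and the data is deliberately very compressible, so real data will behave differently. It assumes PyTables is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def hdf_size(df, **to_hdf_kwargs):
    """Write df to a temporary HDF5 file and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sizes.h5")
        # Compression options are passed straight to to_hdf here.
        df.to_hdf(path, key="df", mode="w", format="table", **to_hdf_kwargs)
        return os.path.getsize(path)


def make_frame(n_rows):
    """Hypothetical, highly compressible test data."""
    return pd.DataFrame({"a": np.arange(n_rows) % 10, "b": np.zeros(n_rows)})


small = make_frame(1_000)
large = make_frame(200_000)

for name, frame in [("small", small), ("large", large)]:
    plain = hdf_size(frame)
    packed = hdf_size(frame, complevel=9, complib="zlib")
    print(f"{name}: uncompressed={plain} bytes, zlib level 9={packed} bytes")
```

On the large frame the compressed file should come out clearly smaller; on the small one, the fixed overhead of the file tends to dominate, which is the point made above.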

