python - compressing data with HDFStore
I'm a newbie to PyTables and have a question about storing a compressed pandas DataFrame. My current code is:
    import pandas

    # HDF5 file name
    H5name = "C:\\MyDir\\MyHDF.h5"

    # create the HDF5 file store
    store = pandas.io.pytables.HDFStore(H5name)

    # write a pandas DataFrame to the created HDF5 file
    myDF.to_hdf(H5name, "myDFname", append=True)

    # read the pandas DataFrame back from the HDF5 file
    myDF1 = pandas.io.pytables.read_hdf(H5name, "myDFname")

    # close the file store
    store.close()
When I checked the size of the HDF5 file, it (212 KB) was much larger than the original CSV file (58 KB), so I tried recreating the store with

    HDFStore(H5name, complevel=1)

and the file size did not change. I tried all complevel values from 1 to 9 and the size still remained the same.
I also tried creating the store with

    # create HDF5 file store
    pandas.io.pytables.HDFStore(H5name, complevel=1, complib="zlib")

but there was no change in compression.
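For reference, compression settings can also be passed directly to to_hdf, which forwards them to the underlying HDFStore; a minimal self-contained sketch (the frame, file name, and format="table" choice are illustrative assumptions, not the code from above):

    import pandas as pd

    # illustrative frame; complevel/complib are forwarded by to_hdf
    # to the underlying HDFStore
    df = pd.DataFrame({"x": range(10000)})
    df.to_hdf("compressed.h5", "df", format="table",
              complevel=9, complib="zlib")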
What could be the problem?
In addition, ideally I would like to use compression similar to what R's save function does (in my case the 58 KB file was saved as a 27 KB RData file). Do I need to do any additional serialization in Python to reduce the size?
Edit: I am using Python 3.3.3 and pandas 0.13.1.
Edit 2: I tried a much larger 487 MB CSV file, whose RData size (via R's save function) is 169 MB. For larger files I do see the compression: bzip2 gave the best compression at 202 MB (level=9) and was the slowest to read/write; blosc (level=9) gave the largest size at 276 MB, but was much faster to write/read.
Not sure what R does differently in its save function, but it is comparably fast and compresses much better than any of these compression algorithms.
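For reference, the size comparison above can be reproduced along these lines; a rough sketch, assuming the large CSV is loaded into a DataFrame first (file names are illustrative):

    import os
    import pandas as pd

    # assumed: the large CSV from the edit above
    df = pd.read_csv("large.csv")

    # write the same frame once per compression library at level 9
    # and compare the resulting file sizes
    for complib in ("zlib", "bzip2", "blosc"):
        path = "test_{0}.h5".format(complib)
        df.to_hdf(path, "df", complevel=9, complib=complib)
        print(complib, round(os.path.getsize(path) / 1e6, 1), "MB")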
If you actually have a smallish file, then HDF5 essentially chunks your data; normally 64 KB is the minimum chunk size. Depending on what the data is, it may not even compress at that size.
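One way to see whether compression was actually applied is to inspect the PyTables filters attached to each node in the file; a diagnostic sketch, assuming PyTables 3.x naming and the file path from the question:

    import tables

    # print the compression filters recorded on every leaf node
    with tables.open_file("C:\\MyDir\\MyHDF.h5", mode="r") as f:
        for leaf in f.walk_nodes("/", "Leaf"):
            print(leaf._v_pathname, leaf.filters)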
You can try msgpack for a simple solution for data of this size. HDF5 is quite efficient for larger data sizes and will compress quite well.
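A minimal sketch of the msgpack route (to_msgpack/read_msgpack were experimental around pandas 0.13 and have been removed from recent versions, so this assumes an older pandas; the compress option and file name are illustrative):

    import pandas as pd

    df = pd.DataFrame({"a": range(1000)})

    # serialize with zlib compression, then read it back
    df.to_msgpack("myDF.msg", compress="zlib")
    df2 = pd.read_msgpack("myDF.msg")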