Pandas HDFStore unload dataframe from memory
OK, I am experimenting with pandas to load a roughly 30GB CSV file, with 40
million+ rows and 150+ columns, into HDFStore. The majority of the columns
are strings, followed by numeric and date columns.
I have never really used NumPy, pandas, or PyTables before, but I have
played around with data frames in R.
I am currently storing just a sample file of around 20,000 rows in the
HDFStore. When I read the table back from the HDFStore, it is loaded into
memory and memory usage goes up by ~100MB:
from pandas import HDFStore

f = HDFStore('myfile.h5')
g = f['df']
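For reference, here is a minimal, self-contained sketch of the round trip described above (write a DataFrame to an HDF5 file, then read it back); the file path, column names, and sample data are made up. Using `format="table"` with `data_columns` also makes it possible to read back only a subset later, rather than the whole table:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample frame standing in for the real 20,000-row sample.
df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})

path = os.path.join(tempfile.mkdtemp(), "myfile.h5")

# "table" format supports queries; data_columns lets 'a' be used in a where clause.
with pd.HDFStore(path) as store:
    store.put("df", df, format="table", data_columns=["a"])

# Read back only the rows and columns actually needed, instead of the full table.
with pd.HDFStore(path) as store:
    subset = store.select("df", where="a > 7", columns=["a"])

print(len(subset))  # 2
```

Selecting subsets this way keeps the in-memory footprint proportional to what you query, not to the stored table.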
Then I delete the variable containing the DataFrame:
del g
At this point the memory usage decreases by only about 5MB.
If I load the data into g again using g = f['df'], memory usage shoots up
by another 100MB. The memory is only released when I actually close the
window.
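One thing worth trying after `del` is forcing a garbage-collection pass. This is only a sketch of the general pattern, not a guaranteed fix: even after a collection, the process may hold on to freed pages if the memory allocator does not return them to the OS, and `big` below is just a stand-in for the loaded DataFrame:

```python
import gc

# Stand-in for a large DataFrame held in a single variable.
big = list(range(1_000_000))

# Drop the last reference, then explicitly run a full collection.
del big
unreachable = gc.collect()  # returns the number of unreachable objects found

print(unreachable >= 0)  # True: gc.collect() always returns a non-negative count
```

If memory still does not drop after an explicit `gc.collect()`, the remaining usage is likely allocator-level caching rather than live Python objects.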
Given the way the data is organized, I am probably going to divide it into
individual tables, each at most around 1GB so it can fit into memory, and
then use one table at a time. However, this approach will not work if I
cannot clear memory between tables.
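Since the original data is a CSV, a related option is to process it in fixed-size chunks rather than loading whole tables at all. A minimal sketch, with made-up column names and an in-memory file standing in for the 30GB CSV:

```python
import io

import pandas as pd

# Stand-in for the real 30GB file; 'id' and 'value' are hypothetical columns.
csv_data = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n")

# chunksize makes read_csv return an iterator of small DataFrames,
# so only one chunk is resident in memory at a time.
total = 0
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["value"].sum()

print(total)  # 100
```

Each chunk goes out of scope on the next loop iteration, so peak memory stays bounded by the chunk size rather than the file size.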
Any ideas on how I can achieve this?