IPython has the %whos command, but it doesn't show everything, and the sizes it reports are not easy to read. So I wrote the little piece of code below: it displays every object that uses more than 1MB, followed by the total for all global variables. Each line starts with the list of names that are used in the notebook for that object.
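import sys
import numpy as np
import pandas as pd

def show_mem_usage():
    '''Displays memory usage from inspection of global variables in this notebook'''
    gl = sys._getframe(1).f_globals  # globals of the caller, i.e. the notebook
    vars = {}  # id(object) -> [size in bytes, name1, name2, ...]
    # pd.Panel no longer exists in recent pandas versions, hence the getattr
    panel_cls = getattr(pd, 'Panel', None)
    for k, v in list(gl.items()):
        # for pandas dataframes; classes such as pd.DataFrame itself are
        # skipped because memory_usage cannot be called without an instance
        if hasattr(v, 'memory_usage') and not isinstance(v, type):
            mem = v.memory_usage(deep=True)
            if not np.isscalar(mem):
                mem = mem.sum()
        else:
            # work around for a bug: measure a Panel through its values
            if panel_cls is not None and isinstance(v, panel_cls):
                v = v.values
            mem = sys.getsizeof(v)
        vars.setdefault(id(v), [mem]).append(k)
    total = 0
    for k, (value, *names) in vars.items():
        if value > 1e6:
            print(names, "%.3fMB" % (value / 1e6))
        total += value
    print("%.3fMB" % (total / 1e6))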
It assumes that the data is stored in numpy arrays or pandas dataframes. If the data is not bound directly to a global variable but is only referenced indirectly (for instance, from inside a list or a dictionary), then it will not be listed. For many applications in Data Analytics, this is sufficient. Just delete the variables (del XXX) that you no longer need and the garbage collector will recover the memory, provided nothing else still references the object.
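To make these two points concrete, here is a minimal sketch, assuming NumPy is installed; the names `big`, `holder` and `probe` are only for illustration. It shows that `sys.getsizeof` counts an array's own buffer but not what a container refers to, and it uses a weak reference to check that the array is released only once its last reference is deleted.

```python
import gc
import sys
import weakref

import numpy as np

big = np.zeros(1_000_000)   # float64 takes 8 bytes per item, so about 8 MB
holder = {"data": big}      # the same array is also reachable through this dict

direct_size = sys.getsizeof(big)       # includes the array's data buffer
indirect_size = sys.getsizeof(holder)  # only the dict itself, not the array

print("array: %.3fMB" % (direct_size / 1e6))
print("dict holding it: %.6fMB" % (indirect_size / 1e6))

# A weak reference lets us observe when the array is actually freed.
probe = weakref.ref(big)

del big                     # the dict still references the array
still_alive = probe() is not None

del holder                  # the last reference is gone now
gc.collect()                # not strictly needed in CPython, kept as a precaution
freed = probe() is None

print(still_alive, freed)
```

The dictionary looks tiny even though it keeps 8 MB alive, which is why an object that is only referenced indirectly can go unnoticed.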
Thanks to that code, I realized that Jupyter stores the result of a cell in a variable named after the cell execution number, prefixed with a _ (e.g. _1), and it will keep that variable until you restart the kernel. Therefore, the proper way to display a large result is to print it, not to just write it as the last expression of the cell.
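As a rough illustration of that mechanism, the sketch below mimics it on a plain dictionary instead of a live kernel. `clear_cached_outputs` is a hypothetical helper written for this example, not part of Jupyter, and the namespace contents are made up. In a real notebook, IPython's `%reset -f out` magic is the built-in way to flush the output cache, and it also resets the `_`, `__` and `___` shortcuts, which this sketch leaves alone.

```python
import re


def clear_cached_outputs(namespace):
    """Drop IPython-style cached results (_1, _2, ...) from a namespace dict.

    Returns the list of names that were removed.
    """
    numbered = re.compile(r"^_\d+$")
    removed = [name for name in namespace if numbered.match(name)]
    for name in removed:
        del namespace[name]
    # IPython also exposes the cached results through the `Out` dictionary.
    out = namespace.get("Out")
    if isinstance(out, dict):
        out.clear()
    return removed


# Made-up namespace standing in for the notebook's globals.
fake_globals = {
    "df": "a dataframe the user still needs",
    "_private": 1,
    "_1": "result of cell 1",
    "_2": "result of cell 2",
    "Out": {1: "result of cell 1", 2: "result of cell 2"},
}

removed_names = clear_cached_outputs(fake_globals)
print(sorted(removed_names))
print(sorted(fake_globals))
```

Only names made of an underscore followed by digits are removed, so ordinary variables such as `df` or `_private` are left untouched.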
I tried using your code, but got the following:
TypeError                                 Traceback (most recent call last)
in ()
----> 1 show_mem_usage()
      2
      3 # clean up the date column to move from 'object' to real date
      4
      5 # original format is '02.01.2013'

in show_mem_usage()
      9         # for pandas dataframes
     10         if hasattr(v, 'memory_usage'):
---> 11             mem = v.memory_usage(deep=True)
     12             if not np.isscalar(mem):
     13                 mem = mem.sum()

TypeError: memory_usage() missing 1 required positional argument: 'self'
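A likely cause, assuming one of the notebook's global names is bound to a class rather than an instance (for example after `from pandas import DataFrame`): the class also has a `memory_usage` attribute, so the `hasattr` test passes, but calling the method on the class itself leaves `self` unfilled. The sketch below reproduces this with a hypothetical stand-in class `Frame`, so it does not need pandas, and shows one possible guard; `safe_memory_usage` is likewise a name made up for the example.

```python
class Frame:
    """Hypothetical stand-in for a pandas object with a memory_usage method."""

    def memory_usage(self, deep=False):
        return 2_000_000  # made-up size in bytes


def safe_memory_usage(obj):
    """Return obj.memory_usage(deep=True) for instances, None otherwise."""
    # Classes expose the attribute too, but calling it on the class
    # itself fails because there is no instance to bind to `self`.
    if isinstance(obj, type) or not hasattr(obj, "memory_usage"):
        return None
    return obj.memory_usage(deep=True)


# The class passes the same hasattr check used in show_mem_usage ...
class_has_attr = hasattr(Frame, "memory_usage")

# ... but calling the method on the class raises the TypeError seen above.
try:
    Frame.memory_usage(deep=True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(class_has_attr, error_message)
print(safe_memory_usage(Frame), safe_memory_usage(Frame()))
```

Skipping anything that is itself a class keeps the pandas branch for real dataframes and series while letting classes fall through to the generic size measurement.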