optimize_pd

hhpy.ds.optimize_pd(df: pandas.core.frame.DataFrame, c_int: bool = True, c_float: bool = True, c_cat: bool = True, cat_frac: bool = 0.5) → pandas.core.frame.DataFrame

Optimizes the memory usage of a pandas DataFrame by automatically downcasting variable types and converting object columns to categories.

Parameters:
  • df – pandas DataFrame to be optimized. Other objects are implicitly cast to DataFrame
  • c_int – whether to downcast integers
  • c_float – whether to downcast floats
  • c_cat – whether to cast objects to categories. Uses cat_frac as condition
  • cat_frac – fraction threshold (default 0.5). If c_cat is True and the fraction of unique values in a column is less than cat_frac, the column is cast to category
Returns:
  the optimized pandas DataFrame
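
The behaviour described above can be approximated with plain pandas. The following is a minimal sketch, not hhpy's actual implementation: the function name optimize_pd_sketch is hypothetical, and the category rule (number of unique values divided by number of rows, compared against cat_frac) is an assumption about how the threshold is applied.

```python
import pandas as pd


def optimize_pd_sketch(df, c_int=True, c_float=True, c_cat=True, cat_frac=0.5):
    """Hypothetical stand-in for illustration only; not hhpy's code."""
    # Mirror the documented implicit cast to DataFrame and avoid mutating the input.
    out = pd.DataFrame(df).copy()
    for col in out.columns:
        s = out[col]
        if c_int and pd.api.types.is_integer_dtype(s):
            # Pick the smallest integer dtype that can hold the values.
            out[col] = pd.to_numeric(s, downcast="integer")
        elif c_float and pd.api.types.is_float_dtype(s):
            # Downcast to float32 where the values allow it.
            out[col] = pd.to_numeric(s, downcast="float")
        elif c_cat and pd.api.types.is_object_dtype(s):
            # Assumed rule: share of unique values must be below cat_frac.
            if len(s) > 0 and s.nunique() / len(s) < cat_frac:
                out[col] = s.astype("category")
    return out


df = pd.DataFrame({
    "id": range(1000),
    "score": [0.5] * 1000,
    "city": pd.Series(["Berlin", "Paris"] * 500, dtype=object),
})
small = optimize_pd_sketch(df)

before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Category conversion pays off mainly for columns with many repeated values, which is presumably why it is gated by a threshold such as cat_frac rather than applied to every object column.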