optimize_pd

hhpy.ds.optimize_pd(df: pandas.core.frame.DataFrame, c_int: bool = True, c_float: bool = True, c_cat: bool = True, cat_frac: float = 0.5, convert_dtypes: bool = True, drop_all_na_cols: bool = False) → pandas.core.frame.DataFrame

Optimizes the memory usage of a pandas DataFrame by automatically downcasting all variable types and converting object columns to categories.

Parameters:
  • df – pandas DataFrame to be optimized. Other objects are implicitly cast to DataFrame
  • c_int – Whether to downcast integers [optional]
  • c_float – Whether to downcast floats [optional]
  • c_cat – Whether to cast objects to categories. Uses cat_frac as condition [optional]
  • cat_frac – Only used if c_cat is True: a column is cast to category if its fraction of unique values is less than cat_frac [optional]
  • convert_dtypes – Whether to call DataFrame.convert_dtypes (requires pandas 1.0.0+) [optional]
  • drop_all_na_cols – Whether to drop columns that contain only missing values [optional]
Returns:
  The optimized pandas DataFrame
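
The steps named in the description can be illustrated with a minimal pandas-only sketch. It is not hhpy's implementation: `optimize_sketch` is a hypothetical name, the threshold is assumed to compare unique values to row count, and the convert_dtypes step is omitted. The actual behavior of hhpy.ds.optimize_pd may differ.

```python
import pandas as pd
from pandas.api import types as pdt


def optimize_sketch(df, c_int=True, c_float=True, c_cat=True,
                    cat_frac=0.5, drop_all_na_cols=False):
    """Hypothetical stand-in for illustration only, NOT hhpy's code."""
    # Cast other inputs to a DataFrame and work on a copy.
    out = pd.DataFrame(df).copy()

    if drop_all_na_cols:
        # Drop columns in which every value is missing.
        out = out.dropna(axis=1, how="all")

    for col in out.columns:
        s = out[col]
        if pdt.is_integer_dtype(s):
            if c_int:
                # Pick the smallest integer dtype that holds all values.
                out[col] = pd.to_numeric(s, downcast="integer")
        elif pdt.is_float_dtype(s):
            if c_float:
                # float32 is the smallest dtype pandas downcasts floats to.
                out[col] = pd.to_numeric(s, downcast="float")
        elif c_cat and len(s) > 0 and (
            pdt.is_object_dtype(s) or pdt.is_string_dtype(s)
        ):
            # Assumed rule: convert when unique values / rows < cat_frac.
            if s.nunique() / len(s) < cat_frac:
                out[col] = s.astype("category")
    return out


# Small demonstration frame (made-up data).
demo = pd.DataFrame({
    "count": [1, 2, 3, 4, 5, 6],
    "price": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
    "group": ["a", "b", "a", "b", "a", "a"],   # 2 unique values in 6 rows
    "label": ["u", "v", "w", "x", "y", "z"],   # all values unique
    "empty": [None] * 6,
})
slim = optimize_sketch(demo, drop_all_na_cols=True)
print(slim.dtypes)
```

In this sketch, `group` falls below the default cat_frac of 0.5 and becomes a category, while `label` keeps its original dtype because every value is unique. Comparing `demo.memory_usage(deep=True)` with `slim.memory_usage(deep=True)` shows the per-column effect.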