DFMapping

class hhpy.ds.DFMapping(df: Union[pandas.core.frame.DataFrame, dict, str] = None, **kwargs)[source]

Bases: hhpy.main.BaseClass

Mapping object bound to a pandas DataFrame that standardizes column names and values according to the chosen conventions. It also implements Google translation. It can be used like an sklearn scaler object. The mapping can be saved and later used to restore the original form of the DataFrame. Note that the index is exempt from the transformation.

Parameters:
  • name – name of the object [Optional]
  • df – a DataFrame to init on or path to a saved DFMapping object [Optional]
  • kwargs – other arguments passed to the respective init function
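
The scaler-like workflow described above can be sketched without hhpy. This is only an illustration of the fit / transform / inverse_transform pattern, not hhpy's implementation: the ToyMapping class and the standardize helper are hypothetical, the table is a plain dict of column lists rather than a DataFrame, and the naming convention (lowercase with underscores) is an assumption.

```python
import re


def standardize(name: str) -> str:
    """Hypothetical convention: lowercase, runs of other characters become '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


class ToyMapping:
    """Hypothetical stand-in showing the fit / transform / inverse_transform pattern."""

    def fit(self, table: dict) -> "ToyMapping":
        # Remember original column name -> standardized column name
        self.col_mapping = {col: standardize(col) for col in table}
        return self

    def transform(self, table: dict, inverse: bool = False) -> dict:
        mapping = self.col_mapping
        if inverse:
            # Swap keys and values to map standardized names back
            mapping = {new: old for old, new in mapping.items()}
        # Columns without a mapping entry pass through unchanged
        return {mapping.get(col, col): values for col, values in table.items()}

    def inverse_transform(self, table: dict) -> dict:
        return self.transform(table, inverse=True)


table = {"Customer Name": ["Ann", "Bob"], "Order-ID": [1, 2]}
mapping = ToyMapping().fit(table)
clean = mapping.transform(table)             # standardized column names
restored = mapping.inverse_transform(clean)  # original column names again
```

Because the stored mapping is one-to-one, applying it in reverse recovers the original column names, which is the property the class description relies on.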

Methods Summary

fit(*args, **kwargs) Alias for from_df() to be in line with sklearn conventions
fit_transform(df, col_names, values, …) First applies DFMapping.from_df() (which has alias fit) and then DFMapping.transform()
from_df(df, col_names, values, columns, …) Initialize the DFMapping from a pandas DataFrame.
from_excel(path) Init the DFMapping object from an excel file.
inverse_transform(*args, **kwargs) Wrapper for DFMapping.transform() with inverse=True
to_excel(path, if_exists) Save the DFMapping object as an excel file.
transform(df, col_names, values, columns, …) Apply a mapping created using create_df_mapping().

Methods Documentation

fit(*args, **kwargs) → Optional[Tuple[dict, dict]][source]

Alias for from_df() to be in line with sklearn conventions

Parameters:
  • args – passed to from_df
  • kwargs – passed to from_df
Returns:

see from_df

fit_transform(df: pandas.core.frame.DataFrame, col_names: bool = True, values: bool = True, columns: Optional[List[str]] = None, kwargs_fit: Mapping[KT, VT_co] = None, **kwargs) → Optional[pandas.core.frame.DataFrame][source]

First applies DFMapping.from_df() (which has alias fit) and then DFMapping.transform()

Parameters:
  • df – pandas DataFrame to fit against and then transform.
  • col_names – Whether to transform the column names [optional]
  • values – Whether to transform the column values [optional]
  • columns – Columns to transform, defaults to all columns [optional]
  • kwargs – passed to transform
  • kwargs_fit – passed to fit
Returns:

see transform

from_df(df: pandas.core.frame.DataFrame, col_names: bool = True, values: bool = True, columns: Optional[List[str]] = None, return_type: str = 'self', printf: Callable = <function tprint>, duplicate_limit: int = 10, warn: bool = True, **kwargs) → Optional[Tuple[dict, dict]][source]

Initialize the DFMapping from a pandas DataFrame.

Parameters:
  • df – Pandas DataFrame containing the data; other objects are implicitly cast to DataFrame
  • col_names – Whether to transform the column names [optional]
  • values – Whether to transform the column values [optional]
  • columns – Columns to transform, defaults to all columns [optional]
  • return_type – If ‘self’, writes the mappings to self; if ‘tuple’, returns (col_mapping, value_mapping) [optional]
  • printf – The function used for printing in-function messages. Set to None or False to suppress printing [optional]
  • duplicate_limit – Allowed number of reformatted duplicates per column. Each duplicate is suffixed with ‘_’, but a large number of duplicates usually means the column consists of strings made up of disallowed characters, and the mapping would take a very long time. Once the limit is reached, duplicate handling stops and a warning is triggered because the transformation is no longer invertible. Consider excluding the column or using cat codes [optional]
  • warn – Whether to show UserWarnings triggered by this function. Set to False to suppress, other warnings will still be triggered [optional]
  • kwargs – Other keyword arguments passed to reformat_string() [optional]
Returns:

see return_type
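
The duplicate handling described for duplicate_limit can be sketched as follows. This is a minimal illustration under stated assumptions, not hhpy's code: the dedupe function is hypothetical, and it assumes that reformatted values which collide are suffixed with ‘_’ one at a time until they are unique or the limit is reached.

```python
import warnings


def dedupe(names, duplicate_limit=10):
    """Suffix colliding names with '_' until unique, at most duplicate_limit times.

    Returns the resulting names and a flag telling whether they are still
    unique, i.e. whether the mapping could be inverted.
    """
    seen = set()
    result = []
    invertible = True
    for name in names:
        candidate = name
        attempts = 0
        while candidate in seen and attempts < duplicate_limit:
            candidate += "_"
            attempts += 1
        if candidate in seen:
            # Limit reached: stop trying, the mapping is no longer one-to-one
            invertible = False
            warnings.warn(f"duplicate limit reached for {name!r}")
        seen.add(candidate)
        result.append(candidate)
    return result, invertible


# Three distinct raw values that all reformat to the same string
reformatted = ["a_b", "a_b", "a_b"]
names, ok = dedupe(reformatted)
```

With a small limit and many collisions, the sketch gives up and reports that the result is not invertible, which mirrors the warning described above.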

from_excel(path: str) → None[source]

Init the DFMapping object from an Excel file. For example, you could auto-generate a DFMapping using googletrans, adjust the translations you feel are inappropriate in the Excel file, and then regenerate the object from the edited file.

Parameters:
  • path – Path to the excel file
Returns:

None

inverse_transform(*args, **kwargs) → Optional[pandas.core.frame.DataFrame][source]

Wrapper for DFMapping.transform() with inverse=True

Parameters:
  • args – passed to transform
  • kwargs – passed to transform
Returns:

see transform

to_excel(path: str, if_exists: str = 'error') → None[source]

Save the DFMapping object as an excel file. Useful if you want to edit the results of the automatically generated object to fit your specific needs.

Parameters:
  • path – Path to save the excel file to
  • if_exists – One of ‘error’, ‘replace’ or ‘append’. If ‘error’, raises an exception when the file already exists; if ‘replace’, replaces existing files; if ‘append’, appends to the file (while checking for duplicates)
Returns:

None
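
The three if_exists modes can be illustrated with a small sketch. It is not hhpy's implementation: the save_mapping function is hypothetical, it writes JSON instead of Excel to stay dependency-free, and the choice to keep existing entries when a key is already present in ‘append’ mode is an assumption about what the duplicate check does.

```python
import json
import os
import tempfile


def save_mapping(mapping: dict, path: str, if_exists: str = "error") -> None:
    """Hypothetical saver showing 'error' / 'replace' / 'append' semantics."""
    if if_exists not in ("error", "replace", "append"):
        raise ValueError(f"unknown if_exists value: {if_exists!r}")
    if os.path.exists(path):
        if if_exists == "error":
            raise FileExistsError(path)
        if if_exists == "append":
            with open(path) as f:
                existing = json.load(f)
            # Assumed duplicate check: entries already in the file win
            mapping = {**mapping, **existing}
    # 'replace' and a non-existing file both end up here and simply write
    with open(path, "w") as f:
        json.dump(mapping, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mapping.json")
    save_mapping({"Order-ID": "order_id"}, path)
    save_mapping(
        {"Order-ID": "changed", "Customer Name": "customer_name"},
        path,
        if_exists="append",
    )
    with open(path) as f:
        saved = json.load(f)
```

Calling save_mapping a second time on the same path with the default if_exists='error' raises FileExistsError in this sketch.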

transform(df: pandas.core.frame.DataFrame, col_names: bool = True, values: bool = True, columns: Optional[List[str]] = None, inverse: bool = False, inplace: bool = False) → Optional[pandas.core.frame.DataFrame][source]

Apply a mapping created using create_df_mapping(). Intended to make a DataFrame standardized and human readable. The same mapping can also be applied with inverse=True to restore the original form of the transformed DataFrame.

Parameters:
  • df – Pandas DataFrame containing the data; other objects are implicitly cast to DataFrame
  • col_names – Whether to transform the column names [optional]
  • values – Whether to transform the column values [optional]
  • columns – Columns to transform, defaults to all columns [optional]
  • inverse – Whether to apply the mapping in inverse order to restore the original form of the DataFrame [optional]
  • inplace – Whether to modify the DataFrame inplace [optional]
Returns:

if inplace: None, else: Transformed DataFrame
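
How inverse and inplace interact with the return value can be sketched for the value mapping. The transform_values function below is hypothetical and works on a plain dict of column lists rather than a DataFrame; it only illustrates the documented contract (a new object is returned unless inplace is set, in which case the input is modified and None is returned).

```python
def transform_values(table, value_mapping, inverse=False, inplace=False):
    """Map cell values per column; return None if inplace, else a new table."""
    # Work on the caller's table directly, or on a copy of every column
    target = table if inplace else {col: list(vals) for col, vals in table.items()}
    for col, mapping in value_mapping.items():
        if col not in target:
            continue
        if inverse:
            # Swap keys and values to restore the original cell values
            mapping = {new: old for old, new in mapping.items()}
        # Values without a mapping entry are left as they are
        target[col] = [mapping.get(value, value) for value in target[col]]
    return None if inplace else target


value_mapping = {"status": {"In Progress": "in_progress", "Done": "done"}}
table = {"status": ["In Progress", "Done"], "id": [1, 2]}

clean = transform_values(table, value_mapping)                    # new table
restored = transform_values(clean, value_mapping, inverse=True)   # original values
result = transform_values(table, value_mapping, inplace=True)     # modifies table
```

After the last call, table holds the standardized values and result is None, so the return value should not be assigned back when inplace is used.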