DFMapping
class hhpy.ds.DFMapping(df: Union[pandas.core.frame.DataFrame, dict, str] = None, **kwargs)
Bases: hhpy.main.BaseClass
Mapping object bound to a pandas DataFrame that standardizes column names and values according to the chosen conventions. Also implements Google translation. Can be used like an sklearn scaler object. The mapping can be saved and later used to restore the original shape of the DataFrame. Note that the index is exempt.
Parameters: - name – name of the object [Optional]
- df – a DataFrame to init on or path to a saved DFMapping object [Optional]
- kwargs – other arguments passed to the respective init function
Methods Summary
fit(*args, **kwargs): Alias for from_df(), to be in line with sklearn conventions.
fit_transform(df, col_names, values, …): First applies DFMapping.from_df() (which has alias fit) and then DFMapping.transform().
from_df(df, col_names, values, columns, …): Initialize the DFMapping from a pandas DataFrame.
from_excel(path): Init the DFMapping object from an Excel file.
inverse_transform(*args, **kwargs): Wrapper for DFMapping.transform() with inverse=True.
to_excel(path, if_exists): Save the DFMapping object as an Excel file.
transform(df, col_names, values, columns, …): Apply a mapping created using create_df_mapping().
Methods Documentation
fit(*args, **kwargs) → Optional[Tuple[dict, dict]]
Alias for from_df(), to be in line with sklearn conventions.
Parameters: - args – passed to from_df
- kwargs – passed to from_df
Returns: see from_df
fit_transform(df: pandas.core.frame.DataFrame, col_names: bool = True, values: bool = True, columns: Optional[List[str]] = None, kwargs_fit: Mapping[KT, VT_co] = None, **kwargs) → Optional[pandas.core.frame.DataFrame]
First applies DFMapping.from_df() (which has alias fit) and then DFMapping.transform().
Parameters: - df – pandas DataFrame to fit against and then transform.
- col_names – Whether to transform the column names [optional]
- values – Whether to transform the column values [optional]
- columns – Columns to transform, defaults to all columns [optional]
- kwargs – passed to transform
- kwargs_fit – passed to fit
Returns: see transform
from_df(df: pandas.core.frame.DataFrame, col_names: bool = True, values: bool = True, columns: Optional[List[str]] = None, return_type: str = 'self', printf: Callable = <function tprint>, duplicate_limit: int = 10, warn: bool = True, **kwargs) → Optional[Tuple[dict, dict]]
Initialize the DFMapping from a pandas DataFrame.
Parameters: - df – Pandas DataFrame containing the data, other objects are implicitly cast to DataFrame
- col_names – Whether to transform the column names [optional]
- values – Whether to transform the column values [optional]
- columns – Columns to transform, defaults to all columns [optional]
- return_type – if ‘self’: writes to self, ‘tuple’ returns (col_mapping, value_mapping) [optional]
- printf – The function used for printing in-function messages. Set to None or False to suppress printing [optional]
- duplicate_limit – Allowed number of reformatted duplicates per column. Each duplicate is suffixed with ‘_’, but a column with too many duplicates likely consists of strings of non-allowed characters, and the mapping would take a very long time. Duplicate handling therefore stops at this limit and a warning is triggered, since the transformation is then no longer invertible. Consider excluding the column or using cat codes [optional]
- warn – Whether to show UserWarnings triggered by this function. Set to False to suppress, other warnings will still be triggered [optional]
- kwargs – Other keyword arguments passed to reformat_string() [optional]
Returns: see return_type
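The duplicate handling described for duplicate_limit can be illustrated with a minimal, dependency-free sketch. It is not hhpy's implementation: standardize() is a hypothetical stand-in for reformat_string(), build_mapping() is a hypothetical helper, and the way attempts are counted against the limit is an assumption.

```python
import re
import warnings


def standardize(name: str) -> str:
    """Hypothetical stand-in for reformat_string(): lowercase the string
    and collapse every run of non-alphanumeric characters into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def build_mapping(originals, duplicate_limit=10):
    """Map each original string to a unique standardized string.

    Collisions are resolved by appending '_' until the candidate is unique.
    After duplicate_limit attempts the search stops, a warning is issued and
    the mapping is flagged as not invertible (assumed behaviour).
    Returns (mapping, invertible).
    """
    mapping = {}
    taken = set()
    invertible = True
    for original in originals:
        candidate = standardize(original)
        attempts = 0
        while candidate in taken and attempts < duplicate_limit:
            candidate += "_"
            attempts += 1
        if candidate in taken:
            warnings.warn(
                f"duplicate limit reached for {original!r}; "
                "the mapping is no longer invertible"
            )
            invertible = False
        taken.add(candidate)
        mapping[original] = candidate
    return mapping, invertible


mapping, invertible = build_mapping(["Unit Price", "unit_price", "UNIT-PRICE"])
```

Here all three inputs standardize to the same string, so the second and third receive one and two trailing underscores respectively, and the mapping stays invertible because every target is unique.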
from_excel(path: str) → None
Init the DFMapping object from an Excel file. For example, you could auto-generate a DFMapping using googletrans, adjust the translations you feel are inappropriate in the Excel file, and then regenerate the object from the edited file.
Parameters: path – Path to the Excel file
Returns: None
inverse_transform(*args, **kwargs) → Optional[pandas.core.frame.DataFrame]
Wrapper for DFMapping.transform() with inverse=True.
Parameters: - args – passed to transform
- kwargs – passed to transform
Returns: see transform
to_excel(path: str, if_exists: str = 'error') → None
Save the DFMapping object as an Excel file. Useful if you want to edit the results of the automatically generated object to fit your specific needs.
Parameters: - path – Path to save the excel file to
- if_exists – One of ‘error’, ‘replace’ or ‘append’: ‘error’ raises an exception if the file exists, ‘replace’ replaces the existing file and ‘append’ appends to the file (while checking for duplicates)
Returns: None
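The three if_exists modes can be sketched as follows. This is only an illustration of the semantics, not hhpy code: save_mapping() is a hypothetical helper, a JSON file stands in for the Excel file to keep the example dependency-free, and letting already stored entries win on duplicate keys is an assumption.

```python
import json
import os
import tempfile


def save_mapping(path, mapping, if_exists="error"):
    """Write a {original: standardized} dict to a JSON file (stand-in for Excel)."""
    if if_exists not in ("error", "replace", "append"):
        raise ValueError(f"unknown if_exists value: {if_exists!r}")
    if os.path.exists(path):
        if if_exists == "error":
            raise FileExistsError(path)
        if if_exists == "append":
            with open(path) as f:
                existing = json.load(f)
            # Duplicate check (assumed rule): keys already stored keep their
            # value, only keys that are new to the file are added.
            mapping = {**mapping, **existing}
        # 'replace' falls through and simply overwrites the file.
    with open(path, "w") as f:
        json.dump(mapping, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mapping.json")
    save_mapping(path, {"Unit Price": "unit_price"})
    save_mapping(path, {"Unit Price": "price", "City": "city"}, if_exists="append")
    with open(path) as f:
        stored = json.load(f)
```

After the second call the file holds the original entry for "Unit Price" plus the new "City" entry. Calling save_mapping() on the existing path with the default if_exists="error" would raise FileExistsError instead.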
transform(df: pandas.core.frame.DataFrame, col_names: bool = True, values: bool = True, columns: Optional[List[str]] = None, inverse: bool = False, inplace: bool = False) → Optional[pandas.core.frame.DataFrame]
Apply a mapping created using create_df_mapping(). Intended to make a DataFrame standardized and human readable. The same mapping can also be applied with inverse=True to restore the original form of the transformed DataFrame.
Parameters: - df – Pandas DataFrame containing the data, other objects are implicitly cast to DataFrame
- col_names – Whether to transform the column names [optional]
- values – Whether to transform the column values [optional]
- columns – Columns to transform, defaults to all columns [optional]
- inverse – Whether to apply the mapping in inverse order to restore the original form of the DataFrame [optional]
- inplace – Whether to modify the DataFrame inplace [optional]
Returns: if inplace: None, else: Transformed DataFrame
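The forward and inverse use of one mapping can be sketched without pandas. The example below is an illustration under stated assumptions, not the library's code: apply_mapping() is a hypothetical helper, the table is a plain dict of column lists rather than a DataFrame, and the mapping layout (one dict for column names, one dict of value dicts keyed by original column name) is assumed.

```python
def apply_mapping(table, col_mapping, value_mapping, inverse=False):
    """Rename columns and recode values of a dict-of-lists table.

    col_mapping:   {original_column: standardized_column}
    value_mapping: {original_column: {original_value: standardized_value}}
    With inverse=True both mappings are flipped, which undoes a forward pass
    as long as every standardized name and value is unique.
    """
    if inverse:
        # Flip each value mapping and re-key it by the standardized column
        # name, then flip the column mapping itself.
        value_mapping = {
            col_mapping.get(col, col): {new: old for old, new in values.items()}
            for col, values in value_mapping.items()
        }
        col_mapping = {new: old for old, new in col_mapping.items()}

    result = {}
    for column, cells in table.items():
        recode = value_mapping.get(column, {})
        # Unmapped columns and values pass through unchanged.
        result[col_mapping.get(column, column)] = [recode.get(c, c) for c in cells]
    return result


col_mapping = {"Unit City": "unit_city"}
value_mapping = {"Unit City": {"New York": "new_york", "San Jose": "san_jose"}}
original = {"Unit City": ["New York", "San Jose", "New York"]}

standardized = apply_mapping(original, col_mapping, value_mapping)
restored = apply_mapping(standardized, col_mapping, value_mapping, inverse=True)
```

The round trip returns the original table, which is why an invertible mapping matters: if two originals shared one standardized target, flipping the dict would lose one of them.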