Models
-
class hhpy.modelling.Models(*args, name: str = None, df: pandas.core.frame.DataFrame = None, X_ref: Union[Sequence[T_co], str] = None, y_ref: Union[Sequence[T_co], str] = None, scaler_X: object = None, scaler_y: object = None, printf: Callable = <function tprint>)
Bases: hhpy.modelling._BaseModel
Collection of Models that allows fitting and predicting with multiple Models at once, comparing accuracy and creating Ensembles
Parameters: - args – multiple Model objects that will form a Models Collection
- name – name of the collection
- df – Pandas DataFrame containing the training and testing data. Can be saved to the Model object or supplied on an as needed basis.
- X_ref – List of features (predictors) used for training the model
- y_ref – List of labels (targets) to be predicted
- scaler_X – Scaler object that implements .transform and .inverse_transform, applied to the features (predictors) before training and inversely after predicting
- scaler_y – Scaler object that implements .transform and .inverse_transform, applied to the labels (targets) before training and inversely after predicting
- printf – print function to use for logging
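The scaler_X and scaler_y parameters only require an object with .transform and .inverse_transform. As a rough illustration of that contract, here is a minimal sketch of such an object. The class name StandardizingScaler and its flat-list interface are assumptions made for this example and are not part of hhpy; in practice a scikit-learn style scaler working on 2D arrays would more likely be passed.

```python
from statistics import mean, pstdev


class StandardizingScaler:
    """Hypothetical scaler exposing the two methods the parameters ask for."""

    def __init__(self, values):
        # Learn centre and spread once from reference data.
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # fall back to 1.0 for constant data

    def transform(self, values):
        # Applied before training.
        return [(v - self.mean_) / self.std_ for v in values]

    def inverse_transform(self, values):
        # Applied after predicting, to get back to the original units.
        return [v * self.std_ + self.mean_ for v in values]


raw = [10.0, 20.0, 30.0]
scaler = StandardizingScaler(raw)
scaled = scaler.transform(raw)
restored = scaler.inverse_transform(scaled)
```

The round trip through transform and inverse_transform should reproduce the input up to floating point error, which is the property the collection relies on when it scales predictions back.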
Methods Summary
- fit(fit_type, do_print) – fit all Model objects in collection
- k_split(**kwargs) – apply hhpy.ds.k_split to self to create train-test or k-cross ready data
- model_by_name(name) – extract a list of Models from the collection by their names
- predict([X, df, return_type, ensemble, do_print]) – predict with all models in collection
- score(return_type, pivot, do_print, …) – calculate score of the Model predictions
- scoreplot([x, y, hue, hue_order, row, …]) – plot the score(s) using sns.barplot
- train(df, k, groupby, sortby, …) – wrapper method that combines k_split, fit, predict and score
Methods Documentation
-
fit(fit_type: str = 'train_test', do_print: bool = True)
fit all Model objects in collection
Parameters: - fit_type – one of [‘train_test’, ‘k_cross’, ‘final’]
- do_print – Whether to print the steps to console
Returns: None
-
k_split(**kwargs)
apply hhpy.ds.k_split to self to create train-test or k-cross ready data
Parameters: kwargs – keyword arguments passed to hhpy.ds.k_split
Returns: None
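The internals of hhpy.ds.k_split are not described here, so the following is only a generic k-fold sketch of what "k-cross ready data" usually means: every row receives a fold label, and each fold in turn serves as the test set. The function name k_fold_labels and its arguments are hypothetical.

```python
import random


def k_fold_labels(n_rows, k=5, random_state=None):
    """Assign each row index a fold label in 0..k-1 (generic k-fold, not hhpy's code)."""
    rng = random.Random(random_state)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    labels = [0] * n_rows
    # Deal the shuffled rows out to the folds round-robin so folds are balanced.
    for position, row in enumerate(indices):
        labels[row] = position % k
    return labels


labels = k_fold_labels(10, k=5, random_state=0)
# For fold 0: rows labelled 0 form the test set, all other rows the training set.
test_rows = [i for i, fold in enumerate(labels) if fold == 0]
train_rows = [i for i, fold in enumerate(labels) if fold != 0]
```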
-
model_by_name(name: Union[list, str]) → Union[hhpy.modelling.Model, list]
extract one or more Models from the collection by their names
Parameters: name – name of the Model, or a list of names
Returns: the matching Model, or a list of Models
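The signature accepts either a str or a list and returns either a Model or a list. One plausible reading, sketched below under that assumption, is that a single name yields a single object and a list of names yields a list. NamedModel and select_by_name are stand-ins invented for this example; how the real method treats unknown names is not documented here.

```python
from dataclasses import dataclass


@dataclass
class NamedModel:
    """Stand-in for a Model; only the name matters for the lookup."""
    name: str


def select_by_name(models, name):
    """Return one model for a str, or a list of models for a list of names."""
    if isinstance(name, str):
        matches = [m for m in models if m.name == name]
        if not matches:
            raise KeyError(name)  # assumption: unknown names are an error
        return matches[0]
    return [m for m in models if m.name in name]


collection = [NamedModel("linear"), NamedModel("forest"), NamedModel("boost")]
single = select_by_name(collection, "forest")
several = select_by_name(collection, ["linear", "boost"])
```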
-
predict(X=None, df=None, return_type='self', ensemble=False, do_print=True)
predict with all models in collection
Parameters: - X – The feature (predictor) data used for predicting as DataFrame, np.array or column names
- df – Pandas DataFrame containing the data to predict on, optional if array-like data is passed for X
- return_type – one of [‘y’, ‘df’, ‘DataFrame’, ‘self’]
- ensemble – if set, also predict with Ensemble-like combinations of models. If True or 'mean', calculate the mean of the individual predictions. If 'median', calculate the median of the individual predictions.
- do_print – Whether to print the steps to console
Returns: if return_type is self: None, else see Model.predict
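The ensemble option combines the individual predictions row by row with a mean or a median. A minimal sketch of that combination step follows; the function name ensemble, the model names and the numbers are made up for illustration and do not come from hhpy.

```python
from statistics import mean, median

# Illustrative per-model predictions for three rows (made-up values).
predictions = {
    "linear": [1.0, 2.0, 3.0],
    "forest": [1.5, 2.5, 2.0],
    "boost": [4.0, 2.0, 3.5],
}


def ensemble(predictions, how="mean"):
    """Combine per-model predictions row by row using the mean or the median."""
    combine = {"mean": mean, "median": median}[how]
    # zip(*...) regroups the values so that each tuple holds one row across all models.
    return [combine(row) for row in zip(*predictions.values())]


ensemble_mean = ensemble(predictions, "mean")
ensemble_median = ensemble(predictions, "median")
```

The median variant is less sensitive to a single model that is far off on a given row, as with the first row of "boost" above.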
-
score(return_type: str = 'self', pivot: bool = False, do_print: bool = True, display_score: bool = True, **kwargs) → Optional[pandas.core.frame.DataFrame]
calculate score of the Model predictions
Parameters: - return_type – one of [‘self’, ‘df’, ‘DataFrame’]
- pivot – whether to pivot the DataFrame for easier readability [optional]
- do_print – Whether to print the steps to console
- display_score – Whether to display the score DataFrame
- kwargs – other keyword arguments passed to hhpy.ds.df_score
Returns: if return_type is ‘self’: None, else: pandas DataFrame containing the scores
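The pivot option trades a long score table for a wide one that is easier to read. The sketch below shows the idea with plain dictionaries. The column names model, score and value are borrowed from the scoreplot defaults and are an assumption about the score table layout; the numbers are placeholders, not results.

```python
# Long format: one row per (model, score) pair. Values are placeholders.
long_scores = [
    {"model": "linear", "score": "r2", "value": 0.81},
    {"model": "linear", "score": "rmse", "value": 2.4},
    {"model": "forest", "score": "r2", "value": 0.88},
    {"model": "forest", "score": "rmse", "value": 1.9},
]


def pivot_scores(rows):
    """Turn long rows into {model: {score_name: value}}, i.e. one row per model."""
    wide = {}
    for row in rows:
        wide.setdefault(row["model"], {})[row["score"]] = row["value"]
    return wide


wide_scores = pivot_scores(long_scores)
```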
-
scoreplot(x='y_ref', y='value', hue='model', hue_order=None, row='score', row_order=None, palette=['xkcd:blue', 'xkcd:red', 'xkcd:green', 'xkcd:cyan', 'xkcd:magenta', 'xkcd:golden yellow', 'xkcd:dark cyan', 'xkcd:red orange', 'xkcd:dark yellow', 'xkcd:easter green', 'xkcd:baby blue', 'xkcd:light brown', 'xkcd:strong pink', 'xkcd:light navy blue', 'xkcd:deep blue', 'xkcd:deep red', 'xkcd:ultramarine blue', 'xkcd:sea green', 'xkcd:plum', 'xkcd:old pink', 'xkcd:lawn green', 'xkcd:amber', 'xkcd:green blue', 'xkcd:yellow green', 'xkcd:dark mustard', 'xkcd:bright lime', 'xkcd:aquamarine', 'xkcd:very light blue', 'xkcd:light grey blue', 'xkcd:dark sage', 'xkcd:dark peach', 'xkcd:shocking pink'], width=16, height=4.5, scale=None, query=None, return_fig_ax=False, **kwargs) → Optional[tuple]
plot the score(s) using sns.barplot
Parameters: - x – Name of the x variable in data or vector data
- y – Name of the y variable in data or vector data
- hue – Further split the plot by the levels of this variable [optional]
- hue_order – Either a string describing how the (hue) levels are to be ordered or an explicit list of levels to be used for plotting. Accepted strings are:
  - sorted: following Python standard sorting conventions (alphabetical for strings, ascending for values)
  - inv: following Python standard sorting conventions, but in inverse order
  - count: sorted by value counts
  - mean, mean_ascending, mean_descending: sorted by mean value, defaults to descending
  - median, median_ascending, median_descending: sorted by median value, defaults to descending
- row – the variable to wrap around the rows [optional]
- row_order – Either a string describing how the (row) levels are to be ordered or an explicit list of levels to be used for plotting. Accepted strings are:
  - sorted: following Python standard sorting conventions (alphabetical for strings, ascending for values)
  - inv: following Python standard sorting conventions, but in inverse order
  - count: sorted by value counts
  - mean, mean_ascending, mean_descending: sorted by mean value, defaults to descending
  - median, median_ascending, median_descending: sorted by median value, defaults to descending
- palette – Collection of colors to be used for plotting. Can be a dictionary with names for each level, a list of colors or an individual color name. Must be valid colors known to pyplot [optional]
- width – Width of each individual subplot [optional]
- height – Height of each individual subplot [optional]
- scale – scale the values [optional]
- query – query to be passed to pd.DataFrame.query before plotting [optional]
- return_fig_ax – Whether to return the figure and axes objects as a tuple, to be captured as fig, ax = …. If False, pyplot.show() is called and the method returns None [optional]
- kwargs – other keyword arguments passed to sns.barplot
Returns: see return_fig_ax
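The order strings accepted by hue_order and row_order can be read as small sorting recipes. The sketch below spells out one interpretation in plain Python. The helper name level_order is hypothetical, and two details are assumptions rather than documented behaviour: "count" is taken to mean most frequent first, and the suffix after the underscore is taken to select the direction.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def level_order(levels, values, order="sorted"):
    """Order the distinct levels according to one of the accepted strings."""
    distinct = sorted(set(levels))
    if order == "sorted":
        return distinct
    if order == "inv":
        return distinct[::-1]
    if order == "count":
        counts = Counter(levels)
        return sorted(distinct, key=lambda lvl: -counts[lvl])  # assumption: most frequent first
    # "mean" / "median", optionally followed by "_ascending" or "_descending".
    stat_name, _, direction = order.partition("_")
    stat = {"mean": mean, "median": median}[stat_name]
    grouped = defaultdict(list)
    for lvl, val in zip(levels, values):
        grouped[lvl].append(val)
    descending = direction != "ascending"  # bare "mean"/"median" default to descending
    return sorted(distinct, key=lambda lvl: stat(grouped[lvl]), reverse=descending)


levels = ["a", "b", "b", "c", "c", "c"]
values = [9.0, 1.0, 3.0, 4.0, 5.0, 6.0]
by_count = level_order(levels, values, "count")
by_mean = level_order(levels, values, "mean")
by_mean_ascending = level_order(levels, values, "mean_ascending")
```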
-
train(df: pandas.core.frame.DataFrame = None, k: int = 5, groupby: Union[Sequence[T_co], str] = None, sortby: Union[Sequence[T_co], str] = None, random_state: int = None, fit_type: str = 'train_test', ensemble: bool = False, scores: Union[Sequence[T_co], str] = None, scale: float = None, do_predict: bool = True, do_score: bool = True, do_split: bool = True, do_fit: bool = True, do_print: bool = True, display_score: bool = True)
wrapper method that combines k_split, fit, predict and score
Parameters: - df – Pandas DataFrame containing the training and testing data. Can be saved to the Model object or supplied on an as needed basis.
- k – see hhpy.ds.k_split
- groupby – see hhpy.ds.k_split
- sortby – see hhpy.ds.k_split
- random_state – see hhpy.ds.k_split
- fit_type – see .fit
- ensemble – if set, also predict with Ensemble-like combinations of models. If True or 'mean', calculate the mean of the individual predictions. If 'median', calculate the median of the individual predictions.
- scores – see .score
- scale – see .score
- do_print – Whether to print the steps to console
- display_score – Whether to display the score DataFrame
- do_split – whether to apply k_split [optional]
- do_fit – whether to fit the Models [optional]
- do_predict – whether to add predictions to DataFrame [optional]
- do_score – whether to create self.df_score [optional]
Returns: None
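The do_split, do_fit, do_predict and do_score flags suggest that train runs its four steps in sequence and skips any step whose flag is False. The following sketch illustrates that gating pattern only. The function run_training and the step callables are hypothetical, and the split, fit, predict, score order is inferred from the method description rather than from the source code.

```python
def run_training(steps, do_split=True, do_fit=True, do_predict=True, do_score=True):
    """Call the enabled steps in the order split -> fit -> predict -> score."""
    enabled = {"split": do_split, "fit": do_fit, "predict": do_predict, "score": do_score}
    executed = []
    for name in ("split", "fit", "predict", "score"):
        if enabled[name]:
            steps[name]()
            executed.append(name)
    return executed


# Stand-in steps that only record that they were called.
log = []
steps = {name: (lambda n=name: log.append(n)) for name in ("split", "fit", "predict", "score")}

# Reuse an existing split, for example after calling k_split manually.
executed = run_training(steps, do_split=False)
```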