hildar/RecSys-Retail
Two-layer Hybrid Recommender System for retail

About

Two-layer hybrid recommender system for retail. Layer 1 uses the Implicit library for sparse data (KNN and ALS approaches). Layer 2 is a ranking model built with CatBoost (gradient boosting). This doubled the score of a custom precision metric compared to the baseline.

Stack:

  • 1st layer: Implicit (ItemItemRecommender, ALS), sklearn, pandas, numpy, matplotlib
  • 2nd layer: CatBoost, LightGBM

Data: from the Retail X5 Hero Competition

Steps:

  1. Prepare data: prefiltering
  2. Matching model (initialize MainRecommender as the 1st-layer baseline)
  3. Evaluate Top@k Recall
  4. Ranking model (choose the 2nd-layer model)
  5. Feature engineering for ranking

Usage

Please open the train.ipynb Jupyter notebook and explore how to build the recommender system step by step.

The project consists of the following steps:

1. Prepare data

First, we look at the datasets and prefilter the data:

(figure: data)
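The prefiltering step can be sketched with pandas as follows. The column names (`user_id`, `item_id`, `quantity`), the top-N cutoff, and the dummy item id are illustrative assumptions, not the project's exact code:

```python
import pandas as pd

def prefilter_items(data: pd.DataFrame, top_n: int = 5000) -> pd.DataFrame:
    """Keep only the top-N most popular items; everything else is
    collapsed into one dummy item so the user-item matrix stays small."""
    popularity = data.groupby("item_id")["quantity"].sum().reset_index()
    top_items = (popularity.sort_values("quantity", ascending=False)
                 .head(top_n)["item_id"].tolist())
    data = data.copy()
    data.loc[~data["item_id"].isin(top_items), "item_id"] = 999999  # dummy id
    return data

purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "item_id": [10, 20, 10, 30, 40],
    "quantity": [5, 1, 3, 1, 1],
})
filtered = prefilter_items(purchases, top_n=2)  # items 10 and 20 survive
```

Collapsing the long tail of rare items this way dramatically shrinks the sparse matrix that the first-layer model has to factorize.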

2. Matching model

We train the first-layer model as a baseline. The MainRecommender class wraps two base models from the implicit library, ItemItemRecommender and AlternatingLeastSquares:

(figure: implicit)

ALS is used to find similar users and items and to produce ALS recommendations; ItemItemRecommender is used to recommend items from the user's own purchase history.
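The item-item idea can be illustrated with a small numpy sketch. This is a simplified cosine-similarity version of the concept, not the implicit library's actual implementation:

```python
import numpy as np

# Toy user-item interaction matrix (rows = users, cols = items).
ui = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Cosine similarity between item columns -- the core idea behind
# an item-item recommender.
norms = np.linalg.norm(ui, axis=0, keepdims=True)
sim = (ui.T @ ui) / (norms.T @ norms)
np.fill_diagonal(sim, 0.0)  # an item should not recommend itself

def similar_items(item: int, k: int = 2) -> list:
    """Indices of the k items most similar to the given item."""
    return list(np.argsort(-sim[item])[:k])
```

In the real project the matrix is a sparse CSR matrix and the similarity search is handled by implicit's KNN models.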

3. Evaluate Top@k Recall

For the first-layer model we use the Recall@k metric because it shows the proportion of real purchases recovered by the top-k recommendations. With this approach we can significantly cut the dataset size for the second-layer model.
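A minimal Recall@k implementation matching this definition (a sketch; the project's own metric code may differ in details):

```python
def recall_at_k(recommended: list, bought: list, k: int = 5) -> float:
    """Share of actually bought items that appear in the top-k recommendations."""
    if not bought:
        return 0.0
    hits = len(set(recommended[:k]) & set(bought))
    return hits / len(bought)

# 2 of the 3 real purchases are recovered in the top 5.
recall_at_k([1, 2, 3, 4, 5], bought=[2, 5, 9], k=5)
```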

Here we evaluate different types of recommendations:

(figure: types_recs)

And we select the optimal value of k by Recall:

(figure: recall)

4. Ranking model

In this step we build a new X_train dataset with a target based on actual purchases:

(figure: target)
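A sketch of how such a target can be built with pandas: first-layer candidates are joined against real purchases, and a candidate gets target 1 if it was actually bought. Column names here are hypothetical:

```python
import pandas as pd

# Candidates produced by the first-layer model.
candidates = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "item_id": [10, 20, 10, 30],
})

# Actual purchases in the ranker's train period.
purchases = pd.DataFrame({
    "user_id": [1, 2],
    "item_id": [20, 30],
})
purchases["target"] = 1

# target = 1 if the candidate was actually bought, else 0.
X_train = candidates.merge(purchases, on=["user_id", "item_id"], how="left")
X_train["target"] = X_train["target"].fillna(0).astype(int)
```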

Here we choose a classifier from LightGBM and CatBoost and evaluate it with Precision@k on the test data. At this stage the result is not yet impressive.
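Precision@k can be computed like this (a sketch; the project's custom precision metric may differ, e.g. by weighting items by price):

```python
def precision_at_k(recommended: list, bought: list, k: int = 5) -> float:
    """Share of the top-k recommendations that were actually bought."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(bought))
    return hits / k

# 2 of the top 5 recommendations were actually bought.
precision_at_k([1, 2, 3, 4, 5], bought=[2, 5, 9], k=5)
```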

5. Feature engineering for ranking

We add new features for the ranking model based on user, item, and paired user-item data.

(figure: paired)
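A small pandas sketch of the three feature groups; all column and feature names here are hypothetical examples, not the project's actual feature set:

```python
import pandas as pd

purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "item_id": [10, 10, 10, 30, 30],
    "sales_value": [3.0, 2.0, 4.0, 1.0, 2.0],
})

# User feature: average spend per purchase.
user_feat = (purchases.groupby("user_id")["sales_value"].mean()
             .rename("user_avg_spend").reset_index())
# Item feature: overall item popularity.
item_feat = (purchases.groupby("item_id")["user_id"].count()
             .rename("item_n_purchases").reset_index())
# Paired user-item feature: how often this user bought this item.
pair_feat = (purchases.groupby(["user_id", "item_id"]).size()
             .rename("user_item_n_purchases").reset_index())

features = (pair_feat
            .merge(user_feat, on="user_id")
            .merge(item_feat, on="item_id"))
```

The paired features directly encode the user's history with a specific item, which is why they end up dominating the feature importance.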

We control overfitting for CatBoost and prune extra estimators:

(figure: catboost)

The ranking model doubled the metric compared to the baseline.

As we can see, the paired user-item features have the highest feature importance:

(figure: catfeature_importance)