hildar/RecSys-Retail
Two-layer Hybrid Recommender System for retail

About

Two-layer hybrid recommender system for retail. Layer 1 uses the Implicit library for sparse data (KNN and ALS approaches). Layer 2 is a ranking model built with CatBoost (gradient boosting). This doubled the score of a custom precision metric compared to the baseline.

Stack:

  • 1st layer: Implicit (ItemItemRecommender, ALS), sklearn, pandas, numpy, matplotlib
  • 2nd layer: CatBoost, LightGBM

Data: from the Retail X5 Hero Competition

Steps:

  1. Prepare data: prefiltering
  2. Matching model (initialize MainRecommender as the 1st-layer baseline)
  3. Evaluate Top@k Recall
  4. Ranking model (choose the 2nd-layer model)
  5. Feature engineering for ranking

Usage

Please open the train.ipynb Jupyter notebook and explore how to build the recommender system step by step.

The project consists of the following steps:

1. Prepare data

First, we look at the datasets and prefilter the data:

(figure: data)
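The prefiltering step can be sketched with pandas as follows. The column names (`user_id`, `item_id`, `quantity`), the top-N cutoff, and the dummy item id are illustrative assumptions, not the project's exact code:

```python
import pandas as pd

def prefilter_items(data: pd.DataFrame, top_n: int = 5000) -> pd.DataFrame:
    """Keep only the top-N most popular items; everything else is
    collapsed into one dummy item so the user-item matrix stays small."""
    popularity = data.groupby("item_id")["quantity"].sum().reset_index()
    top_items = (popularity.sort_values("quantity", ascending=False)
                 .head(top_n)["item_id"].tolist())
    data = data.copy()
    data.loc[~data["item_id"].isin(top_items), "item_id"] = 999999  # dummy id
    return data

purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "item_id": [10, 20, 10, 30, 40],
    "quantity": [5, 1, 3, 1, 1],
})
filtered = prefilter_items(purchases, top_n=2)  # items 10 and 20 survive
```

Collapsing the long tail of rare items this way dramatically shrinks the sparse matrix that the first-layer model has to factorize.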

2. Matching model

We train the first-layer model as a baseline. The MainRecommender class wraps two base models from the implicit library, ItemItemRecommender and AlternatingLeastSquares:

(figure: implicit)

ALS is used to find similar users and items and to produce ALS recommendations; ItemItemRecommender is used to recommend items from the user's own purchase history.
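The item-item idea can be illustrated with a small numpy sketch. This is a simplified cosine-similarity version of the concept, not the implicit library's actual implementation:

```python
import numpy as np

# Toy user-item interaction matrix (rows = users, cols = items).
ui = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Cosine similarity between item columns -- the core idea behind
# an item-item recommender.
norms = np.linalg.norm(ui, axis=0, keepdims=True)
sim = (ui.T @ ui) / (norms.T @ norms)
np.fill_diagonal(sim, 0.0)  # an item should not recommend itself

def similar_items(item: int, k: int = 2) -> list:
    """Indices of the k items most similar to the given item."""
    return list(np.argsort(-sim[item])[:k])
```

In the real project the matrix is a sparse CSR matrix and the similarity search is handled by implicit's KNN models.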

3. Evaluate Top@k Recall

For the first-layer model we use the Recall@k metric because it shows the proportion of real purchases recovered by the top-k recommendations. With this approach we can significantly cut the dataset size for the second-layer model.
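A minimal Recall@k implementation matching this definition (a sketch; the project's own metric code may differ in details):

```python
def recall_at_k(recommended: list, bought: list, k: int = 5) -> float:
    """Share of actually bought items that appear in the top-k recommendations."""
    if not bought:
        return 0.0
    hits = len(set(recommended[:k]) & set(bought))
    return hits / len(bought)

# 2 of the 3 real purchases are recovered in the top 5.
recall_at_k([1, 2, 3, 4, 5], bought=[2, 5, 9], k=5)
```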

Here we evaluate different types of recommendations:

(figure: types_recs)

And we select the optimal value of k by Recall:

(figure: recall)

4. Ranking model

In this step we build a new X_train dataset with a target based on actual purchases:

(figure: target)
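A sketch of how such a target can be built with pandas: first-layer candidates are joined against real purchases, and a candidate gets target 1 if it was actually bought. Column names here are hypothetical:

```python
import pandas as pd

# Candidates produced by the first-layer model.
candidates = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "item_id": [10, 20, 10, 30],
})

# Actual purchases in the ranker's train period.
purchases = pd.DataFrame({
    "user_id": [1, 2],
    "item_id": [20, 30],
})
purchases["target"] = 1

# target = 1 if the candidate was actually bought, else 0.
X_train = candidates.merge(purchases, on=["user_id", "item_id"], how="left")
X_train["target"] = X_train["target"].fillna(0).astype(int)
```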

Here we choose a classifier from LightGBM and CatBoost and evaluate it with Precision@k on the test data. At this stage the result is not yet impressive.
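Precision@k can be computed like this (a sketch; the project's custom precision metric may differ, e.g. by weighting items by price):

```python
def precision_at_k(recommended: list, bought: list, k: int = 5) -> float:
    """Share of the top-k recommendations that were actually bought."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(bought))
    return hits / k

# 2 of the top 5 recommendations were actually bought.
precision_at_k([1, 2, 3, 4, 5], bought=[2, 5, 9], k=5)
```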

5. Feature engineering for ranking

We add new features for the ranking model based on user, item, and paired user-item data.

(figure: paired)
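A small pandas sketch of the three feature groups; all column and feature names here are hypothetical examples, not the project's actual feature set:

```python
import pandas as pd

purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "item_id": [10, 10, 10, 30, 30],
    "sales_value": [3.0, 2.0, 4.0, 1.0, 2.0],
})

# User feature: average spend per purchase.
user_feat = (purchases.groupby("user_id")["sales_value"].mean()
             .rename("user_avg_spend").reset_index())
# Item feature: overall item popularity.
item_feat = (purchases.groupby("item_id")["user_id"].count()
             .rename("item_n_purchases").reset_index())
# Paired user-item feature: how often this user bought this item.
pair_feat = (purchases.groupby(["user_id", "item_id"]).size()
             .rename("user_item_n_purchases").reset_index())

features = (pair_feat
            .merge(user_feat, on="user_id")
            .merge(item_feat, on="item_id"))
```

The paired features directly encode the user's history with a specific item, which is why they end up dominating the feature importance.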

We control overfitting for CatBoost and prune extra estimators:

(figure: catboost)

The ranking model doubled the metric compared to the baseline.

As we can see, the paired user-item features have the highest feature importance:

(figure: catfeature_importance)