Laptop Recommendation System

Majid
6 min readApr 28, 2021

Let's quickly build a baseline laptop recommendation web app! We will create a Flask app with Swagger for the front-end and deploy it on heruko. treat this one as an evening follow along to base your next blockbuster start-up on.

Ready!..go...

Oh, and before I forget, you can refer to here for an explanation of recommender systems and their different types and varieties, etc.

Alright, let’s cut to the chase. We need a data set. For now, I found a small data set on Kaggle. I will probably build a full-fledged scraper in a future post.

Check out the data below:

before anything, remove items with null values:

I like the columns with the float data type. They are easy to work with, not too much preprocessing required. For example, you give me a price ballpark you are looking for and I promptly provide a few options based on the column old_price. What about the columns with ‘object’ data type i.e. strings in here? each deserves to be treated differently.

Rating_5max

Let's remove the denominators and just leave the absolute rating. but before that, the data is copied to keep the original intact. That’s just how I do business, you don’t have to:

Disk_Space

This field contains two main segments, i.e., the size of the disk and its type. Firstly, let’s check out the unique disk type values:

now let’s correct the typo with ‘SSD)’ and ‘Flash)’ and separate the disk type in a different column. We should also standardize the unit for disk size values to megabytes:

ok, personally I like SSD better than HDD better than eMMC but I won’t replace the disk_type with personalized scores. I will just treat them as categorical values at the end

Graphics_Card

Let’s check out the unique values in this column too:

There are quite a lot of values here. Simply treating this as a categorical variable results in missing similarities between them as well as the addition of too many dimensions. One simple solution is to map these values to their benchmark score. I scraped the scores from passmark.com

processor_type

Treat processor type the same as Graphics_card. Benchmarks can be found on the same website.

laptop_name

I don’t believe laptop_name is going to have an impact so will just drop it for now.

brand

Maybe laptops from the same brands are more similar. We can treat ‘brand’ as a categorical variable and create dummy columns in its place.

Finally, columns of type object (namely ‘processor_type’ and ‘graphics_card’) are dropped and categorical variables are on-hot encoded:

The difference in the ranges of the values is an important point. For our recommender tool which will be a similarity matrix, the magnanimous values in e.x. old_price columns totally overtake the influence of smaller values in a column such as ratings_5max. One solution would be to weigh columns according to an important yardstick but that's a long story for another day. Fear not though, normalization is our friend:

time to build the similarity matrix. For this app, the goal is to recommend the 3 most similar items to an input item. This similarity is measured via cosine similarity:

This saved “similarity_mtrx” is like a trained model in our app. Next up, let’s crank up the web app itself.

As mentioned at the beginning, this would be a Flask app that uses Swagger for the front-end.

First, import the required packages:

We need “Flask” and “request” to make requests to our app. “numpy” to load the similarity matrix trained above; pandas to extract the similar laptops’ features and specifics to view to the client and “Swagger” obviously for the front-end which is part of the flasgger package.

I assume you are familiar with Flask. To integrate Swagger all you really need to do is to wrap it around the app:

Now take a close look at the predict function:

Skip the code for now and look at the multiline comment at the top of the function. You need to copy this formatting to get the Swagger front-end working.

  1. A little description of the functionality at the top.
  2. 3 dashes below which comes parameter descriptions
  3. For each parameter, 4 items need to be specified (“name”, “in”, “type”, “required”). “name” is used for presentation on the front-end; “in” denotes the input gate; “type” is obviously the type of the parameter and “required” is a boolean denoting whether or not the parameter is required.

Here we only need one input parameter which would be the name of a laptop therefore there is only one set of these items.

After the parameters, you need to specify the type of responses that this module returns. I only return one response of code 200, with a simple description.

Again, take note of the indentations, dash lines, spaces, etc. Miss any of these details and spend an hour debugging a comment in your code.

The rest is pretty self-explanatory I guess. It receives a laptop name and extracts the ids of the 3 most similar laptops in our data, returns a small data frame containing these three items from the database.

The full picture:

Now go to localhost:6006/apidocs/ to view your app.

All that’s remained is containerizing and deploying the app so that everybody from any OS can make requests to it and enjoy the service. We are going to do it on heroku. If you don’t have an account, please go ahead and create one for free.

Go to the Deploy section of your dashboard. I suggest you upload your code to GitHub and select Github like below as your deployment method:

You will be prompted for authorization but once connected, you can select the repository and magic will take of the rest. Your app will be identified and by clicking “Deploy Branch” at the bottom of the page which refers to the branch with the final cut, it will be deployed.

Hope you liked this short tutorial. We‘ve practically done an end-to-end machine learning project. Don’t hesitate to ask questions or point out any errors.

I hope to expand on this soon with a new tutorial. I will probably improve on the recommender with a larger data set or a clustering method and turn this into an API.

until then, happy learning!

--

--