From f6d2141a480dd6b5b8ee0e48d43bb64773232791 Mon Sep 17 00:00:00 2001
From: Navan Chauhan
Date: Tue, 26 Mar 2024 23:38:14 -0600
Subject: add header ids
---
 .../2022-05-21-Similar-Movies-Recommender.html | 34 +++++++++++-----------
 1 file changed, 17 insertions(+), 17 deletions(-)
(limited to 'docs/posts/2022-05-21-Similar-Movies-Recommender.html')
diff --git a/docs/posts/2022-05-21-Similar-Movies-Recommender.html b/docs/posts/2022-05-21-Similar-Movies-Recommender.html
index c7f3b3a..717513f 100644
--- a/docs/posts/2022-05-21-Similar-Movies-Recommender.html
+++ b/docs/posts/2022-05-21-Similar-Movies-Recommender.html
@@ -6,13 +6,13 @@
- Building a Similar Movies Recommendation System
+ id="building-a-similar-movies-recommendation-system">Building a Similar Movies Recommendation System
-
-
+ Building a Similar Movies Recommendation System" />
+ Building a Similar Movies Recommendation System" />
@@ -44,21 +44,21 @@
-

Building a Similar Movies Recommendation System

+

Building a Similar Movies Recommendation System

-

Why?

+

Why?

I recently came across a movie/tv-show recommender, couchmoney.tv. I loved it. I decided that I wanted to build something similar, so I could tinker with it as much as I wanted.

I also wanted a recommendation system I could use via a REST API. Although I have not included that part in this post, I did eventually create it.

-

How?

+

How?

The cosine of the angle between two vectors is a value in the range [-1, 1]: 1 means they point in the same direction, 0 means no similarity, and negative values mean they point in opposite directions. Now, if we find a way to represent information about movies as a vector, we can use cosine similarity as a metric to find similar movies.

As we are recommending based only on the content of the movies, this is called a content-based recommendation system.
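
To make the metric concrete, here is a minimal sketch of cosine similarity in plain Python. It is not the code used in this post; the three-component "genre" vectors are made up for illustration, whereas real movie vectors would come from an embedding model. For vectors with only non-negative components the result stays in [0, 1]; in general it can go down to -1.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical features): [heist, comedy, romance]
heist_a = [1.0, 0.0, 0.0]
heist_b = [2.0, 0.0, 0.0]   # same direction as heist_a, different length
romcom = [0.0, 1.0, 1.0]    # shares no features with heist_a

same_direction = cosine_similarity(heist_a, heist_b)
unrelated = cosine_similarity(heist_a, romcom)
```

Only the direction of a vector matters here, not its length, which is why `heist_a` and `heist_b` count as identical under this metric.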

-

Data Collection

+

Data Collection

Trakt exposes a nice API to search for movies/tv-shows. To access the API, you first need to get an API key (the Trakt ID you get when you create a new application).
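
As a rough sketch of what a search call could look like, the snippet below builds (but does not send) a request to Trakt's text search endpoint using only the standard library. The helper name `build_search_request` and the placeholder client ID are my own assumptions, not code from this post; check the header names and endpoint against the Trakt API documentation before relying on them.

```python
import urllib.parse
import urllib.request

TRAKT_API_BASE = "https://api.trakt.tv"


def build_search_request(query, client_id, media_type="movie"):
    """Build (but do not send) a Trakt text-search request.

    Hypothetical helper: client_id is the Trakt ID of your application.
    """
    params = urllib.parse.urlencode({"query": query})
    url = f"{TRAKT_API_BASE}/search/{media_type}?{params}"
    headers = {
        "Content-Type": "application/json",
        "trakt-api-version": "2",
        "trakt-api-key": client_id,
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder key: replace with the Trakt ID of your own application
request = build_search_request("Now You See Me", client_id="YOUR_TRAKT_CLIENT_ID")

# To actually call the API, pass `request` to urllib.request.urlopen()
# and decode the JSON body of the response.
```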

@@ -140,7 +140,7 @@

In the end, I could have dropped the embeddings field from the table schema as I never got around to using it.

-

Scripting Time

+

Scripting Time

from database import *
@@ -243,7 +243,7 @@
 
 

Running this script took me approximately 3 hours and resulted in an SQLite database of 141.5 MB.

-

Embeddings!

+

Embeddings!

I did not want to put my poor Mac through the estimated 23 hours it would have taken to embed the sentences. I decided to use Google Colab instead.

@@ -308,7 +308,7 @@ As of writing this post, I did not include any other database except Trakt.

That's it!

-

Interacting with Vectors

+

Interacting with Vectors

We use the trakt_id for the movie as the ID for the vectors and upsert it into the index.
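
To illustrate what upserting by ID means, here is a tiny in-memory stand-in for a vector index. `TinyVectorIndex` is a hypothetical class written for this sketch, not the vector database used in this post, and the IDs and vectors are made up. It shows the two operations that matter: an upsert keyed by the movie's trakt_id, and a query that ranks stored vectors by cosine similarity.

```python
import math


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorIndex:
    """Hypothetical in-memory stand-in for a vector index, keyed by string ID."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, items):
        """Insert new (id, vector) pairs, overwriting any existing ID."""
        for item_id, vector in items:
            self._vectors[str(item_id)] = list(vector)

    def query(self, vector, top_k=3):
        """Return up to top_k (id, score) pairs, most similar first."""
        scored = [
            (item_id, _cosine(vector, stored))
            for item_id, stored in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = TinyVectorIndex()
# Made-up trakt_ids and 3-dimensional "embeddings"
index.upsert([(101, [0.9, 0.1, 0.0]), (102, [0.8, 0.2, 0.1]), (103, [0.0, 0.1, 0.9])])
index.upsert([(103, [0.0, 0.2, 0.8])])  # same ID, so the old vector is replaced

matches = index.query([1.0, 0.0, 0.0], top_k=2)
```

Because the index only returns IDs and scores, each trakt_id still has to be mapped back to a movie record to show a title.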

@@ -359,7 +359,7 @@ It is possible that this additional step of mapping could be avoided by storing
-

Testing it Out

+

Testing it Out

movie_name = "Now You See Me"
@@ -405,21 +405,21 @@ Spies (2015): A secret agent must perform a heist without time on his side
 
 

For now, I am happy with the recommendations.

-

Simple UI

+

Simple UI

The code for the Flask app can be found on GitHub: navanchauhan/FlixRec or on my Gitea instance.

I quickly whipped up a simple Flask app to handle multiple movies sharing the same title and typos in the search query.
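
One way to tackle both problems, sketched here with the standard library's `difflib`, is to fuzzy-match the query against known titles and return every movie that shares a matching title so the user can pick one. This is an assumption about how such a lookup could work, not the code in the FlixRec repository; the catalogue rows and IDs below are made up.

```python
import difflib
from collections import defaultdict

# Made-up catalogue rows: (trakt_id, title, year)
MOVIES = [
    (1, "Now You See Me", 2013),
    (2, "Now You See Me 2", 2016),
    (3, "Spies", 2015),
    (4, "Spies", 1928),
]


def find_candidates(query, movies, cutoff=0.6):
    """Return all rows whose title approximately matches the query.

    Titles are compared case-insensitively; movies that share a title
    are all returned so the caller can ask the user to choose.
    """
    by_title = defaultdict(list)
    for trakt_id, title, year in movies:
        by_title[title.lower()].append((trakt_id, title, year))

    close_titles = difflib.get_close_matches(
        query.lower(), list(by_title), n=3, cutoff=cutoff
    )
    return [row for title in close_titles for row in by_title[title]]


# A typo in the query still finds both movies titled "Spies"
candidates = find_candidates("Spise", MOVIES)
```

The `cutoff` value trades recall for precision: lower values tolerate worse typos but return more unrelated titles.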

-

Home Page

+

Home Page

Home Page

-

Handling Multiple Movies with Same Title

+

Handling Multiple Movies with Same Title

Multiple Movies with Same Title

-

Results Page

+

Results Page

Results Page

@@ -429,14 +429,14 @@ Spies (2015): A secret agent must perform a heist without time on his side

Test it out at https://flixrec.navan.dev

-

Current Limitations

+

Current Limitations

  • Does not work well with popular franchises
  • No Genre Filter
-

Future Addons

+

Future Addons

  • Include Cast Data

--
cgit v1.2.3