author navanchauhan <navanchauhan@gmail.com> 2022-05-22 12:30:17 -0600
committer navanchauhan <navanchauhan@gmail.com> 2022-05-22 12:30:17 -0600
commit 41afee9614e63c17e1a875a2ed2f2a550c1b7266
tree 06ec08d880bc46f81e9fcbf1448a0722c2b60430 /Content/posts
parent 4cb855c1ccfc5fed6b29168546f9988ebadd437e
fixed for twitter thread
Diffstat (limited to 'Content/posts')
-rw-r--r-- Content/posts/2022-05-21-Similar-Movies-Recommender.md 7
1 files changed, 5 insertions, 2 deletions
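The post changed by this commit describes finding similar movies in three steps: map the movie title to its `trakt_id`, fetch the embedding stored under that id, then run a similarity search. Below is a minimal, stdlib-only sketch of that flow. It assumes a plain in-memory dict stands in for the vector index the post upserts into, and that toy vectors stand in for the SentenceTransformers embeddings. `find_similar` and `cosine_similarity` are hypothetical helpers for illustration, not the post's own code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(
    title: str,
    title_to_id: Dict[str, int],
    vectors: Dict[int, List[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: title -> trakt_id -> embedding -> nearest neighbours."""
    # Step 1: map the movie title to its trakt_id (case-insensitive lookup).
    lookup = {t.lower(): tid for t, tid in title_to_id.items()}
    trakt_id = lookup.get(title.lower())
    if trakt_id is None:
        raise KeyError(f"unknown title: {title!r}")

    # Step 2: fetch the embedding stored under that trakt_id.
    query = vectors[trakt_id]

    # Step 3: brute-force similarity search over every other stored vector
    # (a real vector index would do this with an approximate-neighbour search).
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in vectors.items()
        if other_id != trakt_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, with `title_to_id = {"Movie A": 1, "Movie B": 2, "Movie C": 3}` and `vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}`, `find_similar("movie a", title_to_id, vectors, top_k=2)` ranks id 2 ahead of id 3. Storing the title as metadata in the index, as the post suggests, would remove the need for the separate lookup in step 1.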
diff --git a/Content/posts/2022-05-21-Similar-Movies-Recommender.md b/Content/posts/2022-05-21-Similar-Movies-Recommender.md
index b889002..66dd54a 100644
--- a/Content/posts/2022-05-21-Similar-Movies-Recommender.md
+++ b/Content/posts/2022-05-21-Similar-Movies-Recommender.md
@@ -208,7 +208,9 @@ I did not want to put my poor Mac through the estimated 23 hours it would have t
Because of the small size of the database file, I was able to just upload the file.
-For the encoding model, I decided to use the pretrained `paraphrase-multilingual-MiniLM-L12-v2` model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings. I wanted to use a multilingual model as I personally consume content in various languages (natively, no dubs or subs) and some of the sources for their information do not translate to English. As of writing this post, I did not include any other database except Trakt.
+For the encoding model, I decided to use the pretrained `paraphrase-multilingual-MiniLM-L12-v2` model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings.
+I wanted to use a multilingual model as I personally consume content in various languages and some of the sources for their information do not translate to English.
+As of writing this post, I did not include any other database except Trakt.
While deciding how I was going to process the embeddings, I came across multiple solutions:
@@ -269,7 +271,8 @@ That's it!
We use the `trakt_id` for the movie as the ID for the vectors and upsert it into the index.
-To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search. It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.
+To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search.
+It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.
```python
def get_trakt_id(df, title: str):