From 41afee9614e63c17e1a875a2ed2f2a550c1b7266 Mon Sep 17 00:00:00 2001
From: navanchauhan
Date: Sun, 22 May 2022 12:30:17 -0600
Subject: fixed for twitter thread

---
 docs/feed.rss | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

(limited to 'docs/feed.rss')

diff --git a/docs/feed.rss b/docs/feed.rss
index 11e6861..85c0a02 100644
--- a/docs/feed.rss
+++ b/docs/feed.rss
@@ -4,8 +4,8 @@
 Navan's Archive
 Rare Tips, Tricks and Posts
 https://web.navan.dev/en
- Sun, 22 May 2022 12:18:20 -0000
- Sun, 22 May 2022 12:18:20 -0000
+ Sun, 22 May 2022 12:30:06 -0000
+ Sun, 22 May 2022 12:30:06 -0000
 250
@@ -776,7 +776,9 @@ export BABEL_LIBDIR="/usr/lib/openbabel/3.1.0"

Because of the small size of the database file, I was able to just upload the file.

-

For the encoding model, I decided to use the pretrained paraphrase-multilingual-MiniLM-L12-v2 model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings. I wanted to use a multilingual model as I personally consume content in various languages (natively, no dubs or subs) and some of the sources for their information do not translate to English. As of writing this post, I did not include any other database except Trakt.

+

For the encoding model, I decided to use the pretrained paraphrase-multilingual-MiniLM-L12-v2 model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings. +I wanted to use a multilingual model as I personally consume content in various languages and some of the sources for their information do not translate to English. +As of writing this post, I did not include any other database except Trakt.

While deciding how I was going to process the embeddings, I came across multiple solutions:

@@ -835,7 +837,8 @@ export BABEL_LIBDIR="/usr/lib/openbabel/3.1.0"

We use the trakt_id for the movie as the ID for the vectors and upsert it into the index.

-

To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search. It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.

+

To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search. +It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.

def get_trakt_id(df, title: str):
   rec = df[df["title"].str.lower()==title.lower()]
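
The lookup flow described above (title to trakt_id, trakt_id to stored embedding, then a similarity search) can be sketched end to end without any external service. This is only a minimal illustration under stated assumptions: `MOVIES`, `VECTORS`, the placeholder titles, the 3-dimensional vectors, and `find_similar` are all made up for the example, and a brute-force cosine similarity over a dict stands in for the real vector index.

```python
import math

# Hypothetical stand-ins: a tiny title table and made-up 3-d embeddings keyed by trakt_id.
MOVIES = [
    {"trakt_id": 1, "title": "Movie A"},
    {"trakt_id": 2, "title": "Movie B"},
    {"trakt_id": 3, "title": "Movie C"},
]
VECTORS = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
}


def get_trakt_id(movies, title):
    """Case-insensitive title lookup; returns None when nothing matches."""
    wanted = title.lower()
    for row in movies:
        if row["title"].lower() == wanted:
            return row["trakt_id"]
    return None


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(movies, vectors, title, top_k=2):
    """Map title -> trakt_id -> embedding, then rank every other vector by similarity."""
    trakt_id = get_trakt_id(movies, title)
    if trakt_id is None or trakt_id not in vectors:
        return []
    query = vectors[trakt_id]
    scored = [
        (other_id, cosine(query, vec))
        for other_id, vec in vectors.items()
        if other_id != trakt_id  # skip the query item itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Calling `find_similar(MOVIES, VECTORS, "movie a")` returns `(trakt_id, score)` pairs ordered from most to least similar. If the titles were stored as metadata alongside each vector, the separate `get_trakt_id` step could be dropped, as the post suggests.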
-- 
cgit v1.2.3