author | navanchauhan <navanchauhan@gmail.com> | 2022-05-22 12:30:17 -0600 |
---|---|---|
committer | navanchauhan <navanchauhan@gmail.com> | 2022-05-22 12:30:17 -0600 |
commit | 41afee9614e63c17e1a875a2ed2f2a550c1b7266 (patch) | |
tree | 06ec08d880bc46f81e9fcbf1448a0722c2b60430 | |
parent | 4cb855c1ccfc5fed6b29168546f9988ebadd437e (diff) |
fixed for twitter thread
-rw-r--r-- | Content/posts/2022-05-21-Similar-Movies-Recommender.md | 7 |
-rw-r--r-- | docs/feed.rss | 11 |
-rw-r--r-- | docs/posts/2022-05-21-Similar-Movies-Recommender.html | 7 |
3 files changed, 17 insertions, 8 deletions
diff --git a/Content/posts/2022-05-21-Similar-Movies-Recommender.md b/Content/posts/2022-05-21-Similar-Movies-Recommender.md
index b889002..66dd54a 100644
--- a/Content/posts/2022-05-21-Similar-Movies-Recommender.md
+++ b/Content/posts/2022-05-21-Similar-Movies-Recommender.md
@@ -208,7 +208,9 @@ I did not want to put my poor Mac through the estimated 23 hours it would have t
 Because of the small size of the database file, I was able to just upload the file.
-For the encoding model, I decided to use the pretrained `paraphrase-multilingual-MiniLM-L12-v2` model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings. I wanted to use a multilingual model as I personally consume content in various languages (natively, no dubs or subs) and some of the sources for their information do not translate to English. As of writing this post, I did not include any other database except Trakt.
+For the encoding model, I decided to use the pretrained `paraphrase-multilingual-MiniLM-L12-v2` model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings.
+I wanted to use a multilingual model as I personally consume content in various languages and some of the sources for their information do not translate to English.
+As of writing this post, I did not include any other database except Trakt.
 While deciding how I was going to process the embeddings, I came across multiple solutions:
@@ -269,7 +271,8 @@ That's it!
 We use the `trakt_id` for the movie as the ID for the vectors and upsert it into the index.
-To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search. It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.
+To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search.
+It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.
 ```python
 def get_trakt_id(df, title: str):
diff --git a/docs/feed.rss b/docs/feed.rss
index 11e6861..85c0a02 100644
--- a/docs/feed.rss
+++ b/docs/feed.rss
@@ -4,8 +4,8 @@
 <title>Navan's Archive</title>
 <description>Rare Tips, Tricks and Posts</description>
 <link>https://web.navan.dev/</link><language>en</language>
- <lastBuildDate>Sun, 22 May 2022 12:18:20 -0000</lastBuildDate>
- <pubDate>Sun, 22 May 2022 12:18:20 -0000</pubDate>
+ <lastBuildDate>Sun, 22 May 2022 12:30:06 -0000</lastBuildDate>
+ <pubDate>Sun, 22 May 2022 12:30:06 -0000</pubDate>
 <ttl>250</ttl>
 <atom:link href="https://web.navan.dev/feed.rss" rel="self" type="application/rss+xml"/>
@@ -776,7 +776,9 @@ export BABEL_LIBDIR="/usr/lib/openbabel/3.1.0"
 <p>Because of the small size of the database file, I was able to just upload the file.</p>
-<p>For the encoding model, I decided to use the pretrained <code>paraphrase-multilingual-MiniLM-L12-v2</code> model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings. I wanted to use a multilingual model as I personally consume content in various languages (natively, no dubs or subs) and some of the sources for their information do not translate to English. As of writing this post, I did not include any other database except Trakt. </p>
+<p>For the encoding model, I decided to use the pretrained <code>paraphrase-multilingual-MiniLM-L12-v2</code> model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings.
+I wanted to use a multilingual model as I personally consume content in various languages and some of the sources for their information do not translate to English.
+As of writing this post, I did not include any other database except Trakt. </p>
 <p>While deciding how I was going to process the embeddings, I came across multiple solutions:</p>
@@ -835,7 +837,8 @@ export BABEL_LIBDIR="/usr/lib/openbabel/3.1.0"
 <p>We use the <code>trakt_id</code> for the movie as the ID for the vectors and upsert it into the index. </p>
-<p>To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search. It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.</p>
+<p>To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search.
+It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.</p>
 <div class="codehilite"><pre><span></span><code><span class="k">def</span> <span class="nf">get_trakt_id</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">title</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
 <span class="n">rec</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s2">"title"</span><span class="p">]</span><span class="o">.</span><span class="n">str</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span><span class="o">==</span><span class="n">movie_name</span><span class="o">.</span><span class="n">lower</span><span class="p">()]</span>
diff --git a/docs/posts/2022-05-21-Similar-Movies-Recommender.html b/docs/posts/2022-05-21-Similar-Movies-Recommender.html
index 2c0b488..2e2fb6b 100644
--- a/docs/posts/2022-05-21-Similar-Movies-Recommender.html
+++ b/docs/posts/2022-05-21-Similar-Movies-Recommender.html
@@ -240,7 +240,9 @@
 <p>Because of the small size of the database file, I was able to just upload the file.</p>
-<p>For the encoding model, I decided to use the pretrained <code>paraphrase-multilingual-MiniLM-L12-v2</code> model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings. I wanted to use a multilingual model as I personally consume content in various languages (natively, no dubs or subs) and some of the sources for their information do not translate to English. As of writing this post, I did not include any other database except Trakt. </p>
+<p>For the encoding model, I decided to use the pretrained <code>paraphrase-multilingual-MiniLM-L12-v2</code> model for SentenceTransformers, a Python framework for SOTA sentence, text and image embeddings.
+I wanted to use a multilingual model as I personally consume content in various languages and some of the sources for their information do not translate to English.
+As of writing this post, I did not include any other database except Trakt. </p>
 <p>While deciding how I was going to process the embeddings, I came across multiple solutions:</p>
@@ -299,7 +301,8 @@
 <p>We use the <code>trakt_id</code> for the movie as the ID for the vectors and upsert it into the index. </p>
-<p>To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search. It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.</p>
+<p>To find similar items, we will first have to map the name of the movie to its trakt_id, get the embeddings we have for that id and then perform a similarity search.
+It is possible that this additional step of mapping could be avoided by storing information as metadata in the index.</p>
 <div class="codehilite"><pre><span></span><code><span class="k">def</span> <span class="nf">get_trakt_id</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">title</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
 <span class="n">rec</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s2">"title"</span><span class="p">]</span><span class="o">.</span><span class="n">str</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span><span class="o">==</span><span class="n">movie_name</span><span class="o">.</span><span class="n">lower</span><span class="p">()]</span>
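
The paragraph touched by this commit describes a three-step lookup: map a title to its `trakt_id`, fetch the stored embedding for that id, then run a similarity search. Below is a minimal standard-library sketch of that flow, not the post's implementation. The post works with a DataFrame and a vector index, whereas here `MOVIES`, `EMBEDDINGS`, `cosine` and `find_similar` are hypothetical stand-ins with toy data, and cosine similarity is assumed as the metric. Note that the highlighted context line in the diff compares against `movie_name` while the parameter is named `title`; the sketch uses `title` throughout.

```python
import math

# Hypothetical stand-in for the movie table (the post uses a DataFrame).
MOVIES = [
    {"trakt_id": 1, "title": "Alpha"},
    {"trakt_id": 2, "title": "Beta"},
    {"trakt_id": 3, "title": "Gamma"},
]

# Toy embeddings keyed by trakt_id (assumption: real vectors come from the encoder).
EMBEDDINGS = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 0.0, 1.0],
}


def get_trakt_id(movies, title):
    """Return the trakt_id for a case-insensitive title match, or None."""
    for movie in movies:
        if movie["title"].lower() == title.lower():
            return movie["trakt_id"]
    return None


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(movies, embeddings, title, top_k=2):
    """Map title -> trakt_id -> embedding, then rank the other ids by similarity."""
    trakt_id = get_trakt_id(movies, title)
    if trakt_id is None or trakt_id not in embeddings:
        return []
    query = embeddings[trakt_id]
    scored = [
        (other_id, cosine(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != trakt_id  # skip the query movie itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Calling `find_similar(MOVIES, EMBEDDINGS, "alpha")` returns a list of `(trakt_id, score)` pairs ordered from most to least similar. Storing the title as metadata next to each vector, as the paragraph suggests, would make the separate `get_trakt_id` step unnecessary.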