summaryrefslogtreecommitdiff
path: root/docs/posts/2019-12-22-Fake-News-Detector.html
diff options
context:
space:
mode:
Diffstat (limited to 'docs/posts/2019-12-22-Fake-News-Detector.html')
-rw-r--r--docs/posts/2019-12-22-Fake-News-Detector.html96
1 files changed, 64 insertions, 32 deletions
diff --git a/docs/posts/2019-12-22-Fake-News-Detector.html b/docs/posts/2019-12-22-Fake-News-Detector.html
index 46297b0..9b62b00 100644
--- a/docs/posts/2019-12-22-Fake-News-Detector.html
+++ b/docs/posts/2019-12-22-Fake-News-Detector.html
@@ -60,48 +60,63 @@ Whenever you are looking for a dataset, always try searching on Kaggle and GitHu
This allows you to train the model on the GPU. Turicreate is built on top of Apache's MXNet Framework, for us to use GPU we need to install
a CUDA compatible MXNet package.</p>
-<div class="codehilite"><pre><span></span><code><span class="nt">!pip</span><span class="na"> install turicreate</span>
+<div class="codehilite">
+<pre><span></span><code><span class="nt">!pip</span><span class="na"> install turicreate</span>
<span class="na">!pip uninstall -y mxnet</span>
<span class="na">!pip install mxnet-cu100==1.4.0.post0</span>
-</code></pre></div>
+</code></pre>
+</div>
<p>If you do not wish to train on GPU or are running it on your computer, you can ignore the last two lines</p>
<h3>Downloading the Dataset</h3>
-<div class="codehilite"><pre><span></span><code><span class="nt">!wget</span><span class="na"> -q &quot;https</span><span class="p">:</span><span class="nc">//github.com/joolsa/fake_real_news_dataset/raw/master/fake_or_real_news.csv.zip&quot;</span><span class="w"></span>
+<div class="codehilite">
+<pre><span></span><code><span class="nt">!wget</span><span class="na"> -q &quot;https</span><span class="p">:</span><span class="nc">//github.com/joolsa/fake_real_news_dataset/raw/master/fake_or_real_news.csv.zip&quot;</span><span class="w"></span>
<span class="nt">!unzip</span><span class="na"> fake_or_real_news.csv.zip</span>
-</code></pre></div>
+</code></pre>
+</div>
<h3>Model Creation</h3>
-<div class="codehilite"><pre><span></span><code><span class="kn">import</span> <span class="nn">turicreate</span> <span class="k">as</span> <span class="nn">tc</span>
+<div class="codehilite">
+<pre><span></span><code><span class="kn">import</span> <span class="nn">turicreate</span> <span class="k">as</span> <span class="nn">tc</span>
<span class="n">tc</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">set_num_gpus</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># If you do not wish to use GPUs, set it to 0</span>
-</code></pre></div>
+</code></pre>
+</div>
-<div class="codehilite"><pre><span></span><code><span class="n">dataSFrame</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">SFrame</span><span class="p">(</span><span class="s1">&#39;fake_or_real_news.csv&#39;</span><span class="p">)</span>
-</code></pre></div>
+<div class="codehilite">
+<pre><span></span><code><span class="n">dataSFrame</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">SFrame</span><span class="p">(</span><span class="s1">&#39;fake_or_real_news.csv&#39;</span><span class="p">)</span>
+</code></pre>
+</div>
<p>The dataset contains a column named "X1", which is of no use to us. Therefore, we simply drop it</p>
-<div class="codehilite"><pre><span></span><code><span class="n">dataSFrame</span><span class="o">.</span><span class="n">remove_column</span><span class="p">(</span><span class="s1">&#39;X1&#39;</span><span class="p">)</span>
-</code></pre></div>
+<div class="codehilite">
+<pre><span></span><code><span class="n">dataSFrame</span><span class="o">.</span><span class="n">remove_column</span><span class="p">(</span><span class="s1">&#39;X1&#39;</span><span class="p">)</span>
+</code></pre>
+</div>
<h4>Splitting Dataset</h4>
-<div class="codehilite"><pre><span></span><code><span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">dataSFrame</span><span class="o">.</span><span class="n">random_split</span><span class="p">(</span><span class="mf">.9</span><span class="p">)</span>
-</code></pre></div>
+<div class="codehilite">
+<pre><span></span><code><span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">dataSFrame</span><span class="o">.</span><span class="n">random_split</span><span class="p">(</span><span class="mf">.9</span><span class="p">)</span>
+</code></pre>
+</div>
<h4>Training</h4>
-<div class="codehilite"><pre><span></span><code><span class="n">model</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">text_classifier</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
+<div class="codehilite">
+<pre><span></span><code><span class="n">model</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">text_classifier</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
<span class="n">dataset</span><span class="o">=</span><span class="n">train</span><span class="p">,</span>
<span class="n">target</span><span class="o">=</span><span class="s1">&#39;label&#39;</span><span class="p">,</span>
<span class="n">features</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;title&#39;</span><span class="p">,</span><span class="s1">&#39;text&#39;</span><span class="p">]</span>
<span class="p">)</span>
-</code></pre></div>
+</code></pre>
+</div>
-<div class="codehilite"><pre><span></span><code><span class="o">+-----------+----------+-----------+--------------+-------------------+---------------------+</span>
+<div class="codehilite">
+<pre><span></span><code><span class="o">+-----------+----------+-----------+--------------+-------------------+---------------------+</span>
<span class="o">|</span> <span class="n">Iteration</span> <span class="o">|</span> <span class="n">Passes</span> <span class="o">|</span> <span class="n">Step</span> <span class="n">size</span> <span class="o">|</span> <span class="n">Elapsed</span> <span class="n">Time</span> <span class="o">|</span> <span class="n">Training</span> <span class="n">Accuracy</span> <span class="o">|</span> <span class="n">Validation</span> <span class="n">Accuracy</span> <span class="o">|</span>
<span class="o">+-----------+----------+-----------+--------------+-------------------+---------------------+</span>
<span class="o">|</span> <span class="mi">0</span> <span class="o">|</span> <span class="mi">2</span> <span class="o">|</span> <span class="mf">1.000000</span> <span class="o">|</span> <span class="mf">1.156349</span> <span class="o">|</span> <span class="mf">0.889680</span> <span class="o">|</span> <span class="mf">0.790036</span> <span class="o">|</span>
@@ -111,39 +126,50 @@ a CUDA compatible MXNet package.</p>
<span class="o">|</span> <span class="mi">4</span> <span class="o">|</span> <span class="mi">8</span> <span class="o">|</span> <span class="mf">1.000000</span> <span class="o">|</span> <span class="mf">1.814194</span> <span class="o">|</span> <span class="mf">0.999063</span> <span class="o">|</span> <span class="mf">0.925267</span> <span class="o">|</span>
<span class="o">|</span> <span class="mi">9</span> <span class="o">|</span> <span class="mi">14</span> <span class="o">|</span> <span class="mf">1.000000</span> <span class="o">|</span> <span class="mf">2.507072</span> <span class="o">|</span> <span class="mf">1.000000</span> <span class="o">|</span> <span class="mf">0.911032</span> <span class="o">|</span>
<span class="o">+-----------+----------+-----------+--------------+-------------------+---------------------+</span>
-</code></pre></div>
+</code></pre>
+</div>
<h3>Testing the Model</h3>
-<div class="codehilite"><pre><span></span><code><span class="n">est_predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
+<div class="codehilite">
+<pre><span></span><code><span class="n">est_predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
<span class="n">accuracy</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">evaluation</span><span class="o">.</span><span class="n">accuracy</span><span class="p">(</span><span class="n">test</span><span class="p">[</span><span class="s1">&#39;label&#39;</span><span class="p">],</span> <span class="n">test_predictions</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Topic classifier model has a testing accuracy of </span><span class="si">{</span><span class="n">accuracy</span><span class="o">*</span><span class="mi">100</span><span class="si">}</span><span class="s1">% &#39;</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
-</code></pre></div>
+</code></pre>
+</div>
-<div class="codehilite"><pre><span></span><code><span class="n">Topic</span> <span class="n">classifier</span> <span class="n">model</span> <span class="n">has</span> <span class="n">a</span> <span class="n">testing</span> <span class="n">accuracy</span> <span class="n">of</span> <span class="mf">92.3076923076923</span><span class="o">%</span>
-</code></pre></div>
+<div class="codehilite">
+<pre><span></span><code><span class="n">Topic</span> <span class="n">classifier</span> <span class="n">model</span> <span class="n">has</span> <span class="n">a</span> <span class="n">testing</span> <span class="n">accuracy</span> <span class="n">of</span> <span class="mf">92.3076923076923</span><span class="o">%</span>
+</code></pre>
+</div>
<p>We have just created our own Fake News Detection Model which has an accuracy of 92%!</p>
-<div class="codehilite"><pre><span></span><code><span class="n">example_text</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;title&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;Middling ‘Rise Of Skywalker’ Review Leaves Fan On Fence About Whether To Threaten To Kill Critic&quot;</span><span class="p">],</span> <span class="s2">&quot;text&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;Expressing ambivalence toward the relatively balanced appraisal of the film, Star Wars fan Miles Ariely admitted Thursday that an online publication’s middling review of The Rise Of Skywalker had left him on the fence about whether he would still threaten to kill the critic who wrote it. “I’m really of two minds about this, because on the one hand, he said the new movie fails to live up to the original trilogy, which makes me at least want to throw a brick through his window with a note telling him to watch his back,” said Ariely, confirming he had already drafted an eight-page-long death threat to Stan Corimer of the website Screen-On Time, but had not yet decided whether to post it to the reviewer’s Facebook page. “On the other hand, though, he commended J.J. Abrams’ skillful pacing and faithfulness to George Lucas’ vision, which makes me wonder if I should just call the whole thing off. Now, I really don’t feel like camping outside his house for hours. Maybe I could go with a response that’s somewhere in between, like, threatening to kill his dog but not everyone in his whole family? I don’t know. This is a tough one.” At press time, sources reported that Ariely had resolved to wear his Ewok costume while he murdered the critic in his sleep.&quot;</span><span class="p">]}</span>
+<div class="codehilite">
+<pre><span></span><code><span class="n">example_text</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;title&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;Middling ‘Rise Of Skywalker’ Review Leaves Fan On Fence About Whether To Threaten To Kill Critic&quot;</span><span class="p">],</span> <span class="s2">&quot;text&quot;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&quot;Expressing ambivalence toward the relatively balanced appraisal of the film, Star Wars fan Miles Ariely admitted Thursday that an online publication’s middling review of The Rise Of Skywalker had left him on the fence about whether he would still threaten to kill the critic who wrote it. “I’m really of two minds about this, because on the one hand, he said the new movie fails to live up to the original trilogy, which makes me at least want to throw a brick through his window with a note telling him to watch his back,” said Ariely, confirming he had already drafted an eight-page-long death threat to Stan Corimer of the website Screen-On Time, but had not yet decided whether to post it to the reviewer’s Facebook page. “On the other hand, though, he commended J.J. Abrams’ skillful pacing and faithfulness to George Lucas’ vision, which makes me wonder if I should just call the whole thing off. Now, I really don’t feel like camping outside his house for hours. Maybe I could go with a response that’s somewhere in between, like, threatening to kill his dog but not everyone in his whole family? I don’t know. This is a tough one.” At press time, sources reported that Ariely had resolved to wear his Ewok costume while he murdered the critic in his sleep.&quot;</span><span class="p">]}</span>
<span class="n">example_prediction</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">classify</span><span class="p">(</span><span class="n">tc</span><span class="o">.</span><span class="n">SFrame</span><span class="p">(</span><span class="n">example_text</span><span class="p">))</span>
<span class="nb">print</span><span class="p">(</span><span class="n">example_prediction</span><span class="p">,</span> <span class="n">flush</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
-</code></pre></div>
+</code></pre>
+</div>
-<div class="codehilite"><pre><span></span><code><span class="o">+-------+--------------------+</span>
+<div class="codehilite">
+<pre><span></span><code><span class="o">+-------+--------------------+</span>
<span class="o">|</span> <span class="k">class</span> <span class="err">| </span><span class="nc">probability</span> <span class="o">|</span>
<span class="o">+-------+--------------------+</span>
<span class="o">|</span> <span class="n">FAKE</span> <span class="o">|</span> <span class="mf">0.9245648658345308</span> <span class="o">|</span>
<span class="o">+-------+--------------------+</span>
<span class="p">[</span><span class="mi">1</span> <span class="n">rows</span> <span class="n">x</span> <span class="mi">2</span> <span class="n">columns</span><span class="p">]</span>
-</code></pre></div>
+</code></pre>
+</div>
<h3>Exporting the Model</h3>
-<div class="codehilite"><pre><span></span><code><span class="n">model_name</span> <span class="o">=</span> <span class="s1">&#39;FakeNews&#39;</span>
+<div class="codehilite">
+<pre><span></span><code><span class="n">model_name</span> <span class="o">=</span> <span class="s1">&#39;FakeNews&#39;</span>
<span class="n">coreml_model_name</span> <span class="o">=</span> <span class="n">model_name</span> <span class="o">+</span> <span class="s1">&#39;.mlmodel&#39;</span>
<span class="n">exportedModel</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">export_coreml</span><span class="p">(</span><span class="n">coreml_model_name</span><span class="p">)</span>
-</code></pre></div>
+</code></pre>
+</div>
<p><strong>Note: To download files from Google Colab, simply click on the files section in the sidebar, right click on filename and then click on download</strong></p>
@@ -162,7 +188,8 @@ DescriptionThe bag-of-words model is a simplifying representation used in NLP, i
<p>We define our bag of words function</p>
-<div class="codehilite"><pre><span></span><code><span class="kd">func</span> <span class="nf">bow</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">-&gt;</span> <span class="p">[</span><span class="nb">String</span><span class="p">:</span> <span class="nb">Double</span><span class="p">]</span> <span class="p">{</span>
+<div class="codehilite">
+<pre><span></span><code><span class="kd">func</span> <span class="nf">bow</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">String</span><span class="p">)</span> <span class="p">-&gt;</span> <span class="p">[</span><span class="nb">String</span><span class="p">:</span> <span class="nb">Double</span><span class="p">]</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nv">bagOfWords</span> <span class="p">=</span> <span class="p">[</span><span class="nb">String</span><span class="p">:</span> <span class="nb">Double</span><span class="p">]()</span>
<span class="kd">let</span> <span class="nv">tagger</span> <span class="p">=</span> <span class="bp">NSLinguisticTagger</span><span class="p">(</span><span class="n">tagSchemes</span><span class="p">:</span> <span class="p">[.</span><span class="n">tokenType</span><span class="p">],</span> <span class="n">options</span><span class="p">:</span> <span class="mi">0</span><span class="p">)</span>
@@ -181,22 +208,26 @@ DescriptionThe bag-of-words model is a simplifying representation used in NLP, i
<span class="k">return</span> <span class="n">bagOfWords</span>
<span class="p">}</span>
-</code></pre></div>
+</code></pre>
+</div>
<p>We also declare our variables</p>
-<div class="codehilite"><pre><span></span><code><span class="p">@</span><span class="n">State</span> <span class="kd">private</span> <span class="kd">var</span> <span class="nv">title</span><span class="p">:</span> <span class="nb">String</span> <span class="p">=</span> <span class="s">&quot;&quot;</span>
+<div class="codehilite">
+<pre><span></span><code><span class="p">@</span><span class="n">State</span> <span class="kd">private</span> <span class="kd">var</span> <span class="nv">title</span><span class="p">:</span> <span class="nb">String</span> <span class="p">=</span> <span class="s">&quot;&quot;</span>
<span class="p">@</span><span class="n">State</span> <span class="kd">private</span> <span class="kd">var</span> <span class="nv">headline</span><span class="p">:</span> <span class="nb">String</span> <span class="p">=</span> <span class="s">&quot;&quot;</span>
<span class="p">@</span><span class="n">State</span> <span class="kd">private</span> <span class="kd">var</span> <span class="nv">alertTitle</span> <span class="p">=</span> <span class="s">&quot;&quot;</span>
<span class="p">@</span><span class="n">State</span> <span class="kd">private</span> <span class="kd">var</span> <span class="nv">alertText</span> <span class="p">=</span> <span class="s">&quot;&quot;</span>
<span class="p">@</span><span class="n">State</span> <span class="kd">private</span> <span class="kd">var</span> <span class="nv">showingAlert</span> <span class="p">=</span> <span class="kc">false</span>
-</code></pre></div>
+</code></pre>
+</div>
<p>Finally, we implement a simple function which reads the two text fields, creates their bag of words representation and displays an alert with the appropriate result</p>
<p><strong>Complete Code</strong></p>
-<div class="codehilite"><pre><span></span><code><span class="kd">import</span> <span class="nc">SwiftUI</span>
+<div class="codehilite">
+<pre><span></span><code><span class="kd">import</span> <span class="nc">SwiftUI</span>
<span class="kd">struct</span> <span class="nc">ContentView</span><span class="p">:</span> <span class="n">View</span> <span class="p">{</span>
<span class="p">@</span><span class="n">State</span> <span class="kd">private</span> <span class="kd">var</span> <span class="nv">title</span><span class="p">:</span> <span class="nb">String</span> <span class="p">=</span> <span class="s">&quot;&quot;</span>
@@ -271,7 +302,8 @@ DescriptionThe bag-of-words model is a simplifying representation used in NLP, i
<span class="n">ContentView</span><span class="p">()</span>
<span class="p">}</span>
<span class="p">}</span>
-</code></pre></div>
+</code></pre>
+</div>
<script data-isso="//comments.navan.dev/"
src="//comments.navan.dev/js/embed.min.js"></script>