add blog post

author: navanchauhan <navanchauhan@gmail.com> 2022-11-07 23:36:11 -0700
committer: navanchauhan <navanchauhan@gmail.com> 2022-11-07 23:36:11 -0700
commit: d75527f7eecc4e2fcdd18ab157412506717c8adb (patch)
tree: 8a96e3036d59030f5654725edb1ca5ad6db4cb4e /docs/posts/2022-11-07-a-new-method-to-blog.html
parent: 8ca94ab784138ef673bc7c1691b99e2d4d69e015 (diff)
1 files changed, 90 insertions, 0 deletions
diff --git a/docs/posts/2022-11-07-a-new-method-to-blog.html b/docs/posts/2022-11-07-a-new-method-to-blog.html
new file mode 100644
index 0000000..aa209b2
--- /dev/null
+++ b/docs/posts/2022-11-07-a-new-method-to-blog.html
@@ -0,0 +1,90 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    
+    <link rel="stylesheet" href="/assets/main.css" />
+    <link rel="stylesheet" href="/assets/sakura.css" />
+    <meta charset="utf-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Hey - Post - A new method to blog</title>
+    <meta name="og:site_name" content="Navan Chauhan" />
+    <link rel="canonical" href="https://web.navan.dev/" />
+    <meta name="twitter:url" content="https://web.navan.dev/" />
+    <meta name="og:url" content="https://web.navan.dev/" />
+    <meta name="twitter:title" content="Hey - Post - A new method to blog" />
+    <meta name="og:title" content="Hey - Post - A new method to blog" />
+    <meta name="description" content=" Writing posts in markdown using pen and paper " />
+    <meta name="twitter:description" content=" Writing posts in markdown using pen and paper " />
+    <meta name="og:description" content=" Writing posts in markdown using pen and paper " />
+    <meta name="twitter:card" content="summary" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <link rel="shortcut icon" href="/images/favicon.png" type="image/png" />
+    <link rel="alternate" href="/feed.rss" type="application/rss+xml" title="Subscribe to Navan Chauhan" />
+    <meta name="twitter:image" content="https://web.navan.dev/images/logo.png" />
+    <meta name="og:image" content="https://web.navan.dev/images/logo.png" />
+    <link rel="manifest" href="manifest.json" />
+    <meta name="google-site-verification" content="LVeSZxz-QskhbEjHxOi7-BM5dDxTg53x2TwrjFxfL0k" />
+    <script async src="//gc.zgo.at/count.js" data-goatcounter="https://navanchauhan.goatcounter.com/count"></script>
+    <script defer data-domain="web.navan.dev" src="https://plausible.io/js/plausible.js"></script>
+    <script defer data-domain="web.navan.dev" src="https://plausible.navan.dev/js/plausible.js"></script>
+    
+</head>
+<body>
+    <nav style="display: block;">
+|
+<a href="/">home</a> |
+<a href="/about/">about/links</a> |
+<a href="/posts/">posts</a> |
+<a href="/publications/">publications</a> |
+<a href="/repo/">iOS repo</a> |
+<a href="/feed.rss">RSS Feed</a> |
+</nav>
+    
+<main>
+	<h1>A new method to blog</h1>
+
+<p><a rel="noopener" target="_blank" href="https://paperwebsite.com">Paper Website</a> is a service that lets you build a website with just pen and paper. I am going to try and replicate the process.</p>
+
+<h2>The Plan</h2>
+
+<p>The Continuity feature on macOS and iOS lets you scan documents as PDFs directly from your iPhone. I want to scan these pages and automatically run an Automator script that takes the PDF and OCRs the text. I can then clean up the text and convert it from Markdown.</p>
+
+<h2>Challenges</h2>
+
+<p>I quickly realised that the OCR software I planned on using could not detect my shitty handwriting accurately. I tried ABBYY FineReader, Prizmo and OCRmyPDF. (ABBYY FineReader and Prizmo can both be automated with Automator.)</p>
+
+<p>Now, I could either write more neatly or use an external API like Microsoft Azure.</p>
+
+<h2>Solution</h2>
+
+<h3>OCR</h3>
+
+<p>In the PDFs, each scan is saved as an image on a page. I extract the image and then send it to Azure's API.</p>
+
+<h3>Paragraph Breaks</h3>
+
+<p>The recognised text had multiple line breaks in the middle of sentences. Therefore, I use what is called a <a rel="noopener" target="_blank" href="https://en.wikipedia.org/wiki/Pilcrow">pilcrow</a> to specify paragraph breaks. But rather than trying to draw the actual pilcrow, I just write the HTML entity <code>&amp;#182;</code>, which represents the pilcrow character.</p>
+
+<h2>Where is the code?</h2>
+
+<p>I created a <a rel="noopener" target="_blank" href="https://gist.github.com/navanchauhan/5fc602b1e023b60a66bc63bd4eecd4f8">GitHub Gist</a> with a sample Python script that takes the PDF and prints the text.</p>
+
+<p>A more complete version with Automator scripts and an entire publishing pipeline will be available as a GitHub and Gitea repo soon.</p>
+
+<p><em>* In Part 2, I will discuss some more features *</em> </p>
+
+	<script data-isso="//comments.navan.dev/"
+        src="//comments.navan.dev/js/embed.min.js"></script>
+	<section id="isso-thread">
+	    <noscript>Javascript needs to be activated to view comments.</noscript>
+	</section>
+	<!--<div class="commentbox"></div>
+	<script src="https://unpkg.com/commentbox.io/dist/commentBox.min.js"></script>
+	<script>commentBox('5650347917836288-proj')</script>-->
+</main>
+
+
+<script src="assets/manup.min.js"></script>
+<script src="/pwabuilder-sw-register.js"></script>    
+</body>
+</html>
+\ No newline at end of file