author | Navan Chauhan <navanchauhan@gmail.com> | 2024-03-26 23:38:14 -0600 |
---|---|---|
committer | Navan Chauhan <navanchauhan@gmail.com> | 2024-03-26 23:38:14 -0600 |
commit | f6d2141a480dd6b5b8ee0e48d43bb64773232791 (patch) | |
tree | 2c1debfc78746324b9e38be0bf4796b7a84a6348 /docs/posts/2022-11-07-a-new-method-to-blog.html | |
parent | aae00025bd8bff04de90b22b2472aed8a232f476 (diff) |
add header ids
Diffstat (limited to 'docs/posts/2022-11-07-a-new-method-to-blog.html')
-rw-r--r-- | docs/posts/2022-11-07-a-new-method-to-blog.html | 20 |
1 files changed, 10 insertions, 10 deletions
diff --git a/docs/posts/2022-11-07-a-new-method-to-blog.html b/docs/posts/2022-11-07-a-new-method-to-blog.html
index 427acd9..cbf80ec 100644
--- a/docs/posts/2022-11-07-a-new-method-to-blog.html
+++ b/docs/posts/2022-11-07-a-new-method-to-blog.html
@@ -6,13 +6,13 @@
 <link rel="stylesheet" href="/assets/main.css" />
 <meta charset="utf-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <title>A new method to blog</title>
+ <title>id="a-new-method-to-blog">A new method to blog</title>
 <meta name="og:site_name" content="Navan Chauhan" />
 <link rel="canonical" href="https://web.navan.dev/posts/2022-11-07-a-new-method-to-blog.html" />
 <meta name="twitter:url" content="https://web.navan.dev/posts/2022-11-07-a-new-method-to-blog.html />
 <meta name="og:url" content="https://web.navan.dev/posts/2022-11-07-a-new-method-to-blog.html" />
- <meta name="twitter:title" content="A new method to blog" />
- <meta name="og:title" content="A new method to blog" />
+ <meta name="twitter:title" content="id="a-new-method-to-blog">A new method to blog" />
+ <meta name="og:title" content="id="a-new-method-to-blog">A new method to blog" />
 <meta name="description" content="Writing posts in markdown using pen and paper" />
 <meta name="twitter:description" content="Writing posts in markdown using pen and paper" />
 <meta name="og:description" content="Writing posts in markdown using pen and paper" />
@@ -44,33 +44,33 @@
 <main>
- <h1>A new method to blog</h1>
+ <h1 id="a-new-method-to-blog">A new method to blog</h1>
 <p><em><a rel="noopener" target="_blank" href="/assets/pdfs/2022-11-07-a-new-way-to-blog.pdf">Here</a> is the original PDF. I made some edits to the content after generating the markdown file</em></p>
 <p><a rel="noopener" target="_blank" href="https://paperwebsite.com">Paper Website</a> is a service that lets you build a website with just pen and paper.
 I am going to try and replicate the process.</p>
-<h2>The Plan</h2>
+<h2 id="the-plan">The Plan</h2>
 <p>The continuity feature on macOS + iOS lets you scan PDFs directly from your iPhone. I want to be able to scan these pages and automatically run an Automator script that takes the PDF and OCRs the text. Then I can further clean the text and convert from markdown.</p>
-<h2>Challenges</h2>
+<h2 id="challenges">Challenges</h2>
 <p>I quickly realised that the OCR software I planned on using could not detect my shitty handwriting accurately. I tried using ABBY Finereader, Prizmo and OCRMyPDF. (Abby Finereader and Prizmo support being automated by Automator).</p>
 <p>Now, I could either write neater, or use an external API like Microsoft Azure</p>
-<h2>Solution</h2>
+<h2 id="solution">Solution</h2>
-<h3>OCR</h3>
+<h3 id="ocr">OCR</h3>
 <p>In the PDFs, all the scans are saved as images on a page. I extract the image and then send it to Azure's API. </p>
-<h3>Paragraph Breaks</h3>
+<h3 id="paragraph-breaks">Paragraph Breaks</h3>
 <p>The recognised text had multiple lines breaking in the middle of the sentence, Therefore, I use what is called a <a rel="noopener" target="_blank" href="https://en.wikipedia.org/wiki/Pilcrow">pilcrow</a> to specify paragraph breaks. But, rather than trying to draw the normal pilcrow, I just use the HTML entity <code>&#182;</code> which is the pilcrow character. </p>
-<h2>Where is the code?</h2>
+<h2 id="where-is-the-code">Where is the code?</h2>
 <p>I created a <a rel="noopener" target="_blank" href="https://gist.github.com/navanchauhan/5fc602b1e023b60a66bc63bd4eecd4f8">GitHub Gist</a> for a sample Python script to take the PDF and print the text </p>
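
The ids in this diff look like the heading text lowercased, with punctuation dropped and spaces turned into hyphens (for example `Where is the code?` becomes `where-is-the-code`). The commit does not include the generator that produced them, so the following is only a minimal sketch under that assumed slug rule. The names `slugify` and `add_header_ids` are hypothetical and are not taken from the repository. The sketch rewrites only bare `<h1>` to `<h6>` opening tags, so `<title>` and `<meta>` content is left alone, unlike the first hunk here, where the `id="..."` text appears inside the title strings.

```python
import re


def slugify(text: str) -> str:
    """Assumed slug rule: lowercase, drop punctuation, hyphenate whitespace."""
    text = re.sub(r"<[^>]+>", "", text)            # strip any inline tags
    text = re.sub(r"[^\w\s-]", "", text.lower())   # drop punctuation such as "?"
    return re.sub(r"[\s_-]+", "-", text).strip("-")


# Matches only attribute-less headings, e.g. <h2>The Plan</h2>.
# Headings that already carry an id are skipped, so reruns are harmless.
HEADING = re.compile(r"<(h[1-6])>(.*?)</\1>", re.DOTALL)


def add_header_ids(html: str) -> str:
    """Return html with an id attribute added to each bare heading tag."""
    def repl(match: re.Match) -> str:
        tag, inner = match.group(1), match.group(2)
        return f'<{tag} id="{slugify(inner)}">{inner}</{tag}>'

    return HEADING.sub(repl, html)


if __name__ == "__main__":
    page = "<title>A new method to blog</title>\n<h2>Where is the code?</h2>"
    print(add_header_ids(page))
```

A regular expression is enough for flat headings like the ones in this post; nested or attribute-bearing headings would be better handled with an HTML parser.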