From f6d2141a480dd6b5b8ee0e48d43bb64773232791 Mon Sep 17 00:00:00 2001
From: Navan Chauhan
Date: Tue, 26 Mar 2024 23:38:14 -0600
Subject: add header ids

---
 docs/posts/2022-11-07-a-new-method-to-blog.html | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/posts/2022-11-07-a-new-method-to-blog.html b/docs/posts/2022-11-07-a-new-method-to-blog.html
index 427acd9..cbf80ec 100644
--- a/docs/posts/2022-11-07-a-new-method-to-blog.html
+++ b/docs/posts/2022-11-07-a-new-method-to-blog.html
@@ -6,13 +6,13 @@
- A new method to blog
+ id="a-new-method-to-blog">A new method to blog
-
-
+ A new method to blog" />
+ A new method to blog" />
@@ -44,33 +44,33 @@
-

A new method to blog

+

A new method to blog

Here is the original PDF. I made some edits to the content after generating the Markdown file.

Paper Website is a service that lets you build a website with just pen and paper. I am going to try and replicate the process.

-

The Plan

+

The Plan

The Continuity feature on macOS and iOS lets you scan PDFs directly from your iPhone. I want to be able to scan these pages and automatically run an Automator script that takes the PDF and OCRs the text. Then I can further clean the text and convert it from Markdown.

-

Challenges

+

Challenges

I quickly realised that the OCR software I planned on using could not detect my shitty handwriting accurately. I tried using ABBYY FineReader, Prizmo and OCRmyPDF. (ABBYY FineReader and Prizmo can be automated with Automator.)

Now, I could either write neater, or use an external API like Microsoft Azure.

-

Solution

+

Solution

-

OCR

+

OCR

In the PDFs, all the scans are saved as images on a page. I extract the image and then send it to Azure's API.
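
The "send it to Azure's API" step can be sketched roughly as below. This is not the script from the Gist. It assumes the Azure Computer Vision Read API (v3.2), which the post does not name, and it assumes the page image has already been pulled out of the PDF as raw bytes. The names `build_read_request`, `lines_from_result` and `ocr_image`, plus the `endpoint` and `key` parameters, are hypothetical and only there for illustration.

```python
import json
import time
import urllib.request


def build_read_request(image_bytes, endpoint, key):
    """Build the POST request that submits one image for OCR.

    Assumption: the Read API v3.2 route and headers; adjust for the
    Azure service and version actually in use.
    """
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
    )


def lines_from_result(result):
    """Collect the recognised lines of text from a finished result payload."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]


def ocr_image(image_bytes, endpoint, key, poll_seconds=1.0):
    """Submit an image, poll until the operation finishes, return its lines."""
    request = build_read_request(image_bytes, endpoint, key)
    with urllib.request.urlopen(request) as response:
        # The service answers asynchronously and points at a status URL.
        operation_url = response.headers["Operation-Location"]

    poll = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    while True:
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        if result["status"] in ("succeeded", "failed"):
            break
        time.sleep(poll_seconds)

    if result["status"] == "failed":
        raise RuntimeError("OCR operation failed")
    return lines_from_result(result)
```

Calling `ocr_image(image_bytes, endpoint, key)` would return one string per recognised line, which is why the line breaks still need cleaning up afterwards.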

-

Paragraph Breaks

+

Paragraph Breaks

The recognised text had multiple lines breaking in the middle of a sentence. Therefore, I use what is called a pilcrow to specify paragraph breaks. But, rather than trying to draw the normal pilcrow, I just write the HTML entity &para;, which renders as the pilcrow character (¶).
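
A minimal sketch of how the paragraphs could be rebuilt from OCR output, under the assumption that the marker arrives either as the literal text `&para;` or as the `¶` character. The function name `rebuild_paragraphs` is hypothetical and this is not necessarily how the Gist does it.

```python
import re

# Assumption: the handwritten marker is recognised as "&para;" or as "¶".
PARAGRAPH_MARKER = re.compile(r"&para;|\u00b6")


def rebuild_paragraphs(ocr_lines):
    """Join OCR lines into running text, then split on the pilcrow marker."""
    # Line breaks in the OCR output are treated as plain spaces,
    # because they fall in the middle of sentences.
    text = " ".join(line.strip() for line in ocr_lines if line.strip())
    paragraphs = PARAGRAPH_MARKER.split(text)
    # Collapse leftover whitespace and drop empty fragments.
    return [" ".join(p.split()) for p in paragraphs if p.strip()]
```

For example, passing `["This line breaks", "mid sentence. &para; A new", "paragraph."]` gives two paragraphs, with the mid-sentence breaks turned into spaces.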

-

Where is the code?

+

Where is the code?

I created a GitHub Gist for a sample Python script that takes the PDF and prints the text.

-- cgit v1.2.3