From d75527f7eecc4e2fcdd18ab157412506717c8adb Mon Sep 17 00:00:00 2001
From: navanchauhan
Date: Mon, 7 Nov 2022 23:36:11 -0700
Subject: add blog post
---
 docs/posts/2022-11-07-a-new-method-to-blog.html | 90 +++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 docs/posts/2022-11-07-a-new-method-to-blog.html

diff --git a/docs/posts/2022-11-07-a-new-method-to-blog.html b/docs/posts/2022-11-07-a-new-method-to-blog.html
new file mode 100644
index 0000000..aa209b2
--- /dev/null
+++ b/docs/posts/2022-11-07-a-new-method-to-blog.html
@@ -0,0 +1,90 @@
+ Hey - Post - A new method to blog
+

A new method to blog

+ +

Paper Website is a service that lets you build a website with just pen and paper. I am going to try and replicate the process.

+ +

The Plan

+ +

The Continuity feature on macOS and iOS lets you scan documents to PDF directly from your iPhone. I want to be able to scan these pages and automatically run an Automator script that takes the PDF and OCRs the text. Then I can further clean the text and convert it from Markdown.

+ +

Challenges

+ +

I quickly realised that the OCR software I planned on using could not detect my shitty handwriting accurately. I tried using ABBYY FineReader, Prizmo and OCRmyPDF. (ABBYY FineReader and Prizmo support being automated by Automator.)

+ +

Now, I could either write neater, or use an external API like Microsoft Azure.

+ +

Solution

+ +

OCR

+ +

In the PDFs, all the scans are saved as images on a page. I extract the image and then send it to Azure's API.
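
The upload step could look roughly like the sketch below. The post does not say which Azure API is used, so this assumes the Read operation of Azure Computer Vision v3.2. The endpoint, the key and the helper names (`build_read_request`, `lines_from_result`, `ocr_image`) are placeholders for illustration, not the author's script. Extracting the image from the PDF is left out, so `image_bytes` is assumed to already hold one scanned page as a JPEG or PNG.

```python
import json
import time
import urllib.request
from typing import List

# Assumption: Azure Computer Vision v3.2 Read API; the post only says "Azure's API".
READ_PATH = "/vision/v3.2/read/analyze"


def build_read_request(endpoint: str, key: str, image_bytes: bytes) -> urllib.request.Request:
    """Build the POST request that submits one scanned image for OCR."""
    return urllib.request.Request(
        endpoint.rstrip("/") + READ_PATH,
        data=image_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
    )


def lines_from_result(result: dict) -> List[str]:
    """Flatten a finished Read result into a list of recognised text lines."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]


def ocr_image(endpoint: str, key: str, image_bytes: bytes, poll_seconds: float = 1.0) -> List[str]:
    """Submit an image, poll until the operation finishes, and return its lines."""
    with urllib.request.urlopen(build_read_request(endpoint, key, image_bytes)) as resp:
        # The service answers with a URL to poll for the result.
        operation_url = resp.headers["Operation-Location"]
    poll = urllib.request.Request(operation_url, headers={"Ocp-Apim-Subscription-Key": key})
    while True:
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        if result.get("status") == "succeeded":
            return lines_from_result(result)
        if result.get("status") == "failed":
            raise RuntimeError("Azure Read operation failed")
        time.sleep(poll_seconds)
```

Calling `ocr_image("https://<your-resource>.cognitiveservices.azure.com", "<your-key>", image_bytes)` would return the recognised lines in reading order. The request is asynchronous, which is why the sketch polls the operation URL rather than reading the text from the first response.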

+ +

Paragraph Breaks

+ +

The recognised text had multiple lines breaking in the middle of a sentence. Therefore, I use what is called a pilcrow to specify paragraph breaks. But, rather than trying to draw the normal pilcrow, I just write the HTML entity &para;, which stands for the pilcrow character (¶).
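
A minimal sketch of this clean-up step, assuming the marker comes back from the OCR either as the literal text `&para;` or as the ¶ glyph. The function name `ocr_lines_to_paragraphs` is made up for illustration. Real OCR output may garble the marker (for example by inserting a space), which this sketch does not handle.

```python
import re

PILCROW = "\u00b6"  # the ¶ character


def ocr_lines_to_paragraphs(lines):
    """Join OCR lines into paragraphs, breaking only where a pilcrow was written."""
    # Line breaks from the scan carry no meaning, so join everything first.
    text = " ".join(line.strip() for line in lines)
    # Assumption: the handwritten marker is recognised as "&para;" or as the glyph itself.
    text = text.replace("&para;", PILCROW)
    # Split on the marker and collapse any leftover runs of whitespace.
    paragraphs = (re.sub(r"\s+", " ", part).strip() for part in text.split(PILCROW))
    return [p for p in paragraphs if p]


if __name__ == "__main__":
    scanned = ["This is the first", "paragraph. &para; And this", "is the second."]
    for paragraph in ocr_lines_to_paragraphs(scanned):
        print(paragraph)
```

Joining before splitting is what removes the mid-sentence line breaks: every scan line break becomes a space, and only the explicit markers survive as paragraph boundaries.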

+ +

Where is the code?

+ +

I created a GitHub Gist with a sample Python script that takes the PDF and prints the text.

+ +

A more complete version with Automator scripts and an entire publishing pipeline will be available as a GitHub and Gitea repo soon.

+ +

* In Part 2, I will discuss some more features *

+ + +
+ +
+ +
+ + + + + + \ No newline at end of file -- cgit v1.2.3