Diffstat (limited to 'docs/posts/2023-02-08-Interact-with-siri-from-the-terminal.html')
| -rw-r--r-- | docs/posts/2023-02-08-Interact-with-siri-from-the-terminal.html | 321 |
1 files changed, 0 insertions, 321 deletions
diff --git a/docs/posts/2023-02-08-Interact-with-siri-from-the-terminal.html b/docs/posts/2023-02-08-Interact-with-siri-from-the-terminal.html deleted file mode 100644 index 0a488de..0000000 --- a/docs/posts/2023-02-08-Interact-with-siri-from-the-terminal.html +++ /dev/null @@ -1,321 +0,0 @@ -<!DOCTYPE html> -<html lang="en"> -<head> - - <meta http-equiv="X-UA-Compatible" content="IE=edge"> - <meta http-equiv="content-type" content="text/html; charset=utf-8"> - <meta name="viewport" content="width=device-width, initial-scale=1.0"> - <meta name="theme-color" content="#6a9fb5"> - - <title>Interacting with Siri using the command line</title> - - <!-- - <link rel="stylesheet" href="https://unpkg.com/latex.css/style.min.css" /> - --> - - <link rel="stylesheet" href="/assets/c-hyde.css"> - - <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Abril+Fatface"> - <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Sans:400,400italic,700"> - - <link rel="stylesheet" href="/assets/main.css"> - <meta name="viewport" content="width=device-width, initial-scale=1.0"> - <meta name="og:site_name" content="Navan Chauhan"> - <link rel="canonical" href="https://web.navan.dev/posts/2023-02-08-Interact-with-siri-from-the-terminal.html"> - <meta name="twitter:url" content="https://web.navan.dev/posts/2023-02-08-Interact-with-siri-from-the-terminal.html"> - <meta name="og:url" content="https://web.navan.dev/posts/2023-02-08-Interact-with-siri-from-the-terminal.html"> - <meta name="twitter:title" content="Interacting with Siri using the command line"> - <meta name="og:title" content="Interacting with Siri using the command line"> - <meta name="description" content="Code snippet to interact with Siri by issuing commands from the command-line."> - <meta name="twitter:description" content="Code snippet to interact with Siri by issuing commands from the command-line."> - <meta name="og:description" content="Code snippet to interact with Siri by issuing 
commands from the command-line."> - <meta name="twitter:card" content="summary_large_image"> - <meta name="viewport" content="width=device-width, initial-scale=1.0"> - <link rel="shortcut icon" href="/images/favicon.png" type="image/png"> - <link href="/feed.rss" type="application/atom+xml" rel="alternate" title="Sitewide Atom feed"> - <meta name="twitter:image" content="https://web.navan.dev/images/opengraph/posts/2023-02-08-Interact-with-siri-from-the-terminal.png"> - <meta name="og:image" content="https://web.navan.dev/images/opengraph/posts/2023-02-08-Interact-with-siri-from-the-terminal.png"> - <meta name="google-site-verification" content="LVeSZxz-QskhbEjHxOi7-BM5dDxTg53x2TwrjFxfL0k"> - <script data-goatcounter="https://navanchauhan.goatcounter.com/count" - async src="//gc.zgo.at/count.js"></script> - <script defer data-domain="web.navan.dev" src="https://plausible.io/js/plausible.js"></script> - <link rel="manifest" href="/manifest.json"> - -</head> -<body class="theme-base-0d"> - <div class="sidebar"> - <div class="container sidebar-sticky"> - <div class="sidebar-about"> - <h1><a href="/">Navan</a></h1> - <p class="lead" id="random-lead">Alea iacta est.</p> - </div> - - <ul class="sidebar-nav"> - <li><a class="sidebar-nav-item" href="/about/">about/links</a></li> - <li><a class="sidebar-nav-item" href="/posts/">posts</a></li> - <li><a class="sidebar-nav-item" href="/3D-Designs/">3D designs</a></li> - <li><a class="sidebar-nav-item" href="/feed.rss">RSS Feed</a></li> - <li><a class="sidebar-nav-item" href="/colophon/">colophon</a></li> - </ul> - <div class="copyright"><p>© 2019-2024. 
Navan Chauhan <br> <a href="/feed.rss">RSS</a></p></div> - </div> -</div> - -<script> -let phrases = [ - "Something Funny", "Veni, vidi, vici", "Alea iacta est", "In vino veritas", "Acta, non verba", "Castigat ridendo mores", - "Cui bono?", "Memento vivere", "अहम् ब्रह्मास्मि", "अनुगच्छतु प्रवाहं", "चरन्मार्गान्विजानाति", "coq de cheval", "我愛啤酒" - ]; - -let new_phrase = phrases[Math.floor(Math.random()*phrases.length)]; - -let lead = document.getElementById("random-lead"); -lead.innerText = new_phrase; -</script> - <div class="content container"> - - <div class="post"> - <h1 id="interacting-with-siri-using-the-command-line">Interacting with Siri using the command line</h1> - -<p>My main objective was to see if I could issue multi-intent commands in one go. Obviously, Siri cannot do that (neither can Alexa, Cortana, or Google Assistant). The script here can issue either a single command, or use the help of OpenAI's DaVinci model to extract multiple commands and pass them onto siri.</p> - -<h2 id="prerequisites">Prerequisites</h2> - -<ul> -<li>Run macOS</li> -<li>Enable type to Siri (Settings > Accessibility -> Type to Siri)</li> -<li>Enable the Terminal to control System Events (The first time you run the script, it will prompt you to enable it)</li> -</ul> - -<h2 id="show-me-ze-code">Show me ze code</h2> - -<p>If you are here just for the code:</p> - -<div class="codehilite"> -<pre><span></span><code><span class="kn">import</span> <span class="nn">argparse</span> -<span class="kn">import</span> <span class="nn">applescript</span> -<span class="kn">import</span> <span class="nn">openai</span> - -<span class="kn">from</span> <span class="nn">os</span> <span class="kn">import</span> <span class="n">getenv</span> - -<span class="n">openai</span><span class="o">.</span><span class="n">api_key</span> <span class="o">=</span> <span class="n">getenv</span><span class="p">(</span><span class="s2">"OPENAI_KEY"</span><span class="p">)</span> -<span class="n">engine</span> 
<span class="o">=</span> <span class="s2">"text-davinci-003"</span> - -<span class="k">def</span> <span class="nf">execute_with_llm</span><span class="p">(</span><span class="n">command_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> - <span class="n">llm_prompt</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">"""You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. If you are provided with a single command, return a list with a single string, trying your best to understand the command.</span> - -<span class="s2"> Example:</span> -<span class="s2"> Q: "Turn on the lights and turn off the lights"</span> -<span class="s2"> A: ["Turn on the lights", "Turn off the lights"]</span> - -<span class="s2"> Q: "Switch off the lights and then play some music"</span> -<span class="s2"> A: ["Switch off the lights", "Play some music"]</span> - -<span class="s2"> Q: "I am feeling sad today, play some music"</span> -<span class="s2"> A: ["Play some cheerful music"]</span> - -<span class="s2"> Q: "</span><span class="si">{</span><span class="n">command_text</span><span class="si">}</span><span class="s2">"</span> -<span class="s2"> A: </span> -<span class="s2"> """</span> - - <span class="n">completion</span> <span class="o">=</span> <span class="n">openai</span><span class="o">.</span><span class="n">Completion</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">engine</span><span class="o">=</span><span class="n">engine</span><span class="p">,</span> <span class="n">prompt</span><span class="o">=</span><span class="n">llm_prompt</span><span class="p">,</span> <span class="n">max_tokens</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">command_text</span><span 
class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span> - - <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="nb">eval</span><span class="p">(</span><span class="n">completion</span><span class="o">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text</span><span class="p">):</span> - <span class="n">execute_command</span><span class="p">(</span><span class="n">task</span><span class="p">)</span> - - -<span class="k">def</span> <span class="nf">execute_command</span><span class="p">(</span><span class="n">command_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> -<span class="w"> </span><span class="sd">"""Execute a Siri command."""</span> - - <span class="n">script</span> <span class="o">=</span> <span class="n">applescript</span><span class="o">.</span><span class="n">AppleScript</span><span class="p">(</span><span class="sa">f</span><span class="s2">"""</span> -<span class="s2"> tell application "System Events" to tell the front menu bar of process "SystemUIServer"</span> -<span class="s2"> tell (first menu bar item whose description is "Siri")</span> -<span class="s2"> perform action "AXPress"</span> -<span class="s2"> end tell</span> -<span class="s2"> end tell</span> - -<span class="s2"> delay 2</span> - -<span class="s2"> tell application "System Events"</span> -<span class="s2"> set textToType to "</span><span class="si">{</span><span class="n">command_text</span><span class="si">}</span><span class="s2">"</span> -<span class="s2"> keystroke textToType</span> -<span class="s2"> key code 36</span> -<span class="s2"> end tell</span> -<span class="s2"> 
"""</span><span class="p">)</span> - - <span class="n">script</span><span class="o">.</span><span class="n">run</span><span class="p">()</span> - - -<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span> - <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">()</span> - <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">"command"</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=</span><span class="s2">"?"</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"The command to pass to Siri"</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="s2">"What time is it?"</span><span class="p">)</span> - <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'--openai'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="n">argparse</span><span class="o">.</span><span class="n">BooleanOptionalAction</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"Use OpenAI to detect multiple intents"</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> - <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span> - - <span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">openai</span><span class="p">:</span> - 
<span class="n">execute_with_llm</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">command</span><span class="p">)</span> - <span class="k">else</span><span class="p">:</span> - <span class="n">execute_command</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">command</span><span class="p">)</span> -</code></pre> -</div> - -<p>Usage:</p> - -<div class="codehilite"> -<pre><span></span><code>python3<span class="w"> </span>main.py<span class="w"> </span><span class="s2">"play some taylor swift"</span> -python3<span class="w"> </span>main.py<span class="w"> </span><span class="s2">"turn off the lights and play some music"</span><span class="w"> </span>--openai -</code></pre> -</div> - -<h2 id="eli5">ELI5</h2> - -<p>I am not actually going to explain it as if I am explaining to a five-year old kid.</p> - -<h3 id="applescript">AppleScript</h3> - -<p>In the age of Siri Shortcuts, AppleScript can still do more. 
It is a scripting language created by Apple that can help you automate pretty much anything you see on your screen.</p> - -<p>We use the following AppleScript to trigger Siri and then type in our command:</p> - -<div class="codehilite"> -<pre><span></span><code><span class="k">tell</span> <span class="nb">application</span> <span class="s2">"System Events"</span> <span class="k">to</span> <span class="k">tell</span> <span class="nb">the</span> <span class="nb">front</span> <span class="na">menu</span> <span class="nv">bar</span> <span class="k">of</span> <span class="nv">process</span> <span class="s2">"SystemUIServer"</span> - <span class="k">tell</span> <span class="p">(</span><span class="nb">first</span> <span class="na">menu</span> <span class="nv">bar</span> <span class="nb">item</span> <span class="nb">whose</span> <span class="nv">description</span> <span class="ow">is</span> <span class="s2">"Siri"</span><span class="p">)</span> - <span class="nb">perform action</span> <span class="s2">"AXPress"</span> - <span class="k">end</span> <span class="k">tell</span> -<span class="k">end</span> <span class="k">tell</span> - -<span class="nb">delay</span> <span class="mi">2</span> - -<span class="k">tell</span> <span class="nb">application</span> <span class="s2">"System Events"</span> - <span class="k">set</span> <span class="nv">textToType</span> <span class="k">to</span> <span class="s2">"Play some rock music"</span> - <span class="nv">keystroke</span> <span class="nv">textToType</span> - <span class="na">key code</span> <span class="mi">36</span> -<span class="k">end</span> <span class="k">tell</span> -</code></pre> -</div> - -<p>This first triggers Siri, waits for a couple of seconds, and then types in our command. 
In the script, this functionality is handled by the <code>execute_command</code> function.</p> - -<div class="codehilite"> -<pre><span></span><code><span class="kn">import</span> <span class="nn">applescript</span> - -<span class="k">def</span> <span class="nf">execute_command</span><span class="p">(</span><span class="n">command_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> -<span class="w"> </span><span class="sd">"""Execute a Siri command."""</span> - - <span class="n">script</span> <span class="o">=</span> <span class="n">applescript</span><span class="o">.</span><span class="n">AppleScript</span><span class="p">(</span><span class="sa">f</span><span class="s2">"""</span> -<span class="s2"> tell application "System Events" to tell the front menu bar of process "SystemUIServer"</span> -<span class="s2"> tell (first menu bar item whose description is "Siri")</span> -<span class="s2"> perform action "AXPress"</span> -<span class="s2"> end tell</span> -<span class="s2"> end tell</span> - -<span class="s2"> delay 2</span> - -<span class="s2"> tell application "System Events"</span> -<span class="s2"> set textToType to "</span><span class="si">{</span><span class="n">command_text</span><span class="si">}</span><span class="s2">"</span> -<span class="s2"> keystroke textToType</span> -<span class="s2"> key code 36</span> -<span class="s2"> end tell</span> -<span class="s2"> """</span><span class="p">)</span> - - <span class="n">script</span><span class="o">.</span><span class="n">run</span><span class="p">()</span> -</code></pre> -</div> - -<h3 id="multi-intent-commands">Multi-Intent Commands</h3> - -<p>We can call OpenAI's API to autocomplete our prompt and extract multiple commands. We don't need to use OpenAI's API, and can also simply use Google's Flan-T5 model using HuggingFace's transformers library. 
</p> - -<h4 id="ze-prompt">Ze Prompt</h4> - -<div class="codehilite"> -<pre><span></span><code>You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. If you are provided with a single command, return a list with a single string, trying your best to understand the command. - - Example: - Q: "Turn on the lights and turn off the lights" - A: ["Turn on the lights", "Turn off the lights"] - - Q: "Switch off the lights and then play some music" - A: ["Switch off the lights", "Play some music"] - - Q: "I am feeling sad today, play some music" - A: ["Play some cheerful music"] - - Q: "{command_text}" - A: -</code></pre> -</div> - -<p>This prompt gives the model a few examples to increase the generation accuracy, along with instructing it to return a Python list. </p> - -<h4 id="ze-code">Ze Code</h4> - -<div class="codehilite"> -<pre><span></span><code><span class="kn">import</span> <span class="nn">openai</span> - -<span class="kn">from</span> <span class="nn">os</span> <span class="kn">import</span> <span class="n">getenv</span> - -<span class="n">openai</span><span class="o">.</span><span class="n">api_key</span> <span class="o">=</span> <span class="n">getenv</span><span class="p">(</span><span class="s2">"OPENAI_KEY"</span><span class="p">)</span> -<span class="n">engine</span> <span class="o">=</span> <span class="s2">"text-davinci-003"</span> - -<span class="k">def</span> <span class="nf">execute_with_llm</span><span class="p">(</span><span class="n">command_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span> - <span class="n">llm_prompt</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">"""You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. 
If you are provided with a single command, return a list with a single string, trying your best to understand the command.</span> - -<span class="s2"> Example:</span> -<span class="s2"> Q: "Turn on the lights and turn off the lights"</span> -<span class="s2"> A: ["Turn on the lights", "Turn off the lights"]</span> - -<span class="s2"> Q: "Switch off the lights and then play some music"</span> -<span class="s2"> A: ["Switch off the lights", "Play some music"]</span> - -<span class="s2"> Q: "I am feeling sad today, play some music"</span> -<span class="s2"> A: ["Play some cheerful music"]</span> - -<span class="s2"> Q: "</span><span class="si">{</span><span class="n">command_text</span><span class="si">}</span><span class="s2">"</span> -<span class="s2"> A: </span> -<span class="s2"> """</span> - - <span class="n">completion</span> <span class="o">=</span> <span class="n">openai</span><span class="o">.</span><span class="n">Completion</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">engine</span><span class="o">=</span><span class="n">engine</span><span class="p">,</span> <span class="n">prompt</span><span class="o">=</span><span class="n">llm_prompt</span><span class="p">,</span> <span class="n">max_tokens</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">command_text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span> - - <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="nb">eval</span><span class="p">(</span><span class="n">completion</span><span class="o">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text</span><span class="p">):</span> <span class="c1"># NEVER EVAL 
IN PROD RIGHT LIKE THIS</span> - <span class="n">execute_command</span><span class="p">(</span><span class="n">task</span><span class="p">)</span> -</code></pre> -</div> - -<h3 id="gluing-together-code">Gluing together code</h3> - -<p>To finish it all off, we can use argparse to only send the input command to OpenAI when asked to do so.</p> - -<div class="codehilite"> -<pre><span></span><code><span class="kn">import</span> <span class="nn">argparse</span> - -<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span> - <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">()</span> - <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">"command"</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=</span><span class="s2">"?"</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"The command to pass to Siri"</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="s2">"What time is it?"</span><span class="p">)</span> - <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">'--openai'</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="n">argparse</span><span class="o">.</span><span class="n">BooleanOptionalAction</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"Use OpenAI to detect multiple intents"</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">False</span><span 
class="p">)</span> - <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span> - - <span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">openai</span><span class="p">:</span> - <span class="n">execute_with_llm</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">command</span><span class="p">)</span> - <span class="k">else</span><span class="p">:</span> - <span class="n">execute_command</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">command</span><span class="p">)</span> -</code></pre> -</div> - -<h2 id="conclusion">Conclusion</h2> - -<p>Siri is still dumb. When I ask it to <code>Switch off the lights</code>, it default to the home thousands of miles away. But, this code snippet definitely does work!</p> - - </div> - <blockquote>If you have scrolled this far, consider subscribing to my mailing list <a href="https://listmonk.navan.dev/subscription/form">here.</a> You can subscribe to either a specific type of post you are interested in, or subscribe to everything with the "Everything" list.</blockquote> - <script data-isso="https://comments.navan.dev/" - src="https://comments.navan.dev/js/embed.min.js"></script> - <div id="isso-thread"> - <noscript>Javascript needs to be activated to view comments.</noscript> - </div> - - </div> - <script src="assets/manup.min.js"></script> - <script src="/pwabuilder-sw-register.js"></script> -</body> -</html>
\ No newline at end of file
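
Note on the removed `execute_with_llm` code: it passes the model's reply straight to `eval`, and the post's own comment warns against doing that in production. A minimal sketch of a safer parse, assuming the reply is a Python-style list literal as the prompt requests, could use `ast.literal_eval` from the standard library. The helper name `parse_tasks` is hypothetical and does not appear in the deleted file.

```python
import ast


def parse_tasks(completion_text: str) -> list[str]:
    """Parse a model reply such as '["A", "B"]' into a list of strings.

    Hypothetical helper (not part of the deleted post). ast.literal_eval
    only accepts Python literals, so a reply containing function calls or
    other expressions is rejected instead of being executed.
    """
    try:
        parsed = ast.literal_eval(completion_text.strip())
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        # Not a valid literal: treat it as "no commands found".
        return []

    if not isinstance(parsed, list):
        return []

    # Keep only non-empty string entries.
    return [item.strip() for item in parsed if isinstance(item, str) and item.strip()]


if __name__ == "__main__":
    print(parse_tasks(' ["Switch off the lights", "Play some music"]\n'))
    print(parse_tasks("__import__('os').getcwd()"))
```

In the removed script, the loop `for task in eval(completion.choices[0].text)` could then iterate over `parse_tasks(completion.choices[0].text)` instead. Returning an empty list on malformed output is one possible design choice; raising an error would be equally reasonable.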
