<!DOCTYPE html>
<html lang="en">
<head>
    
    <link rel="stylesheet" href="https://unpkg.com/latex.css/style.min.css" />
    <link rel="stylesheet" href="/assets/main.css" />
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Interacting with Siri using the command line</title>
    <meta name="og:site_name" content="Navan Chauhan" />
    <link rel="canonical" href="https://web.navan.dev/posts/2023-02-08-Interact-with-siri-from-the-terminal.html" />
    <meta name="twitter:url" content="https://web.navan.dev/posts/2023-02-08-Interact-with-siri-from-the-terminal.html" />
    <meta name="og:url" content="https://web.navan.dev/posts/2023-02-08-Interact-with-siri-from-the-terminal.html" />
    <meta name="twitter:title" content="Interacting with Siri using the command line" />
    <meta name="og:title" content="Interacting with Siri using the command line" />
    <meta name="description" content="Code snippet to interact with Siri by issuing commands from the command-line." />
    <meta name="twitter:description" content="Code snippet to interact with Siri by issuing commands from the command-line." />
    <meta name="og:description" content="Code snippet to interact with Siri by issuing commands from the command-line." />
    <meta name="twitter:card" content="summary_large_image" />
    <link rel="shortcut icon" href="/images/favicon.png" type="image/png" />
    <link rel="alternate" href="/feed.rss" type="application/rss+xml" title="Subscribe to Navan Chauhan" />
    <meta name="twitter:image" content="https://web.navan.dev/images/opengraph/posts/2023-02-08-Interact-with-siri-from-the-terminal.png" />
    <meta name="og:image" content="https://web.navan.dev/images/opengraph/posts/2023-02-08-Interact-with-siri-from-the-terminal.png" />
    <meta name="google-site-verification" content="LVeSZxz-QskhbEjHxOi7-BM5dDxTg53x2TwrjFxfL0k" />
    <script data-goatcounter="https://navanchauhan.goatcounter.com/count"
        async src="//gc.zgo.at/count.js"></script>
    <script defer data-domain="web.navan.dev" src="https://plausible.io/js/plausible.js"></script>
    <link rel="manifest" href="/manifest.json" />
    
</head>
<body>
    <center><nav style="display: block;">
|
<a href="/">home</a> |
<a href="/about/">about/links</a> |
<a href="/posts/">posts</a> |
<a href="/3D-Designs/">3D designs</a> |
<!--<a href="/publications/">publications</a> |-->
<!--<a href="/repo/">iOS repo</a> |-->
<a href="/feed.rss">RSS Feed</a> |
</nav>
</center>
    
<main>

	<h1 id="interacting-with-siri-using-the-command-line">Interacting with Siri using the command line</h1>

<p>My main objective was to see if I could issue multi-intent commands in one go. Siri cannot do that on its own (and neither can Alexa, Cortana, or Google Assistant). The script here can either issue a single command, or use OpenAI's DaVinci model to split a request into multiple commands and pass them on to Siri one at a time.</p>

<h2 id="prerequisites">Prerequisites</h2>

<ul>
<li>Run macOS</li>
<li>Enable Type to Siri (Settings &gt; Accessibility &gt; Type to Siri)</li>
<li>Allow the Terminal to control System Events (the first time you run the script, macOS will prompt you to allow it)</li>
</ul>
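<p>A rough pre-flight check for the list above could look like the sketch below. The <code>check_prerequisites</code> helper is hypothetical and not part of the script. It only checks what Python can see (the platform and the <code>OPENAI_KEY</code> environment variable the script reads); it cannot tell whether Type to Siri or the System Events permission is enabled.</p>

```python
import os
import sys


def check_prerequisites(platform_name: str = sys.platform, env=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    problems = []
    # sys.platform is "darwin" on macOS
    if platform_name != "darwin":
        problems.append("This script needs macOS (AppleScript and Siri).")
    # OPENAI_KEY is only needed for the --openai mode
    if not env.get("OPENAI_KEY"):
        problems.append("OPENAI_KEY is not set; --openai will not work.")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```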

<h2 id="show-me-ze-code">Show me ze code</h2>

<p>If you are here just for the code:</p>

<div class="codehilite">
<pre><span></span><code><span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">applescript</span>
<span class="kn">import</span> <span class="nn">openai</span>

<span class="kn">from</span> <span class="nn">os</span> <span class="kn">import</span> <span class="n">getenv</span>

<span class="n">openai</span><span class="o">.</span><span class="n">api_key</span> <span class="o">=</span> <span class="n">getenv</span><span class="p">(</span><span class="s2">&quot;OPENAI_KEY&quot;</span><span class="p">)</span>
<span class="n">engine</span> <span class="o">=</span> <span class="s2">&quot;text-davinci-003&quot;</span>

<span class="k">def</span> <span class="nf">execute_with_llm</span><span class="p">(</span><span class="n">command_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
    <span class="n">llm_prompt</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;&quot;&quot;You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. If you are provided with a single command, return a list with a single string, trying your best to understand the command.</span>

<span class="s2">    Example:</span>
<span class="s2">    Q: &quot;Turn on the lights and turn off the lights&quot;</span>
<span class="s2">    A: [&quot;Turn on the lights&quot;, &quot;Turn off the lights&quot;]</span>

<span class="s2">    Q: &quot;Switch off the lights and then play some music&quot;</span>
<span class="s2">    A: [&quot;Switch off the lights&quot;, &quot;Play some music&quot;]</span>

<span class="s2">    Q: &quot;I am feeling sad today, play some music&quot;</span>
<span class="s2">    A: [&quot;Play some cheerful music&quot;]</span>

<span class="s2">    Q: &quot;</span><span class="si">{</span><span class="n">command_text</span><span class="si">}</span><span class="s2">&quot;</span>
<span class="s2">    A: </span>
<span class="s2">    &quot;&quot;&quot;</span>

    <span class="n">completion</span> <span class="o">=</span> <span class="n">openai</span><span class="o">.</span><span class="n">Completion</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">engine</span><span class="o">=</span><span class="n">engine</span><span class="p">,</span> <span class="n">prompt</span><span class="o">=</span><span class="n">llm_prompt</span><span class="p">,</span> <span class="n">max_tokens</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">command_text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">))</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span>

    <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="nb">eval</span><span class="p">(</span><span class="n">completion</span><span class="o">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text</span><span class="p">):</span>
        <span class="n">execute_command</span><span class="p">(</span><span class="n">task</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">execute_command</span><span class="p">(</span><span class="n">command_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Execute a Siri command.&quot;&quot;&quot;</span>

    <span class="n">script</span> <span class="o">=</span> <span class="n">applescript</span><span class="o">.</span><span class="n">AppleScript</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;&quot;&quot;</span>
<span class="s2">        tell application &quot;System Events&quot; to tell the front menu bar of process &quot;SystemUIServer&quot;</span>
<span class="s2">            tell (first menu bar item whose description is &quot;Siri&quot;)</span>
<span class="s2">                perform action &quot;AXPress&quot;</span>
<span class="s2">            end tell</span>
<span class="s2">        end tell</span>

<span class="s2">        delay 2</span>

<span class="s2">        tell application &quot;System Events&quot;</span>
<span class="s2">            set textToType to &quot;</span><span class="si">{</span><span class="n">command_text</span><span class="si">}</span><span class="s2">&quot;</span>
<span class="s2">            keystroke textToType</span>
<span class="s2">            key code 36</span>
<span class="s2">        end tell</span>
<span class="s2">    &quot;&quot;&quot;</span><span class="p">)</span>

    <span class="n">script</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">()</span>
    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">&quot;command&quot;</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=</span><span class="s2">&quot;?&quot;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">&quot;The command to pass to Siri&quot;</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="s2">&quot;What time is it?&quot;</span><span class="p">)</span>
    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--openai&#39;</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="n">argparse</span><span class="o">.</span><span class="n">BooleanOptionalAction</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">&quot;Use OpenAI to detect multiple intents&quot;</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
    <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span>

    <span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">openai</span><span class="p">:</span>
        <span class="n">execute_with_llm</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">command</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">execute_command</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">command</span><span class="p">)</span>
</code></pre>
</div>

<p>Usage:</p>

<div class="codehilite">
<pre><span></span><code>python3<span class="w"> </span>main.py<span class="w"> </span><span class="s2">&quot;play some taylor swift&quot;</span>
python3<span class="w"> </span>main.py<span class="w"> </span><span class="s2">&quot;turn off the lights and play some music&quot;</span><span class="w"> </span>--openai
</code></pre>
</div>

<h2 id="eli5">ELI5</h2>

<p>I am not actually going to explain it as if I were explaining it to a five-year-old kid.</p>

<h3 id="applescript">AppleScript</h3>

<p>In the age of Siri Shortcuts, AppleScript can still do more. It is a scripting language created by Apple that can help you automate pretty much anything you see on your screen.</p>

<p>We use the following AppleScript to trigger Siri and then type in our command:</p>

<div class="codehilite">
<pre><span></span><code><span class="k">tell</span> <span class="nb">application</span> <span class="s2">&quot;System Events&quot;</span> <span class="k">to</span> <span class="k">tell</span> <span class="nb">the</span> <span class="nb">front</span> <span class="na">menu</span> <span class="nv">bar</span> <span class="k">of</span> <span class="nv">process</span> <span class="s2">&quot;SystemUIServer&quot;</span>
    <span class="k">tell</span> <span class="p">(</span><span class="nb">first</span> <span class="na">menu</span> <span class="nv">bar</span> <span class="nb">item</span> <span class="nb">whose</span> <span class="nv">description</span> <span class="ow">is</span> <span class="s2">&quot;Siri&quot;</span><span class="p">)</span>
        <span class="nb">perform action</span> <span class="s2">&quot;AXPress&quot;</span>
    <span class="k">end</span> <span class="k">tell</span>
<span class="k">end</span> <span class="k">tell</span>

<span class="nb">delay</span> <span class="mi">2</span>

<span class="k">tell</span> <span class="nb">application</span> <span class="s2">&quot;System Events&quot;</span>
    <span class="k">set</span> <span class="nv">textToType</span> <span class="k">to</span> <span class="s2">&quot;Play some rock music&quot;</span>
    <span class="nv">keystroke</span> <span class="nv">textToType</span>
    <span class="na">key code</span> <span class="mi">36</span>
<span class="k">end</span> <span class="k">tell</span>
</code></pre>
</div>

<p>This first triggers Siri, waits a couple of seconds for the Siri window to appear, types in our command, and then presses Return (<code>key code 36</code>). In the script, this functionality is handled by the <code>execute_command</code> function.</p>

<div class="codehilite">
<pre><span></span><code><span class="kn">import</span> <span class="nn">applescript</span>

<span class="k">def</span> <span class="nf">execute_command</span><span class="p">(</span><span class="n">command_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Execute a Siri command.&quot;&quot;&quot;</span>

    <span class="n">script</span> <span class="o">=</span> <span class="n">applescript</span><span class="o">.</span><span class="n">AppleScript</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;&quot;&quot;</span>
<span class="s2">        tell application &quot;System Events&quot; to tell the front menu bar of process &quot;SystemUIServer&quot;</span>
<span class="s2">            tell (first menu bar item whose description is &quot;Siri&quot;)</span>
<span class="s2">                perform action &quot;AXPress&quot;</span>
<span class="s2">            end tell</span>
<span class="s2">        end tell</span>

<span class="s2">        delay 2</span>

<span class="s2">        tell application &quot;System Events&quot;</span>
<span class="s2">            set textToType to &quot;</span><span class="si">{</span><span class="n">command_text</span><span class="si">}</span><span class="s2">&quot;</span>
<span class="s2">            keystroke textToType</span>
<span class="s2">            key code 36</span>
<span class="s2">        end tell</span>
<span class="s2">    &quot;&quot;&quot;</span><span class="p">)</span>

    <span class="n">script</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</code></pre>
</div>
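<p>One thing to watch out for: the command is pasted straight into an AppleScript string literal, so a command containing a double quote or a backslash would break the script. A minimal sketch of one way around that is below. The helper names <code>escape_applescript</code> and <code>build_siri_script</code>, and the <code>delay_seconds</code> parameter, are made up for this example and are not part of the original script.</p>

```python
def escape_applescript(text: str) -> str:
    """Escape text for use inside a double-quoted AppleScript string literal."""
    # Escape backslashes first so the backslashes added for quotes stay intact.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_siri_script(command_text: str, delay_seconds: int = 2) -> str:
    """Return the AppleScript source that opens Siri and types the command."""
    safe_text = escape_applescript(command_text)
    return f'''
tell application "System Events" to tell the front menu bar of process "SystemUIServer"
    tell (first menu bar item whose description is "Siri")
        perform action "AXPress"
    end tell
end tell

delay {delay_seconds}

tell application "System Events"
    set textToType to "{safe_text}"
    keystroke textToType
    key code 36
end tell
'''


if __name__ == "__main__":
    print(build_siri_script('play "Bohemian Rhapsody"'))
```

<p>The returned string can be handed to <code>applescript.AppleScript(...)</code> and run exactly as <code>execute_command</code> does.</p>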

<h3 id="multi-intent-commands">Multi-Intent Commands</h3>

<p>We can call OpenAI's API to complete our prompt and extract multiple commands. OpenAI is not a hard requirement: an open model such as Google's Flan-T5, run through Hugging Face's transformers library, could be used instead.</p>
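<p>For the Flan-T5 route, a rough and untested sketch might look like this. It assumes <code>transformers</code> and PyTorch are installed and uses the <code>google/flan-t5-base</code> checkpoint as an arbitrary choice. The function name <code>complete_with_flan_t5</code> is made up here, and a small model may well not return a well-formed list.</p>

```python
def complete_with_flan_t5(prompt: str, model_name: str = "google/flan-t5-base",
                          max_new_tokens: int = 64) -> str:
    """Generate a completion for `prompt` with a local Flan-T5 checkpoint."""
    # Imported lazily so the rest of the script works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # The first call downloads the checkpoint from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```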

<h4 id="ze-prompt">Ze Prompt</h4>

<div class="codehilite">
<pre><span></span><code>You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. If you are provided with a single command, return a list with a single string, trying your best to understand the command.

    Example:
    Q: &quot;Turn on the lights and turn off the lights&quot;
    A: [&quot;Turn on the lights&quot;, &quot;Turn off the lights&quot;]

    Q: &quot;Switch off the lights and then play some music&quot;
    A: [&quot;Switch off the lights&quot;, &quot;Play some music&quot;]

    Q: &quot;I am feeling sad today, play some music&quot;
    A: [&quot;Play some cheerful music&quot;]

    Q: &quot;{command_text}&quot;
    A:
</code></pre>
</div>

<p>This prompt gives the model a few examples to improve accuracy and to show it the expected output format: a Python list of strings.</p>
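<p>If you want to tweak the examples, the prompt can also be built from data rather than one long string. This is only a sketch: <code>INSTRUCTIONS</code>, <code>EXAMPLES</code>, and <code>build_prompt</code> are names invented for illustration, and the output drops the four-space indentation of the prompt above.</p>

```python
import json

INSTRUCTIONS = (
    "You are provided with multiple commands as a single command. "
    "Break down all the commands and return them in a list of strings. "
    "If you are provided with a single command, return a list with a single string, "
    "trying your best to understand the command."
)

# (question, answer) pairs taken from the prompt above
EXAMPLES = [
    ("Turn on the lights and turn off the lights", ["Turn on the lights", "Turn off the lights"]),
    ("Switch off the lights and then play some music", ["Switch off the lights", "Play some music"]),
    ("I am feeling sad today, play some music", ["Play some cheerful music"]),
]


def build_prompt(command_text: str) -> str:
    """Render the instructions, the few-shot examples, and the new command."""
    lines = [INSTRUCTIONS, "", "Example:"]
    for question, answer in EXAMPLES:
        # json.dumps quotes strings and lists the same way the hand-written prompt does
        lines.append(f"Q: {json.dumps(question, ensure_ascii=False)}")
        lines.append(f"A: {json.dumps(answer, ensure_ascii=False)}")
        lines.append("")
    # Quotes inside the user's command are escaped instead of breaking the prompt
    lines.append(f"Q: {json.dumps(command_text, ensure_ascii=False)}")
    lines.append("A:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Dim the lights and play some jazz"))
```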

<h4 id="ze-code">Ze Code</h4>

<div class="codehilite">
<pre><span></span><code><span class="kn">import</span> <span class="nn">openai</span>

<span class="kn">from</span> <span class="nn">os</span> <span class="kn">import</span> <span class="n">getenv</span>

<span class="n">openai</span><span class="o">.</span><span class="n">api_key</span> <span class="o">=</span> <span class="n">getenv</span><span class="p">(</span><span class="s2">&quot;OPENAI_KEY&quot;</span><span class="p">)</span>
<span class="n">engine</span> <span class="o">=</span> <span class="s2">&quot;text-davinci-003&quot;</span>

<span class="k">def</span> <span class="nf">execute_with_llm</span><span class="p">(</span><span class="n">command_text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span>
    <span class="n">llm_prompt</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;&quot;&quot;You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. If you are provided with a single command, return a list with a single string, trying your best to understand the command.</span>

<span class="s2">    Example:</span>
<span class="s2">    Q: &quot;Turn on the lights and turn off the lights&quot;</span>
<span class="s2">    A: [&quot;Turn on the lights&quot;, &quot;Turn off the lights&quot;]</span>

<span class="s2">    Q: &quot;Switch off the lights and then play some music&quot;</span>
<span class="s2">    A: [&quot;Switch off the lights&quot;, &quot;Play some music&quot;]</span>

<span class="s2">    Q: &quot;I am feeling sad today, play some music&quot;</span>
<span class="s2">    A: [&quot;Play some cheerful music&quot;]</span>

<span class="s2">    Q: &quot;</span><span class="si">{</span><span class="n">command_text</span><span class="si">}</span><span class="s2">&quot;</span>
<span class="s2">    A: </span>
<span class="s2">    &quot;&quot;&quot;</span>

    <span class="n">completion</span> <span class="o">=</span> <span class="n">openai</span><span class="o">.</span><span class="n">Completion</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">engine</span><span class="o">=</span><span class="n">engine</span><span class="p">,</span> <span class="n">prompt</span><span class="o">=</span><span class="n">llm_prompt</span><span class="p">,</span> <span class="n">max_tokens</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">command_text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">))</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span>

    <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="nb">eval</span><span class="p">(</span><span class="n">completion</span><span class="o">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">text</span><span class="p">):</span> <span class="c1"># NEVER eval() model output like this in production</span>
        <span class="n">execute_command</span><span class="p">(</span><span class="n">task</span><span class="p">)</span>
</code></pre>
</div>
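<p>Since <code>eval</code> will happily run whatever the model returns, a safer variant is to parse the reply with <code>ast.literal_eval</code>, which only accepts Python literals. The sketch below is one way to do it. <code>parse_command_list</code> is a hypothetical helper, not part of the original script, and it quietly returns an empty list when the reply cannot be parsed.</p>

```python
import ast


def parse_command_list(raw_text: str) -> list[str]:
    """Parse the model's reply into a list of command strings without eval()."""
    try:
        # literal_eval only accepts Python literals (lists, strings, numbers, ...),
        # so it cannot call functions or import modules the way eval() can.
        parsed = ast.literal_eval(raw_text.strip())
    except (ValueError, SyntaxError):
        # The reply was not a valid literal (for example, cut off by max_tokens).
        return []

    if isinstance(parsed, str):
        parsed = [parsed]
    if not isinstance(parsed, (list, tuple)):
        return []
    # Keep only non-empty strings; anything else is not a Siri command.
    return [item.strip() for item in parsed if isinstance(item, str) and item.strip()]
```

<p>In <code>execute_with_llm</code>, the loop would then iterate over <code>parse_command_list(completion.choices[0].text)</code> instead of the <code>eval</code> call.</p>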

<h3 id="gluing-together-code">Gluing together code</h3>

<p>To finish it all off, we can use argparse to only send the input command to OpenAI when asked to do so.</p>

<div class="codehilite">
<pre><span></span><code><span class="kn">import</span> <span class="nn">argparse</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">()</span>
    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">&quot;command&quot;</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=</span><span class="s2">&quot;?&quot;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">&quot;The command to pass to Siri&quot;</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="s2">&quot;What time is it?&quot;</span><span class="p">)</span>
    <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--openai&#39;</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="n">argparse</span><span class="o">.</span><span class="n">BooleanOptionalAction</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">&quot;Use OpenAI to detect multiple intents&quot;</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
    <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span>

    <span class="k">if</span> <span class="n">args</span><span class="o">.</span><span class="n">openai</span><span class="p">:</span>
        <span class="n">execute_with_llm</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">command</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">execute_command</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">command</span><span class="p">)</span>
</code></pre>
</div>
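<p>To see what the parser produces without touching Siri, the same two arguments can be wrapped in a function and fed explicit argument lists. <code>build_parser</code> is a name made up for this sketch. Note that <code>argparse.BooleanOptionalAction</code> needs Python 3.9 or newer and also creates a <code>--no-openai</code> flag.</p>

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the same two-argument parser used in the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("command", nargs="?", type=str, default="What time is it?",
                        help="The command to pass to Siri")
    # BooleanOptionalAction (Python 3.9+) creates both --openai and --no-openai
    parser.add_argument("--openai", action=argparse.BooleanOptionalAction, default=False,
                        help="Use OpenAI to detect multiple intents")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    samples = (
        [],
        ["play some taylor swift"],
        ["turn off the lights and play some music", "--openai"],
    )
    for argv in samples:
        # Passing a list to parse_args avoids reading the real command line
        print(argv, parser.parse_args(argv))
```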

<h2 id="conclusion">Conclusion</h2>

<p>Siri is still dumb. When I ask it to <code>Switch off the lights</code>, it defaults to the home that is thousands of miles away. But this code snippet definitely does work!</p>

	<blockquote>If you have scrolled this far, consider subscribing to my mailing list <a href="https://listmonk.navan.dev/subscription/form">here.</a> You can subscribe to either a specific type of post you are interested in, or subscribe to everything with the "Everything" list.</blockquote>
	<script data-isso="https://comments.navan.dev/"
        src="https://comments.navan.dev/js/embed.min.js"></script>
	<section id="isso-thread">
	    <noscript>Javascript needs to be activated to view comments.</noscript>
	</section>
</main>

    <script src="assets/manup.min.js"></script>
    <script src="/pwabuilder-sw-register.js"></script>    
</body>
</html>