---
date: 2023-02-08 17:21
description: Code snippet to interact with Siri by issuing commands from the command-line.
tags: Tutorial, Code-Snippet, Python, Siri, macOS, AppleScript
---

# Interacting with Siri using the command line

My main objective was to see if I could issue multi-intent commands in one go. Obviously, Siri cannot do that on its own (neither can Alexa, Cortana, or Google Assistant). The script below can either issue a single command directly, or use OpenAI's DaVinci model to split a request into multiple commands and pass them on to Siri one at a time.

## Prerequisites

* Run macOS
* Enable Type to Siri (Settings > Accessibility > Type to Siri)
* Allow the Terminal to control System Events (the first time you run the script, macOS will prompt you to allow it)
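* Install the Python dependencies and set an `OPENAI_KEY` environment variable if you want the multi-intent mode (the `applescript` import below most likely comes from the `py-applescript` package on PyPI)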

## Show me ze code

If you are here just for the code:

```python
import argparse
import applescript
import openai

from os import getenv

openai.api_key = getenv("OPENAI_KEY")
engine = "text-davinci-003"

def execute_with_llm(command_text: str) -> None:
	llm_prompt = f"""You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. If you are provided with a single command, return a list with a single string, trying your best to understand the command.
	
	Example:
	Q: "Turn on the lights and turn off the lights"
	A: ["Turn on the lights", "Turn off the lights"]

	Q: "Switch off the lights and then play some music"
	A: ["Switch off the lights", "Play some music"]

	Q: "I am feeling sad today, play some music"
	A: ["Play some cheerful music"]

	Q: "{command_text}"
	A: 
	"""

	completion = openai.Completion.create(engine=engine, prompt=llm_prompt, max_tokens=len(command_text.split(" "))*2)

	for task in eval(completion.choices[0].text):  # quick hack; never eval model output like this in prod
		execute_command(task)


def execute_command(command_text: str) -> None:
	"""Execute a Siri command."""

	script = applescript.AppleScript(f"""
		tell application "System Events" to tell the front menu bar of process "SystemUIServer"
			tell (first menu bar item whose description is "Siri")
				perform action "AXPress"
			end tell
		end tell

		delay 2

		tell application "System Events"
			set textToType to "{command_text}"
			keystroke textToType
			key code 36
		end tell
	""")

	script.run()


if __name__ == "__main__":
	parser = argparse.ArgumentParser()
	parser.add_argument("command", nargs="?", type=str, help="The command to pass to Siri", default="What time is it?")
	parser.add_argument('--openai', action=argparse.BooleanOptionalAction, help="Use OpenAI to detect multiple intents", default=False)
	args = parser.parse_args()

	if args.openai:
		execute_with_llm(args.command)
	else:
		execute_command(args.command)
```

Usage:

```bash
python3 main.py "play some taylor swift"
python3 main.py "turn off the lights and play some music" --openai
```

## ELI5 

I am not actually going to explain this as if I were explaining it to a five-year-old.

### AppleScript

Even in the age of Siri Shortcuts, AppleScript can still do more. It is a scripting language created by Apple that lets you automate pretty much anything you see on your screen.

We use the following AppleScript to trigger Siri and then type in our command:

```applescript
tell application "System Events" to tell the front menu bar of process "SystemUIServer"
	tell (first menu bar item whose description is "Siri")
		perform action "AXPress"
	end tell
end tell

delay 2

tell application "System Events"
	set textToType to "Play some rock music"
	keystroke textToType
	key code 36
end tell
```

This first triggers Siri, waits for a couple of seconds, and then types in our command. In the script, this functionality is handled by the `execute_command` function.

```python
import applescript

def execute_command(command_text: str) -> None:
	"""Execute a Siri command."""

	script = applescript.AppleScript(f"""
		tell application "System Events" to tell the front menu bar of process "SystemUIServer"
			tell (first menu bar item whose description is "Siri")
				perform action "AXPress"
			end tell
		end tell

		delay 2

		tell application "System Events"
			set textToType to "{command_text}"
			keystroke textToType
			key code 36
		end tell
	""")

	script.run()
```
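
One caveat: `command_text` is interpolated straight into the AppleScript source, so a command containing a double quote or a backslash will break the generated script. A tiny helper like the one below (my addition, not part of the original snippet) can escape the text before it is handed to `execute_command`:

```python
def escape_for_applescript(text: str) -> str:
	"""Escape characters that would break an AppleScript string literal."""
	return text.replace("\\", "\\\\").replace('"', '\\"')

# e.g. execute_command(escape_for_applescript('play "Anti-Hero" by Taylor Swift'))
```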

### Multi-Intent Commands

We call OpenAI's API to complete our prompt and extract the individual commands. We don't have to use OpenAI, though; a model like Google's Flan-T5, run locally through Hugging Face's transformers library, would work as well. A rough sketch of that alternative follows.
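
This sketch is not part of the original script; the model name, prompt wording, and generation settings are purely illustrative, and it assumes you have `transformers` (with a backend such as PyTorch) installed:

```python
from transformers import pipeline

# Illustrative only: any instruction-tuned text2text model should work here.
generator = pipeline("text2text-generation", model="google/flan-t5-large")

def extract_commands(command_text: str) -> list[str]:
	"""Ask a local model to split a request into individual commands."""
	prompt = (
		"Break the following request into separate commands, one per line:\n"
		f"{command_text}"
	)
	output = generator(prompt, max_new_tokens=64)[0]["generated_text"]
	return [line.strip() for line in output.splitlines() if line.strip()]
```

Each extracted command can then be handed to `execute_command`, exactly like in the OpenAI version.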

#### Ze Prompt

```text
You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. If you are provided with a single command, return a list with a single string, trying your best to understand the command.
	
	Example:
	Q: "Turn on the lights and turn off the lights"
	A: ["Turn on the lights", "Turn off the lights"]

	Q: "Switch off the lights and then play some music"
	A: ["Switch off the lights", "Play some music"]

	Q: "I am feeling sad today, play some music"
	A: ["Play some cheerful music"]

	Q: "{command_text}"
	A:
```

This prompt gives the model a few examples (few-shot prompting) to improve the accuracy of the completion, and instructs it to return a Python-style list of strings.


#### Ze Code

```python
import openai

from os import getenv

openai.api_key = getenv("OPENAI_KEY")
engine = "text-davinci-003"

def execute_with_llm(command_text: str) -> None:
	llm_prompt = f"""You are provided with multiple commands as a single command. Break down all the commands and return them in a list of strings. If you are provided with a single command, return a list with a single string, trying your best to understand the command.
	
	Example:
	Q: "Turn on the lights and turn off the lights"
	A: ["Turn on the lights", "Turn off the lights"]

	Q: "Switch off the lights and then play some music"
	A: ["Switch off the lights", "Play some music"]

	Q: "I am feeling sad today, play some music"
	A: ["Play some cheerful music"]

	Q: "{command_text}"
	A: 
	"""

	completion = openai.Completion.create(engine=engine, prompt=llm_prompt, max_tokens=len(command_text.split(" "))*2)

	for task in eval(completion.choices[0].text): # NEVER EVAL MODEL OUTPUT LIKE THIS IN PROD
		execute_command(task)
```
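
As the comment says, `eval` on model output is a quick hack. A slightly safer option, assuming the model keeps returning a Python-style list literal, is `ast.literal_eval`, which parses literals only and rejects arbitrary expressions. A sketch of what that swap could look like:

```python
import ast

def parse_task_list(model_output: str) -> list[str]:
	"""Parse the model's reply, expecting a Python-style list of strings."""
	try:
		tasks = ast.literal_eval(model_output.strip())
	except (ValueError, SyntaxError):
		# If the model returned something else, treat the whole reply as one command.
		return [model_output.strip()]
	if isinstance(tasks, list):
		return [str(task) for task in tasks]
	return [model_output.strip()]
```

The loop then becomes `for task in parse_task_list(completion.choices[0].text): execute_command(task)`.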


### Gluing together code

To finish it all off, we use argparse so the input command is only sent to OpenAI when the `--openai` flag is passed.

```python
import argparse

if __name__ == "__main__":
	parser = argparse.ArgumentParser()
	parser.add_argument("command", nargs="?", type=str, help="The command to pass to Siri", default="What time is it?")
	parser.add_argument('--openai', action=argparse.BooleanOptionalAction, help="Use OpenAI to detect multiple intents", default=False)
	args = parser.parse_args()

	if args.openai:
		execute_with_llm(args.command)
	else:
		execute_command(args.command)
```

## Conclusion

Siri is still dumb. When I ask it to `Switch off the lights`, it defaults to the home thousands of miles away. But this code snippet definitely does work!