Only one action needs to be performed, so the cognitive load on the user is lower (they do not need to remember to perform an utterance-terminal action).