Speech systems understand only a restricted language and therefore cannot interpret utterances that fall outside the currently active language. Users, on the other hand, bring to the interaction assumptions based on human-human communication, in particular the assumption that they are dealing with an intelligent entity with flexible understanding skills.
Guide the user into acceptable language in the following ways:
Phrase prompts so that they indicate what the system expects, e.g., "Do you want to proceed? Please say Yes or No."
If only a small number of alternatives are legal, display them in a menu. Alternatively, provide legal sentence frames, either on-screen or in a browsable list.
Many tasks are sufficiently restricted in domain that, with enough observation of user behavior, a reasonably complete domain language can be built. Of course, this works only if the resources needed to collect and analyze those observations are available.
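The guidelines above can be illustrated with a minimal sketch. It assumes the active language is just a short list of legal phrases and that the recognizer hands back plain text; the function names (`normalize`, `build_prompt`, `interpret`, `coverage`) are hypothetical and do not come from any particular speech toolkit. The sketch builds a prompt that states the legal answers, checks whether an utterance falls inside the active language, and estimates how much of the observed user behavior the current language covers.

```python
from typing import Optional, Sequence


def normalize(utterance: str) -> str:
    """Lowercase and drop punctuation and extra whitespace so matching is lenient."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def build_prompt(question: str, alternatives: Sequence[str]) -> str:
    """Append the legal answers to the question so the system's expectation is explicit."""
    if len(alternatives) == 1:
        options = alternatives[0]
    else:
        options = ", ".join(alternatives[:-1]) + " or " + alternatives[-1]
    return f"{question} Please say {options}."


def interpret(utterance: str, alternatives: Sequence[str]) -> Optional[str]:
    """Return the matching legal alternative, or None if the utterance is out of language."""
    spoken = normalize(utterance)
    for alt in alternatives:
        if normalize(alt) == spoken:
            return alt
    return None


def coverage(observed: Sequence[str], language: Sequence[str]) -> float:
    """Fraction of observed utterances that the current language accepts."""
    if not observed:
        return 0.0
    hits = sum(1 for u in observed if interpret(u, language) is not None)
    return hits / len(observed)


if __name__ == "__main__":
    legal = ["Yes", "No"]
    print(build_prompt("Do you want to proceed?", legal))
    for heard in ["yes!", "sure, go ahead"]:
        result = interpret(heard, legal)
        if result is None:
            # Out-of-language input: repeat the prompt with the legal answers.
            print(build_prompt("Sorry, I did not understand.", legal))
        else:
            print(f"Understood: {result}")
```

In this sketch an out-of-language utterance leads to a re-prompt that again names the legal answers, and a low `coverage` value over logged utterances would suggest that the domain language needs to be extended.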
The underlying problem, of course, is that the user must learn the interface. A user who is familiar with the task and experienced with the interface will have much less difficulty staying within the bounds of the language.