Introducing the Rational Speech Act framework

Day 1: Language understanding as Bayesian inference

The Rational Speech Act (RSA) framework views communication as recursive reasoning between a speaker and a listener. The listener interprets the speaker’s utterance by reasoning about a cooperative speaker trying to inform a naive listener about some state of affairs. Using Bayesian inference, the listener infers what the state of the world is likely to be given that a speaker produced some utterance, knowing that the speaker is reasoning about how a listener is most likely to interpret that utterance. Thus, we have (at least) three levels of inference. At the top, the sophisticated, pragmatic listener, $$L_{1}$$, reasons about the pragmatic speaker, $$S_{1}$$, and infers the state of the world $$s$$ given that the speaker chose to produce the utterance $$u$$. The speaker chooses $$u$$ by maximizing the probability that a naive, literal listener, $$L_{0}$$, would correctly infer the state of the world $$s$$ given the literal meaning of $$u$$.

At the base of this reasoning, the naive, literal listener $$L_{0}$$ interprets an utterance according to its meaning. That is, $$L_{0}$$ computes the probability of $$s$$ given $$u$$ according to the semantics of $$u$$ and the prior probability of $$s$$. A standard view of the semantic content of an utterance suffices: a mapping from states of the world to truth values.

$$P_{L_{0}}(s\mid u) \propto [\![u]\!](s) \cdot P(s)$$
 
// possible states of the world
var worldPrior = function() {
  return uniformDraw([
    {shape: "square", color: "blue"},
    {shape: "circle", color: "blue"},
    {shape: "square", color: "green"}
  ])
}
// possible one-word utterances
var utterances = ["blue","green","square","circle"]
// meaning function to interpret the utterances
var meaning = function(utterance, world){
  return utterance == "blue" ? world.color == "blue" :
  utterance == "green" ? world.color == "green" :
  utterance == "circle" ? world.shape == "circle" :
  utterance == "square" ? world.shape == "square" :
  true
}
// literal listener
var literalListener = function(utterance){
  Infer({method:"enumerate"}, function(){
    var world = worldPrior();
    var uttTruthVal = meaning(utterance, world);
    condition(uttTruthVal == true)
    return world
  })
}
viz.table(literalListener("blue"))

Exercises:

  1. Check what happens with the other utterances.
  2. In the model above, worldPrior() returns a sample from a uniformDraw over the possible world states. What happens when the listener’s beliefs are not uniform over world states? (Hint: use a categorical distribution by calling categorical({ps: [list_of_probabilities], vs: [list_of_states]}).)

Fantastic! We now have a way of integrating a listener’s prior beliefs about the world with the truth functional meaning of an utterance.
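
To see the arithmetic behind this literal listener, here is a plain-Python re-implementation of the same computation (my own sketch for checking the numbers; the tutorial’s models themselves are written in WebPPL). It also accepts an optional non-uniform prior, as in exercise 2 above.

```python
# Hypothetical plain-Python check of the literal listener L0
# (not part of the WebPPL tutorial code).
worlds = [
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "blue"},
    {"shape": "square", "color": "green"},
]

def meaning(utterance, world):
    # Truth-functional semantics: is the utterance true of the world?
    if utterance in ("blue", "green"):
        return world["color"] == utterance
    if utterance in ("square", "circle"):
        return world["shape"] == utterance
    return True

def literal_listener(utterance, prior=None):
    # P_L0(s | u) ∝ [[u]](s) * P(s): zero out false worlds, renormalize
    prior = prior if prior is not None else [1 / len(worlds)] * len(worlds)
    scores = [p if meaning(utterance, w) else 0.0
              for w, p in zip(worlds, prior)]
    total = sum(scores)
    return [s / total for s in scores]

print(literal_listener("blue"))  # the two blue objects split the mass: [0.5, 0.5, 0.0]
```

With a skewed prior such as [0.8, 0.1, 0.1], the same utterance "blue" yields 8:1 odds in favor of the blue square: the truth-functional meaning filters the worlds, and the prior does the rest.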

What about speakers? Speech acts are actions; thus, the speaker is modeled as a rational (Bayesian) actor. He chooses an action (e.g., an utterance) according to its utility. The speaker simulates taking an action, evaluates its utility, and chooses actions in proportion to their utility. This is called a softmax optimal agent; a fully optimal agent would choose the action with the highest utility all of the time. (This kind of model is called action as inverse planning; for more on this, see agentmodels.org.)

In the code box below you’ll see a generic softmax agent model. Note that in this model, the agent uses factor (not condition). factor is a continuous (or, softer) version of condition that takes real numbers as arguments (instead of binary truth values). Higher numbers (here, utilities) upweight the probabilities of the actions associated with them.

 
// define possible actions
var actions = ['a1', 'a2', 'a3'];
// define some utilities for the actions
var utility = function(action){
  var table = { 
    a1: -1, 
    a2: 6, 
    a3: 8
  };
  return table[action];
};
// define speaker optimality
var alpha = 1
// define a rational agent who chooses actions 
// according to their expected utility
var agent = Infer({ method: 'enumerate' }, function(){
    var action = uniformDraw(actions);
    factor(alpha * utility(action));
    return action;
});
print("the probability that an agent will take various actions:")
viz.auto(agent);
webppl:
seed:

Exercises:

  1. Explore what happens when you change the agent’s optimality.
  2. Explore what happens when you change the utilities.
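
The softmax computation above can also be worked out by hand. Here is a plain-Python sketch of the same agent (my own illustration, not tutorial code):

```python
import math

# Softmax-optimal choice: P(action) ∝ exp(alpha * utility(action)).
# Same toy action set and utilities as the WebPPL agent above.
utilities = {"a1": -1, "a2": 6, "a3": 8}

def softmax_agent(utilities, alpha=1):
    weights = {a: math.exp(alpha * u) for a, u in utilities.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

probs = softmax_agent(utilities, alpha=1)
print(probs)
# a3 is most probable but not certain; the odds between any two actions
# depend only on their utility difference:
# P(a2) / P(a3) = exp(alpha * (6 - 8))
```

As alpha grows, the distribution concentrates on a3; as alpha approaches 0, it flattens toward uniform.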

In language understanding, the utility of an utterance is how well it communicates the state of the world $$s$$ to a listener. So, the speaker $$S_{1}$$ chooses utterances $$u$$ to communicate the state $$s$$ to the hypothesized literal listener $$L_{0}$$. Another way to think about this: $$S_{1}$$ wants to minimize the effort $$L_{0}$$ would need to arrive at $$s$$ from $$u$$, all while being efficient at communicating. $$S_{1}$$ thus seeks to minimize the surprisal of $$s$$ given $$u$$ for the literal listener $$L_{0}$$, while bearing in mind the utterance cost, $$C(u)$$. (This trade-off between efficacy and efficiency is not trivial: speakers could always avoid ambiguity, but unambiguous utterances tend toward the unwieldy, and, very often, the unnecessary. We will see this tension play out later in the course.)

Speakers act in accordance with the speaker’s utility function $$U_{S_{1}}$$: utterances are more useful at communicating about some state as surprisal and utterance cost decrease.

$$U_{S_{1}}(u; s) = \log(L_{0}(s\mid u)) - C(u)$$

(In WebPPL, $$\log(L_{0}(s\mid u))$$ can be accessed via literalListener(u).score(s).)

With this utility function in mind, $$S_{1}$$ computes the probability of an utterance $$u$$ given some state $$s$$ in proportion to the speaker’s utility function $$U_{S_{1}}$$. The term $$\alpha > 0$$ controls the speaker’s optimality, that is, the speaker’s rationality in choosing utterances.

$$P_{S_{1}}(u\mid s) \propto \exp(\alpha U_{S_{1}}(u; s))$$
 
// pragmatic speaker
// (assumes utterances, literalListener, and alpha from the code boxes above)
var speaker = function(world){
  Infer({method:"enumerate"}, function(){
    var utterance = uniformDraw(utterances);
    factor(alpha * literalListener(utterance).score(world))
    return utterance
  })
}

Exercise: Check the speaker’s behavior for a blue square.
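
As a sanity check on this exercise, the speaker distribution can be worked out by hand: with $$\alpha = 1$$ and no cost term, $$P_{S_{1}}(u\mid s) \propto \exp(\log L_{0}(s\mid u)) = L_{0}(s\mid u)$$. Here is a plain-Python sketch (my own re-implementation, not tutorial code):

```python
# Hand-check of the pragmatic speaker S1 for the three-object scenario
# (hypothetical plain-Python translation of the WebPPL model above).
worlds = [("square", "blue"), ("circle", "blue"), ("square", "green")]
utterances = ["blue", "green", "square", "circle"]

def meaning(utterance, world):
    shape, color = world
    return color == utterance if utterance in ("blue", "green") else shape == utterance

def literal_listener(utterance):
    # P_L0(s | u): uniform prior renormalized over the true worlds
    true_worlds = [w for w in worlds if meaning(utterance, w)]
    return {w: 1 / len(true_worlds) for w in true_worlds}

def speaker(world, alpha=1):
    # P_S1(u | s) ∝ exp(alpha * log L0(s | u)) = L0(s | u) ** alpha
    weights = {u: literal_listener(u).get(world, 0.0) ** alpha
               for u in utterances}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

print(speaker(("square", "blue")))
# "blue" and "square" each let L0 recover the blue square with
# probability 1/2, so the speaker is indifferent between them;
# "green" and "circle" are false of it and get zero probability.
```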

We now have a model of the generative process of an utterance. With this in hand, we can imagine a listener who thinks about this kind of speaker.

The pragmatic listener $$L_{1}$$ computes the probability of a state $$s$$ given some utterance $$u$$. By reasoning about the speaker $$S_{1}$$, this probability is proportional to the probability that $$S_{1}$$ would choose to utter $$u$$ to communicate about the state $$s$$, together with the prior probability of $$s$$ itself. In other words, to interpret an utterance, the pragmatic listener considers the process that generated the utterance in the first place. (Note that the listener model uses observe, which functions like factor with $$\alpha$$ set to $$1$$.)

$$P_{L_{1}}(s\mid u) \propto P_{S_{1}}(u\mid s) \cdot P(s)$$
 
// pragmatic listener
var pragmaticListener = function(utterance){
  Infer({method:"enumerate"}, function(){
    var world = worldPrior();
    observe(speaker(world), utterance)
    return world
  })
}

Within the RSA framework, communication is thus modeled as in Fig. 1, where $$L_{1}$$ reasons about $$S_{1}$$’s reasoning about a hypothetical $$L_{0}$$.
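
The full chain $$L_{1} \to S_{1} \to L_{0}$$ is small enough to verify by hand. In a plain-Python sketch (my own check, assuming a uniform world prior and $$\alpha = 1$$), the pragmatic listener’s interpretation of “blue” comes out, up to float rounding, as 3/5 for the blue square and 2/5 for the blue circle:

```python
# Hypothetical plain-Python hand-check of the full RSA chain
# (the tutorial's models themselves are in WebPPL).
worlds = [("square", "blue"), ("circle", "blue"), ("square", "green")]
utterances = ["blue", "green", "square", "circle"]

def meaning(utterance, world):
    shape, color = world
    return color == utterance if utterance in ("blue", "green") else shape == utterance

def L0(utterance):
    # literal listener: uniform prior renormalized over the true worlds
    true_worlds = [w for w in worlds if meaning(utterance, w)]
    return {w: 1 / len(true_worlds) for w in true_worlds}

def S1(world, alpha=1):
    # speaker: P(u | s) ∝ L0(s | u) ** alpha  (no utterance cost)
    weights = {u: L0(u).get(world, 0.0) ** alpha for u in utterances}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

def L1(utterance, alpha=1):
    # pragmatic listener: P(s | u) ∝ P_S1(u | s) * P(s); uniform P(s) cancels
    weights = {w: S1(w, alpha)[utterance] for w in worlds}
    total = sum(weights.values())
    return {w: p / total for w, p in weights.items()}

print(L1("blue"))
# "blue" is strengthened toward the square, because a speaker who saw
# the circle could have said "circle" instead.
```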

Fig. 1: Graphical representation of the Bayesian RSA model.


Application 1: Simple referential communication

In its initial formulation, Frank and Goodman (2012) use the basic RSA framework to model referent choice in efficient communication. To see the mechanism at work, imagine a referential communication game with three objects, as in Fig. 2.

Fig. 2: Example referential communication scenario from Frank & Goodman (2012). Speakers choose a single word, $$u$$, to signal an object, $$s$$.

Suppose a speaker wants to signal an object, but only has a single word with which to do so. Applying the RSA model schematized in Fig. 1 to the communication scenario in Fig. 2, the speaker $$S_{1}$$ chooses a word $$u$$ to best signal an object $$s$$ to a literal listener $$L_{0}$$, who interprets $$u$$ in proportion to the prior probability of naming objects in the scenario (i.e., to an object’s salience, $$P(s)$$). The pragmatic listener $$L_{1}$$ reasons about the speaker’s reasoning, and interprets $$u$$ accordingly. By formalizing the contributions of salience and efficiency, the RSA framework provides an information-theoretic definition of informativeness in pragmatic inference.

 
// Here is the code from the Frank and Goodman RSA model
// possible states of the world
var worldPrior = function() {
  return uniformDraw([
    {shape: "square", color: "blue"},
    {shape: "circle", color: "blue"},
    {shape: "square", color: "green"}
  ])
}
// possible one-word utterances
var utterances = ["blue","green","square","circle"]
// meaning function to interpret the utterances
var meaning = function(utterance, world){
  return utterance == "blue" ? world.color == "blue" :
  utterance == "green" ? world.color == "green" :
  utterance == "circle" ? world.shape == "circle" :
  utterance == "square" ? world.shape == "square" :
  true
}
// literal listener
var literalListener = function(utterance){
  Infer({method:"enumerate"},
        function(){
    var world = worldPrior()
    condition(meaning(utterance, world))
    return world
  })
}
// set speaker optimality
var alpha = 1
// pragmatic speaker
var speaker = function(world){
  Infer({method:"enumerate"},
        function(){
    var utterance = uniformDraw(utterances)
    factor(alpha * literalListener(utterance).score(world))
    return utterance
  })
}
// pragmatic listener
var pragmaticListener = function(utterance){
  Infer({method:"enumerate"},
        function(){
    var world = worldPrior()
    observe(speaker(world),utterance)
    return world
  })
}
print("literal listener's interpretation of 'blue':")
viz.table(literalListener( "blue"))
print("speaker's utterance distribution for a blue circle:")
viz.table(speaker({shape:"circle", color: "blue"}))
print("pragmatic listener's interpretation of 'blue':")
viz.table(pragmaticListener("blue"))

Exercises:

  1. Explore what happens if you make the speaker more optimal.
  2. Add another object to the scenario.
  3. Add a new multi-word utterance.
  4. Check the behavior of the other possible utterances.
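
For exercise 1, the direction of the effect can be computed by hand. Using a plain-Python version of the model (my own sketch, not tutorial code), raising the optimality parameter $$\alpha$$ sharpens the speaker, and the pragmatic listener’s interpretation of “blue” moves further toward the blue square:

```python
# Hypothetical plain-Python version of the Frank & Goodman model, used
# to see how speaker optimality alpha changes L1 (not tutorial code).
worlds = [("square", "blue"), ("circle", "blue"), ("square", "green")]
utterances = ["blue", "green", "square", "circle"]

def meaning(utterance, world):
    shape, color = world
    return color == utterance if utterance in ("blue", "green") else shape == utterance

def L0(utterance):
    # literal listener: uniform prior renormalized over the true worlds
    true_worlds = [w for w in worlds if meaning(utterance, w)]
    return {w: 1 / len(true_worlds) for w in true_worlds}

def S1(world, alpha):
    # speaker: P(u | s) ∝ exp(alpha * log L0(s | u)) = L0(s | u) ** alpha
    weights = {u: L0(u).get(world, 0.0) ** alpha for u in utterances}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

def L1(utterance, alpha):
    # pragmatic listener with a uniform world prior
    weights = {w: S1(w, alpha)[utterance] for w in worlds}
    total = sum(weights.values())
    return {w: p / total for w, p in weights.items()}

for alpha in (1, 2, 5):
    print(alpha, round(L1("blue", alpha)[("square", "blue")], 3))
# As alpha grows, a speaker seeing the blue circle almost always says
# "circle", so hearing "blue" pushes L1 ever harder toward the square.
```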

In the next chapter, we’ll see how RSA models have been developed to model more complex aspects of pragmatic reasoning and language understanding.

