11
Advanced: Sports Dialogue and Crowds

Summary: Crowds, color commentary, play-by-play commentary, concatenation for dialogue, queuing dialogue

Project:DemoCh11Soccer01 Level:Soccer01A/B

Introduction

In terms of dialogue, sports games present huge challenges. Dialogue is there for authenticity, reinforcement, information, and advice, but the speech, the crowd, the camera, and graphic overlays all need to match and make sense.

Crowd Systems

Open the Level Soccer01A.

Reactive crowd systems are reasonably straightforward from an audio point of view since the nature of the crowd sounds is similar to a broadband noise, which can be effective in masking elements that come in and out. How convincing it is, to a great extent, will be determined by the accuracy of the information you can get from the game system itself as to the current play conditions. Given that we’re British, we’ll obviously use the game of football, which we’ll refer to as soccer so as not to confuse our North American cousins.;-)

fig0554

WASD to move, mouse click or space bar to pass/shoot.

Randomized Crowd

The first thing to start with is a base randomized crowd to serve as a backdrop. Here we have a simple crowd loop with some randomized one-shots over the top (the location specific chants could be swapped out depending on the stadium using a -Switch- and <Set Int Parameter>) as seen in Chapter 06.

fig0555

Crowd Excitement Levels

Crowd sounds don’t typically start and stop immediately—they build and ebb, so they are most suited to using curve graphs to control the volume of the different elements (the principle of using a value or parameter to read through volume curves to bring crowd elements in and out could be applied to many other crowd situations, for example the panic level variable of a street crowd when shooting occurs).

The systems for the three main crowd elements, {Crowd_Ambience}, {Crowd_Excitement}, and {Crowd_Frenzy}, are governed by the player’s position on the pitch (based on the premise that, as the player moves closer to the goals at either end, the corresponding supporters will get more excited by the promise of a goal being scored). This is illustrated in the following diagram, with general crowd sound represented in blue, crowd excitement in yellow, and crowd frenzy in red.

fig0556

An <Event Tick> gets the distance between the player and a trigger box that’s been placed at the right hand (red) goalmouth. This value is normalized (i.e., converted into a range between 0.0 and 1.0) and is then used to read through a set of curves that set the volume multiplier of the different elements.
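
As a rough illustration of this idea outside the Blueprint, the following Python sketch normalizes a distance and reads it through a set of volume curves. The curve shapes, the pitch length, and the function names are all our own assumptions for the example (they are not taken from the level), and for simplicity only the supporters at one end of the pitch are modeled.

```python
from bisect import bisect_right

def normalize(distance, max_distance):
    """Convert a distance into the range 0.0 to 1.0 (clamped)."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    return min(max(distance / max_distance, 0.0), 1.0)

def read_curve(points, x):
    """Linearly interpolate a curve given as sorted (x, y) points."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical curves: x is the normalized distance from the red goalmouth
# (0.0 = at the goal), y is the volume multiplier for that crowd element.
CURVES = {
    "Crowd_Frenzy": [(0.0, 1.0), (0.15, 0.0), (1.0, 0.0)],
    "Crowd_Excitement": [(0.0, 0.3), (0.15, 1.0), (0.4, 0.0), (1.0, 0.0)],
    "Crowd_Ambience": [(0.0, 0.0), (0.3, 0.0), (0.5, 1.0), (0.7, 0.0), (1.0, 0.0)],
}

def crowd_volumes(distance, pitch_length):
    """Return a volume multiplier per crowd element for this distance."""
    t = normalize(distance, pitch_length)
    return {name: read_curve(points, t) for name, points in CURVES.items()}

# Called once per tick with the current player-to-goal distance.
print(crowd_volumes(distance=50.0, pitch_length=100.0))
```

Because each element only ever receives a multiplier read from a smooth curve, the crowd builds and ebbs as the player moves rather than switching on and off.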

fig0557

For example, the away team’s excitement rises when the ball is in play near their goalmouth and becomes frenzied when it is very close.

fig0558

When a goal is attempted, we can simply play the {Miss} or {Goal} waves over the top, and the player is then reset to the middle of the pitch, bringing us back to the {Crowd_Ambience}.

Although the two crowd sounds for the {Crowd_Excitment_Cue} are relatively short (14 seconds and 11 seconds), a sense of repetition is avoided through the use of volume envelopes.

fig0559

In addition to the differing lengths of the loops themselves creating new combinations through their asynchronicity, these long looping volume envelopes (of 180 seconds and 145 seconds) help to further vary the resulting sound.
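
To get a feel for why this works, we can estimate how long it takes before all of the layers line up again in exactly the same way. The sketch below assumes the lengths are exact whole seconds and that everything starts together, which will not be precisely true in practice, so treat the figures as an indication only.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Lowest common multiple of two positive integers."""
    return a * b // gcd(a, b)

def realign_period(*lengths_s):
    """Seconds until loops of these lengths all restart together."""
    return reduce(lcm, lengths_s)

loops = realign_period(14, 11)                # 154 seconds
envelopes = realign_period(180, 145)          # 5220 seconds
combined = realign_period(14, 11, 180, 145)   # 401940 seconds

print(loops, envelopes, combined, round(combined / 3600, 2))
```

On their own the two loops would realign after 154 seconds, but once the two envelopes are layered on top the exact combination does not recur for over 111 hours.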

fig0560

Commentary

Sports dialogue can be broadly separated into two categories: play-by-play and color commentary.

The play-by-play commentary will reflect the specific action of the game as it takes place. The color commentator will provide background, analysis, or opinion, often during pauses or lulls in the play-by-play action and often reflecting on what has just happened. Obviously some events demand immediate comment from the play-by-play commentator, so this will override the color commentary.

After an interruption a human would most likely come back to their original topic, but this time phrasing it in a slightly different way (“As I was saying…”). To replicate this in games, we’d have to track exactly where any interruption occurred and have an alternative take for the topic (a recovery version), but where the interruption occurred in the sentence would affect how much meaning has already been conveyed (or not). We’d need to examine how people interrupt themselves or change topic mid-sentence and decide whether to incorporate natural mistakes and fumbles into our system. As you might appreciate, this is going to be very complex, and given that this is an area requiring further research and development even in big budget commercial games, we’re not going to provide a perfect solution here. We will, however, try to propose a couple of approaches, such as ducking or queuing, to alleviate the worst offences.

Color Commentary

As a useful starting point, we’ve used the ambient crowd curve (that is up when the player is in the middle of the pitch) to control when the color commentary might come in since there’s less likely to be any crucial action in this pitch position.

fig0561

The readout of the general crowd curve is checked every tick to see whether it is greater than 0.5 (i.e., whether the player is in the center of the field). If so, the <Gate> is opened, which allows the <Retriggerable Delay> through (the <Random Float Float In Range> gives us a random delay of 10–17 seconds). The <MultiGate> just cycles through its outputs each time it is triggered to select a different color comment type (1: Weather, 2: the team’s current form, and 3: Injuries).
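
One possible reading of this logic is sketched below in Python. The class and its names are hypothetical, and the way the timer is cancelled when the player leaves the center of the pitch is our own assumption rather than a description of the Blueprint.

```python
import random

COLOR_TYPES = ["Weather", "Team form", "Injuries"]  # MultiGate outputs 1 to 3

class ColorCommentaryScheduler:
    def __init__(self, rng=None, min_delay=10.0, max_delay=17.0, threshold=0.5):
        self.rng = rng or random.Random()
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.threshold = threshold
        self.remaining = None   # seconds until the next comment; None = idle
        self.next_output = 0    # which MultiGate output fires next

    def tick(self, ambience_level, dt):
        """Advance by dt seconds; return a comment type if one fires."""
        if ambience_level <= self.threshold:
            # Gate closed: player is not in the center (assumed to cancel).
            self.remaining = None
            return None
        if self.remaining is None:
            # Gate open and no timer running: start a random delay.
            self.remaining = self.rng.uniform(self.min_delay, self.max_delay)
        self.remaining -= dt
        if self.remaining > 0:
            return None
        # Delay completed: cycle to the next color comment type.
        self.remaining = None
        comment = COLOR_TYPES[self.next_output]
        self.next_output = (self.next_output + 1) % len(COLOR_TYPES)
        return comment
```

Calling `tick()` once per frame with the current ambient curve value returns `None` most of the time and a comment type whenever a delay runs out while the player is still mid-pitch.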

fig0562

This is used to <Set Integer Parameter> to control the -Switch- for the color commentary.

fig0563

Concatenated Dialogue for Play-by-play Commentary

There are certain scenarios (and sports games are one) where simply recording all of the potential dialogue lines becomes a practical impossibility. In the English Premier League, there are 20 teams. In a season each team will play each other team twice, once at home and once away, giving them 38 games. As an example at the start of a match we might want a commentator to say:

“Welcome to today’s match with Team A playing at home/away against Team B. This promises to be an exciting game!”

In order to have this sentence stating each of the possible matches in the football / soccer season, we’d need to record this 760 times. If we now acknowledge that we have 11 players on each team (not including subs!) and each player might intercept a ball that has been passed by another player (or pass to, or tackle, or foul), then you can appreciate that the number of potential dialogue lines becomes huge.
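
The arithmetic behind these figures can be checked quickly. In the sketch below the league figures come from the text above, while the count of player-to-player lines (four kinds of action between every ordered pair of players on the pitch) is only our own rough illustration of how quickly the numbers grow.

```python
from itertools import permutations

TEAMS = 20
PLAYERS_ON_PITCH = 22          # 11 per side, not including subs
ACTIONS = ["intercept", "pass to", "tackle", "foul"]

# Every ordered pairing of two different teams: 20 * 19.
ordered_pairings = len(list(permutations(range(TEAMS), 2)))

# Each team meets the other 19 twice, once at home and once away.
games_per_team = 2 * (TEAMS - 1)

# The welcome sentence names Team A as playing either at home or away.
full_sentences = ordered_pairings * 2

# Illustration only: one line per action per ordered pair of players.
player_pair_lines = PLAYERS_ON_PITCH * (PLAYERS_ON_PITCH - 1) * len(ACTIONS)

print(ordered_pairings, games_per_team, full_sentences, player_pair_lines)
```

Counting it the other way (20 teams each playing 38 games) also gives 760 versions of the welcome sentence.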

From the sentence above, you can see that it’s only actually certain words within the sentence that need replacing. We could keep the sentence structure the same each time but just swap in or out the appropriate team names and the words “home” or “away.” Sentences usually comprise one or more phrases. By using a phrase-based approach, you can create a compound sentence by stitching together an opening, middle, and end phrase. This method is called concatenation or stitching. We’ve already come across the -Concatenator- node in the Sound Cue that will string together its inputs end-to-end, and this is what we want to happen to our sentence elements.

In our example there are two teams:

Team A

  • 0 Lievsay
  • 1 Freemantle
  • 2 Van der Ryn
  • 3 Boyes

Team B

  • 0 Thom
  • 1 Burtt
  • 2 Murch
  • 3 Rydstrom

The first concatenation system takes the player who has scored as a variable to control the switch inside the cue {Soccer_Concat_A}.

fig0564

This gives us the dialogue line:

“Spectacular goal” “from Murch”
“from Thom”
“from Burtt”
etc.

Note where we have decided to split the dialogue. Rather than “Spectacular goal from…”/“Murch,” etc., our stitched elements include the “from.” If you say the sentence, you can feel that the word “from” and the player name often flow into each other, so there’s no natural break—making stitching hard and most likely giving a clumsy outcome. Between “goal” and “from,” there’s a more natural gap, although very slight, making this a better place to cut and stitch.
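
The same selection can be sketched in a few lines of Python. This is only an illustration of the principle, not the Sound Cue itself: the clip names are plain strings standing in for recorded waves, and the function name is our own.

```python
# Squad lists from the example; the index is the value fed to the -Switch-.
TEAM_A = ["Lievsay", "Freemantle", "Van der Ryn", "Boyes"]
TEAM_B = ["Thom", "Burtt", "Murch", "Rydstrom"]

def goal_line(team, scorer_index):
    """Return the clips to play end-to-end when this player scores."""
    # "from" is recorded together with the name, so the cut falls in the
    # slight natural gap between "goal" and "from".
    name_clip = f"from {team[scorer_index]}"
    return ["Spectacular goal", name_clip]

print(" | ".join(goal_line(TEAM_B, 2)))
```

Only one fixed opening clip is needed, plus one "from ..." clip per player.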

Our two interception cues use the same approach ({Interception} and {Given_it_away}).

fig0565

When the player loses the ball to the opposition, we have version A:

“Nice Interception” “from Murch”
“from Thom”
“from Burtt”
etc.

Version B reverses the order to give the player who lost the ball first, and there’s a variation in the following line:

“Murch” “has given it away!”/“has given the ball away!”
“Thom”
“Burtt”
etc.

fig0566

There are two randomly chosen concatenation systems for a goal score. The first operates in the same way as the interception system, adding the player name to the end of a line, and the second goal concatenation system actually inserts the player name in the middle of a phrase:

{Soccer_Concat_B}.

“It’s a goal!” “Murch” “puts it in the back of the net!”
“Thom”
“Burtt”
etc.

fig0567

The final play-by-play system comments on the passing of the ball from one player to another, this time taking the two players involved to determine the -Switch- position within the cue {Pass}.

fig0568

fig0569

“Nice pass” “from Murch” “to Thom”
“from Thom” “to Burtt”
“from Burtt” “to Rydstrom”
etc.
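
The lines that take a name in the middle of the phrase, or that take two names, follow the same pattern and can be thought of as templates with slots. The Python sketch below is a hypothetical illustration: the template table and the slot names (`scorer`, `passer`, `receiver`) are our own, and each string stands in for a separately recorded clip.

```python
# Each cue is a list of phrase elements; "{...}" marks a slot that is
# filled by the player variable(s) that would drive the -Switch-.
TEMPLATES = {
    "Soccer_Concat_B": ["It's a goal!", "{scorer}",
                        "puts it in the back of the net!"],
    "Pass": ["Nice pass", "from {passer}", "to {receiver}"],
}

def build_line(cue, **players):
    """Fill the slots of a cue and return the clips in playback order."""
    return [part.format(**players) for part in TEMPLATES[cue]]

print(build_line("Pass", passer="Murch", receiver="Thom"))
print(build_line("Soccer_Concat_B", scorer="Burtt"))
```

With four players per side this needs one "from" clip and one "to" clip per player rather than a separate recording for every possible pair.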

As noted above, the play-by-play commentary takes priority over the color commentary, so in this first version of the level (Soccer01A), we’ve implemented a passive Sound Mix that automatically ducks out any color commentary if a play-by-play line occurs.

fig0570

Considerations for Concatenated Dialogue

Concatenated dialogue can work well, but the challenge is to get the flow of pitch and spacing over the words feeling right across all versions. It is often the spacing of words that stops it from sounding natural and human (if you want to hear some examples of bad speech concatenation, try phoning up your local utility service or bank). You can also appreciate how a recording session for a concatenated system would differ from a normal session in that you’d want to construct a script so that you get the maximum amount of words you want in the least amount of time, but not have the chunks in such isolation as to lose the natural delivery of the words as they would be in the final context (and as you can imagine, this approach also presents a nightmare for localization where you are dealing with different languages).

Although the system is functional, you can hear that in many cases it does not sound very natural. In a real match these comments would also vary in inflection depending on the circumstance; for example, a pass of the ball a little way down the field would be different from a pass that’s occurring just in front of the goal of the opposing team with only thirty seconds left of the match. It might be that you record two or three different variations to be chosen depending on the current game intensity level. The problem with more excited speech is that we tend to speak more quickly. Even in normal speech, our words are often not actually separate but flow into each other, and this is exacerbated by speed. You will recall that we discussed the importance of judging where to cut your sentences, looking out for opportunities where you can stitch seamlessly (for example on stopped consonants such as t, d, p, b, k, and g where the mouth is closed, therefore creating a silence). More seamless stitching can be achieved, but it requires a deep understanding of aspects of speech such as phonemes and prosody that are beyond the remit of this book.

Your system also needs to show an awareness of time and memory by suppressing certain elements if they repeat within certain timeframes. In natural speech we would not continually refer to the player by name but would, after the first expression, replace the name with the pronoun “he” or “she.” However if we came back to that player on another occasion later on, we would reintroduce the use of their name.
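
A very simple form of this memory could look like the following Python sketch. The class name, the 20-second window, and the default pronoun are all assumptions made for the example; a real system would also need to choose the right pronoun and the matching recorded clip.

```python
class NameTracker:
    """Remember who was last mentioned, and when."""

    def __init__(self, window_s=20.0):
        self.window_s = window_s
        self.last_player = None
        self.last_time = None

    def refer_to(self, player, now, pronoun="he"):
        """Return the pronoun if this player was named recently, else the name."""
        recent = (
            self.last_player == player
            and self.last_time is not None
            and now - self.last_time <= self.window_s
        )
        self.last_player = player
        self.last_time = now
        return pronoun if recent else player

tracker = NameTracker(window_s=20.0)
print(tracker.refer_to("Murch", now=0.0))    # first mention uses the name
print(tracker.refer_to("Murch", now=5.0))    # same player again, soon after
print(tracker.refer_to("Murch", now=100.0))  # much later: name comes back
```

The name is used again whenever a different player has been mentioned in between or the time window has passed.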

All these things would be significant challenges in themselves, but are heightened by our innate sensitivity to speech and language and the huge amount of time that many players spend playing these games. They may be played regularly over a series of months or years during which players are listening to the commentator dialogue all the time. Rarely are game audio systems exposed to such scrutiny as this.

Dialogue Queues

In sports games everything tends to be much more accelerated than it would be in reality. This makes repetition harder to avoid, but more importantly events happen faster than your speech system can keep up with. A list of things to talk about will build up, but by the time they get a chance to play they may no longer be relevant (e.g., play has continued and somebody else now has the ball). You need a system of priorities and expiry times: if a second event has overtaken the first, or if something hasn’t been said within a certain period, then it is no longer worth saying.

In the level Soccer01B we’ve implemented a queuing and interrupt system so that for each cue we can define whether we want to allow it to be interrupted and can queue up the color commentary. In order to implement our system, we’ve integrated all of the previously discussed play-by-play and color systems into one Sound Cue {Q_Soccer_All}.

fig0571

For each dialogue event, we set a series of parameters that define whether we want to allow this to interrupt any current dialogue, whether we should queue this line so that it will play back after the current dialogue, and if we do queue it, how long it should stay in the queue until dismissed because it is no longer relevant.

fig0572

fig0573

In this instance, only the color dialogue is queued, so all the other dialogue events go to this system.

fig0574

  • Do you want this line to be able to interrupt any currently playing line?
  • True—Go ahead and play it (and fade out any currently playing line).
  • False—Check if any line is actually currently playing (QSoccerAllPlaying Boolean), and if not then you won’t be interrupting anything, so go ahead and play.

The color commentary dialogue goes to this alternative system.

fig0575

  • Is a line currently playing (QSoccerAllPlaying)?
  • False—Go ahead and play the color commentary line.
  • True—Add the Q Colour Type variable (that holds what kind of color comment to make) to the array Queue.

If there is already an item in the array, then this will add the new item to the end and extend the array as necessary.

When any currently playing dialogue line is finished, an event goes into the second <Branch> node. If the array is empty (i.e., length = 0) then it does nothing. If its length is not 0, then after a short <Delay> it <Get>s the first item from the array, removes this first item from the array, and then sets the Q Con Type variable to control the -Switch- in the Sound Cue. After this has finished playing, an event will come back into the second <Branch> node to see if there are any more lines queued up in the array (there’s also a timer event that clears the array should the dialogue not get played within a given timeframe).
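
The two branches and the queue can be summarized in a short Python sketch. This is a simplified model rather than the Blueprint itself: the class and method names are hypothetical, the short <Delay> and the fade-out are left out, and where the level uses a timer that clears the whole array, this sketch instead gives each queued item its own expiry time.

```python
from collections import deque

class DialogueSystem:
    def __init__(self, expiry_s=5.0):
        self.playing = None       # stands in for QSoccerAllPlaying
        self.queue = deque()      # (color_type, time_queued) pairs
        self.expiry_s = expiry_s  # assumed per-item expiry time
        self.log = []             # lines that actually got played

    def _play(self, line):
        self.playing = line
        self.log.append(line)

    def play_by_play(self, line, can_interrupt):
        """Play now if allowed to interrupt, or if nothing is playing."""
        if can_interrupt or self.playing is None:
            self._play(line)      # a real system would fade out the old line
            return True
        return False

    def color(self, color_type, now):
        """Play a color comment, or queue it if a line is already playing."""
        if self.playing is None:
            self._play(color_type)
        else:
            self.queue.append((color_type, now))

    def on_line_finished(self, now):
        """When a line ends, play the next queued item that is still relevant."""
        self.playing = None
        while self.queue:
            color_type, queued_at = self.queue.popleft()
            if now - queued_at <= self.expiry_s:
                self._play(color_type)
                return

d = DialogueSystem(expiry_s=5.0)
d.play_by_play("Nice pass", can_interrupt=False)
d.color("Weather", now=0.0)    # queued behind the pass line
d.color("Injuries", now=4.0)   # queued as well
d.on_line_finished(now=6.0)    # "Weather" has expired, "Injuries" plays
print(d.log)
```

Giving each item its own expiry is one way of meeting the need for relevance described earlier; a priority value per line could be added in the same structure.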

Conclusion

Sports commentary in particular requires complex systems and is certainly an area where games need some more development in order to match the expectations of reality over extended play sessions. Like the branching dialogue system in Chapter 06, if you’re going to tackle this in some depth, then you’ll probably want to explore options relating to the use of spreadsheets and data structures to manage your assets. Nobody has yet got the feel of commentators interrupting each other or themselves quite right. Maybe you’ll be the one to come up with a better solution!

For further reading please see the up-to-date list of books and links on the book website.
