How Paul Boechler Created Live Stadium Crowds From FIFA Recordings

Because of the COVID-19 pandemic, the vast majority of sports stadiums around the world currently have no audience in attendance. To increase viewers’ engagement during match broadcasts, some international broadcasters asked Electronic Arts (EA) to help them simulate crowd sound. Paul Boechler, sound designer on EA’s FIFA franchise, was responsible for creating the dynamic crowd audio playback system currently used by broadcasters such as Sky Sports, BT and BBC. He was kind enough to share his experience with us and to tell us a little about the process behind the sounds used in the games.

How does the recording process work?

Through a mixture of attending and recording football matches ourselves, as well as leveraging our various partnerships, we’ve been able to build an incredibly large library of audio recordings from all over the world. We work really hard to continue to grow and expand this library, as we are always trying to improve our representation of different countries and leagues.

When we’re trying to authentically represent a crowd in our game, there are a lot of different layers to it. For instance, a reaction such as a goal cheer doesn’t sound the same in the UK as it does in Spain. So we have to make sure that these reactions (goals, misses, etc.) sound unique to each region, and then we need enough recordings to reasonably cover all the different reactions a crowd can have. Then, within each of these reactions, we need further variety, because a goal cheer doesn’t sound the same for a blow-out goal as it does for a consolation goal or a goal to clinch a cup.

Simply put, we need a lot of content. We can’t just show up to one game, on one day, record it, and assume “Okay, we’ve got one English Premier League match recorded and that will be all of the UK’s crowds covered”. It may have been raining that day, or it may have been a big win or a loss; any number of factors could be at play. One match recording doesn’t capture the full range of emotions of an entire sport. And trends change over time as well, so prioritizing these recording efforts has really helped us improve and maintain audio quality over the years.

So, for these types of recordings, if they are tied to a specific feature or if we’re doing something new and unique, that’s when I might attend a game to record it and make sure that I’m the one sitting by the recorder, just to give us more confidence and to be able to make adjustments on the fly. Then, obviously, we’ve got a big database where we keep track of all the metadata: who the home team was, the result, and other match details. Some games get rained out and some games have lower attendance for whatever reason. That way, if I’m looking for, say, a cup tie between two large English clubs that went to penalties, I can search for that and find audio content from that match for what I need. The great result of this effort is that everything is authentic, so we’re never leaning on generic crowd libraries or crowd recordings from other sports to make the sound of our game.
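As a rough illustration of that kind of metadata search, here is a minimal Python sketch. The field names, the `find_recordings` helper and the sample entries are all hypothetical; the interview does not describe how EA's actual database is structured.

```python
from dataclasses import dataclass


@dataclass
class MatchRecording:
    # All fields are illustrative assumptions, not EA's real schema.
    home: str
    away: str
    competition: str        # e.g. "cup" or "league"
    country: str
    went_to_penalties: bool
    attendance: int
    audio_path: str


def find_recordings(library, **criteria):
    """Return the recordings whose fields equal every given criterion."""
    return [
        rec for rec in library
        if all(getattr(rec, field) == value for field, value in criteria.items())
    ]


# Placeholder entries standing in for a much larger library.
library = [
    MatchRecording("Club A", "Club B", "cup", "England", True, 60000, "rec_001.wav"),
    MatchRecording("Club C", "Club D", "league", "Spain", False, 25000, "rec_002.wav"),
    MatchRecording("Club E", "Club F", "cup", "England", False, 18000, "rec_003.wav"),
]

# "A cup tie in England that went to penalties"
matches = find_recordings(
    library, competition="cup", country="England", went_to_penalties=True
)
```

A real library would likely need range queries as well (for example, a minimum attendance), which this exact-match sketch leaves out.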

Is there any special type of setup you use when you record and is it any different from the ones the broadcasters use?

No, not particularly. We just try to get good quality microphones and place them in locations we know will give us the best coverage. When ambisonic and surround mics started becoming widely available, there was a lot of interest in them because, obviously, if we could set up one microphone and get something that sounded like a full stadium it’d be much easier. But that is still only one microphone, in one location, within an entire stadium. So it always feels a bit narrower and doesn’t always yield the best results.

What tends to work best for us is recording with many microphones around the pitch and that’s what really helps to sell the size of the stadium. You start picking up the timing difference and delay between microphones, instead of just relying on the rejection axis of a single microphone array in one spot. If you have two microphones that are just far apart, you’re going to pick up that sound delay as a much more discrete and manageable quality of the recording, which definitely helps convey the size and improve the final quality.
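To put a number on that timing difference, here is a small Python sketch. It assumes a speed of sound of about 343 m/s (dry air at roughly 20 °C) and a hypothetical 100 m difference in path length between two microphones; neither figure comes from the interview.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for dry air near 20 °C


def arrival_delay_seconds(extra_distance_m):
    """Extra time sound needs to travel `extra_distance_m` further to a second mic."""
    return extra_distance_m / SPEED_OF_SOUND_M_PER_S


# A chant that travels 100 m further to reach the far microphone
# arrives there a little under 0.3 s later.
delay = arrival_delay_seconds(100.0)
```

A delay of that size is easily audible as a distinct echo, which is consistent with the point that widely spaced microphones convey the scale of a stadium.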

We might focus our mics on a particular supporter group or section, or we might focus just on our distance from the crowd. If you put your microphone close to the crowd, you’ll get lots of nice detail and presence, but it also sounds like a much smaller crowd. You might have two thousand people in front of your mic, but if you’re only 1 or 2 meters away, you’re really just picking up the front row, and the rest of the group almost becomes irrelevant background noise. However, if you get your mics further away, it can definitely start to sound like a bigger crowd, but you run the risk of not hearing the diction or pronunciation of chants or reactions, and recordings can lose some of the detail that helps make reactions feel distinct. Sometimes this type of softer, washed-out crowd sound is what we want in order to, for instance, sell the difference between a larger crowd and a smaller one. So I would say that’s what differs the most: where we focus a particular recording if we attend, and our distance to the supporters. As far as microphones or technical aspects go, it’s just good microphones, cable runs that don’t interfere with photographers and security, and placement that keeps the mics from getting hit by a wide shot.

When you go to a place to record, do you already have in mind what you want to record?

Yeah, if we’re showing up, we likely need a particular crowd size, or maybe a closer perspective, and we definitely have a plan before going to the stadium. We obviously have a lot of respect for the clubs, the leagues, the broadcasters, and the stadium staff that allow us that access to the pitch. So we don’t want to take unnecessary risks or abuse the privilege of that access by just aimlessly showing up and running around on the sidelines with a bunch of microphones. This helps us maintain a good relationship with our partners and ensure we can leverage that access again in the future.

How do you go from recording to organizing all the assets to editing?

Once we have a match recorded, we listen through it for everything that might be useful. There’s a lot of information that you can learn about a region or about a club, and we take everything that we can from a match recording. Obviously not all of that can get used in the game: it could be because of licensing restrictions or because of offensive content. There are a lot of reasons why something doesn’t make it into the game.

But in a nutshell, we edit everything out and it gets reviewed by a separate team. When that comes back, we know what we should work on - mainly so we’re not spending time on offensive material - and then we can give a qualitative assessment of “Is this good enough to go into our game?”. Then we’ll go in and mix and master the content to create surround assets that will go into our game. From there we can play the game, evaluate it at runtime, and then do further mixing in the game to balance everything together.

How do you go about making everything sound more or less cohesive?

Obviously stadiums all sound different, regions sound different, and matches are going to sound different from one day to the next. Crowd noise like this is such broad-spectrum noise that it’s almost white noise at times, but you can really pick up on the tonality and quality differences from one match to another. Even though your source is almost the same (big crowds), no two matches sound alike.

So the short answer here is we make everything feel cohesive with EQ. Sometimes it’s very minimal and subtle. Just a high pass filter and it’s amazing. Other times it’s ridiculous, aggressive, and I never want anyone to see my EQ settings.

Sometimes a microphone might sound very off axis from the supporters that are singing, so even though you’re trying to still make it sound smooth and balanced, it can mean some very aggressive EQ is needed. But when you combine all the various microphones it should work as a collection that ends up sounding full and still allows us to hear what’s being sung/said. I also try to avoid a lot of aggressive limiting and downward compression, because that’s going to do the opposite of what we want and it will basically just turn up the background din.
So we lean a lot on upward expansion, dynamic EQ, and just manual automation to help improve dynamics and help ensure the best moments stand out, and it’s not just layers of compressed noise.
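To show why upward expansion brings out peak moments without raising the background din, here is a simplified static gain curve in Python. The threshold and ratio are illustrative assumptions, and a real expander would also apply attack and release smoothing that this sketch leaves out.

```python
def upward_expand_db(level_db, threshold_db=-30.0, ratio=1.5):
    """Static upward-expansion curve, working on levels in dB.

    Levels at or below the threshold pass through unchanged, so the quiet
    background is not raised. Levels above the threshold are pushed further
    above it by `ratio`, so the loudest moments stand out more.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) * ratio


quiet_bed = upward_expand_db(-40.0)   # below threshold: stays at -40 dB
goal_cheer = upward_expand_db(-10.0)  # 20 dB over threshold becomes 30 dB over
```

Downward compression does the reverse above the threshold, shrinking the distance between the cheer and the bed, which is why makeup gain after it tends to lift the background noise.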

How do you usually go about implementing the sounds?

We use a proprietary tool, and then we have our internal Frostbite engine that runs our game. So it’s quite similar to how other people might use, say, Wwise or FMOD, and then build their content from there into an engine like Unity or Unreal Engine. So the method for getting it in isn’t that unique. I think the difference is just that our particular tool is quite good for mixing and is something that has been worked on over the years. It’s not just setting the mix and then leaving it, saying “Good! The game is mixed, it’s done”, because as soon as you add more content, or add new game modes, you need to revisit the mix.

Sometimes I think people believe the sound of a sports game or the sound of a crowd is like “You go record a crowd cheer, just put in the crowd cheer and crowds are done”, but there’s a lot more to creating a dynamic and compelling crowd atmosphere. It’s very much like the speech system in other games, where characters need to be able to interrupt themselves and respond to different variables, but rather than speech for one person we have speech samples for a group of 7k people, 15k people, 20k, 40k, etc. Loops aren’t a compelling way to build an atmosphere, because even if the loop is long, eventually something is going to catch the player’s ear, they’ll hear it’s a loop, and they’re not going to be able to unhear it: they’re always going to hear a bird chirp, or somebody talking, or whatever. So we try to avoid loops wherever possible.

Generally, we build up a lot of layers of discrete crowd elements and try to make those layers change with different game variables. There’s a lot of things that you can do to really help push the story that unfolds throughout a game and that’s something we definitely look at emphasizing through how we do our implementation and in the mixing process.

How did this partnership with real football leagues and broadcasters begin and how did you handle it?

They approached us in late April (2020), which was around the time when leagues were cleared to start in late May/early June, and that’s when I started to see some emails come in saying “Hey, could you do something? Could you maybe put something together?”. So our audio producer Andrew Vance and I started brainstorming. I think the reason the leagues and the broadcasters came to us was that they knew we had been interested in recording football crowds for years. We were already running an automated crowd system in the game and they knew we could create a quality crowd atmosphere. But I also think they were hoping that we just had some kind of magical process that would be easily automated.

I even started seeing some comments on Twitter from fans after football resumed with no crowd audio for some leagues. People were discussing this challenge like, “This is easy. Just track the position of the ball and tie the crowd sounds to that! It’ll be fine”. But how would we differentiate between an odd-man attacking rush and a simple back pass to a keeper? The position of the ball doesn’t tell you anything about the context of the game, it just tells you the position of the ball. So even if we wanted to do something basic, like have the crowd cheer when the ball gets kicked into the net, how do we identify a good goal versus an own goal? Versus a goal that was called back due to offside or VAR? Suddenly you need exception handling and maybe an interrupt system to say “Okay, I need to get the player ID, I need to check against the team ID”. It gets complicated very quickly, and that’s for an “easy” reaction. Forget more subjective and complex reactions like tackles or out of play or sarcastic cheers… There’s no way an automated system is going to pick up on all the subtleties that go into actually watching and reacting to a game. So the idea of some magical automated system that could analyze live game footage and react in real time with no errors was just not possible in the timeframe we had.
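A hedged Python sketch of the branching that even this “easy” goal reaction needs is shown below. The event fields and reaction labels are hypothetical and assume perfectly reliable match data is already available, which is exactly what a ball tracker alone would not provide.

```python
from dataclasses import dataclass


@dataclass
class GoalEvent:
    # Hypothetical fields; a ball-position feed contains none of this context.
    scorer_team: str     # team the scoring player belongs to
    credited_team: str   # team that is awarded the goal
    disallowed: bool     # called back for offside or by VAR


def home_crowd_reaction(event, home_team):
    """Pick a reaction label for the home supporters (labels are made up)."""
    if event.disallowed:
        # A cheer may already be playing, so something has to interrupt it.
        return "interrupt_cheer"
    own_goal = event.scorer_team != event.credited_team
    if event.credited_team == home_team:
        return "own_goal_cheer" if own_goal else "goal_cheer"
    return "groan"


reaction = home_crowd_reaction(GoalEvent("Home FC", "Home FC", False), "Home FC")
```

Even this covers only goals, and only from the home supporters' side; away sections, tackles and sarcastic cheers would each need their own rules and data.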

That’s what led us down the road of needing a person to be in control. Having a person in control was the only method that was going to be able to keep up with the speed of a game, reduce the risk of errors, and deliver an acceptable level of quality overall. Then we had to look at what a person could use to operate this playback in real time. At first, we thought we could do some kind of debug version of the game where they would have to hit different options. For instance, we have a debug system for testing where we can open a menu and say “Force yellow card”, but it’s from a drop-down menu (foul, then yellow card, then home team), and there’s no way someone is going to be able to do that in real time and keep up. The interface just wasn’t made for this kind of fast operation. So that’s when I started looking at Ableton Live.

Since it is an audio live performance tool (and very stable), we could take our audio assets out of the game and create a Live project for someone to operate. Live also enabled us to use a mixture of linear and nonlinear playback functions, so you can have something looping while also firing off dedicated one-shot events separately. Basically you can start to get to a point where the Ableton project starts to emulate manually what our game does automatically.

So first we’ve got one asset that’s looping. That’s just a kind of background noise that people probably won’t even notice much, but it helps glue everything together and fill in the space between assets. Then the chants for the home and the away team work on a random looping structure, rather than one long audio file that’s just hundreds of chants for one club. We’ve got all our specific chants on different clips in Ableton and the project can randomly pick which one to play: it waits for a chant to end, then picks another one, and so on. It’s doing that for the home and the away team while it’s still looping the background bed. In that way, this process frees up the operator to basically just watch gameplay and fire events for specific reactions with sounds that we’ve provided them.
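The random chant structure can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Ableton Live project is built; the clip names are placeholders, and the rule against repeating the previous chant is an added assumption.

```python
import random


def chant_sequence(chants, count, rng):
    """Pick `count` chants one after another, as if each waited for the last to end.

    Avoiding an immediate repeat is an assumption made for this sketch.
    """
    sequence = []
    previous = None
    for _ in range(count):
        # Fall back to the full list if there is only one chant to choose from.
        choices = [c for c in chants if c != previous] or list(chants)
        previous = rng.choice(choices)
        sequence.append(previous)
    return sequence


# One independent sequence per team, running alongside the looping bed.
home_chants = chant_sequence(["home_chant_a", "home_chant_b", "home_chant_c"], 5, random.Random(1))
away_chants = chant_sequence(["away_chant_a", "away_chant_b"], 5, random.Random(2))
```

Because selection happens per clip rather than inside one long file, the order never settles into a pattern a listener could learn.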

There is also a suite of one-shot reactions that cover various gameplay events. If there’s a good defensive tackle, the operator can use strong applause. If there’s a weak challenge, where maybe the player goes down and the audience thinks they’re embellishing, the operator can fire some jeers and foul appeal sounds. Basically, the operators have an instrument in front of them that, through some dynamic processing, emulates the mix that’s happening in the game. For instance, if the ball goes out of play because of a good defensive challenge, the operator will hit some applause, which will attenuate the chants, and those will attenuate the background loops. Through this dynamic structure, we make sure that we’re not just adding noise on top of noise, and that we maintain headroom.
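That attenuation chain (reactions duck chants, chants duck the background bed) can be sketched as a simple gain calculation in Python. The dB amounts and the on/off behaviour are illustrative assumptions; the interview does not say which processing or settings the actual project uses.

```python
def layer_gains_db(applause_active, chant_active, chant_duck_db=-6.0, bed_duck_db=-4.0):
    """Return a gain offset in dB for each layer of the crowd mix.

    One-shot reactions attenuate the chants, and chants attenuate the bed,
    so layers make room for each other instead of stacking up.
    """
    chant_gain = chant_duck_db if applause_active else 0.0
    bed_gain = bed_duck_db if chant_active else 0.0
    return {"applause": 0.0, "chants": chant_gain, "bed": bed_gain}


# The operator fires applause while a chant is already playing.
gains = layer_gains_db(applause_active=True, chant_active=True)
```

In practice the gain changes would be smoothed over time rather than switched instantly, but the priority order is the part that protects headroom.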

It’s been used now for hundreds of EPL and La Liga matches with no errors or dropouts. We’ve seen viewership numbers increase for matches with this crowd noise, and more people switch to feeds that use it. It’s obviously just a stopgap until, hopefully, supporters can return and we can go back to hearing real fans in the stadiums. But hopefully it makes the game feel a bit more natural to watch, and I think it highlights the importance of crowds and passionate supporters to the game of football. It’s definitely not the way we intended to showcase our content, but we’ve been really happy to provide something back to our league and broadcast partners to help get them through this difficult time.

Thank you for sharing your whole process with us, Paul! It’s incredibly interesting to hear what goes on behind the scenes of FIFA’s sound design, and how those same skills were put to work to bring more life to watching a game during the pandemic!

If you liked this interview, check out this interview we did with Dan Kenyon, sound designer and sound effects editor of Star Trek: Discovery, or this interview with our very own founder, Nuno Fonseca!
