Be smart about automation: the future is AI-driven game testing with Keywords Studios
Mighty Build and Test CTO Ben Britten explores the use of AI-driven tools like BuildBot, TestBot, and ProofBot.
Presented by Keywords Studios
At Keywords Studios, we believe that AI is a creative collaborator and performance enhancer, with the potential to revolutionize the future of our industry.
We take pride in our three world-class AI products:
KantanAI - Our Localization platform, KantanAI, enables clients to localize text content with fast turnaround times (for example, translating 30 million words across 35 languages at a throughput of 3k projects per week for an industry leader) and opens games to players in new languages.
Helpshift - Our Player Support platform, Helpshift, uses intelligent bots integrated in mobile apps and consoles, so players do not wait hours or even days for resolution. We target automation of more than 30% of tickets in real-time while giving a highly personalized experience.
Mighty Games - Mighty Build and Test, our Game Development and Testing platform, works with teams to start testing from the check-in and is able to get close to 100% testing coverage. We verify builds, test gameplay and localize projects at scale with innovative AI tools, allowing developers to focus on what they do best: making great games.
Automated Testing
In this article I'll be focusing on the automated testing we do at Mighty Build and Test with our AI-driven tools, BuildBot, TestBot, and ProofBot.
When discussing automated testing, the first things people often think about are unit tests and code coverage. Unit tests are excellent for testing code, but you must create a test for every method and every edge case. Similarly, when doing gameplay testing, we make test plans with huge lists of regressions and checklists of what needs to work on every new build. These are basically the unit tests of game QA. Go to a place in the game, do a thing, make sure the result is what we expect. Check. This is the quantitative testing. We joke that this QA is Quantity Assurance.
However, this leaves a huge world of bugs. Does it look right? Does the new system you added play well with the existing systems? Is this game even fun? Why are the horses climbing up the ladders? This is the qualitative testing. This is actual Quality Assurance.
When creating games, what we need is the quantity. This allows us to move forward, confident that everything still works. What we want is the quality testing. Games are so complex now that all the checklists in the world can’t cover all the edges. We need people to play the games and find those unexpected edges.
Excellent QA teams find the right balance between the two. However, as games get bigger, checklists are getting longer, and even the best QA teams are finding it impossible to get enough quality time with games. We are seeing games that are so big, it is impossible to staff enough QA people. This is where automation (and AI) comes in. How do we leverage this fancy AI stuff and get human testers back to assuring the quality of the games?
It's common when designing automation systems for the problem to be approached from the wrong end. Teams will look at how people test and try to replicate that with code. A better approach is to look at what robots can do, and then say: how can we leverage this to equal what humans can do? So rather than trying to create thousands of brittle automated unit-tests-but-gameplay that break every time something changes, let us instead make a handful of bots that can just play the game and let them tell us if something is broken.
Our bots are created to play the game, continuously, forever. They are generally kinda dumb. They are good at getting to every nook and cranny of your UI system. They are good at dying in combat, but with some effort can be better than people at winning. They are excellent at finding places to get stuck in your terrain. But the core idea is to use a bevy of AI tools to get the bots to basically ‘know their way around’ the game. The bots will not be very good, nor will they necessarily make good decisions, but eventually they will try to make all the decisions. So, if you have bots that play your game continuously, automatically refreshing on every new build, you get something that more closely replicates end-users playing the game. The fancy word for this is stochastic testing. And the bots do lots of it. Say we have ten machines running four instances of the game on each; this gives us 960 hours (10 × 4 × 24), so nearly 1,000 hours, of automated testing for every 24 hours on the clock.
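To make the idea of stochastic testing concrete, here is a minimal sketch in Python. It is not how our bots are implemented; the screen names, the `UI_GRAPH` map, and the `run_bot` and `bot_hours_per_day` helpers are all hypothetical, invented purely for illustration. The bot simply picks a random exit from whatever screen it is on and reports if it lands somewhere with no way out.

```python
import random

# Hypothetical UI map for illustration: screen -> screens reachable from it.
UI_GRAPH = {
    "main_menu": ["settings", "shop", "level_select"],
    "settings": ["main_menu"],
    "shop": ["main_menu", "purchase_dialog"],
    "purchase_dialog": [],  # no exits: a bot that lands here is stuck
    "level_select": ["main_menu", "gameplay"],
    "gameplay": ["level_select"],
}


def run_bot(graph, start, steps, seed):
    """Random-walk the UI graph.

    Returns (visited_screens, stuck_screen); stuck_screen is None if the
    bot never reached a screen without exits within the step budget.
    """
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    current = start
    visited = {current}
    for _ in range(steps):
        exits = graph[current]
        if not exits:
            return visited, current
        current = rng.choice(exits)
        visited.add(current)
    return visited, None


def bot_hours_per_day(machines, instances_per_machine, hours=24):
    """Total hours of bot play accumulated per day of wall-clock time."""
    return machines * instances_per_machine * hours


if __name__ == "__main__":
    for seed in range(5):
        visited, stuck = run_bot(UI_GRAPH, "main_menu", steps=200, seed=seed)
        print(seed, sorted(visited), "stuck on:", stuck)
    print(bot_hours_per_day(10, 4), "bot-hours per day")
```

No single bot makes smart choices, but with enough of them running, every screen gets visited and the dead end gets found. Seeding each bot means a run that finds a problem can be replayed exactly.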
"But my checklists!"
Most teams are not quite ready to give up their checklists and trust the bots, and that’s fine. The great thing about a ‘get the bots playing first’ model is that once the bots know how to navigate your game mechanics and have a decent map of the UI structure, then creating checklist tests becomes much simpler. Tests can be much more robust and resilient to code changes.
You will be happy to know that getting some code to run around your game is probably the easiest part! I mean, it was the first thing we built eight years ago. It didn’t take long, and we've been developing and expanding it to every conceivable game genre ever since. Basically, making an automated test is pretty easy, but making something to automatically run the games, and then automatically run those tests, and then automatically report back is much more complicated. The hardest thing about automation is the rest of the pipeline.
You need a way for people to define bot behaviors and create those non-stochastic validations. You need a system that can take a build fresh off the build machine and put it onto a test machine and start the game. If the games need to run on a console or a mobile device, then you need to be able to put the game onto the device and launch it there. You need a way to capture screenshots and video of gameplay automatically when something goes wrong. You need to be able to capture performance data and report it back to the dev team. You need a way to know ‘what is a bug?’ and then report those bugs back to the QA team. You need to know ‘is this bug the same as that bug?’ because you are going to find so many of the same bugs over and over again.
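The ‘is this bug the same as that bug?’ question can be illustrated with a small Python sketch. This is one common approach, assumed here for illustration rather than a description of our pipeline: strip the parts of an error that vary from run to run, then hash what is left into a signature. The report fields (`message`, `stack`, `build`) and the function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def bug_signature(message, stack_frames, top_frames=3):
    """Build a stable signature from an error message and its top stack frames.

    Numbers and hex addresses are replaced with '#', so the same crash with
    a different entity ID or memory address maps to the same signature.
    """
    normalized = re.sub(r"0x[0-9a-f]+|\d+", "#", message.lower())
    key = normalized + "|" + "|".join(stack_frames[:top_frames])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def group_reports(reports):
    """Group raw bot reports by signature: {signature: [builds seen in]}."""
    groups = defaultdict(list)
    for report in reports:
        sig = bug_signature(report["message"], report["stack"])
        groups[sig].append(report["build"])
    return dict(groups)


if __name__ == "__main__":
    reports = [
        {"message": "NullReference on entity 4412 at 0x7FFA12",
         "stack": ["Inventory.Equip", "Player.Update"], "build": "1.0.17"},
        {"message": "NullReference on entity 97 at 0x1B00",
         "stack": ["Inventory.Equip", "Player.Update"], "build": "1.0.18"},
        {"message": "Texture missing: horse_ladder",
         "stack": ["Renderer.Draw"], "build": "1.0.18"},
    ]
    for sig, builds in group_reports(reports).items():
        print(sig, builds)
```

Three raw reports collapse into two bugs, with the first one tracked across two builds. A real system would need fuzzier matching than this, but the principle of normalizing first and comparing second stays the same.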
We mostly talk about the automated testing in terms of FQA, and making FQA lives easier. However, once you get bots that can play the games thoroughly, you can start to do things like screen capture every single string in the game, in every language, and match those strings with the localization string IDs and give those to the LQA testers. You can start to look at what an LQA bug is and start to automatically detect things like: is the string in the wrong language, does it get cut off, etc. All these things make LQA lives easier.
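As a rough illustration of automated LQA checks, the Python sketch below compares the text a bot captured on screen against the expected translation for the same string ID. It assumes the captured text is already available as a string (for example from OCR or an engine hook), and it uses a deliberately crude stand-in for language detection: if a non-English build shows the English source text, the string is flagged as untranslated. The function name and issue labels are hypothetical.

```python
def check_string(expected, captured, source_text, locale):
    """Compare on-screen text with the expected translation.

    expected:    translation from the localization table for this string ID
    captured:    text actually read off the screen
    source_text: original (English) string for the same ID
    locale:      locale code of the build under test, e.g. "de"
    Returns a list of issue labels; an empty list means the string looks fine.
    """
    if captured == expected:
        return []
    # Drop a trailing ellipsis or dots the UI may add when it clips text.
    shown = captured.rstrip(".… ")
    if shown and expected.startswith(shown):
        return ["truncated"]
    if captured == source_text and not locale.startswith("en"):
        return ["untranslated"]
    return ["mismatch"]


if __name__ == "__main__":
    print(check_string("Einstellungen", "Einstellungen", "Settings", "de"))
    print(check_string("Einstellungen", "Einstell…", "Settings", "de"))
    print(check_string("Einstellungen", "Settings", "Settings", "de"))
```

Each flagged string can then be handed to an LQA tester together with its string ID and screenshot, so people review likely problems instead of hunting for them.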
The question of ‘what is a bug and how can we detect them automatically’ is huge and I could probably write a book on it.
Many of the clients we work with, both big and small, have tried and in some cases succeeded in building their own in-house automation. This is somewhat akin to the olden days when we all had our own game engines. There can still be value there, but using an off-the-shelf engine means your team can focus on making a great game instead of spending all their time making a great engine. Now many studios do just that; they leave the engine building to the engine companies and focus on making great games.
In a similar vein, I would argue that games are already too big to be developed and properly tested without automation, and teams that do not have robust automation solutions in place are going to find themselves very much behind the curve. Building your own pipeline may have value, but it is going to be expensive and time-consuming for your team. These are solved problems. Your people should be making great games; you can get the automation from us.
Imagine More for your Game
At Keywords Studios, our global experts consistently explore and evaluate emerging AI applications to enhance workflows for both our clients and our team. Supported by a dedicated group of over 150 skilled AI engineers who specialize in integrating Large Language Model (LLM) use cases, we harness the power of AI to shape the future of video games.