January 31, 2020

Training AI in a Virtual Environment

The Chinese strategy board game Go is one of the oldest board games played to this day. It’s one of the most complex ones too, with at least 2 x 10^170 legal board positions possible on its 19 x 19 grid. The game is well known in East Asia, where it’s played by millions of people, but it received mainstream coverage all over the world when a computer program called AlphaGo, developed by Google DeepMind, started beating some of the best Go players in the world.
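To get a feel for the size of that number, here is a rough back-of-the-envelope sketch. It assumes only that each of the 361 intersections on the 19 x 19 grid can be empty, black, or white, which gives a simple upper bound of 3^361 board configurations. Many of those configurations are not legal positions, so the true count is lower. The helper name `sci` is made up for this illustration.

```python
# Upper bound on Go board configurations: every intersection of the
# 19 x 19 grid is empty, black, or white. Not all of these are legal
# positions, so this overestimates the real figure.
BOARD_SIZE = 19
intersections = BOARD_SIZE * BOARD_SIZE   # 361
upper_bound = 3 ** intersections          # exact Python integer


def sci(n: int) -> str:
    """Format a large positive integer as truncated scientific notation."""
    digits = str(n)
    return f"{digits[0]}.{digits[1:3]}e+{len(digits) - 1}"


print(intersections)
print(sci(upper_bound))
```

The upper bound has 173 digits, so it sits a couple of orders of magnitude above the number of legal positions.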

Although artificial intelligence (AI) programs had been beating humans at chess for a number of years, Go is considered orders of magnitude more complex than chess, and many people long thought it impossible for AI to defeat the best Go players. AlphaGo defeating Lee Sedol in March 2016 was considered an AI milestone, so much so that the academic journal Science chose it as one of its Breakthrough of the Year runners-up for 2016.

Lee Sedol leaving the room as AlphaGo makes an unusual move

That’s not the end of it. The Google DeepMind team continued developing their algorithms, and AlphaGo’s successors were each significantly more powerful than the previous iteration. AlphaGo Zero beat AlphaGo with a 100–0 victory, and the latest version (AlphaZero) is considered the world’s best Go player.

Stepping Up the Complexity

However, even though Go is a complex game in terms of possible moves you can make, it’s still a structured game with a limited number of rules. It pales in comparison to the overall complexity of a game such as DotA 2, a multiplayer online battle arena where five players fight five other players in an attempt to destroy the other team’s base.

There are a few reasons why DotA 2 is considered a more complex game for AI to master. Firstly, it has long time horizons. A single move in Go or chess has a much higher individual impact than one in DotA 2, where a game can easily be 20,000 moves long and where most moves have little impact individually.

Secondly, players only ever have a partially observed state in DotA 2. Only a small area of the map around the individual heroes is visible, and you cannot see what’s happening outside of those little circles. This requires players to come up with strategies that are based on best guesses of what’s going on outside of their visual circles.
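Partial observability can be sketched in a few lines. The example below is a toy illustration, not how DotA 2 or any bot actually computes vision: the grid, the symbols (`H` for a hero, `E` for an enemy, `T` for a tree), the radius, and the function names `visibility_mask` and `observe` are all invented for this sketch.

```python
from typing import List, Tuple


def visibility_mask(width: int, height: int,
                    heroes: List[Tuple[int, int]],
                    radius: int) -> List[List[bool]]:
    """True for every cell within `radius` of at least one hero."""
    return [
        [any((x - hx) ** 2 + (y - hy) ** 2 <= radius ** 2 for hx, hy in heroes)
         for x in range(width)]
        for y in range(height)
    ]


def observe(world: List[str], mask: List[List[bool]],
            unknown: str = "?") -> List[str]:
    """Hide every cell the heroes cannot see behind the `unknown` symbol."""
    return [
        "".join(cell if seen else unknown for cell, seen in zip(row, row_mask))
        for row, row_mask in zip(world, mask)
    ]


# Toy 5 x 5 map: one friendly hero in the middle, enemies in two corners.
world = [
    "T..E.",
    ".....",
    "..H..",
    ".....",
    "E...T",
]
mask = visibility_mask(5, 5, heroes=[(2, 2)], radius=1)
observation = observe(world, mask)
for row in observation:
    print(row)
```

Both enemies end up hidden behind `?`, so whoever plays the hero has to guess where they are.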

Thirdly, there’s much more visual clutter on the map of DotA 2 than there is on the board of Go. There are trees, other heroes, towers, a big river, structures, paths, and so on. An AI playing DotA 2 has to process around 20,000 floating-point numbers to describe the game state, compared with around 70 values for chess and 400 for Go.

A lot of information to take in, both for humans and for AI

OpenAI, a non-profit AI research organization founded by Elon Musk and Sam Altman, has created a team of five OpenAI-curated bots to compete in 5v5 DotA 2 games. The OpenAI Five, as the bots are called, have already become good enough to beat amateur and semi-professional teams, although they’re not yet good enough to beat the best DotA 2 players in the world. However, it seems only a matter of time before AI conquers this game too.

Why the OpenAI Five make for such formidable opponents

How Does AI Become So Good?

Both AlphaGo and the OpenAI Five use artificial neural networks (ANNs) to become better at their respective games. An ANN isn’t an algorithm, but a framework for different machine learning algorithms to work together and process all the complex data inputs they continuously receive. Every ANN consists of artificial neurons, which are mathematical functions that receive an input to produce an output.
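As a minimal sketch of what such an artificial neuron can look like, the function below multiplies each input by a weight, adds a bias, and squashes the sum with a sigmoid. The sigmoid activation and all the numbers are assumptions chosen for illustration. The networks behind AlphaGo and the OpenAI Five are far larger and use their own designs.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """One artificial neuron: weighted sum of the inputs, then an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Three made-up inputs and weights, purely for illustration.
output = neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.5], bias=0.1)
print(output)
```

A full network chains many of these functions together, feeding the outputs of one layer of neurons into the next.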

These neurons are connected through so-called edges, which have a weight attached to them that can either increase or decrease as the AI trains itself. As you can probably tell if you’ve taken a few classes of biology, an ANN is modeled after the neural networks that make up animal brains.
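Here is a small sketch of a weight changing during training. It assumes plain gradient descent on a single weight with a squared-error loss, which is one common way to adjust weights. The article doesn’t say which update rule AlphaGo or the OpenAI Five use. The function name `train_weight` and the data are invented for this example.

```python
from typing import List, Tuple


def train_weight(samples: List[Tuple[float, float]], weight: float = 0.0,
                 learning_rate: float = 0.1, epochs: int = 50):
    """Fit prediction = weight * x to the samples with gradient descent."""
    history = [weight]
    for _ in range(epochs):
        for x, target in samples:
            error = weight * x - target
            # The gradient of 0.5 * error**2 with respect to weight is error * x.
            weight -= learning_rate * error * x
        history.append(weight)
    return weight, history


# Made-up data that follows target = 2 * x, so the ideal weight is 2.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
final_weight, history = train_weight(samples)
print(history[:3])
print(final_weight)
```

The weight starts at 0 and is nudged upwards after every sample until it settles close to 2.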

How our brain works and how AI works too, kind of

Of course, the ANN is only the framework in which the AI operates. There are three learning paradigms that ANNs can use to learn: supervised learning, unsupervised learning, and reinforcement learning. Both AlphaGo and the OpenAI Five learn through reinforcement learning, which encourages the AI to find the sequence of dependent decisions that leads to the highest reward.
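To make the idea of reinforcement learning a bit more concrete, here is a toy sketch using tabular Q-learning. This is a much simpler technique than anything AlphaGo or the OpenAI Five use: there is no neural network, only a small table of values. The corridor environment, the reward of 1 at the goal, and every parameter value are assumptions made up for this example.

```python
import random

N_STATES = 5      # corridor cells 0..4; reaching cell 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state: int, action: int):
    """Move one cell; reaching the last cell pays reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0):
    """Learn a table of action values by trial and error (Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties randomly
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Later rewards are discounted by gamma, so earlier decisions
            # are credited for the reward they eventually lead to.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The agent starts out knowing nothing and wanders at random. Once it stumbles on the reward, the value of the steps that led there is propagated backwards, and the learned policy ends up stepping right in every cell.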

That’s the highest-level explanation I can give on how these AIs work without diving into the technical details. They run on ANNs and learn using a technique that’s called reinforcement learning.

What About the Real World?

It’s one thing to train AI using simulations of computer games. It’s another thing entirely to train AI so it can operate in the real world. After all, for AI to function properly, it needs to be able to train on simulations that are as accurate as possible when compared to real life. For Nvidia’s CEO Jensen Huang, this means a training environment needs to have a few characteristics:

Firstly, it needs to have photorealistic graphics. A robot driving around the streets of London, receiving camera input from several angles, won’t get very far if its AI has only been trained on drawings of London.

Thirdly, it needs to be able to run simulations incredibly fast. Although we want those simulations to behave like the real world in terms of physics, we don’t want this when it comes to time. AI trained in an ANN through reinforcement learning is incredibly stupid when it first starts out, incapable of doing much at all. But the faster it can train, the faster it’ll become better.

The Unreal Engine can also be used to simulate life-like environments for training AI

Leading AI companies realize this and have already built environments that have these characteristics. The OpenAI Five trained on a crazy 180 years of gameplay per day, using 128,000 CPU cores on the Google Cloud platform. This isn’t just territory for well-funded companies either. The Nvidia RTX 2080 Ti, which ships at around $1,000, is built on the Turing architecture, which has tensor cores specifically built for fast AI inference. This gives enthusiasts or companies on a somewhat limited budget the ability to experiment with AI.
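To put the 180 years per day figure in perspective, here is some quick arithmetic. It assumes a 365-day year and, purely for illustration, that the workload is spread evenly over all 128,000 cores. The article doesn’t say how the cores were actually used.

```python
YEARS_PER_DAY = 180    # simulated years of play per real day (from the article)
CPU_CORES = 128_000    # cores used for training (from the article)
DAYS_PER_YEAR = 365    # assumption: ignore leap years

# How many times faster than real time the training runs overall.
speedup = YEARS_PER_DAY * DAYS_PER_YEAR

# Average speed per core, under the even-split assumption above.
per_core = speedup / CPU_CORES

print(speedup)
print(round(per_core, 2))
```

Under these assumptions, the speed-up comes from running a very large number of games in parallel, since each individual core averages roughly half of real-time speed.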

It seems we’ve now reached a stage where we’re progressing by leaps and bounds year over year when it comes to AI. Where it seemed impossible for AI to beat anyone at Go five years ago, AI is now good enough to almost beat the best DotA players in the world. That’s some stunning progress. Imagine where we’ll be in five years, let alone ten. Improvements in AI will soon create a drastically different world.

OneBonsai is a VR/AR provider that builds business solutions to improve health and safety, lower cost and increase revenue for companies.
