When we stride down the aisles of our local grocery store, the shelves are full of products vying for our attention. To make their way into our shopping carts, some tout their superior performance on the packaging, and a few even try to back up their claims with a magical ingredient. Yet when the rubber meets the road, few of us expect a laundry detergent empowered by such a compound to truly remove every trace of stains from holiday cooking.
While the stakes may be high if our favorite pair of trousers is involved, they are surely higher when picking a security solution. In cybersecurity, most offerings tout some level of AI. Sometimes it’s qualified further, such as an especially “deep” AI or a comfortingly “autonomous” one (which may or may not take care of your laundry, too).
What all the extra adjectives try to cover up is that AI is now the bread and butter of the security industry. And as with bread, we have a pretty good idea of what’s in it: at this point, I would expect the vast majority of security solutions to have adopted AI. Why wouldn’t they? It is easy to get started, and one can get to results quickly. But as with bread, both the quality of the ingredients and the recipe determine the outcome.
Why is there so much interest in doing more with AI in the security industry? We are all still jaded from the signature days. Back then, signatures got deployed, signatures started to miss new threats, humans wrote new signatures, and the cycle restarted the next day. This is obviously a losing proposition: not only is the approach purely reactive, but its speed is also limited by human response time. This is precisely where AI holds promise, because it can detect threats that have not even been conceived yet, without requiring updates.
What does it take to train an AI model that can perform such a feat reliably? First and foremost, it takes data, and a lot of it. Cloud-based solutions have a clear advantage here, because their broad visibility of the threat landscape allows them to correlate global observations across organizations and networks. Processing such large amounts of data takes a lot more technology than just AI, but once the data has been correlated, AI is a powerful tool to make sense of it. With AI, we can process more data at scale, and we can spot more complex relationships than a human mind can uncover.
More data allows us to spot fainter signals. Let’s say you start plotting the latitude and longitude of European cities onto graph paper. Initially, you will see some randomly scattered points. But as you plot a larger number of cities, the familiar shape of Europe slowly emerges out of the cloud of points. This simply won’t work if everyone has a “local” piece of graph paper on which to plot a handful of cities in their own area. With a global view, however, the combination of cloud and AI really shines. None of this is possible on an appliance. Nor is it possible with hybrid cloud solutions, i.e., those “clouds” that are merely stacks of vendor-managed, rack-mounted appliances.
Not all data is created equal. There is another type of data, one to which humans can contribute. We call it ground truth, and it has a large impact on the training of AI models. Ground truth describes how we want an AI model to behave given certain inputs. When certain types of AI learn, they use ground truth as examples and learn to interpret other data based on these roots of knowledge. This way of learning is called supervised learning.
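To make the idea concrete, here is a minimal sketch of supervised learning in Python. It uses a nearest-centroid classifier, one of the simplest supervised techniques, and is not meant to represent how any particular product’s models work. The labeled examples in `GROUND_TRUTH` and their two numeric features are entirely hypothetical, standing in for whatever properties a real system might extract from observed activity.

```python
from math import dist

# Hypothetical ground truth: (feature_vector, label) pairs.
# The two features are illustrative only, e.g. a normalized count of
# suspicious API calls and a normalized network-activity score.
GROUND_TRUTH = [
    ((0.9, 0.8), "malicious"),
    ((0.8, 0.9), "malicious"),
    ((0.7, 0.7), "malicious"),
    ((0.1, 0.2), "benign"),
    ((0.2, 0.1), "benign"),
    ((0.3, 0.2), "benign"),
]


def train_nearest_centroid(examples):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    model = train_nearest_centroid(GROUND_TRUTH)
    # Observations the model has never seen before.
    print(predict(model, (0.85, 0.75)))  # malicious
    print(predict(model, (0.15, 0.25)))  # benign
```

The model never sees a rule saying what “malicious” means; it generalizes from the labeled examples alone, which is why the quality and coverage of the ground truth matter so much.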
Supervised learning is a powerful way to create highly accurate classification systems, i.e., systems that have high true positive rates (detecting threats reliably) and low false positive rates (rarely raising alarms on benign behavior). Not all learning needs to be conducted using ground truth (unsupervised learning concerns itself with those other approaches). But as soon as it’s time to evaluate whether such an AI system works as intended, you will need ground truth, too.
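Evaluating a classifier against ground truth largely comes down to counting. The sketch below computes the true positive rate, TP / (TP + FN), and the false positive rate, FP / (FP + TN). The function name `detection_rates` and the evaluation data are hypothetical, chosen only for illustration.

```python
def detection_rates(labels, predictions):
    """Return (true positive rate, false positive rate).

    `labels` is the ground truth and `predictions` the classifier output,
    given as parallel lists of booleans where True means "malicious".
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fn = fp = tn = 0
    for actual, predicted in zip(labels, predictions):
        if actual and predicted:
            tp += 1  # threat correctly detected
        elif actual:
            fn += 1  # threat missed
        elif predicted:
            fp += 1  # false alarm on benign behavior
        else:
            tn += 1  # benign behavior correctly left alone
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one malicious and one benign example")
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical evaluation set: 4 malicious and 6 benign samples.
    labels = [True, True, True, True, False, False, False, False, False, False]
    predictions = [True, True, True, False, False, True, False, False, False, False]
    tpr, fpr = detection_rates(labels, predictions)
    print(f"TPR={tpr:.2f} FPR={fpr:.2f}")  # TPR=0.75 FPR=0.17
```

Without the ground-truth `labels`, neither rate can be computed, no matter how the model itself was trained.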
Well-designed cybersecurity systems strive to maximize the generation of ground truth. Take, for example, a managed threat hunting service such as Falcon Overwatch. Whenever a threat hunter discovers an adversary on a network, those findings become new ground truth. Similarly, when the threat hunting experts judge suspicious activity to be benign, that verdict is also added to the pool of ground truth. Note that all of this happens independently of the AI models that stop threats in real time. The new data points can then be used to train or evaluate AI systems. Generating this kind of data at scale every day, with the cloud as the vantage point, allows for training better models. In other words, the AI system gets better every day.
With so much opportunity, and new technology allowing us to process more and more data, when will AI completely handle the security of our computing systems for us? Not for a while. Artificial intelligence is not, in fact, intelligent. A conversation with your smart speaker will reassure you of that. AI is a set of algorithms and techniques that often produce useful results, but sometimes they fail in odd and unintuitive ways. AI even has its own distinct attack surface that adversaries can exploit if it is left unprotected. Looking at it from another angle: if AI is such a powerful tool, could one AI outsmart another? The field of adversarial machine learning concerns itself with that question, and the short answer is “yes.” Ignoring these inherent risks and limitations by treating AI as a panacea for all the woes of our industry is dangerous.
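As a toy illustration of why a model’s attack surface matters, the sketch below attacks a hypothetical linear detector. The weights, bias, and feature values are made up. The perturbation is a fast gradient sign method (FGSM) style step, which for a linear model reduces to nudging each feature against the sign of its weight. Real attacks are constrained by which features an adversary can actually change, so treat this as an illustration of the principle rather than of a practical evasion.

```python
# Hypothetical linear detector: score(x) = dot(w, x) + b; score > 0 means "malicious".
WEIGHTS = [2.0, 1.5, -0.5]
BIAS = -1.0


def score(x):
    """Linear decision score of feature vector x."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def is_malicious(x):
    return score(x) > 0


def sign(value):
    """Return 1, -1, or 0 depending on the sign of value."""
    return (value > 0) - (value < 0)


def evade(x, epsilon):
    """FGSM-style step that lowers the detector's score.

    The gradient of a linear score with respect to x is the weight vector,
    so moving each feature by epsilon against sign(w_i) lowers the score
    by epsilon * sum(|w_i|).
    """
    return [xi - epsilon * sign(w) for w, xi in zip(WEIGHTS, x)]


if __name__ == "__main__":
    sample = [0.5, 0.4, 0.2]  # hypothetical feature values, score = 0.5
    print(is_malicious(sample))  # True
    adversarial = evade(sample, 0.2)  # each feature moves by only 0.2
    print(is_malicious(adversarial))  # False (score is about -0.3)
```

A small, targeted change to every feature is enough to flip the verdict, which is exactly the kind of weakness that has to be defended against when AI is deployed against motivated adversaries.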
AI is, however, an essential part of every modern cybersecurity solution. There is no other practical way to deal with the volumes of information now required to stop modern threats. Those threats are driven by motivated adversaries with strong financial incentives, who will not cease their attempts to evade detection. But the mere use of AI is not what makes a security solution superior. What matters most is what drives the AI: the breadth and volume of the data it consumes, the ground truth it can leverage, and its human teachers.