Researcher Oluwapelumi Bankole explains why AI intrusion detection is systematically blind to the most dangerous threats, and what the security industry needs to do about it.

If you ask a cybersecurity vendor how accurate their AI-powered intrusion detection system is, you will almost certainly hear a number in the high 90s. What you will not hear is how often it misses the attacks that actually matter.
That gap between marketed accuracy and real-world performance is the central concern of Oluwapelumi Bankole, a cybersecurity researcher at the University of Nevada, Las Vegas. Bankole has spent the past several years studying a fundamental structural flaw in how AI-based security systems are built and trained, a flaw that makes them reliably good at catching common, low-stakes threats while systematically failing to catch the rare, high-impact attacks that cause the most damage.
The problem has a technical name, class imbalance, but its consequences are not technical abstractions. They are ransomware shutdowns at hospitals, silent data exfiltration campaigns that run for months, and industrial control system attacks that could affect power grids and water systems.
I spoke with Bankole at length about what class imbalance is, why it persists, and what an honest reckoning with the problem would require.
On the fundamental flaw in how AI security systems learn
Q: Start at the beginning. What is the class imbalance problem, and why does it matter for cybersecurity?
Bankole: Every intrusion detection system is essentially a classification engine. It looks at network traffic and decides: is this normal, or is this an attack? On a real network, the vast majority of what it sees is completely normal. Attacks are rare events. They might be one tenth of a percent of total traffic, sometimes even less. The problem is that most AI models are trained on datasets where attacks are far more common than that, sometimes 30 or 40 percent of the training data. The model learns a world where attacks are frequent. When it gets deployed on a real network, it encounters a very different world, and it is not prepared for that difference.
Q: What does that lack of preparation actually look like in practice?
Bankole: It looks like a model that defaults to classifying almost everything as normal, because that strategy scores very high on the accuracy metric it was optimized for. If 99 percent of traffic is normal, a model that calls everything normal is 99 percent accurate. But it has caught zero attacks. The metric looks great. The real-world security outcome is a complete failure.
“A model that calls everything normal is 99 percent accurate. But it has caught zero attacks. The metric looks great. The real-world security outcome is a complete failure.”
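The arithmetic behind that failure mode is easy to demonstrate. The sketch below uses made-up label counts, not real traffic or any of Bankole's data; the point is only that the headline accuracy number and the attack-catch rate can diverge completely:

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, attack_recall) for binary labels: 0 = normal, 1 = attack."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    attack_idx = [i for i, t in enumerate(y_true) if t == 1]
    caught = sum(y_pred[i] == 1 for i in attack_idx)
    attack_recall = caught / len(attack_idx) if attack_idx else 0.0
    return accuracy, attack_recall

# A synthetic traffic stream: 990 normal flows, 10 attacks (about 1 percent).
y_true = [0] * 990 + [1] * 10

# The degenerate "classifier" that labels everything normal.
y_pred = [0] * 1000

acc, rec = evaluate(y_true, y_pred)
print(f"accuracy={acc:.2f}, attack recall={rec:.2f}")
# Accuracy comes out at 0.99 even though not a single attack was caught.
```

Any model optimized purely for accuracy on this distribution is pulled toward exactly this degenerate behavior, which is why accuracy alone is the wrong target metric.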
Q: And this is not a niche problem. It’s widespread in how the industry builds these systems?
Bankole: It is the standard approach. The benchmark datasets that researchers and developers use to train and test models are preprocessed to be relatively balanced. That makes the math cleaner and the published results more impressive. But it disconnects the model from the conditions it will actually face in deployment. Most of the security products on the market today were trained and validated on these kinds of balanced benchmarks, and most of them have never been rigorously tested on traffic distributions that resemble a real hospital network or a real industrial control system.
On the solution the industry keeps getting wrong
Q: There are technical solutions to this problem. Why haven’t they fixed it?
Bankole: The most widely used solution is oversampling, specifically a technique called SMOTE, Synthetic Minority Oversampling Technique. The idea is straightforward: if you do not have enough examples of rare attack types in your training data, generate synthetic ones. Create artificial data points that represent what those attacks look like, and give the model enough examples that it learns to take them seriously. When implemented thoughtfully, this meaningfully improves performance on rare attack detection. The problem is that it is often implemented carelessly.
Q: What does careless implementation look like?
Bankole: It looks like generating synthetic examples that do not accurately reflect how real attacks actually behave. It looks like applying SMOTE once and calling the problem solved, without testing whether the resulting model generalizes to actual deployment conditions. The technique can introduce its own distortions. If your synthetic minority-class examples are not realistic, you have just taught the model to recognize fictional attacks while potentially making it worse at recognizing real ones. The solution is not simply ‘apply SMOTE.’ The solution is adaptive oversampling combined with rigorous testing against realistic traffic distributions. Those are meaningfully different things.
“The solution is not simply ‘apply SMOTE.’ The solution is adaptive oversampling combined with rigorous testing against realistic traffic distributions.”
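The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration, not the reference algorithm: production implementations (for example, imbalanced-learn's SMOTE) select among k nearest neighbors with tuned parameters, and the function name and inputs here are invented for the example:

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points, each interpolated between a
    randomly chosen minority sample and one of its k nearest minority
    neighbors (squared Euclidean distance)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, nb)))
    return synthetic

# Hypothetical feature vectors for a rare attack class.
rare_attacks = [(0.10, 0.20), (0.15, 0.25), (0.30, 0.10)]
print(smote_like(rare_attacks, n_new=4))
```

The caveat Bankole raises is visible in the code: every synthetic point lies on a line segment between two observed samples, so if those samples do not faithfully represent real attack behavior, the interpolated examples inherit and amplify that unrealism.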
On what the healthcare and infrastructure stakes actually are
Q: You’ve focused specifically on healthcare and critical infrastructure. Why those sectors?
Bankole: Because the consequences of a missed detection are categorically different there than in other environments. In a typical enterprise, a missed attack usually means a data breach and a very bad quarter. In a hospital, a missed attack on an IoT-connected infusion pump or a patient monitoring system can have direct patient safety implications. In a power grid or a water treatment facility, a successful attack on industrial control systems is not a business continuity problem. It is a public safety emergency. The rarest attacks, the ones that class imbalance causes systems to miss, are often precisely the kind of targeted, sophisticated intrusions that go after critical infrastructure. The mismatch between what these systems are tested for and what they need to catch is most dangerous exactly where the stakes are highest.
Q: Is this being taken seriously at the policy level?
Bankole: The awareness is there. CISA and NIST have both identified AI-based intrusion detection for IoT as a priority area. The gap is between the frameworks and the procurement standards. Right now, an organization can buy a security system, the vendor can present benchmark accuracy numbers, and there is no standard requirement to demonstrate performance on rare attack categories under realistic traffic conditions. Until procurement standards require that evidence, vendors have little incentive to build for it. The market is rewarding impressive benchmark numbers, not real-world performance on the threat categories that matter most.
On what honest evaluation would require
Q: If you were advising an organization evaluating intrusion detection systems today, what would you tell them?
Bankole: Do not ask for overall accuracy. Ask for recall on rare attack categories specifically. Ask for false negative rates on advanced persistent threats and targeted intrusion attempts. Ask how the system performs on traffic distributions that reflect your actual environment, not a balanced benchmark dataset. Ask what happens to performance six months after deployment, when your traffic patterns have changed and the model has not been updated. These are not complicated questions. But most vendors are not prepared to answer them honestly, because most products have not been tested against them. The questions alone will tell you a great deal about whether the system was built for the real world.
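The per-class questions Bankole recommends map directly onto simple metrics. A minimal sketch, using hypothetical labels rather than any vendor's data, shows how per-class recall and false negative rate expose what overall accuracy hides:

```python
def per_class_report(y_true, y_pred):
    """Recall, false negative rate, and support for each true class label."""
    report = {}
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recall = sum(y_pred[i] == cls for i in idx) / len(idx)
        report[cls] = {"recall": recall, "fnr": 1 - recall, "support": len(idx)}
    return report

# Hypothetical evaluation set: the rare "apt" class is the one that matters.
y_true = ["normal"] * 95 + ["dos"] * 4 + ["apt"] * 1
y_pred = ["normal"] * 95 + ["dos"] * 4 + ["normal"] * 1

overall_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"overall accuracy: {overall_acc:.2f}")   # looks excellent
for cls, m in sorted(per_class_report(y_true, y_pred).items()):
    print(f"{cls}: recall={m['recall']:.2f}, fnr={m['fnr']:.2f}, n={m['support']}")
# The "apt" row shows recall 0.00 and FNR 1.00 despite 99 percent accuracy.
```

Asking a vendor to produce exactly this kind of breakdown, on a traffic distribution resembling the buyer's own environment, is the concrete form of the evaluation Bankole is describing.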
Bankole’s research adds technical weight to a concern that has circulated in security operations circles for years: that the AI security tools being sold to protect critical infrastructure have not been honestly evaluated for the conditions those tools will actually face.
Whether the industry moves to address that gap, or continues optimizing for benchmark numbers that look good in a sales presentation, remains to be seen.
———————————————————————————————————————
Oluwapelumi Bankole is a researcher in information systems and cybersecurity at the University of Nevada, Las Vegas. His research focuses on AI-based intrusion detection for IoT and cloud environments, with particular emphasis on class imbalance, adaptive oversampling techniques, and multi-dimensional performance evaluation for real-world deployment.