How We Built a Cybersecurity Shield for Smart Grids Using Machine Learning and Encryption

By M Mithul Pranav · Published in Energy Reports (2026)

So our paper just dropped in Energy Reports (Elsevier), and I figured the best way to celebrate is to break it down in plain terms — what we actually built, why it matters, and what the numbers mean. No paywalls, no dense academic prose. Just the real stuff.

The Problem: Power Grids Are Getting Hacked

Modern power grids aren’t the dumb infrastructure of the 20th century anymore. They’re deeply connected cyber-physical systems — thousands of sensors, actuators, and Industrial IoT (IIoT) devices all talking to each other over networks using protocols like DNP3. A central SCADA system (Supervisory Control and Data Acquisition) orchestrates everything: reading sensor data, sending control commands, keeping the lights on.

That connectivity is also their biggest vulnerability.

Enter Man-in-the-Middle (MiTM) attacks — a class of cyberattack where an adversary silently positions themselves between two communicating parties, intercepting or modifying data without either side knowing. In a power grid context, this is terrifying. An attacker could:

False Command Injection (FCI): Send fake commands — like “open circuit breaker X” — causing blackouts or equipment damage.
False Data Injection (FDI): Feed the control system corrupted sensor readings, making it act on lies.
Combine both for multi-stage attacks that are much harder to detect.

Existing research tended to tackle these problems in isolation — either detection or encryption or localization. Nobody had really integrated all three into a single coherent framework for DNP3-based smart grids. That’s the gap we went after.

What We Built: A Three-Layer Defense Framework

Our framework has three interlocking components working in concert:

┌─────────────────────────────────────────────────────┐
│              Smart Grid Traffic                     │
└──────────────────────┬──────────────────────────────┘
                       │
          ┌────────────▼────────────┐
          │   1. ML-Based IDS       │  ← Detect & Classify Attacks
          │  (LightGBM / RF / etc.) │
          └────────────┬────────────┘
                       │
          ┌────────────▼────────────┐
          │   2. Dual-Layer         │  ← Encrypt SCADA Communications
          │   Encryption            │
          │  (AES-GCM + Salsa20)    │
          └────────────┬────────────┘
                       │
          ┌────────────▼────────────┐
          │   3. Attack Localization│  ← Trace Attack Origins
          │  (IP Frequency Analysis)│
          └─────────────────────────┘

Let me walk through each layer.

Layer 1: Machine Learning Intrusion Detection

The Dataset

We used the “Cyber-Physical Dataset for MiTM Attacks in Power Systems” (Sahu et al., 2021) — a real network capture dataset from a DNP3-based power grid simulation. It contains 5,077 samples across four attack scenarios:

Use Case	Attack Type	Samples	Share
UC1	False Command Injection (FCI)	1,102	21.7%
UC2	FCI + Analog Command Manipulation	1,477	29.1%
UC3	FDI + FCI (combined)	1,477	29.1%
UC4	Three-Stage Attack	1,484	29.2%

The features span both cyber dimensions (TCP flags, RTT, packet counts, Snort alerts, ARP/IP addresses) and physical dimensions (DNP3 payload values, breaker statuses, analog I/O counts). This cyber-physical fusion is what makes the dataset so realistic and challenging.

Two-Stage Classification

We implemented a two-stage approach:

Binary classification — Is this traffic attack or benign?
Multiclass classification — If it’s an attack, which type (UC1–UC4)?

This hierarchical approach is computationally elegant: you don’t need to run the expensive multiclass model on obviously clean traffic.
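The dispatch logic can be sketched in a few lines. This is only an illustration: `two_stage_predict`, `toy_detector`, and `toy_classifier` are hypothetical names, and the toy rules stand in for trained models.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def two_stage_predict(
    is_attack: Callable[[Features], bool],
    attack_type: Callable[[Features], str],
    sample: Features,
) -> str:
    """Run the binary detector first; call the multiclass model only on attacks."""
    if not is_attack(sample):
        return "benign"
    return attack_type(sample)


# Toy stand-ins (hypothetical rules, not trained models).
def toy_detector(sample: Features) -> bool:
    return sample[0] > 0.5  # e.g. a normalized alert score


def toy_classifier(sample: Features) -> str:
    return "UC1" if sample[1] < 0.5 else "UC3"


print(two_stage_predict(toy_detector, toy_classifier, [0.1, 0.9]))  # benign
print(two_stage_predict(toy_detector, toy_classifier, [0.9, 0.2]))  # UC1
```

In a real pipeline the two callables would wrap the fitted binary and multiclass models.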

The Models

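We benchmarked four models per task, drawn from the tree-based ensembles Random Forest, XGBoost, LightGBM, and CatBoost, plus an RBF-kernel SVM in the binary task: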

For anomaly detection (binary):

Model	Accuracy	F1-Score	MCC
Random Forest	99.80%	99.80%	0.9934
SVM (RBF)	99.15%	99.15%	0.9695
XGBoost	97.87%	97.82%	0.9285
LightGBM	98.23%	98.19%	0.9405

Random Forest won the binary task, most likely because averaging many decorrelated trees makes it robust to noise and outliers, which matters since real-world SCADA traffic is messy.
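A quick note on why the tables report MCC (Matthews Correlation Coefficient) alongside accuracy: MCC uses all four cells of the confusion matrix, so it stays informative when one class dominates. The sketch below uses made-up counts (not from the paper) to show accuracy looking healthy while MCC exposes the missed attacks.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Illustrative counts only: 1,000 samples, 100 of them attacks, 40 attacks missed.
tp, tn, fp, fn = 60, 890, 10, 40
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")  # 0.95
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")       # 0.69
```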

For multiclass attack classification:

Model	Accuracy	MCC
Random Forest	99.41%	0.9912
CatBoost	99.74%	0.9921
XGBoost	99.87%	0.9982
LightGBM	99.90%	0.9965

LightGBM took the top spot on accuracy, although XGBoost posted the higher MCC (0.9982 versus 0.9965). Why does LightGBM do so well? Its leaf-wise tree growth (rather than level-wise) builds deeper, more expressive trees efficiently. Combined with gradient-based feature weighting, it handles the subtle class boundaries between the UC1–UC4 attack types better than most of the competition.

Handling Class Imbalance with SMOTE

Real intrusion detection datasets are imbalanced — normal traffic vastly outnumbers attack traffic. A naive classifier just learns to say “normal” all the time and still gets 90%+ accuracy. That’s useless.

We used SMOTE (Synthetic Minority Over-sampling Technique) for Random Forest and our Stacking Classifier. SMOTE doesn’t just duplicate minority samples — it interpolates between existing minority points to create realistic synthetic examples, pushing the decision boundary to the right place.

LightGBM handles imbalance internally via gradient-based weighting, so no SMOTE needed there.
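To make the interpolation idea concrete, here is a minimal pure-Python sketch of the core SMOTE step. It is a simplification under my own assumptions (Euclidean distance, `k = 3`, a hypothetical `smote` helper), not the library implementation used in the paper. In practice oversampling should only touch the training split, so synthetic points never leak into the test set.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def smote(minority: List[Point], n_new: int, k: int = 3, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority_samples, n_new=3):
    print([round(v, 3) for v in point])
```

Every synthetic point lies on a line segment between two real minority samples, which is why the new examples stay plausible instead of being exact copies.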

Stacking Classifier for False Negative Reduction

This is arguably the most operationally important result. In security, false negatives are catastrophic — a missed attack is a successful attack.

We built a Stacking Classifier (multiple base learners feeding a meta-learner) combined with SMOTE. The comparison against plain Random Forest:

Model	Accuracy	False Negatives
Random Forest	99.48%	8
Stacking Classifier	99.88%	3

That’s a 62% reduction in missed attacks — 5 fewer attacks slipping through per test set. At grid scale, this is the difference between catching a breach and having it cause a blackout.
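For readers who want to try the idea, here is a small stacking sketch with scikit-learn. Everything specific in it is an assumption for illustration: the data is synthetic, the base learners (a Random Forest and a decision tree) and the logistic-regression meta-learner are my picks, and SMOTE is left out to keep the example dependency-free. It shows how to count false negatives, not how to reproduce our numbers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; label 1 plays the role of "attack".
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base learners feed their cross-validated predictions to a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, stack.predict(X_test), labels=[0, 1]).ravel()
print(f"false negatives (missed attacks): {fn} of {fn + tp} attacks")
```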

Layer 2: Dual-Layer Encryption

Detection alone isn’t enough. If SCADA communications aren’t encrypted, attackers can intercept and read (or modify) grid commands in transit even without triggering the IDS. So we built a dual-layer encryption system.

Why Two Ciphers?

They serve different purposes:

AES-GCM — Authenticated encryption. Provides both confidentiality AND integrity (via a 128-bit authentication tag). The gold standard for SCADA/IEC protocols. If anyone tampers with the ciphertext, the MAC tag fails and the message is rejected outright.
Salsa20 — A stream cipher optimized for speed. No block size padding, no lookup tables (resistant to timing/cache attacks), ideal for high-throughput real-time sensor data streams.

AES-GCM: How It Works

The flow is:

Plaintext (M)
     │
     ▼
Counter blocks from 12-byte Nonce → AES_K(Counter) → XOR with M → Ciphertext (C)
     │
     ▼
GHASH(Ciphertext + Associated Data) → Authentication Tag (T)
     │
     ▼
Base64(Nonce ‖ Tag ‖ Ciphertext) → Encrypted Output

On decryption, the tag is recomputed. If it doesn’t match — reject. This is the INT-CTXT (ciphertext integrity) property.
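The flow above can be sketched with the `cryptography` package's `AESGCM` class. The helper names `seal` and `open_`, the 256-bit key, and the `b"dnp3"` associated data are assumptions for illustration; only the 12-byte nonce, the 128-bit tag, and the Base64(Nonce ‖ Tag ‖ Ciphertext) layout come from the description above.

```python
import base64
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN, TAG_LEN = 12, 16  # bytes


def seal(key: bytes, plaintext: bytes, aad: bytes = b"") -> bytes:
    """Encrypt and return Base64(nonce || tag || ciphertext)."""
    nonce = os.urandom(NONCE_LEN)
    out = AESGCM(key).encrypt(nonce, plaintext, aad)  # library returns ciphertext || tag
    ciphertext, tag = out[:-TAG_LEN], out[-TAG_LEN:]
    return base64.b64encode(nonce + tag + ciphertext)


def open_(key: bytes, envelope: bytes, aad: bytes = b"") -> bytes:
    """Verify the tag and decrypt; raises InvalidTag if anything was modified."""
    raw = base64.b64decode(envelope)
    nonce = raw[:NONCE_LEN]
    tag = raw[NONCE_LEN:NONCE_LEN + TAG_LEN]
    ciphertext = raw[NONCE_LEN + TAG_LEN:]
    return AESGCM(key).decrypt(nonce, ciphertext + tag, aad)


key = AESGCM.generate_key(bit_length=256)
envelope = seal(key, b"OPEN BREAKER 7", aad=b"dnp3")
print(open_(key, envelope, aad=b"dnp3"))

# Flip one bit of the ciphertext: decryption must refuse it.
tampered = bytearray(base64.b64decode(envelope))
tampered[-1] ^= 0x01
try:
    open_(key, base64.b64encode(bytes(tampered)), aad=b"dnp3")
except InvalidTag:
    print("tampered message rejected")
```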

Salsa20: How It Works

Salsa20 initializes a 4×4 matrix of 32-bit words from four fixed constants, your key, the nonce, and a block counter. It runs 20 rounds, each made of four quarter-round transformations (modular addition, bit rotation, and XOR), then adds the original matrix back in to produce a 512-bit keystream block. XOR that with your plaintext: done.

Key (256-bit) + Nonce (64-bit) + Counter
        │
        ▼
    4×4 State Matrix
        │
   20 rounds of QuarterRound()
        │
        ▼
   Keystream Block (512 bits)
        │
        ▼
   Ciphertext = Plaintext XOR Keystream

Decryption is identical — XOR again with the same keystream. No modes, no padding, no complexity overhead.
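For the curious, here is a compact pure-Python sketch of the Salsa20 core as laid out in Bernstein's specification (256-bit key, 64-bit nonce, 64-bit block counter). It is a teaching aid, not the implementation we benchmarked: it is slow and not constant-time, so use a vetted library for anything real.

```python
import os
import struct

MASK = 0xFFFFFFFF
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"
COLUMNS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))


def rotl(x: int, n: int) -> int:
    return ((x << n) & MASK) | (x >> (32 - n))


def quarter_round(y0: int, y1: int, y2: int, y3: int):
    y1 ^= rotl((y0 + y3) & MASK, 7)
    y2 ^= rotl((y1 + y0) & MASK, 9)
    y3 ^= rotl((y2 + y1) & MASK, 13)
    y0 ^= rotl((y3 + y2) & MASK, 18)
    return y0, y1, y2, y3


def salsa20_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Produce one 64-byte (512-bit) keystream block."""
    k = struct.unpack("<8I", key)    # 256-bit key
    n = struct.unpack("<2I", nonce)  # 64-bit nonce
    ctr = (counter & MASK, (counter >> 32) & MASK)
    init = [SIGMA[0], k[0], k[1], k[2],
            k[3], SIGMA[1], n[0], n[1],
            ctr[0], ctr[1], SIGMA[2], k[4],
            k[5], k[6], k[7], SIGMA[3]]
    x = list(init)
    for _ in range(10):  # 10 double rounds (column round + row round) = 20 rounds
        for a, b, c, d in COLUMNS + ROWS:
            x[a], x[b], x[c], x[d] = quarter_round(x[a], x[b], x[c], x[d])
    # Feed-forward: add the initial state back in, word by word.
    return struct.pack("<16I", *[(xi + si) & MASK for xi, si in zip(x, init)])


def salsa20_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR data with the keystream."""
    out = bytearray()
    for offset in range(0, len(data), 64):
        keystream = salsa20_block(key, nonce, offset // 64)
        out.extend(b ^ s for b, s in zip(data[offset:offset + 64], keystream))
    return bytes(out)


key, nonce = os.urandom(32), os.urandom(8)
message = b"breaker 7 status: closed"
ciphertext = salsa20_xor(key, nonce, message)
assert salsa20_xor(key, nonce, ciphertext) == message
```

Note that Salsa20 on its own provides confidentiality only; flipping a ciphertext bit flips the matching plaintext bit, which is why integrity-critical messages go through AES-GCM.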

Cryptographic Security Testing

We didn’t just implement and hope for the best. We ran a battery of formal cryptographic tests:

AES-GCM Results:

Test	Result
Brute-Force (1M GPU attempts, CUDA kernel)	0 keys found. Exhaustive search takes ~3.68×10⁵⁷ years
Chi-Square randomness test	p-value = 0.976 (no evidence against uniformity; far above the usual 0.05 rejection threshold)
Bit-flipping attack	100% MAC check failure — every tampered ciphertext rejected

Salsa20 Results:

Test	Result
Brute-Force (1M GPU attempts)	0 keys found
Chi-Square test	p-value = 0.971
Key Collision (1000 key pairs)	Avg Hamming distance = 127.8 bits out of 256 — near-perfect diffusion
Nonce keystream divergence	>95% divergence between keystreams
Ciphertext entropy	>7.98 bits/byte — approaching theoretical maximum of 8

The entropy result deserves a highlight. Perfectly random data has 8 bits/byte of entropy, and our ciphertexts are at 7.98+. By this measure the ciphertext looks like random noise, with no byte-level patterns to exploit.
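Two of these metrics are easy to compute yourself. The sketch below shows byte-level Shannon entropy and bitwise Hamming distance; `os.urandom` output stands in for ciphertext and keys, so the printed values are illustrative rather than our measured results.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (maximum 8.0 for uniform bytes)."""
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


sample = os.urandom(1 << 16)  # stand-in for a ciphertext
print(f"{shannon_entropy(sample):.3f} bits/byte")
print(hamming_distance(os.urandom(32), os.urandom(32)), "of 256 bits differ")
```

For two unrelated 256-bit values the expected Hamming distance is 128 bits, which is the yardstick for the 127.8 average reported above.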

Formal Adversarial Indistinguishability

Beyond empirical tests, we formally proved security using the IND-CPA (Indistinguishability under Chosen Plaintext Attack) model. The idea: an adversary picks two messages M₀ and M₁, gets back the encryption of one, and tries to guess which. If their advantage over random guessing is negligible, the scheme is IND-CPA secure.

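For AES-GCM, this holds as long as nonces are unique per key. Our CSPRNG-based nonce generation does not strictly guarantee uniqueness, but with 96-bit random nonces a collision is overwhelmingly unlikely at realistic message volumes. Under the standard assumption that AES behaves as a pseudorandom permutation, the joint adversarial advantage over both confidentiality and integrity is bounded by a negligible function in the security parameter, which is what the formal proof establishes.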
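How safe are randomly generated 12-byte nonces? A birthday-bound estimate gives a feel for it. The function name and the message counts below are my own choices for illustration; the formula is the standard approximation 1 − exp(−q(q−1)/2^(b+1)) for q random b-bit values.

```python
import math


def nonce_collision_probability(messages: int, nonce_bits: int = 96) -> float:
    """Approximate chance that any two of `messages` random nonces collide."""
    pairs = messages * (messages - 1) / 2
    return -math.expm1(-pairs / 2 ** nonce_bits)


for q in (10**6, 2**32, 2**48):
    p = nonce_collision_probability(q)
    print(f"{q:.3e} messages under one key -> collision probability {p:.3e}")
```

The probability stays tiny up to a few billion messages per key and then climbs quickly, which is one reason key rotation matters in long-running deployments.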

Benchmarking: AES-GCM vs Salsa20 vs ChaCha20

For transparency, we also benchmarked against ChaCha20:

Cipher	Enc. Throughput	Dec. Throughput	Peak Memory
ChaCha20	22.06 MB/s	5.00 MB/s	1.28 MB
Salsa20	20.07 MB/s	4.99 MB/s	1.20 MB
AES-GCM	1.96 MB/s	1.65 MB/s	9.99 MB

ChaCha20 edges out Salsa20 slightly, but lacks SCADA protocol standardization and verified FPGA IP cores. AES-GCM is much slower but provides standardized authenticated encryption required for IEC/SCADA compliance. The dual-layer combination is deliberate: let AES-GCM handle integrity-critical control messages, Salsa20 handle high-throughput telemetry streams.

Layer 3: Attack Localization

Detection and encryption aren’t enough for forensics. When an attack happens, you also need to know where it is hitting: which nodes in the network are being targeted.

We implemented IP-based frequency analysis: filter traffic flagged by Snort alerts, validate routable IPv4 addresses via regex, rank destination IPs by attack frequency, and surface the top 10 most-targeted nodes.

The top 10 localized IPs matched perfectly with the intended targets defined in the UC1–UC4 attack scenarios — zero false localization cases. This gives security teams an actionable map of vulnerable nodes for network hardening.
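The pipeline is simple enough to sketch with the standard library. The record fields `snort_alert` and `dst_ip`, the `top_targets` helper, and the choice to drop `0.0.0.0` and `255.255.255.255` are assumptions for illustration; the real dataset's column names and filtering rules may differ.

```python
import re
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")
NON_ROUTABLE = {"0.0.0.0", "255.255.255.255"}  # assumed exclusions


def top_targets(records: Iterable[Mapping[str, object]], n: int = 10) -> List[Tuple[str, int]]:
    """Rank destination IPs of Snort-flagged records by how often they appear."""
    counts: Counter = Counter()
    for record in records:
        if not record.get("snort_alert"):
            continue  # keep only traffic flagged by Snort
        ip = str(record.get("dst_ip", ""))
        if IPV4.fullmatch(ip) and ip not in NON_ROUTABLE:
            counts[ip] += 1
    return counts.most_common(n)


records = [
    {"dst_ip": "172.16.0.2", "snort_alert": 1},
    {"dst_ip": "172.16.0.2", "snort_alert": 1},
    {"dst_ip": "172.16.0.5", "snort_alert": 1},
    {"dst_ip": "172.16.0.9", "snort_alert": 0},  # not flagged, ignored
    {"dst_ip": "999.1.1.1", "snort_alert": 1},   # malformed, ignored
]
print(top_targets(records))  # [('172.16.0.2', 2), ('172.16.0.5', 1)]
```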

Deploying on Edge Hardware

SCADA systems operate under tight latency constraints — DNP3 polls every 2–4 seconds. Any IDS running on the edge must complete inference well within that window.

We deployed on two edge platforms:

PYNQ-Z2 (Xilinx Zynq FPGA)

Random Forest: training 4.874s, inference 0.3997s ✅
LightGBM: incompatible with PYNQ Image v2.5 ❌

The 0.4s inference time sits comfortably within the 2–4s SCADA polling window. The PYNQ board works, but FPGA-level optimization (parallelism, pipelining) is still on the roadmap.
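Checking a model against the polling budget is easy to automate. This is a rough sketch: `inference_latency`, `fits_polling_window`, the 50% headroom factor, and the dummy `predict` function are all assumptions for illustration, and on real hardware you would time the deployed model on representative traffic.

```python
import time
from typing import Callable, Sequence


def inference_latency(
    predict: Callable[[Sequence[float]], object],
    sample: Sequence[float],
    repeats: int = 50,
) -> float:
    """Worst-case wall-clock time of a single prediction, in seconds."""
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        worst = max(worst, time.perf_counter() - start)
    return worst


def fits_polling_window(latency_s: float, window_s: float = 2.0, headroom: float = 0.5) -> bool:
    """True if latency uses at most `headroom` of the shortest polling interval."""
    return latency_s <= window_s * headroom


def dummy_predict(sample: Sequence[float]) -> int:
    return int(sum(sample) > 1.0)  # placeholder for a trained model


latency = inference_latency(dummy_predict, [0.2, 0.9, 0.4])
print(f"worst case: {latency * 1e3:.3f} ms, fits: {fits_polling_window(latency)}")
print(fits_polling_window(0.3997))  # the PYNQ Random Forest figure -> True
```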

Google Coral Dev Board (Edge TPU)

Model	Train	Test	Total
Random Forest	1.06s	0.24s	3.73s
LightGBM	1.90s	0.10s	3.98s

The Coral board gave us faster inference than PYNQ (0.24 s versus 0.40 s for Random Forest), and crucially, LightGBM ran on it. With inference times of 0.24 s and 0.10 s, both models fit well inside the 2–4 s SCADA polling window, and even the reported totals stay under 4 s. This shows the framework can be deployed at the edge today, without cloud round-trips.

Scalability: Validating on a Larger Dataset

To check that this wasn’t overfit to one dataset, we retested on the DIA (Data Integrity Attack) dataset, an IEEE 24-bus system dataset with normal, contingency, and multiple attack-type scenarios.

Binary classification on DIA:

Model	Accuracy
XGBoost	99.98%
LightGBM	99.97%
Random Forest	99.93%
SVM	97.81%

SVM struggles here because DIA has higher-dimensional, more complex feature interactions than MiTM — exactly the conditions where gradient boosting methods dominate.

Multiclass on DIA:

Model	Accuracy
XGBoost	99.93%
LightGBM	99.92%
CatBoost	99.91%

Performance held up across both datasets, which suggests the approach carries over beyond the MiTM data.

How This Compares to Prior Work

Here’s the honest landscape comparison:

Prior Work	IDS	Encryption	Attack Localization	Edge Deployment
Dagoumas (2019)	❌	❌	❌	❌
Diaba et al. (2023)	✅ (97.8%)	❌	❌	❌
Parvez et al. (2024)	✅	✅	❌	❌
Sen et al. (2021)	❌	❌	✅	❌
Ours	✅ (99.90%)	✅ (AES-GCM + Salsa20)	✅	✅ (PYNQ + Coral)

The novelty isn’t any single component; it’s the integration. None of the prior work we compared against combined multiclass IDS, dual-layer encryption, attack localization, and FPGA/TPU edge deployment into one framework for DNP3-based smart grids.

Limitations and What’s Next

Being honest about limitations is part of good science:

Protocol coverage: The IDS was validated only on DNP3 traffic. Testing on IEC 60870-5-104 and Modbus is needed for broader applicability.
Key management: Dynamic key management for AES-GCM and Salsa20 in real grid environments isn’t addressed yet — critical for production deployment.
Adversarial ML: We haven’t tested against data-poisoning attacks targeting the IDS models themselves.
FPGA optimization: A full FPGA-optimized implementation with hardware-level parallelism is still pending.

Planned future work: cloud-based distributed deployment (Apache Spark), TPU quantization for lower latency, and adversarial robustness testing.

TL;DR

We built a unified cybersecurity framework for smart grids that:

Detects MiTM attacks with 99.90% multiclass accuracy using LightGBM
Prevents data tampering via AES-GCM (authenticated) + Salsa20 (high-speed) dual-layer encryption, both tested against brute-force and statistical attacks, with AES-GCM rejecting every bit-flipped ciphertext
Localizes attacks to specific IP addresses for forensic response
Runs in real-time on edge hardware (PYNQ-Z2 FPGA, Google Coral Dev Board) within SCADA polling cycles
Reduces false negatives by 62% using a stacking classifier with SMOTE

The full paper is available at doi.org/10.1016/j.egyr.2025.12.035.

M Mithul Pranav is a researcher at Amrita School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Coimbatore. This work was co-authored with Rithan S., Rayappa David Amar Raj, Archana Pallakonda, Rama Muni Reddy Yanamala, and Krishna Prakasha K.