
DeFi Automation & Keeper Rails

Keeper rails let you automate DeFi strategies while minimizing custody risk. A historical look at why automation breaks—and how to fix it.


Hook: When keepers fail, markets don't pause

It's March 12, 2020. Markets are crashing fast. ETH is tumbling from ~$200 toward $90 in a matter of hours. On MakerDAO, automated keepers are supposed to bid in collateral auctions, covering vault debt to keep the peg stable and the system solvent.

But it's not working.

Congestion on Ethereum is insane. Gas prices are spiking. Keeper bots that normally liquidate undercollateralized vaults are stuck—their transactions aren't getting through. The few that do get through bid near-zero DAI for batches of ETH collateral.

In the aftermath, MakerDAO reports that $8.32 million was liquidated for 0 DAI. Bad debt piles up. The protocol nearly collapses.

The uncomfortable truth

This wasn't an edge case. It was the default failure mode of automation under stress.

Markets don't pause. If your automation breaks when they move, you don't just lose an opportunity—you lose the ability to respond.

The pager went off. Teams stayed up all night. The postmortem ran hundreds of pages. Everyone learned a lesson that shouldn't have been surprising: when you give a bot "god mode" authority and trust it to behave, you're betting that the world won't break.


Problem Evolution: How automation got here

Keeper automation didn't start as a sophisticated product category. In 2017–2019, DeFi's early protocols (Maker, Compound) needed bots for basic tasks: liquidate undercollateralized positions, bid in auctions, update oracles, harvest yield.

These were scripts with a private key and a narrow set of function calls. When they worked, great. When they failed, you restarted them. Protocol states were simpler, markets were thinner, and "it ran at all" was usually good enough.

Then March 2020 happened. Black Thursday exposed what happens when automation assumes normal conditions in an adversarial world. The failure wasn't "bot down"—it was "bot doing the wrong thing under stress." The lesson hit hard: "good enough" automation is not enough for mission-critical protocols.

2017–2019: Early keepers

Ad hoc scripts run by core teams. Simple scope, manual recovery, "hope it works" as the reliability strategy.

March 2020: Black Thursday

Congestion + keeper liveness failure → $8.32m liquidated for 0 DAI. The crisis that reshaped how teams think about automation.

2020–2021: Automation networks emerge

Keep3r (job marketplace), Gelato (automation protocol), Chainlink Keepers (decentralized execution). Automation became "infrastructure"—but still fragile.

2021–present: MEV and private orderflow

Flashbots Alpha changes execution. Keepers become "searchers." Private bundles reduce gas wars but introduce new dependencies.

2026: Mature stack, same problem

Layered automation infrastructure exists. But liveness is still adversarial, incentives are hard to design, and privileged keys still carry blast-radius risk.

Why the problem persists

Automation has matured, but the core issues haven't gone away. Liveness failures still happen under congestion. Incentive designs can still be gamed. And every time you give a bot more power to handle edge cases, you also increase the blast radius if it goes wrong.


Tension: Your bot works...until it matters

Most teams respond to reliability problems by adding more safeguards off-chain: better monitoring, more simulation, more alerts, circuit breakers in the keeper code.

This helps—until it doesn't.

The deeper problem

Off-chain guardrails are promises. On-chain guardrails are boundaries.

When the keeper is the one holding the private key, the keeper can always do more than you intended—especially after an upgrade, a dependency change, or a "temporary" hotfix shipped during an incident. Off-chain checks are advisory; the bot chooses whether to follow them.

On the other side, some teams move control on-chain with multisigs and manual review. This trades one failure mode for another: approvals don't scale, signers get fatigued, and you miss your own targets because humans can't react at keeper speed.

You end up oscillating between "fully automated" and "fully paused"—because there's no middle state that's both safe and fast.


Insight: Stop betting on perfect bots—design keeper rails

The shift is simple: stop trying to make the keeper trustworthy, and start making the permission trustworthy.

With Matador, you express keeper rails as policies—explicit constraints like slippage bounds, health factor floors, target ranges, cooldowns, and rate limits—and those constraints are enforced on-chain by an interpreter. The keeper can call, but it can't cross the boundaries.
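To make the shape of such a policy concrete, here is a minimal sketch of the kinds of constraints described above, expressed as a plain data structure. All names (`KeeperPolicy`, `max_slippage_bps`, and so on) are illustrative, not Matador's actual DSL:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeeperPolicy:
    # Illustrative field names; not Matador's actual DSL.
    allowed_selectors: frozenset    # which functions the keeper may call
    max_slippage_bps: int           # slippage bound, in basis points
    min_health_factor: float        # refuse to act below this floor
    cooldown_seconds: int           # minimum gap between executions
    max_calls_per_day: int          # rate limit on keeper activity

# A hypothetical rebalancing policy: 3% slippage cap, HF floor of 1.2,
# at most one execution per hour.
REBALANCE_POLICY = KeeperPolicy(
    allowed_selectors=frozenset({"swap", "harvest"}),
    max_slippage_bps=300,
    min_health_factor=1.2,
    cooldown_seconds=3600,
    max_calls_per_day=24,
)
```

The point of the structure is that every bound is explicit and reviewable; nothing about the keeper's behavior outside these fields is granted.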

The insight

Keeper rails are a safety harness. The goal isn't perfect automation—it's bounded automation that fails safe when the world breaks your assumptions.

This means you can finally delegate safely: automation that moves fast, but stops when reality stops matching your model. Instead of hoping the bot behaves correctly, you get fail-safe behavior you can audit—and a bounded downside you can explain to your team (and your postmortem doc).

Design for failure. If you can't guarantee liveness, guarantee that failure looks like a pause, not a catastrophe.


Resolution: Trustless delegation with enforceable boundaries

Matador's model is "delegation with enforceable boundaries." You keep custody in your smart account stack (Safe or Kernel). You grant your keeper a permission that's narrow and explicit:

Which calls it may make

Specific function selectors only—like swap(), liquidate(), harvest(). The keeper can't call arbitrary functions.

Which parameters are allowed

Target addresses, asset IDs, price bands, slippage tolerances. The keeper can't pass values outside your defined ranges.

What state must be true

Pre-conditions like "health factor above X," "oracle updated within Y seconds," "last execution was Z seconds ago." If these aren't met, execution stops.

What rate limits apply

Cooldowns between calls, per-interval caps, lifetime limits. The keeper can't spam or drift past safe bounds.
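The four dimensions above can be sketched as a single fail-closed check that runs before execution. This is an off-chain illustration of the logic, under assumed field names (`allowed_selectors`, `max_oracle_age`, etc.), not Matador's on-chain interpreter:

```python
def check_call(policy: dict, state: dict, call: dict):
    """Fail-closed check of a keeper call against a policy.

    Returns None if the call is allowed, otherwise a rejection
    reason. Illustrative sketch only; field names are assumptions.
    """
    # 1. Which calls it may make: selector allow-list.
    if call["selector"] not in policy["allowed_selectors"]:
        return "selector not allowed"
    # 2. Which parameters are allowed: bounds on call arguments.
    if call["slippage_bps"] > policy["max_slippage_bps"]:
        return "slippage above bound"
    if call["target"] not in policy["allowed_targets"]:
        return "target not allowed"
    # 3. What state must be true: pre-conditions on protocol state.
    if state["health_factor"] < policy["min_health_factor"]:
        return "health factor below floor"
    if state["now"] - state["oracle_updated_at"] > policy["max_oracle_age"]:
        return "oracle stale"
    # 4. What rate limits apply: cooldown between executions.
    if state["now"] - state["last_execution_at"] < policy["cooldown_seconds"]:
        return "cooldown active"
    return None  # all constraints satisfied
```

Note the default: any check that cannot be satisfied yields a rejection, so an unexpected market state produces a pause, not an unsafe execution.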

What this unlocks

You can swap out a buggy or compromised keeper without changing your trust boundaries. The bot is replaceable infrastructure; the policy is your actual security layer.

This isn't just a feature checklist. It changes how you operate. Automation becomes replaceable infrastructure: if a keeper goes down, you swap it. If execution conditions change, you update the policy—reviewable like code, not buried in an ops script. And when something goes wrong, the system fails safe: constraints reject unsafe actions instead of letting "close enough" execution slip through.

Practically, this is how you stop getting surprised by drift. You encode assumptions you already have—like "never swap above 3% slippage," "never act if HF < 1.2," "only rebalance toward target bands"—and you enforce them where it matters: at execution time, on-chain.
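As a worked example, "never swap above 3% slippage" reduces to a minimum-output check at execution time. The helper below is hypothetical, but the arithmetic is the standard basis-points calculation:

```python
def min_amount_out(quoted_out: int, max_slippage_bps: int) -> int:
    """Worst acceptable output for a swap, given a quoted amount and a
    slippage bound in basis points. Illustrative helper, not Matador's API."""
    return quoted_out * (10_000 - max_slippage_bps) // 10_000

# A 3% bound (300 bps) on a quote of 1_000_000 units:
floor = min_amount_out(1_000_000, 300)  # 970_000
# The policy rejects any swap whose actual output would land below floor.
```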


Proof: Trust comes from constraints you can inspect

Credibility isn't marketing copy. It's being able to point at the rails and say "this is what can happen, and this is what can't."

Auditable guardrails

Policies are explicit and human-readable. Reviewers can see exactly what's allowed and what's blocked. No "trust me" logic buried in opaque bot code.

On-chain enforcement

Constraints are enforced on-chain by the interpreter. Guarantees don't depend on a keeper's code path, retry logic, or monitoring setup. If the policy says "bounded slippage," that bound is checked where the transaction is decided.

Fail-safe behavior

The system is designed to fail closed. Unknown or malformed inputs revert. Stateful checks run before and (when configured) after execution. If conditions aren't met, the transaction doesn't execute—period.
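"Fail closed" means the default path is rejection: a call only executes if it is explicitly allowed and every configured check passes. A minimal sketch of that control flow, with hypothetical hook names (`precheck`, `postcheck`):

```python
def dispatch(policy: dict, call: dict):
    """Fail-closed dispatch: anything not explicitly allowed is rejected.
    Hypothetical structure for illustration, not Matador's interpreter."""
    handler = policy.get("handlers", {}).get(call.get("selector"))
    if handler is None:
        # Unknown or malformed input: reject by default.
        raise PermissionError("unknown selector: fail closed")
    # Stateful pre-condition runs before execution...
    if not policy["precheck"](call):
        raise PermissionError("pre-condition failed")
    result = handler(call)
    # ...and an optional post-condition runs after it, when configured.
    postcheck = policy.get("postcheck")
    if postcheck is not None and not postcheck(call, result):
        raise PermissionError("post-condition failed")
    return result
```

The design choice worth noting is the order of defaults: a missing handler or a failed check stops execution, and only a fully validated call proceeds.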

Smart account integration

Matador integrates with Safe and Kernel as a Guard and validator/executor. Custody stays in your account; automation gets bounded permission.

What we don't claim

We don't guarantee liveness. Market congestion, RPC failures, and adversarial conditions can still block execution. We make failure predictable and bounded—so you can design for it, not be surprised by it.


Next steps: Pick your starting point

Ready to evaluate keeper rails? You don't have to boil the ocean—start small, iterate fast.

Read the architecture

Learn how the DSL, compiler, and interpreter work together to enforce policies on-chain.

Build your first policy

Walk through a guided tutorial: write a minimum viable policy with slippage bounds and health factor checks.

Explore recipes

Start from proven patterns: delta-neutral rebalancing, liquidation protection, health factor guards.


Why it matters

Automation is unavoidable in DeFi—strategies require monitoring, rebalancing, liquidation. But unmanaged automation is unacceptable. Every major incident reveals the same pattern: delegation without constraints creates tail risk.

Keeper rails reduce that risk. They don't guarantee automation always works—but they guarantee that when it does work, it stays within the envelope you defined. Slippage doesn't blow out. Health factors don't drift without triggering alerts. Permissions don't expand beyond what you reviewed.

The shift

Predictability is a risk control, not a performance optimization. Bound execution makes automation auditable, incident-responsive, and safer at scale.


Epilogue

A keeper tries to rebalance a vault during a fast market move. The route shifts, and the swap would land outside the slippage limit set in the policy. The transaction reverts with a clear reason, the vault position doesn't drift, and nobody gets paged at 2 AM.

In the morning, the ops team reviews what happened. They widen the slippage bound slightly—after verifying it's still within their risk tolerance—and redeploy the policy. The keeper goes back to work, still operating inside the same rails.

Guardrails are how you automate safely in adversarial conditions.
