Manoj Verma • about 1 month ago
Building AI Agents You Can Trust: Secure Execution on Real Systems (Proxi)
Back on Feb 9, I submitted Proxi to Devpost — at a time when agentic systems were still mostly experimental and tools like OpenClaw had just started surfacing.
The idea was simple but ambitious:
Let AI not just suggest, but actually execute real work on real systems — remotely, safely, and under human control.
Proxi explores:
- Operating system–level task execution (not just APIs or sandboxes)
- Adaptive behavior without rigid workflows
- Human-in-the-loop control for trust and safety
- Verifiable actions instead of black-box outputs
At that time, this space was still forming. Today, we're seeing a clear shift toward autonomous agents, but that shift raises serious questions around control, security, and reliability.
That’s exactly the gap Proxi tries to address.
The project gallery is now public. I'm sharing it here for feedback, critique, and discussion from folks exploring similar directions.
Curious to hear:
- Where do you see real-world adoption of such agents first?
- What would make you trust an AI system to operate your machine?

2 comments
Chieh-Ping (aka CheRocks) Chen • about 1 month ago
Hi Manoj,
I read through your full project page, and the emphasis on "Proxi never decides success. Reality does." resonated with me immediately. Most agent frameworks let the agent self-report completion; requiring screenshot-based evidence before marking tasks complete is a meaningful departure from that pattern.
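For anyone skimming, here's roughly what that evidence gate reduces to. This is my own minimal sketch, and every name in it (`Task`, `Screenshot`, `verify_outcome`) is hypothetical rather than taken from Proxi's code:

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Screenshot:
    path: str
    sha256: str  # content hash recorded at capture time

@dataclass
class Task:
    description: str
    status: str = "pending"

def verify_outcome(task: Task, evidence: Screenshot) -> bool:
    """Stand-in for an independent check (pixel diff, OCR, a second model,
    or a human reviewer). Here we only confirm the evidence file exists
    and still matches the hash recorded at capture time."""
    p = Path(evidence.path)
    return p.is_file() and hashlib.sha256(p.read_bytes()).hexdigest() == evidence.sha256

def mark_complete(task: Task, evidence: Screenshot | None) -> None:
    # The agent's self-report is never sufficient; the evidence decides.
    if evidence is not None and verify_outcome(task, evidence):
        task.status = "complete"
    else:
        task.status = "failed_verification"
```

The point of the shape: completion is a state transition the agent cannot perform on its own say-so.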
Your two questions:
Where do you see real-world adoption first?
Anywhere the cost of an undetected error exceeds the cost of slowing down. Compliance-heavy environments — finance, healthcare, legal, regulated enterprise IT — where "the agent said it worked" is not an acceptable audit response. Your three-tier safety model (safe / sensitive / destructive) maps naturally to these contexts.
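To make that mapping concrete: a tiered policy like yours can stay very small at its core. The tier names are from your post; the rules table and functions below are my own hypothetical sketch:

```python
from enum import Enum

class Tier(Enum):
    SAFE = "safe"                # execute immediately, log only
    SENSITIVE = "sensitive"      # pause for human approval
    DESTRUCTIVE = "destructive"  # approval plus explicit confirmation

# Hypothetical rules mapping action names to risk tiers.
RULES = {
    "read_file": Tier.SAFE,
    "send_email": Tier.SENSITIVE,
    "delete_directory": Tier.DESTRUCTIVE,
}

def classify(action: str) -> Tier:
    # Fail closed: an unclassified action is treated as destructive.
    return RULES.get(action, Tier.DESTRUCTIVE)

def requires_approval(action: str) -> bool:
    return classify(action) is not Tier.SAFE
```

The fail-closed default is the part auditors care about: an action the policy has never seen should never count as safe by omission.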
What would make you trust an AI system to operate your machine?
Two things: control and record.
Control is what you've built — human-in-the-loop approvals, policy-based action classification, fallback under OS constraints. That handles the real-time trust problem.
Record is what's mostly missing across the industry. Your screenshots provide immediate visual verification, which is strong. But for post-hoc auditability — "what did the agent do three weeks ago, why, and under whose authority?" — the evidence needs to be persistent, tamper-evident, and stored outside the agent's own system.
This is what I've been working on with Project RE, also submitted to this hackathon. RE approaches the problem from the governance and audit layer:
- Session logs serialized as RFC 5322 email objects (signed, timestamped, human-readable, stored in the user's own inbox; see the sketch after this list)
- Gemini's Thought Signatures captured and preserved as part of the evidence chain — not just what the agent did, but what it was thinking when it decided
- Hardware-anchored authorization (a physical device called Totem that defines operational authority by presence/absence)
- Policy engine that allocates reasoning depth by risk level — similar in spirit to your safe/sensitive/destructive tiers, but applied to the model's thinking budget rather than action approval
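Since the RFC 5322 point is the most concrete one, here is a minimal sketch using only Python's standard library. The custom header names and the hash-chaining trick (each log carries the digest of the previous one, which is one way to get tamper-evidence) are my assumptions, not RE's actual format, and signing (DKIM or S/MIME) is omitted:

```python
import hashlib
from email.message import EmailMessage
from email.utils import formatdate

def session_log_as_email(session_id: str, actions: list[str],
                         prev_digest: str) -> EmailMessage:
    """Serialize one agent session as an RFC 5322 message."""
    msg = EmailMessage()
    msg["From"] = "agent@example.com"       # placeholder addresses
    msg["To"] = "audit@example.com"         # e.g. the user's own inbox
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = f"Agent session {session_id}"
    msg["X-Session-Id"] = session_id        # assumed custom header
    msg["X-Prev-Log-SHA256"] = prev_digest  # assumed chaining header
    msg.set_content("\n".join(actions))     # human-readable action log
    return msg

# Each serialized log feeds the digest that gets chained into the next one.
log = session_log_as_email("s-001", ["opened settings", "changed DNS"], "0" * 64)
next_digest = hashlib.sha256(bytes(log)).hexdigest()
print(log)
```

Because the result is an ordinary email, it lands in infrastructure the agent doesn't control and stays readable without special tooling, which covers the persistent, outside-the-agent requirement above.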
I noticed your roadmap includes "Verifier & Audit Layers — Independent verification, audit trails, and replay for compliance-sensitive environments." That's essentially where RE already sits. Your execution layer + RE's evidence layer could be complementary — you ensure the agent acts safely in real-time, RE ensures there's a permanent, auditable record of what happened and why.
Would be glad to exchange notes. Here's the project page if you want to take a look:
https://devpost.com/software/project-re-the-governance-protocol
Cheers,
Che (CheRocks)
Manoj Verma • about 1 month ago
Thanks for the detailed feedback, Che. Your solution looks interesting; I'll explore it and get back to you.