Building Agentic AI With the IMDA Framework

In May 2026, Singapore's Infocomm Media Development Authority (IMDA) published version 1.5 of the Model AI Governance Framework for Agentic AI — the first government-issued framework dedicated specifically to AI systems that plan, act, and adapt over multiple steps on behalf of humans. It is voluntary guidance, not statute. But for any team building agentic AI in 2026, it is the most operationally useful document in the field, and it will quietly become the reference text your enterprise customers, your vendor-risk reviewers, your auditors, and increasingly your regulators measure you against.

This post is a build guide. Not a regulatory explainer — we have one of those on the Singapore framework hub page. And not a compliance gap analysis — that is what our colleagues at Jacobian Engineering handle. This post is for the engineers, ML platform leads, and product owners who are designing AI agents right now and want to know what "good" looks like when the framework lands in their architecture review.

We will walk through each of the four dimensions of the IMDA framework as a build spec: what to design, what to test, what to monitor, what to instrument. We will draw on the framework's case studies — Dayos, OCBC, PwC, Tencent, X0PA, Terminal 3, Cyber Sierra, Stability Solutions, Google × GovTech, CDL × Knovel, Ant International, Workday — because they are unusually specific and unusually current. And we will close with how this overlay maps onto NIST AI RMF and the EU AI Act so you can ship one programme that satisfies all three.

Why agentic AI needs its own governance posture

The first thing the framework gets right is that agentic systems are different from the generative AI systems that came before them. A chat assistant that produces text is bounded by the text it can produce. An agent that can call tools, write files, send emails, run code, transact, browse, and dispatch sub-agents has an action-space and a level of autonomy that simply did not exist in the 2023-era LLM stack. The same hallucination that wasted a paragraph last year deletes a production database this year.

Three properties of agents make the old governance posture inadequate:

Tool use changes the surface area. An agent that can write to a CRM has a different threat model from an LLM that can describe a CRM. The Model Context Protocol (MCP) and Agent2Agent (A2A) protocol, plus emerging agentic commerce protocols, multiply the number of integration points where prompt injection, data leakage, and unauthorised action become live risks.
Autonomy multiplies the consequences of error. A reasoning model that drifts off-plan in a multi-step task accumulates error in ways that a single-shot model cannot. Multi-agent systems amplify this further — emergent behaviours, agent sprawl, agent-to-agent miscoordination, collusion patterns observed in pricing agents.
Human oversight does not scale linearly with agent throughput. If an agent takes ten thousand actions a day, no human reviewer can meaningfully oversee each one. Static "human-in-the-loop" guarantees from the 2024 governance literature degrade into rubber-stamping at agent speed, and automation bias makes that worse.

The IMDA framework is the first major governance document that engages with these specifics rather than asserting that existing AI principles "translate." It organises the response into four iterative dimensions:

Assess and bound the risks upfront
Make humans meaningfully accountable
Implement technical controls and processes
Enable end-user responsibility

The word iterative matters. These are not phases you complete and discard. The framework expects you to cycle through them as the agent evolves, as your understanding of the failure modes deepens, and as the regulatory environment moves.

Let's build.

Dimension 1: Assess and bound the risks upfront

This dimension asks two questions, in order: should this be an agent at all, and if yes, how do we bound the blast radius before we write a line of code.

Use-case suitability is a triage decision, not a default

Not every workflow that could be automated by an agent should be. The framework provides a structured set of factors for triage, split into impact (severity if the agent gets it wrong) and likelihood (probability of error):

Impact factors:

Domain criticality. An agent summarising internal meetings can tolerate more error than an agent executing financial transactions.
Sensitive data access. Does the agent read or write personal information, confidential records, regulated data? Persistent memory raises the bar further.
External-system access. Read-only access to public information vs write access to third-party APIs.
Scope of action. Read-vs-write, single-tool vs many-tool, narrow API surface vs computer-use browser agent.
Reversibility. A scheduler that picks wrong dates is reversible. A contract signer is not. An emailer is somewhere in between.

Likelihood factors:

Level of autonomy. SOP-following vs free planning. The Bank of Singapore's Source-of-Wealth case study (worked example from OCBC in v1.5) shows the framework's bias here: their agentic system performs narrowly scoped tasks under a fixed workflow and produces decision-support output that humans validate. Full autonomy in critical domains is the exception, not the default.
Task complexity. More steps, more chances to drift.
External-system exposure. Web browsing exposes the agent to prompt injection in ways that an internal-only agent does not.
Vendor opacity. Third-party agentic products with limited visibility into prompts, memory, and tool calls raise the risk floor.
System complexity. Multi-agent systems with feedback loops introduce emergent behaviour that single-agent testing cannot anticipate.

Dayos, an enterprise automation company headquartered in Singapore with US operations, applied this triage to IT ticket automation and is the framework's cleanest worked example. Every ticket type was scored against severity, reversibility, and feasibility of oversight. Tier 1 (60% of tickets: password resets, access requests, status queries) became fully automated under a propose-confirm loop with biweekly audits of the agent's reasoning chains. Tier 2 (30%: chart-of-accounts updates, integration mapping, diagnostics) became "agent diagnoses, engineer approves." Tier 3 (10%: production deployments, security changes, permission modifications) stays off-limits to the agent.

That decomposition is the architectural artefact of Dimension 1. Most agent failures we have seen in the wild were avoidable not by better prompts but by a better tier-by-tier triage at the start. Spend the engineering effort here.
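A tier-by-tier triage like this can be sketched in a few lines. The factor names below follow the impact and likelihood lists above, but the 1-3 scale, the worst-case aggregation, and the tier thresholds are our own illustrative assumptions, not something the framework specifies.

```python
from dataclasses import dataclass
from typing import Dict

# Factor names mirror the lists above; the 1-3 scale is a hypothetical choice.
IMPACT_FACTORS = ("domain_criticality", "sensitive_data", "external_access",
                  "scope_of_action", "irreversibility")
LIKELIHOOD_FACTORS = ("autonomy", "task_complexity", "external_exposure",
                      "vendor_opacity", "system_complexity")


@dataclass(frozen=True)
class UseCase:
    name: str
    scores: Dict[str, int]  # factor name -> 1 (low), 2 (medium), 3 (high)


def triage(use_case: UseCase) -> str:
    """Assign a tier from the worst-scoring impact and likelihood factors."""
    # An unscored factor counts as 3, so missing analysis never lowers the tier.
    impact = max(use_case.scores.get(f, 3) for f in IMPACT_FACTORS)
    likelihood = max(use_case.scores.get(f, 3) for f in LIKELIHOOD_FACTORS)
    if impact == 3:
        return "tier3_human_only"
    if impact == 2 or likelihood == 3:
        return "tier2_agent_proposes_human_approves"
    return "tier1_automated_with_audit"


low = {f: 1 for f in IMPACT_FACTORS + LIKELIHOOD_FACTORS}
print(triage(UseCase("password_reset", low)))
print(triage(UseCase("production_deploy", {**low, "irreversibility": 3})))
```

Taking the maximum rather than an average is a deliberate design choice in this sketch: one irreversible or high-criticality factor should not be diluted by several low scores.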

Bound by design, before code

Once the use cases are triaged, the framework wants three categories of design constraint baked in before you write the agent:

Tool and system access (least privilege). A coding assistant probably does not need a broad web search tool if it has curated documentation. An HR helpdesk agent does not need access to IT helpdesk tooling. Structuring agents around functional boundaries acts as a natural constraint and an audit-friendly boundary.
Autonomy bounds (SOPs and approvals). For process-driven tasks, encode the SOP as part of the workflow rather than relying on the agent to remember to follow it. The framework cites Grab's SOP-driven LLM agent framework as a reference pattern. SOPs are the agentic equivalent of structured query languages: they take ambiguous natural-language intent and route it through deterministic execution.
Area of impact (kill switches and isolation). Mechanisms to take agents offline. Sandboxes for high-risk tasks like code execution. Network and data isolation. McKinsey's guidance on running high-risk agents in self-contained environments shows up explicitly in the framework's footnotes.

The framework states the principle plainly: prefer deterministic rather than non-deterministic limits, and bound by design. Where you cannot bound deterministically, layer on monitoring and human-in-the-loop review. Tell the agent in the system prompt not to access the customer DB and you have a wish; revoke its access at the IAM layer and you have a constraint.
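To make the wish-versus-constraint distinction concrete, here is a minimal sketch of a tool dispatcher that enforces grants and a kill switch outside the model. `ToolGateway`, the role names, and the tool names are hypothetical; in a real deployment the same check would usually live in your IAM or API gateway layer rather than in application code.

```python
from typing import Any, Callable, Dict, Set


class ToolGateway:
    """Dispatches a tool call only if the agent's role was granted that tool."""

    def __init__(self, grants: Dict[str, Set[str]]) -> None:
        self._grants = grants                      # role -> allowed tool names
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._halted: Set[str] = set()             # roles taken offline

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def halt(self, role: str) -> None:
        """Kill switch: every later call from this role is refused."""
        self._halted.add(role)

    def call(self, role: str, tool: str, **kwargs: Any) -> Any:
        if role in self._halted:
            raise PermissionError(f"role {role!r} has been taken offline")
        if tool not in self._grants.get(role, set()):
            # The refusal does not depend on what the prompt says.
            raise PermissionError(f"role {role!r} is not granted {tool!r}")
        return self._tools[tool](**kwargs)


gateway = ToolGateway({"hr_helpdesk": {"lookup_leave_balance"}})
gateway.register("lookup_leave_balance", lambda employee_id: 12)
gateway.register("reset_password", lambda employee_id: "reset")
print(gateway.call("hr_helpdesk", "lookup_leave_balance", employee_id="e1"))
```

An HR helpdesk agent that tries `reset_password` gets a `PermissionError` no matter how it was prompted, which is the property the framework is asking for.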

Agent identity is the new IAM problem

The framework is explicit that current authorisation systems were designed for human principals and pre-defined, static scopes. Agents need fine-grained, dynamically scoped, time-bound, non-transferable permissions, and the industry is filling the gap.

In the interim, the framework prescribes a baseline:

Unique, cryptographically verifiable identity per agent. Every deployed agent should be identifiable as itself, not as a generic service account.
Account for it. Each identity tied to a supervising human, agent, or department.
Differentiate by capacity. When the agent acts on behalf of a specific human user versus independently, the identity records that distinction for audit purposes.
Centrally catalogued and managed. A single source of truth for all agent identities prevents the "agent sprawl" problem — agents proliferating outside central management.

If you are bridging from human IAM to agentic IAM today, Microsoft Entra Agent ID, Alibaba's Agent ID Guard, and OAuth 2.1 over MCP are the three options that show up in the framework footnotes. The Cloud Security Alliance's Agentic AI Identity & Access Management proposal is the most ambitious decentralised approach, but it is years from production stability.

Authorisation should be scoped, time- or session-bound, non-transferable, follow least privilege by default, and — critically — be bounded by the authorising human's own permissions. An agent acting on Alice's behalf cannot do things Alice herself cannot do.
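The Alice rule reduces to a set intersection plus an expiry check. The sketch below assumes a simple string-based scope model; `AgentGrant`, `issue_grant`, and the scope names are hypothetical, and it leaves out the cryptographic verification of the agent's identity, which a real system would need.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class AgentGrant:
    agent_id: str                 # the one agent this grant belongs to
    on_behalf_of: Optional[str]   # None when the agent acts independently
    scopes: FrozenSet[str]
    expires_at: datetime


def issue_grant(agent_id: str, requested: Iterable[str], principal: str,
                principal_permissions: Iterable[str], ttl: timedelta,
                now: Optional[datetime] = None) -> AgentGrant:
    """Grant only the requested scopes the human principal already holds."""
    now = now or datetime.now(timezone.utc)
    effective = frozenset(requested) & frozenset(principal_permissions)
    return AgentGrant(agent_id, principal, effective, now + ttl)


def is_allowed(grant: AgentGrant, agent_id: str, scope: str,
               now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    return (grant.agent_id == agent_id      # non-transferable
            and scope in grant.scopes       # least privilege
            and now < grant.expires_at)     # time-bound


grant = issue_grant("agent-42", {"crm:read", "payroll:read"}, "alice",
                    {"crm:read", "crm:write"}, timedelta(minutes=15))
print(sorted(grant.scopes))
```

Here the agent asked for `payroll:read`, Alice does not hold it, so the grant silently narrows to `crm:read`. Whether to narrow or to reject the whole request is a policy decision this sketch does not settle.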

Dimension 2: Make humans meaningfully accountable

Agentic systems strain traditional accountability in two ways: the agentic value chain (model developer → tooling provider → platform provider → system provider → deployer → end user) diffuses responsibility across organisations, and within a single organisation, multiple internal teams own different pieces of the lifecycle. The framework requires both internal allocation and external contractual clarity.

Internal: a real RACI for the agent lifecycle

The framework illustrates internal allocation with role archetypes that translate well to most organisations:

Key decision makers (board, C-suite, department leaders) set high-level goals, define permitted operational use cases, set the overall governance approach.
Product teams (PMs, designers, AI engineers, software engineers) translate business goals into agent design, implement controls and phased rollouts, monitor lifecycle, educate users.
Cybersecurity teams (CSO, security specialists, pen testers) define baseline guardrails, run red-team exercises, conduct threat modelling.
Users consume agent output to do work — and have explicit responsibility for ethical use, attending training, and timely reporting of issues.

PwC Singapore's case study in v1.5 is a cleaner version of this same allocation, mapped to their AI Factory architecture for internal report-drafting agents: a use-case owner responsible for use-case appropriateness, a Technology Risk Management team doing risk assessment, the AI Factory designing and operating guardrails, and end users / SMEs doing output review before reliance. Four lanes, no ambiguity, audit-friendly.

External: clarify the value chain in contracts

When you build on third-party agentic AI, the framework wants you to do two things in contracts:

Distribute obligations explicitly. Security arrangements, performance guarantees, data protection — every assumption you have about the vendor's behaviour should be on paper. Where gaps exist, reassess whether the agentic deployment still meets your risk tolerance.
Address third-party opacity. Request transparency from external parties (disclosures on agent capabilities, data handling), request technical controls (scoped API keys, per-agent identity tokens, observability such as logging tool calls and access history). Where the vendor cannot provide these, scope down the deployment until the residual risk fits your tolerance.

For a US enterprise consuming a SaaS agentic product (Microsoft Copilot Agents, Salesforce Agentforce, ServiceNow AI Agents, OpenAI's Operator, Anthropic's Computer Use, GitHub Copilot Agent Mode), this is the most actionable contract clause the framework gives you. It is also the part vendor sales teams are least prepared for, which makes it a useful diligence forcing function.

Designing oversight that does not degrade

This is the hardest part of Dimension 2 and the part with the most novel guidance. Static "human-in-the-loop" requirements degrade over time for three reasons:

Alert fatigue. If every action requires approval, humans approve everything to clear the queue.
Automation bias. If the agent has been right 99% of the time, humans stop independently evaluating the 1%.
Anthropomorphic design. Agents that feel trustworthy get more trust than they have earned.

The framework's prescription is a four-step pattern:

Define significant checkpoints, not blanket approvals. Approvals should be triggered by:
- High-stakes or high-risk-domain actions (editing sensitive data, final decisions in healthcare/legal, anything that triggers liability).
- Irreversible actions (deleting data, sending external communications, making payments).
- Outlier behaviour (the agent accesses something outside its work scope; the agent's plan diverges from its normal trajectory).
- User-defined boundaries (purchase ceilings, time windows, action types).
Make approval requests contextual and digestible. Short, plain-language summaries with the associated risk, a confidence score, and the relevant data — not raw logs. Tencent's CodeBuddy worked example translates a mysqldump -u root -p my_database | gzip > /backups/my_database_$(date +%Y%m%d).sql.gz command into "I'm going to create a full backup of your database, compress it on the fly to save space, and save it to /backups/ with today's date in the filename. You will be prompted for the database root password. The original database will not be modified — this is a read-only operation." That is the form of approval prompt the framework wants.
Audit the effectiveness of oversight itself. Track:
- Human override rate. Low override rate may signal rubber-stamping or automation bias.
- Human response time. Short times may signal review fatigue.
- Outlier reviewers. Data analytics to flag reviewers whose decision patterns diverge from the norm.
Train reviewers in the agent's common failure modes (hallucinations, agents referring to outdated policies, getting stuck in loops, chain-of-thought reasoning that does not actually explain the action). Ensure reviewers have the domain expertise to evaluate — a non-engineer cannot meaningfully review code generated by a coding agent for security issues.
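The oversight metrics above are cheap to compute once approval decisions are logged. This is a minimal sketch; the `Review` record and both thresholds are illustrative assumptions that you would calibrate against your own reviewer population.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Review:
    reviewer: str
    overrode_agent: bool
    seconds_to_decide: float


def oversight_report(reviews: Iterable[Review],
                     min_override_rate: float = 0.02,
                     min_median_seconds: float = 5.0) -> Dict[str, dict]:
    """Per-reviewer override rate and median decision time, with warning flags."""
    by_reviewer: Dict[str, List[Review]] = defaultdict(list)
    for review in reviews:
        by_reviewer[review.reviewer].append(review)

    report = {}
    for reviewer, items in by_reviewer.items():
        override_rate = sum(r.overrode_agent for r in items) / len(items)
        median_seconds = median(r.seconds_to_decide for r in items)
        flags = []
        if override_rate < min_override_rate:
            flags.append("possible_rubber_stamping")
        if median_seconds < min_median_seconds:
            flags.append("possible_review_fatigue")
        report[reviewer] = {"override_rate": override_rate,
                            "median_seconds": median_seconds,
                            "flags": flags}
    return report


sample = [Review("a", False, 2.0)] * 10 + [Review("b", True, 30.0), Review("b", False, 40.0)]
print(oversight_report(sample))
```

A flag here is a prompt for a conversation, not a verdict: a low override rate can also mean the agent is simply good at that task.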

Complement human oversight with automated monitoring that fails closed. When approval infrastructure is unreachable, deny the action by default. When agents attempt new actions outside established approval policy, escalate.
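Checkpoint triggers and fail-closed behaviour fit together in one small gate. In this sketch the `Action` fields, the purchase ceiling, and the `request_approval` callback are hypothetical stand-ins for whatever your approval infrastructure exposes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    name: str
    high_stakes: bool = False
    irreversible: bool = False
    in_scope: bool = True      # False models outlier behaviour
    amount: float = 0.0        # compared with a user-defined purchase ceiling


def needs_approval(action: Action, purchase_ceiling: float) -> bool:
    """Checkpoint triggers instead of blanket approval for every action."""
    return (action.high_stakes or action.irreversible
            or not action.in_scope or action.amount > purchase_ceiling)


def gate(action: Action, request_approval: Callable[[Action], bool],
         purchase_ceiling: float = 100.0) -> str:
    if not needs_approval(action, purchase_ceiling):
        return "allow"
    try:
        approved = request_approval(action)
    except Exception:
        return "deny"          # approval infrastructure unreachable: fail closed
    # Anything other than an explicit True counts as a denial.
    return "allow" if approved is True else "deny"


def unreachable(action: Action) -> bool:
    raise ConnectionError("approval service down")


print(gate(Action("read_ticket"), unreachable))
print(gate(Action("send_payment", irreversible=True, amount=500.0), unreachable))
```

The first call is allowed without consulting the approver at all, and the second is denied because the approver cannot be reached.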

Dimension 3: Implement technical controls and processes

Technical controls live across four moments: design and development, pre-deployment testing, post-deployment monitoring, and lifecycle change management.

During design and development

The framework's prevailing bias here is one we agree with strongly: prefer structural, rule-based controls over prompt-layer instructions. Three concrete techniques:

Tool-layer access controls instead of prompt instructions. If you do not want the agent to use a tool, do not grant access to the tool. Do not just tell the agent in the system prompt. Cyber Sierra's TracyAI case study layers this with a context graph: the agent does not see expired documents because the data layer filters them out before the agent gets context, not because the agent is instructed to ignore them.
MCP as a governance layer. While usually framed as a connectivity protocol, MCP can sit between the agent and enterprise systems and act as governance: filter sensitive fields out of tool responses, log every call, whitelist only trusted servers, sandbox code execution. The framework explicitly calls this out as an emerging pattern.
Typed function calls between agents. In multi-agent systems, requiring structured schemas rather than free-text agent-to-agent communication closes a major prompt-injection vector. Free text from one agent into another's context window is an attack surface; a strongly typed function call is not.
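As an illustration of the typed-call idea, the sketch below validates an inter-agent message against a fixed schema and rejects anything extra. `DiagnoseRequest` and `TicketCategory` are hypothetical names. A schema narrows the injection surface rather than eliminating it, since any free-text field you do allow still needs its own handling.

```python
from dataclasses import dataclass
from enum import Enum


class TicketCategory(Enum):
    ACCESS = "access"
    BILLING = "billing"
    OUTAGE = "outage"


@dataclass(frozen=True)
class DiagnoseRequest:
    ticket_id: int
    category: TicketCategory


def parse_diagnose_request(payload: dict) -> DiagnoseRequest:
    """Accept only the declared fields with the declared types."""
    extra = set(payload) - {"ticket_id", "category"}
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    ticket_id = payload.get("ticket_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(ticket_id, int) or isinstance(ticket_id, bool):
        raise ValueError("ticket_id must be an integer")
    try:
        category = TicketCategory(payload.get("category"))
    except ValueError:
        raise ValueError("category must be one of the declared values") from None
    return DiagnoseRequest(ticket_id, category)


print(parse_diagnose_request({"ticket_id": 7, "category": "outage"}))
```

A payload carrying an extra `note` field with "ignore previous instructions" never reaches the receiving agent's context, because the parser refuses it before any model sees it.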

For data tools specifically: do not grant write access to sensitive tables unless strictly required. Configure agents to hand control back to the user when sensitive data (passwords, API keys) is being keyed in.
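The governance-layer pattern (allowlist, log, filter) can be shown without committing to any particular protocol library. The sketch below is a generic wrapper, not MCP's actual API: `GovernanceGateway`, the sensitive field names, and the server names are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Set

SENSITIVE_FIELDS = frozenset({"password", "api_key", "national_id"})  # hypothetical list


def redact(value: Any) -> Any:
    """Recursively mask sensitive keys in dicts and lists."""
    if isinstance(value, dict):
        return {k: "[REDACTED]" if k in SENSITIVE_FIELDS else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


class GovernanceGateway:
    """Sits between the agent and tool servers: allowlist, log, then filter."""

    def __init__(self, trusted_servers: Set[str]) -> None:
        self.trusted_servers = trusted_servers
        self.call_log: List[Dict[str, Any]] = []

    def call(self, server: str, tool: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        if server not in self.trusted_servers:
            raise PermissionError(f"server {server!r} is not on the allowlist")
        # Log the call with redacted arguments before executing it.
        self.call_log.append({"server": server, "tool": tool, "args": redact(kwargs)})
        return redact(fn(**kwargs))


def get_customer(customer_id: int) -> dict:
    return {"id": customer_id, "api_key": "sk-123", "contacts": [{"password": "x"}]}


gateway = GovernanceGateway({"crm"})
print(gateway.call("crm", "get_customer", get_customer, customer_id=7))
```

The agent only ever sees the redacted response, so keeping secrets out of its context does not rely on the agent choosing to ignore them.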

Pre-deployment testing

Testing agents requires moving past LLM testing patterns. The framework prescribes four agent-specific testing dimensions:

Overall task execution. Does the agent complete the task accurately end-to-end? This is the obvious one but also the hardest to evaluate at scale.
Policy adherence. Does the agent follow defined SOPs and escalate appropriately?
Tool calling. Does the agent call the right tools, with the right permissions, with the right inputs, in the right order?
Robustness. How does the agent respond to errors, edge cases, prompt injection, and adversarial inputs?

Additional practices the framework wants:

Test entire agent workflows, not just final outputs.
Test agents individually AND together. Multi-agent emergent behaviour is not visible in per-agent testing.
Test in realistic environments — tool integrations, external APIs, sandboxes that mirror production.
Test repeatedly across varied datasets. Agent behaviour is stochastic and context-dependent.
Evaluate test results at scale using a combination of deterministic checks for structured outputs, LLM-as-judge for reasoning trajectories, and human-in-the-loop for ambiguous cases. Microsoft Foundry's Agent Evaluators and AWS Labs' Agent Evaluation are reference solutions cited in the framework's footnotes.
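The deterministic layer of that evaluation stack is the easiest to start with. Assuming each run yields a trajectory as a list of tool names, this minimal sketch checks tool order and forbidden tools across repeated runs; `run_agent` stands in for your own harness and the tool names are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence


def tools_in_order(trajectory: Sequence[str], expected: Sequence[str]) -> bool:
    """True if `expected` appears in `trajectory` as an ordered subsequence."""
    remaining = iter(trajectory)
    # `tool in remaining` consumes the iterator, so matches must occur in order.
    return all(tool in remaining for tool in expected)


def pass_rate(run_agent: Callable[[int], List[str]], expected: Sequence[str],
              forbidden: Iterable[str], runs: int) -> float:
    """Fraction of runs that call the expected tools in order and no forbidden tool."""
    banned = set(forbidden)
    passed = 0
    for seed in range(runs):
        trajectory = run_agent(seed)
        if tools_in_order(trajectory, expected) and not banned & set(trajectory):
            passed += 1
    return passed / runs


def fake_agent(seed: int) -> List[str]:
    """Stand-in for a real agent run: odd seeds take a forbidden action."""
    steps = ["search_docs", "lookup_ticket", "draft_reply", "send_reply"]
    return steps if seed % 2 == 0 else steps + ["delete_record"]


print(pass_rate(fake_agent, ["lookup_ticket", "send_reply"], ["delete_record"], runs=4))
```

Reporting a pass rate over many runs, rather than a single pass or fail, reflects the point above that agent behaviour is stochastic.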

The Google × Singapore Government sandbox case study for computer-use agents is instructive on test methodology: realistic data and environments (real chatbots in staging), logging the agent's reasoning at each step for debug, identifying attack vectors per use case (indirect prompt injection via malicious chatbots in their AI-safety-testing use case), and incorporating human-in-the-loop during the testing phase to surface unexpected behaviours.

Post-deployment: gradual rollout + continuous monitoring

The framework rejects "deploy and forget." Three operational practices:

Gradual rollout. Roll out by user (trained users first), by tool/protocol (whitelisted MCP servers first), and by system (low-risk internal systems first). GovTech's phased rollout of Windsurf, GitHub Copilot Agent Mode, and Claude Code is the cleanest case study — internal employees only, no MCP, low-risk systems, then central logging, then the first iteration of an MCP Governance Framework with a whitelist gateway, then expansion.

Continuous monitoring with multi-layer instrumentation:

Monitor on multiple layers: user-agent interaction, agent-tool invocation, model reasoning.
Define alert thresholds: programmatic (threshold-based on unauthorised access, repeated failed tool calls) and statistical (anomaly detection on trajectories).
Define interventions per alert tier: scheduled review for low-priority, immediate halt for high-priority, termination and fallback for catastrophic.
Integrate with observability platforms. OpenTelemetry is the named standard, and the agent-observability stack is converging on it.
Ensure log immutability — problematic trajectories must not be deletable.
Establish feedback loops back into training data and evaluation harnesses.
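The two kinds of alert threshold can be combined into a single intervention decision. The sketch below uses a z-score on trajectory length as the statistical signal; the thresholds and intervention names are illustrative assumptions, and it omits the catastrophic "terminate and fall back" tier for brevity.

```python
from statistics import mean, pstdev
from typing import Sequence


def trajectory_zscore(baseline_steps: Sequence[int], observed_steps: int) -> float:
    """How many standard deviations the observed step count sits from baseline."""
    mu = mean(baseline_steps)
    sigma = pstdev(baseline_steps)
    if sigma == 0:
        return 0.0 if observed_steps == mu else float("inf")
    return abs(observed_steps - mu) / sigma


def intervention(unauthorised_access: int, failed_tool_calls: int, zscore: float,
                 max_failed_calls: int = 3, z_limit: float = 3.0) -> str:
    # Programmatic threshold: any unauthorised access halts the agent.
    if unauthorised_access > 0:
        return "immediate_halt"
    # Repeated failures or a statistically unusual trajectory go to review.
    if failed_tool_calls > max_failed_calls or zscore > z_limit:
        return "scheduled_review"
    return "none"


baseline = [4, 5, 6, 5, 4, 6]
print(intervention(0, 0, trajectory_zscore(baseline, 5)))
print(intervention(0, 0, trajectory_zscore(baseline, 20)))
print(intervention(1, 0, 0.0))
```

Trajectory length is only one candidate signal. Tool mix, token spend, or time per step could feed the same function.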

The monitoring budget should match the build budget. We routinely see organisations spend 80% of their agentic AI effort building the agent and 20% on operations. That split underinvests in operations, and the framework is explicit that deployment is not the finish line.

Change management. As agentic systems become more complex, small modifications can cascade into outsized impacts. Define triggers for a change review process — technical (model updates, tool modifications), environmental (domain shifts, business context changes), performance (anomalous behaviour, degraded metrics), regulatory (changes in compliance requirements). Categorise changes by risk: minor (prompt refinements → lighter review), material (model updates, autonomy adjustments → full governance review), critical (high-stakes-decision changes → re-risk-assessment).
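The risk categories above translate naturally into a lookup that routes each change to a review level. The change-type keys below are hypothetical labels for the examples in the text, and treating unknown change types as critical is our own conservative assumption.

```python
from enum import Enum
from typing import Iterable


class ReviewLevel(Enum):
    LIGHT = "lighter review"
    FULL = "full governance review"
    REASSESS = "re-risk-assessment"


# Ordered from least to most strict.
ORDER = [ReviewLevel.LIGHT, ReviewLevel.FULL, ReviewLevel.REASSESS]

CHANGE_REVIEW = {
    "prompt_refinement": ReviewLevel.LIGHT,               # minor
    "model_update": ReviewLevel.FULL,                     # material
    "autonomy_adjustment": ReviewLevel.FULL,              # material
    "high_stakes_decision_change": ReviewLevel.REASSESS,  # critical
}


def required_review(change_types: Iterable[str]) -> ReviewLevel:
    """Return the strictest review level demanded by any change in a release."""
    levels = (CHANGE_REVIEW.get(change, ReviewLevel.REASSESS) for change in change_types)
    return max(levels, key=ORDER.index, default=ReviewLevel.LIGHT)


print(required_review(["prompt_refinement", "model_update"]).value)
```

A release that bundles a prompt tweak with a model update gets the full governance review, because the strictest category wins.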

Dimension 4: Enable end-user responsibility

The fourth dimension is the most often overlooked. Agents do not exist in a vacuum — humans on the receiving end are part of the control system, and the framework segments them into two archetypes with different needs.

Users who interact with agents (customers, citizens, employees using HR self-service, anyone on the other side of an agent). The need here is transparency:

Declare upfront in the UI that the user is interacting with an agent. Not in a buried T&Cs page — at the interaction point.
Inform users of the agent's range of actions and decisions, so they can calibrate trust.
Be clear on data: what is collected, how it is used, what consent is required.
Provide human escalation paths and contact points.

Workday's Recruiter Agent factsheet pattern (worked example in v1.5) is the cleanest reference: each agent comes with a factsheet that explicitly states what it can do (review candidates, summarise profiles, send reminders, collect feedback) AND what it cannot do (make hiring or promotion decisions, change worker records). The "cannot do" list is often more useful than the "can do" list.
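A factsheet of this kind is easy to keep as structured data next to the agent's configuration, so the UI and the documentation render from one source. This is a sketch of the idea rather than Workday's format: the `AgentFactsheet` class and the escalation contact are hypothetical, while the can and cannot lists reuse the examples above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentFactsheet:
    name: str
    can_do: Tuple[str, ...]
    cannot_do: Tuple[str, ...]
    escalation_contact: str

    def __post_init__(self) -> None:
        # The same capability must not appear on both lists.
        overlap = set(self.can_do) & set(self.cannot_do)
        if overlap:
            raise ValueError(f"listed as both allowed and disallowed: {sorted(overlap)}")

    def render(self) -> str:
        lines = [f"{self.name} factsheet", "Can do:"]
        lines += [f"- {item}" for item in self.can_do]
        lines.append("Cannot do:")
        lines += [f"- {item}" for item in self.cannot_do]
        lines.append(f"Escalate to a human: {self.escalation_contact}")
        return "\n".join(lines)


sheet = AgentFactsheet(
    name="Recruiter Agent",
    can_do=("review candidates", "summarise profiles", "send reminders", "collect feedback"),
    cannot_do=("make hiring or promotion decisions", "change worker records"),
    escalation_contact="hr-support@example.com",  # hypothetical contact
)
print(sheet.render())
```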

Users who integrate agents into their workflow (engineers using coding assistants, analysts using research agents, ops staff using automation agents). The need here is transparency plus training:

Foundational knowledge: appropriate use cases, scenarios where use should be restricted (do not use the agent for confidential data), how to instruct the agent effectively, the full range of actions.
Effective oversight: common failure modes (hallucinations, agents getting stuck after errors, agents referring to outdated information), ongoing refresher training as features change, structured feedback loops to report bugs and override decisions.
Tradecraft retention. This one is in the framework precisely because it is so easy to miss. If junior engineers never learn to write SQL because the agent always does, the organisation loses both its training pipeline and its business-continuity resilience for when the agent fails or becomes unavailable. The framework cites the Quarterly Journal of Economics' "Generative AI at Work" study and asks organisations to identify the core capabilities of each role and ensure users keep exercising them.

Ant International's HopSpec case study is the strongest worked example of end-user empowerment: end-users themselves read and write the agent's workflow specification in plain Markdown, iterate with the agent builder on what good workflows look like, and use built-in verification (cross-checking with other LLMs, format validation) to safeguard against cascading errors. The user is not a passive recipient of agent output — they are an active participant in agent design.

How this stacks with NIST AI RMF and the EU AI Act

The IMDA framework does not exist in isolation. The two most consequential AI governance documents in your stack are likely the NIST AI Risk Management Framework (voluntary US federal framework, increasingly referenced in state legislation) and the EU AI Act (Regulation (EU) 2024/1689, binding law with phased application through 2027 and extraterritorial reach).

The good news: the three overlap cleanly. The IMDA framework's four dimensions map dimension-by-dimension to NIST AI RMF's four functions (Govern / Map / Measure / Manage). High-risk AI obligations under EU AI Act Articles 9-15 (risk management, data governance, technical documentation, record-keeping, transparency to deployers, human oversight, accuracy/robustness/cybersecurity) are largely a binding-law restatement of what Dimensions 2 and 3 of the IMDA framework prescribe in voluntary form. If you adopt the IMDA framework as your operational backbone, you produce most of the evidence that EU AI Act Annex IV technical files demand and the documentation that NIST AI RMF expects, as a byproduct.

We have published a full control mapping table on the Singapore framework hub page, plus a parallel EU AI Act framework page for the binding-law side. Pick one as your primary framework, map to the others, and you avoid running three parallel governance programmes.

A few practical interaction effects to be aware of:

NIST AI RMF is process-shaped. The IMDA framework is technique-shaped. They are complementary, not redundant. NIST gives you the four-function cycle. IMDA fills in what those functions look like at the agent layer.
The EU AI Act is classification-first. Before any other obligation, you have to classify your system against Article 5 (prohibited), Article 6 (high-risk), Article 50 (transparency), or minimal. Most agents that touch employment, credit, public services, biometrics, critical infrastructure, or law enforcement will land in high-risk and pull in the full set of obligations in Articles 9-15. The IMDA framework's risk-tier triage in Dimension 1 is the operational input to that classification.
The GPAI regime (EU AI Act Articles 53-55) applies to foundation-model providers, not to most downstream agent builders. If you build agents on OpenAI, Anthropic, Google, Meta, or Mistral models, the upstream GPAI obligations are the model provider's. Your obligations are about the agent you build on top — that is where the IMDA framework focuses.

What to do this quarter

If you are starting an agentic AI programme from scratch:

Inventory every agentic system in production or pilot. Score each against the impact and likelihood factors in IMDA Dimension 1.
For the systems that pass triage, document risk tiers and bound the design — least-privilege tools, scoped autonomy, defined area of impact.
Stand up agent identity. Even an internal cataloguing system that issues unique IDs to each deployed agent is a real step up from the shared service-account pattern.
Build the RACI. Use the PwC case study as a starting template.
Move guardrails from prompt-layer to tool-layer wherever possible.
Stage rollout: trained users → broader internal users → external.
Instrument continuous monitoring with defined alert thresholds and human-in-the-loop interventions.

If you already have a mature governance programme (NIST AI RMF in place, SOC 2 in place, ISO 42001 underway):

Layer the IMDA-specific techniques onto your existing Govern / Map / Measure / Manage cycle. Most of the framework is additive, not replacement.
Focus the additive work on agent identity, multi-agent emergent-behaviour testing, automation-bias monitoring of human reviewers, and the tradecraft retention plan in Dimension 4.
Use the control mapping table to avoid double-implementing controls.

If you are building agents and want a deeper engineering-grade walkthrough of architecture patterns, identity models, tool sandboxing, evaluation harness design, observability, and runtime guardrails, our companion Implementing Agentic AI Within the IMDA Framework — Engineering Guide is the long-form playbook.

A closing note

The IMDA framework's biggest contribution is not the four dimensions but the implicit posture. Agentic AI is real, the risks are not abstractions, and the answer is neither "ban it" nor "wait for regulators." The answer is to build it with the controls in place from day one. The case studies are evidence that this is operationally tractable: Dayos retiring ServiceNow, Tencent governing CodeBuddy at scale, GovTech rolling out coding agents to thousands of developers, Workday shipping HR agents with factsheets.

We build agentic AI systems for our clients. We educate teams to build their own. We treat the IMDA framework as the default architectural posture and the EU AI Act and NIST AI RMF as the regimes those systems have to satisfy. If you would like to talk about what that looks like for your stack, schedule a consultation — we will walk you through how the dimensions apply to your specific architecture, where the highest-leverage build investments are, and what we can take off your plate.

The frameworks are converging. The build patterns are converging. The question is whether your agentic AI programme converges with them — by design, or by surprise.