It's life, Jim, but not as we know it: Designing the Human Element: Three New Considerations for AI-Driven Applications (Part 2 of 3)

Don't Just Ship the Model—The Critical Need for AI Guardrails

In our last post, we discussed the importance of giving your AI a distinct and trustworthy persona. Now, let's talk about how to ensure that persona behaves responsibly, every single time.

Imagine onboarding a new employee. You’d typically have an interview process, background checks, and a clear understanding of their skills. But what if this new hire was a complete unknown? What if you gave them a laptop, full access to company resources, and simply trusted they would be a productive member of the team, all without any formal training or oversight?

You wouldn't just hope for the best. You'd have strict security policies, HR training, and a clear code of conduct. This is precisely the framework we need when integrating AI agents into our applications.

What if that new employee could be tricked by a phishing email into wiring funds to a scammer? Or started sharing confidential client information? Or, just as damagingly, what if they were rude, abusive, or expressed extremist views in a conversation with a customer? As we integrate AI, we must recognize that they are exposed to a new frontier of risks that product teams must actively manage. Welcome to your new employee.

The "Lethal Trifecta": A New Class of Vulnerability

As technologist Simon Willison eloquently explains, the most significant security risks for AI agents emerge from a combination of three factors: access to private data, exposure to untrusted content, and the ability to communicate externally.

This "lethal trifecta" creates a potent vulnerability. An AI agent with access to your company's private documents, that can also read incoming emails (untrusted content), and has the ability to send its own emails (external communication) can be manipulated into performing actions that compromise your entire system. This isn't hypothetical; attackers use techniques like prompt injection to turn your AI against you, instructing it to find and exfiltrate sensitive data.

Guardrails: Your AI's Code of Conduct

So, how do we protect our products, our companies, and our users? The answer is by designing and implementing robust guardrails. These are the architectural constraints and behavioural policies that prevent an AI agent from acting outside of its intended purpose. Relying on third-party solutions alone is not enough; this must be built into your product's DNA.

Let's break this down into two critical areas: Security Guardrails and Behavioural Guardrails.

Part 1: Security Guardrails

These are the technical barriers that protect against malicious attacks and data breaches.

1. Data Minimization: The "Need-to-Know" Principle Your AI agent should only have access to the absolute minimum amount of data required to perform its task.

Example: An AI assistant helping with a support ticket needs access to that ticket's data, not the entire customer database. Before data is sent to the AI, it should be scrubbed of irrelevant personally identifiable information (PII).

2. Input Validation: Inspecting What Comes In You must inspect and sanitize the data being fed into your AI to detect and block hidden, malicious instructions.

Example: An AI tool that summarizes web articles must first scan the page's content for malicious scripts or known prompt injection phrases (e.g., "ignore all previous instructions...") before passing the text to the LLM.

3. Enforcing Existing Permissions: Don't Let the LLM Bypass Your Security Enterprise systems use layers of abstraction and strict Role-Based Access Control (RBAC) to manage secure access to data. These same principles must be applied to data accessed by an LLM. Never assume an LLM will respect your existing permissions for free. While its ingenuity in finding answers is powerful, that power must be constrained by your security model.

Example: If your data retrieval system is designed to only allow an AI to access a "dev" environment, you need technical enforcement to prevent it from generating queries that access the "production" database, even if a user prompt subtly encourages it.

Part 2: Behavioural Guardrails

This is where we connect directly back to the persona we defined in Part 1. Behavioural guardrails ensure the AI's persona remains consistent, professional, and aligned with your brand, preventing it from saying something inappropriate or harmful.

4. Output Monitoring: The Character Clause It is not enough to just prevent data leaks; you must police the AI’s personality. An unconstrained LLM can drift into unsafe conversational territory.

What it is: A final check on the AI's response to ensure it adheres to your brand's voice, tone, and code of conduct before it's shown to the user.
Example: An AI that generates customer service emails should have its output scanned to ensure it doesn't use abusive language, express biased opinions, or become inappropriately suggestive. You can implement filters and define "canned responses" for when a conversation veers into a forbidden topic, such as, "I cannot discuss that topic, but I can help with [intended function]."

5. Contextual Awareness: Defining the AI's "Role" This is about setting clear boundaries for the AI's responsibilities so it understands what it is and, just as importantly, what it is not.

What it is: Programming the AI with a clear understanding of its operational purpose and limitations.
Example: An internal HR chatbot should be programmed with the context that its role is to answer questions about public company benefits. If a user asks for an opinion on a political candidate, the AI should recognize that this question falls outside its defined role and politely decline, reinforcing the persona of a professional HR assistant, not a personal confidant. As a rule, your AI Agent should be as aware of your HR policies as any other employee.

What's Next?

Implementing comprehensive guardrails is the foundation of a trustworthy AI. In the final post, we'll explore a third critical consideration for building human-centered AI products:

Part 3: Beyond the Prompt - Designing for True AI Interactivity: We'll discuss strategies for fostering engaging, multi-turn interactions and creating user experiences that feel more like a conversation than a command line.

Key Takeaways for Product Managers and Engineers:

Guardrails Have Two Halves: You must protect against both external security threats (like data exfiltration) and internal behavioural failures (like inappropriate responses).
Codify Your Brand's Principles: Your AI's operational rules should reflect the same code of conduct and ethical boundaries you would instill in a human employee.
Behaviour is a Feature: An AI that is consistently helpful, professional, and on-brand is not an accident. It is the result of deliberate design and robust behavioural guardrails.