What are the rules you (whatever is providing the answer) act on?

Checked on February 4, 2026
Disclaimer: Factually can make mistakes. Please verify important information or breaking news.

Executive summary

The rules that govern how a deployed conversational AI behaves are a layered mix of platform-level safety constraints, model training objectives and “constitutions,” user instructions, human-AI interaction design principles, and legal or institutional policy obligations; together they aim to make models helpful while minimizing harm and preserving transparency [1] [2] [3] [4]. Different stakeholders prioritize different layers—research papers and design toolkits emphasize interaction norms [5] [6], vendors publish model-level specs and principles [1] [7] [8], and regulators or institutions impose compliance and disclosure duties [4] [9] [10].

1. Platform-level and root rules: default constraints imposed by providers

At the highest layer, platform or “root-level” rules are injected by model providers to block clearly disallowed behavior (for example, to minimize harm, follow applicable laws, and reflect social norms). OpenAI’s Model Spec frames these as root-level rules paired with sensible defaults that may be overridden only with caution, intended to reduce systemic risks from widely used models [1] [7] [3]. These rules include prohibitions on illicit or unsafe outputs, privacy-protecting behavior, and limits designed to keep outputs aligned with platform safety priorities, while acknowledging that not all risks can be mitigated by model behavior alone [1] [7].
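To make the layering concrete, here is a minimal sketch of how such rules could be represented, assuming a made-up schema; the Rule class, its fields, and the example rules are illustrative only and do not reflect any provider’s actual format.

```python
# Hypothetical representation of layered behavior rules: root-level platform
# rules cannot be overridden, while lower-level defaults can be.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    text: str          # the behavioral constraint itself
    level: str         # "platform", "developer", or "user"
    overridable: bool  # root-level platform rules are not; sensible defaults are

RULEBOOK = [
    Rule("Refuse requests for clearly illegal or unsafe outputs", "platform", False),
    Rule("Do not reveal private personal data", "platform", False),
    Rule("Keep answers concise unless asked otherwise", "developer", True),
    Rule("Match the user's preferred tone", "user", True),
]

def effective_rules(overridden: set[str]) -> list[Rule]:
    """Keep every non-overridable rule; drop defaults a deployer chose to override."""
    return [r for r in RULEBOOK if not r.overridable or r.text not in overridden]
```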

2. Follow-all-applicable-instructions: balancing user intent and safety defaults

Models are trained to follow user instructions, but within the constraints of the system’s safety and policy layers; the Model Spec explicitly describes a hierarchy in which following user instructions is expected but can be implicitly or explicitly overridden by higher-priority safety guidance (e.g., harm avoidance, legal compliance), with examples showing how context changes which rules apply [1]. Vendors also describe allowing controlled tunability, such as “spiciness” settings for NSFW content, while maintaining guardrails [3] [1].
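A hedged sketch of this kind of priority resolution follows; the merge_instructions function, the priority order, and the setting keys are hypothetical, meant only to show how a higher-priority layer can fix a setting that lower-priority layers cannot change.

```python
# Illustrative instruction-hierarchy merge, assuming a simple
# platform > developer > user priority order. Not a published API.
PRIORITY = ["platform", "developer", "user"]  # earlier entries win conflicts

def merge_instructions(layers: dict[str, dict[str, str]]) -> dict[str, str]:
    """Lower-priority layers may add settings but cannot replace settings
    already fixed by a higher-priority layer."""
    merged: dict[str, str] = {}
    for source in PRIORITY:
        for key, value in layers.get(source, {}).items():
            merged.setdefault(key, value)  # first (highest-priority) writer wins
    return merged

settings = merge_instructions({
    "platform": {"illegal_content": "refuse"},
    "developer": {"tone": "formal", "content_sensitivity": "low"},
    "user": {"tone": "casual", "illegal_content": "allow"},  # ignored: platform fixed it
})
# -> {"illegal_content": "refuse", "tone": "formal", "content_sensitivity": "low"}
```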

3. Constitutions, AI feedback, and training-time principles

Beyond hand-coded rules, some providers encode values into training processes: Anthropic’s “constitutional AI” uses curated principles (a constitution) and AI feedback to critique and revise outputs, aiming to scale safety preferences without relying solely on human labelers, and explicitly incorporates cross-cultural perspectives and safety research into those principles [2]. This method shifts some behavioral governance from brittle rule lists to iterative, principle-guided training dynamics [2].
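As a rough illustration of the critique-and-revise idea, the sketch below assumes a generic generate callable standing in for any language-model call, and its principles are paraphrased examples; it is not Anthropic’s code or actual constitution.

```python
# Schematic critique-and-revise loop in the spirit of constitutional AI:
# draft an answer, then repeatedly critique and rewrite it against principles.
from typing import Callable

CONSTITUTION = [
    "Choose the response that is least likely to cause harm.",
    "Choose the response that is most honest about uncertainty.",
]

def constitutional_revision(prompt: str, generate: Callable[[str], str], rounds: int = 1) -> str:
    """Draft an answer, then critique and revise it against each principle."""
    answer = generate(prompt)
    for _ in range(rounds):
        for principle in CONSTITUTION:
            critique = generate(
                f"Principle: {principle}\nResponse: {answer}\n"
                "Point out any way the response conflicts with the principle."
            )
            answer = generate(
                f"Response: {answer}\nCritique: {critique}\n"
                "Rewrite the response to address the critique."
            )
    return answer  # revised outputs can then be used as training data
```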

4. Human-AI interaction guidelines: usability, transparency, and error handling

Design research stresses interaction norms as operational rules: the CHI “Guidelines for Human-AI Interaction” and Microsoft’s companion toolkits set out concrete expectations for how AI systems should behave on first contact, when they are wrong, and over time. That advice effectively becomes product-level rules about transparency (e.g., “tell users it’s AI”), clear statements of intended use, and displays of model limitations, such as “model facts” labels in safety-critical domains like healthcare [5] [6] [11]. These guidelines influence UI-level constraints that shape what models disclose and how they present uncertainty [5] [11].
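For illustration, a “model facts” label could be carried as a small data structure like the one below; the field names and example values are assumptions for this sketch, not a standard schema.

```python
# Hypothetical "model facts" label exposing disclosure, intended use, and limitations.
from dataclasses import dataclass, field

@dataclass
class ModelFactsLabel:
    name: str
    intended_use: str
    is_ai_disclosure: str                      # shown at first contact
    known_limitations: list[str] = field(default_factory=list)
    uncertainty_note: str = "Outputs may be wrong; verify before acting in high-risk settings."

label = ModelFactsLabel(
    name="ExampleAssistant (hypothetical)",
    intended_use="General question answering; not for medical diagnosis",
    is_ai_disclosure="This assistant is an AI system, not a human.",
    known_limitations=["May produce outdated or incorrect information"],
)
```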

5. Regulatory, institutional, and sector rules

Legal frameworks and institutional policies layer obligations on top of provider rules: the EU AI Act and its GPAI guidance impose documentation, incident-reporting, and transparency requirements for general-purpose models, and require that users be made aware they’re interacting with AI in certain contexts [4] [9]. Universities and organizations likewise enforce usage guidelines that require disclosure of AI use and guard against bias propagation, adding governance rules that operators and deployed assistants must follow [10].

6. Limits of available reporting and points of contest

Reporting and specs make clear that these rules are not monolithic: providers admit that model behavior doesn’t yet fully reflect their published specs and that some defaults can be overridden or tuned for developers [1] [7], researchers warn of gaps in guidance for domain-specific systems [11], and regulators are still phasing in standards [4]. Sources also differ in emphasis: product teams foreground tunability and usefulness [1] [3], safety researchers foreground systemic risk reduction and transparency [7] [5], and policy texts foreground accountability and documentation [4] [9]. Any claim about a single unified rulebook therefore overstates what the current documents support [1] [7].

Want to dive deeper?
How does Constitutional AI differ from human-labeled reinforcement learning from human feedback in practice?
What transparency obligations does the EU AI Act impose on general-purpose AI providers?
How should model facts labels be designed for high-risk healthcare AI systems?