In February 2026, Goldman Sachs made an announcement that reverberated well beyond Wall Street: the investment bank had spent six months embedding Anthropic engineers within its technology teams to co-develop autonomous AI agents for trade accounting and client onboarding. It is the most ambitious move yet by a Tier-1 bank to automate rule-based back-office processes with AI agents – not as a pilot in an innovation sandbox, but as an operational tool for processes that move trillions in assets.
For European financial institutions, this step matters for two reasons. First, Goldman demonstrates what is technically possible when a bank treats AI agents not as chatbot extensions, but as autonomous process actors in core functions such as reconciliation, Know Your Customer (KYC) and Anti-Money Laundering (AML). Second, on 2 August 2026, the high-risk requirements of the EU Artificial Intelligence Act (EU AI Act) take effect – and AI agents that make compliance decisions fall squarely within that category.
What: Goldman Sachs co-develops autonomous AI agents with Anthropic for accounting, compliance and onboarding
Technology: Anthropic Claude, co-developed during a six-month embedded partnership
Use cases: Trade accounting, reconciliation, KYC/AML screening, trading surveillance
Results: Onboarding times –30%, developer productivity +20%, exception queues –80% (target)
EU context: From 2 August 2026, the EU AI Act's high-risk requirements apply to AI in financial functions
The Anthropic Partnership: From Coding Assistant to Compliance Agent
Six Months of Embedded Engineering
The path to autonomous AI agents at Goldman Sachs did not begin with a strategic master plan, but with an observation. When Goldman developers first tested Anthropic's Claude as a coding assistant, they noticed that its reasoning capabilities extended well beyond simple code generation. Marco Argenti, Goldman Sachs' Chief Information Officer (CIO), described the moment: engineers recognised that those capabilities were strong enough to handle more complex financial tasks.
What followed was a form of collaboration unusual in the financial industry. Goldman embedded Anthropic engineers for six months directly within its own technology teams. They worked alongside the bank's domain experts – not in a shielded lab, but directly on the processes to be automated. They observed how staff navigate their computers, where processes bottleneck, and what the actual work looks like in practice.
Jonathan Pelosi, Head of Financial Services at Anthropic, articulated the approach from the technology partner's perspective: Claude is designed for data-intensive reasoning problems requiring judgement – where rules alone fall short. This positioning is not mere marketing: trade accounting and compliance are precisely the domains where rule-based automation has hit its limits for decades, because exceptions, edge cases and context-dependent decisions account for the bulk of the workload.
The Concrete Use Cases: What Goldman Is Automating
Trade Accounting and Reconciliation
The first and most advanced deployment area is trade accounting. The AI agents process transaction data, cross-reference it against regulatory frameworks and identify discrepancies in the books. Goldman expects to reduce exception queues – those backlogs of unresolved discrepancies previously handled manually – by 80 per cent. This is not incremental improvement; it is a paradigm shift for a function that in most banks still relies on spreadsheets and manual workflows.
The decisive difference from previous automation attempts: the agents do not merely process structured data according to fixed rules. They apply context-dependent logic to multi-step workflows that previously required human judgement. In Argenti's words, teams could handle five to ten times the case volume – or the same volume in a fraction of the time.
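What such a triage layer might look like in code: the sketch below is a deliberately simplified illustration under assumed record fields (trade_id, isin, quantity, settlement_amount), not Goldman's implementation. Deterministic rules auto-clear exact matches within tolerance, near-misses go to the agent for context-dependent reasoning, and only the residue reaches humans.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TradeRecord:
    trade_id: str
    isin: str
    quantity: int
    settlement_amount: float  # in trade currency

def triage(internal: list[TradeRecord], external: list[TradeRecord],
           tolerance: float = 0.01) -> dict[str, list]:
    """Split trades into cleared, agent-review and human-escalation queues."""
    external_by_id = {t.trade_id: t for t in external}
    queues: dict[str, list] = {"cleared": [], "agent_review": [], "human_escalation": []}
    for trade in internal:
        match = external_by_id.get(trade.trade_id)
        if match is None:
            # No counterpart record at all: a classic exception-queue entry
            queues["human_escalation"].append(trade)
        elif (trade.isin == match.isin
              and trade.quantity == match.quantity
              and abs(trade.settlement_amount - match.settlement_amount) <= tolerance):
            # Deterministic match within tolerance: auto-clear
            queues["cleared"].append(trade)
        else:
            # Fields disagree: candidate for context-dependent agent reasoning
            queues["agent_review"].append((trade, match))
    return queues
```

The interesting queue is the middle one: rule-based automation historically stopped at "cleared", so everything else landed with humans. Moving the agent-review band out of the human queue is where a target like –80 per cent would have to come from.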
Client Onboarding and KYC/AML
The second core use case involves client onboarding, including KYC and AML checks. The AI agents review documents, extract relevant entities, analyse ownership structures and identify beneficial owners. Internal tests showed a 30 per cent reduction in onboarding times.
Particularly noteworthy is the agents' ability to detect discrepancies in identity document verification – a task that previously required highly qualified compliance staff. The agent checks not only whether a document is formally correct, but can flag inconsistencies between different data sources, thereby identifying potential money-laundering risks early.
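A minimal sketch of what such a cross-source consistency check could look like, with hypothetical field names and deliberately naive normalisation; in production the extracted values would come from document-analysis models rather than hand-built dictionaries:

```python
def cross_check(sources: dict[str, dict[str, str]]) -> list[str]:
    """Flag fields whose values disagree across identity sources."""
    flags = []
    all_fields = {field for record in sources.values() for field in record}
    for field in sorted(all_fields):
        # Normalise whitespace and case before comparing across sources
        values = {name: " ".join(rec[field].split()).casefold()
                  for name, rec in sources.items() if field in rec}
        if len(set(values.values())) > 1:
            flags.append(f"{field}: inconsistent across {sorted(values)}")
    return flags

# Hypothetical extracted data from three sources for one onboarding case
print(cross_check({
    "passport":         {"name": "Jane Doe", "dob": "1980-04-02"},
    "company_registry": {"name": "J. Doe",   "dob": "1980-04-02"},
    "utility_bill":     {"name": "Jane Doe"},
}))
# -> ["name: inconsistent across ['company_registry', 'passport', 'utility_bill']"]
```

The flag itself decides nothing; it routes the case to an agent or a human for judgement, which is where the reported time savings accumulate.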
Trading Surveillance: The Parallel Race with Deutsche Bank
In parallel with the accounting initiative, Goldman Sachs is working on another particularly sensitive deployment area: trading surveillance. Deutsche Bank is pursuing the same goal, and both institutions have independently begun developing the next generation of market abuse detection.
The fundamental difference from conventional surveillance systems: instead of applying static rules with fixed thresholds, the AI agents analyse trading behaviour across multiple signals in real time. They compare current activity against historical patterns and flag constellations that do not violate any clear rule but nonetheless stand out as unusual. Goldman is collaborating with Anthropic, while Deutsche Bank partners with Google Cloud.
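The core idea – flagging a constellation of individually tolerable deviations – can be sketched in a few lines. The following is an illustrative simplification using z-scores against a trader's own history with assumed thresholds, not either bank's method:

```python
import statistics

def needs_review(history: dict[str, list[float]], today: dict[str, float],
                 soft: float = 2.0, hard: float = 4.0) -> bool:
    """Flag a trader-day when one signal is a hard outlier, or when several
    signals are simultaneously soft outliers versus the trader's own history."""
    soft_hits = hard_hits = 0
    for signal, past in history.items():
        if signal not in today or len(past) < 2:
            continue
        mu, sigma = statistics.fmean(past), statistics.stdev(past)
        if sigma == 0:
            continue
        z = abs(today[signal] - mu) / sigma
        if z >= hard:
            hard_hits += 1
        elif z >= soft:
            soft_hits += 1
    # The "constellation": no single hard breach is needed if the pattern is odd
    return hard_hits > 0 or soft_hits >= 2
```

A static rule engine answers "did value X cross threshold Y?"; the constellation logic answers "is this day unusual for this trader?" – which is precisely the judgement layer the new systems aim at.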
Trade Accounting: Transaction matching, exception handling, rule-based bookkeeping – Target: –80% exception queues
Client Onboarding: KYC, AML screening, document review, ownership analysis – Result: –30% onboarding time
Trading Surveillance: Real-time behavioural analysis across multiple signals (in parallel with Deutsche Bank) – Pilot phase
Developer Productivity: Autonomous coding agent Devin for 12,000 developers – Result: 3–4x productivity gain
Governance: Human-in-the-Loop Is Not Enough
Goldman's Control Architecture
Goldman Sachs has built a multi-layered control system for its AI agents that goes beyond the standard "human-in-the-loop" promise. The architecture comprises four layers: programmatic validation layers for deterministic verification, Retrieval Augmented Generation (RAG) for source grounding, chain-of-verification prompting for multi-step plausibility checks, and full source attribution with a comprehensive audit trail.
What matters is the interplay: the AI agents do not operate in a vacuum but are embedded in a mesh of rule-based systems that validate the neural network's output against deterministic business logic. Every agent action is logged, every decision path can be traced by risk officers and regulators.
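In code, the outermost of those layers – deterministic validation plus an append-only audit record for every agent decision – might look like the sketch below. Function and field names are assumptions for illustration; the actual systems are not public.

```python
import datetime
from typing import Callable, Optional

# A check inspects an agent's output and returns an error message, or None if it passes
Check = Callable[[dict], Optional[str]]

def validate_and_log(agent_output: dict, checks: list[Check],
                     audit_log: list[dict]) -> bool:
    """Gate an agent decision through deterministic business logic,
    then append an audit record whether it passes or not."""
    errors = [msg for check in checks if (msg := check(agent_output)) is not None]
    audit_log.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "decision": agent_output.get("decision"),
        "sources": agent_output.get("sources", []),  # RAG source attribution
        "errors": errors,
        "accepted": not errors,
    })
    return not errors

# Two example checks expressing deterministic business rules
checks: list[Check] = [
    lambda o: None if o.get("sources") else "no source attribution",
    lambda o: None if abs(o.get("residual", 0.0)) <= 0.01 else "residual above tolerance",
]
```

The design point is that the log is written on every path, accepted or rejected: an audit trail that only records successes would be useless to a risk officer.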
Who Is Liable When an Agent Decides?
The central question Goldman has so far answered only implicitly is: who bears responsibility when an AI agent makes an error in a compliance process? The bank emphasises that human staff intervene at multiple points and retain final authority. Yet as the degree of automation increases, this boundary becomes more diffuse.
The EU AI Act provides a clear answer: responsibility lies with the deployer – that is, the bank, not the technology provider. Article 26 obliges the deploying organisation to ensure the AI system operates safely and lawfully. IT leaders bear ultimate accountability. If an institution cannot trace an agent's actions or control its authority, it cannot demonstrate safe and lawful operation to regulators.
| Requirement | EU AI Act (from August 2026) | Goldman Sachs Approach |
|---|---|---|
| Audit Trail | Centralised, encrypted activity log for all AI actions | Full source attribution and comprehensive audit trail |
| Human Oversight | Humans must be able to reject any proposed action; sufficient context for informed decisions | Human-in-the-loop at multiple stages; agents framed as "digital colleagues" |
| Agent Registry | Every agent uniquely identified with documented capabilities and permissions | Model Risk Management (MRM) framework with bias detection |
| Explainability | Third-party AI systems must be interpretable by users | Chain-of-verification prompting; RAG source grounding |
| Accountability | Deployer (bank) is responsible, not the AI provider | Internal accountability emphasised; legal delineation with Anthropic not public |
The Industry Is Moving: From Pilots to Scale
BNY Mellon: Digital Employees with Their Own Logins
Goldman Sachs is not alone. Bank of New York Mellon (BNY) has adopted an approach that takes the "digital colleague" logic even further: its AI agents receive their own logins, are assigned to specific teams, and will soon gain access to email and Microsoft Teams. BNY's AI Hub developed two digital employee personas – one focused on identifying and fixing code vulnerabilities, the other on validating payment instructions. Development of both took three months.
The difference from Goldman lies in the framing: while Goldman positions its agents as process-supporting tools, BNY treats them as organisational units with defined roles and communication channels. Both approaches have governance implications – but BNY's model raises additional questions: what happens when an AI agent independently sends an email to a compliance officer containing an incorrect risk assessment?
The Industry Numbers
The NVIDIA State of AI in Financial Services Report 2026 quantifies the adoption rate: 82 per cent of surveyed financial firms are deploying AI, 21 per cent have already deployed AI agents, and 42 per cent are using or evaluating agentic AI. At the same time, the survey reveals the reality beyond the headlines: only ten per cent of companies have scaled the technology. The rest remain in the pilot phase.
KPMG estimates global spending on agentic AI at USD 50 billion in 2025, with the share of finance teams using agentic AI projected to reach 44 per cent in 2026 – a rise of over 600 per cent. And according to a survey of banking executives, 57 per cent expect AI agents to be fully embedded in risk, compliance and audit functions within three years.
The EU AI Act: The Deadline That Does Not Apply to Goldman – But Very Much to European Banks
What Takes Effect from August 2026
On 2 August 2026, the EU AI Act's requirements for high-risk AI systems come into force. AI agents deployed in financial functions such as compliance, risk assessment or regulatory reporting fall under this category. The requirements are substantial: technical documentation of the decision logic, an open-loop architecture that prevents the system from operating in isolation, structured human oversight with clear intervention points, and control mechanisms that allow the system to be stopped or corrected.
For European banks seeking to adapt Goldman Sachs' approach, this means: the use case is promising, but the regulatory requirements are higher than in the United States. An AI agent that makes KYC decisions or conducts trading surveillance must not only function technically – it must be demonstrably compliant. The EU AI Act's penalty provisions leave no margin: up to EUR 35 million or seven per cent of global annual turnover for prohibited practices, up to EUR 15 million or three per cent for other infringements.
The Governance Gap
The fundamental problem for European banks is not the technology – it is the governance. The EU AI Act requires that authorities may demand access to logs and technical documentation at any time. Agentic AI poses a particular challenge here, because these systems can act without leaving a clear record of what they did, when and why. Providing merely a prompt or a confidence score is insufficient – decision-makers need information about context, agent authority and the available intervention time.
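What a sufficient record might contain can be made concrete. The schema below illustrates the principle – context, authority and intervention window captured per action – and is an assumption for illustration, not an EU AI Act template:

```python
import datetime
import json
from dataclasses import dataclass, field, asdict

@dataclass
class AgentActionRecord:
    """One audit-trail entry per agent action; field names are illustrative."""
    agent_id: str
    action: str                 # what the agent did
    rationale: str              # why, in the agent's own justification
    inputs: list[str]           # referenced documents and data sources
    authority: str              # the permission under which it acted
    intervention_window_s: int  # how long a human could still veto
    ts: str = field(default_factory=lambda:
                    datetime.datetime.now(datetime.timezone.utc).isoformat())

record = AgentActionRecord(
    agent_id="kyc-screener-07",
    action="flagged onboarding case 4711 for enhanced due diligence",
    rationale="ownership chain crosses two high-risk jurisdictions",
    inputs=["doc:passport-scan-4711", "registry:ubo-lookup-4711"],
    authority="compliance-triage:flag-only",
    intervention_window_s=86_400,
)
print(json.dumps(asdict(record), indent=2))
```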
Goldman Sachs' approach – the combination of RAG, chain-of-verification and programmatic validation – points in the right direction. Whether it would withstand European regulatory scrutiny, however, remains to be seen. The projected annual cost of non-compliance for the European financial sector exceeds EUR 2.5 billion.
What European Banks Can Learn from Goldman – and What They Cannot
Goldman's initiative reveals three patterns that extend beyond the specific use case. First, the most successful AI agent implementations begin not with the technology but with the observation of real work processes. The six-month embedding of Anthropic engineers is resource-intensive, but it prevents the most common error – applying AI to processes one does not understand deeply enough.
Second, the transition from coding assistant to compliance agent was not a planned strategic pivot but an emergent insight. This argues for an exploratory approach to AI agent adoption – but it also warns: those who migrate AI agents into critical compliance functions need governance that keeps pace with the technology's capability.
Third, Goldman's approach demonstrates the limits of transferability. A bank with 46,000 employees, 12,000 developers and the capacity to embed external engineers for six months operates in a different reality from a mid-sized European bank with a constrained IT budget and regulatory pressure on multiple fronts.
Recommendations for European Financial Institutions
The Goldman Sachs case demonstrates that AI agents in accounting and compliance are operationally viable. European banks face a dual challenge: maintaining technological relevance while simultaneously meeting the EU AI Act's regulatory requirements from August 2026.
Immediately: Every institution should inventory its AI systems and assess which fall under the EU AI Act's high-risk category. AI agents in compliance, KYC/AML, risk assessment and regulatory reporting are very likely affected. By August 2026, technical documentation, human oversight mechanisms and audit trails must be in place.
Q2 2026: The EU AI Act requires that every agent be uniquely identified and its authorities documented. Institutions should establish a central agent registry capturing capabilities, permissions and decision logs for every deployed AI agent (a minimal sketch follows after this list). A comprehensive audit trail is not optional best practice – it is a regulatory obligation.
Q2–Q3 2026: Goldman's control architecture – programmatic validation, RAG source grounding, chain-of-verification – offers a blueprint European banks can adapt. The critical point: governance infrastructure must precede agent scaling, not follow it. Concretely, this means extending model risk management frameworks to cover AI agents, assigning clear accountabilities and defining escalation paths.
Q3 2026: Trade reconciliation, KYC onboarding and transaction monitoring offer – as the Goldman case demonstrates – the greatest potential for AI agent automation. European banks should analyse these processes in detail: where are the exception queues? Which steps require judgement, which are rule-based? Goldman's method of process observation through embedded engineers is resource-intensive but effective.
Ongoing: 92 per cent of banks report a skills gap in AI agent deployment. The compliance officer's role is shifting from manual case handler to AI supervisor. Institutions must invest in training – not only for IT teams but for compliance, risk and audit functions that will increasingly evaluate and approve AI agent outputs.
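For the agent registry recommended above (Q2 2026), a minimal sketch of the underlying data structure – illustrative field names, with no claim to match any regulator's template:

```python
from dataclasses import dataclass

@dataclass
class RegisteredAgent:
    agent_id: str            # unique identity, per the documentation duty
    owner: str               # the accountable human or team
    capabilities: list[str]  # what the agent can do
    permissions: list[str]   # what it is allowed to touch
    decision_log_uri: str    # where its audit trail lives

class AgentRegistry:
    """Central inventory: an agent that is not registered must not act."""
    def __init__(self) -> None:
        self._agents: dict[str, RegisteredAgent] = {}

    def register(self, agent: RegisteredAgent) -> None:
        if agent.agent_id in self._agents:
            raise ValueError(f"duplicate agent id: {agent.agent_id}")
        self._agents[agent.agent_id] = agent

    def lookup(self, agent_id: str) -> RegisteredAgent:
        # A KeyError here means an unregistered agent tried to act
        return self._agents[agent_id]
```

The registry is deliberately boring infrastructure: its value lies in being the single place a regulator, auditor or risk officer can ask "which agents exist, what may they do, and where are their logs?"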
Goldman Sachs has set a reference point with its AI agents for accounting and compliance against which the entire industry will be measured. The question for European banks is not whether they will deploy AI agents in these functions – but whether they are prepared to do so under the EU AI Act's stricter regulatory conditions before the competitive advantage of early adopters becomes insurmountable.