Imagine a morning in 2028. Your news portal delivers stock prices, quarterly figures, sports results and weather reports – all generated by algorithms that scan press releases, databases and archives in fractions of a second. The routine news item is perfected, flawlessly written, available in twelve languages. Then something unexpected happens: a mid-sized company systematically conceals its financial statements, a government ministry issues internal instructions that contradict its official communications, a pharmaceutical company suppresses clinical trial data. Nobody reports on it. Not because the algorithms fail – but because what they could find simply does not exist. Not yet.
This scenario is not a dystopia but the logical consequence of a technology whose strengths and limitations are systematically confused in public debate. Retrieval-Augmented Generation – RAG for short – is currently the most powerful method for connecting large language models with current, specific knowledge. Yet precisely because RAG is so good at assembling what already exists, its success obscures a fundamental limitation: the creation gap.
What Retrieval-Augmented Generation Can Really Do
To understand the creation gap, one must first grasp what RAG actually achieves. The method, first formalised in 2020 by researchers at Meta AI, combines two capabilities: the targeted retrieval of relevant information from external data sources and the linguistic processing of that information by a generative language model.
1. Retrieval: A search query is converted into a mathematical vector (an embedding) and matched against a vector database. The system identifies the most relevant text passages – similar to a highly specialised search engine that filters not by keywords but by semantic proximity.
2. Augmentation: The retrieved passages are passed to the language model together with the original question as context. The model thereby gains access to information that goes beyond its training knowledge.
3. Generation: The language model formulates an answer based on the retrieved information. It synthesises, condenses and contextualises – but ideally adds nothing fictitious.
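The three steps above can be sketched in a few lines. What follows is a deliberately minimal toy model – the bag-of-words "embedding", the sample passages and the prompt template are all invented for illustration; production systems use learned dense embeddings, a real vector database and an actual language model for the generation step:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: a bag-of-words frequency vector.
    # Real RAG systems use learned dense embeddings instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Semantic proximity as cosine similarity between vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stand-in for an indexed document store.
passages = [
    "Quarterly profit rose 12 percent on higher subscription revenue.",
    "The match ended 2-1 after a late penalty.",
    "Heavy rain is expected across the region on Friday.",
]

def retrieve(query, k=1):
    # Step 1: rank indexed passages by similarity to the query.
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Step 2: augmentation – retrieved text becomes the model's context.
    context = "\n".join(retrieve(query))
    # Step 3 (generation) would hand this prompt to a language model.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("How did quarterly profit develop?"))
```

The sketch makes the article's point visible in miniature: the system can only ever return what is already in `passages`. A question about something never indexed retrieves nothing useful, however fluent the generation step may be.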
The strengths of this approach are considerable. RAG systems can bring together information across linguistic and format boundaries. They recognise patterns in volumes of data that no human analyst could work through in a reasonable time. They operate around the clock, never tire and forget nothing that has been indexed. For news production, this means: routine reports on quarterly results, sports scores, stock movements or government statistics can be produced at a quality and speed that human newsrooms cannot match.
RAG is, however, only one building block in an increasingly complex ecosystem. Modern AI agents combine RAG with autonomous tool use, web research and independent hypothesis formation. So-called agentic AI systems can query databases, detect anomalies in documents and independently pursue lines of enquiry – capabilities that go far beyond mere text retrieval. The question of whether AI can discover something genuinely new must therefore be posed more carefully than a focus on RAG alone would suggest.
And yet even the most advanced AI architecture encounters a limit that is not technical but epistemological in nature. Even autonomous agents operate within formalised rules and digitised data spaces. They can flag an anomaly in a balance sheet – but they cannot make the phone call that persuades an insider to reveal the truth behind the anomaly. Both RAG and agentic AI operate within the horizon of what has already been digitally captured, structured and made retrievable. They are systems of recombination and pattern recognition – not of the social interaction that makes investigative work possible in the first place.
The Blind Spot: What Is Not in the Database
The creation gap describes the architectural divide between the recombination of existing information and the emergence of original insight. It is not a deficiency that could be remedied by better models, larger databases or faster processors. It is inscribed in the method itself.
The difference can be illustrated by a simple pair of terms: reporting news and discovering news are two fundamentally different activities. Reporting means processing known facts linguistically – summarising a press release, analysing an annual report, contextualising a speech. RAG can do this superbly. Discovering, by contrast, means making a fact visible for the first time – through research, through building relationships of trust with sources, through the persistent questioning that uncovers a contradiction no one had yet articulated.
An algorithm that accesses a knowledge base can determine that a company reports high profits. It can draw historical comparisons, aggregate analyst opinions and identify industry trends. What it cannot do is develop the suspicion that those profits might be fabricated. For that suspicion presupposes an insight that exists in no database – it must first be generated, through critical thinking, through experience, through the ability to hear what is left unsaid.
The Thought Experiment: A World Without Journalists
To appreciate the full implications of the creation gap, a thought experiment is instructive. What would happen if newsrooms were entirely replaced by RAG systems?
Press conferences without a critical press would be pure announcement events. Every piece of information released by a company, a public agency or a government would flow into the information cycle unchecked. The asymmetry between sender and receiver, which journalism at least partially corrects through investigative reporting, would be abolished.
The whistleblower problem sharpens this consideration. People who expose wrongdoing need a counterpart they trust. They need source protection, journalistic diligence and the assurance that their information will be handled responsibly. An algorithm offers none of this. No RAG system has a telephone number that a concerned employee can ring after hours. No language model can grant source protection or verify the credibility of information through cross-checking.
The Wirecard Scenario
The Wirecard case illustrates this problem with full force – and at the same time its nuances. There were in fact isolated critical reports as early as 2008; from 2016 onwards, short sellers such as Zatarra Research published detailed analyses pointing to inconsistencies in the balance sheet structure. A capable anomaly detection system might also have flagged these discrepancies. The warning signs were in the data – if you knew what to look for.
Yet detection is not disclosure. It took journalists at the Financial Times, notably Dan McCrum and his team, who pursued these leads over years, built sources in Asia, verified documents on the ground and withstood massive legal pushback – lawsuits, intimidation, BaFin filing criminal complaints against the journalists rather than the company. The realisation that 1.9 billion euros in trust accounts did not exist could not be extracted from a database. It had to be generated through human investigation, at personal risk, against institutional resistance. No algorithm withstands that kind of pressure.
Information Entropy and Model Collapse
The problem extends beyond individual cases. In computer science, the concept of information entropy describes the degree of unpredictability in a system. The more homogeneous the available information, the lower the entropy – and the less that is genuinely new can be derived from it.
If RAG systems increasingly produce content that in turn serves as training and retrieval data for other RAG systems, a feedback loop emerges. AI researchers call this model collapse: a gradual loss of diversity and differentiation that occurs when generative systems are predominantly trained on machine-generated data. The information landscape does not become wrong in the strict sense – it becomes uniform. And uniformity is the opposite of what an open society needs in terms of information diversity.
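The entropy concept can be made concrete with a small calculation. Shannon entropy, H = -Σ p(x) log₂ p(x), measures the unpredictability of a distribution; the two token lists below are invented purely to illustrate the contrast between a diverse and a homogenised information landscape:

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    # H = -sum over tokens of p(x) * log2(p(x)).
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Eight distinct tokens: maximally diverse for this length.
diverse = "fraud audit leak probe source tip anomaly dissent".split()
# The same two tokens repeated: the homogenised case.
uniform = "profit rises profit rises profit rises profit rises".split()

print(shannon_entropy(diverse))  # 3.0 bits
print(shannon_entropy(uniform))  # 1.0 bit
```

The feedback loop the paragraph describes pushes the distribution of published content toward the second case: each generation of machine-written material trained on the previous one narrows the distribution, and with lower entropy there is simply less information from which anything new could be derived.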
Journalism as a Process of Discovery
The question of how the new enters the world is at its core an epistemological one. It touches on the philosophical question of whether knowledge arises exclusively through the recombination of existing information, or whether there are processes that enable genuine insight beyond the known.
Hannah Arendt coined the phrase "thinking without a banister" – the ability to judge without pre-made categories, to expose oneself to the unknown and to create meaning where none yet exists. Artificial intelligence thinks, if one may use the term at all, by definition with a banister. Its banister consists of training data, retrieval databases and the statistical patterns it has learned. Within this banister it moves with impressive confidence. Beyond it – quite literally – there is nothing.
The fourth estate – the press's oversight function vis-à-vis the state and the economy – rests on precisely this capability. Admittedly, in practice it is the exception rather than the rule. Many newsrooms reproduce agency dispatches, follow routines and work under time and cost pressure far removed from any philosophical ambition. But the possibility that a journalist might stand against the prevailing narrative, ask an uncomfortable question whose answer no one yet knows, pursue a suspicion without knowing whether it will be confirmed – this possibility is constitutive for an open society.
Equally, it would be too narrow to regard professional journalism as the sole source of the new. Scientists, activists, OSINT specialists such as Bellingcat and whistleblower platforms produce original insights that reshape public discourse. Yet they all operate within structures – peer review, legal protection, editorial scrutiny – that ensure quality. The question is not whether only journalists discover new things, but whether professional structures exist that verify, contextualise and responsibly publish discoveries.
A RAG system can perform a balance sheet analysis that surpasses any human analyst in technical precision. But it cannot develop the impulse to question a balance sheet that appears correct at first glance. This asymmetry is, given the current state of technology, fundamental.
The Tool, Not the Replacement
The analysis so far might suggest that RAG is a threat to journalism. The opposite is the case – if properly understood. The real danger lies not in the technology but in confusing it with what it is not.
Perhaps the most compelling example of human-machine collaboration is the Panama Papers. When the Süddeutsche Zeitung received a dataset of 11.5 million documents in 2015, it was clear that no team of journalists could have evaluated this volume manually. Only the use of text recognition, pattern detection and document classification software made the evaluation possible. Over 400 journalists in 80 countries worked together, supported by algorithmic tools.
But the crucial point is often overlooked: the impulse was human. An anonymous source approached a journalist – not an algorithm. The decision to verify the dataset, to weigh the risks of publication and to place the findings in a societal context lay with people. The technology was a tool, not the originator.
RAG systems could strengthen journalism in a similar way: by automating routine tasks, freeing newsrooms for investigative work. By analysing large datasets, opening new lines of enquiry. By monitoring data streams, flagging anomalies that a journalist can then pursue.
The Productivity Paradox
Yet here an economic dilemma emerges that will preoccupy the industry in the years ahead. AI significantly increases the productivity of news production. At the same time, the availability of automated content reduces the willingness to pay for precisely those routine reports that have hitherto funded a considerable share of media revenue. Agency journalism, local news portals, specialised trade media – they all face the question of how quality journalism can be financed when the baseline work is done by machines.
The efficiency gains from RAG could therefore paradoxically undermine the very structures that make investigative journalism possible. A newsroom that automates its routine reporting saves costs. But if the revenue from that same reporting simultaneously collapses, there is less room for costly investigations – not more.
Recommendations
The creation gap cannot be closed, but it can be shaped. The following recommendations are addressed to media organisations, regulators, educational institutions and the technology industry.
Media organisations should deploy RAG systems specifically for routine tasks – data analysis, summaries, monitoring – and consistently reinvest the freed resources in investigative reporting and quality journalism. Automating baseline work is not a cost-cutting programme but an opportunity for qualitative enhancement.
The EU AI Act mandates the labelling of AI-generated content from 2026. Media organisations should go beyond the statutory minimum and make transparency a brand hallmark. Readers deserve clarity on whether an article was assembled by an algorithm or researched by a journalist.
At the intersection of journalism and technology, new roles are emerging: the AI editor, who curates and verifies machine-generated content. The verification specialist, who checks algorithmically generated information for accuracy. The source relationship manager, who cultivates human source relationships. Educational institutions must embed these roles in their curricula.
When the routine news item becomes a commodity, the value of the original rises. Society, policymakers and the media industry must jointly develop models that sustainably fund investigative journalism – whether through foundations, membership models or public funding. Those unwilling to pay for journalism will lose it.
Technology companies offering RAG systems for the news sector should build journalistic principles into their architecture: source attribution, transparency about data gaps, automatic flagging of uncertainties. A RAG system that communicates its own limitations is more valuable than one that simulates omniscience.
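What "communicating its own limitations" could mean in practice can be sketched as a data structure. The class and field names below are hypothetical – invented for this illustration, not taken from any real product – but they show the principle: every answer carries its sources, an estimate of how much of the question the indexed material actually covers, and an explicit list of gaps:

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    text: str
    sources: list                              # every claim traceable to a retrieved document
    coverage: float                            # share of the question answerable from indexed data
    gaps: list = field(default_factory=list)   # what the system could NOT find

def render(answer):
    # Render the answer with attribution and, where coverage is
    # incomplete, an explicit statement of what is missing.
    out = [answer.text, "Sources: " + "; ".join(answer.sources)]
    if answer.coverage < 1.0:
        out.append(
            f"Note: only {answer.coverage:.0%} of this question is covered by "
            f"indexed material. Missing: {', '.join(answer.gaps)}"
        )
    return "\n".join(out)

a = Answer(
    text="Revenue grew 4% according to the Q3 filing.",
    sources=["Q3 filing, p. 12"],
    coverage=0.6,
    gaps=["no data on segment margins"],
)
print(render(a))
```

The design choice is the point, not the code: a system that defaults to surfacing its gaps hands the journalist a lead ("no data on segment margins") rather than papering over the absence – the opposite of simulated omniscience.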
Outlook 2030: How News Will Be Created in the Future
The most likely development is not displacement but stratification. Three layers of news production will emerge, each with its own logic and value creation.
The bottom layer – routine reports, data evaluations, standardised updates – will be largely automated. RAG systems will process quarterly figures in seconds, handle sports results in real time and present government announcements in readable form. This layer will be efficient, reliable and inexpensive.
The middle layer – analysis, contextualisation, commentary – will be augmented. Journalists will use AI tools to research faster, penetrate larger datasets and visualise complex relationships. The machine provides the material; the human provides the judgement.
The top layer – investigative reporting, exposés, the discovery of the unknown – will remain inherently human. This is where the societal value is created that distinguishes an informed democracy from a managed public sphere.
New professional roles will connect these layers. The AI editor curates machine-generated content and checks it for consistency and relevance. The prompt journalist formulates queries to RAG systems that open investigative leads rather than reproducing what is already known. The verification specialist checks algorithmically generated information against independent sources. And the source relationship manager maintains the human relationships that no algorithm can replace.
Trust will be the decisive currency in this landscape. Media brands that demonstrably invest in quality will become more valuable – precisely because the environment is becoming noisier and more confusing. The readers of 2030 will not have less information at their disposal but more. Their problem will not be scarcity but discernment.
The decisive question is therefore not: how well can AI write news? But rather: how do we ensure that there continue to be people who discover news?