The honest answer to "can we use AI in MLR (medical, legal, and regulatory) review" is yes, but only in one shape. AI can run as an augmenting pre-check that flags issues for a human reviewer. It cannot run as the thing that approves the asset. Most of the anxiety inside pharma comes from teams who have not yet drawn the line between the two.
That anxiety is rational. In a 2025 survey of US pharma marketing and promotional-review professionals, 65 percent said they do not trust AI for creating regulatory compliance submissions. The same survey breaks that distrust into its drivers, and every driver maps cleanly onto something a regulator actually requires. This piece takes the 65 percent apart, driver by driver, and converts each one into a design rule. Follow the three rules and you get an AI a skeptical regulatory team can adopt without surrendering the documented human oversight the rules demand.
The 65 percent is a correct read of the requirements
The skeptics have been specific about what they fear. In the same study, the top concern was hallucination, named by 40 percent of respondents, followed by a lack of traceability or audit trail at 20 percent, and a lack of transparency or explainability at 12.5 percent. The fear is not hypothetical. More than half of respondents, 56 percent, said using AI had already led to errors in their work, most likely from relying on incorrect or hallucinated AI-generated content.
Those three drivers are hallucination, no audit trail, and no explainability. Hold them next to what a promotional-review function is legally obligated to produce: claims that are true and not misleading, a documented chain from every external-facing claim back to an authorised source, and a record that a reviewer can stand behind under inspection. The distrust drivers are the exact inverse of the compliance requirements. A reviewer who refuses to trust a tool that hallucinates, hides its sources, and cannot explain itself is doing their job.
So the 65 percent is not an obstacle to design around. It is a specification. Each fear is a requirement stated in the negative; flip all three from negative to positive and you have the design.
Driver one, hallucination: deterministic checks, not generative paraphrase
Hallucination is the headline fear at 40 percent, and most people misdiagnose it. The instinct is to chase a model that hallucinates less. That is the wrong fix. A large language model that paraphrases your safety language can produce a fluent, plausible, subtly wrong sentence on its best day. Fluency is the failure mode. The fix is to stop asking the model to generate the thing that matters.
The rule that falls out of this: the parts of a pre-check that touch a regulated claim must be deterministic, not generative. A deterministic check does not write a new sentence about your drug. It extracts the claim already in the asset and tests it against fixed criteria. Is there an efficacy claim with no fair balance within reach? Is the Important Safety Information present and complete? Does this sentence map to an indication the label supports? Does the cited reference contain the data point attached to it? Those are pass-or-fail questions with auditable answers. They do not require the model to invent anything, so there is nothing to hallucinate.
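To make "checks rather than writes" concrete, here is a minimal Python sketch of one such pass-or-fail test: an Important Safety Information completeness check. It is an illustration under stated assumptions, not a description of any particular product. The function names, the statement ids, and the sample label wording are all hypothetical, and a real check would load the required statements from the approved label rather than hard-code them.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_isi_statements(asset_text: str, required: dict) -> list:
    """Return the ids of required ISI statements that do not appear in the asset.

    Purely deterministic: the same asset and the same required statements
    always give the same answer, and nothing is generated or paraphrased.
    """
    haystack = normalize(asset_text)
    return [
        statement_id
        for statement_id, statement in required.items()
        if normalize(statement) not in haystack
    ]


# Hypothetical required statements, standing in for approved label language.
REQUIRED_ISI = {
    "contraindication": "Do not use in patients with condition X.",
    "warning": "May cause reaction Y.",
}

# Hypothetical asset text with one required statement present and one missing.
ASSET_TEXT = "Varigel helps.\nDo not use in   patients with Condition X."

MISSING = missing_isi_statements(ASSET_TEXT, REQUIRED_ISI)
```

With the sample inputs above, the check reports the warning statement as missing and nothing else. Exact matching after normalisation is deliberately strict in this sketch: a reworded safety statement fails the check and goes to a human, rather than being judged "close enough" by a model.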
Generative AI stays on the side of the line where a mistake is cheap: drafting a first pass of marketing copy, suggesting plain-language rewrites, summarising a long reference for a human to verify. Every one of those outputs is an input to review, not an output of it. The effective pattern is AI pre-checks that augment, not replace, reviewers, with the human accountable for the judgment. The model can help write the asset. It must not be the thing that certifies the asset is safe to ship.
The answer to the hallucination fear is not "trust the model more." It is "give the model less to make up." A pre-check that checks rather than writes removes the surface the objection lives on.
Driver two, no audit trail: full traceability to the approved source
The second driver, named by 20 percent, is the absence of a traceable record. This one is non-negotiable. Regulators require an auditable trail tracing every external-facing claim back to an authorised source, and every generative-AI output still has to pass MLR review before it goes anywhere. A black box that emits a verdict with no lineage manufactures a new governance problem: an approval decision in your file that no one can reconstruct.
The rule: every flag the AI raises must point at the approved source it checked against, and the entire interaction must be logged. When the pre-check says an efficacy claim lacks fair balance, the output is not "this looks non-compliant." It is "this claim, on this page, is missing fair balance, measured against this section of the approved label, at this timestamp, in this version of the asset." The reviewer is handed a citation they can open and check in seconds, not a score to trust.
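The shape of such a flag can be sketched as a small immutable record. This is a hedged Python illustration, not a real schema: the field names, the rule id format, and the label section string are assumptions made up for the example, and writing a JSON line does not by itself satisfy 21 CFR Part 11.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Flag:
    """One finding, carrying everything a reviewer needs to open the source."""

    rule_id: str          # which rule fired (hypothetical id scheme)
    claim_text: str       # the claim as it appears in the asset
    page: int             # where in the asset the claim sits
    source_section: str   # the approved label section it was checked against
    asset_version: str    # the version of the asset that was checked
    checked_at: str       # ISO 8601 timestamp, UTC


def make_flag(rule_id: str, claim_text: str, page: int, source_section: str,
              asset_version: str, now: Optional[datetime] = None) -> Flag:
    """Build a flag, stamping it with the current UTC time unless one is given."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return Flag(rule_id, claim_text, page, source_section, asset_version, stamp)


def to_log_line(flag: Flag) -> str:
    """Serialise a flag as one JSON line for an append-only log."""
    return json.dumps(asdict(flag), sort_keys=True)


# Hypothetical example values.
EXAMPLE_FLAG = make_flag(
    rule_id="fair-balance-001",
    claim_text="Efficacy headline",
    page=1,
    source_section="Approved label, section 5",
    asset_version="v3",
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
EXAMPLE_LINE = to_log_line(EXAMPLE_FLAG)
```

The record is frozen so a flag cannot be silently edited after it is raised. Any correction would be a new record, which keeps the history reconstructable.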
A pre-check that surfaces issues but produces no record is a tool a regulatory team cannot defend. One that backs every flag with a 21 CFR Part 11-supporting trail, captures who reviewed what and when, and carries the e-signature sign-off the final approval requires, strengthens the file instead of weakening it. The test is brutal: if an inspector asks why this asset was approved, can you reconstruct the entire chain, including what the AI flagged and what the human decided to do about it? If the answer is no, the AI has no business near MLR. If the answer is yes, the audit-trail fear is answered by design rather than by reassurance.
Driver three, no explainability: every flag shows its work
The third driver, at 12.5 percent, is the smallest number on the page and the one that quietly decides whether any of this gets adopted. A reviewer will not stake their name on a verdict they cannot interrogate. "The model rated this asset 0.82" is not a finding a regulatory professional can act on, defend, or sign. It is an oracle, and an oracle is what a compliance function is built to distrust.
The rule: every flag has to be legible to the human who acts on it. Explainable does not mean a research paper on the model's internals. It means the flag states what triggered it, which rule it tested, what the asset said, what the approved source says instead, and why the gap matters. A reviewer should be able to read a flag and immediately confirm it and fix the asset, dismiss it with a documented reason, or escalate it. A flag that cannot be triaged that way is noise, and noise in a compliance tool is worse than silence because it trains reviewers to ignore the instrument.
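The three triage outcomes, and the requirement that a dismissal carry a documented reason, can be enforced in code rather than left to convention. The Python sketch below is one possible shape under stated assumptions: the class names, field names, and identifiers are hypothetical and not taken from any real system.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The three things a reviewer can do with a flag."""

    CONFIRM = "confirm"
    DISMISS = "dismiss"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class TriageRecord:
    """What a named human decided about one flag, and why."""

    flag_id: str
    reviewer: str
    decision: Decision
    rationale: str


def triage(flag_id: str, reviewer: str, decision: Decision,
           rationale: str = "") -> TriageRecord:
    """Record a reviewer's decision; refuse a dismissal with no reason given."""
    if decision is Decision.DISMISS and not rationale.strip():
        raise ValueError("a dismissal needs a documented reason")
    return TriageRecord(flag_id, reviewer, decision, rationale.strip())


# Hypothetical usage: one confirmed flag, one dismissed with a reason.
CONFIRMED = triage("flag-1", "reviewer_a", Decision.CONFIRM)
DISMISSED = triage("flag-2", "reviewer_a", Decision.DISMISS,
                   "Risk statement is adjacent in the final layout.")
```

The decision is always attributed to a reviewer, never to the tool, which keeps the human as the accountable party in the record.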
The deterministic design from rule one makes flags explainable almost for free, because a deterministic check already knows which criterion it applied. A generative scorer can only offer a number and a vibe. Explainability is not a feature you bolt on at the end. It is what you get when the first two rules are honoured, and what you lose the moment you let a black box make the call.
What the FDA actually said, and what it did not
In January 2025 the FDA published its first draft guidance on the use of AI to support regulatory decision-making for drug and biological products, introducing a risk-based credibility-assessment framework. The higher the consequence of an AI being wrong, the more rigour you owe before you rely on it. That is not a ban on AI. It is a demand that the trust you place in a model be earned in proportion to what is riding on its output.
Read against that framework, an AI approver in MLR is close to the worst case: maximum consequence if it is wrong, minimum human oversight, and a credibility burden almost no model can carry. An AI pre-check is close to the best case: it raises the floor on quality, the consequence of any single flag is bounded because a human adjudicates it, and the human-oversight requirement is satisfied rather than circumvented. The guidance does not forbid AI. It tells you where on the risk curve each use of it sits, and the pre-check sits in the safe zone because a human still decides.
So when a skeptical reviewer asks "are we even allowed to do this," the answer is yes, with a condition that is easy to state and hard to fudge. You are allowed to use AI to help find problems. You are not allowed to use it to declare there are none.
A worked example: Varigel and the two architectures
Take a fictional brand, Varigel, with one approved indication and a known contraindication. A new digital sales aid comes in with an efficacy headline, a chart, and an Important Safety Information block. The same asset moves through two AI architectures, and only one survives contact with a regulatory team.
In the black-box architecture, the asset goes into a model and a verdict comes out: approved, confidence 0.91. Nobody can see why. The model paraphrased the safety language to "check" it and quietly normalised a hedge the label was careful about. There is no record of which label section it compared against, no flag a reviewer can open, and no way to explain the 0.91. Every distrust driver is present at once: a possible hallucination in the rewritten safety text, no trail, no explanation. It also pretends to remove the human, so it violates the oversight the rules require.
In the pre-check architecture, the same asset is dismantled into its claims and each claim is tested deterministically. The efficacy headline triggers a fair-balance flag: the supporting risk statement sits two screens away, past the threshold the rule allows, measured against the approved label section, logged with a timestamp and an asset version. The ISI block passes a completeness check. One sentence is flagged as reading toward a benefit the label does not support, with the exact label language shown beside it. None of these is a decision. Each is a legible, sourced finding handed to a human reviewer, who confirms two, dismisses one with a documented rationale, and signs. The MLR cycle that often stretches 50 to 60 days per content piece gets shorter not because a machine approved anything, but because the reviewer spent their hours on judgment instead of on checking whether the ISI was present.
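The fair-balance flag in this example reduces to a distance test. The Python sketch below assumes the asset has already been split into numbered screens and that the claim and risk statements have been located. The function names and the one-screen threshold are hypothetical. In practice the threshold would be set by the review team's own rule, not by the tool.

```python
from typing import List, Optional


def fair_balance_gap(claim_screen: int, risk_screens: List[int]) -> Optional[int]:
    """Distance in screens from a claim to the nearest risk statement.

    Returns None when the asset contains no risk statement at all.
    """
    if not risk_screens:
        return None
    return min(abs(screen - claim_screen) for screen in risk_screens)


def needs_fair_balance_flag(claim_screen: int, risk_screens: List[int],
                            max_gap: int = 1) -> bool:
    """True when the nearest risk statement is missing or beyond max_gap screens."""
    gap = fair_balance_gap(claim_screen, risk_screens)
    return gap is None or gap > max_gap


# The Varigel sales aid: headline on screen 0, risk statement on screen 2.
VARIGEL_FLAGGED = needs_fair_balance_flag(claim_screen=0, risk_screens=[2])
```

With the headline on screen 0 and the risk statement on screen 2, the gap is two screens, which exceeds the assumed one-screen threshold, so the flag is raised. The function returns a finding for a reviewer to adjudicate, not an approval or rejection.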
Same brand, same asset, same AI capability underneath. The only difference is where the line sits between what the machine does and what the human decides. That line is the entire product.
What this means for adoption
You do not have to choose between using AI and keeping control. The choice is between two architectures, and only one of them was ever on the table. Ask three questions of any AI a vendor wants to put near your MLR process, drawn straight from the three distrust drivers:
Does it generate or does it check? If it paraphrases regulated claims to evaluate them, it can hallucinate inside your safety language, and you should treat its output as a draft for a human, never as a verdict. The checks that touch a claim must be deterministic.
Can it show its source for every flag? If a flag does not point back to the specific approved language it was measured against, and the whole interaction is not logged in a Part 11-supporting trail, it is adding governance risk, not removing it.
Can a reviewer act on a flag without trusting a number? If the output is a score rather than an explained, sourced finding a human can confirm, dismiss, or escalate, it is an oracle, and your reviewers will be right to ignore it.
Juncture is built on those three answers. Its Pre-check runs deterministic checks against the approved label, cites the exact source language behind every flag, backs the record with a 21 CFR Part 11-supporting trail and e-signature sign-off, and hands legible findings to a human who stays accountable for every approval. It augments the reviewer; it does not replace them. The same approved label the Pre-check measures an asset against on the inside is the label its Answer Monitor watches the machines against on the outside, so the standard is one standard, governed in one place. See it on Pre-check, see how the two halves join on the platform, or bring one asset to a demo and watch where the line sits.
For the mechanics of what a pre-check inspects rule by rule, read what an AI pre-check actually checks before MLR. For why the bottleneck exists, read the MLR bottleneck and what AI can and cannot do about it.
The 65 percent who distrust AI for compliance are not the obstacle to using it. They already wrote the spec. Build the AI they describe in the negative, an augmenting, deterministic, traceable, explainable pre-check that leaves the approval where the rules put it, and the skeptics become the adopters. The black-box approver was never going to pass MLR. The thing that helps MLR move faster already can.
Sources
- Klick Health / Momentum Events survey on AI for regulatory compliance submissions, 2025 (reported by Fierce Pharma; see the next entry).
- Fierce Pharma, "Pharma pros skeptical about letting AI loose on regulatory compliance submissions: survey," 2025. fiercepharma.com
- Ciberspring, "Faster Pharma Content Approvals With AI: What Pre-Screening Means for Compliance," 2025. ciberspring.com
- Indegene, "AI for MLR Excellence," 2025. indegene.com
- Goodwin Law, "FDA Publishes Its First Draft Guidance on the Use of AI in Drug Development," January 2025. goodwinlaw.com