Can Large Language Models Write Effective Hypnosis Scripts? Testing the Capabilities and Limits of AI

Can a language model write a hypnosis script that a trained practitioner would actually use? Early testing reveals surprising strengths, serious gaps, and one non-negotiable requirement: clinical judgment.

What Large Language Models Can and Cannot Do

Large language models (LLMs) such as GPT-4 and Claude have demonstrated remarkable abilities in generating structured text, including creative writing, clinical documentation, and educational content ^[1]. When prompted to produce a hypnosis script, these models can reliably generate the structural components: an induction phase, deepening suggestions, therapeutic metaphors, and re-alerting sequences. However, the quality of these outputs varies significantly depending on prompt specificity and the model’s training data coverage of clinical hypnosis literature ^[2].

Methodology: Testing Script Quality

In a structured evaluation framework adapted from the script concordance testing methodology used in medical education ^[3], AI-generated hypnosis scripts were rated on four dimensions: (1) clinical safety (absence of contraindicated suggestions), (2) hypnotic language quality (use of permissive versus authoritarian phrasing), (3) therapeutic appropriateness (match to the presenting concern), and (4) engagement (pacing and sensory vividness). Results showed that LLMs scored well on safety and structure but poorly on therapeutic nuance and individualisation.
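The four-dimension rubric above can be sketched as a small data structure. This is a minimal illustration, not the evaluation's actual tooling: the class name `ScriptRating`, the helper `mean_by_dimension`, and the 1 to 5 rating scale are all assumptions introduced here, and the example ratings are made up.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass(frozen=True)
class ScriptRating:
    """One rater's scores for one script (hypothetical structure).

    The four dimension names follow the article; the 1-5 scale is an
    assumption, since the article does not state the scale used.
    """
    clinical_safety: int
    hypnotic_language_quality: int
    therapeutic_appropriateness: int
    engagement: int

    def __post_init__(self):
        # Reject scores outside the assumed 1-5 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be between 1 and 5, got {value}")


def mean_by_dimension(ratings):
    """Average each rubric dimension across a list of ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return {
        f.name: mean(getattr(r, f.name) for r in ratings)
        for f in fields(ScriptRating)
    }


# Made-up ratings for illustration only; these are not results from the evaluation.
example_ratings = [ScriptRating(5, 4, 2, 3), ScriptRating(4, 3, 2, 3)]
summary = mean_by_dimension(example_ratings)
```

Keeping the dimensions as separate fields, rather than collapsing them into one total, preserves the pattern the evaluation reports: a script can rate high on safety while rating low on therapeutic appropriateness.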

Key Strengths of AI-Generated Scripts

AI models excel at producing grammatically correct, well-structured scripts that follow established hypnotic conventions. They reliably include essential components such as eye fixation inductions, progressive relaxation, staircase deepening, and post-hypnotic suggestions. Models also handle metaphor generation competently, drawing from a wide range of cultural references ^[1]. For practitioners seeking inspiration or a first draft, AI-generated scripts can serve as a time-saving starting point.

Critical Limitations and Risks

Three significant limitations emerged. First, AI-generated scripts lack individualisation: they cannot incorporate client-specific history, language preferences, or subtle cues observed during intake ^[2]. Second, models occasionally generate suggestions that conflict with established hypnotherapy best practices, such as overly directive language that may not suit resistant clients. Third, without clinical oversight, there is a risk that AI-generated scripts could reinforce outdated or disproven therapeutic approaches ^[2]. The literature on agentic AI failures stresses that context-blind content generation presents real risks in therapeutic settings ^[2].

Implications for Practitioners

AI can be a useful assistant for drafting, idea generation, and educational purposes, but it cannot replace clinical judgment. The most ethical use of LLMs in scriptwriting is as a collaborative tool: the AI generates a draft, and the practitioner adapts it to the specific client’s needs, language, and therapeutic goals. The human element (attunement, intuition, and relational safety) remains irreplaceable.

References

[1] Poibeau, T. (2025). Large Language Models and the Future of Writing. Understanding Conversational AI: Philosophy, Ethics, and Social Impact of Large Language Models. 65-84. DOI: 10.5334/bde.d
[2] Mastrogiacomo, R. (2025). When AI Goes Off Script—Real-World Agentic AI Failures. AI Identities. 233-241. DOI: 10.1007/979-8-8688-2034-2_19
[3] Abouzeid, E., & Sallam, M. (2026). AI-Assisted Script Concordance Tests: Enhancing Feasibility with Customized ChatGPT. Medical Teacher. 48(5), 757-760. DOI: 10.1080/0142159x.2025.2533405