Evidence for AI tutors and chatbots: use them without outsourcing your thinking

Every substantive claim on the “AI tutors and chatbots: use them without outsourcing your thinking” page is checked against current research. Each claim is listed below with a rating of how well today’s evidence supports it, followed by its sources. The full, de-duplicated source list lives on the references page.

Supported · strong evidence — Cognitive offloading—using an external aid to perform mental work—reduces the degree to which a person encodes and later remembers that information themselves, so it carries a memory cost when the aid is unavailable.

Risko & Gilbert’s review establishes cognitive offloading as a real and pervasive trade-off, and a substantial experimental literature confirms that offloading information to external stores reliably lowers internal memory for that information.

Sources: Risko & Gilbert (2016), Cognitive Offloading, Trends in Cognitive Sciences — https://doi.org/10.1016/j.tics.2016.07.002 · Sparrow, Liu & Wegner (2011), Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips, Science — https://doi.org/10.1126/science.1207745

Supported · moderate evidence — Expecting that information will remain accessible in an external source (such as saved files or a search tool) leads people to remember the information itself less well but to remember better where to find it, a pattern known as the ‘Google effect’.

Sparrow et al.’s demonstration that anticipated external access shifts memory from content to location is widely cited and conceptually robust, though some specific transactive-memory effects have shown replication variability, so the broad pattern is well supported while individual effect sizes are debated.

Sources: Sparrow, Liu & Wegner (2011), Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips, Science — https://doi.org/10.1126/science.1207745 · Risko & Gilbert (2016), Cognitive Offloading, Trends in Cognitive Sciences — https://doi.org/10.1016/j.tics.2016.07.002

Supported · strong evidence — Durable, flexible learning is produced by effortful conditions (desirable difficulties) rather than by smooth, easy intake, so productive struggle remains necessary even when an AI could remove it.

The desirable-difficulties framework is a well-established and repeatedly replicated principle: effortful retrieval, spacing, and generation reliably improve long-term retention relative to passive, fluent study.

Sources: Bjork & Bjork (2011), Making Things Hard on Yourself, but in a Good Way: Creating Desirable Difficulties to Enhance Learning — https://bjorklab.psych.ucla.edu/wp-content/uploads/sites/13/2016/04/EBjork_RBjork_2011.pdf · Soderstrom & Bjork (2015), Learning Versus Performance: An Integrative Review, Perspectives on Psychological Science — https://journals.sagepub.com/doi/10.1177/1745691615569000

Supported · strong evidence — Having an AI generate questions and quiz the learner harnesses retrieval practice, which yields stronger long-term retention than passively reading explanations or answers the AI supplies.

The testing effect—that retrieving information outperforms restudying it on delayed tests—is a cornerstone finding replicated across materials and populations; using a tool to prompt retrieval rather than to present answers aligns the activity with this principle.

Sources: Roediger & Karpicke (2006), Test-Enhanced Learning: Taking Memory Tests Improves Long-Term Retention, Psychological Science — https://journals.sagepub.com/doi/10.1111/j.1467-9280.2006.01693.x · Adesope, Trevisan & Sundararajan (2017), Rethinking the Use of Tests: A Meta-Analysis of Practice Testing, Review of Educational Research — https://journals.sagepub.com/doi/10.3102/0034654316689306

Supported · moderate evidence — Generating an explanation in one’s own words and explaining material back (self-explanation / teaching-back) improves understanding and retention more than re-reading a supplied explanation.

Meta-analytic evidence shows that prompting self-explanation produces a moderate, reliable benefit to learning across domains; the related generation, protégé, and teaching-expectancy effects converge on the value of producing an explanation rather than just receiving one.

Sources: Bisra, Liu, Nesbit, Salimi & Winne (2018), Inducing Self-Explanation: A Meta-Analysis, Educational Psychology Review — https://doi.org/10.1007/s10648-018-9434-x · Chi, De Leeuw, Chiu & LaVancher (1994), Eliciting Self-Explanations Improves Understanding, Cognitive Science — https://doi.org/10.1207/s15516709cog1803_3

Supported · moderate evidence — Attempting to retrieve or solve a problem before being shown the answer produces better learning than studying the answer first, so worked AI solutions should follow an attempt rather than precede it.

Research on the pretesting and generation effects shows that an effortful (even unsuccessful) retrieval attempt before feedback enhances later retention compared with simply studying the material, supporting attempt-then-check sequencing.

Sources: Kornell, Hays & Bjork (2009), Unsuccessful Retrieval Attempts Enhance Subsequent Learning, Journal of Experimental Psychology: Learning, Memory, and Cognition — https://doi.org/10.1037/a0015729 · Richland, Kornell & Kao (2009), The Pretesting Effect: Do Unsuccessful Retrieval Attempts Enhance Learning?, Journal of Experimental Psychology: Applied — https://doi.org/10.1037/a0016496

Supported · moderate evidence — Pairing practice attempts with corrective feedback improves learning, particularly by catching and correcting errors so wrong answers are not rehearsed—a benefit an AI tutor can provide if used to give feedback after attempts.

Meta-analytic and review evidence shows tests with feedback generally outperform tests without it, with feedback especially valuable for correcting errors and protecting against retention of wrong responses.

Sources: Rowland (2014), The Effect of Testing Versus Restudy on Retention: A Meta-Analytic Review of the Testing Effect, Psychological Bulletin — https://doi.org/10.1037/a0037559 · Adesope, Trevisan & Sundararajan (2017), Rethinking the Use of Tests: A Meta-Analysis of Practice Testing, Review of Educational Research — https://journals.sagepub.com/doi/10.3102/0034654316689306

Supported · moderate evidence — Fluent, easy-to-process material (such as a smooth AI explanation) inflates the subjective feeling of understanding beyond what later memory actually bears out, producing an illusion of competence.

It is well documented that processing fluency and familiarity inflate judgments of learning relative to real retention; learners systematically overrate easy, fluent study experiences, a hazard that applies directly to passively reading AI-generated explanations.

Sources: Koriat & Bjork (2005), Illusions of Competence in Monitoring One’s Knowledge During Study, Journal of Experimental Psychology: Learning, Memory, and Cognition — https://bjorklab.psych.ucla.edu/wp-content/uploads/sites/13/2016/07/Koriat_Bjork_2005.pdf · Dunlosky, Rawson, Marsh, Nathan & Willingham (2013), Improving Students’ Learning With Effective Learning Techniques, Psychological Science in the Public Interest — https://journals.sagepub.com/doi/10.1177/1529100612453266

Supported · strong evidence — Large language models can produce fluent, confident output that is factually incorrect or fabricated (including invented citations), so their output must be verified against reliable sources before being trusted or cited.

Hallucination—the generation of fluent but ungrounded or false content—is a well-documented and intrinsic limitation of current large language models, including the fabrication of plausible-looking references, making independent verification necessary for any factual or citational use.

Sources: Ji, Lee, Frieske, Yu, Su, Xu, Ishii, Bang, Madotto & Fung (2023), Survey of Hallucination in Natural Language Generation, ACM Computing Surveys — https://doi.org/10.1145/3571730 · Walters & Wilder (2023), Fabrication and Errors in the Bibliographic Citations Generated by ChatGPT, Scientific Reports — https://doi.org/10.1038/s41598-023-41032-5