Knowledge
Blog
December 4, 2025
5 minutes
David Berenstein

Beyond sycophancy: The risk of vulnerable misguidance in AI medical advice

Healthcare employees in Hyderabad have noticed a disturbing direction in self-doctoring: two of their patients relied on generic AI chatbot advice for their healthcare interventions, and some of them suffered serious medical consequences. Two recent cases demonstrate the vulnerability of misguidance to a subtle risk in deployed agents, which can allow the agent to be harmful by encouraging harmful behaviour.

What happened

This article states, a 30-year-old kidney transplant recipient reportedly stopped her antibiotics after an AI chatbot indicated her normal creatinine levels meant she no longer needed the medication. Within weeks, her graft function collapsed, creatinine spiked, and she returned to dialysis post-surgery. Senior nephrologists at the Nizam's Institute of Medical Sciences (NIMS) have highlighted a concerning trend: even well-educated patients are making critical health decisions based on chatbot outputs without consulting their healthcare teams.

In another case, a 62-year-old man with diabetes experienced rapid weight loss and dangerously low sodium after following a chatbot plan that advised cutting salt completely. Although chatbots are adept at providing high-level and general advice, they lack the contextual understanding and expertise of an experienced medical professional.

Beyond AI hallucinations: The vulnerable misguidance problem

These incidents show something more profound than simple technical failures or inaccurate information. They expose "vulnerable misguidance," a phenomenon we've been tracking throughout our Phare benchmark.

One of the central concerns in deploying LLMs responsibly is their potential to validate or encourage unsafe behaviours, particularly in sensitive or high-risk user scenarios such as disordered eating, mental health crises, substance misuse, or dangerous lifestyle practices. Rather than measuring toxicity or abuse, vulnerable misguidance focuses on more subtle forms of harm, particularly when users implicitly seek affirmation or validation for harmful behaviour.

In real-world applications, users may express unsafe intentions in emotionally complex ways: framed positively, indirectly, or within supportive-seeming dialogue. A safe and trustworthy model should not merely avoid giving explicit harm-related advice but should also express concern or caution, suggest safer alternatives, or encourage people to seek professional advice when needed.

The sycophantic tendency amplifies the risk of harmful AI advice

What makes vulnerable misguidance particularly dangerous is how it couples with the sycophantic tendency of LLMs. These models are trained to be helpful and agreeable, which means they're prone to validating user assumptions rather than challenging potentially dangerous decisions.

When a kidney transplant patient asks whether normal creatinine levels mean they can stop antibiotics, an LLM might provide a confident, affirmative response without understanding the critical context: that post-transplant immunosuppression requires strict adherence regardless of apparent lab improvements.

Why generic AI advice fails in clinical settings

Chatbots don't see contraindications, drug-drug interactions, transplant protocols, or the arc of a patient's chart. They interpret isolated inputs and provide confident answers, even when they are uncertain, the evidence is weak, or the required context is missing. All these things together can turn "OK-sounding" advice into life-threatening situations.

What we learned from testing vulnerable misguidance

Through our testing at Giskard, we found that most large language models perform reasonably well at avoiding vulnerable misguidance scenarios. However, performance drops significantly for smaller model versions. This is likely because smaller versions fail to capture the full complexity of situations where users might be seeking validation for potentially harmful behaviours.

As organisations deploy smaller, more cost-effective models to scale AI applications, they may inadvertently introduce safety vulnerabilities that larger models would have caught. Don’t get us wrong, smaller language models are great for many reasons (size, speed), but when using them, you should be even more aware of the risks of vulnerable misguidance within customer-facing deployments.

How to prevent harmful AI medical advice?

Organisations deploying client-facing LLMs must implement rigorous AI security testing. At Giskard, we evaluate the robustness of AI systems using our automatic LLM vulnerability scanner, which assesses:

  • Detection of high-risk contexts: Whether models recognise when users are seeking validation for potentially harmful health decisions
  • Appropriate response mechanisms: Whether models express concern, suggest professional consultation, or refuse to provide specific medical advice
  • Resistance to subtle framing: How models respond when unsafe requests are presented indirectly or wrapped in positive language

To gain a deeper understanding of our tests and their capabilities, we recommend reviewing our whitepaper on LLM security attacks.

Actionable steps to secure AI healthcare deployments

If your organisation is evaluating AI for patient-facing or clinician-facing tasks, implement these safeguards:

  • Establish clear boundaries: AI tools should support general health education but cannot replace clinical judgment, longitudinal context, or care plans tailored to individual risk factors.
  • Establish a standing rule: No medication, dose, or dietary changes without prior clinician sign-off.
  • Implement verification protocols: Ensure that AI-generated medical content undergoes human clinical review before being made available to patients.
  • Establish accountability pathways: Implement clear triage protocols for when patients present AI-generated advice during appointments. If possible, document these interactions and review them for patterns.
  • Screen for vulnerable misguidance: Test whether your AI system appropriately refuses or redirects requests that could lead to harmful medical decisions.

Most of these things can be defined and verified with custom validation rules for client interaction. You could use an LLM as a judge to evaluate whether returned information aligns with the policies above. In combination with proactive attacks, you can then verify different risky user scenarios for your entire system.

The impact of AI failures in healthcare

What changes with AI-assisted healthcare interactions is the scale of potential harm. A single vulnerable misguidance failure in a widely deployed chatbot could affect thousands of patients making critical health decisions. The same system that provided dangerous advice to one kidney transplant patient could just as easily mislead diabetic patients about insulin management, cancer patients about treatment adherence, or mental health patients about medication discontinuation.

Conclusion

The discussed cases reveal how AI features, such as sycophancy, can become catastrophic risk multipliers in high-risk scenarios. With minimal friction, patients now receive confident medical guidance that ignores critical clinical context, bypasses safety protocols, and operates without accountability.

Organisations deploying LLMs in healthcare must take AI red teaming seriously. Test your models against vulnerable misguidance scenarios. Probe for weaknesses in safety alignment, particularly around the sycophantic tendency to validate potentially harmful user assumptions.

If you're looking to prevent AI failures in healthcare applications, we support over 50+ attack strategies that expose AI vulnerabilities from prompt injection techniques to authorisation exploits and vulnerable misguidance scenarios. Get started now.

Continuously secure LLM agents, preventing hallucinations and security issues.
Book a Demo

You will also like

No items found.
Stay updated with
the Giskard Newsletter