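The paper critiques existing post-training methods for multimodal LLMs in clinical image reasoning, arguing that their outcome-centric focus, which rewards only the final answer, leads to sparse credit assignment across the intermediate reasoning steps. Its error analysis shows that most incorrect predictions stem from cascading errors, in which a failure at an early reasoning stage propagates through the later steps. The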