The story of David and Goliath is an iconic biblical narrative about the power of faith and courage against overwhelming odds. But the story can also give us a conceptual framework for understanding the relative strengths and weaknesses of small language models (SLMs) and large language models (LLMs).
In the Bible story, David offers agility, precision, and efficiency, while Goliath possesses raw size, strength, and reach. These characteristics map well onto AI models: SLMs offer an optimized, efficient design, while LLMs offer expansive capabilities. Both can succeed in different arenas.
While LLMs demonstrate broad generalization, deep reasoning, and multi-domain integration, SLMs can exceed their performance in specialized, resource-constrained, or privacy-critical contexts. SLM development gives special attention to data curation, which directly influences accuracy, bias control, and explainability. In addition, unlike David and Goliath, many SLMs are the “children” of LLMs: they are created, in part, through knowledge distillation, a process in which the “wisdom” of LLMs is used to train more efficient SLMs.
In the application of AI in diverse fields such as medical diagnostics, finance, and engineering/manufacturing processes, a similar contest plays out between SLMs and LLMs. The LLM, like Goliath, is vast in scale: it has hundreds of billions of parameters, is trained on immense datasets, can handle complex reasoning, and has a wide “reach” across multiple domains. The SLM, like David, is compact and highly specialized, with fewer than 30 billion parameters, and is optimized for speed, efficiency, and precision in narrowly defined tasks.
The strength of an SLM lies not in the quantity of data, but in the quality and relevance of its training corpus. For example, medical SLMs benefit from carefully curated datasets that remove irrelevant information, control for demographic biases, and ensure alignment with the specific diagnostic scope of the model. An SLM trained for oncology diagnostics might be built exclusively from verified oncology case reports, structured Electronic Health Record data, and relevant medical imaging annotations.
Similarly, a 17 March 2025 article in AIwire, titled “AI Takes the A-Train and Also Goes Under the Sea,” discussed the potential use of SLMs in transit system maintenance and in the detection and identification of undetonated, discarded World War II munitions. It highlighted the importance of extensive data curation in developing the SLMs for those applications.
In contrast, LLMs may have hundreds of billions of parameters and are trained on huge, diverse, mixed-quality datasets. This breadth grants them cross-domain reasoning but also increases the risk of introducing noise, factual drift, or bias from unrelated domains. For SLMs, which generally have fewer than 30 billion parameters (with mini-SLMs having fewer than 10 billion), curation acts like David’s sling: precise, targeted, and optimized for impact. By contrast, LLMs’ broad intake is more akin to Goliath’s massive arsenal: powerful but less specialized.
Many SLMs inherit targeted capabilities from LLMs through a process called knowledge distillation, without inheriting the latter’s computational overhead. In this process, the LLM acts as a teacher model, producing high-quality, context-specific outputs or embeddings from complex datasets. The SLM, the student model, is then trained to replicate these outputs with fewer parameters and reduced complexity.
For example, in the domain of medical diagnostics, this can take the form of:
- Synthetic case generation: LLMs create rare-disease scenarios to augment SLM training.
- Guided labeling: LLMs assist in annotating large unlabeled datasets with medically relevant tags.
- Context compression: LLMs summarize complex patient histories into essential structured data for SLM ingestion.
The following steps outline a framework for creating a specialized medical SLM derived from an LLM:
Step 1 — Teacher Model Selection
Select a high-performance LLM with demonstrated accuracy in medical reasoning tasks, such as clinical note interpretation or diagnostic triage. The chosen LLM should have access to extensive biomedical corpora and, ideally, meet regulatory requirements for handling medical data.
Step 2 — Domain-Specific Data Curation
Compile a gold-standard dataset tailored to the target diagnostic field (e.g., dermatology, cardiology). This should include:
- Peer-reviewed case reports
- De-identified patient records
- Structured lab and imaging data
- Balanced demographic representation
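
As a rough illustration of Step 2, the sketch below filters a toy record set down to de-identified, in-scope cases, drops exact duplicates, and reports demographic shares so imbalance can be spotted. The record fields (`specialty`, `deidentified`, `sex`, `text`) and the sample records are hypothetical, not a real schema or real data, and real curation would involve far more than these checks.

```python
from collections import Counter

# Hypothetical toy records; the field names are illustrative assumptions.
RECORDS = [
    {"id": "r1", "specialty": "dermatology", "deidentified": True, "sex": "F",
     "text": "Pigmented lesion on left forearm"},
    {"id": "r2", "specialty": "dermatology", "deidentified": True, "sex": "M",
     "text": "Scaly plaque on scalp"},
    {"id": "r3", "specialty": "dermatology", "deidentified": False, "sex": "M",
     "text": "Rash on chest, patient name present"},
    {"id": "r4", "specialty": "cardiology", "deidentified": True, "sex": "F",
     "text": "Chest pain on exertion"},
    {"id": "r5", "specialty": "dermatology", "deidentified": True, "sex": "F",
     "text": "pigmented lesion on left forearm "},
    {"id": "r6", "specialty": "dermatology", "deidentified": True, "sex": "M",
     "text": "Nodular lesion on nose"},
]


def curate(records, specialty):
    """Keep de-identified, in-scope records and drop exact duplicate texts."""
    seen = set()
    kept = []
    for rec in records:
        if not rec["deidentified"] or rec["specialty"] != specialty:
            continue
        key = rec["text"].strip().lower()  # normalize before comparing
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def subgroup_shares(records, field):
    """Return the fraction of records falling in each value of `field`."""
    counts = Counter(rec[field] for rec in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


curated = curate(RECORDS, "dermatology")
shares = subgroup_shares(curated, "sex")
```

Here `shares` makes any skew in the curated set visible, which is the point at which a curator would decide whether to collect more cases for an under-represented group.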
Step 3 — Teacher Inference Pass
Run the curated dataset through the LLM to produce high-quality outputs, which can include diagnostic predictions, reasoning chains, and structured feature vectors. These outputs will serve as soft labels for the student model.
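A minimal sketch of such a teacher pass follows. It assumes the teacher can return one raw score (logit) per candidate diagnosis; `stub_teacher`, the label names, and the scores are hypothetical stand-ins, not a real LLM API. The scores are converted into a probability distribution with a temperature-scaled softmax, which is one common way to obtain soft labels.

```python
import math

# Hypothetical candidate diagnoses for a toy dermatology task.
LABELS = ["melanoma", "benign nevus", "seborrheic keratosis"]


def soften(logits, temperature=1.0):
    """Temperature-scaled softmax: higher temperature gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_pass(cases, teacher_logits_fn, temperature=2.0):
    """Attach a soft-label distribution over LABELS to every case."""
    labeled = []
    for case in cases:
        probs = soften(teacher_logits_fn(case), temperature)
        labeled.append({"case": case, "soft_labels": dict(zip(LABELS, probs))})
    return labeled


def stub_teacher(case):
    # Hypothetical stand-in for a call to the teacher LLM; returns fixed scores.
    return [4.0, 2.0, 0.5]


soft_dataset = teacher_pass(["case A", "case B"], stub_teacher, temperature=4.0)
```

Raising the temperature spreads probability mass onto the less likely diagnoses, so the student sees how the teacher ranks the alternatives rather than only its top answer.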
Step 4 — Student Model Training
Train the SLM using the LLM’s outputs as the supervisory signal. Techniques such as temperature scaling and loss weighting help the SLM capture the LLM’s nuanced reasoning while avoiding overfitting.
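The article does not specify a loss function. One widely used formulation, shown here as a sketch, combines a temperature-scaled KL divergence between teacher and student distributions with an ordinary cross-entropy on the true label, weighted by a factor `alpha`. The function names, default values, and toy logits are assumptions for illustration; `teacher_probs` is assumed to have been softened at the same temperature.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits, teacher_probs, true_index,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student at T) + (1 - alpha) * CE(student, label)."""
    student_log_soft = log_softmax(student_logits, temperature)
    # KL divergence from the softened teacher distribution to the softened student.
    kl = sum(p * (math.log(p) - q)
             for p, q in zip(teacher_probs, student_log_soft) if p > 0)
    # Standard cross-entropy against the ground-truth label at temperature 1.
    ce = -log_softmax(student_logits)[true_index]
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Toy values: the teacher strongly prefers class 0, which is also the true label.
teacher_logits = [4.0, 2.0, 0.5]
teacher_probs = [math.exp(v) for v in log_softmax(teacher_logits, 2.0)]

loss_matching = distillation_loss(teacher_logits, teacher_probs, 0)
loss_opposed = distillation_loss([0.5, 2.0, 4.0], teacher_probs, 0)
```

The `alpha` weight trades off imitating the teacher against fitting the labeled data, and the `T^2` factor keeps the soft-label term on a comparable scale as the temperature changes.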
Step 5 — Evaluation and Bias Auditing
Evaluate the SLM against a held-out test set, focusing on:
- Diagnostic accuracy
- Inference speed
- Robustness to noisy inputs
- Bias detection across patient subgroups
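
A sketch of how two of these checks might be measured is shown below: accuracy broken out by patient subgroup, and mean inference time per example. The `toy_predict` function, the group names, and the held-out examples are hypothetical placeholders for a real student model and test set.

```python
import time
from collections import defaultdict


def accuracy_by_subgroup(examples, predict):
    """Return accuracy per subgroup so gaps between groups are visible."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["group"]] += 1
        if predict(ex["input"]) == ex["label"]:
            correct[ex["group"]] += 1
    return {group: correct[group] / total[group] for group in total}


def mean_latency_seconds(examples, predict):
    """Average wall-clock time per prediction over the example set."""
    start = time.perf_counter()
    for ex in examples:
        predict(ex["input"])
    return (time.perf_counter() - start) / len(examples)


def toy_predict(text):
    # Hypothetical stand-in for the student model: a single keyword rule.
    return "malignant" if "irregular" in text else "benign"


# Hypothetical held-out examples with a subgroup tag on each.
HELD_OUT = [
    {"input": "irregular border, 8 mm", "label": "malignant", "group": "A"},
    {"input": "regular border, 3 mm", "label": "benign", "group": "A"},
    {"input": "irregular border, 6 mm", "label": "malignant", "group": "B"},
    {"input": "smooth but growing fast", "label": "malignant", "group": "B"},
]

scores = accuracy_by_subgroup(HELD_OUT, toy_predict)
gap = max(scores.values()) - min(scores.values())
latency = mean_latency_seconds(HELD_OUT, toy_predict)
```

A large `gap` between subgroups is the kind of signal a bias audit would investigate before deployment.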
Step 6 — Deployment in Clinical Context
Implement the SLM in settings where real-time, resource-efficient diagnostics are essential—such as point-of-care devices, offline telemedicine kits, or low-resource hospitals.
Through knowledge distillation, SLMs gain sharper accuracy in their niche, much as David might learn Goliath’s battle techniques but adapt them for his own faster, more agile style.
Some SLMs are trained independently from raw data without any LLM teacher; these may use smaller datasets, more curated data, or domain-specific corpora. In practice, however, many modern high-performance SLMs are distilled from LLMs, because it’s faster, cheaper, and usually yields better results than training a small model entirely from scratch.
The following table describes the relative advantages of SLMs and LLMs.
| Feature | David / SLM | Goliath / LLM |
| --- | --- | --- |
| Scope | Narrow, task-specific (e.g., skin lesion classification, track defect detection, munitions type identification) | Broad, multi-specialty diagnostic support |
| Data Dependence | Highly curated, domain-specific datasets | Large, heterogeneous datasets |
| Inference Speed | Milliseconds, suitable for real-time analysis | Seconds to minutes, cloud-dependent |
| Resource Use | Runs on local devices or small servers | Requires high-performance cloud infrastructure |
| Bias Control | Easier to audit/curate due to smaller scope | Harder to detect domain-specific bias |
| Knowledge Distillation Potential | Benefits greatly from targeted distillation | Acts as the teacher, rarely distilled itself |
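
To make the Resource Use row concrete, the back-of-the-envelope sketch below estimates the memory needed just to hold a model's weights. The parameter counts are the article's SLM thresholds (10 billion and 30 billion); the 16-bit and 4-bit precisions are assumptions chosen for illustration, and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gb(num_parameters, bits_per_parameter):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9


# Parameter counts from the article's SLM thresholds; bit widths are assumptions.
for params in (10e9, 30e9):
    for bits in (16, 4):
        size = weight_memory_gb(params, bits)
        print(f"{params / 1e9:.0f}B parameters at {bits}-bit: ~{size:.0f} GB")
```

Under these assumptions, a 30-billion-parameter model needs about 60 GB for 16-bit weights and about 15 GB at 4 bits, while a 10-billion-parameter model needs about 20 GB and 5 GB respectively.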
In medical diagnostics and in other applications, the choice between deploying a David or a Goliath is rarely about which is “better” in the abstract; it is about which is fit for the application at hand. LLMs remain indispensable for complex, cross-disciplinary reasoning and exploratory medical analysis. SLMs, empowered by rigorous data curation and enhanced through knowledge distillation from LLMs, can offer unmatched precision, privacy, and efficiency in narrowly defined domains. An example is MedMobile, developed at NYU-Langone Medical Center et al. MedMobile can run on a mobile device and can be used to diagnose common medical conditions. It uses Phi-3-mini, an open-source 3.8-billion-parameter language model, fine-tuned with manual data (curated by human experts) and synthetic data (artificially generated from GPT-4 and textbooks). MedMobile obtained a passing score on MedQA, a benchmark similar to the US Medical Licensing Examination.
As in the biblical story, victory in AI-enabled applications does not always belong to the biggest contender; it belongs to the one whose tools, training, and tactics are most precisely matched to the challenge.
About the author: Paul Muzio is currently an advisor to Intersect360’s HPC AI Leadership Organization (HALO). Previously, he held positions in HPC at the City University of New York, Network Computing Systems, Inc., and Grumman Aerospace Corp.
The post The David and Goliath Paradigm: Comparing Small and Large Language Models appeared first on AIwire.

