As memory constraints and energy costs test the limits of AI scaling, compression is becoming one of the industry’s most active areas of research. As we recently reported, Google’s TurboQuant release targets the key-value cache, one of the most memory-intensive components of inference. Now, a new startup is aiming to compress the model itself.
PrismML, founded by Caltech researchers, has emerged from stealth with a $16.25 million seed round and an open source release of what it describes as a “1-bit” large language model family. The company says its approach can dramatically reduce model size and energy consumption while maintaining performance comparable to standard 16-bit models.
The benchmark scores of 1-bit Bonsai 8B compared to other models in the same parameter class (Credit: PrismML)
The flagship of the Bonsai model family is Bonsai 8B, an 8-billion-parameter model trained on Google v4 TPUs. According to PrismML, the model achieves competitive performance on benchmark suites including MMLU Redux, MuSR, GSM8K, HumanEval+, IFEval, and BFClv3, but with a memory footprint of roughly 1GB, compared to about 16GB for a typical 16-bit equivalent. PrismML is also releasing 1-bit Bonsai 4B and 1.7B models, with memory footprints of 0.5GB and 0.24GB, respectively.
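The reported footprints follow from simple arithmetic: weight storage scales linearly with bits per parameter. The sketch below reproduces the rough numbers for an 8-billion-parameter model, counting weights only (it ignores activations, the KV cache, and any per-tensor scale factors a real deployment would carry):

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory occupied by the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

print(weight_footprint_gb(8e9, 16))  # 16.0 GB at 16-bit precision
print(weight_footprint_gb(8e9, 1))   # 1.0 GB at 1-bit precision
```

This is why an 8B model lands near 1GB at 1 bit per weight, versus roughly 16GB at 16 bits.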
PrismML says its models are fully binarized end to end, with all weights constrained to a single bit across embeddings, attention layers, and MLP blocks, and “no higher-precision escape hatches.” While quantization is widely used, pushing it to 1-bit across the entire network has historically degraded model quality, particularly on reasoning tasks. The company attributes its results to a new mathematical framework developed at Caltech, but has not yet detailed the training methods or stabilization techniques that would be required to make such extreme compression viable.
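PrismML has not published its binarization scheme, but the general idea behind 1-bit weights can be illustrated with the classic XNOR-Net-style approach: replace each weight tensor with its signs plus a single scalar scale. This is a generic sketch, not PrismML's method:

```python
import numpy as np

def binarize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Binarize a weight tensor to {-1, +1} plus one scalar scale.

    Choosing alpha = mean(|w|) minimizes ||w - alpha * sign(w)||^2,
    the standard XNOR-Net choice. PrismML's actual scheme is not public.
    """
    alpha = float(np.abs(w).mean())
    return np.sign(w), alpha

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))       # toy weight matrix
b, alpha = binarize(w)
w_hat = alpha * b                  # 1-bit reconstruction of the weights
```

Each weight now needs one bit of storage (its sign), with a single shared scale per tensor; the hard part, which the company has not detailed, is training a full network so that this reconstruction error does not destroy accuracy.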
PrismML CEO Babak Hassibi, a computer scientist and mathematician at Caltech, described the approach as a new paradigm for AI that will adapt to diverse hardware environments. “We spent years developing the mathematical theory required to compress a neural network without losing its reasoning capabilities,” Hassibi said in a release. “We see 1-bit not as an endpoint, but as a starting point.”
PrismML founders from left: Sahin Lale, Babak Hassibi, Omead Pooladzandi, and Reza Sadri (Credit: PrismML)
The company claims its 1-bit models can deliver up to eight times faster processing and reduce energy consumption by as much as 75 to 80% on existing hardware. PrismML also predicts that future hardware optimized for 1-bit operations could further improve efficiency by replacing complex multiplications with simpler arithmetic.
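The arithmetic simplification PrismML alludes to is straightforward: when weights are restricted to {-1, +1}, a dot product reduces to additions and subtractions of activations, leaving only one multiply per output for the scale factor. A hypothetical sketch (not PrismML's kernel):

```python
import numpy as np

def binary_matvec(sign_w: np.ndarray, alpha: float, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product with {-1, +1} weights, using no per-element multiplies.

    Activations are added where the weight is +1 and subtracted where it
    is -1; the per-tensor scale alpha is the only multiplication left.
    Illustrative only; real 1-bit kernels use packed bitwise operations.
    """
    pos = np.where(sign_w > 0, x, 0.0).sum(axis=1)  # add x_j where w_ij = +1
    neg = np.where(sign_w < 0, x, 0.0).sum(axis=1)  # add x_j where w_ij = -1
    return alpha * (pos - neg)
```

On hardware built for this pattern, the sign-selection and accumulation can become bitwise operations and popcounts, which is the efficiency gain the company is pointing at.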
Vinod Khosla of Khosla Ventures, which participated in PrismML’s seed round, described the work as a “mathematical breakthrough” with the potential to reshape how AI systems are deployed.
“AI’s future will not be defined by who can build the largest datacenters. It will be defined by who can deliver the most intelligence per unit of energy and cost. PrismML represents that kind of breakthrough,” he said in a statement.
That perspective reflects the idea that AI will not remain confined to data centers but will instead be deployed across edge devices and local environments. PrismML says its models are designed to run on consumer and edge devices, potentially enabling more capable AI applications in smartphones, wearables, and robotics without relying on cloud infrastructure.
(Aila Images/Shutterstock)
PrismML’s claim that a fully 1-bit model can match the capabilities of higher-precision systems remains unproven outside the company’s own benchmark results. Extreme quantization techniques have historically struggled to preserve accuracy in complex reasoning tasks. Independent third-party benchmarks and real-world deployments will be critical in determining whether PrismML’s approach represents a true breakthrough or a more limited optimization.
In a blog post, PrismML describes what it calls “intelligence density,” a metric that attempts to capture how much capability a model delivers per unit of size. By that measure, the company says its 1-bit models redefine the tradeoff between model size and performance, maintaining competitive results at a fraction of the footprint. However, the metric depends on the company’s own benchmark choices and its own definition, neither of which has been independently validated. Whether it proves to be a meaningful way to compare models or remains a company-specific metric will depend on how it holds up under further scrutiny.
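PrismML has not published an exact formula for intelligence density, but one plausible reading is benchmark score per gigabyte of model memory. The sketch below uses hypothetical scores purely to show how such a ratio would favor a small model even when its raw score is slightly lower:

```python
# Hypothetical illustration of "intelligence density" as score per GB.
# The scores below are made up; PrismML's formula and numbers are not public.
models = {
    "1-bit 8B model":    {"score": 60.0, "gb": 1.0},
    "16-bit 8B baseline": {"score": 62.0, "gb": 16.0},
}
for name, m in models.items():
    density = m["score"] / m["gb"]
    print(f"{name}: {density:.1f} score/GB")
```

Any metric of this shape is sensitive to which benchmarks feed the score and how size is counted, which is why independent validation matters.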
For now, the release is another example of efficiency-driven AI design as the industry looks for alternatives to the escalating costs of scaling model size and infrastructure. While recent research like Google’s TurboQuant focuses on compressing specific components of inference, PrismML’s ambitious model compression could greatly expand where AI models can realistically run and how they are deployed.
This article first appeared on HPCwire.
The post PrismML Emerges From Stealth With 1-Bit LLM Family appeared first on AIwire.

