TL;DR
Manticore has rebuilt its ONNX path and reports that embedding generation is now 14× faster. The change targets AI workloads that depend on generating embeddings at large scale.
Manticore has announced a 14-fold increase in embedding generation speed after completely rebuilding its ONNX path, a core component for running AI models efficiently. The change is intended to improve performance for large-scale applications that rely on embeddings, such as search, recommendation systems, and natural language processing.
The company reported that the overhaul of its ONNX (Open Neural Network Exchange) integration resulted in a 14× speedup in generating embeddings, a critical step in many AI workflows. This was achieved by optimizing the data flow and computational pathways within the ONNX runtime, according to Manticore engineers.
Sources within Manticore stated that this performance boost was verified through extensive benchmarking on various hardware configurations, including GPUs and CPUs. The update is part of Manticore’s ongoing efforts to enhance its open-source search and AI infrastructure, making it more scalable and efficient for enterprise users.
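The article does not describe how Manticore ran its benchmarks. As a rough illustration of how a speedup figure like 14× is usually computed, the sketch below times two embedding functions on identical inputs and reports the ratio of their throughputs. `EmbedFn`, `slow_embed`, and `fast_embed` are hypothetical stand-ins, not Manticore or ONNX Runtime APIs.

```python
import time
from typing import Callable, List, Sequence

# Assumed signature for any embedding path: a batch of texts in, one vector per text out.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


def throughput(embed: EmbedFn, texts: Sequence[str], repeats: int = 3) -> float:
    """Embeddings per second, taken from the fastest of `repeats` timed runs."""
    embed(texts)  # warm-up call so one-time setup cost is excluded from timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        embed(texts)
        best = min(best, time.perf_counter() - start)
    return len(texts) / max(best, 1e-9)  # guard against a zero-length timing


def speedup(old: EmbedFn, new: EmbedFn, texts: Sequence[str]) -> float:
    """How many times more embeddings per second `new` produces than `old`."""
    return throughput(new, texts) / throughput(old, texts)


# Hypothetical stand-ins for an old and a new embedding path.
def slow_embed(texts: Sequence[str]) -> List[List[float]]:
    time.sleep(0.01)  # simulated fixed overhead per call
    return [[float(len(t))] for t in texts]


def fast_embed(texts: Sequence[str]) -> List[List[float]]:
    return [[float(len(t))] for t in texts]


docs = ["example document"] * 200
print(f"throughput ratio: {speedup(slow_embed, fast_embed, docs):.1f}x")
```

Taking the best of several runs reduces noise from other processes. A meaningful comparison also keeps hardware, model, and input set fixed between the two paths, since those factors can change the measured ratio.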
While the core technical details of the rebuild are proprietary, Manticore emphasized that the new ONNX path reduces latency and resource consumption, enabling faster processing of large datasets and more responsive AI services.
Impact on Large-Scale AI and Search Applications
Faster embedding generation matters because embeddings sit on the critical path of AI-powered search engines, recommendation systems, and natural language understanding. A 14× speedup in that step can reduce operational costs and make real-time processing feasible at scales that were previously impractical, benefiting companies that deploy large language models and embedding-based solutions.
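How much of the 14× reaches an end-to-end service depends on what share of total runtime is spent generating embeddings. Amdahl's law gives the overall speedup when only one stage gets faster. The fractions below are illustrative assumptions, not Manticore measurements.

```python
def end_to_end_speedup(embed_fraction: float, embed_speedup: float = 14.0) -> float:
    """Amdahl's law: overall speedup when only the embedding stage gets faster.

    embed_fraction is the (assumed) share of total runtime spent generating
    embeddings before the optimization, between 0 and 1.
    """
    if not 0.0 <= embed_fraction <= 1.0:
        raise ValueError("embed_fraction must be in [0, 1]")
    if embed_speedup <= 0:
        raise ValueError("embed_speedup must be positive")
    # New runtime = untouched part + accelerated part, relative to the old runtime of 1.0.
    remaining = (1.0 - embed_fraction) + embed_fraction / embed_speedup
    return 1.0 / remaining


for fraction in (0.5, 0.9, 1.0):
    print(f"embedding = {fraction:.0%} of runtime -> {end_to_end_speedup(fraction):.2f}x overall")
# embedding = 50% of runtime -> 1.87x overall
# embedding = 90% of runtime -> 6.09x overall
# embedding = 100% of runtime -> 14.00x overall
```

A pipeline that spends half its time on embeddings would run under 2× faster overall, while one dominated by embedding generation gets close to the full 14×.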
Industry analysts note that this achievement positions Manticore as a more competitive open-source alternative for enterprise AI, especially in sectors where speed and efficiency are critical. It also sets a benchmark for similar optimizations across AI frameworks.

Technical Background of ONNX Rebuilding in Manticore
Prior to this update, Manticore integrated with ONNX to facilitate model deployment and inference. However, the existing path was a bottleneck, limiting the speed of embedding generation, especially with large datasets. The company began working on a comprehensive overhaul earlier in 2023, aiming to optimize data flow and computational efficiency.
The ONNX format is widely used for model interchangeability, but its runtime performance varies depending on implementation. Manticore’s engineers focused on streamlining this process, leveraging recent advances in hardware acceleration and software optimization techniques.
While the exact technical modifications remain proprietary, the overhaul involved rewriting key parts of the ONNX runtime integration to minimize overhead and maximize throughput.
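The article does not say which changes produced the gain. One general, widely used way to cut per-call overhead in an inference path is to send inputs to the model in batches rather than one at a time. The sketch below shows only that generic idea and is not Manticore's code; `run_model` is a hypothetical callable standing in for a single model invocation such as one ONNX Runtime session run.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical stand-in for one model call: a batch of texts in, one vector per text out.
RunModel = Callable[[Sequence[str]], List[List[float]]]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(texts: Sequence[str], run_model: RunModel, batch_size: int = 32) -> List[List[float]]:
    """Embed texts with one model call per batch instead of one call per text."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(run_model(batch))
    return vectors


# Usage with a fake model that records how many texts each call received.
calls: List[int] = []


def fake_model(batch: Sequence[str]) -> List[List[float]]:
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]


vecs = embed_all(["a"] * 100, fake_model, batch_size=32)
print(len(vecs), calls)  # 100 [32, 32, 32, 4]
```

One hundred texts cost four model calls instead of one hundred, so any fixed per-call cost is paid four times. The best batch size depends on hardware and model, which is one reason gains can vary between setups.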
“Our rebuild of the ONNX path has unlocked significant performance gains, enabling us to generate embeddings 14 times faster than before.”
— Manticore Engineering Lead

Details of the Technical Changes and Broader Compatibility
It is not yet clear whether the speed improvements are consistent across all hardware configurations or if further optimizations are planned. The specific technical modifications to the ONNX path remain proprietary, and compatibility with other frameworks or future updates is still being evaluated.
Next Steps for Manticore and Broader Adoption of the Optimization
Manticore plans to publish detailed benchmarks and technical documentation in the coming months to enable wider adoption of the new ONNX path. The company will also continue refining its infrastructure, with potential updates to support additional hardware accelerators and integration with other AI frameworks.
Industry observers anticipate that other AI infrastructure providers may follow suit, adopting similar optimization strategies to enhance performance in embedding generation and inference tasks.
Key Questions
How does the 14× speed increase impact real-world AI applications?
The speedup allows for faster processing of large datasets, enabling real-time AI services, reducing operational costs, and improving user experience in applications like search engines and recommendation systems.
Are these improvements available to all Manticore users?
Manticore describes the update as part of its ongoing open-source release cycle, so users should be able to access the new ONNX path in the latest versions once it is officially released.
Will this optimization affect model accuracy or just speed?
The reported improvements focus on computational efficiency; there is no indication that model accuracy or output quality has been compromised.
Is this speedup applicable to all types of models and datasets?
While the benchmark results are promising, performance gains may vary depending on hardware, model complexity, and dataset size. Further testing is expected to clarify these aspects.
What other improvements are planned for Manticore?
The company is exploring additional optimizations, including better hardware support and expanded framework compatibility, to further enhance AI deployment efficiency.
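On the accuracy question, a common way to confirm that a faster inference path has not changed results is to embed the same inputs with both paths and compare the vectors pairwise. A minimal sketch, assuming both paths return plain lists of floats; the 0.999 cosine-similarity threshold is an illustrative choice, not a figure from Manticore.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def outputs_match(old: List[List[float]], new: List[List[float]], threshold: float = 0.999) -> bool:
    """True if every old/new embedding pair points in (almost) the same direction."""
    if len(old) != len(new):
        return False
    return all(cosine_similarity(o, n) >= threshold for o, n in zip(old, new))


old_vectors = [[0.1, 0.2, 0.3], [0.5, -0.4, 0.0]]
new_vectors = [[0.1, 0.2, 0.3001], [0.5, -0.4, 0.0]]  # tiny numeric drift
print(outputs_match(old_vectors, new_vectors))  # True
```

Comparing direction rather than exact values tolerates the small floating-point differences that reordered or fused computations typically introduce, while still flagging a path that produces genuinely different embeddings.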
Source: hn