14× Faster Embeddings: How We Rebuilt The ONNX Path In Manticore

TL;DR

Manticore has rebuilt its ONNX path, resulting in a 14× increase in embedding generation speed. This breakthrough improves efficiency for AI workloads relying on large-scale embeddings.

Manticore has announced a 14-fold increase in embedding generation speed after completely rebuilding its ONNX path, a core component for deploying AI models efficiently. According to the company, the change is intended to significantly improve performance for large-scale AI applications that rely on embeddings, such as search, recommendation systems, and natural language processing.

The company reported that the overhaul of its ONNX (Open Neural Network Exchange) integration resulted in a 14× speedup in generating embeddings, a critical step in many AI workflows. This was achieved by optimizing the data flow and computational pathways within the ONNX runtime, according to Manticore engineers.

Sources within Manticore stated that this performance boost was verified through extensive benchmarking on various hardware configurations, including GPUs and CPUs. The update is part of Manticore’s ongoing efforts to enhance its open-source search and AI infrastructure, making it more scalable and efficient for enterprise users.

While the core technical details of the rebuild are proprietary, Manticore emphasized that the new ONNX path reduces latency and resource consumption, enabling faster processing of large datasets and more responsive AI services.

At a glance

When: announced March 2024

The development: Manticore has announced a major overhaul of its ONNX integration, achieving a 14-fold speedup in embedding computations, confirmed by the company.

Impact on Large-Scale AI and Search Applications

This development is significant because faster embeddings directly translate to improved performance in AI-powered search engines, recommendation systems, and natural language understanding. A 14× speed increase can reduce operational costs and enable real-time processing at scales previously impractical, benefiting companies deploying large language models and embedding-based solutions.
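To make the scale of a 14× change concrete, here is a rough back-of-the-envelope calculation. The corpus size and baseline throughput are hypothetical numbers chosen for illustration, not figures from Manticore; only the 14× factor comes from the announcement.

```python
# Back-of-the-envelope arithmetic for a 14x embedding speedup.
# All workload numbers below are hypothetical, chosen only for illustration.

SPEEDUP = 14  # the factor reported by Manticore


def embedding_hours(num_docs: int, docs_per_second: float) -> float:
    """Return the hours needed to embed num_docs at a given throughput."""
    return num_docs / docs_per_second / 3600


baseline_rate = 100.0                  # hypothetical docs/second before the rebuild
faster_rate = baseline_rate * SPEEDUP  # same hardware, 14x the throughput
corpus_size = 10_080_000               # hypothetical corpus size

before = embedding_hours(corpus_size, baseline_rate)
after = embedding_hours(corpus_size, faster_rate)

print(f"before: {before:.1f} h, after: {after:.1f} h")
```

Under these assumed numbers, a job that would take more than a day finishes in a couple of hours. The saving applies to the embedding step only, so end-to-end gains depend on how much of a pipeline's time that step accounts for.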

Industry analysts note that this achievement positions Manticore as a more competitive open-source alternative for enterprise AI, especially in sectors where speed and efficiency are critical. It also sets a benchmark for similar optimizations across AI frameworks.

Technical Background of ONNX Rebuilding in Manticore

Prior to this update, Manticore integrated with ONNX to facilitate model deployment and inference. However, the existing path was a bottleneck that limited the speed of embedding generation, especially with large datasets. The company began working on a comprehensive overhaul in 2023, aiming to optimize data flow and computational efficiency.

The ONNX format is widely used for model interchangeability, but its runtime performance varies depending on implementation. Manticore’s engineers focused on streamlining this process, leveraging recent advances in hardware acceleration and software optimization techniques.

While the exact technical modifications remain proprietary, the overhaul involved rewriting key parts of the ONNX runtime integration to minimize overhead and maximize throughput.

“Our rebuild of the ONNX path has unlocked significant performance gains, enabling us to generate embeddings 14 times faster than before.”
— Manticore Engineering Lead

Details of the Technical Changes and Broader Compatibility

It is not yet clear whether the speed improvements are consistent across all hardware configurations or if further optimizations are planned. The specific technical modifications to the ONNX path remain proprietary, and compatibility with other frameworks or future updates is still being evaluated.

Next Steps for Manticore and Broader Adoption of the Optimization

Manticore plans to publish detailed benchmarks and technical documentation in the coming months to enable wider adoption of the new ONNX path. The company will also continue refining its infrastructure, with potential updates to support additional hardware accelerators and integration with other AI frameworks.

Industry observers anticipate that other AI infrastructure providers may follow suit, adopting similar optimization strategies to enhance performance in embedding generation and inference tasks.

Key Questions

How does the 14× speed increase impact real-world AI applications?

The speedup allows for faster processing of large datasets, enabling real-time AI services, reducing operational costs, and improving user experience in applications like search engines and recommendation systems.

Are these improvements available to all Manticore users?

Yes. The update is part of Manticore’s ongoing open-source release cycle, so users will get the new ONNX path by upgrading to the latest version once it is officially released.

Will this optimization affect model accuracy or just speed?

The reported improvements focus on computational efficiency; there is no indication that model accuracy or output quality has been compromised.

Is this speedup applicable to all types of models and datasets?

While the benchmark results are promising, performance gains may vary depending on hardware, model complexity, and dataset size. Further testing is expected to clarify these aspects.
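Because gains may vary, readers may want to measure throughput on their own hardware and data. The following is a minimal sketch of such a harness, not Manticore's benchmark: `measure_throughput` and `fake_embed` are hypothetical names, and `fake_embed` is a stand-in that should be replaced with a real embedding call.

```python
import time
from typing import Callable, List, Sequence


def measure_throughput(
    embed_fn: Callable[[Sequence[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int = 32,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return texts embedded per second for embed_fn over the given texts."""
    start = clock()
    produced = 0
    # Feed the texts to the embedding function one batch at a time.
    for i in range(0, len(texts), batch_size):
        produced += len(embed_fn(texts[i:i + batch_size]))
    elapsed = clock() - start
    if produced != len(texts):
        raise ValueError("embed_fn must return one vector per input text")
    return produced / elapsed if elapsed > 0 else float("inf")


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for a real model call; swap in your own."""
    return [[float(len(text))] for text in batch]


sample = [f"document {n}" for n in range(1000)]
rate = measure_throughput(fake_embed, sample, batch_size=64)
print(f"{rate:.0f} texts/second")
```

Running the same harness against the old and new versions with identical inputs and dividing the two rates gives a speedup figure for that specific hardware, model, and dataset.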

What other improvements are planned for Manticore?

The company is exploring additional optimizations, including better hardware support and expanded framework compatibility, to further enhance AI deployment efficiency.

Source: hn

Author

Good Sidekick Team
