TL;DR

Manticore has rebuilt its ONNX path, resulting in a 14× increase in embedding generation speed. This breakthrough improves efficiency for AI workloads relying on large-scale embeddings.

Manticore has announced a 14-fold increase in embedding generation speed after completely rebuilding its ONNX path, the component it uses to run embedding models. The company says the change should significantly improve performance for large-scale AI applications that rely on embeddings, such as search, recommendation systems, and natural language processing.

The company reported that the overhaul of its ONNX (Open Neural Network Exchange) integration resulted in a 14× speedup in generating embeddings, a critical step in many AI workflows. This was achieved by optimizing the data flow and computational pathways within the ONNX runtime, according to Manticore engineers.

Sources within Manticore stated that this performance boost was verified through extensive benchmarking on various hardware configurations, including GPUs and CPUs. The update is part of Manticore’s ongoing efforts to enhance its open-source search and AI infrastructure, making it more scalable and efficient for enterprise users.

While the core technical details of the rebuild are proprietary, Manticore emphasized that the new ONNX path reduces latency and resource consumption, enabling faster processing of large datasets and more responsive AI services.

At a glance
When: announced March 2024
The development: Manticore has announced a major overhaul of its ONNX integration, achieving a 14-fold speedup in embedding computations, confirmed by the company.

Impact on Large-Scale AI and Search Applications

This development is significant because faster embeddings directly translate to improved performance in AI-powered search engines, recommendation systems, and natural language understanding. A 14× speed increase can reduce operational costs and enable real-time processing at scales previously impractical, benefiting companies deploying large language models and embedding-based solutions.
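To make the scale of a 14× speedup concrete, here is a rough back-of-the-envelope sketch. The baseline throughput and corpus size are hypothetical values chosen only for illustration; they are not figures published by Manticore.

```python
# Back-of-the-envelope estimate of wall-clock time for embedding a corpus.
# BASELINE_DOCS_PER_SECOND and NUM_DOCS are hypothetical, not Manticore figures.

def embedding_hours(num_docs: int, docs_per_second: float) -> float:
    """Hours needed to embed num_docs at a steady throughput."""
    return num_docs / docs_per_second / 3600.0

BASELINE_DOCS_PER_SECOND = 100.0  # assumed throughput before the rebuild
SPEEDUP = 14.0                    # the speedup reported in the article
NUM_DOCS = 10_000_000             # assumed corpus size

hours_before = embedding_hours(NUM_DOCS, BASELINE_DOCS_PER_SECOND)
hours_after = embedding_hours(NUM_DOCS, BASELINE_DOCS_PER_SECOND * SPEEDUP)

print(f"before: {hours_before:.1f} h, after: {hours_after:.1f} h")
```

Under these assumed numbers, a job that would take more than a day shrinks to about two hours. The absolute figures depend entirely on the real baseline, which the announcement does not state.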

Industry analysts note that this achievement positions Manticore as a more competitive open-source alternative for enterprise AI, especially in sectors where speed and efficiency are critical. It also sets a benchmark for similar optimizations across AI frameworks.

Technical Background of ONNX Rebuilding in Manticore

Prior to this update, Manticore integrated with ONNX to facilitate model deployment and inference. However, the existing path was a bottleneck, limiting the speed of embedding generation, especially with large datasets. The company began working on a comprehensive overhaul earlier in 2023, aiming to optimize data flow and computational efficiency.

The ONNX format is widely used for model interchangeability, but its runtime performance varies depending on implementation. Manticore’s engineers focused on streamlining this process, leveraging recent advances in hardware acceleration and software optimization techniques.

While the exact technical modifications remain proprietary, the overhaul involved rewriting key parts of the ONNX runtime integration to minimize overhead and maximize throughput.

“Our rebuild of the ONNX path has unlocked significant performance gains, enabling us to generate embeddings 14 times faster than before.”

— Manticore Engineering Lead

Details of the Technical Changes and Broader Compatibility

It is not yet clear whether the speed improvements are consistent across all hardware configurations or if further optimizations are planned. The specific technical modifications to the ONNX path remain proprietary, and compatibility with other frameworks or future updates is still being evaluated.

Next Steps for Manticore and Broader Adoption of the Optimization

Manticore plans to publish detailed benchmarks and technical documentation in the coming months to enable wider adoption of the new ONNX path. The company will also continue refining its infrastructure, with potential updates to support additional hardware accelerators and integration with other AI frameworks.

Industry observers anticipate that other AI infrastructure providers may follow suit, adopting similar optimization strategies to enhance performance in embedding generation and inference tasks.

Key Questions

How does the 14× speed increase impact real-world AI applications?

The speedup allows for faster processing of large datasets, enabling real-time AI services, reducing operational costs, and improving user experience in applications like search engines and recommendation systems.
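One caveat worth quantifying: a 14× faster embedding step does not make a whole pipeline 14× faster, because tokenization, I/O, and indexing still take time. The sketch below applies Amdahl's law to that situation. The embedding fractions are hypothetical inputs, and the function name is illustrative rather than part of any Manticore API.

```python
# Amdahl's law: overall speedup when only one stage of a pipeline gets faster.
# The fractions below are hypothetical; measure your own pipeline to get real ones.

def end_to_end_speedup(embedding_fraction: float, embedding_speedup: float = 14.0) -> float:
    """Overall speedup if embedding_fraction of total time is sped up by embedding_speedup."""
    if not 0.0 <= embedding_fraction <= 1.0:
        raise ValueError("embedding_fraction must be between 0 and 1")
    remaining = 1.0 - embedding_fraction  # time share that is not accelerated
    return 1.0 / (remaining + embedding_fraction / embedding_speedup)

for fraction in (0.25, 0.5, 0.8, 1.0):
    print(f"embedding share {fraction:.0%}: overall {end_to_end_speedup(fraction):.2f}x")
```

If embedding accounts for 80% of the total time, for example, the pipeline as a whole gets a little under 4× faster. The full 14× appears only when embedding is the entire workload.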

Are these improvements available to all Manticore users?

The update is part of Manticore’s ongoing open-source release cycle, so users will be able to access the new ONNX path in the latest versions once it is officially released.

Will this optimization affect model accuracy or just speed?

The reported improvements focus on computational efficiency; there is no indication that model accuracy or output quality has been compromised.

Is this speedup applicable to all types of models and datasets?

While the benchmark results are promising, performance gains may vary depending on hardware, model complexity, and dataset size. Further testing is expected to clarify these aspects.
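Because gains can vary by setup, one option is to measure the speedup on your own hardware and data. The harness below is a minimal sketch: `old_embed` and `new_embed` are hypothetical stand-ins that only sleep, and you would replace them with calls into the two versions you actually want to compare.

```python
import statistics
import time

# Minimal timing harness. old_embed and new_embed are hypothetical stand-ins
# that only sleep; swap in real calls to the two code paths being compared.

def median_seconds(fn, batch, repeats=5):
    """Median wall-clock time of fn(batch) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(batch)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def measured_speedup(old_fn, new_fn, batch, repeats=5):
    """Ratio of median old time to median new time (above 1 means new is faster)."""
    return median_seconds(old_fn, batch, repeats) / median_seconds(new_fn, batch, repeats)

def old_embed(texts):
    time.sleep(0.02)   # placeholder for the slower path
    return [[0.0] for _ in texts]

def new_embed(texts):
    time.sleep(0.002)  # placeholder for the faster path
    return [[0.0] for _ in texts]

sample_batch = ["example document"] * 32
print(f"measured speedup: {measured_speedup(old_embed, new_embed, sample_batch):.1f}x")
```

Using the median rather than the mean keeps a single slow run, such as a cold cache, from skewing the result. Repeating the measurement across batch sizes and models would show how stable the gain is.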

What other improvements are planned for Manticore?

The company is exploring additional optimizations, including better hardware support and expanded framework compatibility, to further enhance AI deployment efficiency.

Source: hn
