FAQs

How does Kompact AI work, and what are its key features?

Kompact AI provides a runtime that enables AI models to run efficiently on CPU architectures without performance degradation, allowing enterprises to achieve high-performance inference without relying on GPUs. Key features include:

  • Cost-effective AI – Drastically lower inference costs compared to GPU-based deployments.
  • High-performance AI – Optimized models deliver fast, reliable results even at scale.
  • Accessible AI – Easy to deploy across a range of cloud and on-prem environments.
  • Environmentally sustainable AI – Reduced energy consumption through efficient CPU usage.

Kompact AI does not train models or alter their weights. Instead, we focus on optimising token-generation throughput and reducing system latency on standard CPU infrastructure. Once a model is optimised, we evaluate its performance using the same benchmark tests published by the original model developers. Optimisation continues iteratively until the model matches the original accuracy scores, ensuring no compromise in output quality.

What are the applications and use cases for Kompact AI?

Kompact AI enables developers to build and deploy fast, cost-efficient AI applications using models with fewer than 50 billion parameters. These models run seamlessly on standard CPUs, in both cloud and on-prem environments.
Developers can use Kompact AI to:

  • Implement RAG systems for context-rich question answering.
  • Create agentic AI workflows that automate multi-step tasks and decision-making processes.
  • Build lightweight LLM applications (see the sketch after this list), such as:
      • A Knowledge Q&A App with enterprise-grade performance.
      • A Big Data Querying Agent that interprets natural language prompts.
      • A Productivity Assistant tailored for internal workflows.
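
As a concrete starting point, here is a minimal Knowledge Q&A sketch in Python. It assumes, hypothetically, that the Kompact AI runtime exposes an OpenAI-compatible chat endpoint; the URL, API key, and model name below are illustrative placeholders, not documented values.

```python
# Minimal Knowledge Q&A sketch. The endpoint URL, API key, and model name
# are hypothetical placeholders; substitute the values from your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical KAI runtime endpoint
    api_key="unused",                     # placeholder credential
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder: any KAI-optimised model
    messages=[
        {"role": "system", "content": "Answer only from the company handbook."},
        {"role": "user", "content": "What is the travel reimbursement policy?"},
    ],
)
print(response.choices[0].message.content)
```
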
Does Kompact AI integrate with other commonly used tools such as LangChain?

Yes, Kompact AI integrates seamlessly with tools such as LangChain, LlamaIndex, and SuperAGI. Since Kompact AI (KAI) provides a runtime for executing optimised models, the way those models are used within different applications and workflows, including those built with LangChain, remains unchanged.
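
As an illustration, here is a minimal LangChain sketch, again assuming a hypothetical OpenAI-compatible KAI endpoint; only the endpoint and model name are placeholders, while the chain itself is standard LangChain.

```python
# Minimal LangChain sketch. The endpoint and model name are hypothetical
# placeholders; the chain code needs no KAI-specific changes.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical KAI endpoint
    api_key="unused",                     # placeholder credential
    model="qwen2.5-7b-instruct",          # placeholder model name
)

prompt = ChatPromptTemplate.from_template("Summarise in one sentence: {text}")
chain = prompt | llm  # standard LCEL composition

print(chain.invoke({"text": "Kompact AI runs optimised models on CPUs."}).content)
```

Because the runtime sits behind the model interface, existing chains, agents, and retrievers keep working as they did before.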

What kind of support and training resources do you offer to new users?

New users are walked through a guided Experience Centre session to understand deployments and use cases. We also provide quick-start guides and technical documentation to support setup and integration.

I have developed or fine-tuned a model. Can I port it to Kompact AI?

Yes, as long as the model is open source and you can share its architecture details, we can work with it. We don't support optimisation for proprietary or closed-source models. Once we have the required inputs, we can optimise the model for the specified CPU hardware.

Can I train or fine-tune models using Kompact AI?

Kompact AI currently supports inference only. Fine-tuning capabilities will be available soon, but model training is not supported at this time.

What does it take to deploy Kompact AI?

Deployment is simple and takes a few minutes. Customers share their target environment details, and we provide a ready-to-run package — cloud machine images (e.g., AMIs for AWS) or on-prem virtual images (e.g., OVA).
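
For example, launching the AWS image from the ready-to-run package could look like the following boto3 sketch; the AMI ID, instance type, and key pair are placeholders to be replaced with the values supplied for your environment.

```python
# Sketch: launch a Kompact AI machine image on AWS. All identifiers below
# are placeholders; use the AMI ID and sizing guidance from your package.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

result = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: KAI-provided AMI
    InstanceType="c7i.4xlarge",       # placeholder: a CPU-optimised type
    KeyName="my-keypair",             # placeholder SSH key pair
    MinCount=1,
    MaxCount=1,
)
print(result["Instances"][0]["InstanceId"])
```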

Is Kompact AI available as an open-source framework?

No. Kompact AI is not available as an open-source framework.

Are the models running on Kompact AI quantised or distilled?

No. We use the original model weights without quantisation or distillation. Our focus is on optimising token-generation throughput and reducing system latency on standard CPU infrastructure, not on modifying the model's architecture or compromising its accuracy.

Can I run Kompact AI-optimised models on edge devices, including laptops and mobile phones?

Support for running Kompact AI-optimised models on edge devices—including laptops and mobile phones—is planned for Q4 2025.

Which AI models (including models for coding) are currently optimised for CPU execution on your platform?

Please visit kompact.ai to view the list of optimised models.

I am using RAG. Can I use Kompact AI for inference?

Yes, Kompact AI supports inference for RAG workflows. We've developed our own RAG-based application using a KAI-optimized model.
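
A minimal sketch of such a RAG flow in Python is shown below. It uses a naive word-overlap retriever purely for illustration and again assumes a hypothetical OpenAI-compatible KAI endpoint; a production system would use a real vector store and embedding model instead.

```python
# Minimal RAG sketch: retrieve the most relevant snippet, then ask a
# KAI-served model to answer from it. Endpoint and model name are
# hypothetical placeholders; the retriever is a toy word-overlap heuristic.
from openai import OpenAI

docs = [
    "Kompact AI runs optimised models on standard CPU infrastructure.",
    "Quarterly invoices are processed by the finance team within 10 days.",
]

def retrieve(query: str) -> str:
    # Score each document by how many query words it contains.
    words = set(query.lower().split())
    return max(docs, key=lambda d: len(words & set(d.lower().split())))

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # placeholders

question = "How quickly are invoices processed?"
context = retrieve(question)

answer = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"Context: {context}\n\nUsing only the context, answer: {question}",
    }],
)
print(answer.choices[0].message.content)
```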

How does performance scale as model size increases beyond 50B parameters?

Performance scaling depends heavily on the available CPU architecture, memory bandwidth, and the batching strategy employed. We recommend working with models under 50B for cost-effective CPU-based inference.

What benchmarks do you use (e.g., LiveBench, Arena-Hard, GPQA Diamond)?

We use the same benchmarks adopted by the original model developers depending on the model type and task. This ensures a like-for-like comparison of accuracy before and after optimisation.
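
In spirit, the acceptance check is a simple score comparison, as in this illustrative sketch; the function and the numbers are invented for the example and are not part of any published Kompact AI tooling.

```python
# Illustrative parity check between a published baseline score and the
# optimised model's score on the same benchmark. Numbers are invented.
def scores_match(baseline: float, optimised: float, tol: float = 0.001) -> bool:
    """The optimised model must match or exceed the baseline within tol."""
    return optimised >= baseline - tol

baseline_acc = 0.857   # hypothetical published benchmark accuracy
optimised_acc = 0.856  # hypothetical score after optimisation

if scores_match(baseline_acc, optimised_acc):
    print("Parity reached: the optimised model ships.")
else:
    print("Keep iterating on the optimisation.")
```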

What is the timeframe you take to optimise a model?

The optimisation timeline depends on several factors, including the chosen LLM, model architecture, target hardware, and performance expectations. We assess these inputs and provide an estimated timeline on a case-by-case basis.

Are there any benchmark results for Kompact AI?

We plan to release the software for testing and benchmarking in the coming weeks and will notify you once it's available. Alongside the release, we also intend to publish white papers and technical reports to provide deeper insights into the platform.

Will Kompact AI scale in and scale out based on the number of processors?

Yes. We will be introducing autoscaling capabilities in Kompact AI.