Frequently Asked Questions

**What is Kompact AI?**
Kompact AI is a complete, end-to-end platform for AI inference, enabling LLMs of varying sizes to run on CPUs without any loss in performance. It provides everything needed to build lightweight LLM applications:
- The runtime that executes the models.
- A remote REST-based server for serving model inferences remotely.
- Observability to track model and system performance.
- OpenAI-compatible client-side SDKs in Go, Python, Java, .NET, and JavaScript for writing downstream applications that use Kompact AI models (a short sketch follows this list).
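
Because the SDKs are OpenAI-compatible, an existing OpenAI client can simply be pointed at a Kompact AI server. Below is a minimal sketch in Python; the base URL, API key, and model name are illustrative placeholders, not values from a real deployment:

```python
# Minimal sketch: calling a Kompact AI-served model through the standard
# OpenAI Python SDK. base_url, api_key, and model are placeholders --
# substitute the values from your own Kompact AI deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical Kompact AI endpoint
    api_key="not-needed",                 # use a real key if your server enforces one
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",          # placeholder model identifier
    messages=[{"role": "user", "content": "Summarise Kompact AI in one sentence."}],
)
print(response.choices[0].message.content)
```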

**What is the Kompact AI runtime?**
The runtime is the component of Kompact AI that executes models on a CPU.

**Can I run open-source models on Kompact AI?**
Yes, you can run any open-source model on the Kompact AI runtime.

**Can I run models downloaded from Hugging Face?**
Yes, you can download models from Hugging Face and run them on the Kompact AI runtime.

**Can I run proprietary or closed-source models?**
Yes, you can run proprietary or closed-source models on the Kompact AI runtime. Connect with us, and we'll guide you through the steps to run your model on Kompact AI.

**Does the runtime support fine-tuned models?**
Yes, the Kompact AI runtime supports fine-tuned models as well. Connect with us, and we'll guide you through the steps to run your model on Kompact AI.

**How are models served?**
Models are served via a remote REST-based server hosted on NGINX, which supports pluggable modules for implementing custom access controls.

**What does Kompact AI's observability cover?**
Kompact AI's observability tracks inputs, outputs, SLAs, user requests, and system metrics such as CPU, memory, and network usage. With OpenTelemetry support, it integrates seamlessly with tools like Prometheus and Grafana for monitoring.
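
As an illustration of the Prometheus side, the sketch below queries a Prometheus server that scrapes the deployment. The Prometheus URL and metric name are assumptions for illustration; check which metrics your deployment actually exports:

```python
# Minimal sketch: reading one metric from a Prometheus server that scrapes
# the Kompact AI deployment. The URL and the metric name are assumptions.
import requests

resp = requests.get(
    "http://localhost:9090/api/v1/query",           # hypothetical Prometheus server
    params={"query": "process_cpu_seconds_total"},  # a standard process-level metric
    timeout=10,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], series["value"])
```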

**How do I access a model for inferencing?**
There are two ways to access a model for inferencing in Kompact AI:
- Plain HTTP(S) calls (a sketch follows this list).
- OpenAI-compatible client-side SDKs in Go, Python, Java, .NET, and JavaScript, which require no code changes.
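
For the plain HTTP(S) route, here is a minimal sketch assuming the server exposes an OpenAI-style /v1/chat/completions route; the URL, key, and model name are placeholders:

```python
# Minimal sketch: plain HTTP(S) access with the requests library, assuming
# an OpenAI-style chat-completions route. URL, key, and model name are
# placeholders for your deployment's values.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # hypothetical endpoint
    headers={"Authorization": "Bearer not-needed"},
    json={
        "model": "llama-3-8b-instruct",           # placeholder model identifier
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```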

**Does Kompact AI alter model weights?**
No. Kompact AI does not alter the weights of the model.

**Does Kompact AI distil models?**
No. Kompact AI does not perform any distillation of a model.

**Is there any performance degradation in optimised models?**
No. There is no performance degradation in models optimised by Kompact AI.

**How do you ensure there is no loss in quality?**
After optimisation, we benchmark the model against the original developers' tests and iterate until it matches the original accuracy, ensuring no loss in quality.

**What model sizes do you support?**
We currently focus on models with under 50 billion parameters. Support for larger models will be available from Q1 2026.

**Can I build a RAG application?**
Yes, you can build a RAG application using a model optimised by Kompact AI. Connect with us, and we'll guide you through the steps to run the model on Kompact AI.

**Can I build an agentic AI application?**
Yes, you can build an agentic AI application using a model optimised by Kompact AI. Connect with us, and we'll guide you through the steps to run the model on Kompact AI.

**Does Kompact AI integrate with LangChain?**
Yes, Kompact AI integrates seamlessly with LangChain.
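
A minimal sketch of that integration, pointing LangChain's OpenAI-compatible chat client at a Kompact AI server; base_url, api_key, and model are placeholders:

```python
# Minimal sketch: using LangChain's ChatOpenAI against an OpenAI-compatible
# Kompact AI endpoint. All connection values are placeholders.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical Kompact AI endpoint
    api_key="not-needed",
    model="llama-3-8b-instruct",          # placeholder model identifier
)
print(llm.invoke("What is CPU inference?").content)
```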

**Does Kompact AI integrate with LlamaIndex?**
Yes, Kompact AI integrates seamlessly with LlamaIndex.
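
A minimal sketch of the LlamaIndex side, using its OpenAILike wrapper for OpenAI-compatible servers; api_base, api_key, and model are placeholders:

```python
# Minimal sketch: using LlamaIndex's OpenAILike LLM against an
# OpenAI-compatible Kompact AI endpoint. All connection values are placeholders.
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="http://localhost:8080/v1",  # hypothetical Kompact AI endpoint
    api_key="not-needed",
    model="llama-3-8b-instruct",          # placeholder model identifier
    is_chat_model=True,
)
print(llm.complete("What is CPU inference?"))
```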

**Does Kompact AI support fine-tuning or training?**
Kompact AI currently supports inference only. Fine-tuning capabilities will be available soon, but model training is not supported at this time.

**Can I run quantised models?**
Yes, very much. You can run any quantised model as long as the CPU supports it. Connect with us, and we'll guide you through the steps to run the model on Kompact AI.

**Where are Kompact AI model images available?**
Kompact AI model images are available on Google Cloud, Microsoft Azure, and AWS.

**Is Kompact AI open source?**
No, Kompact AI is not available as an open-source framework.

**What are the system requirements?**
System requirements vary based on the model being executed and the desired throughput; they are determined case by case, depending on the specific use case.

**Can I deploy on-premises?**
Yes, you can deploy Kompact AI-optimised models on on-premises servers.

**Do you use quantisation or distillation to achieve performance?**
No. We use original model weights without quantisation or distillation, focusing solely on boosting throughput and reducing latency on standard CPUs without altering architecture or accuracy.

**Which models have you optimised?**
Please have a look at our models page, which lists the models we have optimised. If your model is not listed, let us know and we will incorporate it. Alternatively, you can use Kompact AI and optimise it on your own.

**Does Kompact AI support RAG workflows?**
Yes, Kompact AI supports inference for RAG workflows. We've developed our own RAG-based application using a KAI-optimised model.
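
As a hedged illustration, the sketch below wires a naive keyword-overlap retriever to an OpenAI-compatible chat call; a real application would use a vector store and proper embeddings, and every connection value below is a placeholder:

```python
# Minimal RAG sketch over a Kompact AI-served model: retrieve the document
# sharing the most words with the query, then answer with that context.
# Endpoint, key, and model name are placeholders.
from openai import OpenAI

docs = [
    "Kompact AI runs LLM inference on CPUs.",
    "Models are served via a REST-based server hosted on NGINX.",
]

def retrieve(query: str) -> str:
    # Naive retriever: pick the document with the largest word overlap.
    words = set(query.lower().split())
    return max(docs, key=lambda d: len(words & set(d.lower().split())))

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
question = "Where does Kompact AI run inference?"
answer = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder model identifier
    messages=[
        {"role": "system", "content": f"Answer using this context: {retrieve(question)}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```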

**Which CPUs does Kompact AI support?**
Currently, Kompact AI models run on Intel CPUs. We plan to release models optimised for AMD, Ampere, Qualcomm, and ARM very soon.

**How is model memory managed?**
Kompact AI itself manages model memory.

**Do you use PyTorch?**
No, we do not use PyTorch.

**Does Kompact AI autoscale?**
Yes, Kompact AI autoscales. We can work with your DevOps team and show you how it can be achieved.

**How do I get help with my use case?**
We'd be happy to help. Please write to us at contact@ziroh.com with a brief description of what you're building, and our team will get back to you.