npm install @qvac/sdk

import { completion, LLAMA_3_2_1B_INST_Q4_0, loadModel, unloadModel } from "@qvac/sdk";

// Supports any Pear or HTTP URL
const modelId = await loadModel({
  modelSrc: LLAMA_3_2_1B_INST_Q4_0,
  modelType: "llm",
});

const history = [
  {
    role: "user",
    content: "QVAC, how may entropy be reversed?",
  },
];

const result = completion({
  modelId,
  history,
  stream: true,
});

for await (const token of result.tokenStream) {
  console.log(token);
}

await unloadModel({ modelId });
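One pattern worth adopting around the load/unload pair above: wrap inference in try/finally so the model is released even when generation throws. The helper below is hypothetical, not part of the SDK; the load and unload functions are injected as parameters so the sketch stays self-contained.

```typescript
// Hypothetical helper (not an SDK export): guarantees the unload step runs
// even if inference throws, mirroring the loadModel/unloadModel lifecycle
// shown above. load/unload are injected to keep this sketch SDK-agnostic.
async function withModel<T>(
  load: () => Promise<string>,
  unload: (modelId: string) => Promise<void>,
  run: (modelId: string) => Promise<T>,
): Promise<T> {
  const modelId = await load();
  try {
    return await run(modelId);
  } finally {
    // Always release the model, whether run() succeeded or threw.
    await unload(modelId);
  }
}
```

In application code you would pass closures over the SDK's `loadModel` and `unloadModel`, with `run` performing the `completion` call.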

The power of decentralized, local AI in a single API

Our robust, cross-platform AI SDK lets your team unlock the value and utility of local AI in any app, on any platform.

Serverless peer-to-peer distribution and encryption are optional but natively built-in.

We keep adding more capabilities to help you unlock new privacy-preserving AI possibilities for user experience or business models.

Cross-platform AI for all your JS environments

Run AI models natively across multiple JS environments, be it Node.js, Expo, or Bun. The SDK abstracts away platform complexity while providing consistent AI capabilities whether you're building desktop apps, mobile apps, or server applications.
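The SDK does this detection for you, but as an illustration of how those environments differ, here is a minimal runtime check in plain TypeScript (not SDK code; the globals probed are the conventional markers each runtime exposes):

```typescript
// Illustrative runtime detection, not part of the SDK.
type Runtime = "bun" | "react-native" | "node" | "unknown";

function detectRuntime(): Runtime {
  // Bun exposes a global `Bun` object.
  if (typeof (globalThis as any).Bun !== "undefined") return "bun";
  // React Native (and therefore Expo) sets navigator.product to "ReactNative".
  if (
    typeof navigator !== "undefined" &&
    (navigator as any).product === "ReactNative"
  )
    return "react-native";
  // Node.js exposes process.versions.node.
  if (typeof process !== "undefined" && !!process.versions?.node) return "node";
  return "unknown";
}
```

A cross-platform SDK branches on checks like these to pick the right native backend while keeping the public API identical.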

QVAC SDK

Decentralization that doesn’t get in the way

We baked in the entire Pear stack (by Holepunch) to enable decentralized model sharing, delegated inference, and decentralized vector databases. P2P is native but optional; you can still run RAG using Chroma, LanceDB, or SQLite-vector, and fetch models from the most common providers or from your filesystem.

import { loadModel, unloadModel, GTE_LARGE_FP16, ragSaveEmbeddings, ragSearch } from "@qvac/sdk";

const query = "machine learning algorithms";

const samples = ["sample 1", "sample 2"];

const modelId = await loadModel({
  modelSrc: GTE_LARGE_FP16,
  modelType: "llm",
});

const docs = await ragSaveEmbeddings({
  modelId,
  documents: samples,
  chunk: false,
});

const results = await ragSearch({
  modelId,
  query,
  topK: 3,
});

await unloadModel({ modelId });
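Under the hood, retrieval of this kind boils down to ranking document embeddings by similarity to the query embedding. The exact scoring `ragSearch` uses is internal to the SDK, but the standard mechanics (an assumption here) look like this in plain TypeScript:

```typescript
// Illustrative only: the arithmetic behind embedding retrieval.
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank document vectors by similarity to the query vector, keep the top K indices.
function topK(queryVec: number[], docVecs: number[][], k: number): number[] {
  return docVecs
    .map((v, i) => ({ i, score: cosine(queryVec, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((e) => e.i);
}
```

A vector database adds an index so this ranking avoids scanning every document, but the scoring principle is the same.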

Local AI that scales

Create distributed AI inference networks where devices can provide or consume AI services. Enable resource sharing across the network, allowing lightweight devices to access powerful AI models running on other peers in the network.

import { startQVACProvider } from "@qvac/sdk";

const topic = "some topics";

const response = await startQVACProvider({
  topic,
  firewall: undefined
});
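Peer discovery in the Pear/Hyperswarm ecosystem is conventionally keyed on a 32-byte topic. Whether `startQVACProvider` hashes the string for you is not shown above; a common pattern in that ecosystem (an assumption here, not documented SDK behavior) is deriving a stable topic from a human-readable name:

```typescript
import { createHash } from "node:crypto";

// Derive a deterministic 32-byte topic from a human-readable service name.
// Hashing keeps the wire-level topic fixed-size and avoids leaking the raw
// name to peers who only see the digest. Hypothetical helper, not SDK code.
function topicFromName(name: string): Buffer {
  return createHash("sha256").update(name).digest(); // 32 bytes
}
```

Because the hash is deterministic, every provider and consumer that agrees on the same name lands on the same discovery topic.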

One SDK: all of AI.

Seamlessly integrate multiple AI capabilities, including completion, transcription, tool calling, embeddings and retrieval, translation, vision, and text-to-speech, through a single entrypoint. It also supports streaming and multimodal inputs.

import { loadModel, unloadModel, textToSpeech, TTS_PIPER_NORMAN_EN_US_ONNX_MEDIUM, TTS_PIPER_NORMAN_EN_US_ONNX_MEDIUM_CONFIG } from "@qvac/sdk";

const eSpeakDataPath = "some path";

const modelId = await loadModel({
  modelSrc: TTS_PIPER_NORMAN_EN_US_ONNX_MEDIUM,
  modelType: "tts",
  configSrc: TTS_PIPER_NORMAN_EN_US_ONNX_MEDIUM_CONFIG,
  eSpeakDataPath,
  modelConfig: {
    language: "en",
  }
});

const result = textToSpeech({
  modelId,
  text: "QVAC SDK is the canonical entry point to QVAC",
  inputType: "text",
  stream: false
});

const audioBuffer = await result.buffer;

await unloadModel({ modelId });

FAQ