QVAC Genesis is a family of open datasets focused on real understanding, not imitation. Our aim is to provide the global AI community with the high-quality data needed to accelerate the development of open-source models.
The second release of QVAC Genesis expands coverage to 10 new domains, including chemistry, computer science, statistics, machine learning, astronomy, and econometrics, and introduces an improved methodology that produces higher-quality synthetic datasets.
Our research goes beyond an increase in scale: it aims to empower the community to develop models that reason and explain, grounding intelligence in understanding rather than imitation. This is a deliberate shift in how educational AI data should be built.
The QVAC Genesis family is made available under a Creative Commons license, reinforcing our commitment to open, community-driven AI research.
We start with QVAC Genesis I, a synthetic dataset purpose-built for educational content, offering deep and comprehensive coverage across key STEM domains.
The dataset has been rigorously validated across multiple educational benchmarks, demonstrating superior performance on school- and college-level subjects such as logical deduction, mathematics, biology, and medicine.
Test Genesis I yourself using our open-source pre-trained base model.
Run continual pre-training, testing, and comparisons on a proven baseline right away, and discover how Genesis I provides a practical foundation for developing next-generation STEM learning assistants that genuinely understand complex concepts.