New Wikidata Database Enhances AI Accessibility
Wikidata, the sister project of Wikipedia, has introduced a new database designed to make its data easier for AI models to ingest. The initiative aims to let smaller development teams draw on vast amounts of structured data, democratizing access to information, reports 24brussels.
The new system was developed over the past year by a team at Wikimedia Deutschland. It converts 19 million Wikidata entries into a vectorized format, which makes the context around each entry easier to analyze. Once vectorized, the data can be represented as a graph that connects an entity such as Douglas Adams to related topics, including his books and the concept of “human.”
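To give a rough sense of what a vectorized entry makes possible, here is a minimal Python sketch of similarity ranking. It is an illustration only, not the Wikidata project's code: the three-dimensional vectors are made-up values, the labels are sample entries, and the function names `cosine_similarity` and `nearest` are hypothetical. Real embeddings come from a trained model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors (assumption): these numbers are invented for the
# example and do not come from Wikidata or any embedding model.
TOY_VECTORS = {
    "Douglas Adams": [0.9, 0.8, 0.1],
    "The Hitchhiker's Guide to the Galaxy": [0.8, 0.9, 0.0],
    "human": [0.6, 0.2, 0.3],
    "Mount Everest": [0.0, 0.1, 0.9],
}


def nearest(query_label, vectors, k=2):
    """Rank the other entries by cosine similarity to the query entry."""
    query = vectors[query_label]
    scored = [
        (label, cosine_similarity(query, vec))
        for label, vec in vectors.items()
        if label != query_label
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for label, score in nearest("Douglas Adams", TOY_VECTORS):
        print(f"{label}: {score:.3f}")
```

With these toy values, the book and the concept “human” rank closest to the Douglas Adams entry, while the unrelated entry ranks last. A vector database applies the same idea at the scale of millions of entries.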
While Wikidata’s user interface remains unchanged, the backend enhancements will simplify the process for developers creating chatbots and other AI applications. Lydia Pintscher, Wikidata portfolio lead, stated, “The goal of the project is to level the playing field for AI developers outside the monied core of Big Tech.” This means smaller tech firms can now harness curated data without competing directly with larger companies that traditionally dominate the space.
The vectorized data is expected to better reflect niche topics, which are often overlooked by mainstream AI systems. By making this information more accessible, the initiative seeks to enhance the quality of AI-generated content across various applications. As Pintscher noted, “It’s about giving them that edge up and to at least give them a chance, right?”
Developers have expressed interest in the new database, which was built using a model from Jina AI. The infrastructure to store the vector database has been provided free of charge by IBM’s DataStax. Although the current database covers information available up to September 2024, Pintscher says that minor edits will not significantly affect the vectors’ relevance.
Through this transformation, Wikidata strengthens its position as a vital resource for artificial intelligence research and development. As AI continues to permeate various sectors, the ability to access structured, contextually rich data like that in Wikidata will play a crucial role in the evolution of intelligent systems.
