MultiLLM by VerifAI is an open-source Python framework designed to enhance the reliability of AI-generated content.
MultiLLM invokes multiple large language models (LLMs), such as GPT-3.5 and Google Bard, concurrently and ranks their outputs to select the best-scoring result, which helps reduce the risk of AI hallucinations.
Developers and researchers can use MultiLLM to compare code snippets, textual responses, or other outputs from different LLMs.
The framework’s extensibility allows for the integration of new models and customization of ranking functions, enabling tailored evaluations across diverse AI applications.
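
The general pattern described above (query several models in parallel, then score each response with a pluggable ranking function) can be sketched in plain Python. This is a minimal illustration and does not use MultiLLM's actual API: the names `fake_model_a`, `fake_model_b`, `syntax_rank`, and `query_and_rank` are hypothetical, and the two "models" are stubs standing in for real LLM clients. The ranking function here simply checks whether a response compiles as Python, which is one plausible way to compare generated code snippets.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases, not part of MultiLLM.
ModelFn = Callable[[str], str]         # prompt -> response
RankFn = Callable[[str, str], float]   # (prompt, response) -> score


def fake_model_a(prompt: str) -> str:
    """Stub standing in for a real LLM client; returns valid Python."""
    return "def add(a, b):\n    return a + b"


def fake_model_b(prompt: str) -> str:
    """Stub standing in for a real LLM client; returns invalid Python."""
    return "def add(a, b) return a + b"


def syntax_rank(prompt: str, response: str) -> float:
    """Illustrative ranking function: 1.0 if the response compiles, else 0.0."""
    try:
        compile(response, "<response>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0


def query_and_rank(
    prompt: str, models: Dict[str, ModelFn], rank: RankFn
) -> List[Tuple[str, float, str]]:
    """Call every model concurrently, then sort responses by descending score."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        responses = {name: future.result() for name, future in futures.items()}
    scored = [(name, rank(prompt, text), text) for name, text in responses.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    models = {"model_a": fake_model_a, "model_b": fake_model_b}
    ranked = query_and_rank("Write an add function in Python.", models, syntax_rank)
    for name, score, _ in ranked:
        print(f"{name}: {score}")
```

Because both the model table and the ranking function are passed in as arguments, adding a new model or swapping the scoring rule requires no change to `query_and_rank`, which mirrors the kind of extensibility the framework describes.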