Paul Oamen

Truthfulness

status: experimental

A public demo interface for Truthfulness is in the works and will be available soon. The screenshots shown below are from a larger experiment where Truthfulness was used.

[Screenshots 1-3: Truthfulness fact-checking a market analysis summary; fact-check scores and confidence levels; source page for LLM claims]

Truthfulness being used to fact-check a market analysis summary generated by a large language model for Intel's 2023 SEC Form 10-K filing, within an investor relations platform.

1. Market summary generated by the LLM

2. Truthfulness-generated fact-check scores and confidence levels

3. Truthfulness-identified source pages for the LLM's claims


Why Truthfulness

I'm honestly tired of people making a big deal about large language models producing hallucinations or incorrect answers. This is 2025 (😉 IYKYK), and by now, it should be well-known that AI models can make mistakes. Most LLM providers even include clear disclaimers stating that their models are not always accurate.

As AI engineers and researchers, we certainly have a responsibility to close this gap and reduce hallucinations as much as possible. But while we work toward that goal, we need to acknowledge this limitation and design tools that help users navigate it effectively.

Instead of focusing solely on fixing the models, I have turned my attention to building practical systems that introduce a layer of transparency. My focus is now on enabling measurable trust. Can we give users not just answers, but an understanding of how grounded those answers are? Can we offer evidence, a confidence score, and even a link to the exact source, down to the page and line, where we believe the model's claim comes from?

That is what led to the creation of Truthfulness, a system designed to bring accountability and clarity to the use of AI-generated answers.

What Truthfulness Does

Truthfulness is a fact-checking engine that enables confidence scoring and source verification for claims made by large language models. It is designed to work alongside LLMs, verifying their outputs against source documents such as PDF filings, research papers, or reports.

In the current demo interface, users can upload any PDF document and ask questions about its content through a conversational chat interface. For example, after uploading a company's annual report, a user might ask, "What was the total net revenue for the year?"

[Screenshots 4-6: an investor submitting a question to the LLM; fact-check results with confidence scores; document view with highlighted evidence]

Images showing an investor's conversation with an LLM based on Intel's SEC Form 10-K filing

4. Image 4 shows the investor submitting a question and the model generating a response.

5. Image 5 shows the investor clicking the fact-check button, which displays the verification result, including the confidence score and supporting evidence with page numbers, provided by Truthfulness.

6. Image 6 presents the document view, highlighting the specific evidence used by Truthfulness to support the LLM's response.

The large language model responds with an answer as expected. At that point, the Truthfulness engine activates in the background. It analyzes the model's answer, searches the uploaded document for semantically similar sentences, and returns a similarity score along with the page number of the most likely source. The user is then able to review the evidence and judge whether the AI's response is trustworthy.

The purpose is not to eliminate hallucination entirely but to make it visible, traceable, and measurable. Truthfulness empowers users to engage conversationally with AI while maintaining verifiability and factual grounding.

Verification in the demo is also user-controlled. Not every AI response needs to be fact-checked, so users can choose to trigger Truthfulness manually to avoid unnecessary computation and streamline the experience.

How Truthfulness Works

The technical pipeline behind Truthfulness begins with document ingestion. In the demo interface, when a user uploads a PDF, the system extracts all text content using PyMuPDF. This extracted text is then segmented into individual sentences using the Punkt tokenizer from the Natural Language Toolkit, or NLTK. Each sentence is tagged with its corresponding page number so that future verification can point not just to what was matched, but exactly where it appears in the document.
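A minimal sketch of this ingestion step might look like the following. It assumes PyMuPDF's fitz API and NLTK's sent_tokenize; the function name ingest_pdf is illustrative, not the project's actual interface.

```python
import fitz  # PyMuPDF
import nltk

nltk.download("punkt", quiet=True)  # Punkt sentence tokenizer models

def ingest_pdf(path: str) -> list[dict]:
    """Extract text page by page and tag every sentence with its page number."""
    records = []
    with fitz.open(path) as doc:
        for page_number, page in enumerate(doc, start=1):
            for sentence in nltk.sent_tokenize(page.get_text()):
                if sentence.strip():
                    records.append({"sentence": sentence.strip(), "page": page_number})
    return records
```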

Each sentence is then embedded as a numerical vector using the Nomic Embed model. This model transforms the sentence into a fixed-length semantic representation that captures meaning in vector space. The vectors are L2-normalized to ensure fair and consistent similarity comparisons during later stages.
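As a sketch of this step, assuming Nomic Embed is loaded through the sentence-transformers library (the model ID below is my assumption, not necessarily what Truthfulness ships with):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed model ID; any fixed-length sentence-embedding model slots in the same way.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

def embed_normalized(sentences: list[str]) -> np.ndarray:
    vectors = model.encode(sentences).astype("float32")
    # L2-normalize so that inner-product search ranks by cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)
```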

Once normalized, the sentence vectors are indexed using FAISS, which stands for Facebook AI Similarity Search. FAISS is built for large-scale semantic search and allows fast retrieval of similar vectors. The index is optimized for inner product similarity and is stored both in memory for fast access and on disk for persistence. Redis is optionally integrated as an additional caching layer to support efficient access in multi-user environments.
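A minimal version of the indexing step, assuming a flat inner-product index (the project may use a different FAISS index type, with Redis caching layered on top):

```python
import faiss
import numpy as np

def build_index(vectors: np.ndarray, index_path: str) -> faiss.Index:
    # Flat inner-product index; with L2-normalized vectors this is cosine similarity.
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    faiss.write_index(index, index_path)  # persist to disk alongside the in-memory copy
    return index
```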

Alongside the FAISS index, Truthfulness stores a structured mapping between each sentence vector and its metadata, including the sentence text and the page number it originated from. This mapping is saved in a JSON format and serves as a critical reference point during fact-checking.
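The mapping can be as simple as a JSON array whose position i corresponds to row i of the FAISS index. A hypothetical save helper, continuing the records structure from the ingestion sketch:

```python
import json

def save_metadata(records: list[dict], path: str) -> None:
    # records[i] holds the sentence text and page number for FAISS row i,
    # e.g. {"sentence": "...", "page": 12}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```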

When a user submits a question through the interface, a large language model such as GPT generates a natural language answer. Truthfulness then breaks this response down into individual claim-level statements. Each claim is embedded using the same Nomic model and normalized in the same way as the document sentences.
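One simple way to approximate this decomposition is to treat each sentence of the answer as one checkable claim; the project's actual splitting logic may be more sophisticated. The usage line reuses embed_normalized from the earlier embedding sketch:

```python
import nltk

def decompose_answer(answer: str) -> list[str]:
    # Treat each sentence of the LLM's answer as one claim to verify.
    return [s.strip() for s in nltk.sent_tokenize(answer) if s.strip()]

claims = decompose_answer(llm_answer)
claim_vectors = embed_normalized(claims)  # same embedding + normalization as the document
```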

Fact-checking begins by searching for semantically similar sentences using FAISS. For each embedded claim, the system retrieves the most relevant candidates from the indexed document. It calculates similarity scores between the claim and the document sentences. If a high-confidence match is found, the system considers the claim supported and returns the matched sentence, the similarity score, and the original page number from which the sentence was taken.

If no strong match is found, the claim is labeled as unsupported, and the system explicitly indicates that no sufficient evidence was located in the source document.
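Putting retrieval and thresholding together, a sketch of the verification step; the threshold value is illustrative, not the project's tuned cutoff:

```python
import faiss
import numpy as np

SUPPORT_THRESHOLD = 0.75  # illustrative cutoff for a "high-confidence match"

def fact_check(claim_vector: np.ndarray, index: faiss.Index,
               records: list[dict], k: int = 5) -> dict:
    # Retrieve the k most similar document sentences for this claim.
    scores, ids = index.search(claim_vector.reshape(1, -1), k)
    best_score, best_id = float(scores[0][0]), int(ids[0][0])
    if best_score >= SUPPORT_THRESHOLD:
        match = records[best_id]
        return {"supported": True, "score": best_score,
                "evidence": match["sentence"], "page": match["page"]}
    # No sufficiently similar sentence was found in the source document.
    return {"supported": False, "score": best_score, "evidence": None, "page": None}
```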

Technologies Used in Truthfulness

Truthfulness combines multiple advanced components to build a high-performance and reliable fact-checking system.

NLTK provides sentence segmentation through its Punkt tokenizer.

Nomic Embed generates high-quality vector embeddings for sentences and claims.

FAISS is used for fast and scalable vector similarity search.

Redis offers optional caching and distributed storage of vector data for multi-user scenarios.

FastAPI powers the backend API layer, managing system interactions and responses; a minimal endpoint sketch follows this list.

The demo interface additionally uses PyMuPDF for extracting text from uploaded PDF documents, but this is not part of the core Truthfulness engine.
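To illustrate how these pieces could hang together behind the API layer, here is a hypothetical FastAPI endpoint. It reuses the sketch helpers from the previous section, and load_index is purely assumed; none of this is the project's actual interface.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class FactCheckRequest(BaseModel):
    answer: str       # the LLM response to verify
    document_id: str  # identifies a previously ingested document

@app.post("/fact-check")
def fact_check_endpoint(req: FactCheckRequest) -> list[dict]:
    # load_index is hypothetical: it would return the FAISS index plus the
    # sentence/page metadata saved during ingestion.
    index, records = load_index(req.document_id)
    claims = decompose_answer(req.answer)
    return [fact_check(vec, index, records) for vec in embed_normalized(claims)]
```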

Truthfulness is designed for efficiency, scalability, and interpretability. It supports parallel processing for indexing and ingestion, in-memory caching for rapid lookup, and persistent storage to maintain reliability across sessions. The architecture includes fallback mechanisms to ensure smooth operation even when individual components face intermittent failures.

Why It Matters

Truthfulness addresses the gap between the expressive capabilities of large language models and the factual rigor of document-based systems. While LLMs are excellent at generating fluid and engaging responses, they are prone to factual errors. Traditional document search, on the other hand, offers reliability but lacks conversational flexibility.

By combining these strengths, Truthfulness enables users to ask natural questions, receive fluent answers, and validate those answers through supporting evidence from source documents. Each claim can be paired with a similarity score, a retrieved sentence, and the page number it came from, giving users confidence and control over the information they receive.

If you are interested in exploring, contributing to, or deploying Truthfulness within your own workflows or platforms, feel free to reach out.