Pinecone Assistant makes it easy to build knowledgeable chat and agent-based AI applications in minutes. Simply upload documents, ask questions about them, and receive context snippets or AI-generated responses that reference the uploaded documents.

Citations or references in Pinecone Assistant help ensure responses are explainable and grounded in your proprietary knowledge. Each citation links to one or more references, pointing to specific sections of a document. With citation highlights, Pinecone Assistant can now pinpoint the exact section or sentence used to generate a response—providing even greater transparency and trust. In this technical guide, we’ll show you how to get started with Pinecone Assistant and leverage citation highlights.

What are citation highlights

When querying large documents, it can be challenging to verify the accuracy of generated responses. Traditional citations reference entire documents or pages, so they often lack precision. That makes it difficult to validate the specific source of a statement, and difficult to give users confidence that the response is grounded.

Citation highlights solve this problem by pinpointing the exact sentence or passage used to generate a response. Instead of merely linking to a document or page, Pinecone Assistant extracts and presents the precise text fragment that supports the answer. This not only enhances transparency but also builds trust, allowing users to quickly verify information without manually searching through extensive files.

Citations are returned as an array. Each citation contains one or more references, and each reference identifies a file, the pages the text was drawn from, and the highlight itself.
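To make the hierarchy concrete, here is a minimal sketch of the shape using plain dataclasses. The class names (`FileRef`, `Highlight`, `Reference`, `Citation`) are hypothetical stand-ins and not the SDK's actual model classes; only the attribute names mirror those accessed on the response later in this guide, and the sample values come from the Netflix example used throughout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins that mirror the attribute names used in this guide.
@dataclass
class FileRef:
    name: str
    signed_url: Optional[str] = None


@dataclass
class Highlight:
    content: str


@dataclass
class Reference:
    file: FileRef
    pages: List[int] = field(default_factory=list)
    highlight: Optional[Highlight] = None


@dataclass
class Citation:
    position: int  # character offset in the response text
    references: List[Reference] = field(default_factory=list)


example = Citation(
    position=83,
    references=[
        Reference(
            file=FileRef(name="netflix-10k.pdf"),
            pages=[78],
            highlight=Highlight(content="I, Spencer Neumann, certify that"),
        )
    ],
)

# Walk the hierarchy: citation -> reference -> file / pages / highlight
ref = example.references[0]
print(ref.file.name, ref.pages, ref.highlight.content)
```

Reading it top down: a citation marks a position in the answer, and each of its references carries the file, page numbers, and highlighted excerpt that support the text at that position.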

We will run through an example that shows how highlighting makes relevant citations easy to find, using a basic question-answering use case. Given Netflix's financial report, we ask the question “Who is the CFO of Netflix?”. Pinecone Assistant gives the correct answer, “The Senior Vice President and Chief Financial Officer of Netflix is Spencer Neumann.”, with a reference to page 78 of the report. This is the most relevant page because it contains the CFO's signed certification:

Example file/page level reference for a citation. View original source file: https://s22.q4cdn.com/959853165/files/doc_financials/2023/ar/Netflix-10-K-01262024.pdf#page=78.

With the introduction of highlights, we see Assistant can now pinpoint an exact phrase to support the answer:

# CERTIFICATION OF CHIEF FINANCIAL OFFICER PURSUANT TO SECTION 302 OF THE SARBANES-OXLEY ACT OF 2002 I, Spencer Neumann, certify that

Rather than providing the entire page, this highlight can be provided to the user to easily demonstrate exactly how the question was answered.

Getting started

To get started, let’s create an assistant and load a document. Citation highlights are available in the Pinecone console or API versions 2025-04 and later, so make sure you have the latest version installed.

!pip install --upgrade pinecone pinecone-plugin-assistant

Now you’re ready to create a new assistant:

import os

from pinecone import Pinecone

# Replace the placeholder with your own API key
api_key = "YOUR_API_KEY"
os.environ["PINECONE_API_KEY"] = api_key

# Set Assistant name
assistant_name = "citations-examples"

pc = Pinecone()
assistants_list = pc.assistant.list_assistants()
if assistant_name not in [a.name for a in assistants_list]:
    assistant = pc.assistant.create_assistant(assistant_name)
else:
    assistant = pc.assistant.Assistant(assistant_name=assistant_name)

assistant

Download Netflix’s 2023 Form 10-K filing and upload it to your assistant. Note: Pinecone Assistant supports the following file types as input: PDF, JSON, Markdown, Text, and Docx.

!wget -O netflix-10k.pdf https://s22.q4cdn.com/959853165/files/doc_financials/2023/ar/Netflix-10-K-01262024.pdf
file_names = [f.name for f in assistant.list_files()]

file_name = "netflix-10k.pdf"

if file_name not in file_names:
  # Upload the file (no metadata is attached in this example)
  response = assistant.upload_file(
      file_path=file_name,
      timeout=None
  )
  print(response)
else:
  print(f"file {file_name} already uploaded")

assistant.list_files()

# [{'name': 'netflix-10k.pdf', ...}]  

Running queries and analyzing citations

Let’s now run a simple query on our documents:

from pinecone_plugins.assistant.models import Message
messages = [Message(role="user", content="Who is the Senior Vice President and Chief Financial Officer of Netflix?")]
response = assistant.chat(messages=messages, include_highlights=True)

The assistant returns a response message and citations (references) with citation highlights:

Response message

This is a plain string containing the direct answer to the question. It can be accessed as follows:

response.message.content
# The Senior Vice President and Chief Financial Officer of Netflix is Spencer Neumann.

Citations and citation highlights:

Citations are structured as an array, with references mapping to specific locations in a document. Each reference includes a highlight object containing the precise excerpt used.

response.citations[0].position
# 83

response.citations[0].references[0].file.name
# netflix-10k.pdf

response.citations[0].references[0].pages[0]
# 78

response.citations[0].references[0].highlight.content
# CERTIFICATION OF CHIEF FINANCIAL OFFICER PURSUANT TO SECTION 302 OF THE SARBANES-OXLEY ACT OF 2002 I, Spencer Neumann, certify that
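When a response carries several citations, indexing into each one by hand gets tedious. The sketch below flattens the structure into one row per reference. `summarize_citations` is a hypothetical helper, not part of the SDK. It assumes only the attributes shown above (`position`, `references`, `file.name`, `pages`, `highlight.content`), and it further assumes that a reference may lack a highlight, which it handles defensively. The demo object is a stand-in shaped like the response, not a real API result.

```python
from types import SimpleNamespace


def summarize_citations(citations):
    """Flatten citations into one dict per reference (hypothetical helper)."""
    rows = []
    for number, cite in enumerate(citations, start=1):
        for ref in cite.references:
            # Assumption: a reference might not carry a highlight.
            highlight = getattr(ref, "highlight", None)
            rows.append({
                "citation": number,
                "position": cite.position,
                "file": ref.file.name,
                "pages": list(ref.pages),
                "highlight": highlight.content if highlight else None,
            })
    return rows


# Stand-in shaped like response.citations from the example above.
demo_citations = [
    SimpleNamespace(
        position=83,
        references=[
            SimpleNamespace(
                file=SimpleNamespace(name="netflix-10k.pdf"),
                pages=[78],
                highlight=SimpleNamespace(content="I, Spencer Neumann, certify that"),
            )
        ],
    )
]

for row in summarize_citations(demo_citations):
    print(row)
```

With a real response, the call would be `summarize_citations(response.citations)`.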

Inline citations

Inline citations embed relevant citations directly within the text, placing them exactly where the referenced information appears.

Since the citation structure is explicit and flexible, we need to write a small helper function that will insert citations into the text with [ ] around them:

def insert_citations(response) -> str:
    """
    Insert citation markers [i] at specified positions in the text.
    Processes positions in order, adjusting for previous insertions.

    Args:
        response: Pinecone Assistant Chat Response

    Returns:
        Modified text with citation markers inserted
    """
    result = response.message.content
    citations = response.citations
    offset = 0  # Keep track of how much we've shifted the text

    for i, cite in enumerate(citations, start=1):
        citation = f"[{i}]"
        position = cite.position

        adjusted_position = position + offset
        result = result[:adjusted_position] + citation + result[adjusted_position:]

        offset += len(citation)

    return result

With inline citations, the example above becomes:

insert_citations(response)

# The Senior Vice President and Chief Financial Officer of Netflix is Spencer Neumann[1].
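Inline markers are most useful when the reader can see what each number points to. As one possible extension, the sketch below inserts the markers and then appends a footnote line per reference with the file name, pages, and highlight. `render_with_footnotes` is a hypothetical helper, not the guide's or the SDK's method. Unlike the function above, it inserts from the highest position down, so no running offset is needed and the citations do not have to arrive sorted. The demo citation is a stand-in object, not a real API result.

```python
from types import SimpleNamespace


def render_with_footnotes(text, citations):
    """Insert [i] markers into text and append one footnote per reference."""
    numbered = list(enumerate(citations, start=1))

    # Insert from the highest position down so earlier offsets stay valid.
    # Ties are broken by citation number so markers read [1][2], not [2][1].
    for number, cite in sorted(
        numbered, key=lambda pair: (pair[1].position, pair[0]), reverse=True
    ):
        text = text[:cite.position] + f"[{number}]" + text[cite.position:]

    footnotes = []
    for number, cite in numbered:
        for ref in cite.references:
            pages = ", ".join(str(p) for p in ref.pages)
            # Assumption: a reference might not carry a highlight.
            highlight = getattr(ref, "highlight", None)
            quote = f': "{highlight.content}"' if highlight else ""
            footnotes.append(f"[{number}] {ref.file.name}, p. {pages}{quote}")

    return text + "\n\n" + "\n".join(footnotes)


# Stand-ins shaped like response.message.content and response.citations.
answer = (
    "The Senior Vice President and Chief Financial Officer "
    "of Netflix is Spencer Neumann."
)
demo = [
    SimpleNamespace(
        position=83,
        references=[
            SimpleNamespace(
                file=SimpleNamespace(name="netflix-10k.pdf"),
                pages=[78],
                highlight=SimpleNamespace(content="I, Spencer Neumann, certify that"),
            )
        ],
    )
]

print(render_with_footnotes(answer, demo))
```

With a real response, the call would be `render_with_footnotes(response.message.content, response.citations)`.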

Accessing a file on a specific page

Some file viewers (e.g. Chrome's built-in PDF viewer) can open a file at a specific page. In our example, each file URL is digitally signed when it is requested, so to open the file at the cited page we append a #page fragment to the signed URL and display it as a clickable link:

from IPython.display import display, Markdown

display(Markdown(f"Page cited: [link]({response.citations[0].references[0].file.signed_url}#page={response.citations[0].references[0].pages[0]})"))

# url = f"{response.citations[0].references[0].file.signed_url}#page={response.citations[0].references[0].pages[0]}" 
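If you build these links in more than one place, a small helper keeps the formatting consistent. The sketch below is a hypothetical utility, not part of the SDK. It uses the standard library's `urldefrag` to drop any fragment already on the URL before appending `#page=N`, so the signed query string is left untouched. The URL in the demo is made up for illustration.

```python
from urllib.parse import urldefrag


def page_link(signed_url: str, page: int) -> str:
    """Return signed_url with a #page=N fragment, replacing any existing one."""
    base, _old_fragment = urldefrag(signed_url)
    return f"{base}#page={page}"


# Hypothetical URL, for illustration only.
print(page_link("https://example.com/files/netflix-10k.pdf?sig=abc", 78))
```

With a real response, you would pass `ref.file.signed_url` and `ref.pages[0]`, where `ref` is `response.citations[0].references[0]`. Whether the viewer honors the fragment depends on the viewer, as noted above.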

Start building today

With citation highlights, you will benefit from:

  • Greater precision: Direct access to the exact portion of text that supports a response.
  • Improved trustworthiness: Clear visibility into how and where information is derived.
  • Enhanced efficiency: Reduced time spent verifying sources within large documents.

Pinecone Assistant is now generally available for all users in the US and EU regions. For Standard and Enterprise users, usage starts at $0.05/Assistant per hour, and Context Processed Tokens are $5/1M tokens. See our pricing page for more information.