
Streamlining CI/CD with Pinecone Local

What is Pinecone Local?

Pinecone Local is an in-memory emulator of the Pinecone vector database, distributed as a Docker image. It lets developers build and test against the Pinecone API on their own machines.

Because it runs as an ordinary container, it fits easily into CI/CD environments. There, it supports fast, low-cost testing without connecting to a Pinecone account or incurring usage charges.

In this article, we'll explore how you can use Pinecone Local in your GitHub Actions workflows to run API contract tests, reduce costs, and speed up your CI/CD test jobs.

Benefits of using Pinecone Local in your CI/CD workflow

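Pinecone Local provides:

  1. Faster test execution: requests go to an in-memory instance on the same machine rather than across the network.
  2. Reduced cloud costs: test runs do not consume usage or storage on a Pinecone account.
  3. Improved isolation between test runs: each run can start from a fresh, empty instance.
  4. Consistency between development and CI environments: the same Docker image runs on a laptop and on a CI runner.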

Combining Pinecone Local and GitHub Actions

You can use GitHub Actions and Pinecone Local to build the following workflow. You can configure it to run whenever changes are pushed to a feature branch or merged into main:

The Pinecone Local workflow for CI/CD platforms has four steps:
  1. Pull the Pinecone Local Docker image
  2. Start a Pinecone Local instance for each test run
  3. Execute tests against the local instance
  4. Tear down the instance after tests complete
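You may want to reproduce these four steps outside GitHub Actions, for example on another CI platform or on your own machine. A small Python driver is enough for that. The following is a minimal sketch that assumes the Docker CLI and pytest are installed and that your tests live in tests/. The container name and default settings are illustrative choices, not Pinecone requirements.

```python
"""Minimal sketch of the four-step Pinecone Local lifecycle, driven from Python.

Assumes the Docker CLI is on PATH and pytest is installed. The container name
"pinecone-local" and the default settings are illustrative choices.
"""
import subprocess
import sys

IMAGE = "ghcr.io/pinecone-io/pinecone-index:latest"


def docker_run_args(name: str = "pinecone-local", port: int = 5081,
                    dimension: int = 768, metric: str = "cosine") -> list[str]:
    """Build the `docker run` argument list for a single serverless index."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-e", f"PORT={port}",
        "-e", "INDEX_TYPE=serverless",
        "-e", f"DIMENSION={dimension}",
        "-e", f"METRIC={metric}",
        "-p", f"{port}:{port}",
        "--platform", "linux/amd64",
        IMAGE,
    ]


def run_suite(name: str = "pinecone-local") -> int:
    """Pull, start, test, and tear down; return pytest's exit code."""
    subprocess.run(["docker", "pull", IMAGE], check=True)        # 1. pull the image
    subprocess.run(docker_run_args(name), check=True)            # 2. start an instance
    try:
        return subprocess.run(["pytest", "tests/"]).returncode   # 3. run the tests
    finally:
        # 4. tear down even if the tests fail; --force stops and removes it
        subprocess.run(["docker", "rm", "--force", name], check=False)


if __name__ == "__main__":
    sys.exit(run_suite())
```

The try/finally block is the key design choice: teardown runs whether the tests pass or fail. The sketch does not wait for the server inside the container to accept requests before starting pytest. In practice, you may want to add a readiness check between steps 2 and 3.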

Here's a starter GitHub Actions workflow that you can extend for your own needs:

name: Pinecone CI/CD with Local

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.x'

    # Docker is preinstalled on GitHub-hosted ubuntu-latest runners,
    # so no separate Docker setup step is needed.
    - name: Start Pinecone Local
      run: |
        docker pull ghcr.io/pinecone-io/pinecone-index:latest
        docker run -d \
          --name pinecone-local \
          -e PORT=5081 \
          -e INDEX_TYPE=serverless \
          -e DIMENSION=768 \
          -e METRIC=cosine \
          -p 5081:5081 \
          --platform linux/amd64 \
          ghcr.io/pinecone-io/pinecone-index:latest

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install "pinecone[grpc]" pytest

    - name: Run tests
      env:
        # Pinecone Local accepts any API key value
        PINECONE_API_KEY: dummy-key
        # Read by your tests to locate the local index
        PINECONE_HOST: localhost:5081
      run: |
        pytest tests/

    - name: Stop Pinecone Local
      if: always()   # clean up even when the tests fail
      run: docker rm --force pinecone-local
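`docker run -d` returns as soon as the container starts, not when Pinecone Local is ready to serve requests. That means the first test can race the server. Below is a minimal sketch of a retry helper. `wait_until_ready` is a hypothetical name, and the timeout and interval defaults are arbitrary.

```python
"""Block until Pinecone Local answers a real request, or give up.

A minimal sketch: `probe` is any zero-argument callable that raises while the
server is still starting, for example a describe_index_stats() call. The
timeout and interval defaults are arbitrary, not Pinecone settings.
"""
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], object], timeout: float = 30.0,
                     interval: float = 0.5) -> None:
    """Call probe() until it succeeds; re-raise its last error after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            probe()
            return
        except Exception:
            # Still starting up: retry until the deadline, then surface the error
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

You could call it from a pytest fixture before handing the index to your tests, for example `wait_until_ready(index.describe_index_stats)`. Here, `index` is created with `pc.Index(host="localhost:5081", grpc_config=GRPCClientConfig(secure=False))`, as in the practical example later in this article. Catching every `Exception` keeps the helper independent of any particular client. Narrow it if you know which error your client raises while the server is starting.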

A practical example: Upsert vectors with metadata and query them

Let's look at a practical example of writing some Python code to run against our Pinecone Local instance.

To get started, we'll pull the latest Pinecone Local Docker image:

# Pull the latest Pinecone Local image
docker pull ghcr.io/pinecone-io/pinecone-index:latest

Next, we run an instance of Pinecone Local, using environment variables to configure its functionality and the port it will listen on:

# Start Pinecone Local with one index - take note of the port mappings
docker run -d \
--name index1 \
-e PORT=5081 \
-e INDEX_TYPE=serverless \
-e DIMENSION=2 \
-e METRIC=cosine \
-p 5081:5081 \
--platform linux/amd64 \
ghcr.io/pinecone-io/pinecone-index:latest
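Because this instance is started with `DIMENSION=2`, every vector you upsert into it needs exactly two values. A small pre-flight check can catch a mismatch in your test data before it reaches the emulator. `check_dimensions` below is a hypothetical helper, not part of the Pinecone client.

```python
"""Hypothetical pre-flight check: every record must match the index dimension."""


def check_dimensions(records: list[dict], dimension: int) -> None:
    """Raise ValueError naming the first record whose vector length is wrong."""
    for record in records:
        size = len(record["values"])
        if size != dimension:
            raise ValueError(
                f"record {record['id']!r} has {size} values, "
                f"but the index dimension is {dimension}"
            )
```

Calling `check_dimensions(vectors, 2)` just before `upsert` keeps the dimension passed to `docker run` and the test fixtures in sync. The error message then names the offending record ID.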

Next, install the Pinecone Python SDK with gRPC support:

pip install "pinecone[grpc]"

Now, we can write a test.py file with the following contents:

from pinecone.grpc import PineconeGRPC, GRPCClientConfig
import time

# Initialize a client. An API key must be passed, but the 
# value does not matter.
pc = PineconeGRPC(api_key="pclocal")

# Target the index by host and port, and disable TLS (SSL)
# since the connection goes over localhost
index1 = pc.Index(host="localhost:5081", grpc_config=GRPCClientConfig(secure=False))
# Upsert records into index1
index1.upsert(
    vectors=[
        {
            "id": "vec1", 
            "values": [1.0, 1.5],
            "metadata": {"genre": "comedy"}
        },
        {
            "id": "vec2", 
            "values": [2.0, 1.0],
            "metadata": {"genre": "drama"}
        },
        {
            "id": "vec3", 
            "values": [0.1, 3.0],
            "metadata": {"genre": "comedy"}
        }
    ],
    namespace="example-namespace"
)

# Wait for the index to be updated
time.sleep(5)

# Check the number of records in the index
print(index1.describe_index_stats())

# Query index1 with a metadata filter
query = index1.query(
    vector=[1.0, 1.5],
    filter={"genre": {"$eq": "comedy"}},
    top_k=1,
    include_values=True,
    include_metadata=True,
    namespace='example-namespace'
)

print(query)
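The fixed `time.sleep(5)` keeps the script simple. In CI, though, a fixed delay is either longer than needed or occasionally too short. One alternative is to poll until a condition holds. `wait_for` below is a hypothetical helper, shown as a minimal sketch.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 10.0,
             interval: float = 0.2) -> bool:
    """Poll condition() until it returns True; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one last check at the deadline
```

In the script above, you could replace the sleep with `assert wait_for(lambda: index1.describe_index_stats().total_vector_count == 3)`. This assumes the stats response exposes `total_vector_count` as an attribute, as its printed form suggests.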

Run the test file with:

python test.py

If all goes well, you should see output similar to the following:

# Output of describe_index_stats call
{'dimension': 2,
 'index_fullness': 0.0,
 'namespaces': {'example-namespace': {'vector_count': 3}},
 'total_vector_count': 3}
 
# Output of query
{'matches': [{'id': 'vec1',
              'metadata': {'genre': 'comedy'},
              'score': 1.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': [1.0, 1.5]}],
 'namespace': 'example-namespace'}
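The score of 1.0 follows from the metric: the query vector is identical to vec1, and identical vectors have a cosine similarity of 1. The plain-Python sketch below recomputes the scores by hand and mimics the `$eq` filter and `top_k`. It illustrates the math only. It is not how Pinecone executes queries.

```python
import math

# Same records as the upsert in test.py
records = [
    {"id": "vec1", "values": [1.0, 1.5], "metadata": {"genre": "comedy"}},
    {"id": "vec2", "values": [2.0, 1.0], "metadata": {"genre": "drama"}},
    {"id": "vec3", "values": [0.1, 3.0], "metadata": {"genre": "comedy"}},
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def query(vector: list[float], genre: str, top_k: int) -> list[tuple[str, float]]:
    # Keep only records matching {"genre": {"$eq": genre}}, then rank by score
    candidates = [r for r in records if r["metadata"]["genre"] == genre]
    scored = [(r["id"], cosine(vector, r["values"])) for r in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


print(query([1.0, 1.5], "comedy", top_k=2))
```

vec1 scores 1.0 (up to floating-point rounding) and vec3 scores about 0.85. Without the filter, the drama record vec2 (about 0.87) would rank second, ahead of vec3.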

When to use Pinecone Local in your CI/CD pipeline

Pinecone Local is particularly well-suited for the following CI/CD scenarios:

  1. Rapid Iteration: giving developers quick feedback on changes that affect vector search functionality.
  2. Pull Request Validation: confirming that code changes don't break existing vector search capabilities before they are merged.
  3. Integration Testing: exercising how your application interacts with the Pinecone API without touching production data.
  4. Offline Development: letting developers work on vector search features without an internet connection or a cloud account, once the Docker image has been pulled.

Best practices for using Pinecone Local in CI/CD

  1. Environment Parity: To catch issues early, use the same Pinecone Local setup in both the local development and CI environments.
  2. Resource Management: Stop and remove Pinecone Local containers after each CI run, even when tests fail, so leftover containers don't hold the names or ports the next run needs.
  3. Configuration via Environment Variables: Use environment variables to configure your tests, making it easy to switch between local and cloud environments when needed.
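One way to apply the third practice is to resolve connection settings from the environment in one place. The same tests can then target Pinecone Local in CI and a cloud index elsewhere. This is a minimal sketch. The variable names `PINECONE_HOST` and `PINECONE_API_KEY`, and the rule that a localhost host means plain-text gRPC, are assumptions for illustration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class PineconeSettings:
    api_key: str
    host: str
    secure: bool  # False for Pinecone Local, which is reached over plain localhost


def load_settings(env: Mapping[str, str] = os.environ) -> PineconeSettings:
    """Resolve connection settings; defaults target a local emulator on port 5081."""
    host = env.get("PINECONE_HOST", "localhost:5081")
    is_local = host.startswith(("localhost", "127.0.0.1"))
    # Pinecone Local accepts any API key value, so a placeholder works there
    api_key = env.get("PINECONE_API_KEY", "pclocal" if is_local else "")
    if not api_key:
        raise ValueError("PINECONE_API_KEY must be set for a non-local host")
    return PineconeSettings(api_key=api_key, host=host, secure=not is_local)
```

A test module could then build its client as in the earlier example: `s = load_settings()` followed by `PineconeGRPC(api_key=s.api_key).Index(host=s.host, grpc_config=GRPCClientConfig(secure=s.secure))`.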

Conclusion

Pinecone Local makes it practical to include vector database testing in CI/CD pipelines. By providing a containerized, in-memory emulator of Pinecone's vector database, it enables faster, more reliable, and more cost-effective testing.

Pinecone Local can streamline your development workflow and make it easier to build comprehensive test coverage for your projects.
