PRICING

Build without limits

Flexible pricing for teams at every scale

Pay-as-you-go

For developers and small teams

Intelligent routing

  • 10K free routing recommendations per month
  • $10 for every 10K additional recommendations
  • Pre-trained router for chat auto mode
  • 3 free custom routers

Prompt optimization

  • 10 free successful optimizations per month
  • $20 for each additional successful optimization
  • 4 target models per run
  • SOTA performance and prompt portability
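
As a rough guide, the pay-as-you-go numbers above can be turned into a monthly estimate. The sketch below is illustrative only. It assumes additional recommendations are billed per started block of 10K (the page does not say whether partial blocks are prorated), and the function names are hypothetical.

```python
import math

# Figures taken from the pay-as-you-go plan above.
FREE_RECOMMENDATIONS = 10_000
BLOCK_SIZE = 10_000
PRICE_PER_BLOCK = 10        # USD per additional 10K recommendations
FREE_OPTIMIZATIONS = 10
PRICE_PER_OPTIMIZATION = 20  # USD per additional successful optimization


def routing_cost(recommendations: int) -> int:
    """Estimated monthly routing cost in USD."""
    extra = max(0, recommendations - FREE_RECOMMENDATIONS)
    # Assumption: any started block of 10K is billed in full.
    return math.ceil(extra / BLOCK_SIZE) * PRICE_PER_BLOCK


def optimization_cost(successful_optimizations: int) -> int:
    """Estimated monthly prompt optimization cost in USD."""
    extra = max(0, successful_optimizations - FREE_OPTIMIZATIONS)
    return extra * PRICE_PER_OPTIMIZATION


def monthly_cost(recommendations: int, successful_optimizations: int) -> int:
    """Combined estimate for both products."""
    return routing_cost(recommendations) + optimization_cost(successful_optimizations)


# 35K recommendations and 12 successful optimizations in one month:
print(monthly_cost(35_000, 12))  # 70
```

Under these assumptions, 35K recommendations cost $30 (three started blocks beyond the free 10K) and 12 optimizations cost $40 (two beyond the free 10), for $70 in total.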

Custom

For enterprise teams ready to scale

  • Agent optimization
  • Bulk pricing
  • VPC deployments
  • Bring your own models
  • Custom evaluation metrics
  • Priority API job queue
  • More target models per run
  • More custom routers
  • Custom ZDR policies
  • 24/7 support

We offer discounts for startups and researchers

Frequently asked questions

Intelligent routing dynamically predicts which LLM to use for each incoming input, maximizing accuracy while significantly reducing cost. With new models released every month and agent inference costs exploding, choosing when to use a powerful model and when a cheaper one will do can make an enormous difference in your AI inference spend without negatively impacting output quality.

Our intelligent model routing offers flexible cost and quality tradeoff controls that you can adjust to fit your personal and business needs. Teams using Not Diamond achieve 30-90% cost savings on their workloads.

Our solution is specifically designed to integrate with coding agent harnesses, which represent the largest sources of AI spend for individual developers and organizations. If you have custom harness integration needs, please reach out!

Not Diamond is stack-agnostic and is designed to integrate with your existing toolchain. We are not a gateway: our intelligent router only determines which model to use for each request. Requests are then executed client-side through your gateway of choice.

For a given input and set of candidate models, Not Diamond will return a recommendation for which model to use. Each API request returns a single routing recommendation, independent of input size.
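
The flow described above (get one recommendation per input, then execute the request yourself) might look roughly like the sketch below. Everything here is an assumption made for illustration: `recommend_model` is a hypothetical stand-in for the routing call, not the actual Not Diamond API, and the executors stand in for whatever gateway or client you already use.

```python
from typing import Callable, Dict, List, Tuple


def recommend_model(prompt: str, candidates: List[str]) -> str:
    """Hypothetical stand-in for the routing call.

    A real router would be a network request returning one recommendation.
    The toy heuristic below (long prompts go to the last candidate) exists
    only so the example runs offline.
    """
    return candidates[-1] if len(prompt) > 200 else candidates[0]


def route_and_execute(
    prompt: str, executors: Dict[str, Callable[[str], str]]
) -> Tuple[str, str]:
    """Ask for one recommendation, then run the request client-side."""
    candidates = list(executors)
    model = recommend_model(prompt, candidates)
    # Execution happens in your own code, through your gateway of choice.
    return model, executors[model](prompt)


# Placeholder executors; in practice these would call your gateway.
executors = {
    "small-model": lambda p: "small:" + p,
    "large-model": lambda p: "large:" + p,
}

model, output = route_and_execute("hi", executors)
print(model)  # small-model
```

The point of the split is that the router never sees or proxies the model response; it only returns a model name, and one call counts as one recommendation regardless of prompt length.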

The latency of each routing recommendation ranges from 10 to 100 ms, depending on the amount of data used to train your router. Additional network latency may be incurred depending on your infrastructure setup. Please reach out to us if you have specific latency requirements.

Prompt optimization is a design-time algorithm that takes your static prompt templates and an evaluation dataset, then uses LLMs in an agentic loop to iterate over many variations of your prompt for each target model you’ve specified. The iterations are guided by reinforcement learning against the evaluation dataset, and the optimizer agent improves its own proposals through self-reflection. At the end of the optimization loop, a unique optimized prompt is returned for each target model, together with a report of the accuracy improvements.
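
To make the input and output shape concrete, here is a deliberately simplified sketch. It is not the optimizer described above: it scores a fixed list of candidate templates against an evaluation dataset instead of generating variations with an LLM agent or using reinforcement learning. All names are hypothetical, and the "models" are plain Python callables standing in for real LLM calls.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]      # (input, expected output)
Model = Callable[[str], str]   # stand-in for an LLM call


def accuracy(model: Model, template: str, dataset: List[Example]) -> float:
    """Fraction of examples the model answers correctly with this template."""
    hits = sum(1 for x, y in dataset if model(template.format(input=x)) == y)
    return hits / len(dataset)


def optimize(
    templates: List[str], models: Dict[str, Model], dataset: List[Example]
) -> Dict[str, Tuple[str, float]]:
    """Return the best-scoring template and its accuracy for each target model."""
    best = {}
    for name, model in models.items():
        scored = [(accuracy(model, t, dataset), t) for t in templates]
        # On ties, max() keeps the first template in the list.
        best_score, best_template = max(scored, key=lambda s: s[0])
        best[name] = (best_template, best_score)
    return best


# Toy setup: model_a only answers correctly when prompted with "Answer:".
dataset = [("q1", "yes"), ("q2", "yes"), ("q3", "yes")]
templates = ["{input}", "Answer: {input}"]
models = {
    "model_a": lambda p: "yes" if p.startswith("Answer:") else "no",
    "model_b": lambda p: "yes",
}

report = optimize(templates, models, dataset)
print(report["model_a"])  # ('Answer: {input}', 1.0)
```

Even in this toy version, the two target models end up with different prompts, which is the reason a separate optimized prompt is returned per target model.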

Prompt optimization is extremely data efficient and can work with as few as three data samples. However, to maximize the accuracy of your optimized prompts, we recommend providing more data if you have it.

Not Diamond is SOC 2 and ISO 27001 compliant. We provide custom ZDR policies, VPC deployments, and 24/7 on-call support to the most sophisticated AI teams in the world.

100x your AI dev cycles

Let the machine build the machine