
Together AI Review


Open-source AI platform for building and deploying machine learning models

Together AI is a cloud platform for training, fine-tuning, and deploying open-source AI models.

Together AI · Founded 2022 · Usage-based from $0.03 · Free Plan · Free Trial · LLM Platforms · AI APIs · AI Cloud · AI DevOps


About Together AI

Together AI provides infrastructure and tools for developers to work with open-source AI models. The platform offers model training, fine-tuning capabilities, and API access for deployment.

Together AI is a cloud-based platform that specializes in open-source artificial intelligence models and infrastructure. It gives developers and organizations tools to train, fine-tune, and deploy AI models without requiring deep machine-learning infrastructure expertise: Together AI handles the underlying GPU clusters and scaling requirements.

The platform offers access to a wide range of open-source models, including large language models, image generation models, and other AI capabilities. Users can fine-tune these models on their own data or use pre-trained models through API endpoints, with both API access for inference and training capabilities for custom model development.

The service targets developers, AI researchers, and companies looking to integrate AI capabilities into their applications without building infrastructure from scratch. It competes with other AI platform providers by focusing specifically on open-source models rather than proprietary solutions, aiming to make them more accessible through managed infrastructure and simplified deployment options.
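
As a hedged illustration of the API-access side, the sketch below shows a serverless inference call with the `together` Python SDK; the model identifier is an example, and `TOGETHER_API_KEY` is assumed to be set in the environment.

```python
# Minimal sketch of a serverless inference call via the `together`
# Python SDK (pip install together). The model id is illustrative;
# check the model catalog for current identifiers.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model id
    messages=[{"role": "user", "content": "Explain LoRA in one sentence."}],
)
print(response.choices[0].message.content)
```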

Features

AI

  • Fine-Tuning

    Fine-tunes open-source models for production workloads using current research techniques to improve accuracy, reduce hallucinations, and control behavior without managing training infrastructure (see the sketch after this list).

  • Together Kernel Collection

    A collection of GPU kernels that enables up to 90% faster pre-training and optimized performance across compute workloads.
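
To make the fine-tuning workflow above concrete, here is a hedged sketch of launching a LoRA job with the `together` Python SDK. The method names and parameters (`files.upload`, `fine_tuning.create`, `lora`, `n_epochs`) follow recent SDK versions but should be verified against the API reference.

```python
# Hedged sketch: uploading training data and starting a LoRA fine-tune.
# Method/parameter names are assumptions based on recent `together` SDK
# versions; verify against the official API reference before relying on them.
from together import Together

client = Together()

# Upload a JSONL file of training examples (e.g. chat-format records).
train_file = client.files.upload(file="train.jsonl")

job = client.fine_tuning.create(
    training_file=train_file.id,
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",  # example base model
    lora=True,    # LoRA adapter training rather than full fine-tuning
    n_epochs=3,
)
print(job.id, job.status)  # poll the job id until training completes
```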

Core

  • Accelerated Compute

    Scales from self-serve instant clusters to thousands of GPUs, optimized for better performance using the Together Kernel Collection.

  • Batch Inference

    Processes massive workloads asynchronously, scaling up to 30 billion tokens per model, with any serverless model or private deployment.

  • Dedicated Container Inference

    Provides GPU infrastructure purpose-built for generative media workloads, supporting video, audio, and image model deployment with performance acceleration.

  • Dedicated Model Inference

    Deploys models on dedicated infrastructure purpose-built for teams who need speed, control, and optimized economics.

  • Managed Storage

    Offers high-performance managed object storage and parallel filesystems optimized for AI-native workloads with zero egress fees.

  • Sandbox

    Provides fast, secure code sandboxes at scale for setting up full-scale development environments for AI apps and agents.

  • Serverless Inference

    Runs open-source models on demand with no infrastructure to manage and no long-term commitments, powered by cutting-edge inference research.

Customization

  • Workload-Specific Optimization

    Applies workload-specific optimizations to reduce infrastructure costs by up to 60% compared to standard deployments.

Pricing Plans

Popular

Serverless Inference

Usage-based, from $0.02/1M tokens

Pay-per-token API access to hosted models. Most teams start here.

  • Chat, vision, image, audio, video, transcription, embeddings, rerank, moderation
  • Text models from $0.02–$1.25 per 1M input tokens
  • Image generation from $0.0006–$0.134 per image
  • Video generation from $0.14–$3.20 per video
  • Audio TTS from $0.0015–$65.00 per 1M characters
  • Batch API pricing available for select models

Dedicated Inference

From $3.99/hour

Single-tenant GPU instances for teams needing guaranteed performance and custom models.

  • Guaranteed performance with no resource sharing
  • Support for custom models
  • Autoscaling and traffic spike handling
  • 1x H100 80GB at $3.99/hr, 1x H200 141GB at $5.49/hr, 1x B200 180GB at $9.95/hr
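
A quick back-of-envelope check of what these rates imply for an always-on endpoint:

```python
# Back-of-envelope monthly cost for an always-on dedicated endpoint,
# using the listed hourly rates (730 hours is roughly one month).
rates = {"H100 80GB": 3.99, "H200 141GB": 5.49, "B200 180GB": 9.95}
for gpu, hourly in rates.items():
    print(f"1x {gpu}: ${hourly * 730:,.0f}/month")
# -> H100 ~ $2,913, H200 ~ $4,008, B200 ~ $7,264 before autoscaling
```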

GPU Clusters – On-Demand

From $3.49/hour

Pay-as-you-go GPU cluster capacity billed hourly.

  • NVIDIA HGX H100 at $3.49/hr
  • NVIDIA HGX H200 at $4.19/hr
  • NVIDIA HGX B200 at $7.49/hr
  • No long-term commitment required

GPU Clusters – Reserved

From $2.55/hour

Reserved GPU cluster capacity for 6+ days with discounted rates.

  • NVIDIA HGX H100 from $2.99/hr (1 week) to $2.55/hr (4–6 months)
  • NVIDIA HGX H200 from $3.49/hr (1 week) to $2.89/hr (4–6 months)
  • NVIDIA HGX B200 from $7.15/hr (1 week) to $6.39/hr (4–6 months)
  • NVIDIA GB200 NVL72 and GB300 NVL72: contact for pricing
  • Minimum reservation of 6 days
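
A worked comparison of the on-demand and reserved H100 rates (the rates are read as per GPU-hour, and the 8-GPU HGX node size is an assumption):

```python
# On-demand vs. reserved HGX H100, per the rate lists above.
# Rates are treated as per GPU-hour; an 8-GPU HGX node is assumed.
GPUS, HRS_PER_WEEK = 8, 7 * 24
on_demand = 3.49 * GPUS * HRS_PER_WEEK    # on-demand rate
reserved = 2.99 * GPUS * HRS_PER_WEEK     # 1-week reserved rate
print(f"on-demand: ${on_demand:,.0f}/wk  reserved: ${reserved:,.0f}/wk")
print(f"savings:   ${on_demand - reserved:,.0f}/wk (~{1 - 2.99 / 3.49:.0%})")
# -> on-demand $4,691/wk vs reserved $4,019/wk, about 14% less
```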

Code Sandbox

From $0.03 per session

VM sandboxes and secure code interpreter for LLM-generated code execution.

  • Code Interpreter: $0.03 per 60-minute session
  • Per vCPU compute: $0.0446/hr
  • Per GiB RAM: $0.0149/hr
  • Shared filesystem storage: $0.16/GiB/month
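
To see how these components combine, here is the hourly cost of a modest sandbox VM; the 2 vCPU / 4 GiB shape is an assumed example configuration:

```python
# Hourly cost of one sandbox VM at the listed rates.
# The 2 vCPU / 4 GiB shape is an assumed example configuration.
vcpus, ram_gib = 2, 4
hourly = vcpus * 0.0446 + ram_gib * 0.0149
print(f"2 vCPU + 4 GiB: ${hourly:.4f}/hr")   # -> $0.1488/hr
print(f"plus interpreter session: ${0.03:.2f} per 60 minutes")
```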

Fine-Tuning – Standard

From $0.48 per 1M tokens

Train open-source models up to 100B parameters using LoRA or full fine-tuning.

  • Supervised Fine-Tuning LoRA: $0.48–$2.90 per 1M tokens by model size
  • Supervised Fine-Tuning Full: $0.54–$3.20 per 1M tokens by model size
  • Direct Preference Optimization LoRA: $1.20–$7.25 per 1M tokens
  • Direct Preference Optimization Full: $1.35–$8.00 per 1M tokens
  • Supports models up to 100B parameters
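
At the bottom-of-range rates (the up-to-16B tier), a hypothetical 10M-token training run works out as follows:

```python
# Cost of a hypothetical 10M-token supervised fine-tune on a <=16B model,
# at the bottom-of-range rates listed above.
tokens_m = 10  # millions of training tokens (epochs x dataset size)
for method, rate in [("SFT LoRA", 0.48), ("SFT Full", 0.54),
                     ("DPO LoRA", 1.20), ("DPO Full", 1.35)]:
    print(f"{method}: ${tokens_m * rate:.2f}")
# -> LoRA SFT $4.80 vs full SFT $5.40; DPO $12.00 vs $13.50
```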

Fine-Tuning – Specialized

From $8 per 1M tokens

Fine-tuning for large specialized models like DeepSeek, Llama 4, Qwen3, Kimi K2, and others.

  • DeepSeek-R1/V3 SFT LoRA: $10/1M tokens, min $20 charge
  • Llama 4 Maverick SFT LoRA: $8/1M tokens, min $16 charge
  • Kimi K2 SFT LoRA: $15/1M tokens, min $60 charge
  • GLM-5 SFT LoRA: $40/1M tokens, min $60 charge
  • Minimum charges vary by model


Buyer Questions

Common questions answered by our AI research team

Pricing

What is the price difference between LoRA and Full Fine-Tuning for models up to 16B parameters, and is there a minimum charge?

For models up to 16B, Supervised Fine-Tuning costs $0.48/1M tokens for LoRA vs $0.54/1M tokens for Full Fine-Tuning, and Direct Preference Optimization costs $1.20/1M tokens for LoRA vs $1.35/1M tokens for Full Fine-Tuning. The standard pricing table for up to 16B models does not list a minimum charge; minimum charges appear only in the Specialized pricing section for specific models.

Features

Can I deploy video, audio, and image models on Dedicated Container Inference, and what GPU hardware options are available per hour?

Yes, Dedicated Container Inference is described as 'GPU infrastructure purpose-built for generative media workloads' that supports deploying 'video, audio, and image models with performance acceleration powered by Together Research.' However, the pricing page only lists hourly hardware options under Dedicated Inference (not Dedicated Container Inference specifically): 1x H100 80GB at $3.99/hr, 1x H200 141GB at $5.49/hr, and 1x B200 180GB at $9.95/hr.

Security

Does the Code Sandbox use isolated, single-tenant environments, and how is pricing structured for vCPU and RAM usage?

The content describes Code Sandboxes as 'fast, secure code sandboxes' but does not specify single-tenant isolation. Pricing is structured as $0.0446 per vCPU/hour and $0.0149 per GiB RAM/hour for compute costs, plus a Code Interpreter option priced at $0.03 per 60-minute session.

Setup

How do I get started with Serverless Inference — is there any infrastructure to manage or long-term commitment required?

Serverless Inference is described as 'the fastest way to run open-source models on demand' with 'no infrastructure to manage, no long-term commitments.' You can get started immediately through the platform without any setup or commitment requirements.
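
In practice, getting started is a single authenticated HTTP call. The sketch below targets the OpenAI-compatible REST endpoint; the URL and model id reflect current documentation but should be verified.

```python
# Hedged sketch: the OpenAI-compatible REST path to serverless inference.
# Endpoint URL and model id reflect current docs but should be verified.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example id
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```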

Integration

Can I use the Batch Inference API with privately deployed models, and what is the token scale limit per model?

Yes, Batch Inference explicitly supports 'any serverless model or private deployment' and can 'scale to 30 billion tokens per model.'
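
The content does not show the batch workflow itself; the sketch below assumes an OpenAI-style flow (upload a JSONL of requests, then create a batch job), and the method names and `purpose` value are assumptions to check against the Batch API docs.

```python
# Rough sketch of batch inference, assuming an OpenAI-style batch flow.
# Method names, the `purpose` value, and field names are assumptions;
# consult the Batch API documentation for the actual interface.
from together import Together

client = Together()

# requests.jsonl: one JSON request per line, each naming a model,
# which may be a serverless model or a private deployment.
batch_input = client.files.upload(file="requests.jsonl", purpose="batch-api")

batch = client.batches.create_batch(
    file_id=batch_input.id,
    endpoint="/v1/chat/completions",
)
print(batch.id, batch.status)  # results arrive asynchronously as a file
```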

Product Information

  • Company: Together AI
  • Founded: 2022
  • Pricing: Usage-based from $0.03
  • Free Trial: Available
  • Free Plan: Available

Platforms

Web
