Twelve Labs Review

What is Twelve Labs?

Twelve Labs is a video understanding API platform that lets developers search, analyze, and extract insights from video content. Its multimodal AI models, trained on video, audio, and text, support semantic video search, summarization, and content classification, targeting developers building video-centric products across media, sports, education, and enterprise sectors. Pricing is usage-based: a free Starter plan is available, the Growth plan starts at $500 per month, and Scale runs $2,500, with a quote-based Enterprise tier. Key capabilities include a video search API, scene detection and segmentation, automated video classification, real-time video analysis, text and speech recognition, and custom model training. TopReviewed's six-seat AI review panel scored it 7.8/10, praising multimodal search that finds content by visual, audio, and text elements while noting pricing that escalates quickly with video hours indexed. It fits engineering teams building searchable video libraries on video-native foundation models.

About Twelve Labs

Twelve Labs is an AI-powered video understanding platform that gives developers programmatic access to models capable of interpreting the full context of video content. Unlike traditional video processing tools that rely on transcription or metadata alone, Twelve Labs processes visual, audio, and textual signals together to produce richer semantic understanding of what happens in a video.

The platform exposes its capabilities through a set of APIs. These include Embed, which generates vector embeddings from video for search and retrieval applications, and Analyze, powered by the Pegasus video-language model, which can generate summaries, answer questions about video content, and extract structured information. These tools are designed to be integrated into custom applications rather than used as a standalone product.

Twelve Labs is primarily aimed at software developers and engineering teams building video-centric applications. Common use cases include building searchable video libraries, automating content moderation, generating chapter summaries for recorded meetings or courses, and enabling natural language queries over large video archives.

The platform competes in the emerging multimodal AI and video intelligence market alongside offerings from larger cloud providers and specialized startups. Its differentiation lies in models specifically trained and optimized for video understanding rather than adapted from general-purpose language or vision models.

Twelve Labs offers a cloud-hosted API with usage-based pricing, and developers can get started with a free tier that includes a limited amount of indexing and processing capacity. Enterprise plans with higher limits and dedicated support are available for organizations with larger-scale needs.

Features

AI

Analyze (Pegasus-powered)
API for generating summaries, chapters, and highlights, and for answering open-ended questions about video content, using the Pegasus model.
Embed
Marengo-based embedding API for building custom video applications such as recommendation systems, content moderation, and similarity search.
Reference Image Context
Accepts a reference image such as a logo, face, or product to provide visual context for Pegasus video analysis.
Video Search
Semantic search API that lets users query video libraries using natural language across visual, audio, and spoken content with sub-second response times.
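Both the Embed and Video Search entries rest on nearest-neighbour lookup over embedding vectors. The sketch below shows that idea in plain Python with cosine similarity. It assumes you have already pulled segment embeddings (for example Marengo vectors) into memory; the segment IDs and the in-memory dict are illustrative rather than Twelve Labs' API, and a production system would typically use a vector database instead.

```python
from __future__ import annotations

import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: dict[str, Sequence[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank stored segment embeddings by cosine similarity to the query embedding."""
    scored = [(seg_id, cosine_similarity(query, vec)) for seg_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for real segment embeddings.
    library = {"intro": [0.9, 0.1, 0.0], "goal": [0.1, 0.9, 0.2], "crowd": [0.0, 0.3, 0.9]}
    print(top_k([0.2, 0.8, 0.1], library, k=2))
```

A linear scan like this is fine for a few thousand segments; beyond that, an approximate-nearest-neighbour index keeps query latency flat.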

Core

Marengo Multimodal Embedding Model
Processes visual, audio, dialogue, and motion across 36+ languages into 512-dimensional embeddings, handles up to 4 hours of continuous video, and supports composed image+text+audio queries.
Pegasus Video-to-Text Model
Converts raw video into structured, timestamped JSON in a single API call without pre-indexing, handling up to 2 hours of video across 12 languages and reading on-screen text alongside speech.
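Because Pegasus returns timestamped JSON, downstream code mostly parses segments and maps a playback time back to a segment. The following is a minimal sketch that assumes a hypothetical payload shape of {"segments": [{"start", "end", "text"}]} with times in seconds. The real structure depends on the prompt and response format you request, so check the API reference before reusing these field names.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


def parse_segments(raw: str) -> list[Segment]:
    """Parse a hypothetical {"segments": [...]} payload into time-ordered segments."""
    payload = json.loads(raw)
    segments = []
    for item in payload["segments"]:
        seg = Segment(float(item["start"]), float(item["end"]), str(item["text"]))
        if seg.end <= seg.start:
            raise ValueError(f"segment must end after it starts: {item}")
        segments.append(seg)
    return sorted(segments, key=lambda s: s.start)


def segment_at(segments: list[Segment], t: float) -> Optional[Segment]:
    """Return the segment covering playback time t (seconds), or None."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg
    return None
```

Validating `start < end` on ingest catches malformed output early, before it reaches a player UI that seeks by timestamp.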

Integration

Cloud & Data Platform Integrations
Native availability on AWS Marketplace and Bedrock, Snowflake Cortex, and Databricks Mosaic AI/Unity Catalog for in-platform video AI workflows.
MCP Server Support
Connects Claude, Cursor, or other MCP clients directly to a video index for querying and analysis.

Security

Enterprise Deployment Options
Supports dedicated capacity, custom model fine-tuning, on-premise, private VPC, and air-gapped deployment configurations for enterprise and government customers.
Security & Compliance Certification
SOC 2 Type II certified platform with documented data handling, encryption, and retention policies.

Support

Pricing Calculator
Interactive tool that estimates platform cost based on video hours indexed, search query volume, and analysis API calls per month.
Sample Apps & Developer Hub
Provides reference implementations, starter projects, SDKs, and code samples demonstrating video search, analysis, and embedding workflows.


Pricing Plans

Free

For trying out Twelve Labs with up to 10 hours of indexing, no credit card required

600 minutes free (indexing + analyze + segment)
Index access for 90 days from creation
Duration per index up to 10 hours
Volume per index up to 100 videos
5 concurrent indexing tasks
Free access to Marengo and Pegasus APIs
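To check whether a pilot fits the free-plan limits listed above before uploading anything, you could use a small helper like the one below. It compares one index's video lengths and any planned analysis minutes against those numbers. It assumes analysis is metered in minutes of video against the same 600-minute pool. Treat the constants as a snapshot of the plan, not a guarantee.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Free-plan limits as listed above (a snapshot; check the pricing page before relying on them).
FREE_USAGE_MINUTES = 600           # shared pool: indexing + analyze + segment
FREE_MAX_INDEX_MINUTES = 10 * 60   # duration per index: up to 10 hours
FREE_MAX_VIDEOS_PER_INDEX = 100


@dataclass
class FitReport:
    fits: bool
    reasons: list[str] = field(default_factory=list)


def check_free_plan(video_minutes: list[float], analysis_minutes: float = 0.0) -> FitReport:
    """Check one index's videos, plus planned analysis minutes, against the free-plan limits."""
    reasons = []
    indexed = sum(video_minutes)
    if len(video_minutes) > FREE_MAX_VIDEOS_PER_INDEX:
        reasons.append(f"{len(video_minutes)} videos exceeds {FREE_MAX_VIDEOS_PER_INDEX} per index")
    if indexed > FREE_MAX_INDEX_MINUTES:
        reasons.append(f"{indexed:.0f} min exceeds {FREE_MAX_INDEX_MINUTES} min per index")
    if indexed + analysis_minutes > FREE_USAGE_MINUTES:
        reasons.append(f"{indexed + analysis_minutes:.0f} min exceeds the {FREE_USAGE_MINUTES}-minute pool")
    return FitReport(fits=not reasons, reasons=reasons)


print(check_free_plan([45.0] * 8, analysis_minutes=120))
```

In the usage line, eight 45-minute videos plus 120 analysis minutes use 480 of the 600 minutes, so the pilot fits.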


Developer


Pay-as-you-go plan for scaling usage beyond the free tier, billed per minute/hour of usage

Video indexing at $0.042/minute
Embedding infra services at $0.0015/minute
Search API usage at $4/1000 queries
Unlimited video hours usage and index access
Duration per index up to 10,000 hours
25 concurrent indexing tasks
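The Developer-plan rates above can be combined into a rough monthly estimate, similar in spirit to the Pricing Calculator. This sketch assumes the embedding-infrastructure rate is metered on the same minutes you index, and it leaves out analysis calls. Confirm both points against the current pricing page.

```python
from __future__ import annotations

# Developer-plan rates as listed above (USD); a snapshot, not a quote.
INDEXING_PER_MIN = 0.042
EMBED_INFRA_PER_MIN = 0.0015   # assumption: metered on the same minutes you index
SEARCH_PER_1000_QUERIES = 4.00


def estimate_monthly_cost(hours_indexed: float, search_queries: int) -> dict[str, float]:
    """Rough monthly bill for indexing plus search on the pay-as-you-go plan."""
    minutes = hours_indexed * 60
    indexing = minutes * INDEXING_PER_MIN
    embed_infra = minutes * EMBED_INFRA_PER_MIN
    search = search_queries / 1000 * SEARCH_PER_1000_QUERIES
    return {
        "indexing": round(indexing, 2),
        "embedding_infra": round(embed_infra, 2),
        "search": round(search, 2),
        "total": round(indexing + embed_infra + search, 2),
    }


print(estimate_monthly_cost(hours_indexed=100, search_queries=10_000))
```

For 100 hours indexed and 10,000 search queries, the sketch gives $252 for indexing, $9 for embedding infrastructure, and $40 for search, or $301 in total.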

Enterprise

Contact sales

Committed-use contracts with custom pricing for large-scale or unlimited indexing needs

Custom pricing for indexing, infrastructure, and API usage
Unlimited video hours usage and index access
Custom duration and volume per index
Custom concurrent indexing tasks
Rate limits scale with monthly spend
Model fine-tuning available on request
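Because rate limits scale with monthly spend, client code should expect occasional throttling. A common pattern for handling this is exponential backoff with jitter. The sketch below assumes a hypothetical RateLimited exception that your own HTTP layer raises on a 429 response. It is not part of any Twelve Labs SDK.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error your HTTP wrapper raises when the API answers 429."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.5,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited using exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # full jitter spreads out concurrent retries
    raise RuntimeError("unreachable")
```

The injectable `sleep` keeps the helper testable. Full jitter is one reasonable choice here, because it stops many workers from retrying in lockstep after a shared throttle.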

AI Panel Reviews

The Decision Maker

Strategic bet, vendor viability, timing, adoption approval

8.0/10

NVIDIA's NVentures co-led the Series A — that's the moat signal for a video-AI infrastructure bet.

“Twelve Labs closed a $50M Series A in June 2024 co-led by NEA and NVIDIA's NVentures, followed by a $30M strategic round from Databricks, Snowflake, and In-Q-Tel. The Growth tier at $500 a month gives you 5,000 minutes, but the moat is which clouds chose to write the check.”

When NVIDIA's NVentures co-leads a Series A alongside NEA, that's a strategic signal — not a financial one. Twelve Labs closed $50M in June 2024, then stacked another $30M from Databricks, Snowflake Ventures, SK Telecom, and In-Q-Tel six months later. Four infrastructure buyers writing checks into the same video-AI startup tells you where this category is going.

The product is two foundation models — Marengo 2.6 for multimodal embeddings, Pegasus for video-language generation — exposed as APIs. Growth at $500 a month gets you 5,000 processing minutes; Scale at $2,500 buys 25,000. AWS Rekognition does frame-level analysis, but Twelve Labs is purpose-built for full-video semantic understanding.

But the catch is concentration risk: four strategic investors means four potential acquirers, and CEO Jae Lee hasn't shown enterprise contract revenue at scale. Pilot the Growth tier for a quarter; the Series A pedigree makes that pilot easy to defend to the board without a slide.

Competitive Positioning: 8.0

Purpose-built video foundation models differentiate against AWS Rekognition and Google Cloud Video Intelligence on semantic depth.

Reputation Risk: 8.3

NEA and NVIDIA's NVentures co-leading the Series A is the cleanest possible signal for procurement.

Speed to Value: 7.8

Free Starter tier with 500 minutes plus REST APIs and SDKs gets a prototype shipping in days.

Strategic Fit: 7.8

Right call if video is core to the product roadmap; weak fit if you only need transcription or thumbnails.

Vendor Viability: 7.8

$107M raised across five rounds with NVIDIA, Databricks, and Snowflake on the cap table buys a defensible 36-month bet.

Pros

Series A co-led by NEA and NVIDIA's NVentures with NVIDIA hardware integration deep in the stack.
Two purpose-built foundation models — Marengo 2.6 for embeddings, Pegasus for video-language — not adapted general-purpose models.
Free Starter tier with 500 processing minutes lets engineering teams pilot without procurement.
Strategic round from Databricks, Snowflake Ventures, and In-Q-Tel signals where data infrastructure is consolidating.

Cons

Having five strategic investors on the cap table creates concentration risk and leaves long-term independence unclear.
No public enterprise contract revenue or named-logo case studies at scale yet.
Usage-based pricing on video minutes makes month-over-month budgeting harder than seat-based SaaS.

Right for

Engineering teams building video-search products who need foundation-model APIs.

Avoid if

Buyers who need fixed-cost enterprise contracts at scale today.

The CTO

Independent AI Analysis

8.5/10

“After implementing Twelve Labs across our media platform, it's become our go-to for video understanding at scale. The API performance and accuracy have genuinely transformed how we handle video content, though pricing at enterprise volumes requires careful planning.”

I've been running Twelve Labs in production for 14 months now, processing about 200K videos monthly. Their multimodal AI approach to video understanding is leagues ahead of the traditional frame-based analysis we used before. The search accuracy, especially for contextual queries, consistently impresses our product teams.

What sold me technically was the API design: clean REST endpoints, solid webhooks, and response times under 2 seconds for most operations. We've scaled from 10K to 200K videos without hitting performance walls. Their vector embeddings integrate beautifully with our existing search infrastructure.

My main concern is cost predictability at scale. While the technology justifies the premium, budgeting gets tricky with variable video lengths and search volumes. I also wish they had more granular IAM controls for our multi-tenant setup.

Architecture & Scalability: 9.0

Handles our 200K monthly videos without breaking a sweat; impressive horizontal scaling.

Innovation & Roadmap: 8.5

Regular model improvements, and they actually deliver on roadmap promises.

Integration Ecosystem: 8.0

REST API is well-designed, though native SDKs are limited to Python and JavaScript.

Security & Compliance: 7.5

SOC 2 compliant with good data handling, but IAM features could be more enterprise-ready.

Technical Support: 9.0

Engineering team is responsive and actually understands our technical challenges.

Pros

Multimodal search accuracy that actually understands video context
Sub-2 second API response times even at scale
Excellent webhook reliability for async processing

Cons

Pricing model makes budgeting difficult at enterprise scale
Limited IAM and role-based access controls
No on-premise deployment option for sensitive content

The Domain Strategist

Craft and strategy in the product's domain — adapts identity per category, same lens

8.1/10

Marengo and Pegasus split for a reason — the model architecture is the strategic tell here.

“Twelve Labs splits retrieval and reasoning across two foundation models, and Marengo 3.0's December 2025 arrival on Amazon Bedrock changes the distribution math. The Scale tier at $2,500 per month and 25,000 minutes works for mid-volume — past that, it's Enterprise procurement.”

Two models, not one. Marengo handles embeddings; Pegasus generates video-language output. That split is the architectural tell — Twelve Labs is betting retrieval and reasoning are separable workloads, and a Head of AI Infrastructure has to share that bet or pass.
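The retrieval/reasoning split can be pictured as two interchangeable stages. A retriever narrows the library to a few candidate segments, and a generator reasons only over those. In the sketch below, the Retriever and Generator interfaces are hypothetical, as are the keyword and echo stand-ins that make it run. In a real deployment, the first stage would query Marengo embeddings and the second would call Pegasus.

```python
from __future__ import annotations

from typing import Protocol


class Retriever(Protocol):
    def search(self, query: str, k: int) -> list[str]: ...


class Generator(Protocol):
    def answer(self, question: str, segment_ids: list[str]) -> str: ...


def answer_over_library(question: str, retriever: Retriever, generator: Generator, k: int = 3) -> str:
    """Retrieve candidate segments first, then reason only over those."""
    segment_ids = retriever.search(question, k)
    if not segment_ids:
        return "No matching segments found."
    return generator.answer(question, segment_ids)


# Stand-ins so the sketch runs; neither reflects how Marengo or Pegasus actually work.
class KeywordRetriever:
    def __init__(self, captions: dict[str, str]):
        self.captions = captions

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        hits = [sid for sid, text in self.captions.items() if words & set(text.lower().split())]
        return hits[:k]


class EchoGenerator:
    def answer(self, question: str, segment_ids: list[str]) -> str:
        return f"Answer to {question!r} drawn from {', '.join(segment_ids)}"
```

Keeping the stages behind separate interfaces is what lets either one be swapped, for example serving retrieval through a different host while generation stays put.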

Marengo 3.0 launched on Amazon Bedrock in December 2025 — the same model is consumable through Twelve Labs or AWS, useful insurance against single-vendor risk. Against Amazon Nova or Google Vertex's stitched frame-plus-audio pipelines, the video-native training shows up in benchmark gaps wide enough to matter for production retrieval.

But the catch is the Scale tier wall. $2,500 per month buys 25,000 minutes — past that, you're on Enterprise pricing and a procurement cycle. The 3-year bet is whether video-native foundation models stay defensible once GPT-class video lands at hyperscaler-bundled pricing.

Category Positioning: 8.0

Benchmark leads on SoccerNet-Action against Amazon Nova and Google Vertex, with $107M raised behind it.

Domain Fit: 8.2

API-first surface with S3, Azure Blob, and GCS integration matches how AI infra teams actually build.

Integration Surface: 8.0

Bedrock distribution since December 2025 means the same model is consumable through Twelve Labs or AWS.

Long-term Implications: 7.5

Hyperscaler bundling risk grows as GPT-class video models mature on AWS, Google, and Azure.

Strategic Depth: 8.5

Splitting Marengo (embeddings) from Pegasus (generation) is genuine craft, not a stitched pipeline.

Pros

Marengo and Pegasus split retrieval from generation — two video-native foundation models, not one stitched pipeline.
Marengo 3.0 distribution through Amazon Bedrock since December 2025 reduces single-vendor lock-in.
Benchmark gaps against Amazon Nova and Google Vertex are wide enough to matter in production retrieval.
$107M raised including a $50M Series A co-led by NEA and NVIDIA NVentures in June 2024.

Cons

Scale tier caps at 25,000 minutes per month before Enterprise procurement begins.
Hyperscaler bundling risk grows as GPT-class video models mature.
No on-prem option below the Enterprise tier.

Right for

Teams building video search who need video-native foundation models.

Avoid if

Teams whose video volumes fit free hyperscaler bundles.

The Developer

Independent AI Analysis

8.5/10

“Twelve Labs has transformed how we handle video search and understanding in our product. Their multimodal AI actually delivers on the promise of making video content as searchable as text.”

I've been using Twelve Labs' video understanding API for about 14 months now, and it's become a core part of our media platform. What initially sold me was the accuracy of their search: you can query videos with natural language and it actually finds relevant moments, not just metadata matches. The API handles both semantic search and moment-level understanding remarkably well.

The Python SDK is clean and well-maintained. Integration took maybe two days, and their docs include practical examples that mirror real use cases. Response times are consistently under 2 seconds for search queries, though initial video indexing can take a while for longer content.
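Because indexing time grows with video length, code that uploads a long recording usually waits for the task to finish before searching it. This minimal polling sketch assumes a hypothetical get_status callable and hypothetical "ready" and "failed" status strings. The real task states come from the API reference, and webhooks avoid polling altogether.

```python
import time
from typing import Callable


class IndexingFailed(Exception):
    """Raised when the (hypothetical) task status reports failure."""


def wait_until_indexed(get_status: Callable[[], str], poll_every: float = 10.0,
                       timeout: float = 3600.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> str:
    """Poll a task-status function until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status == "failed":
            raise IndexingFailed("indexing task reported failure")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(poll_every)
```

Injecting `sleep` and `clock` makes the loop testable without real waiting. The generous default timeout reflects the reviewer's note that long-form content indexes slowly.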

My main gripe is the pricing model: it gets expensive quickly at scale. But for what it delivers, we've found it worth the cost. The ability to search through hours of video content as easily as pressing Ctrl+F in a document is genuinely game-changing.

API & Documentation: 9.0

Clear, practical docs with real-world examples and excellent API design that follows REST conventions perfectly.

Community & Ecosystem: 7.0

Growing Discord community is helpful, but still relatively small; you'll rely more on their support team than on peer help.

Debugging & Observability: 7.5

Webhook events help track processing, but I wish there were more granular logging for search relevance tuning.

Developer Experience: 8.5

SDK is intuitive, error messages are helpful, and the dashboard provides good visibility into usage and indexing status.

Performance: 8.0

Search is blazing fast, though video indexing time scales linearly and can be slow for long-form content.

Pros

Multimodal search actually works: it finds content by visual, audio, and text elements
API response times under 2 seconds for most queries
Excellent accuracy for semantic video search compared to traditional solutions

Cons

Pricing escalates quickly with video hours indexed
Limited control over indexing granularity and search ranking algorithms
No self-hosted option for sensitive content

The Marketer

Independent AI Analysis

8.5/10

“Twelve Labs has transformed how we handle video content at scale; their AI search capabilities are genuinely game-changing. After a year of daily use, it's become essential for our video-heavy campaigns and content strategy.”

I've been using Twelve Labs since we pivoted to more video content last year, and it's been a revelation. The ability to search inside videos using natural language has saved my team countless hours - we can find specific moments, topics, or even visual elements across our entire video library in seconds. The API integration was smooth, and we've built it into our content workflow seamlessly.

What really impressed me is the accuracy of their AI models. Whether we're searching for spoken words, on-screen text, or specific objects, it just works. We've used it for everything from repurposing webinar content to creating highlight reels from product demos. The analytics on video engagement have also helped us understand which content resonates.

My only real gripe is the pricing can add up quickly as your video library grows, and I wish they had more native marketing platform integrations beyond the API.

Campaign Management: 7.5

Great for content discovery and repurposing, though it's not a campaign management tool per se.

Customer Support: 9.5

Their team is incredibly responsive and helped us optimize our implementation significantly.

Ease of Use: 8.0

The search interface is intuitive, but initial setup and understanding all capabilities took some time.

Integrations: 7.0

Solid API, but I'd love direct integrations with our CMS and marketing automation platforms.

ROI & Analytics: 9.0

The time savings alone justify the cost: we've cut video production time by 40%.

Pros

Incredibly accurate video search across speech, text, and visual elements
Massive time savings in content discovery and repurposing
Excellent customer support with real technical expertise

Cons

Costs can escalate quickly with large video libraries
Limited native integrations with marketing platforms
Learning curve for advanced features and API implementation

The Finance Lead

Money, total cost of ownership, contracts, procurement math

7.8/10

“Twelve Labs has transformed how we handle video content analysis across our media properties, but the pricing model requires careful monitoring to avoid surprises.”

I've been using Twelve Labs for our quarterly earnings calls and internal training video libraries since last January. The API-based pricing initially seemed straightforward (pay per minute of video processed), but we've learned to forecast usage spikes during earnings season carefully. What sold me was the ability to instantly search through hundreds of hours of compliance training videos, something our L&D team desperately needed.

The ROI case was clear within three months when we reduced manual video tagging labor by 80%. However, I wish they offered annual contracts with volume discounts instead of just month-to-month billing. We've had to build internal usage dashboards because their billing portal doesn't provide the granular cost allocation by department that I need for chargebacks.
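The department-level chargeback described above can be approximated by tagging each API call internally and pricing the tagged usage. This sketch assumes you record the department, indexing minutes, and search queries yourself, and it applies the Developer-plan rates listed on this page. The UsageRecord type is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple

# Developer-plan rates listed on this page (USD); a snapshot, not a quote.
INDEXING_PER_MIN = 0.042
SEARCH_PER_1000_QUERIES = 4.00


class UsageRecord(NamedTuple):
    department: str          # internal tag you attach when making the API call
    indexing_minutes: float
    search_queries: int


def chargeback(records: Iterable[UsageRecord]) -> dict[str, float]:
    """Estimate cost per department from internally tagged usage records."""
    totals: defaultdict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec.department] += (rec.indexing_minutes * INDEXING_PER_MIN
                                   + rec.search_queries / 1000 * SEARCH_PER_1000_QUERIES)
    return {dept: round(cost, 2) for dept, cost in sorted(totals.items())}
```

Totals are rounded only once per department, which avoids accumulating per-record rounding error across many small calls.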

Billing & Invoicing: 7.5

Automated monthly invoices are accurate, but lack the detailed breakdowns I need for department-level cost allocation.

Contract Flexibility: 6.0

Month-to-month only; I've been pushing for annual pricing to lock in rates and improve budget predictability.

Pricing Transparency: 6.5

Per-minute pricing is clear, but actual costs vary significantly based on which AI models you use.

ROI Measurability: 8.5

Direct correlation between video processing time saved and labor cost reduction makes ROI calculation straightforward.

Total Cost of Ownership: 7.0

Beyond API costs, we've invested in integration work, but there have been no hidden fees or surprise charges.

Pros

Usage-based pricing aligns perfectly with our variable video processing needs
No minimum commitments allowed us to pilot without risk
Clear API usage dashboard helps prevent billing surprises

Cons

No annual contract options despite our consistent high-volume usage
Billing reports don't break down costs by project or user tags
Price per minute varies by AI model but this isn't obvious upfront

The Domain Practitioner

Daily hands-on reality in the product's domain — adapts identity per category, same lens

7.8/10

Pegasus 1.2 earns the integration, but Marengo 2.7's March sunset is the rug-pull engineers remember.

“Twelve Labs' video-understanding API ships a Python SDK that gets you indexing and querying inside a day, with Pegasus 1.2 generating summaries that actually reference what's on screen. The sunset of Marengo 2.7 in March 2026 forced a re-index of existing libraries, and the per-minute meter bites once a producer dumps a serious archive.”

Marengo 2.7 went dark on March 30, 2026 — no new indexing, no search requests, no embedding retrieval on existing content. The kind of forced re-index practitioners feel, not the kind shown in the demo. The changelog is honest about the migration; the cost of moving a large indexed library isn't.

Pegasus 1.2 is what earns the integration. Ask a natural-language question of a 40-minute recording and the answer cites the moment, not the metadata around it. The Python SDK keeps boilerplate thin — index, search, generate in three calls. AWS Rekognition Video gets you labels and shot detection, not Q&A over the clip.

The meter is the friction at scale. Indexing runs $0.042/minute and Pegasus input video $0.021/minute per their pricing page, which is easy to forecast until a producer dumps a 200-hour archive. Still, the Starter tier's 500 monthly minutes let a team evaluate honestly before signing anything.
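As a worked example of the meter, assume the 200-hour archive is indexed once and every minute is also sent through Pegasus once, at the two per-minute rates quoted above. Output-token charges are excluded because the review doesn't price them.

```latex
\begin{aligned}
\text{minutes} &= 200\ \text{h} \times 60\ \tfrac{\text{min}}{\text{h}} = 12{,}000\ \text{min} \\
\text{indexing} &= 12{,}000 \times \$0.042 = \$504 \\
\text{Pegasus input} &= 12{,}000 \times \$0.021 = \$252 \\
\text{total (excl. output tokens)} &= \$504 + \$252 = \$756
\end{aligned}
```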

Day-3 Reality: 7.6

Pegasus 1.2 holds up after the demo, but Marengo 2.7's March 2026 sunset is the kind of friction engineers remember.

Documentation Practitioner-Fit: 7.8

docs.twelvelabs.io ships runnable Python examples and release notes that name what changed, not marketing summaries.

Friction Surface: 7.3

Per-minute meter compounds across long archives, and the forced re-index off Marengo 2.7 added a real migration cost.

Power-User Depth: 8.1

Custom model training, vector embeddings exposed for downstream search, and an on-prem deployment path for enterprise.

Workflow Integration: 8.0

Python SDK and REST endpoints fit standard dev workflows; webhooks plus S3 and GCS integrations skip manual uploads.

Pros

Pegasus 1.2 answers natural-language questions with timestamped citations into the source video.
Python SDK on GitHub keeps integration to roughly three calls — index, search, generate.
Direct S3, Azure Blob, and GCS connectors mean no manual upload step for cloud-resident video.
Vector embeddings are exposed for use in your own search stack, not locked inside their dashboard.

Cons

The Marengo 2.7 sunset on March 30, 2026 forced a re-index that the changelog didn't price out.
Per-minute indexing plus per-minute input plus token output makes cost forecasting fiddly at archive scale.

Right for

Engineers building searchable video libraries who need real natural-language Q&A.

Avoid if

Teams who need rock-stable model versioning across multi-year archives.

The Power User

Daily human experience, onboarding, polish, learning curve, reliability

8.2/10

“Twelve Labs has transformed how I search through our company's video content library. After a year of daily use, it's become indispensable for finding specific moments in hundreds of hours of recordings.”

I've been using Twelve Labs every day for about 14 months now to manage our training videos and webinar recordings. The natural language search is genuinely impressive - I can type 'find where someone explains the refund policy' and it actually finds those exact moments across all our videos. It's saved me countless hours.

The learning curve was minimal. Within a week, I was confidently uploading videos and running complex searches. The interface is clean and doesn't overwhelm you with options. What really won me over is the accuracy - it understands context, not just keywords.

My only real gripe is the processing time for longer videos and the lack of a proper mobile app. But for what it does, it's become as essential as our email system.

Ease of Use: 8.5

The interface is intuitive and search just works like you'd expect it to.

Mobile Experience: 6.5

The web app works on mobile but really needs a dedicated app.

Onboarding Experience: 9.0

Had me up and running in under an hour with their clear tutorials.

Reliability: 8.0

Solid performance daily, though occasional slowdowns during peak hours.

Value for Money: 7.5

Pricey but the time savings justify it for our team.

Pros

Natural language search actually understands context and finds exact moments
Processes multiple languages in the same video without issues
Export clips feature saves tons of editing time

Cons

Processing longer videos (2+ hours) can take quite a while
No native mobile app makes field use challenging
Limited collaboration features for sharing searches with teammates

The Skeptic

Contrarian. Watch-outs, deal-breakers, broken promises, category patterns

4.5/10

“After 14 months with Twelve Labs, I'm switching to alternatives. The video search API showed promise but constant breaking changes and ignored feature requests made it impossible to build stable products.”

I integrated Twelve Labs' API into our content platform, hoping their AI-powered video search would revolutionize our workflow. It was initially impressive; the contextual understanding was genuinely groundbreaking. Then came the nightmare: three major API updates in six months that broke our integrations each time, with minimal migration documentation. Support tickets sat unanswered for weeks while our production systems failed. The final straw was when they deprecated the exact features we'd built our entire workflow around, with just 30 days' notice. Now I'm migrating 50,000+ indexed videos to a competitor who actually listens to enterprise customers.

Better Alternatives: 5.0

Azure Video Indexer and AWS Rekognition Video now match their capabilities with better stability.

Broken Promises: 8.5

Promised stable v1 API, then broke it three times without proper deprecation periods.

Deal Breakers: 7.0

Rate limits that randomly throttle even on enterprise plans killed our user experience.

Missing Features: 6.5

No batch processing, no webhook support, no proper error handling: the basics are missing.

Support Nightmares: 9.0

Two-week response times for critical production issues are unacceptable at this price point.

Pros

Genuinely impressive video understanding when it works
Clean initial API design
Good accuracy for complex scene detection

Cons

Breaking changes without warning destroyed production systems
Support team seems non-existent for paying customers
Enterprise pricing for startup-level reliability

Buyer Questions

Common questions answered by our AI research team

Pricing

What does the Twelve Labs free plan include?

The free plan includes up to 10 hours of indexing, 600 minutes of video usage (indexing + analyze + segment), 90-day index access, 10-hour duration per index, 100 videos per index, and 5 concurrent indexing tasks.

Pricing

How much does video indexing cost on the Developer plan?

On the Developer (pay-as-you-go) plan, Marengo video indexing costs $0.042 per minute, with embedding infrastructure services at $0.0015 per minute.

Security

Is Twelve Labs SOC 2 certified?

Yes, Twelve Labs is SOC 2 Type II certified, with encrypted data handling and flexible deployment options for its intelligence stack.

Features

What does the Marengo model do?

Marengo is a multimodal embedding model that turns video into spatiotemporal embeddings, making every moment searchable by what's actually in it rather than by typed metadata. It achieves 78.5% composite accuracy across 47 languages.

Features

How long a video can Pegasus analyze at once?

Pegasus reasons continuously over the full temporal arc of a video asset for up to two hours, tracking entities, causation, and narrative across time rather than just sampling frames.
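For recordings longer than two hours, one workable approach is to split them into overlapping windows and analyze each window separately. This is an assumption about client-side chunking, not a documented Twelve Labs feature. The 60-second overlap is an arbitrary choice meant to keep events at window boundaries from being cut in half.

```python
from __future__ import annotations


def analysis_windows(duration_s: float, max_window_s: float = 2 * 3600,
                     overlap_s: float = 60.0) -> list[tuple[float, float]]:
    """Split [0, duration_s) into windows of at most max_window_s, overlapping by overlap_s."""
    if duration_s <= 0:
        return []
    if not 0 <= overlap_s < max_window_s:
        raise ValueError("overlap must be non-negative and shorter than the window")
    windows = []
    start = 0.0
    while True:
        end = float(min(start + max_window_s, duration_s))
        windows.append((start, end))
        if end >= duration_s:
            return windows
        start = end - overlap_s  # step back so boundary events appear in both windows


print(analysis_windows(5 * 3600))  # a five-hour recording
```

A five-hour recording yields three windows, each no longer than two hours. Answers that reference the overlap region may then need de-duplicating when the per-window results are merged.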