Together AI
Together AI is a research-driven AI company empowering developers and researchers to train, fine-tune, and deploy open-source generative AI models at scale through a purpose-built AI acceleration cloud platform.
Founding Story
Together AI was founded in June 2022 by Vipul Ved Prakash, Ce Zhang, Chris Ré, and Percy Liang, driven by the belief that open and decentralized alternatives to closed AI systems would be essential. The founding team, which includes professors at Stanford and the University of Chicago, wanted to democratize AI by making open-source models accessible to developers and enterprises globally. Early on, the team released successful open-source projects including RedPajama, GPT-JT, and OpenChatKit, which garnered support from hundreds of thousands of AI developers.
Leadership
Founders
Vipul Ved Prakash
Co-founded Cloudmark, an anti-spam company that developed Vipul's Razor. Founded Topsy, a social media search and analytics company acquired by Apple for over $200 million in 2013. Background in large-scale distributed systems and information retrieval.
Ce Zhang
Associate Professor at University of Chicago. PhD from University of Wisconsin-Madison, postdoctoral researcher at Stanford under Chris Ré. Research focus on machine learning systems and decentralized computing.
Chris Ré
Professor at Stanford AI Lab, Stanford Center for Research on Foundation Models (CRFM), and Machine Learning Group. Research focus on foundation models and machine learning systems.
Percy Liang
Associate Professor of Computer Science at Stanford University. BS from MIT (2004), PhD from UC Berkeley (2011). Research focus on machine learning and natural language understanding. Co-founder of Stanford Center for Research on Foundation Models.
Executive Team
Vipul Ved Prakash
Founder & CEO
Previously founded Topsy (acquired by Apple for $200M+) and co-founded Cloudmark. Expert in large-scale distributed systems and information retrieval.
Ce Zhang
Founder & CTO
Associate Professor at University of Chicago. PhD from UW-Madison, postdoc at Stanford. Expert in machine learning systems and decentralized computing.
Business Model
Revenue Model
Consumption-based revenue model with per-token pricing for inference, GPU cluster rentals (hourly rates), fine-tuning services (per token), and custom model consulting. No subscription tiers or minimum commitments for serverless offerings. Options for instant clusters, reserved clusters, and Frontier AI Factory for large-scale deployments.
Pricing Tiers
- Serverless inference ranging from $0.05-$3.50 per 1M input tokens and $0.06-$7.00 per 1M output tokens, depending on the model (e.g., Llama 3.3 70B at $0.88/$0.88, DeepSeek-R1 at $3.00/$7.00, Llama 3.2 3B Turbo at $0.06/$0.06)
- Dedicated inference infrastructure with custom pricing based on requirements
- Standard and specialized pricing options for LoRA and full fine-tuning
- Self-service GPU clusters from 64 to 10,000+ GPUs with hourly billing
- Reserved GPU capacity for long-term deployments
- Large-scale GPU deployments for frontier AI development
- Code execution and interpretation services
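The per-token rates above translate directly into per-request costs. A minimal sketch of that arithmetic, using the example rates quoted in this section (actual prices vary by model and may change):

```python
# Per-1M-token rates (input $, output $) as quoted above.
PRICES_PER_1M = {
    "Llama 3.3 70B": (0.88, 0.88),
    "DeepSeek-R1": (3.00, 7.00),
    "Llama 3.2 3B Turbo": (0.06, 0.06),
}

def inference_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens times the per-token rate."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion on DeepSeek-R1
print(f"${inference_cost('DeepSeek-R1', 2_000, 500):.4f}")  # $0.0095
```

With no minimum commitments on the serverless tier, this per-request figure is the entire marginal cost of a call.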
Target Markets
- AI-native startups and scale-ups
- Enterprise software companies
- AI researchers and academics
- Open-source AI developers
- AI application developers
- Large enterprises deploying AI
- AI-powered code editors and IDEs
- Voice AI and real-time conversational agents
- AI video generation at scale
- Customer support automation
- AI agents and reasoning models
- Document parsing and information extraction
Notable Customers
- Cursor
- Decagon
- Salesforce
- Zoom
History & Milestones
- Raised $305M Series B led by General Catalyst and co-led by Prosperity7 at a $3.3B valuation
- Acquired CodeSandbox to add built-in code interpretation capabilities
- Became an NVIDIA Cloud Partner in the NVIDIA Partner Network
- Raised $106M in funding led by Salesforce Ventures at a $1.25B valuation
- Grew user base to over 450,000 AI developers
