Agentic Data Layer for R&D
ArcellAI is the autonomous data intelligence platform that designs and executes data-engineering pipelines and statistical experimentation for deep tech R&D. Purpose-built AI agents automate the toughest 80% of data work, from ingestion and harmonization to modeling and reporting, with integrated provenance and reproducibility.
The Problem
R&D teams face exploding data complexity, driving wasted effort, rising costs, and slower discovery.
- 75-90% failure rate: Most AI deployments in specialized biological domains fail
- 4× growth: Bioinformatics services projected to grow 4× by 2035, signaling unmet tooling needs and rising complexity
- $250B/yr lost: Biotech loses ~$250B annually to fragmented data and infrastructure debt
- 80% wasted time: Data scientists spend most of their time on data plumbing instead of discovery
The Solution
An AI platform that automates the design and execution of data-engineering pipelines and statistical experimentation.
- 100% deployment success: Purpose-built AI agents with domain-specific intelligence via context engineering
- Radical simplicity: Tools and integrations simplify complex biological data workflows
- 80% productivity gain: Autonomous workflows handle the toughest data work end-to-end
- Agentic Data Engineering: Automates ingestion → cleaning/transformation → lineage → orchestration across your R&D stack
- Context-Aware Reasoning: Domain-specific intelligence through advanced tool use and context engineering
- Provenance & Reproducibility: Versioned datasets, transformation lineage, and auditable workflows captured in a provenance graph
- Self-Driving Semantic Layer: Autonomously defines and centralizes research metrics, experimental KPIs, and statistical calculations
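To make the provenance and lineage idea concrete, here is a minimal sketch of an in-memory provenance graph. It is an illustration only and not ArcellAI's implementation: the `ProvenanceGraph` class, the `fingerprint` helper, the step names, and the sample records are all hypothetical. Each dataset version is identified by a hash of its content plus the step and parent that produced it, so re-running the same steps on the same input yields the same version ids.

```python
import hashlib
import json


def fingerprint(records, parent=None, step=None):
    """Hash the content together with its parent version and producing step."""
    payload = json.dumps(
        {"records": records, "parent": parent, "step": step}, sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class ProvenanceGraph:
    """Hypothetical provenance graph: nodes are dataset versions,
    and each derived node remembers the step and parent that produced it."""

    def __init__(self):
        self.nodes = {}    # version id -> records
        self.parents = {}  # version id -> (parent version id, step name)

    def add_source(self, records):
        version = fingerprint(records)
        self.nodes[version] = records
        return version

    def apply(self, step_name, func, input_version):
        output = func(self.nodes[input_version])
        version = fingerprint(output, parent=input_version, step=step_name)
        self.nodes[version] = output
        self.parents[version] = (input_version, step_name)
        return version

    def lineage(self, version):
        """Walk back to the source and return the steps in execution order."""
        steps = []
        while version in self.parents:
            version, step_name = self.parents[version]
            steps.append(step_name)
        return list(reversed(steps))


def normalize_gene_names(records):
    # Trim whitespace and upper-case gene symbols.
    return [{**r, "gene": r["gene"].strip().upper()} for r in records]


def cast_counts(records):
    # Convert string counts to integers.
    return [{**r, "count": int(r["count"])} for r in records]


graph = ProvenanceGraph()
raw = graph.add_source([{"gene": "TP53", "count": "12"}, {"gene": "tp53 ", "count": "7"}])
clean = graph.apply("normalize_gene_names", normalize_gene_names, raw)
typed = graph.apply("cast_counts", cast_counts, clean)
print(graph.lineage(typed))
```

Because version ids are derived from content and history rather than timestamps, an auditor can recompute them to check that a reported result really came from the stated inputs and steps.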
Product: VC-GPT
Our first product is VC-GPT, a Virtual Cells analytics agent built on research published at ICML'25.
- Domain-specific reasoning via in-context learning with standardized, expert-designed tools and APIs
- Autonomous and reproducible workflows for complex biological data analysis
- Bioengineering hardware stack API integrations for seamless lab connectivity
- Planner-Executor-Critic architecture: Breaks objectives into reliable multi-step workflows
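The Planner-Executor-Critic pattern named above can be sketched as a simple control loop. The source does not describe VC-GPT's internals, so everything here is an assumption for illustration: the planner, tools, and critic are hypothetical rule-based stand-ins (a real system would likely back them with a language model and lab or analysis APIs), and the data is made up. The sketch shows the planner splitting an objective into steps, the executor running each step, and the critic rejecting an output so the step is retried in a stricter mode.

```python
def plan(objective):
    """Planner: break the objective into ordered steps (hard-coded stand-in)."""
    return ["load", "clean", "summarize"]


# Executor tools: each takes the current state and a strict flag.
TOOLS = {
    "load": lambda state, strict: [4.0, None, 6.0],  # toy measurements
    "clean": lambda state, strict: (
        [v for v in state if v is not None] if strict else state
    ),
    "summarize": lambda state, strict: sum(state) / len(state),
}


def critic(step, output):
    """Critic: validate a step's output and return (ok, feedback)."""
    if step == "clean" and any(v is None for v in output):
        return False, "missing values remain; rerun in strict mode"
    return True, ""


def run(objective, max_retries=2):
    state = None
    trace = []  # (step, attempt, accepted) for auditability
    for step in plan(objective):
        strict = False
        for attempt in range(max_retries + 1):
            output = TOOLS[step](state, strict)
            ok, feedback = critic(step, output)
            trace.append((step, attempt, ok))
            if ok:
                state = output
                break
            strict = True  # act on the critic's feedback in the next attempt
        else:
            raise RuntimeError(f"step {step!r} failed after retries: {feedback}")
    return state, trace


result, trace = run("mean expression of the loaded samples")
print(result)
print(trace)
```

Keeping the critic separate from the executor means a bad intermediate result is caught before later steps consume it, and the recorded trace shows which attempts were rejected.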
Market Opportunity
- TAM: $500B AI in R&D
- SAM: $250B AI in physical sciences and engineering
- SOM: $200M ARR potential by 2034
- Beachhead: $25B AI in biotechnology
- Secondary Beachhead: $80B Healthcare Big Data analytics
Traction
- ICML'25 publication for foundational technology (funded by ICML, Harvard, and Chan-Zuckerberg)
- 5 businesses on waitlist
- 3 pilot interest forms from a major biopharma company and startups
- Handshake revenue share agreement with MIT startup
- 2+ computational biologists interested in collaborating and testing the product
- Antler VC residency, calculus.house at The Residency network
- Top 5 of 2000+ selected for Choose Good Quests pitch by Savant
- Live prototype "VC-GPT" demoed at calculus.house
All-MIT CS Founding Team
- Alex Velez-Arce - Founder & CEO/CTO: AI research & strategy at FAANG+, MIT CS, Harvard BioAI
- Jesus Caraballo - Founding Engineer: AI research at HMS, backend engineering, MIT Computer Science
- Built data & ML products accounting for $100M+ in revenue
- Open-sourced virtual cells AI platform with 30K+ MAU
- ICML-and-NeurIPS-published researcher in BioML
- Formerly SWE at Pinterest and AI researcher at Harvard
Top 0.1% of peachscore startups