Starter
Free
Free plan for individual developers and small projects
- ✔ 2,500 database queries per month with docc
- ✔ 500 minutes for CI/CD on Daisy Cloud
- ✔ 1 GB of storage for files
- ✔ Up to 60 minutes per job
Unlock true performance portability. Our self-learning compiler, docc, uses cloud-based tuning to automatically optimize your code for any target processor. Start locally and scale effortlessly with CI/CD pipelines on our multi-chip cloud.
Optimized for leading hardware platforms
Building on SPCL's cutting-edge research that won the
Our self-learning compiler explores thousands of optimization paths to tailor your code to a specific hardware target, achieving maximum performance and surpassing the one-size-fits-all approach of traditional compilers.

Write your code once in plain C/C++, ONNX or PyTorch. Our self-learning compiler automatically generates optimized kernels for different hardware targets.
Rodinia Benchmarks
// Backpropagation layer forward pass (Rodinia backprop benchmark)
void bpnn_layerforward(const float* l1, float* l2,
                       float* conn, int n1, int n2)
{
  for (int j = 1; j <= n2; j++) {
    float sum = 0.0f;
    for (int k = 0; k <= n1; k++) {
      sum += conn[k * n2 + j] * l1[k];
    }
    l2[j] = squash(sum);  // squash() is the benchmark's activation function
  }
}
- ✔ Auto-tuned for maximum performance
- ✔ Memory layout optimized
- ✔ Hardware-specific intrinsics applied
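To make these points concrete, here is a hand-written sketch of the kind of rewrite such tuning can produce for the inner dot product above, assuming an x86 target with AVX2 and FMA and a transposed connection matrix so the k-loop reads contiguous memory (`conn_t_row` is a hypothetical row of that transposed layout). This is illustrative only, not docc's actual output.

```c
// Illustrative only: the same inner product after a memory-layout change
// (the connection matrix stored transposed, so the k-loop is contiguous)
// and AVX2/FMA vectorization. Compile with -mavx2 -mfma.
#include <immintrin.h>

static float dot_avx2(const float* conn_t_row, const float* l1, int n1)
{
    __m256 acc = _mm256_setzero_ps();
    int k = 0;
    for (; k + 8 <= n1 + 1; k += 8) {            // process 8 floats per iteration
        __m256 c = _mm256_loadu_ps(conn_t_row + k);
        __m256 x = _mm256_loadu_ps(l1 + k);
        acc = _mm256_fmadd_ps(c, x, acc);        // acc += c * x
    }
    // Horizontal sum of the 8 partial sums.
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    float sum = _mm_cvtss_f32(s);
    for (; k <= n1; k++)                         // scalar tail
        sum += conn_t_row[k] * l1[k];
    return sum;
}
```

With that layout, the loop body above reduces to a single `l2[j] = squash(dot_avx2(...))` call per output neuron.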
While docc will always run locally on your machine, our multi-chip cloud automates compilation and performance tuning at scale. You write the algorithms, we handle the rest.
Simply add our GitHub app to your repository. We automatically detect changes and trigger optimization pipelines on every push.
Your code is compiled in parallel across our cloud of NVIDIA, AMD, and Tenstorrent hardware. No local setup required.
Our compiler explores thousands of optimizations for your code. Our dashboard gives you the performance reports.
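As a minimal illustration of what that exploration means, the sketch below times a handful of candidate tile sizes for a simple kernel and keeps the fastest. The candidate list, kernel, and timing method are all stand-ins; docc's real search covers a far larger optimization space across many machines.

```c
// Toy auto-tuning loop: run each candidate configuration, measure it,
// and keep the fastest. Not docc's search strategy, just the core idea.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)

// The kernel variant being tuned: a dot product processed in tiles.
static double dot_tiled(const double *x, const double *y, int n, int tile)
{
    double sum = 0.0;
    for (int i = 0; i < n; i += tile)
        for (int j = i; j < i + tile && j < n; j++)
            sum += x[j] * y[j];
    return sum;
}

int main(void)
{
    double *x = malloc(N * sizeof *x), *y = malloc(N * sizeof *y);
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    const int candidates[] = { 16, 64, 256, 1024, 4096 };
    int best_tile = candidates[0];
    double best_time = 1e30, sink = 0.0;

    // "Explore" the (tiny) search space: time every candidate.
    for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; c++) {
        clock_t t0 = clock();
        for (int r = 0; r < 50; r++)            // repeat to get a measurable time
            sink += dot_tiled(x, y, N, candidates[c]);
        double elapsed = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (elapsed < best_time) { best_time = elapsed; best_tile = candidates[c]; }
    }

    printf("best tile size: %d (%.4fs, checksum %.1f)\n",
           best_tile, best_time, sink);
    free(x); free(y);
    return 0;
}
```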
From computational fluid dynamics and molecular simulations to AI models, our compiler cloud handles the most complex workloads.
In collaboration with Tenstorrent, we ported the industry-standard CFD toolkit OpenFOAM to Tenstorrent's RISC-V-based Blackhole accelerator.
Zero Code Changes: Original C++ source code compiled directly with docc.
Automatic Optimization: docc automatically identifies offloadable code sections and moves them to the accelerator (see the offloading sketch below).
Full Portability: Seamlessly switch between Wormhole, Blackhole or other vendors' cards in our cloud.
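docc applies this transformation automatically, with no changes to the source. As a point of reference only, the sketch below shows what an equivalent manual offload of the forward-pass loop above looks like using standard OpenMP target directives; the squash implementation is assumed to be the logistic activation used by Rodinia's backprop benchmark, and none of this reflects how docc generates device code.

```c
// Manual offload of the bpnn_layerforward loop with standard OpenMP
// target directives (requires a compiler built with offloading support).
// Shown only to illustrate what "moving a code section to the
// accelerator" means; docc performs an equivalent transformation
// automatically, without these pragmas.
#include <math.h>

#pragma omp declare target
// Assumed logistic activation, as in Rodinia's backprop benchmark.
static float squash(float x) { return 1.0f / (1.0f + expf(-x)); }
#pragma omp end declare target

void bpnn_layerforward_offload(const float* l1, float* l2,
                               const float* conn, int n1, int n2)
{
    // Copy the inputs to the device, run the loop there, copy l2 back.
    // Array extents follow the benchmark's (n1+1) x (n2+1) allocation.
    #pragma omp target teams distribute parallel for \
        map(to: l1[0:n1 + 1], conn[0:(n1 + 1) * (n2 + 1)]) \
        map(from: l2[1:n2])
    for (int j = 1; j <= n2; j++) {
        float sum = 0.0f;
        for (int k = 0; k <= n1; k++)
            sum += conn[k * n2 + j] * l1[k];
        l2[j] = squash(sum);
    }
}
```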
Computational fluid dynamics solver ported to run on Tenstorrent Blackhole.
Conjugate gradient solver with sparse matrices optimized for NVIDIA GPUs and Tenstorrent Blackhole.
A model zoo of AI models compiled for various hardware targets.
Seamlessly integrates with your existing build systems
Join the multi-chip revolution. Get started with our free tier and scale as you grow.
Free plan for individual developers and small projects
Custom plans for businesses and large teams