Biotech & Life Sciences

Write Python. Run at scale. Advance what’s possible.

From drug discovery to precision medicine, your team can focus on the science while Union.ai handles the infrastructure.

Try the devbox

A free, local sandbox to
explore the Union.ai platform.

Chat with an engineer

Trusted by leading biotech and life sciences teams

AlphaFold, RDKit, GATK, Legacy C++. One Python Workflow.

Every task declares its own container image, resource profile, and dependencies. No shared environment, no version conflicts, no rewrites.

Per-task container images so AlphaFold, RDKit, and your legacy C++ tools each run in isolation
Mixed CPU, GPU, and high-memory nodes in a single workflow with per-step resource requests
Typed data handoffs between tasks instead of hand-rolled serialization

Copied to clipboard!

import flyte

alphafold_env = flyte.TaskEnvironment(
    name="alphafold",
    image=flyte.Image.from_debian_base(name="alphafold-env").with_pip_packages(
        "flyte[tui]>=2.0",
        "biopython>=1.84",
        "numpy",
    ),
    resources=flyte.Resources(cpu=4, memory="16Gi", gpu="A100:1"),
)

rdkit_env = flyte.TaskEnvironment(
    name="rdkit",
    image=flyte.Image.from_debian_base(name="rdkit-env").with_apt_packages(
        "libxrender1", "libxext6",
    ).with_pip_packages(
        "flyte[tui]>=2.0",
        "rdkit>=2024.3.5",
        "numpy",
    ),
    resources=flyte.Resources(cpu=2, memory="4Gi"),
    depends_on=[alphafold_env],
)

Caching & Compute

Faster DBTL Cycles. Less Wasted Compute.

Design-Build-Test-Learn cycles take weeks to months with expensive computation at every step. Flyte caches task outputs by hashing inputs and code, so a tweaked parameter only reruns what it affected.

Input-hash caching so you never re-fold a sequence or re-dock a compound you've already scored
Intra-task checkpointing that lets long-running jobs resume from where they left off after a failure
Per-task resource requests to reserve GPUs only for the steps that need them

Reproducibility & Compliance

Every Result. Traceable. Six Months Later.

Union versions every workflow, container, and artifact automatically. When a lab result contradicts a model, you can trace exactly what produced the prediction.

Strong typing on all task inputs and outputs so dataflow mismatches fail before a run starts
Artifact versioning with automatic lineage from raw input to final output
Immutable execution records for clinical and regulated pipeline auditability

Biotech Pipelines in Pure Python

Protein design, folding, scoring, screening, no YAML, no DSLs. Write it in Python, run it at scale.

import asyncioimport flyte # CPU image for scoringscore_env = flyte.TaskEnvironment(    name="protein-score",    image=flyte.Image.from_debian_base().with_pip_packages("biopython"),    resources=flyte.Resources(cpu="2", memory="4Gi"),) # GPU image for structure predictionfold_env = flyte.TaskEnvironment(    name="protein-fold",    image=flyte.Image.from_debian_base().with_pip_packages("torch", "biopython"),    resources=flyte.Resources(gpu="L4:1", memory="16Gi"),    depends_on=[score_env],)  @fold_env.task()(cache="auto")async def fold(sequence: str) -> flyte.io.File:    """Predict 3D structure, cached per sequence."""    pdb_path = run_esmfold(sequence)    return flyte.io.File(path=pdb_path)  @score_env.task(cache="auto")async def score(structure: flyte.io.File, target: str) -> float:    """Score binding affinity with legacy tool."""    local_path = await structure.download()    return run_legacy_scorer(structure.download(), target)  @score_env.taskasync def screen(sequences: list[str], target: str = "ACE2") -> str:    """Screen sequences: fold in parallel, then score each structure."""    # Fold all sequences in parallel — each is cached per input    structures = await asyncio.gather(        *[fold(sequence=s) for s in sequences]    )    # Score each structure on the scoring tier    scores = await asyncio.gather(        *[score(structure=st, target=target) for st in structures]    )    return ranked(sequences, scores)

Cached folding on GPU

cache="auto" hashes the input sequence. Re-runs skip any fold that's already been computed.

Declare your infra in code

Each task declares its own container image, resources, and dependencies. GPU for folding, CPU for scoring, no shared environments.

Typed parallel execution

fold returns flyte.io.File, score consumes it. Type mismatches fail before anything runs.

Real results

Case studies, technical deep dives, and conversations with the biotech teams building on Union.