    OpenStack based foundation | Multi tenant governance | Kubernetes and Slurm | NVIDIA, AMD, Tenstorrent

    Cloud Foundation for AI clouds

    A provider grade cloud infrastructure layer plus the FlexAI CloudFoundry platform layer, built to run governed AI workloads across heterogeneous GPU fleets with repeatable ops.

    Architecture

    Purpose built architecture for AI clouds

    A layered architecture that unifies infrastructure and CloudFoundry intelligence, while delivering clean, monetizable AI services at the top layer.

    Customer consumption layer
    • AI Services SKUs only
    • End user portal and API access
    • Service tiers and quotas
    FlexAI CloudFoundry platform
    • Commercialization and governance engine
    • AI development and operations plane
    • Policy enforcement and metering outputs
    Cloud infrastructure layer
    • OpenStack based foundation
    • Compute, storage, and network resources
    • Kubernetes clusters plus optional Slurm
    Capabilities

    Technical foundation

    The core primitives needed to build and operate AI clouds with heterogeneous hardware and governed tenants.

    Compute and GPU abstraction
    Support hyperscaler and on prem GPU resources, unify provisioning, and expose consistent instance types and scheduling across mixed pools.
    Storage interoperability
    S3 compatible storage plus connectors for GCS and Cloudflare R2 for datasets, checkpoints, artifacts, and logs.
    High bandwidth networking
    InfiniBand connectivity and NVLink aware networking patterns for distributed training and high throughput inference.
    Governance and tenancy
    Quota management, tenant isolation, policy frameworks, and billing grade usage signals across infrastructure and AI services.
    Runtime matrix
    Inference, fine tuning, and training runtimes across CUDA and ROCm, with serving stacks like vLLM plus common frameworks.
    HPC alignment
    Kubernetes first orchestration with an option for Slurm on Kubernetes to enable HPC scheduling semantics and queues.
    Cloud infrastructure capabilities
    Compute services
    OpenStack based provisioning for bare metal, VM, and GPU backed instances, plus Kubernetes worker pools.
    GPU support
    NVIDIA, AMD, and Tenstorrent support across multiple hardware generations, with placement rules.
    Networking
    Dedicated network segments for management, storage, and customer traffic. High bandwidth, low latency links between nodes.
    Storage
    S3 compatible object storage plus optional integrations for GCS and Cloudflare R2.
    CloudFoundry platform plane

    Core platform services powering commercialization and AI operations.

    Commercialization and governance engine
    Token management
    Billing and metering outputs
    Policy enforcement
    Quota and tier controls
    Audit logs and governance reporting
    AI development and operations
    Model registry
    Pipeline orchestration
    Training and serving frameworks
    End to end observability
    Blueprints and golden paths
    Orchestration

    Orchestration and scheduling

    Kubernetes first orchestration with optional Slurm integration for HPC scheduling semantics, enabling teams to run distributed workloads with the right scheduler for the job.

    Kubernetes
    Cloud native orchestration for containerized AI workloads with GPU-aware scheduling, autoscaling, and multi-tenant isolation across heterogeneous clusters.
    Slurm
    HPC scheduling semantics on Kubernetes for teams that need job queues, gang scheduling, and priority-based resource allocation for large-scale training runs.
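The gang scheduling mentioned above means a distributed job starts only when every worker can be placed at once. A minimal sketch of that all-or-nothing rule follows, assuming a simple first-fit strategy; the function name, node names, and data shape are hypothetical, and this is neither Slurm's nor Kubernetes' actual algorithm.

```python
from __future__ import annotations


def gang_schedule(
    free_gpus_per_node: dict[str, int],
    workers: int,
    gpus_per_worker: int,
) -> list[str] | None:
    """Return one node name per worker, or None if the whole gang cannot fit."""
    # Work on a copy so a failed attempt leaves the caller's view untouched.
    remaining = dict(free_gpus_per_node)
    placement = []
    for _ in range(workers):
        # First fit: take the first node (by name) with enough free GPUs.
        node = next(
            (name for name, free in sorted(remaining.items())
             if free >= gpus_per_worker),
            None,
        )
        if node is None:
            return None  # all-or-nothing: no partial placement is returned
        remaining[node] -= gpus_per_worker
        placement.append(node)
    return placement


nodes = {"node-1": 8, "node-2": 4}
fits = gang_schedule(nodes, workers=3, gpus_per_worker=4)
too_big = gang_schedule(nodes, workers=4, gpus_per_worker=4)
```

Here `fits` places all three workers, while `too_big` is `None` because a fourth worker has nowhere to run, so none of the gang is started.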
    Integrations

    Integration surfaces

    Operational surfaces you plug into when offering AI services to multiple tenants and teams.

    Integrate into your stack
    Identity and access
    LDAP or AD integration, tenant and project mapping, RBAC boundaries per org and workspace.
    Billing and Business Support Systems (BSS)
    API export of usage and metering outputs to downstream billing systems for SKUs and reporting.
    Observability
    Metrics, traces, and logs, audit trails, and end to end monitoring across VMs, K8s workloads, and AI services.
    Storage and artifacts
    Dataset and checkpoint storage via S3 compatible backends with lifecycle and retention policies.
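To illustrate the usage and metering export described under billing and BSS above, here is a minimal sketch that rolls raw usage events up into per-tenant, per-SKU records of the kind a downstream billing system might ingest. The event fields, SKU names, and the choice of GPU hours as the billing unit are assumptions for illustration, not the platform's actual export schema.

```python
from collections import defaultdict

# Hypothetical raw usage events; field names are illustrative only.
events = [
    {"tenant": "team-a", "sku": "inference-endpoint", "gpu_seconds": 1800},
    {"tenant": "team-a", "sku": "inference-endpoint", "gpu_seconds": 5400},
    {"tenant": "team-b", "sku": "fine-tuning", "gpu_seconds": 7200},
]


def aggregate_usage(events):
    """Sum GPU seconds per (tenant, SKU) and report them as GPU hours."""
    totals = defaultdict(int)
    for event in events:
        totals[(event["tenant"], event["sku"])] += event["gpu_seconds"]
    # Sort for a stable, reproducible export order.
    return [
        {"tenant": tenant, "sku": sku, "gpu_hours": seconds / 3600}
        for (tenant, sku), seconds in sorted(totals.items())
    ]


records = aggregate_usage(events)
```

Keeping the export as flat records keyed by tenant and SKU makes it straightforward to serialize to JSON or CSV for whatever billing system sits downstream.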
    Operational notes

    Build service tiers at the governance layer. Enforce quotas and policies before jobs hit GPU pools.
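Enforcing quotas before jobs reach the GPU pools can be pictured as an admission check that runs ahead of the scheduler. The sketch below is one way to express it, assuming a single GPU-count limit per tenant; `TenantQuota`, `JobRequest`, and `admit` are hypothetical names, not part of any FlexAI API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TenantQuota:
    """Hypothetical per-tenant limit; a real tier would carry more dimensions."""
    max_gpus: int
    gpus_in_use: int = 0


@dataclass
class JobRequest:
    tenant: str
    gpus: int


def admit(job: JobRequest, quotas: dict[str, TenantQuota]) -> tuple[bool, str]:
    """Decide whether a job may proceed to the scheduler."""
    quota = quotas.get(job.tenant)
    if quota is None:
        return False, "unknown tenant"
    if job.gpus <= 0:
        return False, "invalid GPU request"
    if quota.gpus_in_use + job.gpus > quota.max_gpus:
        return False, "GPU quota exceeded"
    # Reserve capacity only after every check has passed.
    quota.gpus_in_use += job.gpus
    return True, "admitted"


quotas = {"team-a": TenantQuota(max_gpus=8, gpus_in_use=6)}
first = admit(JobRequest("team-a", 2), quotas)
second = admit(JobRequest("team-a", 1), quotas)
```

The first request fills the tenant's remaining capacity and is admitted; the second is rejected at the governance layer and never reaches a GPU pool.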

    Treat runtimes as products. Version them, test them, and expose them as supported options across tenants.

    Keep auditability first class with end to end monitoring, traces, and exportable metering signals.

    GPU access

    Choose the control plane that fits your team

    Go fast with FCS, go deep with bare metal, or stay lightweight with VMs. All paths plug into the AI Factory story.

    Flexible GPU access for every stage of scale
    Choose control level and operations model, then scale up without replatforming.
    Provisioning and allocation based on workload needs
    Managed experience
    • Intent based allocation of drivers, runtimes, and infrastructure components aligned to workload requirements
    • Right sizing, scaling, retries, and recovery built in
    • Best for inference, fine tuning, training, and batch
    Tip: Start with FCS for speed. Move to bare metal for maximum control. Keep the same workload intent.
    Supported GPUs
    Pick the right architecture for the workload and price point.
    NVIDIA
    H100, H200, B200, A100, L40, L40S, RTX
    AMD
    MI300X, MI300A, MI325, MI350
    Factory grade reliability
    Guardrails, policy, and recovery so teams spend time shipping, not babysitting clusters.
    Bare metal for multi node training
    Use bare metal nodes when you want maximum control and predictable performance for large training jobs.
    Distributed workloads
    VMs for simple stacks
    Spin up a VM with the tools you already use. Great for experiments and straightforward serving.
    Bring your image
    FCS for managed AI services
    Hand off infrastructure to FlexAI. Get orchestration, reliability, and governance without heavy lift.
    Outcome first
    FAQ

    Technical questions

    Common architecture clarifications for cloud foundation deployments.

    Can this run on existing hyperscaler capacity as well as on prem?

    Yes. You can onboard resources from hyperscalers into the same operational plane for centralized ops and management, alongside on prem GPU pools.

    What is the top layer supposed to expose to end users?

    Keep it clean. Sell AI service SKUs such as managed AI services and inference endpoints. Keep foundation concerns inside the platform and infrastructure layers.

    How do you keep runtimes consistent across mixed hardware?

    With a supported runtime matrix plus policy driven placement rules. Tenants pick workflows and constraints, and the platform schedules them onto matching pools and runtime versions.
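One way to picture a runtime matrix combined with placement rules is as a filter over GPU pools: a runtime is only eligible on the software stacks it has been validated for, and tenant constraints narrow the choice further. The sketch below assumes a hypothetical matrix, pool list, and `eligible_pools` helper; the runtime names and pool names are illustrative, not actual platform identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    vendor: str     # e.g. "nvidia" or "amd"
    stack: str      # "cuda" or "rocm"
    free_gpus: int


# Hypothetical runtime matrix: runtime name -> stacks it is validated on.
RUNTIME_MATRIX = {
    "vllm-serving": {"cuda", "rocm"},
    "cuda-training": {"cuda"},
}


def eligible_pools(runtime, gpus, pools, allowed_vendors=None):
    """Return names of pools that satisfy the matrix and tenant constraints."""
    stacks = RUNTIME_MATRIX.get(runtime, set())
    return [
        pool.name
        for pool in pools
        if pool.stack in stacks                      # runtime validated on this stack
        and pool.free_gpus >= gpus                   # enough capacity right now
        and (allowed_vendors is None or pool.vendor in allowed_vendors)
    ]


pools = [
    Pool("h100-a", "nvidia", "cuda", 8),
    Pool("mi300x-a", "amd", "rocm", 4),
    Pool("a100-b", "nvidia", "cuda", 2),
]

serving = eligible_pools("vllm-serving", 4, pools)
training = eligible_pools("cuda-training", 4, pools)
amd_only = eligible_pools("vllm-serving", 4, pools, allowed_vendors={"amd"})
```

An unknown runtime yields an empty list, which mirrors the idea that only runtimes in the supported matrix are schedulable at all.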

    Ready for a technical walkthrough
    Architecture, integration points, runtime matrix, and tenant governance flows.