Cloudmersive Private Cloud AI Server on Azure Best Practices
1/30/2026 - Cloudmersive Support


This guide covers Azure best practices for running Cloudmersive Private Cloud AI Server on GPU-enabled virtual machines, including the recommended VM sizes (single A100 or H100) and the recommended OS image (Debian 12 on Azure).


Recommended Azure VM sizes (single-GPU)

For a single-GPU AI Server deployment, Cloudmersive recommends Azure’s NC-series GPU-optimized VMs:

1× NVIDIA A100 (80GB)

  • VM size: Standard_NC24ads_A100_v4
  • vCPU / RAM: 24 vCPU, 220 GiB RAM
  • GPU: 1× NVIDIA A100 80GB
  • Use when: you need strong, proven performance; the A100 is widely used for both inference and training

Azure reference: NC A100 v4-series sizing/specs.

1× NVIDIA H100 (94GB)

  • VM size: Standard_NC40ads_H100_v5
  • vCPU / RAM: 40 vCPU, 320 GiB RAM
  • GPU: 1× NVIDIA H100 NVL 94GB (PCIe)
  • Use when: maximum performance for modern LLM inference/training workloads

Azure reference: NCads H100 v5-series sizing/specs.

Notes

  • These SKUs typically require GPU quota approval and may be limited to specific regions/availability zones.
  • Always validate availability in your target region before committing to an architecture.
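
You can validate availability programmatically before committing. The sketch below is a minimal example, assuming the Azure SDK for Python (azure-identity and azure-mgmt-compute packages) and an already-authenticated environment; the subscription ID and region are placeholders:

    # Sketch: check regional/zone availability of a recommended GPU SKU.
    # Assumes azure-identity and azure-mgmt-compute are installed and you
    # are authenticated (e.g., az login). Placeholders throughout.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    SUBSCRIPTION_ID = "<your-subscription-id>"
    REGION = "eastus"                        # target region to validate
    TARGET_SKU = "Standard_NC24ads_A100_v4"  # or Standard_NC40ads_H100_v5

    client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # resource_skus.list accepts a location filter; each entry reports
    # zone support and any subscription-level restrictions (e.g., quota).
    for sku in client.resource_skus.list(filter=f"location eq '{REGION}'"):
        if sku.name == TARGET_SKU:
            zones = [z for info in (sku.location_info or []) for z in (info.zones or [])]
            restrictions = [r.reason_code for r in (sku.restrictions or [])]
            print(f"{sku.name}: zones={zones or 'none'}, restrictions={restrictions or 'none'}")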

Recommended OS image (Azure best practice)

Cloudmersive recommends using the official Debian 12 image in Azure Marketplace:

  • Image name (Marketplace listing): Debian 12 “Bookworm” for Microsoft Azure
  • Why Debian 12: Debian 12 is the current stable “Bookworm” release line, with ongoing point releases and security updates.
  • Azure endorsement context: Debian images on Azure are published by endorsed distribution publishers (historically credativ, now the Debian publisher).
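
If you provision VMs through the SDK or templates rather than the portal, the image is specified as a publisher/offer/SKU reference. The values below are assumptions based on the current Marketplace listing; verify them for your region (for example with az vm image list --publisher Debian --offer debian-12 --all) before deploying:

    # Sketch: image reference for Debian 12 "Bookworm" in a VM's storage
    # profile. Publisher/offer/SKU strings are assumptions; confirm them
    # against the Marketplace listing before use.
    image_reference = {
        "publisher": "Debian",   # Debian's endorsed publisher on Azure
        "offer": "debian-12",
        "sku": "12-gen2",        # Gen2 VM image; plain "12" for Gen1
        "version": "latest",
    }

This dict plugs into the storage_profile of a VM creation call, as shown in the availability section below.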

Core deployment best practices

Networking

  • Place AI Server VMs in a dedicated VNet/subnet with:

    • Inbound restricted to only what you need (e.g., your API gateway / internal callers); see the NSG rule sketch after this list.
    • No public SSH if possible; use Azure Bastion or a locked-down jump host.
  • Use Standard Load Balancer or Application Gateway only if you’re scaling horizontally and need stable front-door routing.
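
As one way to implement the inbound restriction above, the sketch below adds a single NSG rule allowing the AI Server's API port only from a gateway subnet, using azure-mgmt-network. All resource names, the port, and the CIDR are hypothetical placeholders:

    # Sketch: allow inbound API traffic only from a designated caller
    # subnet. Names, port, and CIDR are hypothetical; lower-priority
    # default rules deny other inbound traffic.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.network import NetworkManagementClient

    client = NetworkManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    client.security_rules.begin_create_or_update(
        "ai-server-rg", "ai-server-nsg", "allow-api-gateway",
        {
            "protocol": "Tcp",
            "access": "Allow",
            "direction": "Inbound",
            "priority": 100,
            "source_address_prefix": "10.1.0.0/24",  # API gateway subnet
            "source_port_range": "*",
            "destination_address_prefix": "*",
            "destination_port_range": "443",
        },
    ).result()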

Storage

  • Prefer Premium SSD v2 or Ultra Disk for heavy model I/O, depending on regional availability and performance requirements; a disk-creation sketch follows this list.

  • Keep models on:

    • Managed disks for simplicity, or
    • Azure Blob + local caching if you manage warm-up/caching behavior.
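
A minimal disk-creation sketch, assuming azure-mgmt-compute and placeholder names; note that Premium SSD v2 disks are zonal, so the disk's zone must match the VM's:

    # Sketch: create a data disk for model storage on Premium SSD v2.
    # Names, size, and zone are hypothetical placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    client.disks.begin_create_or_update(
        "ai-server-rg", "model-store-disk",
        {
            "location": "eastus",
            "zones": ["1"],  # must match the VM's availability zone
            "sku": {"name": "PremiumV2_LRS"},  # "UltraSSD_LRS" for Ultra Disk
            "disk_size_gb": 512,
            "creation_data": {"create_option": "Empty"},
        },
    ).result()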

GPU drivers & CUDA

  • Follow Azure’s GPU VM guidance for the NC family and install NVIDIA driver/CUDA versions consistent with your AI Server build. (Azure’s NC-family documentation is the authoritative starting point.)
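
One managed option is Azure's NVIDIA GPU driver extension for Linux. The sketch below uses placeholder resource names and an assumed handler version; verify the current version in Azure's documentation and confirm the resulting driver/CUDA versions against your AI Server build:

    # Sketch: install NVIDIA drivers via the Azure GPU driver extension
    # for Linux. Handler version and resource names are assumptions.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    client.virtual_machine_extensions.begin_create_or_update(
        "ai-server-rg", "ai-server-vm", "NvidiaGpuDriverLinux",
        {
            "location": "eastus",
            "publisher": "Microsoft.HpcCompute",
            "type_properties_type": "NvidiaGpuDriverLinux",
            "type_handler_version": "1.6",  # verify the current version
            "auto_upgrade_minor_version": True,
        },
    ).result()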

Availability & scaling

  • For production:

    • Use Availability Zones where supported for the chosen SKU/region; a zone-pinning sketch follows this list.
    • Consider N+1 capacity (one extra node) to handle maintenance events and failover.
  • If you plan multi-node inference/training later, standardize early on one GPU family (A100 vs H100) to avoid mixed-performance behavior.
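
Zone pinning happens at VM creation time. The sketch below ties the earlier pieces together (recommended SKU, Debian 12 image reference, explicit zone); the NIC resource ID, SSH key, and names are placeholders:

    # Sketch: create a zone-pinned single-GPU AI Server VM. All names,
    # the NIC resource ID, and the SSH key are placeholders; the image
    # reference repeats the hedged Debian 12 values from above.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    client.virtual_machines.begin_create_or_update(
        "ai-server-rg", "ai-server-vm",
        {
            "location": "eastus",
            "zones": ["1"],  # match any zonal disks (e.g., Premium SSD v2)
            "hardware_profile": {"vm_size": "Standard_NC24ads_A100_v4"},
            "storage_profile": {
                "image_reference": {
                    "publisher": "Debian",
                    "offer": "debian-12",
                    "sku": "12-gen2",
                    "version": "latest",
                }
            },
            "os_profile": {
                "computer_name": "ai-server-vm",
                "admin_username": "azureuser",
                "linux_configuration": {
                    "disable_password_authentication": True,
                    "ssh": {"public_keys": [{
                        "path": "/home/azureuser/.ssh/authorized_keys",
                        "key_data": "<ssh-public-key>",
                    }]},
                },
            },
            "network_profile": {"network_interfaces": [{"id": "<nic-resource-id>"}]},
        },
    ).result()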

Security posture

  • Encrypt disks (default Azure encryption; add CMK if required by policy).
  • Apply OS hardening + regular patching (Debian security updates).
  • Store secrets (API keys, internal credentials) in Azure Key Vault.
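
For the last point, the AI Server host can fetch secrets at startup with the VM's managed identity instead of keeping them on disk. A minimal sketch using azure-keyvault-secrets, with placeholder vault and secret names:

    # Sketch: read an API key from Key Vault at startup. Vault URL and
    # secret name are placeholders; DefaultAzureCredential picks up the
    # VM's managed identity when run on the VM.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url="https://<your-vault>.vault.azure.net",
        credential=DefaultAzureCredential(),
    )
    api_key = client.get_secret("cloudmersive-api-key").value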
