Cloudmersive Private Cloud AI Server on Azure Best Practices
1/30/2026 - Cloudmersive Support


This guide covers Azure best practices for running Cloudmersive Private Cloud AI Server on GPU-enabled virtual machines, including the recommended VM sizes (single A100 or H100) and the recommended OS image (Debian 12 on Azure).


Recommended Azure VM sizes (single-GPU)

For a single-GPU AI Server deployment, Cloudmersive recommends Azure’s NC-series GPU-optimized VMs:

1× NVIDIA A100 (80GB)

  • VM size: Standard_NC24ads_A100_v4
  • vCPU / RAM: 24 vCPU, 220 GiB RAM
  • GPU: 1× NVIDIA A100 80GB
  • Use when: you need strong, proven performance; the A100 is widely used for both inference and training

Azure reference: NC A100 v4-series sizing/specs.

1× NVIDIA H100 (94GB)

  • VM size: Standard_NC40ads_H100_v5
  • vCPU / RAM: 40 vCPU, 320 GiB RAM
  • GPU: 1× NVIDIA H100 NVL 94GB (PCIe)
  • Use when: maximum performance for modern LLM inference/training workloads

Azure reference: NCads H100 v5-series sizing/specs.

Notes

  • These SKUs typically require GPU quota approval and may be limited to specific regions/availability zones.
  • Always validate availability in your target region before committing to an architecture.
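
You can validate availability programmatically before committing. The sketch below is a minimal example, assuming the Azure SDK for Python (azure-identity and azure-mgmt-compute packages) and an already-authenticated environment; the subscription ID and region are placeholders:

    # Sketch: check regional/zone availability of a recommended GPU SKU.
    # Assumes azure-identity and azure-mgmt-compute are installed and you
    # are authenticated (e.g., az login). Placeholders throughout.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    SUBSCRIPTION_ID = "<your-subscription-id>"
    REGION = "eastus"                        # target region to validate
    TARGET_SKU = "Standard_NC24ads_A100_v4"  # or Standard_NC40ads_H100_v5

    client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # resource_skus.list accepts a location filter; each entry reports
    # zone support and any subscription-level restrictions (e.g., quota).
    for sku in client.resource_skus.list(filter=f"location eq '{REGION}'"):
        if sku.name == TARGET_SKU:
            zones = [z for info in (sku.location_info or []) for z in (info.zones or [])]
            restrictions = [r.reason_code for r in (sku.restrictions or [])]
            print(f"{sku.name}: zones={zones or 'none'}, restrictions={restrictions or 'none'}")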

Recommended OS image (Azure best practice)

Cloudmersive recommends using the official Debian 12 image in Azure Marketplace:

  • Image name (Marketplace listing): Debian 12 “Bookworm” for Microsoft Azure
  • Why Debian 12: Debian 12 is the current stable “Bookworm” release line, with ongoing point releases and security updates.
  • Azure endorsement context: Debian images on Azure are published by endorsed distribution publishers (historically credativ, now the Debian publisher).
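
If you provision VMs through the SDK or templates rather than the portal, the image is specified as a publisher/offer/SKU reference. The values below are assumptions based on the current Marketplace listing; verify them for your region (for example with az vm image list --publisher Debian --offer debian-12 --all) before deploying:

    # Sketch: image reference for Debian 12 "Bookworm" in a VM's storage
    # profile. Publisher/offer/SKU strings are assumptions; confirm them
    # against the Marketplace listing before use.
    image_reference = {
        "publisher": "Debian",   # Debian's endorsed publisher on Azure
        "offer": "debian-12",
        "sku": "12-gen2",        # Gen2 VM image; plain "12" for Gen1
        "version": "latest",
    }

This dict plugs into the storage_profile of a VM creation call, as shown in the availability section below.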

Core deployment best practices

Networking

  • Place AI Server VMs in a dedicated VNet/subnet with:

    • Inbound restricted to only what you need (e.g., your API gateway / internal callers); see the NSG rule sketch after this list.
    • No public SSH if possible; use Azure Bastion or a locked-down jump host.
  • Use Standard Load Balancer or Application Gateway only if you’re scaling horizontally and need stable front-door routing.
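
As one way to implement the inbound restriction above, the sketch below adds a single NSG rule allowing the AI Server's API port only from a gateway subnet, using azure-mgmt-network. All resource names, the port, and the CIDR are hypothetical placeholders:

    # Sketch: allow inbound API traffic only from a designated caller
    # subnet. Names, port, and CIDR are hypothetical; lower-priority
    # default rules deny other inbound traffic.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.network import NetworkManagementClient

    client = NetworkManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    client.security_rules.begin_create_or_update(
        "ai-server-rg", "ai-server-nsg", "allow-api-gateway",
        {
            "protocol": "Tcp",
            "access": "Allow",
            "direction": "Inbound",
            "priority": 100,
            "source_address_prefix": "10.1.0.0/24",  # API gateway subnet
            "source_port_range": "*",
            "destination_address_prefix": "*",
            "destination_port_range": "443",
        },
    ).result()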

Storage

  • Prefer Premium SSD v2 or Ultra Disk for heavy model I/O, depending on regional availability and performance requirements; a disk-creation sketch follows this list.

  • Keep models on:

    • Managed disks for simplicity, or
    • Azure Blob + local caching if you manage warm-up/caching behavior.
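
A minimal disk-creation sketch, assuming azure-mgmt-compute and placeholder names; note that Premium SSD v2 disks are zonal, so the disk's zone must match the VM's:

    # Sketch: create a data disk for model storage on Premium SSD v2.
    # Names, size, and zone are hypothetical placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    client.disks.begin_create_or_update(
        "ai-server-rg", "model-store-disk",
        {
            "location": "eastus",
            "zones": ["1"],  # must match the VM's availability zone
            "sku": {"name": "PremiumV2_LRS"},  # "UltraSSD_LRS" for Ultra Disk
            "disk_size_gb": 512,
            "creation_data": {"create_option": "Empty"},
        },
    ).result()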

GPU drivers & CUDA

  • Follow Azure’s GPU VM guidance for the NC family and install NVIDIA driver/CUDA versions consistent with your AI Server build. (Azure’s NC-family documentation is the authoritative starting point.)
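
One managed option is Azure's NVIDIA GPU driver extension for Linux. The sketch below uses placeholder resource names and an assumed handler version; verify the current version in Azure's documentation and confirm the resulting driver/CUDA versions against your AI Server build:

    # Sketch: install NVIDIA drivers via the Azure GPU driver extension
    # for Linux. Handler version and resource names are assumptions.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    client.virtual_machine_extensions.begin_create_or_update(
        "ai-server-rg", "ai-server-vm", "NvidiaGpuDriverLinux",
        {
            "location": "eastus",
            "publisher": "Microsoft.HpcCompute",
            "type_properties_type": "NvidiaGpuDriverLinux",
            "type_handler_version": "1.6",  # verify the current version
            "auto_upgrade_minor_version": True,
        },
    ).result()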

Availability & scaling

  • For production:

    • Use Availability Zones where supported for the chosen SKU/region; a zone-pinning sketch follows this list.
    • Consider N+1 capacity (one extra node) to handle maintenance events and failover.
  • If you plan multi-node inference/training later, standardize early on one GPU family (A100 vs H100) to avoid mixed-performance behavior.
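
Zone pinning happens at VM creation time. The sketch below ties the earlier pieces together (recommended SKU, Debian 12 image reference, explicit zone); the NIC resource ID, SSH key, and names are placeholders:

    # Sketch: create a zone-pinned single-GPU AI Server VM. All names,
    # the NIC resource ID, and the SSH key are placeholders; the image
    # reference repeats the hedged Debian 12 values from above.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    client.virtual_machines.begin_create_or_update(
        "ai-server-rg", "ai-server-vm",
        {
            "location": "eastus",
            "zones": ["1"],  # match any zonal disks (e.g., Premium SSD v2)
            "hardware_profile": {"vm_size": "Standard_NC24ads_A100_v4"},
            "storage_profile": {
                "image_reference": {
                    "publisher": "Debian",
                    "offer": "debian-12",
                    "sku": "12-gen2",
                    "version": "latest",
                }
            },
            "os_profile": {
                "computer_name": "ai-server-vm",
                "admin_username": "azureuser",
                "linux_configuration": {
                    "disable_password_authentication": True,
                    "ssh": {"public_keys": [{
                        "path": "/home/azureuser/.ssh/authorized_keys",
                        "key_data": "<ssh-public-key>",
                    }]},
                },
            },
            "network_profile": {"network_interfaces": [{"id": "<nic-resource-id>"}]},
        },
    ).result()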

Security posture

  • Encrypt disks (default Azure encryption; add CMK if required by policy).
  • Apply OS hardening + regular patching (Debian security updates).
  • Store secrets (API keys, internal credentials) in Azure Key Vault.
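
For the last point, the AI Server host can fetch secrets at startup with the VM's managed identity instead of keeping them on disk. A minimal sketch using azure-keyvault-secrets, with placeholder vault and secret names:

    # Sketch: read an API key from Key Vault at startup. Vault URL and
    # secret name are placeholders; DefaultAzureCredential picks up the
    # VM's managed identity when run on the VM.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url="https://<your-vault>.vault.azure.net",
        credential=DefaultAzureCredential(),
    )
    api_key = client.get_secret("cloudmersive-api-key").value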
