Configuration

The calculator takes five inputs, sketched as a config object below:

- Model parameter count
- GPU configuration for inference
- Estimated inference requests per day
- Average input + output tokens per request
- Expected GPU utilization percentage
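A minimal sketch of these inputs as a Python config object. The field names and example values are illustrative assumptions, not part of the calculator itself:

```python
from dataclasses import dataclass

@dataclass
class CalculatorInputs:
    model_params_b: float         # model size in billions of parameters
    gpu_type: str                 # GPU configuration, e.g. "A100-80GB"
    requests_per_day: int         # estimated inference requests per day
    avg_tokens_per_request: int   # average input + output tokens per request
    utilization: float            # expected GPU utilization, 0.0-1.0

# Illustrative values only:
inputs = CalculatorInputs(
    model_params_b=70,
    gpu_type="A100-80GB",
    requests_per_day=100_000,
    avg_tokens_per_request=1_500,
    utilization=0.60,
)
```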

Results

The calculator reports five metrics; the sketch after this list shows how each can be derived:

- Monthly infrastructure cost: based on the selected GPU and provider pricing
- Cost per 1M tokens: effective cost per million tokens processed
- Estimated throughput: tokens per second at the specified utilization
- Max daily capacity: maximum requests per day at the average token count
- GPU count required: minimum GPUs needed for this configuration
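A hedged sketch of the arithmetic behind these metrics. The per-GPU throughput figure (`tokens_per_sec_per_gpu`) is an assumed benchmark for the chosen model and GPU, and the calculator's actual formulas may differ:

```python
import math

SECONDS_PER_DAY = 86_400
HOURS_PER_MONTH = 730  # average hours per month (365 * 24 / 12)

def estimate_metrics(
    requests_per_day: int,
    avg_tokens_per_request: int,
    utilization: float,             # expected GPU utilization, 0.0-1.0
    tokens_per_sec_per_gpu: float,  # assumed benchmark for this model/GPU
    gpu_hourly_rate: float,         # $/hour from the pricing table below
) -> dict:
    # Average token load the workload places on the cluster.
    tokens_per_sec_needed = (
        requests_per_day * avg_tokens_per_request / SECONDS_PER_DAY
    )

    # One GPU's effective throughput after discounting idle time.
    effective_per_gpu = tokens_per_sec_per_gpu * utilization

    # Minimum GPUs needed to sustain the average load.
    gpu_count = max(1, math.ceil(tokens_per_sec_needed / effective_per_gpu))

    # Provisioned throughput and what it costs per month.
    throughput = gpu_count * effective_per_gpu
    monthly_cost = gpu_count * gpu_hourly_rate * HOURS_PER_MONTH

    # Effective cost per million tokens actually processed.
    tokens_per_month = (
        requests_per_day * avg_tokens_per_request * (HOURS_PER_MONTH / 24)
    )
    cost_per_1m = monthly_cost * 1_000_000 / tokens_per_month

    # Most requests the provisioned GPUs could absorb in a day.
    max_daily_requests = throughput * SECONDS_PER_DAY / avg_tokens_per_request

    return {
        "monthly_cost_usd": round(monthly_cost, 2),
        "cost_per_1m_tokens_usd": round(cost_per_1m, 2),
        "throughput_tokens_per_sec": round(throughput, 1),
        "max_daily_requests": int(max_daily_requests),
        "gpu_count_required": gpu_count,
    }
```

For example, 100,000 requests/day at 1,500 tokens each works out to roughly 1,736 tokens/second on average; at 60% utilization of an assumed 1,000 tokens/second per GPU, that rounds up to 3 GPUs.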

GPU Pricing Reference (October 2025)

| GPU Type | Memory | AWS ($/hour) | GCP ($/hour) | Azure ($/hour) | Best For |
|---|---|---|---|---|---|
| NVIDIA A10G | 24 GB | $1.10 | $1.28 | $1.15 | 7B models, cost-sensitive |
| NVIDIA A100 40GB | 40 GB | $4.10 | $3.90 | $4.25 | 13B-70B models |
| NVIDIA A100 80GB | 80 GB | $5.00 | $4.80 | $5.20 | 70B models, high batch |
| NVIDIA H100 | 80 GB | $8.10 | $7.90 | $8.40 | Maximum performance |
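A worked example from the table, under an assumed sustained throughput of 1,000 tokens/second (an illustration, not a benchmark result):

```python
# A100 80GB on GCP, from the pricing table above.
hourly_rate = 4.80       # $/hour
hours_per_month = 730
tokens_per_sec = 1_000   # assumed sustained throughput

monthly_cost = hourly_rate * hours_per_month                 # $3,504.00
tokens_per_month = tokens_per_sec * 3_600 * hours_per_month  # ~2.63B tokens
cost_per_1m = monthly_cost * 1_000_000 / tokens_per_month    # ~$1.33

print(f"${monthly_cost:,.2f}/month, ${cost_per_1m:.2f} per 1M tokens")
```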

Cost Optimization Tips