Configuration

The calculator takes five inputs, sketched as a config object below:

- Model parameter count
- GPU configuration for inference
- Estimated inference requests per day
- Average input + output tokens per request
- Expected GPU utilization percentage
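A minimal sketch of these inputs as a Python config object. The field names and example values are illustrative assumptions, not part of the calculator itself:

```python
from dataclasses import dataclass

@dataclass
class CalculatorInputs:
    model_params_b: float         # model size in billions of parameters
    gpu_type: str                 # GPU configuration, e.g. "A100-80GB"
    requests_per_day: int         # estimated inference requests per day
    avg_tokens_per_request: int   # average input + output tokens per request
    utilization: float            # expected GPU utilization, 0.0-1.0

# Illustrative values only:
inputs = CalculatorInputs(
    model_params_b=70,
    gpu_type="A100-80GB",
    requests_per_day=100_000,
    avg_tokens_per_request=1_500,
    utilization=0.60,
)
```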

Results

The calculator reports five metrics; the sketch after this list shows how each can be derived:

- Monthly infrastructure cost: based on the selected GPU and provider pricing
- Cost per 1M tokens: effective cost per million tokens processed
- Estimated throughput: tokens per second at the specified utilization
- Max daily capacity: maximum requests per day at the average token count
- GPU count required: minimum GPUs needed for this configuration
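A hedged sketch of the arithmetic behind these metrics. The per-GPU throughput figure (`tokens_per_sec_per_gpu`) is an assumed benchmark for the chosen model and GPU, and the calculator's actual formulas may differ:

```python
import math

SECONDS_PER_DAY = 86_400
HOURS_PER_MONTH = 730  # average hours per month (365 * 24 / 12)

def estimate_metrics(
    requests_per_day: int,
    avg_tokens_per_request: int,
    utilization: float,             # expected GPU utilization, 0.0-1.0
    tokens_per_sec_per_gpu: float,  # assumed benchmark for this model/GPU
    gpu_hourly_rate: float,         # $/hour from the pricing table below
) -> dict:
    # Average token load the workload places on the cluster.
    tokens_per_sec_needed = (
        requests_per_day * avg_tokens_per_request / SECONDS_PER_DAY
    )

    # One GPU's effective throughput after discounting idle time.
    effective_per_gpu = tokens_per_sec_per_gpu * utilization

    # Minimum GPUs needed to sustain the average load.
    gpu_count = max(1, math.ceil(tokens_per_sec_needed / effective_per_gpu))

    # Provisioned throughput and what it costs per month.
    throughput = gpu_count * effective_per_gpu
    monthly_cost = gpu_count * gpu_hourly_rate * HOURS_PER_MONTH

    # Effective cost per million tokens actually processed.
    tokens_per_month = (
        requests_per_day * avg_tokens_per_request * (HOURS_PER_MONTH / 24)
    )
    cost_per_1m = monthly_cost * 1_000_000 / tokens_per_month

    # Most requests the provisioned GPUs could absorb in a day.
    max_daily_requests = throughput * SECONDS_PER_DAY / avg_tokens_per_request

    return {
        "monthly_cost_usd": round(monthly_cost, 2),
        "cost_per_1m_tokens_usd": round(cost_per_1m, 2),
        "throughput_tokens_per_sec": round(throughput, 1),
        "max_daily_requests": int(max_daily_requests),
        "gpu_count_required": gpu_count,
    }
```

For example, 100,000 requests/day at 1,500 tokens each works out to roughly 1,736 tokens/second on average; at 60% utilization of an assumed 1,000 tokens/second per GPU, that rounds up to 3 GPUs.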

GPU Pricing Reference (October 2025)

| GPU Type | Memory | AWS ($/hour) | GCP ($/hour) | Azure ($/hour) | Best For |
|---|---|---|---|---|---|
| NVIDIA A10G | 24 GB | $1.10 | $1.28 | $1.15 | 7B models, cost-sensitive |
| NVIDIA A100 40GB | 40 GB | $4.10 | $3.90 | $4.25 | 13B-70B models |
| NVIDIA A100 80GB | 80 GB | $5.00 | $4.80 | $5.20 | 70B models, high batch |
| NVIDIA H100 | 80 GB | $8.10 | $7.90 | $8.40 | Maximum performance |
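A worked example from the table, under an assumed sustained throughput of 1,000 tokens/second (an illustration, not a benchmark result):

```python
# A100 80GB on GCP, from the pricing table above.
hourly_rate = 4.80       # $/hour
hours_per_month = 730
tokens_per_sec = 1_000   # assumed sustained throughput

monthly_cost = hourly_rate * hours_per_month                 # $3,504.00
tokens_per_month = tokens_per_sec * 3_600 * hours_per_month  # ~2.63B tokens
cost_per_1m = monthly_cost * 1_000_000 / tokens_per_month    # ~$1.33

print(f"${monthly_cost:,.2f}/month, ${cost_per_1m:.2f} per 1M tokens")
```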

Cost Optimization Tips