You’ve set up Prometheus, deployed Grafana, and built beautiful dashboards showing CPU and memory usage for all your containers. Life is good. Then one day, a container goes rogue, consumes all available RAM, and your entire Docker host becomes unresponsive.
You check Grafana. It shows the container was using “only” 2GB of memory. But your host has 32GB. What happened?
The answer: You never set resource limits. And without limits, your monitoring is essentially meaningless.
The Problem: Unlimited Containers
When 100% doesn’t mean what you think it means
By default, Docker containers have no resource constraints. They can use as much CPU and memory as the host kernel scheduler allows. This creates two major issues:
- Runaway containers can kill your host – One misbehaving container can starve others
- Your monitoring percentages are meaningless – What does “50% memory usage” mean when there’s no defined limit?
Let’s look at what Grafana actually shows you:
| Without Limits | With Limits |
|---|---|
| Memory: 2GB used | Memory: 2GB / 4GB (50%) |
| CPU: 150% (of one core?) | CPU: 1.5 / 2.0 cores (75%) |
| No alerts possible | Alert when > 80% of limit |
| Host can OOM | Container gets OOM-killed first |
Pro tip: If your Grafana dashboard shows container memory in absolute values (GB) instead of percentages, you’re doing it wrong.
Understanding Docker Resource Constraints
The knobs you didn’t know you needed
Docker provides several resource constraint options:
Memory Limits
| Option | Description | Example |
|---|---|---|
| `--memory` / `-m` | Hard memory limit | `--memory=512m` |
| `--memory-swap` | Memory + swap limit | `--memory-swap=1g` |
| `--memory-reservation` | Soft limit (hint to the scheduler) | `--memory-reservation=256m` |
| `--oom-kill-disable` | Disable the OOM killer (dangerous!) | `--oom-kill-disable` |
CPU Limits
| Option | Description | Example |
|---|---|---|
| `--cpus` | Number of CPUs (can be fractional) | `--cpus=1.5` |
| `--cpu-shares` | Relative weight (default: 1024) | `--cpu-shares=512` |
| `--cpuset-cpus` | Pin to specific CPU cores | `--cpuset-cpus="0,1"` |
| `--cpu-period` / `--cpu-quota` | Fine-grained control | `--cpu-quota=50000` |
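And the CPU flags on `docker run`, again with a placeholder image name:

```bash
# Cap at 1.5 cores, half the default scheduling weight,
# and pin to cores 0 and 1
docker run -d --name worker \
  --cpus=1.5 \
  --cpu-shares=512 \
  --cpuset-cpus="0,1" \
  myworker:latest
```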
Docker Compose: Setting Limits Properly
The right way to do it
Here’s a proper docker-compose.yml with resource limits:
```yaml
services:
  webapp:
    image: nginx:alpine
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
    # For docker-compose (non-swarm), also add:
    mem_limit: 512m
    mem_reservation: 128m
    cpus: 1.0

  database:
    image: postgres:15
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M
    mem_limit: 2g
    mem_reservation: 512m
    cpus: 2.0
    environment:
      POSTGRES_PASSWORD: secret
```
Important: The deploy.resources section only works in Docker Swarm mode. For standalone docker-compose, use mem_limit, mem_reservation, and cpus directly.
Compatibility Matrix
| Syntax | docker-compose up | docker stack deploy |
|---|---|---|
| `mem_limit` | ✅ Works | ❌ Ignored |
| `cpus` | ✅ Works | ❌ Ignored |
| `deploy.resources` | ❌ Ignored | ✅ Works |
Pro tip: Include both syntaxes if you might deploy to both environments.
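Whichever syntax applies, verify that the limit actually landed. This check (container name `webapp` assumed) reads the applied values straight from the container's HostConfig; `0` means no limit was set:

```bash
# Memory limit in bytes and CPU limit in units of 1e-9 cores
docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' webapp
```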
Memory Limits: Best Practices
Don’t let your containers eat the whole buffet
Rule 1: Always Set Hard Limits
```yaml
services:
  myapp:
    image: myapp:latest
    mem_limit: 1g
    memswap_limit: 1g  # Same as mem_limit = no swap
```
Setting memswap_limit equal to mem_limit disables swap for the container. This is usually what you want – swap is slow and hides memory problems.
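You can watch the OOM killer do its job with a throwaway container. A quick sketch, assuming the `python:3-alpine` image is available: allocating 128MB inside a 64MB limit gets the process killed with exit code 137 (128 + SIGKILL):

```bash
docker run --rm -m 64m --memory-swap 64m python:3-alpine \
  python -c "x = bytearray(128 * 1024 * 1024)"
echo $?  # 137: the kernel OOM-killed the process
```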
Rule 2: Use Reservations for Scheduling
```yaml
services:
  myapp:
    mem_limit: 1g
    mem_reservation: 512m
```
- Limit: Maximum the container can use
- Reservation: Guaranteed minimum (used for scheduling decisions)
Rule 3: Know Your Application’s Memory Profile
| Application Type | Typical Memory Pattern | Recommended Limit |
|---|---|---|
| Nginx/Static | Stable, low | 128-256MB |
| Node.js API | Grows with connections | 512MB-1GB |
| Java/JVM | Heap + Metaspace | 1.5x heap size |
| PostgreSQL | shared_buffers + connections | 2x shared_buffers |
| Redis | Dataset size + overhead | 1.2x dataset size |
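For JVM services in particular, it helps to tie the heap to the container limit instead of hard-coding `-Xmx`. A sketch (service and image names are placeholders; `-XX:MaxRAMPercentage` is available on JDK 10+ and 8u191+):

```yaml
services:
  java-app:
    image: myjavaapp:latest
    mem_limit: 2g
    environment:
      # Size the heap from the cgroup limit, leaving ~35%
      # headroom for Metaspace, threads, and native memory
      JAVA_TOOL_OPTIONS: "-XX:MaxRAMPercentage=65.0"
```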
Rule 4: Handle OOM Gracefully
When a container hits its memory limit, the OOM killer terminates it. Prepare for this:
```yaml
services:
  myapp:
    mem_limit: 512m
    restart: unless-stopped  # Auto-restart on OOM
    logging:
      driver: json-file
      options:
        max-size: "10m"  # Prevent logs from eating disk
```
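To confirm whether a restart was actually an OOM kill, Docker records it in the container state and also emits an `oom` event you can stream live:

```bash
# true if the last exit was an OOM kill
docker inspect --format '{{.State.OOMKilled}}' myapp

# stream OOM events as they happen
docker events --filter event=oom
```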
CPU Limits: Best Practices
Fair sharing is caring
Understanding CPU Options
```yaml
services:
  # Option 1: Simple CPU count limit
  webapp:
    cpus: 1.5  # Can use up to 1.5 CPU cores

  # Option 2: Relative weight (for sharing)
  background-worker:
    cpu_shares: 512  # Half priority of default (1024)

  # Option 3: Pin to specific cores
  latency-sensitive:
    cpuset: "0,1"  # Only use cores 0 and 1
```
When to Use What
| Scenario | Use This | Example |
|---|---|---|
| Hard limit needed | `cpus` | Production services |
| Relative priority | `cpu_shares` | Background tasks |
| Isolation required | `cpuset` | Latency-sensitive apps |
| Development | None or low | Local testing |
The CPU Shares Trap
cpu_shares only matters when CPU is contested. If your host has idle CPU, a container with cpu_shares: 256 can still use 100% of available CPU.
```yaml
# This container gets 1/4 priority when CPU is scarce,
# but can use ALL available CPU when the host is idle
background-job:
  cpu_shares: 256  # Default is 1024
```
Monitoring with Prometheus and Grafana
Making your dashboards actually useful
The Metrics That Matter
With cAdvisor or Docker’s built-in metrics, you get:
| Metric | Description | Use For |
|---|---|---|
| `container_memory_usage_bytes` | Current memory usage | Absolute usage |
| `container_spec_memory_limit_bytes` | Memory limit | Calculating percentage |
| `container_cpu_usage_seconds_total` | CPU time consumed | CPU usage rate |
| `container_spec_cpu_quota` | CPU quota (microseconds) | Calculating percentage |
| `container_spec_cpu_period` | CPU period (microseconds) | Calculating percentage |
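These series come from cAdvisor's `/metrics` endpoint. A minimal scrape job for it, assuming cAdvisor runs as a service named `cadvisor` on its default port 8080 (as in the full-stack example later):

```yaml
# prometheus.yml
scrape_configs:
  - job_name: "cadvisor"
    scrape_interval: 15s
    static_configs:
      - targets: ["cadvisor:8080"]
```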
Prometheus Queries for Grafana
Memory Usage Percentage (the useful one):
```promql
# Memory usage as percentage of limit
(container_memory_usage_bytes{name=~".+"} /
 container_spec_memory_limit_bytes{name=~".+"}) * 100
```
CPU Usage Percentage:
```promql
# CPU usage as percentage of limit
(rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) /
 (container_spec_cpu_quota{name=~".+"} /
  container_spec_cpu_period{name=~".+"})) * 100
```
Memory Usage with Limit Context:
```promql
# Shows usage and limit on the same graph
container_memory_usage_bytes{name="myapp"}
container_spec_memory_limit_bytes{name="myapp"}
```
The Problem with Unlimited Containers
Here’s what happens in Grafana when you don’t set limits:
```promql
# Without limits, this returns 0 or infinity
container_spec_memory_limit_bytes{name="unlimited-container"}
# Result: 0 (no limit set)

# Your percentage calculation becomes:
# usage_bytes / 0 = infinity (or NaN)
```
Your dashboard either shows nothing, infinity, or some nonsensical number.
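One way to keep those series off the dashboard is to compute the percentage only where a real limit exists; a sketch using PromQL set matching:

```promql
# Drop containers with no limit set (limit == 0)
(container_memory_usage_bytes / container_spec_memory_limit_bytes) * 100
  and container_spec_memory_limit_bytes > 0
```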
Grafana Dashboard Best Practices
Dashboards that actually help
Panel 1: Memory Usage with Limit Reference
```promql
# Query A: Current Usage
container_memory_usage_bytes{name=~"$container"}

# Query B: Limit (as reference line)
container_spec_memory_limit_bytes{name=~"$container"}
```
Visualization: Time series with both lines. Usage should stay well below the limit.
Panel 2: Memory Percentage Gauge
```promql
# Single stat / Gauge
(container_memory_usage_bytes{name="$container"} /
 container_spec_memory_limit_bytes{name="$container"}) * 100
```
Thresholds:
- 🟢 Green: 0-70%
- 🟡 Yellow: 70-85%
- 🔴 Red: 85-100%
Panel 3: CPU Usage Percentage
```promql
# Rate of CPU usage vs. limit
(rate(container_cpu_usage_seconds_total{name="$container"}[5m]) /
 (container_spec_cpu_quota{name="$container"} /
  container_spec_cpu_period{name="$container"})) * 100
```
Panel 4: OOM Kill Counter
```promql
# Number of times the container was OOM-killed
increase(container_oom_events_total{name=~"$container"}[24h])
```
If this is > 0, your memory limit is too low (or you have a memory leak).
Alert Rules
```yaml
# Prometheus alerting rules
groups:
  - name: container-resources
    rules:
      - alert: ContainerMemoryHigh
        expr: |
          (container_memory_usage_bytes /
           container_spec_memory_limit_bytes) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} memory > 85%"

      - alert: ContainerOOMKilled
        expr: increase(container_oom_events_total[1h]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} was OOM-killed"

      - alert: ContainerCPUThrottled
        expr: |
          rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} is being CPU throttled"
```
Real-World Example: Full Stack with Limits
A complete docker-compose setup
```yaml
services:
  # Reverse Proxy - Low memory, low CPU
  nginx:
    image: nginx:alpine
    mem_limit: 128m
    mem_reservation: 64m
    cpus: 0.5
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"

  # Application - Medium resources
  api:
    image: myapi:latest
    mem_limit: 1g
    mem_reservation: 256m
    cpus: 1.0
    restart: unless-stopped
    environment:
      NODE_ENV: production
      # Tell Node.js about the memory limit
      NODE_OPTIONS: "--max-old-space-size=768"

  # Database - Higher resources, stable usage
  postgres:
    image: postgres:15
    mem_limit: 2g
    mem_reservation: 1g
    cpus: 2.0
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    command:
      - "postgres"
      - "-c"
      - "shared_buffers=512MB"
      - "-c"
      - "max_connections=100"

  # Cache - Memory-bound workload
  redis:
    image: redis:7-alpine
    mem_limit: 512m
    mem_reservation: 256m
    cpus: 0.5
    restart: unless-stopped
    command: redis-server --maxmemory 400mb --maxmemory-policy allkeys-lru

  # Background Worker - Lower priority
  worker:
    image: myworker:latest
    mem_limit: 512m
    mem_reservation: 128m
    cpus: 0.5
    cpu_shares: 512  # Lower priority than other containers
    restart: unless-stopped

  # Monitoring Stack
  prometheus:
    image: prom/prometheus:latest
    mem_limit: 1g
    mem_reservation: 512m
    cpus: 1.0
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana:latest
    mem_limit: 512m
    mem_reservation: 256m
    cpus: 0.5
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    mem_limit: 256m
    mem_reservation: 128m
    cpus: 0.5
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    privileged: true

volumes:
  prometheus_data:
  grafana_data:
```
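Once the stack is up, a one-shot `docker stats` confirms the limits took effect: the MEM USAGE / LIMIT column should show your configured caps rather than the host's total RAM:

```bash
docker compose up -d
docker stats --no-stream
```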
Quick Reference Card
```
╔══════════════════════════════════════════════════════════════════╗
║ DOCKER RESOURCE LIMITS CHEAT SHEET ║
╠══════════════════════════════════════════════════════════════════╣
║ MEMORY LIMITS ║
║ mem_limit: 512m Hard limit (container gets OOM-killed) ║
║ mem_reservation: 256m Soft limit (scheduling hint) ║
║ memswap_limit: 512m Memory + swap (same = no swap) ║
╠══════════════════════════════════════════════════════════════════╣
║ CPU LIMITS ║
║ cpus: 1.5 Limit to 1.5 CPU cores ║
║ cpu_shares: 512 Relative weight (default: 1024) ║
║ cpuset: "0,1" Pin to specific cores ║
╠══════════════════════════════════════════════════════════════════╣
║ COMMON LIMITS BY SERVICE TYPE ║
║ Nginx/Proxy: 128-256MB RAM, 0.5 CPU ║
║ Node.js App: 512MB-1GB RAM, 1.0 CPU ║
║ Java App: 1-4GB RAM, 2.0 CPU ║
║ PostgreSQL: 1-4GB RAM, 2.0 CPU ║
║ Redis: 256-512MB RAM, 0.5 CPU ║
║ Prometheus: 512MB-2GB RAM, 1.0 CPU ║
║ Grafana: 256-512MB RAM, 0.5 CPU ║
╠══════════════════════════════════════════════════════════════════╣
║ GRAFANA PROMQL ║
║ Memory %: (container_memory_usage_bytes / ║
║ container_spec_memory_limit_bytes) * 100 ║
║ CPU %: (rate(container_cpu_usage_seconds_total[5m]) / ║
║ (spec_cpu_quota / spec_cpu_period)) * 100 ║
╚══════════════════════════════════════════════════════════════════╝
```
Appendix: Sizing Guidelines
| Container Type | Memory Limit | CPU Limit | Notes |
|---|---|---|---|
| Static web server | 🟢 64-128MB | 🟢 0.25-0.5 | Very lightweight |
| PHP-FPM | 🟡 256-512MB | 🟡 0.5-1.0 | Per worker process |
| Node.js | 🟡 512MB-1GB | 🟡 0.5-1.0 | Set NODE_OPTIONS |
| Python/Django | 🟡 256-512MB | 🟡 0.5-1.0 | Per worker |
| Java/Spring | 🔴 1-4GB | 🟠 1.0-2.0 | JVM needs headroom |
| PostgreSQL | 🟠 1-4GB | 🟠 1.0-2.0 | Depends on shared_buffers |
| MySQL/MariaDB | 🟠 1-4GB | 🟠 1.0-2.0 | Depends on buffer pool |
| Redis | 🟡 256MB-2GB | 🟢 0.5-1.0 | Depends on dataset |
| Elasticsearch | 🔴 2-8GB | 🔴 2.0-4.0 | Memory hungry |
| Prometheus | 🟠 512MB-2GB | 🟡 0.5-1.0 | Grows with metrics |
| Grafana | 🟡 256-512MB | 🟢 0.5 | Lightweight |
Legend:
- 🟢 Low resource usage
- 🟡 Medium resource usage
- 🟠 High resource usage
- 🔴 Very high resource usage
Conclusion
Setting resource limits on your Docker containers is not optional – it’s essential for:
- Meaningful monitoring – Percentages that actually mean something
- Reliable alerting – Know when containers approach their limits
- System stability – One container can’t take down your host
- Fair resource sharing – Predictable performance across services
Without limits, your Grafana dashboards are just pretty pictures. With limits, they become actual operational tools.
Fun fact: The default Docker memory limit is “unlimited,” which in Linux terms means the container can use all available RAM plus all available swap. On a 32GB host with 16GB swap, that’s a 48GB “limit.” Your 512MB Node.js app has a lot of room to misbehave before anyone notices.
Stay limited, stay observable 🔐