Docker Resource Limits: Why Your Grafana Dashboards Are Lying to You

You’ve set up Prometheus, deployed Grafana, and built beautiful dashboards showing CPU and memory usage for all your containers. Life is good. Then one day, a container goes rogue, consumes all available RAM, and your entire Docker host becomes unresponsive.

You check Grafana. It shows the container was using “only” 2GB of memory. But your host has 32GB. What happened?

The answer: You never set resource limits. And without limits, your monitoring is essentially meaningless.


The Problem: Unlimited Containers

When 100% doesn’t mean what you think it means

By default, Docker containers have no resource constraints. They can use as much CPU and memory as the host kernel scheduler allows. This creates two major issues:

  1. Runaway containers can kill your host – One misbehaving container can starve others
  2. Your monitoring percentages are meaningless – What does “50% memory usage” mean when there’s no defined limit?

Let’s look at what Grafana actually shows you:

| Without Limits | With Limits |
| --- | --- |
| Memory: 2GB used | Memory: 2GB / 4GB (50%) |
| CPU: 150% (of one core?) | CPU: 1.5 / 2.0 cores (75%) |
| No alerts possible | Alert when > 80% of limit |
| Host can OOM | Container gets OOM-killed first |

Pro tip: If your Grafana dashboard shows container memory in absolute values (GB) instead of percentages, you’re doing it wrong.


Understanding Docker Resource Constraints

The knobs you didn’t know you needed

Docker provides several resource constraint options:

Memory Limits

| Option | Description | Example |
| --- | --- | --- |
| --memory / -m | Hard memory limit | --memory=512m |
| --memory-swap | Memory + swap limit | --memory-swap=1g |
| --memory-reservation | Soft limit (hint to the scheduler) | --memory-reservation=256m |
| --oom-kill-disable | Disable the OOM killer (dangerous!) | --oom-kill-disable |

CPU Limits

| Option | Description | Example |
| --- | --- | --- |
| --cpus | Number of CPUs (can be fractional) | --cpus=1.5 |
| --cpu-shares | Relative weight (default: 1024) | --cpu-shares=512 |
| --cpuset-cpus | Pin to specific CPU cores | --cpuset-cpus="0,1" |
| --cpu-period / --cpu-quota | Fine-grained CFS control | --cpu-quota=50000 |
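
These flags can be tried directly with docker run before wiring them into Compose. A minimal sketch (container name, image, and values are illustrative):

# Hard 512MB memory cap, no swap, soft reservation of 256MB, at most 1.5 cores
docker run -d --name limited-nginx \
  --memory=512m \
  --memory-swap=512m \
  --memory-reservation=256m \
  --cpus=1.5 \
  nginx:alpine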

Docker Compose: Setting Limits Properly

The right way to do it

Here’s a proper docker-compose.yml with resource limits:

services:
  webapp:
    image: nginx:alpine
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
    # For plain docker-compose (non-Swarm), also set:
    mem_limit: 512m
    mem_reservation: 128m
    cpus: 1.0

  database:
    image: postgres:15
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M
    mem_limit: 2g
    mem_reservation: 512m
    cpus: 2.0
    environment:
      POSTGRES_PASSWORD: secret

Important: With the legacy docker-compose (v1) binary, the deploy.resources section is only honored in Swarm mode; newer Compose v2 releases do apply deploy.resources limits on a plain docker compose up. If your file has to work everywhere, also set mem_limit, mem_reservation, and cpus directly, as shown above.

Compatibility Matrix

| Syntax | docker-compose up | docker stack deploy |
| --- | --- | --- |
| mem_limit | ✅ Works | ❌ Ignored |
| cpus | ✅ Works | ❌ Ignored |
| deploy.resources | ❌ Ignored by legacy v1 (Compose v2 applies limits) | ✅ Works |

Pro tip: Include both syntaxes if you might deploy to both environments.


Memory Limits: Best Practices

Don’t let your containers eat the whole buffet

Rule 1: Always Set Hard Limits

services:
  myapp:
    image: myapp:latest
    mem_limit: 1g
    memswap_limit: 1g  # Same as mem_limit = no swap

Setting memswap_limit equal to mem_limit disables swap for the container. This is usually what you want – swap is slow and hides memory problems.
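
To confirm the limit actually took effect, ask Docker what it applied (the container name myapp is illustrative):

# HostConfig values are in bytes; 0 means "no limit"
docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.MemorySwap}}' myapp

# Live usage against the limit - the same numbers cAdvisor exports
docker stats --no-stream myapp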

Rule 2: Use Reservations for Scheduling

services:
  myapp:
    mem_limit: 1g
    mem_reservation: 512m
  • Limit: Maximum the container can use
  • Reservation: Guaranteed minimum (used for scheduling decisions)

Rule 3: Know Your Application’s Memory Profile

| Application Type | Typical Memory Pattern | Recommended Limit |
| --- | --- | --- |
| Nginx / static files | Stable, low | 128-256MB |
| Node.js API | Grows with connections | 512MB-1GB |
| Java / JVM | Heap + Metaspace | ~1.5x heap size |
| PostgreSQL | shared_buffers + per-connection memory | ~2x shared_buffers |
| Redis | Dataset size + overhead | ~1.2x dataset size |
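
For JVM services in particular, it helps to tell the runtime about the container limit rather than guessing a heap size. A sketch, assuming a container-aware JDK (10+ or 8u191+) and an illustrative image name:

# Let the JVM size its heap to ~75% of the 2GB container limit
docker run -d --memory=2g --memory-swap=2g \
  -e JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0" \
  my-java-api:latest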

Rule 4: Handle OOM Gracefully

When a container hits its memory limit, the OOM killer terminates it. Prepare for this:

services:
  myapp:
    mem_limit: 512m
    restart: unless-stopped  # Auto-restart on OOM
    logging:
      driver: json-file
      options:
        max-size: "10m"  # Prevent logs from eating memory

CPU Limits: Best Practices

Fair sharing is caring

Understanding CPU Options

services:
  # Option 1: Simple CPU count limit
  webapp:
    cpus: 1.5  # Can use up to 1.5 CPU cores

  # Option 2: Relative weight (for sharing)
  background-worker:
    cpu_shares: 512  # Half priority of default (1024)

  # Option 3: Pin to specific cores
  latency-sensitive:
    cpuset: "0,1"  # Only use cores 0 and 1

When to Use What

| Scenario | Use This | Example |
| --- | --- | --- |
| Hard limit needed | cpus | Production services |
| Relative priority | cpu_shares | Background tasks |
| Isolation required | cpuset | Latency-sensitive apps |
| Development | None or low | Local testing |

The CPU Shares Trap

cpu_shares only matters when CPU is contested. If your host has idle CPU, a container with cpu_shares: 256 can still use 100% of available CPU.

# This container gets 1/4 priority when CPU is scarce
# but can use ALL available CPU when the host is idle
background-job:
  cpu_shares: 256  # Default is 1024
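
You can see the difference by pinning two busy-loop containers to the same core: under contention the shares ratio (1024:256) gives roughly an 80/20 split, while either container alone would take the whole core. A rough experiment, not a benchmark:

docker run -d --name high-prio --cpuset-cpus=0 --cpu-shares=1024 alpine sh -c 'while :; do :; done'
docker run -d --name low-prio  --cpuset-cpus=0 --cpu-shares=256  alpine sh -c 'while :; do :; done'

# Expect roughly 80% vs. 20% of core 0
docker stats --no-stream high-prio low-prio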

Monitoring with Prometheus and Grafana

Making your dashboards actually useful

The Metrics That Matter

With cAdvisor or Docker’s built-in metrics, you get:

| Metric | Description | Use For |
| --- | --- | --- |
| container_memory_usage_bytes | Current memory usage | Absolute usage |
| container_spec_memory_limit_bytes | Configured memory limit | Calculating percentages |
| container_cpu_usage_seconds_total | Cumulative CPU time consumed | CPU usage rate |
| container_spec_cpu_quota | CPU quota (microseconds) | Calculating percentages |
| container_spec_cpu_period | CPU period (microseconds) | Calculating percentages |
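
If a panel comes up empty, check that cAdvisor is actually exporting these series. Assuming cAdvisor is published on its default port 8080:

# Spot-check that limits are being exported for your containers
curl -s http://localhost:8080/metrics | grep container_spec_memory_limit_bytes | head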

Prometheus Queries for Grafana

Memory Usage Percentage (the useful one):

# Memory usage as a percentage of the limit
(container_memory_usage_bytes{name=~".+"} /
 container_spec_memory_limit_bytes{name=~".+"}) * 100

CPU Usage Percentage:

# CPU usage as a percentage of the limit
(rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) /
 (container_spec_cpu_quota{name=~".+"} /
  container_spec_cpu_period{name=~".+"})) * 100

Memory Usage with Limit Context:

# Shows usage and limit on the same graph
container_memory_usage_bytes{name="myapp"}
container_spec_memory_limit_bytes{name="myapp"}

The Problem with Unlimited Containers

Here’s what happens in Grafana when you don’t set limits:

# Without limits, this metric returns 0
container_spec_memory_limit_bytes{name="unlimited-container"}
# Result: 0 (no limit set)

# Your percentage calculation then becomes:
# usage_bytes / 0 = +Inf (or NaN)

Your dashboard either shows nothing, infinity, or some nonsensical number.
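
One way to keep the panel usable is to only divide where a real limit exists, so unlimited containers drop off the graph instead of rendering as Inf. A sketch of the guarded query:

# Filter out series whose limit is 0 before dividing
(container_memory_usage_bytes{name=~".+"} /
 (container_spec_memory_limit_bytes{name=~".+"} > 0)) * 100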


Grafana Dashboard Best Practices

Dashboards that actually help

Panel 1: Memory Usage with Limit Reference

# Query A: Current Usage
container_memory_usage_bytes{name=~"$container"}

# Query B: Limit (as a reference line)
container_spec_memory_limit_bytes{name=~"$container"}

Visualization: Time series with both lines. Usage should stay well below the limit.

Panel 2: Memory Percentage Gauge

# Single stat / Gauge
(container_memory_usage_bytes{name="$container"} /
 container_spec_memory_limit_bytes{name="$container"}) * 100

Thresholds:

  • 🟢 Green: 0-70%
  • 🟡 Yellow: 70-85%
  • 🔴 Red: 85-100%

Panel 3: CPU Usage Percentage

# Rate of CPU usage vs. limit
(rate(container_cpu_usage_seconds_total{name="$container"}[5m]) /
 (container_spec_cpu_quota{name="$container"} /
  container_spec_cpu_period{name="$container"})) * 100

Panel 4: OOM Kill Counter

# Number of times the container was OOM-killed in the last 24h
increase(container_oom_events_total{name=~"$container"}[24h])

If this is > 0, your memory limit is too low (or you have a memory leak).

Alert Rules

# Prometheus alerting rules
groups:
  - name: container-resources
    rules:
      - alert: ContainerMemoryHigh
        expr: |
          (container_memory_usage_bytes /
           container_spec_memory_limit_bytes) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} memory > 85%"

      - alert: ContainerOOMKilled
        expr: increase(container_oom_events_total[1h]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} was OOM-killed"

      - alert: ContainerCPUThrottled
        expr: |
          rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} is being CPU throttled"

Real-World Example: Full Stack with Limits

A complete docker-compose setup

services:
  # Reverse Proxy - Low memory, low CPU
  nginx:
    image: nginx:alpine
    mem_limit: 128m
    mem_reservation: 64m
    cpus: 0.5
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"

  # Application - Medium resources
  api:
    image: myapi:latest
    mem_limit: 1g
    mem_reservation: 256m
    cpus: 1.0
    restart: unless-stopped
    environment:
      NODE_ENV: production
      # Tell Node.js about the memory limit
      NODE_OPTIONS: "--max-old-space-size=768"

  # Database - Higher resources, stable usage
  postgres:
    image: postgres:15
    mem_limit: 2g
    mem_reservation: 1g
    cpus: 2.0
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    command:
      - "postgres"
      - "-c"
      - "shared_buffers=512MB"
      - "-c"
      - "max_connections=100"

  # Cache - Memory-bound workload
  redis:
    image: redis:7-alpine
    mem_limit: 512m
    mem_reservation: 256m
    cpus: 0.5
    restart: unless-stopped
    command: redis-server --maxmemory 400mb --maxmemory-policy allkeys-lru

  # Background Worker - Lower priority
  worker:
    image: myworker:latest
    mem_limit: 512m
    mem_reservation: 128m
    cpus: 0.5
    cpu_shares: 512  # Lower priority than other containers
    restart: unless-stopped

  # Monitoring Stack
  prometheus:
    image: prom/prometheus:latest
    mem_limit: 1g
    mem_reservation: 512m
    cpus: 1.0
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana:latest
    mem_limit: 512m
    mem_reservation: 256m
    cpus: 0.5
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    mem_limit: 256m
    mem_reservation: 128m
    cpus: 0.5
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    privileged: true

volumes:
  prometheus_data:
  grafana_data:

Quick Reference Card

╔══════════════════════════════════════════════════════════════════╗
║           DOCKER RESOURCE LIMITS CHEAT SHEET                     ║
╠══════════════════════════════════════════════════════════════════╣
║ MEMORY LIMITS                                                    ║
║   mem_limit: 512m         Hard limit (container gets OOM-killed) ║
║   mem_reservation: 256m   Soft limit (scheduling hint)           ║
║   memswap_limit: 512m     Memory + swap (same = no swap)         ║
╠══════════════════════════════════════════════════════════════════╣
║ CPU LIMITS                                                       ║
║   cpus: 1.5               Limit to 1.5 CPU cores                 ║
║   cpu_shares: 512         Relative weight (default: 1024)        ║
║   cpuset: "0,1"           Pin to specific cores                  ║
╠══════════════════════════════════════════════════════════════════╣
║ COMMON LIMITS BY SERVICE TYPE                                    ║
║   Nginx/Proxy:      128-256MB RAM,  0.5 CPU                      ║
║   Node.js App:      512MB-1GB RAM,  1.0 CPU                      ║
║   Java App:         1-4GB RAM,      2.0 CPU                      ║
║   PostgreSQL:       1-4GB RAM,      2.0 CPU                      ║
║   Redis:            256-512MB RAM,  0.5 CPU                      ║
║   Prometheus:       512MB-2GB RAM,  1.0 CPU                      ║
║   Grafana:          256-512MB RAM,  0.5 CPU                      ║
╠══════════════════════════════════════════════════════════════════╣
║ GRAFANA PROMQL                                                   ║
║   Memory %:  (container_memory_usage_bytes /                     ║
║               container_spec_memory_limit_bytes) * 100           ║
║   CPU %:     (rate(container_cpu_usage_seconds_total[5m]) /      ║
║               (spec_cpu_quota / spec_cpu_period)) * 100          ║
╚══════════════════════════════════════════════════════════════════╝

Appendix: Sizing Guidelines

| Container Type | Memory Limit | CPU Limit | Notes |
| --- | --- | --- | --- |
| Static web server | 🟢 64-128MB | 🟢 0.25-0.5 | Very lightweight |
| PHP-FPM | 🟡 256-512MB | 🟡 0.5-1.0 | Per worker process |
| Node.js | 🟡 512MB-1GB | 🟡 0.5-1.0 | Set NODE_OPTIONS |
| Python/Django | 🟡 256-512MB | 🟡 0.5-1.0 | Per worker |
| Java/Spring | 🔴 1-4GB | 🟠 1.0-2.0 | JVM needs headroom |
| PostgreSQL | 🟠 1-4GB | 🟠 1.0-2.0 | Depends on shared_buffers |
| MySQL/MariaDB | 🟠 1-4GB | 🟠 1.0-2.0 | Depends on buffer pool |
| Redis | 🟡 256MB-2GB | 🟢 0.5-1.0 | Depends on dataset size |
| Elasticsearch | 🔴 2-8GB | 🔴 2.0-4.0 | Memory hungry |
| Prometheus | 🟠 512MB-2GB | 🟡 0.5-1.0 | Grows with metric count |
| Grafana | 🟡 256-512MB | 🟢 0.5 | Lightweight |

Legend:

  • 🟢 Low resource usage
  • 🟡 Medium resource usage
  • 🟠 High resource usage
  • 🔴 Very high resource usage

Conclusion

Setting resource limits on your Docker containers is not optional – it’s essential for:

  1. Meaningful monitoring – Percentages that actually mean something
  2. Reliable alerting – Know when containers approach their limits
  3. System stability – One container can’t take down your host
  4. Fair resource sharing – Predictable performance across services

Without limits, your Grafana dashboards are just pretty pictures. With limits, they become actual operational tools.


Fun fact: The default Docker memory limit is “unlimited,” which in Linux terms means the container can use all available RAM plus all available swap. On a 32GB host with 16GB swap, that’s a 48GB “limit.” Your 512MB Node.js app has a lot of room to misbehave before anyone notices.
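
You can see that non-limit from inside an unconstrained container yourself; the path depends on whether the host runs cgroup v2 or v1:

# cgroup v2 hosts: "max" means unlimited
docker run --rm alpine cat /sys/fs/cgroup/memory.max

# cgroup v1 hosts: an absurdly large byte count means unlimited
docker run --rm alpine cat /sys/fs/cgroup/memory/memory.limit_in_bytes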


Stay limited, stay observable 🔐
