Node Monitoring Stack (Prometheus + Grafana)

An example monitoring stack that can be applied regardless of chain. It collects and visualizes the Prometheus metrics that each chain client exposes.

Overall architecture

flowchart LR
  Node["Node client<br/>(geth / agave / bitcoind …)"] -->|scrape| Prom[Prometheus]
  Prom -->|query| Grafana[Grafana Dashboards]
  Prom -->|rules| AM[Alertmanager]
  AM -->|webhook| Slack
  AM --> Pager[PagerDuty]
  AM --> Discord
  NodeExporter[node_exporter :9100] --> Prom

docker-compose example

# monitoring/docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v2.54.1
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./rules.yml:/etc/prometheus/rules.yml:ro
      - prom-data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=30d
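    ports:
      - "127.0.0.1:9090:9090"
    extra_hosts:
      # On Linux, host.docker.internal does not resolve by default;
      # map it to the host gateway so the scrape targets in prometheus.yml work.
      - "host.docker.internal:host-gateway"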

  grafana:
    image: grafana/grafana:11.3.0
    container_name: grafana
    restart: unless-stopped
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GF_ADMIN_PASSWORD:?set in .env}
      GF_INSTALL_PLUGINS: ""
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "127.0.0.1:3000:3000"

  node-exporter:
    image: prom/node-exporter:v1.8.2
    container_name: node-exporter
    restart: unless-stopped
    network_mode: host
    pid: host
    volumes:
      - /:/host:ro,rslave
    command:
      - --path.rootfs=/host
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)

  alertmanager:
    image: prom/alertmanager:v0.27.0
    container_name: alertmanager
    restart: unless-stopped
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    ports:
      - "127.0.0.1:9093:9093"

volumes:
  prom-data:
  grafana-data:
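
The compose file above starts Grafana without any data source, so Prometheus would have to be added by hand after the first login. A minimal sketch of a provisioning file that could do this automatically is shown below. It assumes the file is saved as ./grafana/provisioning/datasources/prometheus.yml (a hypothetical path) and mounted read-only into the grafana service at /etc/grafana/provisioning/datasources/.

```yaml
# grafana/provisioning/datasources/prometheus.yml (hypothetical path)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend proxies the queries
    url: http://prometheus:9090   # compose service name, resolved on the default compose network
    isDefault: true
```

The URL uses the compose service name rather than 127.0.0.1, because the published port is bound to the host loopback and is not reachable from inside the Grafana container.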

prometheus.yml (scrape configs for major chains)

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["host.docker.internal:9100"]

  - job_name: geth
    metrics_path: /debug/metrics/prometheus
    static_configs:
      - targets: ["host.docker.internal:6060"]

  - job_name: prysm
    static_configs:
      - targets: ["host.docker.internal:8080"]   # beacon chain

  - job_name: bitcoind
    # bitcoind has no built-in Prometheus endpoint, so use bitcoin-exporter
    # https://github.com/jvstein/bitcoin-prometheus-exporter
    static_configs:
      - targets: ["host.docker.internal:9332"]

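  - job_name: agave
    static_configs:
      # NOTE: 9100 is also the node_exporter port scraped by the `node` job above.
      # If both run on this host, move one of them to a different port.
      - targets: ["host.docker.internal:9100"]   # solana validator metrics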

  - job_name: polkadot
    static_configs:
      - targets: ["host.docker.internal:9615"]

  - job_name: nitro   # Arbitrum
    static_configs:
      - targets: ["host.docker.internal:9642"]

Core alert rules (rules.yml)

groups:
  - name: node-health
    interval: 1m
    rules:
      - alert: DiskAlmostFull
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.10
        for: 10m
        labels: { severity: critical }
        annotations:
          summary: "Less than 10% free disk space ({{ $labels.mountpoint }})"

      - alert: HighCPU
        expr: 100 * (1 - avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 90
        for: 15m
        labels: { severity: warning }

  - name: blockchain
    interval: 30s
    rules:
      - alert: GethStalled
        # chain_head_block is a gauge, so use changes() rather than the counter-oriented increase()
        expr: changes(chain_head_block[5m]) == 0
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "Geth block head has not advanced for 5 minutes"

      - alert: SolanaBehind
        # The processed slot is always >= the confirmed slot, so subtract in this order
        expr: solana_processed_slot - solana_confirmed_slot > 200
        for: 3m
        labels: { severity: warning }

      - alert: LowPeers
        expr: p2p_peers < 5
        for: 10m
        labels: { severity: warning }
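
The blockchain rules above only evaluate while the metrics are still being scraped. If a client process dies, its series go stale and expressions such as the block-stall check return no data, so no alert fires. A minimal sketch of a complementary rule, using Prometheus's built-in `up` metric, is shown below. The group name and alert name are illustrative assumptions; merge the group into the existing `groups:` list in rules.yml.

```yaml
groups:
  - name: scrape-health          # hypothetical group name
    rules:
      - alert: TargetDown        # hypothetical alert name
        # `up` is set by Prometheus itself: 1 if the last scrape succeeded, 0 if it failed
        expr: up == 0
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "Scrape target {{ $labels.job }} ({{ $labels.instance }}) is unreachable"
```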

Recommended Grafana dashboards

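Each ID can be imported as JSON directly from grafana.com/grafana/dashboards/<id>. If you operate on a private network, it is more reliable to commit the JSON to your repository and deploy it periodically from CI through Grafana's file-based dashboard provisioning or its HTTP API.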

Chain	Dashboard ID	Link	Coverage
Node Exporter	1860	JSON	System CPU / memory / disk basics
Geth / Ethereum	14053	JSON	chain head, peers, tx pool
Prysm Beacon	13457	JSON	slot, head, validator attestations
Bitcoin (jvstein exporter)	11125	JSON	chain height, mempool, connections
Solana Validator	13403	JSON	slot, skip rate, vote credits
Polkadot	13840	JSON	best block, finalized, peers
Nitro (Arbitrum)	17867	JSON	L2 block, L1 batch post delay
Cosmos SDK / Tendermint	11036	JSON	block height, validator jailing
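
For the private-network case, where dashboard JSON is kept in the repository, file-based provisioning could look like the sketch below. It assumes the provider file is mounted at /etc/grafana/provisioning/dashboards/ and that the JSON files are mounted at /var/lib/grafana/dashboards. The provider name and both host-side paths are illustrative assumptions.

```yaml
# grafana/provisioning/dashboards/chains.yml (hypothetical path)
apiVersion: 1

providers:
  - name: chain-dashboards          # hypothetical provider name
    type: file
    disableDeletion: true           # keep dashboards in Grafana even if a file disappears
    allowUiUpdates: false           # the repository stays the source of truth
    updateIntervalSeconds: 60       # how often Grafana rescans the directory
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true   # one Grafana folder per subdirectory
```

Dashboards exported from grafana.com often reference the data source through an input placeholder such as `${DS_PROMETHEUS}`. When loading them from files, that placeholder may need to be replaced with the name of the provisioned data source.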

Including screenshots

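To share operational screenshots (for example, per-region block latency) from your own repository:

Grafana UI → Share panel → enable Direct link rendered image, then save the PNG
Commit it to the common/dashboards/ directory (strip EXIF metadata; files under 2 MB are recommended)
To refresh screenshots on a schedule from a script, the Grafana Image Renderer plugin combined with a GitHub Actions cron job is a clean option

Screenshots can also be referenced as inline links from the updates/ section of each chain guide.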
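
The scheduled refresh described above could be sketched as the GitHub Actions workflow below. Everything specific in it is an assumption for illustration: the secret names GRAFANA_URL and GRAFANA_TOKEN, the DASHBOARD_UID placeholder, panelId=2, and the output file name. It also assumes the Image Renderer plugin is installed and that the runner can reach Grafana, which with the loopback binding used in the compose example means a self-hosted runner on the same host.

```yaml
# .github/workflows/refresh-screenshots.yml (hypothetical file name)
name: refresh-dashboard-screenshots

on:
  schedule:
    - cron: "0 3 * * 1"      # every Monday at 03:00 UTC
  workflow_dispatch: {}       # allow manual runs

permissions:
  contents: write             # needed to push the refreshed PNG

jobs:
  render:
    runs-on: self-hosted      # assumption: the runner can reach the Grafana instance
    steps:
      - uses: actions/checkout@v4

      - name: Render panel to PNG
        env:
          GRAFANA_URL: ${{ secrets.GRAFANA_URL }}       # hypothetical secret
          GRAFANA_TOKEN: ${{ secrets.GRAFANA_TOKEN }}   # hypothetical service account token
        run: |
          mkdir -p common/dashboards
          # DASHBOARD_UID and panelId are placeholders; copy the real values from the panel's share link
          curl -fsS -H "Authorization: Bearer ${GRAFANA_TOKEN}" \
            "${GRAFANA_URL}/render/d-solo/DASHBOARD_UID/_?orgId=1&panelId=2&width=1000&height=500&from=now-24h&to=now" \
            -o common/dashboards/block-latency.png

      - name: Commit if changed
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add common/dashboards
          # Commit and push only when the rendered image actually differs
          git diff --cached --quiet || (git commit -m "chore: refresh dashboard screenshots" && git push)
```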

Minimum three-alert checklist

Stalled block progress: detect when the per-chain chain_head_block / slot / ledger_seq value stops increasing
Falling peer count: p2p_peers < threshold
Disk pressure: < 10% free → critical, < 20% free → warning
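
The checklist calls for a warning tier at 20% free disk space, while the rules.yml example defines only the 10% critical rule. A minimal sketch of the missing tier is shown below. The group and alert names are illustrative assumptions; merge the group into the existing `groups:` list.

```yaml
groups:
  - name: disk-tiers             # hypothetical group name
    rules:
      - alert: DiskLow           # hypothetical alert name
        # Same ratio as DiskAlmostFull, with the warning threshold from the checklist
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.20
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "Less than 20% free disk space ({{ $labels.mountpoint }})"
```

Below 10% free, this rule and the critical rule fire together. If the duplicate notification is unwanted, an Alertmanager inhibit rule that mutes warnings while a critical alert is active for the same instance and mountpoint is one way to handle it.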

Metrics endpoint summary by client

Auto-generated by npm run build:metrics. To change this table, edit common/metrics-ports.json.

Chain	Component	Port	Path	Enabling flag / config
`arbitrum`	Nitro EL	6070	`/debug/metrics/prometheus`	--metrics --metrics.port=6070
`arbitrum`	Nitro (global)	9642	`/`	--metrics-server.addr=0.0.0.0
`base`	op-node	7300	`/metrics`	--metrics.enabled --metrics.addr=0.0.0.0
`bitcoin`	bitcoin-prometheus-exporter	9332	`/metrics`	separate exporter container
`celestia`	celestia-app (CometBFT)	26660	`/metrics`	config.toml: prometheus=true
`cosmos`	gaiad (CometBFT)	26660	`/metrics`	config.toml: prometheus=true
`cronos`	cronosd (CometBFT)	26660	`/metrics`	config.toml: prometheus=true
`ethereum`	Geth	6060	`/debug/metrics/prometheus`	--metrics --metrics.addr=0.0.0.0 --metrics.port=6060
`ethereum`	Prysm beacon	8080	`/metrics`	--monitoring-host 0.0.0.0 --monitoring-port 8080
`injective`	injectived (CometBFT)	26660	`/metrics`	config.toml: prometheus=true
`kava`	kava (CometBFT)	26660	`/metrics`	config.toml: prometheus=true
`optimism`	op-node	7300	`/metrics`	--metrics.enabled --metrics.addr=0.0.0.0
`polkadot`	polkadot	9615	`/metrics`	--prometheus-external
`sei`	seid (CometBFT)	26660	`/metrics`	config.toml: prometheus=true
`solana`	Agave validator	9100	`/metrics`	--expose-metrics + --metrics-config
`worldchain`	op-node	7300	`/metrics`	--metrics.enabled --metrics.addr=0.0.0.0
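
The table lists endpoints that the prometheus.yml example does not scrape, such as op-node and the CometBFT-based chains. A sketch of the matching scrape jobs, using the ports and paths from the table, is shown below. The job names and the `chain` label are illustrative assumptions; merge the jobs into the existing `scrape_configs:` list.

```yaml
scrape_configs:
  - job_name: op-node            # base / optimism / worldchain rows
    static_configs:
      - targets: ["host.docker.internal:7300"]   # default metrics_path is /metrics

  - job_name: cometbft           # celestia / cosmos / cronos / injective / kava / sei rows
    static_configs:
      - targets: ["host.docker.internal:26660"]
        labels:
          chain: cosmos          # hypothetical label to tell chains apart in dashboards
```

Because every CometBFT-based row uses port 26660, running more than one of those chains on a single host means remapping the port for all but one of them.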

Operational tips

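Restrict metrics port bindings so that they can be scraped only from the internal network. When a client is started with a 0.0.0.0 bind address, as in the table above, pair it with a firewall rule or a private interface.
Keep the Grafana admin password in a managed secret store. If Grafana is ever started without GF_SECURITY_ADMIN_PASSWORD, it falls back to the default admin/admin login, which must be changed immediately after the first login.
Use at least two Alertmanager notification channels (Slack plus SMS or PagerDuty) for redundancy.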
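
The compose example mounts ./alertmanager.yml but the file itself is not shown. A minimal sketch that follows the two-channel tip is given below: every alert goes to Slack, and critical alerts additionally page through PagerDuty. The receiver names, the channel name, and the secret file paths are illustrative assumptions, and the secrets directory would need its own read-only volume mount in the alertmanager service.

```yaml
# monitoring/alertmanager.yml (sketch)
route:
  receiver: slack                      # default receiver for all alerts
  group_by: [alertname, instance]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"
      receiver: slack-and-pagerduty    # critical alerts reach both channels

receivers:
  - name: slack
    slack_configs:
      - api_url_file: /etc/alertmanager/secrets/slack_webhook_url   # hypothetical path
        channel: "#node-alerts"                                      # hypothetical channel
        send_resolved: true

  - name: slack-and-pagerduty
    slack_configs:
      - api_url_file: /etc/alertmanager/secrets/slack_webhook_url
        channel: "#node-alerts"
        send_resolved: true
    pagerduty_configs:
      - routing_key_file: /etc/alertmanager/secrets/pagerduty_routing_key   # hypothetical path
        send_resolved: true
```

Putting both integrations in one receiver keeps the routing tree flat. The alternative, a PagerDuty-only child route with `continue: true` followed by a catch-all Slack route, achieves the same result with more routing logic.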