Monitoring with Prometheus & Grafana (Spring Boot)

Prometheus

Prometheus is a system that collects monitoring metrics from target systems, stores them, and makes them queryable.

 

 

Prometheus features

  • It is a metric-based, open-source monitoring system.
  • It is a free software application used for event monitoring and alerting.
  • It supports flexible queries (PromQL) and real-time alerting.
  • Its simple architecture makes it easy to operate, it has a powerful query language, and it supports visualization through Grafana.
  • Unlike a logging stack such as ELK, it collects monitoring metrics from target systems, stores them, and makes them queryable.
  • Prometheus periodically pulls metrics from exporters (the systems being monitored) to collect them.
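To make the pull model concrete, here is a minimal sketch of what an exporter does: it only exposes an HTTP endpoint, and Prometheus comes to read it. It uses the JDK's built-in com.sun.net.httpserver package. The class name TinyExporter and the metric name demo_requests_total are made up for illustration; in the setup described later, Spring Boot Actuator and Micrometer take care of this part.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, minimal exporter: exposes a single counter for a scraper to pull.
final class TinyExporter {
    // The application would increment this; the exporter only reports it.
    static final AtomicLong REQUESTS = new AtomicLong();

    /** Starts an HTTP server exposing /metrics. Passing port 0 picks a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            // Prometheus text format: a TYPE comment, then "name value" per sample.
            String body = "# TYPE demo_requests_total counter\n"
                    + "demo_requests_total " + REQUESTS.get() + "\n";
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```

Calling TinyExporter.start(9100) would serve the counter at /metrics on port 9100 (the port is an arbitrary choice). The exporter never pushes anything; it just answers whenever the scraper asks.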

 

What is a metric?

A metric is the time series data that Prometheus collects.
Prometheus metrics are collected in the form "metric_name{label1=value, label2=value} sample_value".
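As an illustration of that line format, the following sketch renders one sample as text. The class ExpositionLine and the sample metric and label names are assumptions made for the example, not part of Prometheus or Micrometer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that renders one sample as: name{label="value",...} value
final class ExpositionLine {
    static String format(String name, Map<String, String> labels, double value) {
        String labelPart = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(","));
        String head = labels.isEmpty() ? name : name + "{" + labelPart + "}";
        return head + " " + value;
    }

    // Label values escape backslash, double quote, and newline.
    static String escape(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("method", "GET");
        labels.put("status", "200");
        // prints: http_server_requests_seconds_count{method="GET",status="200"} 42.0
        System.out.println(format("http_server_requests_seconds_count", labels, 42));
    }
}
```

Each distinct combination of metric name and label values is its own time series, which is why labels such as method and status are kept separate from the sampled value.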

What is a time series database?

A time series database (TSDB) is a database optimized for storing data that is generated as time passes, using time as its axis (key).
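The idea of "time as the key" can be sketched with a sorted map. This is only an illustration of the access pattern (append by timestamp, read by time range), not how Prometheus stores data internally; the class TinySeries is hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory series: samples are kept sorted by timestamp.
final class TinySeries {
    private final NavigableMap<Long, Double> samples = new TreeMap<>();

    /** Stores one sample; a later write to the same timestamp overwrites it. */
    void append(long timestampMillis, double value) {
        samples.put(timestampMillis, value);
    }

    /** Returns samples with from <= timestamp <= to, oldest first. */
    NavigableMap<Long, Double> range(long from, long to) {
        return samples.subMap(from, true, to, true);
    }

    /** Returns the most recent sample at or before t, or null if there is none. */
    Map.Entry<Long, Double> latestAtOrBefore(long t) {
        return samples.floorEntry(t);
    }
}
```

Because the keys are ordered, both "give me the last 5 minutes" and "what was the value at time t" are cheap lookups, which is the kind of query a monitoring dashboard issues constantly.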

 

Functions and components

  • Metric collection and time series data storage
  • Performance analysis with PromQL, a flexible query language
  • Data visualization through Grafana
  • Alert generation through Alertmanager

 

 

Grafana

  • A monitoring tool that visualizes data from Prometheus and many other data sources
  • Specialized in visualizing system-level metrics (CPU, memory, disk)
  • Alerting is available for free

 

 

Architecture

 

 

 

 

Now let's monitor a Spring Boot project

 

Spring Boot Actuator is a library that provides features for monitoring and managing Spring Boot applications. With Actuator, an application can expose information about its state and metrics. In other words, it plays the role of an exporter.

Outside of Actuator, commonly used exporters include Node Exporter and cAdvisor. Prometheus scrapes the application's state and performance data through the Actuator endpoint, and Grafana visualizes it.

 

 

First, add Actuator and the Prometheus registry to build.gradle.

	// actuator
	implementation 'org.springframework.boot:spring-boot-starter-actuator'

	// prometheus
	implementation 'io.micrometer:micrometer-registry-prometheus'

 

 

 

Then configure the endpoint in application.yml.

management:
  endpoints:
    web:
      base-path: /jikgong
      exposure:
        include: prometheus

 

 

 

Open [address]:8080/jikgong/prometheus and you should see the exposed metrics.

If you do not set base-path, the default path is [address]:8080/actuator/prometheus.

 

 

 

Next, write a docker-compose file that starts Prometheus and Grafana.

Here I mounted a volume for Prometheus so that its configuration can be edited from outside the container.

version: "3.8"

services:

  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - /home/kevin/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    restart: always

  grafana:
    image: "grafana/grafana"
    ports:
      - "3000:3000"
    volumes:
      - /home/kevin/grafana/conf_grafana:/config_files
    restart: always
    depends_on:
      - prometheus
    privileged: true

 

 

 

 

Now write the prometheus.yml file that was mounted as a volume.

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]


  - job_name: "java_application"
    metrics_path: '/jikgong/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ["<server-ip>:8080"]

 

 

 

 

Start the Prometheus and Grafana docker compose file written above, then open port 9090 in a browser.

Go to the Status -> Targets menu and check that the application has been registered and is being scraped normally.

If it does not show up properly, the targets address in prometheus.yml is probably wrong.

In my case, Spring Boot was running on a different network from Prometheus and Grafana, so the connection failed when I set the target to host.docker.internal:8080.
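When a target stays down, it can help to request the scrape URL yourself from the same machine or network that Prometheus runs on. Below is a small sketch using the JDK's java.net.http.HttpClient; the class name ScrapeProbe is made up, and the URL you pass in is whatever you put in targets plus metrics_path.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: performs one GET against a scrape URL and reports the status.
final class ScrapeProbe {
    /** Returns the HTTP status code of a GET request to the given URL. */
    static int status(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: ScrapeProbe <url>");
            return;
        }
        try {
            System.out.println(args[0] + " -> HTTP " + status(args[0]));
        } catch (Exception e) {
            // Connection refused, timeouts and unknown hosts all end up here.
            System.out.println(args[0] + " unreachable: " + e);
        }
    }
}
```

For example, running it with http://<server-ip>:8080/jikgong/prometheus should report HTTP 200 if the address is reachable. An "unreachable" result points to a network or address problem rather than a Prometheus configuration problem.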

 

 

 

๊ทธ๋ผํŒŒ๋‚˜๋Š” 3000๋ฒˆ ํฌํŠธ๋กœ ์žก์•˜๊ธฐ ๋•Œ๋ฌธ์—

[ip์ฃผ์†Œ]:3000 ์œผ๋กœ ์ ‘์†ํ•ด์ค€๋‹ค. ์•„๋ž˜์™€ ๊ฐ™์€ ํ™”๋ฉด์ด ๋œฐํ…๋ฐ ๋”ฐ๋กœ ์„ค์ •ํ•˜์ง€ ์•Š์•˜์œผ๋ฉด admin/admin์ด๋‹ค. 

๋”ฐ๋กœ ์„ค์ •ํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด docker-compose ํŒŒ์ผ์—์„œ ๋”ฐ๋กœ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

 

 

Add a data source.

 

 

 

Click Prometheus and enter the address under Connection.

If the address is entered correctly, the connection test should succeed.

 

 

 

Build a dashboard on the data source you just created.

You can put a dashboard together yourself, but there are already many well-made dashboards, so pick whichever you like.

I imported an existing one.

 

 

 

You can now view the Spring Boot metrics in Grafana.

The setup itself is not difficult; what matters is how well you operate the service using these metrics.

Later, I plan to set conditions such as 5xx errors occurring, memory usage above a threshold, or CPU usage above a threshold, and receive alerts for them in Slack.

 

 

๊ทธ๋ผํŒŒ๋‚˜์˜ alerting์„ ์‚ฌ์šฉํ•˜์—ฌ slack์œผ๋กœ ์•Œ๋žŒ ๋ฐ›๊ธฐ๋ฅผ ๋„์ „.. ํ–ˆ์ง€๋งŒ ์ •๋ง ์–ด๋ ค์› ๋‹ค. ์œ ํŠœ๋ธŒ์— ๋‹น๊ทผ๋งˆ์ผ“ ๊ฐœ๋ฐœ์ž๋‹˜์ด ์„ค๋ช…ํ•ด์ฃผ์‹œ๋Š” ์˜์ƒ์„ ์ฐธ๊ณ ํ•˜์—ฌ ๊ตฌํ˜„ํ•ด๋ณด๋ ค ํ–ˆ์ง€๋งŒ, ๋ฒ„์ „๋„ ๋‹ค๋ฅด๊ณ  ๋งˆ์Œ๊ฐ™์ด ์ž˜ ๋˜์ง€ ์•Š์•˜๋‹ค.

 

๊ทธ๋ž˜์„œ ์šฐ์„ ์€ AOP๋ฅผ ์ ์šฉํ•ด ์‘๋‹ต๊นŒ์ง€ x์ดˆ ์ด์ƒ ๋„˜์–ด๊ฐ„ api ์š”์ฒญ์— ๋Œ€ํ•ด์„œ slack์œผ๋กœ ์•Œ๋ฆผ์„ ๋ณด๋‚ด๋Š” ๊ฒƒ์œผ๋กœ ์ž„์‹œ ๋Œ€์ฒด ํ•˜์˜€๋‹ค. 

 

์•„๋ž˜๋Š” ํ•ด๋‹น ์ฝ”๋“œ์ด๋‹ค. 

@Around("pointcut()") // advice bound to the pointcut
public Object advice(ProceedingJoinPoint proceedingJoinPoint) throws Throwable {
    String methodName = proceedingJoinPoint.getSignature().getName();

    long start = System.currentTimeMillis();
    // Invoke the target method. If it throws, the timing below is skipped.
    Object result = proceedingJoinPoint.proceed();

    long end = System.currentTimeMillis();
    long runningTime = end - start;

    // 1000 ms is the threshold ("x seconds") for a slow request
    if (runningTime <= 1000) {
        log.info("[OK] method = {}, elapsed = {} ms", methodName, runningTime);
    } else {
        String message = "[WARNING] [Execution time exceeded the threshold] method = " + methodName + ", elapsed = " + runningTime + " ms";
        log.error(message);
        // Send a Slack message
        HashMap<String, String> data = new HashMap<>();
        data.put("Warning details", message);
        slackService.sendMessage("Warning: execution time exceeded", data);
    }
    return result;
}
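The slackService used above is not shown in this post. As a rough sketch of what such a service could look like, the class below posts a message to a Slack Incoming Webhook, which accepts a JSON body with a "text" field. The class name SlackWebhookSketch, the message layout, and the webhook URL passed to the constructor are all assumptions; this is not the actual implementation from the project.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

// Hypothetical stand-in for slackService: sends a title plus key/value lines to Slack.
final class SlackWebhookSketch {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI webhookUrl; // assumption: an Incoming Webhook URL issued by Slack

    SlackWebhookSketch(String webhookUrl) {
        this.webhookUrl = URI.create(webhookUrl);
    }

    /** Builds {"text":"..."}: the title first, then one "key: value" line per entry. */
    static String buildPayload(String title, Map<String, String> data) {
        StringBuilder text = new StringBuilder(title);
        for (Map.Entry<String, String> e : data.entrySet()) {
            text.append('\n').append(e.getKey()).append(": ").append(e.getValue());
        }
        return "{\"text\":\"" + escapeJson(text.toString()) + "\"}";
    }

    /** Escapes quotes, backslashes and control characters for a JSON string. */
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    /** Posts the message and returns the HTTP status code of the response. */
    int sendMessage(String title, Map<String, String> data) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(webhookUrl)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildPayload(title, data)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

Since the advice runs on the request thread, a blocking call like this adds the webhook round trip to an already slow request; sending it asynchronously would be a reasonable refinement.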