Prometheus Client Library: Instrumentation for a Python Application

Prometheus Guidebook
이민석
Jun 09, 2024

This Prometheus Guidebook was written while taking the A to Z Mentors (Udemy) course Prometheus | The Complete Hands-On for Monitoring & Alerting.

For the full table of contents and index, see the Prometheus Guidebook — Introduction page.

Overview

Collecting detailed metrics from an individual application requires changes at the application level. The libraries that support this are called client libraries.

Official and unofficial Prometheus client libraries are available for a wide range of languages.

  • Using client libraries, usually by adding two or three lines of code, you add your desired instrumentation to your code and define custom metrics to be exposed.

  • There are a number of client libraries available for all the major languages and runtimes.

  • The Prometheus project officially provides client libraries in Go, Java or Scala, Python, and Ruby.

  • Unofficial third-party client libraries: Bash, C, C++, PHP and more

  • Client libraries take care of all the bookkeeping and produce the Prometheus-format metrics.
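As a rough sketch of that "two or three lines of code" idea, assuming the official Python client (prometheus_client); the metric name 'demo_requests' and port below are illustrative:

```python
from prometheus_client import Counter, start_http_server, REGISTRY

# One line to define a custom metric...
REQUESTS = Counter('demo_requests', 'Total requests handled')

# ...and one line to expose every registered metric over HTTP.
start_http_server(8001)  # serves http://localhost:8001/metrics

REQUESTS.inc()  # instrument the code path you care about

# Reading the value back through the registry, for demonstration only;
# in practice Prometheus scrapes the /metrics endpoint.
print(REGISTRY.get_sample_value('demo_requests_total'))  # 1.0
```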

Metric Type

  1. Counter

  2. Gauge

  3. Summary

  4. Histogram

Counter

  • A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase, or be reset to zero on restart.

  • Counters are mainly used to track how often a particular code path is executed.

    • e.g. use counters to represent the number of requests served, tasks completed, or errors

  • Counters have one main method: inc(), which increases the counter value by one.

  • Do not use a counter to expose a value that can decrease.

    • e.g. temperature, the number of currently running processes, etc.
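A minimal standalone sketch of these semantics with the Python client (the metric name is illustrative; reading the value back via the registry is for demonstration only):

```python
from prometheus_client import Counter, REGISTRY

TASKS_COMPLETED = Counter('tasks_completed', 'Number of tasks completed')

TASKS_COMPLETED.inc()   # the one main method: increase by one
TASKS_COMPLETED.inc(3)  # increments larger than one are also allowed

# The exposed sample name carries the auto-appended `_total` suffix.
print(REGISTRY.get_sample_value('tasks_completed_total'))  # 4.0
```

There is deliberately no dec() on a counter; a value that can go down belongs in a gauge.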

Gauge

  • A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

  • Gauges represent a snapshot of some current state.

    • e.g. used for measured values like temperature, current memory usage or anything whose value can go both up and down.

  • Gauges have three main methods: inc(), dec(), and set(), which increase the value by one, decrease it by one, and set the gauge to an arbitrary value, respectively.
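Again as a minimal sketch with the Python client (metric name illustrative):

```python
from prometheus_client import Gauge, REGISTRY

MEMORY_USAGE = Gauge('memory_usage_bytes', 'Current memory usage in bytes')

MEMORY_USAGE.inc()      # up by one
MEMORY_USAGE.dec()      # down by one
MEMORY_USAGE.set(1024)  # jump to an arbitrary value

# Unlike a counter, a gauge's sample name has no suffix.
print(REGISTRY.get_sample_value('memory_usage_bytes'))  # 1024.0
```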

Summary

  • A summary samples observations such as request durations (how long your application took to respond to a request), latencies, and request sizes.

  • Summaries track the size and number of events.

  • A summary has one primary method, observe(), to which we pass the size of the event.

  • A summary exposes multiple time series during a scrape:

    • The total sum (<base_name>_sum) of all observed values

    • The count (<base_name>_count) of events that have been observed.

  • Summary metrics may also include quantiles over a sliding time window.
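A sketch of a summary in the Python client (metric name illustrative). Note that the Python client's Summary exposes only the _sum and _count series; quantile support varies by client library:

```python
from prometheus_client import Summary, REGISTRY

REQUEST_LATENCY = Summary('request_latency_seconds', 'Request latency in seconds')

REQUEST_LATENCY.observe(0.25)  # pass the size of each event
REQUEST_LATENCY.observe(0.75)

# Two time series are exposed: the running sum and the event count.
print(REGISTRY.get_sample_value('request_latency_seconds_sum'))    # 1.0
print(REGISTRY.get_sample_value('request_latency_seconds_count'))  # 2.0
```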

Histogram

  • A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets.

  • The instrumentation for histograms is the same as for summaries.

  • Histogram exposes multiple time series during a scrape:

    • The total sum (<base_name>_sum) of all observed values

    • The count (<base_name>_count) of events that have been observed.

  • The main purpose of using a histogram is calculating quantiles.

Metric Naming Convention

  • Metric names should start with a letter, and can be followed with any number of letters, numbers, and underscores.

  • Metrics must have unique names, and client libraries will report an error if you try to register the same metric twice in your application.

  • If applicable, when exposing the time series for a Counter-type metric, a ‘_total’ suffix is automatically added to the exposed metric name.

  • A metric should represent the same logical thing-being-measured across all label dimensions.
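The uniqueness rule can be seen directly in the Python client; a sketch with an illustrative metric name:

```python
from prometheus_client import Counter

Counter('my_app_events', 'Events seen')  # first registration succeeds

duplicate_rejected = False
try:
    Counter('my_app_events', 'Events seen')  # same name registered again
except ValueError:
    duplicate_rejected = True  # the client library reports the collision

print(duplicate_rejected)  # True
```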

Examples

Python Application

Suppose we have the following Python application:

import http.server

APP_PORT = 8000

class HandleRequests(http.server.BaseHTTPRequestHandler):
   def do_GET(self):
      self.send_response(200)
      self.send_header("Content-Type", "text/html")
      self.end_headers()
      self.wfile.write(bytes("<html><head><title>FirstApplication</title></head><body style='color: #333; margin-top: 30px;'><center><h2>Welcome to our first Prometheus-Python application.</h2></center></body></html>", "utf-8"))

if __name__ == "__main__":
   server = http.server.HTTPServer(('localhost', APP_PORT), HandleRequests)
   server.serve_forever()

Prometheus Client

Add the Prometheus client to this Python application:

import http.server
from prometheus_client import start_http_server

APP_PORT = 8000
METRICS_PORT = 8001

# ...

if __name__ == "__main__":
   start_http_server(METRICS_PORT)

   # ...

Prometheus Server

On the already running Prometheus server, add a new job under scrape_configs:

scrape_configs:
   # ...
   # ...
   - job_name: "prom_python_app"
     static_configs:
     - targets: ["localhost:8001"]

Prometheus Client - add new COUNTER

Now that scraping is set up, let's add a COUNTER to the Prometheus client.

import http.server
from prometheus_client import start_http_server, Counter

APP_PORT = 8000
METRICS_PORT = 8001

REQUEST_COUNTER = Counter('app_requests_count',
                          'total all http request count')

# ...

Also, call the method that increments the COUNTER:

# ...

REQUEST_COUNTER = Counter('app_requests_count',
                          'total all http request count')

class HandleRequests(http.server.BaseHTTPRequestHandler):
   def do_GET(self):
      REQUEST_COUNTER.inc()

      # ...
# ...

Prometheus Client - add new COUNTER with LABELS

To break the COUNTER metric down properly by request path, you can use labels (LABEL).

# ...

REQUEST_COUNTER = Counter('app_requests_count',
                          'total all http request count',
                          ["app_name", "endpoint"])

class HandleRequests(http.server.BaseHTTPRequestHandler):
   def do_GET(self):
      REQUEST_COUNTER.labels("prom_python_app", self.path).inc()

      # ...
# ...
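To sketch what the labeled counter exposes, each distinct label combination becomes its own time series (the '/ping' endpoint and the registry read-back are illustrative):

```python
from prometheus_client import Counter, REGISTRY

REQUEST_COUNTER = Counter('app_requests_count',
                          'total all http request count',
                          ["app_name", "endpoint"])

REQUEST_COUNTER.labels("prom_python_app", "/").inc()
REQUEST_COUNTER.labels("prom_python_app", "/").inc()
REQUEST_COUNTER.labels("prom_python_app", "/ping").inc()

# Each label combination is tracked independently.
print(REGISTRY.get_sample_value('app_requests_count_total',
                                {'app_name': 'prom_python_app', 'endpoint': '/'}))      # 2.0
print(REGISTRY.get_sample_value('app_requests_count_total',
                                {'app_name': 'prom_python_app', 'endpoint': '/ping'}))  # 1.0
```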

Prometheus Client - add new GAUGE

Here is a Python application configured to collect gauge metrics.

Below, two gauges are defined: REQUEST_INPROGRESS and REQUEST_LAST_SERVED.

import http.server
import time
from prometheus_client import start_http_server, Gauge

REQUEST_INPROGRESS = Gauge('app_requests_inprogress',
                           'number of application requests in progress') # ⛳️
REQUEST_LAST_SERVED = Gauge('app_last_served',
                            'Time the application was last served') # ⛳️

APP_PORT = 8000
METRICS_PORT = 8001

class HandleRequests(http.server.BaseHTTPRequestHandler):

   def do_GET(self):
      REQUEST_INPROGRESS.inc() # ⛳️
      time.sleep(5)            # ⛳️

      self.send_response(200)
      self.send_header("Content-Type", "text/html")
      self.end_headers()
      self.wfile.write(bytes("<html><head><title>FirstApplication</title></head><body style='color: #333; margin-top: 30px;'><center><h2>Welcome to our first Prometheus-Python application.</h2></center></body></html>", "utf-8"))

      REQUEST_LAST_SERVED.set(time.time())
      REQUEST_INPROGRESS.dec() # ⛳️

if __name__ == "__main__":
   start_http_server(METRICS_PORT) # ⛳️

   server = http.server.HTTPServer(('localhost', APP_PORT), HandleRequests)
   server.serve_forever()

Of course, the code above can be written more concisely using the @REQUEST_INPROGRESS.track_inprogress() decorator and REQUEST_LAST_SERVED.set_to_current_time().

# ...


class HandleRequests(http.server.BaseHTTPRequestHandler):

   @REQUEST_INPROGRESS.track_inprogress()
   def do_GET(self):
      REQUEST_LAST_SERVED.set_to_current_time()

      # ...
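A standalone sketch of those two helpers, with a plain function standing in for do_GET:

```python
import time
from prometheus_client import Gauge, REGISTRY

REQUEST_INPROGRESS = Gauge('app_requests_inprogress',
                           'number of application requests in progress')
REQUEST_LAST_SERVED = Gauge('app_last_served',
                            'Time the application was last served')

@REQUEST_INPROGRESS.track_inprogress()  # inc() on entry, dec() on exit
def handle_request():
    REQUEST_LAST_SERVED.set_to_current_time()  # same as set(time.time())

before = time.time()
handle_request()

print(REGISTRY.get_sample_value('app_requests_inprogress'))    # 0.0 (back to zero)
print(REGISTRY.get_sample_value('app_last_served') >= before)  # True
```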

Prometheus Client - add new SUMMARY

Here is a Python application configured to collect a summary metric.

Below, one summary is defined: REQUEST_RESPOND_TIME.

import http.server
import time
from prometheus_client import start_http_server, Summary

REQUEST_RESPOND_TIME = Summary('app_response_latency_seconds',
                               'Response latency in seconds')

APP_PORT = 8000
METRICS_PORT = 8001

class HandleRequests(http.server.BaseHTTPRequestHandler):
   def do_GET(self):
      start_time = time.time()                # ⛳️
      time.sleep(5)                           # ⛳️

      self.send_response(200)
      self.send_header("Content-Type", "text/html")
      self.end_headers()
      self.wfile.write(bytes("<html><head><title>FirstApplication</title></head><body style='color: #333; margin-top: 30px;'><center><h2>Welcome to our first Prometheus-Python application.</h2></center></body></html>", "utf-8"))

      end_time = time.time()                   # ⛳️
      taken_time = end_time - start_time       # ⛳️
      REQUEST_RESPOND_TIME.observe(taken_time) # ⛳️

if __name__ == "__main__":
   start_http_server(METRICS_PORT)

   server = http.server.HTTPServer(('localhost', APP_PORT), HandleRequests)
   server.serve_forever()

Using this method, two metrics are collected:

  • app_response_latency_seconds_count

  • app_response_latency_seconds_sum

You can run the following PromQL query against these metrics (average latency over the last 5 minutes):

rate(app_response_latency_seconds_sum[5m]) \
   / rate(app_response_latency_seconds_count[5m])

However, the code above can also be written more simply using @REQUEST_RESPOND_TIME.time():

# ...

class HandleRequests(http.server.BaseHTTPRequestHandler):

   @REQUEST_RESPOND_TIME.time() # ⛳️
   def do_GET(self):
      # ...

# ...
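A standalone sketch of the @...time() decorator, with a short sleep standing in for real request handling:

```python
import time
from prometheus_client import Summary, REGISTRY

REQUEST_RESPOND_TIME = Summary('app_response_latency_seconds',
                               'Response latency in seconds')

@REQUEST_RESPOND_TIME.time()  # observes the duration of every call
def handle_request():
    time.sleep(0.1)  # stand-in for real work

handle_request()
handle_request()

print(REGISTRY.get_sample_value('app_response_latency_seconds_count'))  # 2.0
# The sum is roughly 0.2 seconds (two ~0.1 s calls).
```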

Prometheus Client - add new HISTOGRAM

Here is a Python application configured to collect a histogram metric.

Below, one histogram is defined: REQUEST_RESPOND_TIME.

import http.server
import time
from prometheus_client import start_http_server, Histogram

REQUEST_RESPOND_TIME = Histogram('app_response_latency_seconds',
                                 'Response latency in seconds')  # ⛳️

APP_PORT = 8000
METRICS_PORT = 8001

class HandleRequests(http.server.BaseHTTPRequestHandler):

   @REQUEST_RESPOND_TIME.time() # ⛳️
   def do_GET(self):
      # ...

# ...

As with the SUMMARY, the data collected is a duration.

A HISTOGRAM splits the duration into ranges, producing a metric with the stepped (cumulative) buckets shown below.

Each collected histogram range is called a bucket, and the defaults are as follows:

app_response_latency_seconds_bucket{le="0.005"} 0.0
app_response_latency_seconds_bucket{le="0.01"} 0.0
app_response_latency_seconds_bucket{le="0.025"} 0.0
app_response_latency_seconds_bucket{le="0.05"} 0.0
app_response_latency_seconds_bucket{le="0.075"} 0.0
app_response_latency_seconds_bucket{le="0.1"} 0.0
app_response_latency_seconds_bucket{le="0.25"} 0.0
app_response_latency_seconds_bucket{le="0.5"} 0.0
app_response_latency_seconds_bucket{le="0.75"} 0.0
app_response_latency_seconds_bucket{le="1.0"} 0.0
app_response_latency_seconds_bucket{le="2.5"} 0.0
app_response_latency_seconds_bucket{le="5.0"} 0.0
app_response_latency_seconds_bucket{le="7.5"} 1.0
app_response_latency_seconds_bucket{le="10.0"} 1.0
app_response_latency_seconds_bucket{le="+Inf"} 1.0
app_response_latency_seconds_count 1.0
app_response_latency_seconds_sum 5.007552497001598

If you want custom buckets instead, you can configure them as shown below. More finely grained buckets are more useful, but they also consume more resources.

Therefore, choose bucket sizes appropriate to your application and the features it serves.

import http.server
import time
from prometheus_client import start_http_server, Histogram

REQUEST_RESPOND_TIME = Histogram('app_response_latency_seconds',
                                 'Response latency in seconds',
                                 buckets=[0.1, 0.5, 1, 2, 3, 4, 5, 10])  # ⛳️

APP_PORT = 8000
METRICS_PORT = 8001

class HandleRequests(http.server.BaseHTTPRequestHandler):

   @REQUEST_RESPOND_TIME.time() # ⛳️
   def do_GET(self):
      # ...

# ...

Because we changed the buckets above, the shape of the collected data changes as well:

app_response_latency_seconds_bucket{le="0.1"} 0.0
app_response_latency_seconds_bucket{le="0.5"} 0.0
app_response_latency_seconds_bucket{le="1.0"} 0.0
app_response_latency_seconds_bucket{le="2.0"} 1.0
app_response_latency_seconds_bucket{le="3.0"} 1.0
app_response_latency_seconds_bucket{le="4.0"} 1.0
app_response_latency_seconds_bucket{le="5.0"} 1.0
app_response_latency_seconds_bucket{le="10.0"} 1.0
app_response_latency_seconds_bucket{le="+Inf"} 1.0
app_response_latency_seconds_count 1.0
app_response_latency_seconds_sum 1.007552497001598
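The cumulative nature of buckets can be sketched standalone; here one observation of roughly 1 second is recorded against the custom buckets above:

```python
from prometheus_client import Histogram, REGISTRY

REQUEST_RESPOND_TIME = Histogram('app_response_latency_seconds',
                                 'Response latency in seconds',
                                 buckets=[0.1, 0.5, 1, 2, 3, 4, 5, 10])

REQUEST_RESPOND_TIME.observe(1.007552497001598)  # one observed request duration

# Buckets are cumulative: every bucket whose upper bound (le) is at least
# the observed value counts the event.
print(REGISTRY.get_sample_value('app_response_latency_seconds_bucket', {'le': '1.0'}))   # 0.0
print(REGISTRY.get_sample_value('app_response_latency_seconds_bucket', {'le': '2.0'}))   # 1.0
print(REGISTRY.get_sample_value('app_response_latency_seconds_bucket', {'le': '+Inf'}))  # 1.0
```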