Introduction to Prometheus

이민석
Jun 02, 2024

This guidebook was written after taking the Udemy course "Prometheus | The Complete Hands-On for Monitoring & Alerting" by A to Z Mentors.

For the full table of contents and index of the guidebook, see the Prometheus Guidebook: Introduction page.

Overview

  1. Fundamental Concepts

  2. Alternative Tools

Fundamental Concepts

Prometheus is an open-source system monitoring and alerting toolkit with an active ecosystem.

[Concept]

  • It is a tool that lets you analyse how your applications and infrastructure are performing, based on the metrics it collects.

  • Prometheus components are written in Go.

  • Uses a ‘multi-dimensional’ data model with ‘time series data’ identified by a ‘metric name’ and ‘key/value pairs’.
    e.g. <Metric_Name>{<Key>=<Value>}
    e.g. http_requests_total{method="get"}
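
As a rough illustration of the data model above, the following Python sketch renders a metric name and its key/value pairs in the notation shown and builds an identity for the series. The helper names `render_series` and `series_identity` are made up for this example and are not part of any Prometheus library.

```python
from typing import Dict, FrozenSet, Tuple


def render_series(metric_name: str, labels: Dict[str, str]) -> str:
    """Render a metric name and its labels as <Metric_Name>{<Key>="<Value>"}."""
    if not labels:
        return metric_name
    pairs = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric_name}{{{pairs}}}"


def series_identity(
    metric_name: str, labels: Dict[str, str]
) -> Tuple[str, FrozenSet[Tuple[str, str]]]:
    """Two samples belong to the same series only if name and all labels match."""
    return (metric_name, frozenset(labels.items()))


print(render_series("http_requests_total", {"method": "get"}))
```

Changing any label value, for example `method="post"`, yields a different identity and therefore a different time series.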

[Out of the box features]

  • Uses a simple query language called ‘PromQL’.

    • PromQL is a ‘read-only’ and flexible query language that allows aggregation across the labels stored in its time series.

  • No reliance on distributed storage; single server nodes are autonomous.

  • Default libraries and servers (exporters) are available for Windows and Linux machines, MySQL, etc.

  • To monitor custom services, you can add instrumentation to your code via Prometheus client libraries for Go, Java or Scala, Python, Ruby, and many more.

  • Full-fledged monitoring system with its own Alertmanager.
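
To make "aggregation across labels" more concrete, here is a small Python sketch of what a PromQL expression such as `sum by (method) (http_requests_total)` computes conceptually. It is only a model of the idea, not how Prometheus evaluates queries, and the series values are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]  # (labels, current value)


def sum_by(series: List[Series], label: str) -> Dict[str, float]:
    """Group series by one label and add up the values within each group."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical current values of http_requests_total, one entry per label set.
http_requests_total: List[Series] = [
    ({"method": "get", "handler": "/tracks"}, 10.0),
    ({"method": "get", "handler": "/users"}, 5.0),
    ({"method": "post", "handler": "/tracks"}, 3.0),
]

by_method = sum_by(http_requests_total, "method")
print(by_method)
```

Because the query only reads and combines stored series, it never modifies them, which is what ‘read-only’ refers to.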

[About Prometheus Project]

  • Was started in 2012 at SoundCloud.

  • Later, in 2016, it joined the Cloud Native Computing Foundation (CNCF), becoming the second CNCF-hosted project after Kubernetes.

  • Licensed under Apache 2.0 license.

  • 474 contributors, 7872 commits, 31000+ stars on GitHub.

  • Very active developer and user community around the globe.

Monitoring Tool Requirements

Before choosing a tool for building a monitoring system, it helps to consider what a monitoring system is required to do. In general, a monitoring system should meet the following four requirements.

  1. Collect, or at least listen for, time series data (events and data with a timestamp).

  2. Store those events and data effectively in storage.

  3. Support a query function for retrieving monitoring results.

  4. Provide graphical monitoring for visualizing the results.
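
The four requirements can be sketched as a toy, in-memory monitor in Python. This is purely illustrative: the class `TinyMonitor`, the metric name `cpu_load`, and the timestamps are all invented for the example, and a real system would persist data and draw proper graphs.

```python
import time
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (timestamp, value)


class TinyMonitor:
    def __init__(self) -> None:
        # Requirement 2: storage (here just a dict held in memory).
        self._store: Dict[str, List[Point]] = {}

    def collect(self, name: str, value: float, timestamp: Optional[float] = None) -> None:
        # Requirement 1: accept events together with a timestamp.
        ts = time.time() if timestamp is None else timestamp
        self._store.setdefault(name, []).append((ts, value))

    def query(self, name: str, start: float, end: float) -> List[Point]:
        # Requirement 3: query stored data by name and time range.
        return [(ts, v) for ts, v in self._store.get(name, []) if start <= ts <= end]

    def render(self, name: str, start: float, end: float) -> str:
        # Requirement 4: a very crude text "graph" of the query result.
        return "\n".join(
            f"{ts:>6.0f} | {'#' * int(v)}" for ts, v in self.query(name, start, end)
        )


monitor = TinyMonitor()
monitor.collect("cpu_load", 2.0, timestamp=100.0)
monitor.collect("cpu_load", 4.0, timestamp=160.0)
monitor.collect("cpu_load", 3.0, timestamp=220.0)
print(monitor.render("cpu_load", 100.0, 200.0))
```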

Alternative Tools

Prometheus is not the only tool that meets the four requirements above.

  1. Graphite

  2. InfluxDB

  3. OpenTSDB

  4. Nagios

  5. Sensu

Comparing Alternatives: Prometheus vs Graphite

  1. Basics

  2. Data model

  3. Storage

  4. Selection considerations

Basics

  • Graphite is merely a storage and graphing framework.
    It is a simpler ‘data logging’ and ‘graphing’ tool for time series data.

  • Graphite does precisely two things:

    • Store numeric time series data.

    • Render graphs of this data.

  • Graphite does not have direct data collection support; a separate component called Carbon (a Twisted daemon) passively listens for time series data.

  • Prometheus is a ‘full-fledged’, comprehensive service monitoring system.

  • Prometheus is more feature-rich than Graphite. It provides:

    • A flexible query language.

    • A push gateway for collecting metrics from short-lived batch jobs.

    • A wide range of exporters.

    • Third-party tools.

Data model

  • In Graphite, metric names consist of ‘dot-separated’ components which implicitly encode dimensions.

  • Prometheus encodes dimensions explicitly as ‘key-value’ pairs, called labels, attached to a metric name.
    This labelling allows easy filtering, grouping, and matching via the query language.

e.g.

  • Graphite

    • stats.api-server.tracks.post.500 → 93

  • Prometheus

    • api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample1>"} → 34

    • api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample2>"} → 28

    • api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample3>"} → 31
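
The difference between the two models can be sketched in Python. The sketch assumes the three Prometheus series come from three different instances (labelled `<sample1>` to `<sample3>` here), so that their values add up to the single Graphite value; the `select` helper is a made-up stand-in for label matching in a query language.

```python
from typing import Dict, List, Tuple

# Graphite: dimensions are implied by position in the dotted name.
graphite = {"stats.api-server.tracks.post.500": 93}
graphite_parts = "stats.api-server.tracks.post.500".split(".")

# Prometheus: dimensions are explicit labels attached to the metric name.
LabelledValue = Tuple[Dict[str, str], int]
api_server_http_requests_total: List[LabelledValue] = [
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample1>"}, 34),
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample2>"}, 28),
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample3>"}, 31),
]


def select(series: List[LabelledValue], **matchers: str) -> List[LabelledValue]:
    """Keep only the series whose labels equal every given matcher."""
    return [
        (labels, value)
        for labels, value in series
        if all(labels.get(key) == wanted for key, wanted in matchers.items())
    ]


# Grouping: all instances together give the Graphite-style total.
total_500 = sum(value for _, value in select(api_server_http_requests_total, status="500"))
# Filtering: a single instance can still be picked out by its label.
one_instance = select(api_server_http_requests_total, instance="<sample2>")
print(graphite_parts, total_500, one_instance[0][1])
```

With labels, the per-instance breakdown stays available after collection, whereas the dotted name only exposes whatever dimensions were baked into the path.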

Storage

  • Graphite expects samples to arrive at regular intervals.
    Every time series is stored in a separate file, and new samples overwrite old ones after a certain amount of time.

  • Prometheus allows storing samples at arbitrary intervals as scrapes or rule evaluations occur.
    New samples are simply appended.
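
The two storage behaviours can be contrasted with a toy Python model. It is a deliberately simplified sketch: `FixedIntervalArchive` is a hypothetical class that only mimics the "regular interval, overwrite old data" idea and does not reproduce Graphite's actual file format, and the plain list stands in for append-only storage.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)


class FixedIntervalArchive:
    """Fixed number of regularly spaced slots; old slots get overwritten."""

    def __init__(self, step: int, slots: int) -> None:
        self.step = step
        self.slots: List[Optional[Point]] = [None] * slots

    def write(self, timestamp: int, value: float) -> None:
        aligned = timestamp - (timestamp % self.step)  # snap to the interval
        index = (aligned // self.step) % len(self.slots)
        self.slots[index] = (aligned, value)

    def samples(self) -> List[Point]:
        return sorted(point for point in self.slots if point is not None)


archive = FixedIntervalArchive(step=60, slots=3)
append_only: List[Point] = []

# The last sample arrives off-schedule, at t=197 instead of t=240.
for timestamp, value in [(0, 1.0), (60, 2.0), (120, 3.0), (180, 4.0), (197, 5.0)]:
    archive.write(timestamp, value)
    append_only.append((timestamp, value))  # arbitrary timestamps are kept as-is

print(archive.samples())
print(append_only)
```

In the fixed-interval model the off-schedule sample is snapped to the 180-second slot and replaces the value already there, while the append-only list keeps all five samples with their original timestamps.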

Selection considerations

  1. If you want a clustered solution that can hold historical data long term, Graphite may be a better choice.

  2. Graphite may also be preferred if your existing infrastructure already uses collection tools like fluentd, collectd, or StatsD.

  3. However, if you are starting from scratch and the intent is to implement an end-to-end monitoring solution, then Prometheus may be a better choice.

  4. Prometheus is easier than Graphite to run and integrate into your environment, because it has all the features needed for end-to-end monitoring packed into it.

Comparing Alternatives: Prometheus vs InfluxDB

  • InfluxDB: the same differences as in the case of Graphite apply here for InfluxDB.

  • Prometheus: Prometheus together with the Alertmanager covers the same problem space as InfluxDB combined with Kapacitor.

Basic Terminology

  1. Monitoring
    Monitoring is a systematic process of collecting and recording the activities taking place in a target project, programme or service, and then using the recorded values to check whether the targets are reaching their objectives.

  2. Alert/Alerting

    An alert is the outcome of an alerting rule in Prometheus that is actively firing. Alerts are sent from Prometheus to the Alertmanager.

  3. Alertmanager

    The Alertmanager takes in alerts from the Prometheus server, aggregates them into groups, de-duplicates them, applies silences, throttles, and then sends out notifications to email, PagerDuty, Slack, etc.

  4. Target

    A target is the definition of an object to scrape.
    In other words, a target is an object whose metrics are to be monitored.

  5. Instance

    In Prometheus terms, an endpoint you can scrape is called an instance.

  6. Job

    A job is a collection of targets/instances with the same purpose.

    For example, an API server job with four replicated instances:

    - job: api-server
      - instance 1: 1.2.3.4:5670
      - instance 2: 1.2.3.4:5671
      - instance 3: 1.2.3.4:5672
      - instance 4: 1.2.3.4:5673

  7. Sample

    A sample is a single value at a point in time in a time series.

    For example:

    - http_requests_total{method="GET"} -> 43
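
The relationship between jobs, instances, and samples can be sketched in Python. When Prometheus scrapes a target it attaches `job` and `instance` labels to the scraped series; the helper names, the `Sample` class, and the timestamp below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    """A single value at a point in time in a time series."""
    timestamp: float
    value: float


def targets_for_job(job: str, instances: List[str]) -> List[Dict[str, str]]:
    """One target per instance, each carrying the job and instance labels."""
    return [{"job": job, "instance": address} for address in instances]


def with_target_labels(series_labels: Dict[str, str], target: Dict[str, str]) -> Dict[str, str]:
    """Combine a series' own labels with the labels of the scraped target."""
    merged = dict(series_labels)
    merged.update(target)
    return merged


api_server_targets = targets_for_job(
    "api-server",
    ["1.2.3.4:5670", "1.2.3.4:5671", "1.2.3.4:5672", "1.2.3.4:5673"],
)

# http_requests_total{method="GET"} scraped from the first instance.
labels = with_target_labels({"method": "GET"}, api_server_targets[0])
sample = Sample(timestamp=1717286400.0, value=43.0)  # hypothetical scrape time
print(labels, sample)
```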

Architecture

Architecture Overview

  1. Prometheus server

    1. Retrieval

    2. Storage (TSDB: Time Series Database)

    3. API server (HTTP server)

  2. Prometheus metric collection

    1. Prometheus targets

    2. Prometheus Pushgateway

  3. Service discovery

    1. Kubernetes

    2. file_sd…

  4. Alertmanager

    1. PagerDuty

    2. Email

    3. etc.

  5. Data visualization

    1. Prometheus web UI

    2. Grafana

    3. API clients

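Architecture and the Prometheus Lifecycle

  1. The Prometheus server performs service discovery.

  2. Metrics are collected (pulled) from the targets in one of two ways:

    1. Targets send their data to the Pushgateway, and Retrieval pulls it from there.

    2. Retrieval pulls the data directly from the targets.

  3. The data is stored in the TSDB (Time Series Database), which is written to HDD/SSD.

  4. Notifications are sent to the various receivers through the Alertmanager.

  5. Visualization dashboards query the data with PromQL and visualize it.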
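
The lifecycle above can be walked through with a small Python simulation. Everything in it is a stand-in: the target addresses, metric values, and the alert threshold are invented, the functions only mimic the roles of service discovery, Retrieval, the TSDB, alerting, and querying, and no real Prometheus API is involved.

```python
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]
Tsdb = Dict[Tuple[str, str], List[Tuple[float, float]]]  # (metric, instance) -> samples


def discover() -> Dict[str, Callable[[], Metrics]]:
    """Step 1: service discovery returns the endpoints that can be scraped."""
    return {
        "1.2.3.4:5670": lambda: {"http_requests_total": 43.0},        # direct target
        "pushgateway:9091": lambda: {"batch_job_duration_seconds": 12.5},  # pushed metrics
    }


def scrape_all(targets: Dict[str, Callable[[], Metrics]], now: float, tsdb: Tsdb) -> None:
    """Steps 2 and 3: pull metrics from every target and append them to storage."""
    for instance, read_metrics in targets.items():
        for name, value in read_metrics().items():
            tsdb.setdefault((name, instance), []).append((now, value))


def firing_alerts(tsdb: Tsdb, name: str, threshold: float) -> List[str]:
    """Step 4: instances whose latest value crosses the threshold would raise an alert."""
    return [
        instance
        for (metric, instance), samples in tsdb.items()
        if metric == name and samples[-1][1] > threshold
    ]


def query_latest(tsdb: Tsdb, name: str) -> Dict[str, float]:
    """Step 5: a dashboard asks for the latest value of a metric per instance."""
    return {
        instance: samples[-1][1]
        for (metric, instance), samples in tsdb.items()
        if metric == name
    }


tsdb: Tsdb = {}
targets = discover()
scrape_all(targets, now=0.0, tsdb=tsdb)
scrape_all(targets, now=15.0, tsdb=tsdb)
alerts = firing_alerts(tsdb, "http_requests_total", threshold=40.0)
latest = query_latest(tsdb, "http_requests_total")
print(alerts, latest)
```

Note that in both collection paths the server does the pulling; the Pushgateway simply gives short-lived jobs a place to leave their metrics until the next scrape.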
