Configure and Start Prometheus

Milvus generates detailed time series metrics. This page shows you how to pull these metrics into Prometheus, and how to connect Grafana and Alertmanager to Prometheus for flexible data visualizations and notifications.

Before you begin

  • Read Monitoring and Alerting first to learn about the monitoring and alerting solutions of Milvus.

Install Prometheus

  1. Download the Prometheus tarball for your operating system.
  2. Go to the Prometheus directory and verify that Prometheus is installed correctly:

    $ ./prometheus --version
    
    You can add the path to Prometheus to PATH. This makes it easy to start Prometheus from any shell.
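
As a minimal sketch of adding Prometheus to PATH, the following assumes the tarball was extracted to $HOME/prometheus. That location is a hypothetical example, so adjust PROM_DIR to wherever the prometheus binary actually lives.

```shell
# PROM_DIR is a hypothetical location; point it at the directory
# that contains the prometheus binary.
PROM_DIR="$HOME/prometheus"

# Append the directory to PATH for the current shell session.
export PATH="$PATH:$PROM_DIR"

# Print the matching PATH entry to confirm that it was added.
echo "$PATH" | tr ':' '\n' | grep -Fx "$PROM_DIR"
```

To keep the change across sessions, the export line can be added to your shell startup file (for example, ~/.bashrc).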

Configure and start Prometheus

  1. Start Pushgateway:

    ./pushgateway
    
    You must start Pushgateway before starting the Milvus server.
  2. Enable the Prometheus monitor in server_config.yaml and set the address and port number of Pushgateway:

    metric:
      enable: true       # Set the value to true to enable the Prometheus monitor.
      address: 127.0.0.1 # Set the IP address of Pushgateway.
      port: 9091         # Set the port number of Pushgateway.
    
    In a Kubernetes cluster, you need to edit server_config.yaml on each node that you want to monitor.
  3. Go to the Prometheus root directory, and download the starter Prometheus configuration file for Milvus:

    $ wget https://raw.githubusercontent.com/milvus-io/docs/v0.10.1/assets/monitoring/prometheus.yml -O prometheus.yml
    
  4. Download the starter alerting rules for Milvus into the rules directory under the Prometheus root directory:

    wget -P rules https://raw.githubusercontent.com/milvus-io/docs/v0.10.1/assets/monitoring/alert_rules.yml
    
  5. Edit the Prometheus configuration file according to your needs:

    • global: Configures parameters such as scrape_interval and evaluation_interval.
    global:
     scrape_interval:     2s # Set the scrape interval to 2s.
     evaluation_interval: 2s # Set the evaluation interval to 2s.
    
    • alerting: Sets the address and port of Alertmanager.
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['localhost:9093']
    
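    • rule_files: Specifies the files that define the alerting rules. Relative paths are resolved from the directory that contains the configuration file, so the entry must match where alert_rules.yml was saved (for example, rules/alert_rules.yml if it was downloaded into the rules directory).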
    rule_files:
      - "alert_rules.yml"
    
    • scrape_configs: Sets job_name and targets for scraping data.
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    
    - job_name: 'pushgateway'
      honor_labels: true
      static_configs:
      - targets: ['localhost:9091']
    
    See Prometheus Configuration for more information about the configuration file of Prometheus.
  6. Start Prometheus:

    ./prometheus --config.file=prometheus.yml
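
Once Prometheus is running, a quick check along the following lines can confirm that the pieces are reachable. This is only a sketch: it assumes the default ports used above (9090 for Prometheus, 9091 for Pushgateway), that curl is available, and that promtool from the Prometheus tarball sits in the current directory. The helper name check_endpoint is introduced here for illustration.

```shell
# check_endpoint is an illustrative helper: it reports whether a URL
# answers successfully within 2 seconds.
check_endpoint() {
  if curl -fsS --max-time 2 "$1" >/dev/null 2>&1; then
    echo "reachable: $1"
  else
    echo "unreachable: $1"
  fi
}

# Validate the configuration file, but only if promtool and the file exist.
if [ -x ./promtool ] && [ -f prometheus.yml ]; then
  ./promtool check config prometheus.yml
fi

check_endpoint "http://localhost:9090/-/healthy"   # Prometheus health endpoint
check_endpoint "http://localhost:9091/metrics"     # Metrics exposed by Pushgateway
```

If either endpoint is reported as unreachable, check that the corresponding process is running and that the port matches your configuration.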
    

Configure Alertmanager

Events that require alerting rules

Active monitoring helps you identify problems early, but it is also essential to create alerting rules that promptly send notifications when events require investigation or intervention.

This section lists the most important events for which you must create alerting rules.

Server is down

  • Rule: Send an alert when the Milvus server is down.
  • How to detect: If the Milvus server is down, No Data is displayed for various metrics on the monitoring dashboard.

CPU/GPU temperature is too high

  • Rule: Send an alert when the CPU/GPU temperature exceeds 80 degrees Celsius.
  • How to detect: Check the metrics CPU Temperature and GPU Temperature on the monitoring dashboard.
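
The two events above could be expressed as Prometheus alerting rules roughly as follows. This is a sketch, not the starter rules shipped with Milvus: the metric names milvus_example_uptime and example_cpu_temperature_celsius are hypothetical placeholders, and the server-down rule assumes that the metric disappears from Prometheus when the server stops, as the "No Data" behavior suggests. Substitute the metric names that your Milvus deployment actually reports.

```shell
# Write an example rules file. The metric names below are placeholders,
# not real Milvus metric names.
cat > example_alert_rules.yml <<'EOF'
groups:
  - name: milvus_example
    rules:
      - alert: MilvusServerDown
        # Fires when the (hypothetical) metric has been missing for 1 minute.
        expr: absent(milvus_example_uptime)
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Milvus server appears to be down"
      - alert: CpuTemperatureHigh
        # Fires when the (hypothetical) temperature metric stays above 80.
        expr: example_cpu_temperature_celsius > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "CPU temperature exceeds 80 degrees Celsius"
EOF

# Validate the rules file if promtool is on PATH.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules example_alert_rules.yml
fi
```

A GPU temperature rule would follow the same pattern as the CPU rule, using the corresponding GPU metric.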

Configuration steps

  1. Download the latest Alertmanager tarball for your operating system.
  2. Ensure that Alertmanager is properly installed:

    $ alertmanager --version
    
    You can add the path to Alertmanager to PATH. This makes it easy to start Alertmanager from any shell.
  3. Create the Alertmanager configuration file to specify the desired receivers for notifications, and add it to the Alertmanager root directory.
  4. Start the Alertmanager server, with the --config.file flag pointing to the configuration file:

    alertmanager --config.file=alertmanager.yml
    
  5. Use your browser to open http://<hostname of machine running alertmanager>:9093, and use the Alertmanager UI to define rules for muting alerts.
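
For the configuration file created in the steps above, a minimal sketch might look like the following. The receiver name example-webhook and the URL http://127.0.0.1:5001/ are placeholders chosen for illustration, and the file is written as alertmanager.example.yml so that it does not overwrite an existing configuration.

```shell
# Write a minimal Alertmanager configuration. The receiver name and
# webhook URL are placeholders; replace them with your own endpoint.
cat > alertmanager.example.yml <<'EOF'
route:
  receiver: 'example-webhook'
  group_by: ['alertname']
receivers:
  - name: 'example-webhook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
EOF

# Validate the file if amtool (shipped with Alertmanager) is on PATH.
if command -v amtool >/dev/null 2>&1; then
  amtool check-config alertmanager.example.yml
fi
```

To try it out, point the --config.file flag at this file when starting Alertmanager.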
© 2019 - 2020 Milvus. All rights reserved.