
Configure and Start Prometheus

This page describes how to configure and start Prometheus, and how to connect Alertmanager to Prometheus for metrics visualization and alerting.

Install Prometheus

  1. Download the Prometheus tarball for your operating system.

  2. Go to the directory that contains the Prometheus binary, and verify that Prometheus is properly installed:

    $ ./prometheus --version
    
    You can add the path to Prometheus to PATH. This makes it easy to start Prometheus from any shell.

Configure and start Prometheus

  1. Start Pushgateway:

    ./pushgateway
    
    You must start Pushgateway before starting the Milvus Server.
  2. Enable the Prometheus monitor in server_config.yaml and set the IP address and port number of Pushgateway:

    metric:
      enable: true                 # Set the value to true to enable the Prometheus monitor.
      address: <your_IP_address>   # Set the IP address of Pushgateway.
      port: 9091                   # Set the port number of Pushgateway.
    
    In a Kubernetes cluster, you need to set server_config.yaml on each node that you want to monitor.
  3. Go to the Prometheus root directory, and download the starter Prometheus configuration file for Milvus:

    $ wget https://raw.githubusercontent.com/milvus-io/docs/master/v1.0.0/assets/monitoring/prometheus.yml -O prometheus.yml
    
  4. Download the starter alerting rules for Milvus. The -P rules option saves the file in the rules directory under the Prometheus root directory:

    wget -P rules https://raw.githubusercontent.com/milvus-io/docs/master/v1.0.0/assets/monitoring/alert_rules.yml
    
  5. Edit the Prometheus configuration file according to your needs:

    • global: Configures parameters such as scrape_interval and evaluation_interval.
    global:
      scrape_interval:     2s # Set the scrape interval to 2s.
      evaluation_interval: 2s # Set the evaluation interval to 2s.
    
    • alerting: Sets the address and port of Alertmanager.
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['localhost:9093']
    
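    • rule_files: Specifies the files that define the alerting rules. Prometheus resolves relative paths against the directory of the configuration file, so the path must match where alert_rules.yml was saved in step 4.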
    rule_files:
       - "alert_rules.yml"
    
    • scrape_configs: Sets job_name and targets for scraping data.
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    
    - job_name: 'pushgateway'
      honor_labels: true
      static_configs:
      - targets: ['localhost:9091']
    
    See Prometheus Configuration for more information about the Prometheus configuration file.
  6. Start Prometheus:

    ./prometheus --config.file=prometheus.yml
    

After Prometheus starts, you can view the metrics that Milvus provides in the Prometheus web interface. See Milvus Metrics for more information.
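
For reference, the fragments from step 5 fit together roughly as shown below. This is a sketch assembled from the values on this page, not the contents of the downloaded starter file. The rules/ path is an assumption based on the wget -P rules command in step 4.

```yaml
# Sketch of a complete prometheus.yml built from the fragments in step 5.
global:
  scrape_interval: 2s       # how often Prometheus scrapes its targets
  evaluation_interval: 2s   # how often alerting rules are evaluated

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  # Assumption: step 4 saved the rules under ./rules; adjust if the file lives elsewhere.
  - "rules/alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'pushgateway'
    honor_labels: true      # keep the labels carried by the pushed metrics
    static_configs:
      - targets: ['localhost:9091']
```

The promtool binary that ships in the Prometheus tarball can check such a file before you start the server, for example with promtool check config prometheus.yml.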

Configure Alertmanager

Events that require alerting rules

Proactively monitoring metrics helps you identify emerging issues. It is equally important to create alerting rules for events that require immediate intervention.

This section lists the most important events for which you must create alerting rules.

Server is down

  • Rule: Send an alert when the Milvus server is down.
  • How to detect: If the Milvus server is down, No Data is displayed for various metrics on the monitoring dashboard.

CPU/GPU temperature is too high

  • Rule: Send an alert when the CPU/GPU temperature exceeds 80 degrees Celsius.
  • How to detect: Check the metrics CPU Temperature and GPU Temperature on the monitoring dashboard.
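
The two events above could be expressed as Prometheus alerting rules along the following lines. This is a minimal sketch, not the contents of the starter alert_rules.yml. The group name, alert names, thresholds other than 80 degrees, and the metric name example_cpu_temperature_celsius are all placeholders. push_time_seconds is the timestamp that Pushgateway exports for each pushed group, and using it to detect a stopped server assumes that Milvus pushes at least once per minute.

```yaml
groups:
  - name: milvus-example-alerts          # hypothetical group name
    rules:
      - alert: MilvusMetricsStale
        # Fires when no group has pushed to Pushgateway for more than 60s.
        # The 60s threshold is an assumption; tune it to the push interval,
        # and add a label selector if other jobs share the Pushgateway.
        expr: time() - push_time_seconds > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Milvus has stopped pushing metrics"

      - alert: CpuTemperatureTooHigh
        # PLACEHOLDER metric name: replace it with the temperature metric
        # that your Milvus version actually exports (see Milvus Metrics).
        expr: example_cpu_temperature_celsius > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "CPU temperature is above 80 degrees Celsius"
```

A GPU rule would follow the same pattern with the corresponding GPU temperature metric.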

Configuration steps

  1. Download the latest Alertmanager tarball for your operating system.

  2. Ensure that Alertmanager is properly installed:

    $ alertmanager --version
    
    You can add the path to Alertmanager to PATH. This makes it easy to start Alertmanager from any shell.
  3. Create the Alertmanager configuration file to specify the desired receivers for notifications, and add it to the Alertmanager root directory.

  4. Start the Alertmanager server, with the --config.file flag pointing to the configuration file:

    alertmanager --config.file=alertmanager.yml
    
  5. Use your browser to open http://<hostname of machine running alertmanager>:9093, and use the Alertmanager UI to define rules for muting alerts.
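
As an illustration of step 3, a minimal alertmanager.yml might look like the sketch below. The receiver name and webhook URL are placeholders, and the timing values are only example choices. Other receiver types such as email are configured in the same receivers list.

```yaml
route:
  receiver: milvus-ops            # hypothetical receiver name, defined below
  group_by: ['alertname']         # batch alerts that share an alert name
  group_wait: 30s                 # wait before sending the first notification
  group_interval: 5m              # wait before notifying about new alerts in a group
  repeat_interval: 3h             # resend interval for alerts that are still firing

receivers:
  - name: milvus-ops
    webhook_configs:
      - url: 'http://localhost:5001/alerts'   # placeholder endpoint
```

The amtool binary bundled with Alertmanager can validate the file with amtool check-config alertmanager.yml before you start the server.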

FAQ

How can I tell Milvus nodes apart when multiple nodes are connected to Pushgateway?

You can add a Prometheus instance in prometheus.yml. Prometheus or Grafana then shows the monitoring data together with the source node.