QoA4ML - Quality of Analytics for ML

QoA4ML

Monitoring Client

QoA Client: an object that observes metrics, generates metric reports, and sends them to the Observation service via a list of connectors (e.g., messaging connector: RabbitMQ).

Developers only need to initialize a QoaClient once at startup and then use it to observe and evaluate metrics through self-instrumentation, that is, by calling its functions at the appropriate places in the source code.

  • To initialize a QoA Client, developers can pass the path of a configuration file, pass the configuration directly as a dictionary, or give the URL of the registration service from which the client can fetch its configuration.

The configuration is a dictionary that describes the client and its connectors.

Example:

clientConf = {
    "client":{
        "user_id": "aaltosea1",
        "instance_name": "ML02",
        "stage_id": "ML",
        "method": "REST",
        "application_name": "test",
        "role": "ml"
    },
    "connector":{
        "amqp_connector":{
            "class": "amqp",
            "conf":{
                "end_point": "localhost",
                "exchange_name": "qoa4ml",
                "exchange_type": "topic",
                "out_routing_key": "qoa.report.ml"
            }
        }
    }
}
qoaClient = QoaClient(config_dict=clientConf)

The 'connector' entry is a dictionary that can contain multiple connector configurations (amqp, mqtt, kafka). If 'connector' is not defined, the developer must give a 'registration_url'. The 'registration_url' specifies the service with which the client registers for monitoring. If it is set, the client registers with that service and receives its connector configuration from it. For example: "http://localhost:5001/registration"
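The rule above (either 'connector' or 'registration_url' must be present) can be checked before a client is created. The following is a minimal sketch, not part of QoA4ML: the helper name connector_source is hypothetical, and placing 'registration_url' at the top level of the dictionary, next to 'client', is an assumption.

```python
from typing import Any, Dict


def connector_source(conf: Dict[str, Any]) -> str:
    """Return which entry would provide the connector settings.

    Hypothetical helper for illustration; it only inspects the dictionary.
    """
    if conf.get("connector"):
        return "connector"
    if conf.get("registration_url"):
        return "registration_url"
    raise ValueError("configuration needs 'connector' or 'registration_url'")


# A configuration without 'connector': the client would register first.
# Assumption: 'registration_url' sits at the top level of the dictionary.
registered_conf = {
    "client": {
        "user_id": "aaltosea1",
        "instance_name": "ML02",
        "stage_id": "ML",
        "method": "REST",
        "application_name": "test",
        "role": "ml",
    },
    "registration_url": "http://localhost:5001/registration",
}

source = connector_source(registered_conf)
```

Running such a check early gives a clear error message instead of a failure at the first attempt to send a report.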

  • Via this client, developers can call different monitoring probes to measure desired metrics and categorize them into data quality, service performance or inference quality.

  • By using our probes (e.g., observeErronous, observeMissing, and observeInferenceMetric), the metrics are already categorized in the quality report.

  • For unsupported or user-defined metrics, developers can report them with observeMetric, providing the metric's name and its expected category. For example: qoaClient.observeMetric(metric_name="image_width", value=200, category=1).

  • Category: metrics are categorized into the following groups:

    • 0 - Quality: Performance (metrics for evaluating service performance, e.g., response time, throughput)
    • 1 - Quality: Data (metrics for evaluating data quality, e.g., missing, duplicate, erroneous)
    • 2 - Quality: Inference (metrics for evaluating the quality of ML inference, measured from inferences, e.g., accuracy, confidence)
    • 3 - Resource (metrics for evaluating resource utilization, e.g., CPU, memory)

  • To send the quality report to the observation service, developers call report on the QoaClient. For example, qualityReport = qoaClient.report() sends the report and also returns the report for the current stage, which is stored in qualityReport.
  • To aggregate reports from the previous stage (in a pipeline) for building the computation graph, the client can call importPReport. For example: qoaClient.importPReport(previousReport)
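To make the category numbers concrete, here is a minimal sketch of the observe-then-report flow. It does not use the real QoaClient: StubClient is a hypothetical in-memory stand-in, the Category enum is an illustrative name for the numbers listed above, and the layout of the returned dictionary is an assumption rather than the library's report format.

```python
from enum import IntEnum
from typing import Any, Dict


class Category(IntEnum):
    """Illustrative names for the category numbers 0 to 3."""
    PERFORMANCE = 0
    DATA = 1
    INFERENCE = 2
    RESOURCE = 3


class StubClient:
    """Hypothetical stand-in for QoaClient; nothing is sent anywhere."""

    def __init__(self) -> None:
        # One bucket per category, keyed by the lower-case category name.
        self._metrics: Dict[str, Dict[str, Any]] = {
            c.name.lower(): {} for c in Category
        }

    def observeMetric(self, metric_name: str, value: Any, category: int) -> None:
        # Category(category) raises ValueError for numbers outside 0..3.
        self._metrics[Category(category).name.lower()][metric_name] = value

    def report(self) -> Dict[str, Dict[str, Any]]:
        # Return a copy so callers cannot modify the client's state.
        return {group: dict(values) for group, values in self._metrics.items()}


client = StubClient()
client.observeMetric(metric_name="image_width", value=200, category=1)
client.observeMetric(metric_name="response_time", value=0.12, category=0)
quality_report = client.report()
```

Using an enum instead of bare integers keeps call sites readable, for example category=Category.DATA, while remaining compatible with the numeric values.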

Probes

  • QoA4ML Probes: libraries and lightweight modules capturing metrics. They are integrated into suitable ML serving frameworks and ML code
  • Probe properties:
  • Can be written in different languages (Python, GoLang)
  • Can communicate with monitoring systems in different ways (depending on the probe and its ML support)
  • Capture metrics with a clear definition/scope
    • e.g., response time for an ML stage (training) or a service call (of ML APIs)
    • Thus, the output of a probe must be correlated with the monitored object and the tenant
  • Support high-level or low-level metrics/attributes
    • depending on the probe implementation
  • Can be instrumented into source code or standalone

Metric

We support several metric classes for collecting different types of metrics: Counter, Gauge, Summary, and Histogram.

  • Metric: the base class providing common functions on a metric object.
  • Attribute:
    • metric_name
    • description
    • value
  • Function:
    • __init__: lets the user define the metric name, description, and default value.
    • set: set the metric to a specific value
    • get_val: return the current value
    • get_name: return the metric name
    • get_des: return the metric description
    • __str__: return information about the metric as a string
    • to_dict: return information about the metric as a dictionary
  • Counter
  • Attribute: same as Metric (under further development)
  • Function:
    • inc: increase the value of the metric by the given number/by 1 by default.
    • reset: set the value back to zero.
  • Gauge
  • Attribute: same as Metric (under further development)
  • Function:
    • inc: increase the value of the metric by a given number/by 1 by default.
    • dec: decrease the value of the metric by a given number/by 1 by default.
    • set: set the value to a given number.
  • Summary
  • Attribute: same as Metric (under further development)
  • Function:
    • inc: increase the value of the metric by a given number/by 1 by default.
    • dec: decrease the value of the metric by a given number/by 1 by default.
    • set: set the value to a given number.
  • Histogram
  • Attribute: same as Metric (under further development)
  • Function:
    • inc: increase the value of the metric by a given number/by 1 by default.
    • dec: decrease the value of the metric by a given number/by 1 by default.
    • set: set the value to a given number.
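The class descriptions above can be illustrated with a minimal sketch. This is not the QoA4ML source code: it only mirrors the attribute and function names listed above, and the parameter names (default_value, num) and the layout returned by to_dict are assumptions.

```python
from typing import Dict, Union

Number = Union[int, float]


class Metric:
    """Base class holding a name, a description, and a current value."""

    def __init__(self, metric_name: str, description: str,
                 default_value: Number = 0) -> None:
        self.metric_name = metric_name
        self.description = description
        self.value = default_value

    def set(self, value: Number) -> None:
        self.value = value

    def get_val(self) -> Number:
        return self.value

    def get_name(self) -> str:
        return self.metric_name

    def get_des(self) -> str:
        return self.description

    def to_dict(self) -> Dict[str, Number]:
        # Assumed layout: {metric_name: value}; the library may differ.
        return {self.metric_name: self.value}

    def __str__(self) -> str:
        return f"{self.metric_name} ({self.description}): {self.value}"


class Counter(Metric):
    """Value that is increased step by step and can be reset to zero."""

    def inc(self, num: Number = 1) -> None:
        self.value += num

    def reset(self) -> None:
        self.value = 0


class Gauge(Metric):
    """Value that can go up and down; set is inherited from Metric."""

    def inc(self, num: Number = 1) -> None:
        self.value += num

    def dec(self, num: Number = 1) -> None:
        self.value -= num


requests_seen = Counter("requests", "number of handled requests")
requests_seen.inc()       # increase by 1 (default)
requests_seen.inc(4)      # increase by a given number
queue_depth = Gauge("queue_depth", "pending requests", default_value=10)
queue_depth.dec(3)
```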

QoA4ML Reports

This module defines QoA_Report, an object that provides functions to export monitored metrics to the following schema:

{
    "computationGraph":{
        "instances":{
            "@instance_id":{
                "instance_name": "@name_of_instance",
                "method": "@method/task/function",
                "previous_instance":["@list_of_previous_instance"]
            },
            ...
        },
        "last_instance": "@name_of_last_instance_in_the_graph"
    },
    "quality":{
        "data":{
            "@stage_id":{
                "@metric_name":{
                    "@instance_id": "@value"
                }
            }
        },
        "performance":{
            "@stage_id":{
                "@metric_name":{
                    "@instance_id": "@value"
                }
            }
        },
        "inference":{
            "@inference_id":{
                "value": "@value",
                "confident": "@confidence",
                "accuracy": "@accuracy",
                "instance_id": "@instance_id",
                "source": ["@list_of_inferences_to_infer_this_inference"]
            }
        }
    }
}

An example is available in example/reports/qoa_report/example.txt
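As an illustration of how the schema nests, the sketch below fills the "computationGraph" and "quality" sections for one instance. The helper names (empty_report, add_instance, add_metric) and the identifier "inst-1" are hypothetical; only the key names come from the schema above, and this is not how QoA_Report is implemented.

```python
from typing import Any, Dict, List


def empty_report() -> Dict[str, Any]:
    """Return the top-level skeleton of the schema with no entries."""
    return {
        "computationGraph": {"instances": {}, "last_instance": ""},
        "quality": {"data": {}, "performance": {}, "inference": {}},
    }


def add_instance(report: Dict[str, Any], instance_id: str, instance_name: str,
                 method: str, previous_instance: List[str]) -> None:
    """Register an instance and mark it as the last one in the graph."""
    graph = report["computationGraph"]
    graph["instances"][instance_id] = {
        "instance_name": instance_name,
        "method": method,
        "previous_instance": list(previous_instance),
    }
    graph["last_instance"] = instance_name


def add_metric(report: Dict[str, Any], category: str, stage_id: str,
               metric_name: str, instance_id: str, value: Any) -> None:
    """Store a value under quality/<category>/<stage>/<metric>/<instance>.

    Intended for the "data" and "performance" sections, which share
    the same nesting in the schema.
    """
    stage = report["quality"][category].setdefault(stage_id, {})
    stage.setdefault(metric_name, {})[instance_id] = value


report = empty_report()
add_instance(report, "inst-1", "ML02", "REST", previous_instance=[])
add_metric(report, "data", "ML", "image_width", "inst-1", 200)
add_metric(report, "performance", "ML", "response_time", "inst-1", 0.12)
```

Keying each metric by instance identifier is what lets reports from several instances of the same stage be merged without overwriting each other.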

  • Attribute:

  • previous_report_instance: list of previous services
  • report_list: list of reports from previous services
  • previous_inference: list of previous inferences
  • quality_report: reports all quality aspects (data, service, and inference quality) of the service
  • execution_graph: reports the execution graph
  • report: the final report to be sent

  • Function:

  • __init__: initialize an empty report.
  • import_report_from_file: initialize the QoA Report from a JSON file.
  • importPReport: import reports from previous services to build the execution and inference graphs
  • build_execution_graph: build the execution graph from the list of previous reports
  • build_quality_report: build the quality report from metrics collected at runtime
  • generateReport: return the final report.
  • observeMetric: observe metrics at runtime in 3 categories: service quality, data quality, and inference quality. This can be extended to observe resource metrics.

Examples

https://github.com/rdsea/QoA4ML/tree/main/example

Overview

Class

Probes are integrated into a client program or system service to collect metrics at the edge. They generate reports and send them to a message broker through different connectors. A corresponding collector should be used to acquire the metrics.

Collector

The manager/orchestrator has to integrate a collector to gather metrics over different protocols for further analysis.

  • Attribute:
  • Function:

  • __init__: takes a configuration as a dict containing information about the data source, e.g., broker, channel, queue, etc. It can also take an object as the host attribute, to which messages are returned for further processing.

  • If the collector is created with a host object, the class of that object must implement the message_processing function to process the messages returned by the collector. Otherwise, the collector prints the messages to the console.

  • on_request: handles messages from the data source (message broker, ...)

  • start & stop: start and stop consuming messages

  • get_queue: return the queue name.
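The host mechanism described above can be sketched as follows. StubCollector and ReportStore are hypothetical in-memory stand-ins, not the library's collector classes: on_request is simplified to take only the message body, no broker is contacted, and the configuration key "queue_name" is an assumption.

```python
import json
from typing import Any, Dict, List, Optional


class ReportStore:
    """Example host: implements message_processing to keep the reports."""

    def __init__(self) -> None:
        self.reports: List[Dict[str, Any]] = []

    def message_processing(self, message: Dict[str, Any]) -> None:
        self.reports.append(message)


class StubCollector:
    """Hypothetical stand-in for a collector; no broker is involved."""

    def __init__(self, configuration: Dict[str, Any],
                 host: Optional[Any] = None) -> None:
        # "queue_name" is an assumed configuration key for this sketch.
        self.queue_name = configuration.get("queue_name", "")
        self.host = host

    def on_request(self, body: bytes) -> None:
        message = json.loads(body)
        if self.host is not None:
            # Hand the decoded message to the host for further processing.
            self.host.message_processing(message)
        else:
            # Without a host, fall back to printing on the console.
            print(message)

    def get_queue(self) -> str:
        return self.queue_name


store = ReportStore()
collector = StubCollector({"queue_name": "qoa_reports"}, host=store)
# Simulate one message arriving from the data source.
collector.on_request(b'{"quality": {"data": {}}}')
```

Passing the host in at construction time keeps the collector unaware of what happens to a message after delivery, so the same collector can feed storage, analysis, or logging code.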

Connector

Connectors are implemented with different protocols for sending reports. Example: sending a report to a message broker over AMQP/MQTT.

  • Attribute:
  • Function:

  • __init__: takes a configuration as a dict containing information about the data sink, e.g., broker, channel, queue, etc. It can take a bool parameter log to log messages for further processing.

  • send_data: sends data to the specified routing_key/queue together with a correlation key corr_id that allows the message to be traced back.
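The role of corr_id in send_data can be sketched with an in-memory stand-in. StubConnector is hypothetical and does not talk to a broker; the parameter names, the generated UUID fallback, and the outbox list are assumptions made for illustration. Only the configuration key "out_routing_key" is taken from the client configuration example.

```python
import json
import uuid
from typing import Any, Dict, List, Optional


class StubConnector:
    """Hypothetical stand-in for a connector; messages stay in memory."""

    def __init__(self, configuration: Dict[str, Any], log: bool = False) -> None:
        self.routing_key = configuration["out_routing_key"]
        self.log = log
        self.outbox: List[Dict[str, str]] = []

    def send_data(self, body_message: str, corr_id: Optional[str] = None,
                  routing_key: Optional[str] = None) -> str:
        # Generate a correlation id when the caller does not supply one.
        corr_id = corr_id or str(uuid.uuid4())
        envelope = {
            "routing_key": routing_key or self.routing_key,
            "corr_id": corr_id,
            "body": body_message,
        }
        self.outbox.append(envelope)
        if self.log:
            print(envelope)
        # Returning the id lets the sender match a later reply or log entry.
        return corr_id


connector = StubConnector({"out_routing_key": "qoa.report.ml"})
sent_id = connector.send_data(json.dumps({"image_width": 200}), corr_id="req-1")
auto_id = connector.send_data(json.dumps({"image_width": 300}))
```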

Utilities

A module that provides some frequently used functions and some functions to directly collect system metrics.

Note

  • The eva_duplicate, eva_erronous, eva_missing, and detect_outlier probes use the ydata-quality library, which is only available for Python 3.8.
  • To use the ML quality probes, you may need to install a few more dependencies, e.g., tensorflow and Pillow.
  • QoaClient uses the AMQP protocol by default. To use MQTT, you may need to install paho-mqtt.
  • To monitor Docker stats, you need to install the Docker Python client.
  • To connect with Prometheus, you need to install prometheus-client.