system_monitoring_probe

qoa4ml.probes.system_monitoring_probe

Classes

SystemMonitoringProbe

SystemMonitoringProbe is responsible for monitoring system resources and creating reports based on usage statistics.

Parameters:

Name         Type                  Description                                              Default
config       SystemProbeConfig     Configuration settings for the system monitoring probe.  required
connector    BaseConnector         Connector to send the report data.                       required
client_info  Optional[ClientInfo]  Information about the client.                            None

Attributes:

Name          Type               Description
config        SystemProbeConfig  The system monitoring probe configuration.
node_name     str                The name of the node being monitored.
environment   EnvironmentEnum    The environment in which the node is running.
cpu_metadata  dict               Metadata about the CPU.
gpu_metadata  dict               Metadata about the GPU.
mem_metadata  dict               Metadata about the memory.
metadata      dict               General metadata about the node.

Methods:

Name              Description
get_cpu_metadata  Get metadata about the CPU.
get_cpu_usage     Get the CPU usage of the system.
get_gpu_metadata  Get metadata about the GPU.
get_gpu_usage     Get the GPU usage of the system.
get_mem_metadata  Get metadata about the memory.
get_mem_usage     Get the memory usage of the system.
create_report     Create a JSON report based on system resource usage statistics.

Functions
__init__(config, connector, client_info=None)

Initialize an instance of SystemMonitoringProbe.

Parameters:

Name         Type                  Description                                              Default
config       SystemProbeConfig     Configuration settings for the system monitoring probe.  required
connector    BaseConnector         Connector to send the report data.                       required
client_info  Optional[ClientInfo]  Information about the client.                            None

create_report()

Create a JSON report based on system resource usage statistics.

Returns:

Type  Description
str   JSON-encoded report containing system resource usage statistics.

Notes
  • This method collects CPU, GPU, and memory usage stats for the system.
  • The report is generated differently depending on the environment: one path for HPC, another for all other environments.
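
As a rough illustration of the notes above, the sketch below assembles three usage dictionaries into a JSON string and branches on the environment. It is not the library's implementation: `build_report`, the key names, the `"hpc"` and `"edge"` environment strings, and both report layouts are hypothetical, chosen only to show the shape of the idea. The sample values are made up.

```python
import json


def build_report(node_name, environment, cpu_usage, gpu_usage, mem_usage):
    """Assemble usage dictionaries into a JSON-encoded string.

    Hypothetical helper: the key names and the HPC/non-HPC layouts below
    are illustrative assumptions, not the library's actual schema.
    """
    usage = {"cpu": cpu_usage, "gpu": gpu_usage, "mem": mem_usage}
    if environment == "hpc":
        # Assumed HPC layout: usage only, keyed by node name.
        report = {node_name: usage}
    else:
        # Assumed default layout: explicit node and environment fields.
        report = {
            "node_name": node_name,
            "environment": environment,
            "usage": usage,
        }
    return json.dumps(report)


# Made-up sample values standing in for the get_*_usage() results.
encoded = build_report(
    node_name="node-1",
    environment="edge",
    cpu_usage={"core_0": 12.5},
    gpu_usage={},
    mem_usage={"used": 2048.0},
)

# The return value is a plain string, so a consumer decodes it with json.loads.
decoded = json.loads(encoded)
```

Because `create_report()` returns a `str`, any consumer on the receiving side of the connector has to decode it before reading individual fields.
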
get_cpu_metadata()

Get metadata about the CPU.

Returns:

Type  Description
dict  Dictionary containing metadata about the CPU.

get_cpu_usage()

Get the CPU usage of the system.

Returns:

Type  Description
dict  Dictionary containing the CPU usage, expressed as a percentage.

get_gpu_metadata()

Get metadata about the GPU.

Returns:

Type  Description
dict  Dictionary containing metadata about the GPU.

get_gpu_usage()

Get the GPU usage of the system.

Returns:

Type  Description
dict  Dictionary containing the GPU usage information.

get_mem_metadata()

Get metadata about the memory.

Returns:

Type  Description
dict  Dictionary containing memory metadata in gigabytes.

get_mem_usage()

Get the memory usage of the system.

Returns:

Type  Description
dict  Dictionary containing the memory usage in megabytes.
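
Note that get_mem_metadata() reports gigabytes while get_mem_usage() reports megabytes, so values from the two need a common unit before they can be compared. The sketch below shows the conversion from a raw byte count. It assumes binary units (1024-based); whether the probe uses 1024 or 1000 is not stated here. The helper names and dictionary keys are hypothetical, and the byte counts are made up.

```python
BYTES_PER_MB = 1024 ** 2  # assumption: binary megabytes
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def bytes_to_mb(n_bytes):
    """Convert a byte count to megabytes (hypothetical helper)."""
    return n_bytes / BYTES_PER_MB


def bytes_to_gb(n_bytes):
    """Convert a byte count to gigabytes (hypothetical helper)."""
    return n_bytes / BYTES_PER_GB


total_bytes = 16 * 1024 ** 3  # made-up machine with 16 GB of memory
used_bytes = 512 * 1024 ** 2  # made-up amount of memory in use

# Hypothetical keys; the real dictionaries may be shaped differently.
mem_metadata = {"total_gb": bytes_to_gb(total_bytes)}
mem_usage = {"used_mb": bytes_to_mb(used_bytes)}

# Bring both values to megabytes before computing a ratio.
used_fraction = mem_usage["used_mb"] / (mem_metadata["total_gb"] * 1024)
```
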

Functions