dataquality_utils

`qoa4ml.utils.dataquality_utils` ¶

Classes¶

Functions¶

`eva_duplicate(data)` ¶

Evaluate and return the number and percentage of duplicate entries in the data.

Parameters:

data : numpy.ndarray or pandas.DataFrame
    Input data to be evaluated.

Returns:

dict or None
    A dictionary containing the following keys if successful:
    - DataQualityEnum.DUPLICATE_RATIO: Percentage of duplicate data.
    - DataQualityEnum.TOTAL_DUPLICATE: Total number of duplicate entries.
    Returns None if the input data type is unsupported or if an exception occurs.
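The relationship between the two returned values can be illustrated with a dependency-free sketch. This is not the library's implementation: `duplicate_stats` and its string keys are hypothetical stand-ins for `eva_duplicate` and the `DataQualityEnum` members, and it assumes that every repeat of an earlier row counts as one duplicate while the first occurrence does not.

```python
from collections import Counter


def duplicate_stats(rows):
    """Count rows that repeat an earlier row and express them as a percentage.

    Hypothetical helper: the keys below only mimic the documented result shape.
    """
    rows = [tuple(r) for r in rows]  # make rows hashable
    total = len(rows)
    # Assumption: for a row seen n times, n - 1 of them are duplicates.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    ratio = 100.0 * duplicates / total if total else 0.0
    return {"duplicate_ratio": ratio, "total_duplicate": duplicates}


result = duplicate_stats([(1, "a"), (2, "b"), (1, "a"), (1, "a")])
print(result)
```

With four rows of which two repeat an earlier one, the sketch reports 2 duplicates and a ratio of 50.0.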

`eva_erronous(data, errors=None)` ¶

Evaluate and return the number and percentage of erroneous data entries.

Parameters:

data : numpy.ndarray or pandas.DataFrame
    Input data to be evaluated.
errors : list, optional
    List of items considered as errors. If not provided, NaNs will be considered as errors.

Returns:

dict or None
    A dictionary containing the following keys if successful:
    - DataQualityEnum.TOTAL_ERRORS: Total number of errors.
    - DataQualityEnum.ERROR_RATIOS: Percentage of errors.
    Returns None if the input data type is unsupported or if an exception occurs.
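The effect of the `errors` argument can be sketched without the library. The helper `error_stats` and its string keys are hypothetical, and the sketch assumes that when `errors` is given, only the listed items count as errors (NaNs are then ignored); the source does not state how the two cases combine.

```python
import math


def is_nan(value):
    """True only for float NaN; other types are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def error_stats(values, errors=None):
    """Hypothetical stand-in showing the documented counting rule."""
    total = len(values)
    if errors is None:
        # Default documented behaviour: NaNs are the errors.
        n_errors = sum(1 for v in values if is_nan(v))
    else:
        # Assumption: only items listed in `errors` are counted.
        n_errors = sum(1 for v in values if v in errors)
    ratio = 100.0 * n_errors / total if total else 0.0
    return {"total_errors": n_errors, "error_ratios": ratio}


readings = [1.0, -999, 3.5, "N/A", float("nan")]
default_result = error_stats(readings)
sentinel_result = error_stats(readings, errors=[-999, "N/A"])
print(default_result, sentinel_result)
```

Passing sentinel values such as `-999` or `"N/A"` is a typical use: the same five readings yield one error by default and two errors with the explicit list.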

`eva_input_file_type(input_file, allowed_data_type)` ¶

Check whether the content type of the input file matches any of the allowed data types.

Parameters:

input_file : UploadFile
    The uploaded file object to be checked for data type.
allowed_data_type : List[str]
    A list of allowed data types to compare against the content type of the input file.

Returns:

bool True if the content type of the input file is in the list of allowed data types, otherwise False.
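The check described here amounts to a membership test on the file's content type. The sketch below uses a hypothetical `FakeUpload` dataclass in place of a real `UploadFile`, assuming only that the object exposes a `content_type` attribute; `is_allowed` is likewise an illustrative name, not the library function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeUpload:
    """Hypothetical stand-in for an uploaded file object."""
    filename: str
    content_type: str


def is_allowed(input_file, allowed_data_type: List[str]) -> bool:
    """Return True when the file's content type is in the allowed list."""
    return input_file.content_type in allowed_data_type


allowed = ["image/png", "image/jpeg"]
png_ok = is_allowed(FakeUpload("cat.png", "image/png"), allowed)
csv_ok = is_allowed(FakeUpload("data.csv", "text/csv"), allowed)
print(png_ok, csv_ok)
```

Note that the comparison is against the declared content type, not the file extension or the actual bytes, so it is only as trustworthy as the client that set the header.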

`eva_missing(data, null_count=True, correlations=False, predict=False)` ¶

Evaluate and return statistics about missing data in the dataset.

Parameters:

data : numpy.ndarray or pandas.DataFrame
    Input data to be evaluated.
null_count : bool, default=True
    If True, return the count of missing values in each column.
correlations : bool, default=False
    If True, return the correlation matrix of missing values.
predict : bool, default=False
    If True, enable missing data prediction (not implemented).

Returns:

dict or None
    A dictionary containing:
    - DataQualityEnum.NULL_COUNT: Count of missing values (if null_count is True).
    - DataQualityEnum.NULL_CORRELATIONS: Correlation matrix of missing values (if correlations is True).
    Returns None if the input data type is unsupported or if an exception occurs.
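To make the two statistics concrete, here is a dependency-free sketch of a per-column null count and of one entry of a missingness correlation matrix. The helper names are hypothetical, and the reading of "correlation matrix of missing values" as the Pearson correlation between per-column missing/not-missing indicators is an assumption, not something the source confirms.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_counts(columns):
    """Map each column name to its number of missing values."""
    return {name: sum(is_missing(v) for v in values)
            for name, values in columns.items()}


def missing_correlation(col_a, col_b):
    """Pearson correlation of two missingness indicators, or None if undefined."""
    a = [1.0 if is_missing(v) else 0.0 for v in col_a]
    b = [1.0 if is_missing(v) else 0.0 for v in col_b]
    if not a:
        return None
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # an indicator that never changes has no variance
    return cov / math.sqrt(var_a * var_b)


table = {
    "temp": [None, None, 3.0, 4.0],
    "humidity": [None, None, 0.3, float("nan")],
    "id": [1, 2, 3, 4],
}
counts = null_counts(table)
corr = missing_correlation(table["temp"], table["humidity"])
print(counts, corr)
```

A positive value indicates that the two columns tend to be missing in the same rows. A column with no missing values (such as `id` here) has a constant indicator, so its correlation with any other column is undefined and the sketch returns `None` for it.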

`eva_none(data)` ¶

Evaluate and return statistics about valid and None (NaN) values in the dataset.

Parameters:

data : numpy.ndarray or pandas.DataFrame
    Input data to be evaluated.

Returns:

dict or None
    A dictionary containing the following keys if successful:
    - DataQualityEnum.TOTAL_VALID: Total count of valid (non-NaN) entries.
    - DataQualityEnum.TOTAL_NONE: Total count of None (NaN) entries.
    - DataQualityEnum.NONE_RATIO: Percentage of None/NaN entries (100 * none_count / total; 0.0 when the dataset is empty). The field name is authoritative: previous versions accidentally computed the valid ratio.
    Returns None if the input data type is unsupported or if an exception occurs.
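The ratio formula stated above (100 * none_count / total, with 0.0 for an empty dataset) can be checked with a small sketch. `none_stats` and its string keys are hypothetical stand-ins for `eva_none` and the `DataQualityEnum` members; treating both `None` and float NaN as "none" is an assumption of this sketch.

```python
import math


def none_stats(values):
    """Hypothetical helper reproducing the documented NONE_RATIO arithmetic."""
    total = len(values)
    none_count = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    # Ratio of none entries (not valid entries); 0.0 guards the empty case.
    none_ratio = 100.0 * none_count / total if total else 0.0
    return {
        "total_valid": total - none_count,
        "total_none": none_count,
        "none_ratio": none_ratio,
    }


stats = none_stats([1.0, None, float("nan"), 4.0, 5.0])
empty_stats = none_stats([])
print(stats, empty_stats)
```

Two of the five entries are None/NaN, so the ratio is 40.0, while the valid ratio would be 60.0. Comparing the two is a quick way to tell whether a given installation still has the older behaviour mentioned above.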

`image_quality(input_image)` ¶

Assess various quality metrics of an input image.

Parameters:

Name	Type	Description	Default
`input_image`	`bytes or ndarray`	The input image in either byte format or as a numpy array.	required

Returns:

Type	Description
`dict`	A dictionary keyed by `ImageQualityNameEnum` with: - `image_size`: tuple `(width, height)`. - `color_mode`: PIL color mode (e.g. `"RGB"`). - `color_channel`: number of color channels.

Raises:

Type	Description
`TypeError`	If `input_image` is neither `bytes` nor `numpy.ndarray`.
