dataquality_utils
qoa4ml.utils.dataquality_utils
Functions
eva_duplicate(data)
Evaluate and return the number and percentage of duplicate entries in the data.
Parameters:
data : numpy.ndarray or pandas.DataFrame
    Input data to be evaluated.
Returns:
dict or None
    A dictionary containing the following keys if successful:
    - DataQualityEnum.duplicate_ratio: Percentage of duplicate data.
    - DataQualityEnum.total_duplicate: Total number of duplicate entries.
    Returns None if the input data type is unsupported or if an exception occurs.
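To make the two returned quantities concrete, the following is a minimal standard-library sketch, not the qoa4ml implementation. `duplicate_stats` is a hypothetical helper, the plain string keys stand in for the `DataQualityEnum` members, and the sketch assumes that a duplicate is any row repeating an earlier row and that the ratio is expressed as a fraction of all rows.

```python
from collections import Counter


def duplicate_stats(rows):
    """Hypothetical helper: count rows that repeat an earlier row."""
    counts = Counter(tuple(row) for row in rows)
    # Every occurrence beyond the first is counted as one duplicate.
    total_duplicate = sum(n - 1 for n in counts.values())
    # Assumption: the ratio is duplicates divided by the total row count.
    ratio = total_duplicate / len(rows) if rows else 0.0
    return {"total_duplicate": total_duplicate, "duplicate_ratio": ratio}


rows = [(1, "a"), (2, "b"), (1, "a"), (1, "a")]
stats = duplicate_stats(rows)
print(stats)
```

Whether the library reports the ratio as a fraction or scaled to 100 is not stated here, so check the returned value before comparing it against a threshold.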
eva_erronous(data, errors=None)
Evaluate and return the number and percentage of erroneous data entries.
Parameters:
data : numpy.ndarray or pandas.DataFrame
    Input data to be evaluated.
errors : list, optional
    List of items considered as errors. If not provided, NaNs will be considered as errors.
Returns:
dict or None
    A dictionary containing the following keys if successful:
    - DataQualityEnum.total_errors: Total number of errors.
    - DataQualityEnum.error_ratios: Percentage of errors.
    Returns None if the input data type is unsupported or if an exception occurs.
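The effect of the optional `errors` list can be illustrated with a small standard-library sketch. It is an assumption-laden stand-in rather than the library's code: `error_stats` is a hypothetical helper, the string keys stand in for the `DataQualityEnum` members, and it assumes that when `errors` is given only the listed items count as errors.

```python
import math


def error_stats(values, errors=None):
    """Hypothetical helper mirroring the documented `errors` parameter."""

    def is_error(value):
        if errors is None:
            # Default case from the description: NaN counts as an error.
            return isinstance(value, float) and math.isnan(value)
        # Assumption: with an explicit list, only listed items are errors.
        return value in errors

    total = sum(1 for value in values if is_error(value))
    ratio = total / len(values) if values else 0.0
    return {"total_errors": total, "error_ratios": ratio}


readings = [1.0, float("nan"), -999, 3.5]
default_stats = error_stats(readings)
custom_stats = error_stats(readings, errors=[-999, 3.5])
print(default_stats, custom_stats)
```

Sentinel values such as `-999` are a common reason to pass an explicit list, since they would otherwise be treated as valid numbers.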
eva_input_file_type(input_file, allowed_data_type)
Check whether the content type of the input file matches any of the allowed data types.
Parameters:
input_file : UploadFile
    The uploaded file object to be checked for data type.
allowed_data_type : List[str]
    A list of allowed data types to compare against the content type of the input file.
Returns:
bool
    True if the content type of the input file is in the list of allowed data types, otherwise False.
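The check described above amounts to a membership test on the file's content type. The sketch below shows that idea without any web framework: `FakeUpload` and `is_allowed_type` are hypothetical names, and the `content_type` attribute is an assumption about what the real upload object exposes.

```python
from dataclasses import dataclass


@dataclass
class FakeUpload:
    """Stand-in for an uploaded file; only `content_type` is modelled."""

    content_type: str


def is_allowed_type(input_file, allowed_data_type):
    # True only when the file's content type appears in the allow-list.
    return input_file.content_type in allowed_data_type


allowed = ["image/png", "image/jpeg"]
png_ok = is_allowed_type(FakeUpload("image/png"), allowed)
csv_ok = is_allowed_type(FakeUpload("text/csv"), allowed)
print(png_ok, csv_ok)
```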
eva_missing(data, null_count=True, correlations=False, predict=False)
Evaluate and return statistics about missing data in the dataset.
Parameters:
data : numpy.ndarray or pandas.DataFrame
    Input data to be evaluated.
null_count : bool, default=True
    If True, return the count of missing values in each column.
correlations : bool, default=False
    If True, return the correlation matrix of missing values.
predict : bool, default=False
    If True, enable missing data prediction (not implemented).
Returns:
dict or None
    A dictionary containing:
    - DataQualityEnum.null_count: Count of missing values (if null_count is True).
    - DataQualityEnum.null_correlations: Correlation matrix of missing values (if correlations is True).
    Returns None if the input data type is unsupported or if an exception occurs.
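As a rough picture of the per-column null count, here is a standard-library sketch. It is not how the library computes the result: `null_count_by_column` is a hypothetical helper, the table is modelled as a plain dict of column lists, and both `None` and NaN are assumed to count as missing. The correlation matrix is left out.

```python
import math


def null_count_by_column(table):
    """Hypothetical helper: `table` maps column name -> list of values."""

    def is_missing(value):
        # Assumption: both None and float NaN are treated as missing.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {
        name: sum(1 for value in column if is_missing(value))
        for name, column in table.items()
    }


table = {
    "temperature": [21.5, None, 19.0],
    "humidity": [float("nan"), None, 40.0],
    "label": ["ok", "ok", "ok"],
}
counts = null_count_by_column(table)
print(counts)
```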
eva_none(data)
Evaluate and return statistics about valid and None (NaN) values in the dataset.
Parameters:
data : numpy.ndarray or pandas.DataFrame
    Input data to be evaluated.
Returns:
dict or None
    A dictionary containing the following keys if successful:
    - DataQualityEnum.total_valid: Total count of valid (non-NaN) entries.
    - DataQualityEnum.total_none: Total count of None (NaN) entries.
    - DataQualityEnum.none_ratio: Percentage of valid entries.
    Returns None if the input data type is unsupported or if an exception occurs.
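The counts can be sketched with the standard library as follows. `none_stats` is a hypothetical helper, not the library function. Because `none_ratio` is described in terms of valid entries, the sketch reports both fractions under explicit names of its own (`valid_fraction`, `none_fraction`) instead of assuming which one the library returns.

```python
import math


def none_stats(values):
    """Hypothetical helper: tally valid and None/NaN entries in a flat list."""

    def is_none(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    size = len(values)
    total_none = sum(1 for value in values if is_none(value))
    total_valid = size - total_none
    return {
        "total_valid": total_valid,
        "total_none": total_none,
        # Both fractions are shown; which one maps to none_ratio is not assumed.
        "valid_fraction": total_valid / size if size else 0.0,
        "none_fraction": total_none / size if size else 0.0,
    }


stats = none_stats([1.0, float("nan"), 2.0, 4.0])
print(stats)
```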
image_quality(input_image)
Assess various quality metrics of an input image.
Parameters:
input_image : bytes or numpy.ndarray
    The input image in either byte format or as a numpy array.
Returns:
dict
    A dictionary containing the following keys:
    - ImageQualityNameEnum.image_size: The size of the image (width, height).
    - ImageQualityNameEnum.color_mode: The color mode of the image (e.g., 'RGB').
    - ImageQualityNameEnum.color_channel: The number of color channels in the image.