Quality Control
The QualityControl
class provides comprehensive data quality assessment and outlier detection methods for aerosol
instrument data.
Overview
Quality control is essential for ensuring data reliability in aerosol measurements. The QualityControl class offers various statistical methods to identify and handle outliers, invalid measurements, and data quality issues.
Key Features
- Multiple Outlier Detection Methods - N-sigma, IQR, time-aware rolling methods
- Flexible Thresholds - Configurable parameters for different data types
- Time-Aware Processing - Considers temporal patterns in data
- Status Code Filtering - Handles instrument-specific error codes
- Data Completeness Checks - Validates temporal coverage requirements
Methods
Statistical Outlier Detection
N-Sigma Method
from AeroViz.rawDataReader.core.qc import QualityControl
qc = QualityControl()
cleaned_data = qc.n_sigma(df, std_range=3)
Interquartile Range (IQR) Method
# Basic IQR
cleaned_data = qc.iqr(df)
# With log transformation
cleaned_data = qc.iqr(df, log_dist=True)
Time-Aware Rolling IQR
# Rolling IQR with time awareness
cleaned_data = qc.time_aware_rolling_iqr(
df,
window_size='24h',
iqr_factor=3.0,
min_periods=5
)
Advanced Quality Control
Bidirectional Trend Analysis
# Detect outliers while considering data trends
outlier_mask = qc.bidirectional_trend_std_QC(
df,
window_size='6h',
std_factor=3.0,
trend_window='30min',
trend_factor=2.0
)
Status Code Filtering
# Filter based on instrument status codes
error_mask = qc.filter_error_status(
df,
error_codes=[1, 2, 4, 16],
special_codes=[384, 1024]
)
Completeness Validation
# Check hourly data completeness
completeness_mask = qc.hourly_completeness_QC(
df,
freq='6min',
threshold=0.75 # Require 75% data availability
)
Usage Examples
Basic Outlier Removal
from AeroViz.rawDataReader.core.qc import QualityControl
import pandas as pd
# Load your data
df = pd.read_csv('instrument_data.csv', index_col=0, parse_dates=True)
# Initialize QC
qc = QualityControl()
# Apply basic outlier detection
cleaned_df = qc.n_sigma(df, std_range=3)
Advanced Time-Aware Processing
# For time series with trends
cleaned_df = qc.time_aware_rolling_iqr(
df,
window_size='12h',
iqr_factor=2.5,
min_periods=10
)
# With bidirectional trend consideration
outlier_mask = qc.bidirectional_trend_std_QC(
df,
window_size='6h',
std_factor=3.0,
trend_window='1h'
)
# Apply the mask
final_df = df.where(~outlier_mask, np.nan)
Instrument-Specific QC
# For instruments with status codes (e.g., AE33)
error_mask = qc.filter_error_status(
df,
error_codes=[1, 2, 4, 16, 32], # Common error codes
special_codes=[384, 1024, 2048] # Specific error conditions
)
# Remove error data
qc_df = df.where(~error_mask, np.nan)
API Reference
AeroViz.rawDataReader.core.qc.QualityControl
A class providing various methods for data quality control and outlier detection
Functions
_ensure_dataframe
staticmethod
Ensure input data is in DataFrame format
_transform_if_log
staticmethod
Transform data to log scale if required
n_sigma
classmethod
Detect outliers using n-sigma method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input data |
required |
std_range
|
int
|
Number of standard deviations to use as threshold |
5
|
Returns:
Type | Description |
---|---|
DataFrame
|
Cleaned DataFrame with outliers masked as NaN |
iqr
classmethod
Detect outliers using Interquartile Range (IQR) method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input data |
required |
log_dist
|
bool
|
Whether to apply log transformation to data |
False
|
Returns:
Type | Description |
---|---|
DataFrame
|
Cleaned DataFrame with outliers masked as NaN |
rolling_iqr
classmethod
Detect outliers using rolling window IQR method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input data |
required |
window_size
|
int
|
Size of the rolling window |
24
|
log_dist
|
bool
|
Whether to apply log transformation to data |
False
|
Returns:
Type | Description |
---|---|
DataFrame
|
Cleaned DataFrame with outliers masked as NaN |
time_aware_iqr
classmethod
Detect outliers using time-aware IQR method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input data |
required |
time_window
|
str
|
Time window size (e.g., '1D' for one day) |
'1D'
|
log_dist
|
bool
|
Whether to apply log transformation to data |
False
|
Returns:
Type | Description |
---|---|
DataFrame
|
Cleaned DataFrame with outliers masked as NaN |
mad_iqr_hybrid
classmethod
Detect outliers using a hybrid of MAD and IQR methods
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input data |
required |
mad_threshold
|
float
|
Threshold for MAD method |
3.5
|
log_dist
|
bool
|
Whether to apply log transformation to data |
False
|
Returns:
Type | Description |
---|---|
DataFrame
|
Cleaned DataFrame with outliers masked as NaN |
options: show_source: false show_bases: false show_inheritance_diagram: false members_order: source show_if_no_docstring: false filters:
- "!^_" docstring_section_style: table heading_level: 3 show_signature_annotations: true separate_signature: true group_by_category: true show_category_heading: true
Related Documentation
- AbstractReader - Base class that uses QualityControl methods
- RawDataReader - Factory function with built-in QC options
- Instrument Documentation - Instrument-specific QC procedures