AbstractReader
The AbstractReader class is the foundation of AeroViz's data reading system, providing a standardized interface for
reading and processing aerosol instrument data.
Core Architecture
AbstractReader serves as the base class for all instrument-specific readers in AeroViz. It defines the common interface and provides shared functionality for data processing, quality control, and output formatting.
Overview
The AbstractReader implements a consistent workflow for all aerosol instruments:
- Data Ingestion - Read raw instrument files
- Format Detection - Automatically identify data structure
- Quality Control - Apply built-in validation and filtering
- Standardization - Convert to unified output format
- Metadata Handling - Preserve instrument and measurement metadata
Usage Pattern
While you can subclass AbstractReader directly for a custom instrument, readers are typically accessed through the RawDataReader factory function, which
automatically selects the appropriate reader based on your instrument type.
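The selection step can be pictured as a lookup from instrument name to reader class. The sketch below is only an illustration of that idea, not AeroViz's actual implementation: `AE33Sketch`, `MyInstrumentSketch`, `READERS`, and `select_reader` are hypothetical names.

```python
class AE33Sketch:
    """Hypothetical stand-in for an instrument-specific reader."""
    nam = "AE33"


class MyInstrumentSketch:
    """Hypothetical stand-in for a second reader."""
    nam = "MyInstrument"


# Illustrative registry: instrument name -> reader class.
READERS = {cls.nam: cls for cls in (AE33Sketch, MyInstrumentSketch)}


def select_reader(instrument):
    """Return the reader class registered under the given instrument name."""
    try:
        return READERS[instrument]
    except KeyError:
        raise ValueError(f"Unsupported instrument: {instrument!r}") from None


reader_cls = select_reader("AE33")
```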
Key Features
- Flexible Input Handling - Supports various file formats and structures
- Built-in Quality Control - Configurable data validation and filtering
- Metadata Preservation - Maintains instrument configuration and measurement context
- Extensible Design - Easy to subclass for new instruments
- Error Handling - Robust error reporting and recovery
Implementation Note
AbstractReader is an abstract base class. For actual data reading, use instrument-specific implementations or the
RawDataReader factory function.
API Reference
AeroViz.rawDataReader.core.AbstractReader
Bases: ABC
Abstract class for reading raw data from different instruments.
This class serves as a base class for reading raw data from various instruments. Each instrument
should have a separate class that inherits from this class and implements the abstract methods.
The abstract methods are _raw_reader and _QC.
The class handles file management, including reading from and writing to pickle files, and implements quality control measures. It can process data in both batch and streaming modes.
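The effect of declaring _raw_reader and _QC as abstract can be shown with a small stand-in class. This is a sketch of the standard `abc` behavior only: `SketchReader`, `IncompleteReader`, and `CompleteReader` are hypothetical and model nothing of AbstractReader beyond the two abstract method names.

```python
from abc import ABC, abstractmethod


class SketchReader(ABC):
    """Hypothetical stand-in; only the abstract contract is modeled."""

    @abstractmethod
    def _raw_reader(self, file):
        """Read one raw file and return its contents."""

    @abstractmethod
    def _QC(self, df):
        """Return quality-controlled data."""


class IncompleteReader(SketchReader):
    # _QC is not implemented, so this class is still abstract.
    def _raw_reader(self, file):
        return []


class CompleteReader(SketchReader):
    def _raw_reader(self, file):
        return []

    def _QC(self, df):
        return df


try:
    IncompleteReader()
    instantiated = True
except TypeError:
    # ABC refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False

reader = CompleteReader()
```

A subclass that implements both methods instantiates normally; one that omits either fails at construction time rather than later during processing.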
Attributes:
| Name | Type | Description |
|---|---|---|
| nam | str | Name identifier for the reader class |
| path | Path | Path to the raw data files |
| meta | dict | Metadata configuration for the instrument |
| logger | ReaderLogger | Custom logger instance for the reader |
| reset | bool | Flag to indicate whether to reset existing processed data |
| append | bool | Flag to indicate whether to append new data to existing processed data |
| qc | bool or str | Quality control settings |
| qc_freq | str or None | Frequency for quality control calculations |
Initialize the AbstractReader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path or str | Path to the directory containing raw data files | required |
| reset | bool or str | If True, forces re-reading of raw data. If 'append', appends new data to existing processed data. | False |
| qc | bool or str | If True, performs quality control. If str, specifies the frequency for QC calculations. | True |
| **kwargs | dict | Additional keyword arguments. log_level (str): logging level for the reader. suppress_warnings (bool): if True, suppresses warning messages. | {} |
Notes
Creates necessary output directories and initializes logging system. Sets up paths for pickle files, CSV files, and report outputs.
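A minimal sketch of this kind of setup, using `pathlib`. The folder name and file names below are assumptions chosen for illustration and do not reflect AeroViz's actual output layout; `prepare_output_paths` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def prepare_output_paths(raw_dir, name):
    """Create an output folder under raw_dir and return the target file paths.

    The folder and file names are hypothetical, not AeroViz's real layout.
    """
    out_dir = Path(raw_dir) / f"{name}_outputs"
    out_dir.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return {
        "pickle": out_dir / f"{name}.pkl",
        "csv": out_dir / f"{name}.csv",
        "report": out_dir / "report.json",
    }


with tempfile.TemporaryDirectory() as tmp:
    paths = prepare_output_paths(tmp, "MyInstrument")
    created = paths["pickle"].parent.is_dir()
```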
Attributes
logger
instance-attribute
Functions
_QC
abstractmethod
Abstract method for quality control processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame containing raw data | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Quality controlled data with QC_Flag column |
Notes
Must be implemented by child classes to handle instrument-specific QC. This method should only check raw data quality (status, range, completeness). Derived parameter validation should be done in _process().
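As an illustration of a raw-data check, the sketch below flags missing and out-of-range values and writes the result to a QC_Flag column. The column name `BC`, the limits, and the flag labels `Missing` and `Out of Range` are assumptions made for the example; only `QC_Flag` and the `Valid` label appear in this documentation.

```python
import pandas as pd


def qc_range_check(df, column, lower, upper):
    """Flag rows whose raw value is missing or outside [lower, upper]."""
    out = df.copy()
    out["QC_Flag"] = "Valid"
    missing = out[column].isna()
    out_of_range = ~missing & ~out[column].between(lower, upper)
    out.loc[missing, "QC_Flag"] = "Missing"            # hypothetical label
    out.loc[out_of_range, "QC_Flag"] = "Out of Range"  # hypothetical label
    return out


raw = pd.DataFrame(
    {"BC": [120.0, -5.0, None, 800.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="1h"),
)
checked = qc_range_check(raw, "BC", lower=0, upper=500)
```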
__call__
Process data for a specified time range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start | datetime | Start time for data processing | required |
| end | datetime | End time for data processing | required |
| mean_freq | str | Frequency for resampling the data | '1h' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Processed and resampled data for the specified time range |
Notes
The processed data is also saved to a CSV file.
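The time-range selection and resampling step can be approximated with plain pandas. This sketch assumes the resampled value is the mean of each bin, as the parameter name mean_freq suggests; `slice_and_resample` is a hypothetical helper and the CSV export is left out.

```python
import pandas as pd


def slice_and_resample(df, start, end, mean_freq="1h"):
    """Select rows in [start, end] and average them per mean_freq bin."""
    window = df.loc[start:end]
    return window.resample(mean_freq).mean()


index = pd.date_range("2024-01-01 00:00", periods=6, freq="30min")
data = pd.DataFrame({"BC": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]}, index=index)
hourly = slice_and_resample(
    data,
    pd.Timestamp("2024-01-01 00:00"),
    pd.Timestamp("2024-01-01 01:59"),
)
```

Here the four half-hourly rows inside the window collapse into two hourly means.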
_generate_report
Calculate and log data quality rates for different time periods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw_data | DataFrame | Raw data before quality control | required |
| qc_data | DataFrame | Data after quality control | required |
| qc_flag | Series | QC flag series indicating validity of each row | None |
Notes
Calculates rates for specified QC frequency if set. Updates the quality report with calculated rates.
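One simple rate that could be reported per period is the percentage of rows flagged as valid. The sketch below computes that with pandas; the exact rates AeroViz calculates are not specified here, so `valid_rate_by_period` and the `Invalid` label are assumptions.

```python
import pandas as pd


def valid_rate_by_period(qc_flag, freq="D"):
    """Return the percentage of rows flagged 'Valid' in each period."""
    is_valid = (qc_flag == "Valid").astype(float)
    return is_valid.resample(freq).mean() * 100


index = pd.date_range("2024-01-01", periods=8, freq="6h")
flags = pd.Series(
    ["Valid", "Valid", "Invalid", "Valid", "Invalid", "Invalid", "Valid", "Valid"],
    index=index,
)
rates = valid_rate_by_period(flags, freq="D")
```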
_outlier_process
Process outliers in the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| _df | DataFrame | Input DataFrame containing potential outliers | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with outliers processed |
Notes
Implementation depends on specific instrument requirements.
_process
Process data to calculate derived parameters.
This method is called after _QC() to calculate instrument-specific derived parameters (e.g., absorption coefficients, AAE, SAE).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Quality-controlled DataFrame with QC_Flag column | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with derived parameters added and QC_Flag updated |
Notes
Default implementation returns the input unchanged. Override in child classes to implement instrument-specific processing.
The method should:
1. Skip calculation for rows where QC_Flag != 'Valid' (optional optimization)
2. Calculate derived parameters
3. Validate derived parameters and update QC_Flag if invalid
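The three steps can be sketched for one derived parameter, the absorption Ångström exponent (AAE), computed from absorption at two wavelengths as AAE = -ln(b1/b2) / ln(λ1/λ2). The column names, wavelengths, accepted AAE range, and the `Invalid AAE` label are all assumptions for illustration, not values taken from AeroViz.

```python
import numpy as np
import pandas as pd


def process_aae(df, col_short="abs_400", col_long="abs_800",
                wl_short=400.0, wl_long=800.0):
    """Add an AAE column and flag rows whose AAE is implausible."""
    out = df.copy()
    out["AAE"] = np.nan
    # 1. Skip rows that already failed raw-data QC.
    valid = out["QC_Flag"] == "Valid"
    # 2. Calculate the derived parameter for the remaining rows.
    ratio = out.loc[valid, col_short] / out.loc[valid, col_long]
    out.loc[valid, "AAE"] = -np.log(ratio) / np.log(wl_short / wl_long)
    # 3. Validate the result (hypothetical limits) and update QC_Flag.
    bad = valid & ~out["AAE"].between(0.5, 3.0)
    out.loc[bad, "QC_Flag"] = "Invalid AAE"
    return out


frame = pd.DataFrame(
    {
        "abs_400": [20.0, 10.0, 20.0],
        "abs_800": [10.0, 10.0, 10.0],
        "QC_Flag": ["Valid", "Valid", "Out of Range"],
    }
)
processed = process_aae(frame)
```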
_raw_reader
abstractmethod
Abstract method to read raw data files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file | Path or str | Path to the raw data file | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Raw data read from the file |
Notes
Must be implemented by child classes to handle specific file formats.
_read_raw_files
Read and process raw data files.
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame \| None, DataFrame \| None] | Tuple of (raw data DataFrame or None, quality controlled DataFrame or None) |
Notes
Handles file reading and initial processing.
_run
Main execution method for data processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_start | datetime | Start time for processing | required |
| user_end | datetime | End time for processing | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Processed data for the specified time range |
Notes
Coordinates the entire data processing workflow.
_save_data
Save processed data to files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw_data | DataFrame | Raw data to save | required |
| qc_data | DataFrame | Quality controlled data to save | required |
Notes
Saves data in both pickle and CSV formats.
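Writing the same DataFrame in both formats is straightforward with pandas. In this sketch, `save_both` and the file stem are hypothetical; the pickle keeps dtypes and the index exactly, while the CSV is the human-readable copy.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_both(df, out_dir, stem):
    """Write df as <stem>.pkl and <stem>.csv inside out_dir."""
    out_dir = Path(out_dir)
    pkl_path = out_dir / f"{stem}.pkl"
    csv_path = out_dir / f"{stem}.csv"
    df.to_pickle(pkl_path)
    df.to_csv(csv_path)
    return pkl_path, csv_path


frame = pd.DataFrame(
    {"BC": [1.0, 2.0]},
    index=pd.date_range("2024-01-01", periods=2, freq="1h"),
)
with tempfile.TemporaryDirectory() as tmp:
    pkl_path, csv_path = save_both(frame, tmp, "qc_data")
    restored = pd.read_pickle(pkl_path)
    round_trip_ok = restored.equals(frame)
    csv_written = csv_path.exists()
```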
_timeIndex_process
Process time index of the DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| _df | DataFrame | Input DataFrame to process | required |
| user_start | datetime | User-specified start time | None |
| user_end | datetime | User-specified end time | None |
| append_df | DataFrame | DataFrame to append to | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with processed time index |
Notes
Handles time range filtering and data appending.
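A rough sketch of filtering and appending on a time index. It assumes that, where old and new data overlap, the newly read rows should win; that rule and the helper name `tidy_time_index` are assumptions rather than documented behavior.

```python
import pandas as pd


def tidy_time_index(df, start=None, end=None, append_df=None):
    """Combine with earlier data, then de-duplicate, sort, and trim the index."""
    if append_df is not None:
        df = pd.concat([append_df, df])
    # Keep the last occurrence so newly read rows replace older ones.
    df = df[~df.index.duplicated(keep="last")].sort_index()
    return df.loc[start:end]


old = pd.DataFrame(
    {"BC": [1.0, 2.0]},
    index=pd.date_range("2024-01-01 00:00", periods=2, freq="1h"),
)
new = pd.DataFrame(
    {"BC": [20.0, 3.0, 4.0]},
    index=pd.date_range("2024-01-01 01:00", periods=3, freq="1h"),
)
result = tidy_time_index(
    new,
    start=pd.Timestamp("2024-01-01 00:00"),
    end=pd.Timestamp("2024-01-01 02:00"),
    append_df=old,
)
```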
progress_reading
Context manager for tracking file reading progress.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| files | list | List of files to process | required |
Yields:
| Type | Description |
|---|---|
| Progress | Progress bar object for tracking |
Notes
Uses rich library for progress display.
reorder_dataframe_columns
staticmethod
Reorder DataFrame columns according to specified lists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame | required |
| order_lists | list[list] | Lists specifying column order | required |
| keep_others | bool | If True, keeps unspecified columns at the end | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with reordered columns |
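The behavior described by these parameters could look like the sketch below. It assumes that names absent from the DataFrame are silently skipped; that detail and the helper name `reorder_columns` are assumptions, not the library's confirmed behavior.

```python
import pandas as pd


def reorder_columns(df, order_lists, keep_others=False):
    """Place columns in the order given by order_lists, skipping absent names."""
    ordered = []
    for names in order_lists:
        for name in names:
            if name in df.columns and name not in ordered:
                ordered.append(name)
    if keep_others:
        # Unspecified columns follow in their original order.
        ordered += [c for c in df.columns if c not in ordered]
    return df[ordered]


frame = pd.DataFrame({"c": [1], "a": [2], "b": [3], "d": [4]})
strict = reorder_columns(frame, [["a", "b"], ["c", "z"]])
lenient = reorder_columns(frame, [["a", "b"], ["c", "z"]], keep_others=True)
```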
update_qc_flag
staticmethod
Update QC_Flag column for rows matching the mask.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame with QC_Flag column | required |
| mask | Series | Boolean mask indicating rows to flag | required |
| flag_name | str | Name of the flag to add | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with updated QC_Flag column |
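A sketch of mask-based flagging. How multiple flags are combined is not stated here, so this example assumes that a new flag replaces 'Valid' and is appended with a comma to any existing flag; `add_flag` and the flag labels are hypothetical.

```python
import pandas as pd


def add_flag(df, mask, flag_name):
    """Return a copy of df with flag_name recorded on the masked rows."""
    out = df.copy()
    was_valid = out["QC_Flag"] == "Valid"
    replace_rows = mask & was_valid
    append_rows = mask & ~was_valid
    # Rows that were valid take the new flag outright.
    out.loc[replace_rows, "QC_Flag"] = flag_name
    # Rows that already carry a flag keep it and gain the new one.
    out.loc[append_rows, "QC_Flag"] = (
        out.loc[append_rows, "QC_Flag"] + ", " + flag_name
    )
    return out


frame = pd.DataFrame({"QC_Flag": ["Valid", "Valid", "Status Error"]})
mask = pd.Series([False, True, True])
flagged = add_flag(frame, mask, "Out of Range")
```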
Related Documentation
- RawDataReader Factory - High-level interface for instrument data reading
- Quality Control - Data validation and filtering options
- Supported Instruments - Available instrument implementations
Quick Example
from datetime import datetime

from AeroViz import RawDataReader

# Using the factory function (recommended)
data = RawDataReader(
    instrument='AE33',
    path='/path/to/data',
    start=datetime(2024, 1, 1),
    end=datetime(2024, 12, 31)
)

# Direct usage (advanced - for custom implementations)
from AeroViz.rawDataReader.core import AbstractReader


class MyInstrumentReader(AbstractReader):
    nam = 'MyInstrument'

    def _raw_reader(self, file):
        # Custom file reading logic; a real implementation must return a DataFrame
        pass

    def _QC(self, df):
        # Custom QC logic; should return the data with a QC_Flag column
        return df