AbstractReader

The AbstractReader class is the foundation of AeroViz's data reading system, providing a standardized interface for reading and processing aerosol instrument data.

Core Architecture

AbstractReader serves as the base class for all instrument-specific readers in AeroViz. It defines the common interface and provides shared functionality for data processing, quality control, and output formatting.

Overview

The AbstractReader implements a consistent workflow for all aerosol instruments:

  1. Data Ingestion - Read raw instrument files
  2. Format Detection - Automatically identify data structure
  3. Quality Control - Apply built-in validation and filtering
  4. Standardization - Convert to unified output format
  5. Metadata Handling - Preserve instrument and measurement metadata

Usage Pattern

AbstractReader is rarely used directly. Readers are typically accessed through the RawDataReader factory function, which automatically selects the appropriate reader based on your instrument type.

Key Features

  • Flexible Input Handling - Supports various file formats and structures
  • Built-in Quality Control - Configurable data validation and filtering
  • Metadata Preservation - Maintains instrument configuration and measurement context
  • Extensible Design - Easy to subclass for new instruments
  • Error Handling - Robust error reporting and recovery

Implementation Note

AbstractReader is an abstract base class. For actual data reading, use instrument-specific implementations or the RawDataReader factory function.

API Reference

AeroViz.rawDataReader.core.AbstractReader

AbstractReader(path: Path | str, reset: bool | str = False, qc: bool | str = True, **kwargs)

Bases: ABC

Abstract class for reading raw data from different instruments.

This class serves as a base class for reading raw data from various instruments. Each instrument should have a separate class that inherits from this class and implements the abstract methods. The abstract methods are _raw_reader and _QC.

The class handles file management, including reading from and writing to pickle files, and implements quality control measures. It can process data in both batch and streaming modes.
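The subclassing contract described above can be illustrated with a small stand-in. `ReaderBase`, `IncompleteReader`, and `DemoReader` are hypothetical names, not AeroViz classes; the sketch only mirrors the two abstract methods named here, uses plain lists in place of DataFrames, and assumes nothing about the real constructor.

```python
from abc import ABC, abstractmethod


class ReaderBase(ABC):
    """Hypothetical stand-in for AbstractReader (not the AeroViz class)."""

    @abstractmethod
    def _raw_reader(self, file):
        """Read one raw file and return its contents."""

    @abstractmethod
    def _QC(self, df):
        """Apply instrument-specific quality control."""


class IncompleteReader(ReaderBase):
    # Only one of the two abstract methods is implemented.
    def _raw_reader(self, file):
        return []


class DemoReader(ReaderBase):
    def _raw_reader(self, file):
        # Toy data standing in for the contents of a raw file.
        return [1.0, 2.0, -5.0]

    def _QC(self, df):
        # Toy rule: drop negative readings.
        return [value for value in df if value >= 0]


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if ABC blocks instantiation."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Python refuses to instantiate `ReaderBase` and `IncompleteReader` because at least one abstract method is still missing, which is the same mechanism that makes both `_raw_reader` and `_QC` mandatory for instrument readers.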

Attributes:

Name Type Description
nam str

Name identifier for the reader class

path Path

Path to the raw data files

meta dict

Metadata configuration for the instrument

logger ReaderLogger

Custom logger instance for the reader

reset bool

Flag to indicate whether to reset existing processed data

append bool

Flag to indicate whether to append new data to existing processed data

qc bool or str

Quality control settings

qc_freq str or None

Frequency for quality control calculations

Initialize the AbstractReader.

Parameters:

Name Type Description Default
path Path or str

Path to the directory containing raw data files

required
reset bool or str

If True, forces re-reading of raw data. If 'append', appends new data to existing processed data.

False
qc bool or str

If True, performs quality control. If a str, specifies the frequency for QC calculations.

True
**kwargs dict

Additional keyword arguments:

  • log_level (str) - Logging level for the reader
  • suppress_warnings (bool) - If True, suppresses warning messages

{}
Notes

Creates necessary output directories and initializes logging system. Sets up paths for pickle files, CSV files, and report outputs.
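As a rough illustration of the output layout, the sketch below rebuilds the file names listed under Attributes for a given reader name. `output_paths` is a hypothetical helper, and both the location of `output_folder` and the idea that `lower()` is applied to the reader's `nam` are assumptions that this page does not confirm.

```python
from pathlib import Path


def output_paths(output_folder: Path, nam: str) -> dict[str, Path]:
    """Build the output file paths for a reader called `nam` (assumed layout)."""
    name = nam.lower()  # assumption: the lower-cased reader name is used
    return {
        'pkl_nam': output_folder / f'_read_{name}_qc.pkl',
        'pkl_nam_raw': output_folder / f'_read_{name}_raw.pkl',
        'csv_nam': output_folder / f'_read_{name}_qc.csv',
        'csv_nam_raw': output_folder / f'_read_{name}_raw.csv',
        'csv_out': output_folder / f'output_{name}.csv',
        'report_out': output_folder / 'report.json',
    }


# 'outputs' is an arbitrary example folder, not the real default.
paths = output_paths(Path('outputs'), 'AE33')
```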

Attributes

append instance-attribute
append = reset == 'append'
csv_nam instance-attribute
csv_nam = output_folder / f'_read_{lower()}_qc.csv'
csv_nam_raw instance-attribute
csv_nam_raw = output_folder / f'_read_{lower()}_raw.csv'
csv_out instance-attribute
csv_out = output_folder / f'output_{lower()}.csv'
kwargs instance-attribute
kwargs = kwargs
logger instance-attribute
logger = ReaderLogger(nam, output_folder, upper() if not get('suppress_warnings') else 'ERROR')
meta instance-attribute
meta = meta[nam]
nam class-attribute instance-attribute
nam = 'AbstractReader'
path instance-attribute
path = Path(path)
pkl_nam instance-attribute
pkl_nam = output_folder / f'_read_{lower()}_qc.pkl'
pkl_nam_raw instance-attribute
pkl_nam_raw = output_folder / f'_read_{lower()}_raw.pkl'
qc instance-attribute
qc = qc
qc_freq instance-attribute
qc_freq = qc if isinstance(qc, str) else None
report_out instance-attribute
report_out = output_folder / 'report.json'
reset instance-attribute
reset = reset is True
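
The assignments for `append`, `reset`, and `qc_freq` above imply the following normalization of the constructor arguments. `normalize_flags` is a hypothetical helper written only to make the combinations explicit, and `'1MS'` is an illustrative frequency string rather than a documented value.

```python
def normalize_flags(reset=False, qc=True) -> dict:
    """Mirror the attribute assignments listed above for `reset` and `qc`."""
    return {
        'append': reset == 'append',                     # True only for the string 'append'
        'reset': reset is True,                          # True only for the literal True
        'qc': qc,                                        # stored unchanged
        'qc_freq': qc if isinstance(qc, str) else None,  # a string doubles as the frequency
    }


defaults = normalize_flags()
appending = normalize_flags(reset='append')
with_freq = normalize_flags(qc='1MS')
```

Under this reading, `reset='append'` sets `append` to True while leaving `reset` False, and passing a string for `qc` keeps QC enabled and additionally sets `qc_freq`.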

Functions

QC_control staticmethod
QC_control()
_QC abstractmethod
_QC(df: DataFrame) -> DataFrame

Abstract method for quality control processing.

Parameters:

Name Type Description Default
df DataFrame

Input DataFrame containing raw data

required

Returns:

Type Description
DataFrame

Quality controlled data with QC_Flag column

Notes

Must be implemented by child classes to handle instrument-specific QC. This method should only check raw data quality (status, range, completeness). Derived parameter validation should be done in _process().
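
A minimal sketch of what a raw-data check inside `_QC` might look like, assuming a single hypothetical column `BC` and made-up limits. The flag label `'Invalid BC'` is also an assumption; only `QC_Flag` and `'Valid'` appear on this page. It is written as a free function so that it runs without AeroViz installed.

```python
import pandas as pd


def qc_range_check(df: pd.DataFrame, column: str = 'BC',
                   low: float = 0.0, high: float = 1e5) -> pd.DataFrame:
    """Return a copy of df with a QC_Flag column from a range/completeness check."""
    out = df.copy()
    out['QC_Flag'] = 'Valid'
    # A row fails if the value is missing or falls outside [low, high].
    bad = out[column].isna() | (out[column] < low) | (out[column] > high)
    out.loc[bad, 'QC_Flag'] = 'Invalid BC'  # hypothetical flag label
    return out


raw = pd.DataFrame(
    {'BC': [120.0, -3.0, float('nan'), 450.0]},
    index=pd.date_range('2024-01-01', periods=4, freq='min'),
)
checked = qc_range_check(raw)
```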

__call__
__call__(start: datetime, end: datetime, mean_freq: str = '1h') -> DataFrame

Process data for a specified time range.

Parameters:

Name Type Description Default
start datetime

Start time for data processing

required
end datetime

End time for data processing

required
mean_freq str

Frequency for resampling the data

'1h'

Returns:

Type Description
DataFrame

Processed and resampled data for the specified time range

Notes

The processed data is also saved to a CSV file.
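
The resampling that `mean_freq` controls can be pictured with plain pandas. This is only a sketch of averaging to the default `'1h'` frequency on made-up one-minute data with a hypothetical column `BC`; how `__call__` treats flagged rows or gaps internally is not described here.

```python
import pandas as pd

# Two hours of synthetic one-minute data: values 0, 1, ..., 119.
index = pd.date_range('2024-01-01 00:00', periods=120, freq='min')
minute_data = pd.DataFrame({'BC': range(120)}, index=index)

# Average each hour, the same frequency string as the mean_freq default.
hourly = minute_data.resample('1h').mean()
```

The result has one row per hour, each holding the mean of the sixty one-minute values that fall inside it.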

_generate_report
_generate_report(raw_data, qc_data, qc_flag=None) -> None

Calculate and log data quality rates for different time periods.

Parameters:

Name Type Description Default
raw_data DataFrame

Raw data before quality control

required
qc_data DataFrame

Data after quality control

required
qc_flag Series

QC flag series indicating validity of each row

None
Notes

Calculates rates for specified QC frequency if set. Updates the quality report with calculated rates.

_outlier_process
_outlier_process(_df)

Process outliers in the data.

Parameters:

Name Type Description Default
_df DataFrame

Input DataFrame containing potential outliers

required

Returns:

Type Description
DataFrame

DataFrame with outliers processed

Notes

Implementation depends on specific instrument requirements.

_process
_process(df: DataFrame) -> DataFrame

Process data to calculate derived parameters.

This method is called after _QC() to calculate instrument-specific derived parameters (e.g., absorption coefficients, AAE, SAE).

Parameters:

Name Type Description Default
df DataFrame

Quality-controlled DataFrame with QC_Flag column

required

Returns:

Type Description
DataFrame

DataFrame with derived parameters added and QC_Flag updated

Notes

Default implementation returns the input unchanged. Override in child classes to implement instrument-specific processing.

The method should:

  1. Skip calculation for rows where QC_Flag != 'Valid' (optional optimization)
  2. Calculate derived parameters
  3. Validate derived parameters and update QC_Flag if invalid
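
The three steps can be sketched as follows for one derived parameter, the absorption Ångström exponent (AAE) between two wavelengths. The column names `abs_370` and `abs_880`, the plausibility limits, and the flag labels are all hypothetical; the real readers may compute and validate AAE differently.

```python
import numpy as np
import pandas as pd


def add_aae(df: pd.DataFrame, low: float = 0.5, high: float = 3.0) -> pd.DataFrame:
    """Add an AAE column for valid rows and flag implausible results."""
    out = df.copy()
    valid = out['QC_Flag'] == 'Valid'

    # Step 1: only compute for rows that passed raw-data QC.
    out['AAE'] = np.nan
    ratio = out.loc[valid, 'abs_370'] / out.loc[valid, 'abs_880']

    # Step 2: AAE = -ln(b_370 / b_880) / ln(370 / 880)
    out.loc[valid, 'AAE'] = -np.log(ratio) / np.log(370 / 880)

    # Step 3: flag previously valid rows whose derived value is out of range.
    bad = valid & ~out['AAE'].between(low, high)
    out.loc[bad, 'QC_Flag'] = 'Invalid AAE'  # hypothetical flag label
    return out


data = pd.DataFrame({
    'abs_370': [20.0, 20.0, 50.0],
    'abs_880': [20.0 * 370 / 880, 19.0, 10.0],
    'QC_Flag': ['Valid', 'Valid', 'Invalid BC'],
})
result = add_aae(data)
```

The first row is constructed so that absorption scales as 1/wavelength, giving an AAE of 1; the second row yields a value below the assumed lower limit and is flagged; the third row was already invalid and is skipped.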

_raw_reader abstractmethod
_raw_reader(file)

Abstract method to read raw data files.

Parameters:

Name Type Description Default
file Path or str

Path to the raw data file

required

Returns:

Type Description
DataFrame

Raw data read from the file

Notes

Must be implemented by child classes to handle specific file formats.

_read_raw_files
_read_raw_files() -> tuple[DataFrame | None, DataFrame | None]

Read and process raw data files.

Returns:

Type Description
tuple[DataFrame | None, DataFrame | None]

Tuple containing:

  • Raw data DataFrame or None
  • Quality controlled DataFrame or None

Notes

Handles file reading and initial processing.

_run
_run(user_start, user_end)

Main execution method for data processing.

Parameters:

Name Type Description Default
user_start datetime

Start time for processing

required
user_end datetime

End time for processing

required

Returns:

Type Description
DataFrame

Processed data for the specified time range

Notes

Coordinates the entire data processing workflow.

_save_data
_save_data(raw_data: DataFrame, qc_data: DataFrame) -> None

Save processed data to files.

Parameters:

Name Type Description Default
raw_data DataFrame

Raw data to save

required
qc_data DataFrame

Quality controlled data to save

required
Notes

Saves data in both pickle and CSV formats.

_timeIndex_process
_timeIndex_process(_df, user_start=None, user_end=None, append_df=None)

Process time index of the DataFrame.

Parameters:

Name Type Description Default
_df DataFrame

Input DataFrame to process

required
user_start datetime

User-specified start time

None
user_end datetime

User-specified end time

None
append_df DataFrame

DataFrame to append to

None

Returns:

Type Description
DataFrame

DataFrame with processed time index

Notes

Handles time range filtering and data appending.

progress_reading
progress_reading(files: list) -> Generator

Context manager for tracking file reading progress.

Parameters:

Name Type Description Default
files list

List of files to process

required

Yields:

Type Description
Progress

Progress bar object for tracking

Notes

Uses rich library for progress display.

reorder_dataframe_columns staticmethod
reorder_dataframe_columns(df, order_lists: list[list], keep_others: bool = False)

Reorder DataFrame columns according to specified lists.

Parameters:

Name Type Description Default
df DataFrame

Input DataFrame

required
order_lists list[list]

Lists specifying column order

required
keep_others bool

If True, keeps unspecified columns at the end

False

Returns:

Type Description
DataFrame

DataFrame with reordered columns
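
One plausible reading of `order_lists` and `keep_others`, sketched on plain column names so that it runs without pandas. `ordered_columns` is a hypothetical helper, and details such as skipping names that are missing from the DataFrame are assumptions about the behavior, not taken from the implementation.

```python
def ordered_columns(columns: list[str], order_lists: list[list[str]],
                    keep_others: bool = False) -> list[str]:
    """Return column names in the order given by order_lists (assumed semantics)."""
    present = set(columns)
    ordered: list[str] = []
    for group in order_lists:
        for name in group:
            # Assumption: names absent from the data, or already placed, are skipped.
            if name in present and name not in ordered:
                ordered.append(name)
    if keep_others:
        # Unlisted columns follow in their original order.
        ordered += [name for name in columns if name not in ordered]
    return ordered


cols = ['QC_Flag', 'abs_880', 'BC', 'abs_370']
groups = [['BC'], ['abs_370', 'abs_880', 'missing']]
strict = ordered_columns(cols, groups)
with_rest = ordered_columns(cols, groups, keep_others=True)
```

With pandas, the resulting list would be applied as `df[ordered_columns(list(df.columns), order_lists)]`.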

update_qc_flag staticmethod
update_qc_flag(df: DataFrame, mask: Series, flag_name: str) -> DataFrame

Update QC_Flag column for rows matching the mask.

Parameters:

Name Type Description Default
df DataFrame

DataFrame with QC_Flag column

required
mask Series

Boolean mask indicating rows to flag

required
flag_name str

Name of the flag to add

required

Returns:

Type Description
DataFrame

DataFrame with updated QC_Flag column
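
A sketch of how such a helper could behave, assuming `'Valid'` is the default flag and that several flags on one row are joined with `', '`. Both the joining rule and the flag names below are assumptions; `flag_rows` is a hypothetical stand-in, not the AeroViz method.

```python
import pandas as pd


def flag_rows(df: pd.DataFrame, mask: pd.Series, flag_name: str) -> pd.DataFrame:
    """Return a copy of df with flag_name recorded on the rows selected by mask."""
    out = df.copy()
    was_valid = out['QC_Flag'] == 'Valid'

    # Rows that were valid so far simply take the new flag.
    out.loc[mask & was_valid, 'QC_Flag'] = flag_name

    # Rows that already carried a flag get the new one appended (assumed rule).
    already_flagged = mask & ~was_valid
    out.loc[already_flagged, 'QC_Flag'] = (
        out.loc[already_flagged, 'QC_Flag'] + ', ' + flag_name
    )
    return out


df = pd.DataFrame({
    'BC': [1.0, -2.0, 3.0],
    'QC_Flag': ['Valid', 'Valid', 'Status Error'],
})
first = flag_rows(df, df['BC'] < 0, 'Negative')
second = flag_rows(first, pd.Series([False, True, True]), 'Spike')
```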

Quick Example

from AeroViz import RawDataReader
from datetime import datetime

# Using the factory function (recommended)
data = RawDataReader(
    instrument='AE33',
    path='/path/to/data',
    start=datetime(2024, 1, 1),
    end=datetime(2024, 12, 31)
)

# Direct usage (advanced - for custom implementations)
from AeroViz.rawDataReader.core import AbstractReader


class MyInstrumentReader(AbstractReader):
    nam = 'MyInstrument'

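    def _raw_reader(self, file):
        # Custom file reading logic; must return a DataFrame with the raw data
        raise NotImplementedError

    def _QC(self, df):
        # Custom QC logic; should return the data with a QC_Flag column
        return df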