AbstractReader

The AbstractReader class is the foundation of AeroViz's data reading system, providing a standardized interface for reading and processing aerosol instrument data.

Core Architecture

AbstractReader serves as the base class for all instrument-specific readers in AeroViz. It defines the common interface and provides shared functionality for data processing, quality control, and output formatting.

Overview

The AbstractReader implements a consistent workflow for all aerosol instruments:

Data Ingestion - Read raw instrument files
Format Detection - Automatically identify data structure
Quality Control - Apply built-in validation and filtering
Standardization - Convert to unified output format
Metadata Handling - Preserve instrument and measurement metadata

Usage Pattern

While you can subclass AbstractReader directly for custom implementations, readers are typically accessed through the RawDataReader factory function, which automatically selects the appropriate reader based on your instrument type.

Key Features

Flexible Input Handling - Supports various file formats and structures
Built-in Quality Control - Configurable data validation and filtering
Metadata Preservation - Maintains instrument configuration and measurement context
Extensible Design - Easy to subclass for new instruments
Error Handling - Robust error reporting and recovery

Implementation Note

AbstractReader is an abstract base class. For actual data reading, use instrument-specific implementations or the RawDataReader factory function.

API Reference

AeroViz.rawDataReader.core.AbstractReader

AbstractReader(path: Path | str, reset: bool | str = False, qc: bool | str = True, **kwargs)

Bases: ABC

Abstract class for reading raw data from different instruments.

This class serves as a base class for reading raw data from various instruments. Each instrument should have a separate class that inherits from this class and implements the abstract methods. The abstract methods are _raw_reader and _QC.

The class handles file management, including reading from and writing to pickle files, and implements quality control measures. It can process data in both batch and streaming modes.
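The division of labor described above (a base class that owns the shared workflow, and child classes that supply `_raw_reader` and `_QC`) can be illustrated with a small stand-alone sketch. `SketchReader`, `DemoReader`, and `run` are hypothetical names used only for illustration, and plain lists of dicts stand in for DataFrames; this is not the AeroViz implementation.

```python
from abc import ABC, abstractmethod


class SketchReader(ABC):
    """Hypothetical stand-in for AbstractReader; not the AeroViz class."""

    nam = 'SketchReader'

    @abstractmethod
    def _raw_reader(self, file):
        """Parse one raw file into a list of records."""

    @abstractmethod
    def _QC(self, records):
        """Return the records with a QC flag attached."""

    def run(self, files):
        # Shared workflow: read every file, then apply QC once.
        records = [row for f in files for row in self._raw_reader(f)]
        return self._QC(records)


class DemoReader(SketchReader):
    nam = 'Demo'

    def _raw_reader(self, file):
        # Pretend each "file" is already a list of numeric readings.
        return [{'value': v} for v in file]

    def _QC(self, records):
        # Flag negative readings; keep every row.
        return [
            {**r, 'QC_Flag': 'Valid' if r['value'] >= 0 else 'Invalid'}
            for r in records
        ]


result = DemoReader().run([[1.0, -2.0], [3.0]])
print(result)
```

Because both methods are declared abstract, instantiating `SketchReader` itself raises `TypeError`; only a subclass that implements both can be created.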

Attributes:

Name	Type	Description
`nam`	`str`	Name identifier for the reader class
`path`	`Path`	Path to the raw data files
`meta`	`dict`	Metadata configuration for the instrument
`logger`	`ReaderLogger`	Custom logger instance for the reader
`reset`	`bool`	Flag to indicate whether to reset existing processed data
`append`	`bool`	Flag to indicate whether to append new data to existing processed data
`qc`	`bool or str`	Quality control settings
`qc_freq`	`str or None`	Frequency for quality control calculations

Initialize the AbstractReader.

Parameters:

Name	Type	Description	Default
`path`	`Path or str`	Path to the directory containing raw data files	required
`reset`	`bool or str`	If True, forces re-reading of raw data. If 'append', appends new data to existing processed data.	`False`
`qc`	`bool or str`	If True, performs quality control. If a str, specifies the frequency for QC calculations.	`True`
`**kwargs`	`dict`	Additional keyword arguments: `log_level` (str), logging level for the reader; `suppress_warnings` (bool), if True, suppresses warning messages.	`{}`

Notes

Creates necessary output directories and initializes logging system. Sets up paths for pickle files, CSV files, and report outputs.
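As a rough sketch of this setup step, the output paths can be derived from an output folder and the reader name with `pathlib`. The helper `build_output_paths` is hypothetical, and it assumes that the `lower()` shown in the rendered attribute values refers to the lower-cased reader name.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def build_output_paths(output_folder: Path, nam: str) -> dict[str, Path]:
    # Assumption: the file tag is the reader name in lower case.
    tag = nam.lower()
    output_folder.mkdir(parents=True, exist_ok=True)
    return {
        'pkl_nam': output_folder / f'_read_{tag}_qc.pkl',
        'pkl_nam_raw': output_folder / f'_read_{tag}_raw.pkl',
        'csv_nam': output_folder / f'_read_{tag}_qc.csv',
        'csv_nam_raw': output_folder / f'_read_{tag}_raw.csv',
        'csv_out': output_folder / f'output_{tag}.csv',
        'report_out': output_folder / 'report.json',
    }


with TemporaryDirectory() as tmp:
    # 'ae33_outputs' is an illustrative folder name, not an AeroViz convention.
    paths = build_output_paths(Path(tmp) / 'ae33_outputs', 'AE33')
    names = {key: p.name for key, p in paths.items()}
    print(names)
```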

Attributes

append `instance-attribute`

append = reset == 'append'

csv_nam `instance-attribute`

csv_nam = output_folder / f'_read_{lower()}_qc.csv'

csv_nam_raw `instance-attribute`

csv_nam_raw = output_folder / f'_read_{lower()}_raw.csv'

csv_out `instance-attribute`

csv_out = output_folder / f'output_{lower()}.csv'

kwargs `instance-attribute`

kwargs = kwargs

logger `instance-attribute`

logger = ReaderLogger(nam, output_folder, upper() if not get('suppress_warnings') else 'ERROR')

meta `instance-attribute`

meta = meta[nam]

nam `class-attribute` `instance-attribute`

nam = 'AbstractReader'

path `instance-attribute`

path = Path(path)

pkl_nam `instance-attribute`

pkl_nam = output_folder / f'_read_{lower()}_qc.pkl'

pkl_nam_raw `instance-attribute`

pkl_nam_raw = output_folder / f'_read_{lower()}_raw.pkl'

qc `instance-attribute`

qc = qc

qc_freq `instance-attribute`

qc_freq = qc if isinstance(qc, str) else None

report_out `instance-attribute`

report_out = output_folder / 'report.json'

reset `instance-attribute`

reset = reset is True
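The assignments for `append`, `qc_freq`, and `reset` listed in this section show how the two constructor arguments are normalized into separate flags. The stand-alone function below reproduces just those expressions; `normalize_flags` is a hypothetical name and `'1MS'` is only an example frequency string.

```python
def normalize_flags(reset=False, qc=True):
    # Mirrors the attribute expressions shown in this section.
    return {
        'reset': reset is True,            # only the literal True forces a re-read
        'append': reset == 'append',       # the string 'append' selects append mode
        'qc': qc,
        'qc_freq': qc if isinstance(qc, str) else None,
    }


default = normalize_flags()
appending = normalize_flags(reset='append', qc='1MS')
print(default)
print(appending)
```

Note that passing `reset='append'` leaves the `reset` flag False, since only the boolean `True` satisfies `reset is True`.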

Functions

QC_control `staticmethod`

QC_control()

_QC `abstractmethod`

_QC(df: DataFrame) -> DataFrame

Abstract method for quality control processing.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Input DataFrame containing raw data	required

Returns:

Type	Description
`DataFrame`	Quality controlled data with QC_Flag column

Notes

Must be implemented by child classes to handle instrument-specific QC. This method should only check raw data quality (status, range, completeness). Derived parameter validation should be done in _process().
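A raw-data check of the kind described here might look like the following sketch, which flags missing and out-of-range values in one column. The function name `range_qc`, the column `BC`, the limits, and the flag strings other than `'Valid'` are all assumptions made for illustration.

```python
import pandas as pd


def range_qc(df, column, low, high):
    """Attach a QC_Flag column based on completeness and a value range."""
    out = df.copy()
    out['QC_Flag'] = 'Valid'
    missing = out[column].isna()
    # between() is False for NaN, so exclude missing rows explicitly.
    out_of_range = ~missing & ~out[column].between(low, high)
    out.loc[missing, 'QC_Flag'] = 'Missing'
    out.loc[out_of_range, 'QC_Flag'] = 'Out of Range'
    return out


index = pd.date_range('2024-01-01', periods=4, freq='1min')
raw = pd.DataFrame({'BC': [10.0, None, -5.0, 20.0]}, index=index)
checked = range_qc(raw, 'BC', low=0.0, high=100.0)
print(checked)
```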

__call__

__call__(start: datetime, end: datetime, mean_freq: str = '1h') -> DataFrame

Process data for a specified time range.

Parameters:

Name	Type	Description	Default
`start`	`datetime`	Start time for data processing	required
`end`	`datetime`	End time for data processing	required
`mean_freq`	`str`	Frequency for resampling the data	`'1h'`

Returns:

Type	Description
`DataFrame`	Processed and resampled data for the specified time range

Notes

The processed data is also saved to a CSV file.
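The effect of `start`, `end`, and `mean_freq` can be approximated with plain pandas: select the time range, then resample to the requested frequency and average. This is a sketch of the idea only, with a hypothetical `BC` column; the actual method also runs reading, QC, and file output.

```python
from datetime import datetime

import pandas as pd

# Six half-hourly readings starting at midnight.
index = pd.date_range('2024-01-01 00:00', periods=6, freq='30min')
df = pd.DataFrame({'BC': [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]}, index=index)

start = datetime(2024, 1, 1, 0, 0)
end = datetime(2024, 1, 1, 1, 59)

# Label-based slicing on a DatetimeIndex includes both endpoints.
hourly = df.loc[start:end].resample('1h').mean()
print(hourly)
```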

_generate_report

_generate_report(raw_data, qc_data, qc_flag=None) -> None

Calculate and log data quality rates for different time periods.

Parameters:

Name	Type	Description	Default
`raw_data`	`DataFrame`	Raw data before quality control	required
`qc_data`	`DataFrame`	Data after quality control	required
`qc_flag`	`Series`	QC flag series indicating validity of each row	`None`

Notes

Calculates rates for specified QC frequency if set. Updates the quality report with calculated rates.
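One simple way to express a quality rate is the percentage of rows whose flag is `'Valid'`, computed overall and per period. The sketch below assumes that definition and a monthly grouping; the rates AeroViz actually reports may be defined differently.

```python
import pandas as pd

index = pd.date_range('2024-01-31 22:00', periods=4, freq='1h')
qc_flag = pd.Series(['Valid', 'Invalid', 'Valid', 'Valid'], index=index)

valid = qc_flag.eq('Valid')

# Overall percentage of valid rows.
overall = valid.mean() * 100

# Percentage of valid rows per calendar month.
monthly = valid.groupby(valid.index.to_period('M')).mean() * 100

print(overall)
print(monthly)
```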

_outlier_process

_outlier_process(_df)

Process outliers in the data.

Parameters:

Name	Type	Description	Default
`_df`	`DataFrame`	Input DataFrame containing potential outliers	required

Returns:

Type	Description
`DataFrame`	DataFrame with outliers processed

Notes

Implementation depends on specific instrument requirements.

_process

_process(df: DataFrame) -> DataFrame

Process data to calculate derived parameters.

This method is called after _QC() to calculate instrument-specific derived parameters (e.g., absorption coefficients, AAE, SAE).

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Quality-controlled DataFrame with QC_Flag column	required

Returns:

Type	Description
`DataFrame`	DataFrame with derived parameters added and QC_Flag updated

Notes

Default implementation returns the input unchanged. Override in child classes to implement instrument-specific processing.

The method should:

1. Skip calculation for rows where QC_Flag != 'Valid' (optional optimization)
2. Calculate derived parameters
3. Validate derived parameters and update QC_Flag if invalid
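The three steps can be sketched for one derived parameter, the absorption Ångström exponent (AAE), computed from absorption at two wavelengths as AAE = -ln(abs_short / abs_long) / ln(λ_short / λ_long). The column names, wavelengths, accepted range, and the `'Invalid AAE'` flag are assumptions for illustration, not values taken from AeroViz.

```python
import numpy as np
import pandas as pd


def add_aae(df, col_short='abs_370', col_long='abs_880',
            wl_short=370.0, wl_long=880.0):
    out = df.copy()
    # Step 1: only compute for rows that passed raw-data QC.
    valid = out['QC_Flag'].eq('Valid')
    out['AAE'] = np.nan
    # Step 2: calculate the derived parameter.
    ratio = out.loc[valid, col_short] / out.loc[valid, col_long]
    out.loc[valid, 'AAE'] = -np.log(ratio) / np.log(wl_short / wl_long)
    # Step 3: validate the result and update QC_Flag (range is hypothetical).
    bad = valid & ~out['AAE'].between(0.5, 3.0)
    out.loc[bad, 'QC_Flag'] = 'Invalid AAE'
    return out


data = pd.DataFrame({
    'abs_370': [88.0, 10.0, 50.0],
    'abs_880': [37.0, 10.0, 20.0],
    'QC_Flag': ['Valid', 'Valid', 'Invalid'],
})
processed = add_aae(data)
print(processed)
```

The first row follows an inverse-wavelength dependence and yields an AAE of about 1; the second has equal absorption at both wavelengths, giving an AAE of 0, which falls outside the assumed range and is flagged.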

_raw_reader `abstractmethod`

_raw_reader(file)

Abstract method to read raw data files.

Parameters:

Name	Type	Description	Default
`file`	`Path or str`	Path to the raw data file	required

Returns:

Type	Description
`DataFrame`	Raw data read from the file

Notes

Must be implemented by child classes to handle specific file formats.
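For a simple delimited format, a reader of this kind can be as small as a `pandas.read_csv` call that parses the timestamp column into the index. The function name `read_simple_csv` and the `time`/`BC` column layout are hypothetical; real instrument files usually need more handling.

```python
from io import StringIO

import pandas as pd


def read_simple_csv(file):
    """Read one raw file into a DataFrame indexed by timestamp."""
    return pd.read_csv(file, parse_dates=['time'], index_col='time')


# An in-memory buffer stands in for a raw data file on disk.
sample = StringIO(
    'time,BC\n'
    '2024-01-01 00:00,1.2\n'
    '2024-01-01 00:01,1.4\n'
)
raw = read_simple_csv(sample)
print(raw)
```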

_read_raw_files

_read_raw_files() -> tuple[DataFrame | None, DataFrame | None]

Read and process raw data files.

Returns:

Type	Description
`tuple[DataFrame \| None, DataFrame \| None]`	Tuple of (raw data DataFrame or None, quality-controlled DataFrame or None)

Notes

Handles file reading and initial processing.

_run

_run(user_start, user_end)

Main execution method for data processing.

Parameters:

Name	Type	Description	Default
`user_start`	`datetime`	Start time for processing	required
`user_end`	`datetime`	End time for processing	required

Returns:

Type	Description
`DataFrame`	Processed data for the specified time range

Notes

Coordinates the entire data processing workflow.

_save_data

_save_data(raw_data: DataFrame, qc_data: DataFrame) -> None

Save processed data to files.

Parameters:

Name	Type	Description	Default
`raw_data`	`DataFrame`	Raw data to save	required
`qc_data`	`DataFrame`	Quality controlled data to save	required

Notes

Saves data in both pickle and CSV formats.
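Writing the same DataFrame to both formats takes one pandas call each. The sketch below uses a temporary directory and a hypothetical `save_both` helper; it illustrates the two formats only and does not reproduce the file layout AeroViz uses.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

import pandas as pd


def save_both(df, stem: Path):
    # Pickle keeps dtypes and the index exactly; CSV is human-readable.
    df.to_pickle(stem.with_suffix('.pkl'))
    df.to_csv(stem.with_suffix('.csv'))


with TemporaryDirectory() as tmp:
    stem = Path(tmp) / '_read_demo_qc'
    df = pd.DataFrame(
        {'BC': [1.0, 2.0]},
        index=pd.date_range('2024-01-01', periods=2, freq='1h'),
    )
    save_both(df, stem)
    restored = pd.read_pickle(stem.with_suffix('.pkl'))
    csv_exists = stem.with_suffix('.csv').exists()
    print(restored.equals(df), csv_exists)
```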

_timeIndex_process

_timeIndex_process(_df, user_start=None, user_end=None, append_df=None)

Process time index of the DataFrame.

Parameters:

Name	Type	Description	Default
`_df`	`DataFrame`	Input DataFrame to process	required
`user_start`	`datetime`	User-specified start time	`None`
`user_end`	`datetime`	User-specified end time	`None`
`append_df`	`DataFrame`	DataFrame to append to	`None`

Returns:

Type	Description
`DataFrame`	DataFrame with processed time index

Notes

Handles time range filtering and data appending.
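A minimal sketch of range filtering combined with appending is shown below. It assumes that, where previously processed and new data overlap, the newer row should win; that rule and the helper name `clip_and_append` are assumptions, not documented AeroViz behavior.

```python
import pandas as pd


def clip_and_append(df, start=None, end=None, append_df=None):
    if append_df is not None:
        # Existing data first, so new rows come last in the combined frame.
        df = pd.concat([append_df, df])
    # Keep the last occurrence of each timestamp, then restore time order.
    df = df[~df.index.duplicated(keep='last')].sort_index()
    # Slicing with None on either side leaves that side unbounded.
    return df.loc[start:end]


old = pd.DataFrame(
    {'BC': [1.0, 2.0]},
    index=pd.to_datetime(['2024-01-01 00:00', '2024-01-01 01:00']),
)
new = pd.DataFrame(
    {'BC': [20.0, 3.0, 4.0]},
    index=pd.to_datetime(
        ['2024-01-01 01:00', '2024-01-01 02:00', '2024-01-01 03:00']
    ),
)

result = clip_and_append(
    new,
    start=pd.Timestamp('2024-01-01 01:00'),
    end=pd.Timestamp('2024-01-01 02:00'),
    append_df=old,
)
full = clip_and_append(new, append_df=old)
print(result)
```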

progress_reading

progress_reading(files: list) -> Generator

Context manager for tracking file reading progress.

Parameters:

Name	Type	Description	Default
`files`	`list`	List of files to process	required

Yields:

Type	Description
`Progress`	Progress bar object for tracking

Notes

Uses rich library for progress display.
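The generator-based context manager pattern implied by the `Generator` return type can be sketched with `contextlib.contextmanager`. To keep the example dependency-free, a small counter class replaces the rich progress bar; `CountingProgress` and `progress_sketch` are hypothetical names.

```python
from contextlib import contextmanager


class CountingProgress:
    """Hypothetical stand-in for a rich progress bar."""

    def __init__(self, total):
        self.total = total
        self.done = 0

    def advance(self):
        self.done += 1


@contextmanager
def progress_sketch(files):
    progress = CountingProgress(total=len(files))
    try:
        # The caller receives the progress object inside the with-block.
        yield progress
    finally:
        # Runs even if reading a file raises, so the display is always closed.
        print(f'read {progress.done}/{progress.total} files')


files = ['a.dat', 'b.dat', 'c.dat']
with progress_sketch(files) as progress:
    for _ in files:
        progress.advance()
```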

reorder_dataframe_columns `staticmethod`

reorder_dataframe_columns(df, order_lists: list[list], keep_others: bool = False)

Reorder DataFrame columns according to specified lists.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Input DataFrame	required
`order_lists`	`list[list]`	Lists specifying column order	required
`keep_others`	`bool`	If True, keeps unspecified columns at the end	`False`

Returns:

Type	Description
`DataFrame`	DataFrame with reordered columns
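One plausible reading of these parameters is sketched below: columns are emitted in the order given by the lists, names not present in the DataFrame are skipped, and remaining columns are appended only when `keep_others` is True. The handling of missing and duplicate names is an assumption; `reorder_columns` is a hypothetical re-implementation, not the AeroViz code.

```python
import pandas as pd


def reorder_columns(df, order_lists, keep_others=False):
    ordered = []
    for group in order_lists:
        for col in group:
            # Skip names that are absent or already placed.
            if col in df.columns and col not in ordered:
                ordered.append(col)
    if keep_others:
        ordered += [c for c in df.columns if c not in ordered]
    return df[ordered]


df = pd.DataFrame({'c': [1], 'a': [2], 'b': [3], 'x': [4]})
strict = reorder_columns(df, [['a', 'b'], ['c', 'missing']])
loose = reorder_columns(df, [['a', 'b'], ['c', 'missing']], keep_others=True)
print(list(strict.columns), list(loose.columns))
```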

update_qc_flag `staticmethod`

update_qc_flag(df: DataFrame, mask: Series, flag_name: str) -> DataFrame

Update QC_Flag column for rows matching the mask.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	DataFrame with QC_Flag column	required
`mask`	`Series`	Boolean mask indicating rows to flag	required
`flag_name`	`str`	Name of the flag to add	required

Returns:

Type	Description
`DataFrame`	DataFrame with updated QC_Flag column
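A flag update of this shape could be sketched as follows. The rule used here (replace `'Valid'` with the new flag, otherwise append the new flag after a comma) is an assumption chosen for illustration, and `update_flag` is a hypothetical name.

```python
import pandas as pd


def update_flag(df, mask, flag_name):
    out = df.copy()
    # Evaluate validity once, before any rows are modified.
    is_valid = out['QC_Flag'].eq('Valid')
    replace_rows = mask & is_valid
    append_rows = mask & ~is_valid
    out.loc[replace_rows, 'QC_Flag'] = flag_name
    out.loc[append_rows, 'QC_Flag'] = (
        out.loc[append_rows, 'QC_Flag'] + ', ' + flag_name
    )
    return out


df = pd.DataFrame({
    'BC': [500.0, 700.0, 5.0],
    'QC_Flag': ['Valid', 'Status Error', 'Valid'],
})
mask = df['BC'] > 100.0
flagged = update_flag(df, mask, 'Out of Range')
print(flagged)
```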

Related Documentation

RawDataReader Factory - High-level interface for instrument data reading
Quality Control - Data validation and filtering options
Supported Instruments - Available instrument implementations

Quick Example

from AeroViz import RawDataReader
from datetime import datetime

# Using the factory function (recommended)
data = RawDataReader(
    instrument='AE33',
    path='/path/to/data',
    start=datetime(2024, 1, 1),
    end=datetime(2024, 12, 31)
)

# Direct usage (advanced - for custom implementations)
from AeroViz.rawDataReader.core import AbstractReader


class MyInstrumentReader(AbstractReader):
    nam = 'MyInstrument'

    def _raw_reader(self, file):
        # Custom file reading logic; should return a DataFrame of raw data
        pass

    def _QC(self, df):
        # Custom QC logic; should return the DataFrame with a QC_Flag column
        return df
