AbstractReader

The AbstractReader class is the foundation of AeroViz's data reading system, providing a standardized interface for reading and processing aerosol instrument data.

Core Architecture

AbstractReader serves as the base class for all instrument-specific readers in AeroViz. It defines the common interface and provides shared functionality for data processing, quality control, and output formatting.

Overview

The AbstractReader implements a consistent workflow for all aerosol instruments:

  1. Data Ingestion - Read raw instrument files
  2. Format Detection - Automatically identify data structure
  3. Quality Control - Apply built-in validation and filtering
  4. Standardization - Convert to unified output format
  5. Metadata Handling - Preserve instrument and measurement metadata
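The five steps above follow a template-method layout: the base class fixes the order of operations and each subclass supplies the instrument-specific parts. The block below is a minimal sketch of that idea, not AeroViz's actual implementation. `SketchReader`, `CsvLikeReader`, and `run` are hypothetical names; only `_raw_reader` and `_QC` correspond to the abstract methods documented on this page.

```python
from abc import ABC, abstractmethod


class SketchReader(ABC):
    """Hypothetical stand-in for a reader base class (not AeroViz code)."""

    nam = 'SketchReader'

    def run(self, files):
        # Steps 1-2: ingest each file; the subclass knows the format.
        rows = [row for f in files for row in self._raw_reader(f)]
        # Step 3: quality control is delegated to the subclass.
        clean = self._QC(rows)
        # Steps 4-5: return a uniform structure with simple metadata.
        return {'instrument': self.nam, 'n_raw': len(rows), 'data': clean}

    @abstractmethod
    def _raw_reader(self, file):
        """Return a list of records parsed from one raw file."""

    @abstractmethod
    def _QC(self, rows):
        """Return only the records that pass quality control."""


class CsvLikeReader(SketchReader):
    nam = 'CsvLike'

    def _raw_reader(self, file):
        # In this sketch a "file" is just a string of comma-separated numbers.
        return [float(x) for x in file.split(',')]

    def _QC(self, rows):
        # Drop negative values as an illustrative validity check.
        return [x for x in rows if x >= 0]


result = CsvLikeReader().run(['1.0,-2.0', '3.5'])
```

Because `SketchReader` declares abstract methods, Python refuses to instantiate it directly, which is the same reason AbstractReader itself is used only through subclasses.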

Usage Pattern

AbstractReader is not normally used directly. Readers are typically accessed through the RawDataReader factory function, which automatically selects the appropriate reader for your instrument type. Subclass AbstractReader only when adding support for a new instrument.

Key Features

  • Flexible Input Handling - Supports various file formats and structures
  • Built-in Quality Control - Configurable data validation and filtering
  • Metadata Preservation - Maintains instrument configuration and measurement context
  • Extensible Design - Easy to subclass for new instruments
  • Error Handling - Robust error reporting and recovery

Implementation Note

AbstractReader is an abstract base class. For actual data reading, use instrument-specific implementations or the RawDataReader factory function.

API Reference

AeroViz.rawDataReader.core.AbstractReader

AbstractReader(path: Path | str, reset: bool | str = False, qc: bool | str = True, **kwargs)

Bases: ABC

Abstract class for reading raw data from different instruments. Each instrument should have a separate class that inherits from this class and implements the abstract methods. The abstract methods are _raw_reader and _QC.

On initialization, the reader lists the files in the given path. If a pickle file from a previous run exists, it is loaded; otherwise the raw data are read and a pickle file is written. The pickle file is therefore generated the first time the raw data are read. To re-read the raw data, set reset=True.

Core initialization method for reading raw data from different instruments.
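The read-or-cache behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the AeroViz implementation: `load_with_cache`, `fake_read_raw`, and the file name `_read_demo.pkl` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def load_with_cache(cache_file: Path, read_raw, reset: bool = False):
    """Return cached data if present, otherwise read raw data and cache it."""
    if cache_file.exists() and not reset:
        with cache_file.open('rb') as fh:
            return pickle.load(fh)
    data = read_raw()  # the expensive raw-file parsing step
    with cache_file.open('wb') as fh:
        pickle.dump(data, fh)
    return data


calls = []


def fake_read_raw():
    # Hypothetical reader that records how often it is invoked.
    calls.append(1)
    return {'BC': [1.2, 1.4]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / '_read_demo.pkl'
    first = load_with_cache(cache, fake_read_raw)              # parses raw data
    second = load_with_cache(cache, fake_read_raw)             # served from pickle
    third = load_with_cache(cache, fake_read_raw, reset=True)  # forced re-read
```

Only the first and third calls reach `fake_read_raw`; the second is answered from the pickle file, which is why a reset is needed after the raw files change.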

Parameters:

Name Type Description Default
path str | Path

The path to the raw data; the reader lists the files found there.

required
reset bool | str

Whether to re-read the raw data instead of loading the existing pickle file. The string 'append' sets the append attribute instead of reset.

False
qc bool | str

Whether to apply quality control. If a string is given, it is also stored as qc_freq.

True
**kwargs dict

Additional keyword arguments passed to the reader.

{}

Attributes

nam class-attribute instance-attribute

nam = 'AbstractReader'

path instance-attribute

path = Path(path)

meta instance-attribute

meta = meta[nam]

logger instance-attribute

logger = ReaderLogger(nam, output_folder, upper() if not get('suppress_warnings') else 'ERROR')

reset instance-attribute

reset = reset is True

append instance-attribute

append = reset == 'append'

qc instance-attribute

qc = qc

qc_freq instance-attribute

qc_freq = qc if isinstance(qc, str) else None
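The expressions for reset, append, qc, and qc_freq above mean that each constructor argument can carry either a boolean or a string. The helper below simply replays those four expressions so the combinations are easy to inspect; `interpret_flags` is a hypothetical name, and '1MS' is only an example string, since this page does not state which strings are accepted.

```python
def interpret_flags(reset=False, qc=True):
    """Replay the attribute expressions listed above for given arguments."""
    return {
        'reset': reset is True,       # only the literal True counts as a reset
        'append': reset == 'append',  # the string 'append' sets append instead
        'qc': qc,                     # stored unchanged
        'qc_freq': qc if isinstance(qc, str) else None,
    }


defaults = interpret_flags()
appending = interpret_flags(reset='append')
with_freq = interpret_flags(qc='1MS')  # example string only
```

Note that `reset='append'` leaves `reset` itself False, because the identity test `reset is True` succeeds only for the boolean.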

kwargs instance-attribute

kwargs = kwargs

pkl_nam instance-attribute

pkl_nam = output_folder / f'_read_{lower()}.pkl'

csv_nam instance-attribute

csv_nam = output_folder / f'_read_{lower()}.csv'

pkl_nam_raw instance-attribute

pkl_nam_raw = output_folder / f'_read_{lower()}_raw.pkl'

csv_nam_raw instance-attribute

csv_nam_raw = output_folder / f'_read_{lower()}_raw.csv'

csv_out instance-attribute

csv_out = output_folder / f'output_{lower()}.csv'

report_out instance-attribute

report_out = output_folder / 'report.json'

Functions

__call__

__call__(start: datetime, end: datetime, mean_freq: str = '1h') -> DataFrame

Process data for specified time range.

_raw_reader abstractmethod

_raw_reader(file)

Implement in child classes to read raw data files.

_QC abstractmethod

_QC(df: DataFrame) -> DataFrame

Implement in child classes for quality control.

__calculate_rates

__calculate_rates(raw_data, qc_data, all_keys=False, with_log=False)

Calculate acquisition rate, yield rate, and total rate.

Parameters:

Name Type Description Default
raw_data DataFrame

Raw data before quality control

required
qc_data DataFrame

Data after quality control

required
all_keys bool

Whether to calculate rates for all deterministic keys

False
with_log bool

Whether to output calculation logs

False

Returns:

Type Description
dict

Dictionary containing calculated rates
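This page does not define the three rates, so the sketch below makes its own assumption: acquisition rate is the share of expected time steps that have raw data, yield rate is the share of raw data that passes quality control, and total rate is the share of expected time steps that pass quality control. `calculate_rates` and its count-based parameters are hypothetical and do not mirror the private method's DataFrame-based signature.

```python
def calculate_rates(n_expected: int, n_raw: int, n_qc: int) -> dict:
    """Return percentage rates under the assumed definitions above."""

    def pct(num, den):
        # Guard against division by zero when nothing was expected or read.
        return round(100 * num / den, 1) if den else 0.0

    return {
        'acquisition_rate': pct(n_raw, n_expected),  # assumed: raw / expected
        'yield_rate': pct(n_qc, n_raw),              # assumed: QC-passed / raw
        'total_rate': pct(n_qc, n_expected),         # assumed: QC-passed / expected
    }


# 24 hourly slots expected, 18 with raw data, 12 passing QC.
rates = calculate_rates(n_expected=24, n_raw=18, n_qc=12)
```

Under these assumed definitions the total rate is the product of the other two, so a low total rate can be traced either to missing raw data or to data rejected by quality control.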

_rate_calculate

_rate_calculate(raw_data, qc_data) -> None

__generate_grouped_report

__generate_grouped_report(current_time, weekly_raw_groups, weekly_qc_groups, monthly_raw_groups, monthly_qc_groups)

Generate acquisition and yield reports based on grouped data.

_timeIndex_process

_timeIndex_process(_df, user_start=None, user_end=None, append_df=None)

Process time index, resample data, extract specified time range, and optionally append new data.

Parameters:

Name Type Description Default
_df DataFrame

Input DataFrame with time index

required
user_start datetime or str

Start of user-specified time range

None
user_end datetime or str

End of user-specified time range

None
append_df DataFrame

DataFrame to append to the result

None

Returns:

Type Description
DataFrame

Processed DataFrame with properly formatted time index

_outlier_process

_outlier_process(_df)

Process outliers.

_save_data

_save_data(raw_data: DataFrame, qc_data: DataFrame) -> None

Save data to files.

progress_reading

progress_reading(files: list) -> Generator

_read_raw_files

_read_raw_files() -> tuple[DataFrame | None, DataFrame | None]

Read and process raw files.

_run

_run(user_start, user_end)

reorder_dataframe_columns staticmethod

reorder_dataframe_columns(df, order_lists: list[list], keep_others: bool = False)

Reorder DataFrame columns.

time_aware_IQR_QC staticmethod

time_aware_IQR_QC(df: DataFrame, time_window='1D', log_dist=False) -> DataFrame

filter_error_status staticmethod

filter_error_status(_df, error_codes, special_codes=None)

Filter data containing specified error status codes and specially handle certain specific codes.

Parameters:

Name Type Description Default
_df DataFrame

A DataFrame containing a 'Status' column

required
error_codes list

List of status codes for bitwise testing

required
special_codes list

List of special status codes for exact matching

None

Returns:

Type Description
DataFrame

Filtered DataFrame

Notes

This function performs two types of filtering:

  1. Bitwise filtering that checks if any error_codes are present in the Status
  2. Exact matching for special_codes
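The difference between the two filtering modes is easiest to see on plain integers. The sketch below is not the AeroViz implementation: it works on a list rather than a DataFrame, `has_error` is a hypothetical helper, the code values are invented for illustration, and it assumes each error code is a bit flag.

```python
def has_error(status: int, error_codes, special_codes=None) -> bool:
    """Return True if a status value should be filtered out."""
    # Exact matching: the whole status value equals a special code.
    if special_codes and status in special_codes:
        return True
    # Bitwise testing: at least one error-code bit is set in the status.
    return any((status & code) != 0 for code in error_codes)


error_codes = [1, 4, 16]  # invented bit flags
special_codes = [384]     # invented value matched as a whole

statuses = [0, 1, 2, 5, 384, 8]
kept = [s for s in statuses if not has_error(s, error_codes, special_codes)]
```

Status 5 is rejected because it contains bit 1, while 384 shares no bits with the error codes and is rejected only through the exact match.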

Quick Example

```python
from datetime import datetime

from AeroViz import RawDataReader

# Using the factory function (recommended)
data = RawDataReader(
    instrument='AE33',
    path='/path/to/data',
    start=datetime(2024, 1, 1),
    end=datetime(2024, 12, 31)
)

# Direct usage (advanced - for custom implementations)
from AeroViz.rawDataReader.core import AbstractReader

class MyInstrumentReader(AbstractReader):
    nam = 'MyInstrument'

    def _raw_reader(self, file):
        # Custom file reading logic
        pass

    def _QC(self, df):
        # Custom QC logic
        return df
```