RawDataReader
Factory function for reading and processing instrument data in AeroViz.
Overview
RawDataReader is a factory function that provides a unified interface for reading and processing data from various scientific instruments. It automatically handles data loading, quality control, and time series processing.
Function Signature
AeroViz.rawDataReader.RawDataReader
RawDataReader(instrument: str, path: Path | str, reset: bool | str = False, qc: bool | str = True, start: datetime | str = None, end: datetime | str = None, mean_freq: str = '1h', size_range: tuple[float, float] | None = None, suppress_warnings: bool = False, log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR'] = 'INFO', **kwargs)
Factory function to instantiate the appropriate reader module for a given instrument and return the processed data over the specified time range.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
instrument | str | Name of the instrument to read; must be a valid key in the meta dictionary | required |
path | Path or str | Directory where the raw data files for the instrument are stored | required |
reset | bool or str | Data processing control mode: False uses existing processed data if available; True forces reprocessing of all data from raw files; 'append' adds new data to existing processed data | False |
qc | bool or str | Quality control and rate calculation mode: True applies QC and calculates overall rates; False skips QC and returns raw data only; a frequency string calculates rates at that interval ('W' weekly, 'MS' month start, 'QS' quarter start, 'YS' year start) and may carry a number prefix (e.g., '2MS' for bi-monthly) | True |
start | datetime or str | Start time for filtering the data | None |
end | datetime or str | End time for filtering the data | None |
mean_freq | str | Resampling frequency for averaging the data (e.g., '1h' for hourly mean) | '1h' |
size_range | tuple[float, float] | Size range in nanometers (min_size, max_size) for SMPS/APS data filtering | None |
suppress_warnings | bool | Whether to suppress warning messages | False |
log_level | Literal['DEBUG', 'INFO', 'WARNING', 'ERROR'] | Logging level | 'INFO' |
**kwargs | | Additional arguments to pass to the reader module | {} |
Returns:
Type | Description |
---|---|
DataFrame | Processed data with specified QC and time range |
Raises:
Type | Description |
---|---|
ValueError | If QC mode or mean_freq format is invalid |
TypeError | If parameters are of incorrect type |
KeyError | If instrument name is not found in the supported instruments list |
FileNotFoundError | If path does not exist or cannot be accessed |
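The qc string formats listed in the parameter table can be checked before the reader is called, so a typo fails fast. The following is a minimal sketch, assuming the four aliases 'W', 'MS', 'QS' and 'YS' with an optional positive integer prefix are the only accepted strings. `check_qc` is a hypothetical helper and not part of AeroViz; the library's own validation may differ.

```python
import re

# Hypothetical helper, not part of AeroViz. It mirrors the qc formats described
# in the parameter table: a bool, or an optional positive integer prefix
# followed by one of W, MS, QS, YS.
_QC_PATTERN = re.compile(r"([1-9]\d*)?(W|MS|QS|YS)")


def check_qc(qc):
    """Return qc unchanged if it looks valid; otherwise raise TypeError or ValueError."""
    if isinstance(qc, bool):
        return qc
    if not isinstance(qc, str):
        raise TypeError(f"qc must be bool or str, got {type(qc).__name__}")
    if _QC_PATTERN.fullmatch(qc) is None:
        raise ValueError(f"invalid qc frequency: {qc!r}")
    return qc
```

The exception types are chosen to match the Raises table above, so a caller can handle a failed pre-check and a failed reader call in the same way.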
Examples:
>>> from AeroViz import RawDataReader
>>>
>>> # Using string inputs
>>> df_ae33 = RawDataReader(
...     instrument='AE33',
...     path='/path/to/your/data/folder',
...     reset=True,
...     qc='1MS',
...     start='2024-01-01',
...     end='2024-06-30',
...     mean_freq='1h',
... )
>>> # Using Path and datetime objects
>>> from pathlib import Path
>>> from datetime import datetime
>>>
>>> df_ae33 = RawDataReader(
...     instrument='AE33',
...     path=Path('/path/to/your/data/folder'),
...     reset=True,
...     qc='1MS',
...     start=datetime(2024, 1, 1),
...     end=datetime(2024, 6, 30),
...     mean_freq='1h',
... )
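Both call styles above describe the same time window. Judging from the console banners shown in the scenarios on this page, an end date given without a time appears to be extended to 23:59:59 of that day. The sketch below illustrates that normalization with the standard library only. It assumes ISO-formatted strings, and `normalize_range` is a hypothetical name, not AeroViz code.

```python
from datetime import datetime, time


def normalize_range(start, end):
    """Coerce str or datetime inputs to datetimes and push a midnight end to 23:59:59."""

    def to_dt(value):
        # Strings are assumed to be ISO formatted, e.g. '2024-01-01'.
        return value if isinstance(value, datetime) else datetime.fromisoformat(value)

    start_dt, end_dt = to_dt(start), to_dt(end)
    if end_dt.time() == time(0, 0):
        # Assumption: an end given as a bare date means "through the end of that day".
        end_dt = end_dt.replace(hour=23, minute=59, second=59)
    if end_dt < start_dt:
        raise ValueError("end must not be earlier than start")
    return start_dt, end_dt
```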
Basic Usage
from pathlib import Path
from datetime import datetime
from AeroViz import RawDataReader

data = RawDataReader(
    instrument='AE33',
    path=Path('/path/to/data'),
    start=datetime(2024, 2, 1),
    end=datetime(2024, 8, 31),
    mean_freq='1h'
)
More Examples
Scenario 1: Basic Usage with NEPH Instrument
neph_data = RawDataReader(
    instrument='NEPH',
    path=Path('/path/to/your/data/folder'),
    reset=True,
    start=datetime(2024, 2, 1),
    end=datetime(2024, 4, 30),
    mean_freq='1h'
)
Console Output:
╔════════════════════════════════════════════════════════════════════════════════╗
║ Reading NEPH RAW DATA from 2024-02-01 00:00:00 to 2024-04-30 23:59:59 ║
╚════════════════════════════════════════════════════════════════════════════════╝
▶ Reading NEPH files ━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 file_name.dat
▶ Scatter Coe. (550 nm)
├─ Sample Rate : 100.0%
├─ Valid Rate : 100.0%
└─ Total Rate : 100.0%
Expected Output:
- Hourly averaged NEPH data for February through April 2024.
- Will include scattering coefficients and other NEPH-related metrics.
Scenario 2: AE33 with Quality Control and Rate Calculation
ae33_data = RawDataReader(
    instrument='AE33',
    path=Path('/path/to/your/data/folder'),
    reset=True,
    qc='1MS',  # report QC rates for each month
    start=datetime(2024, 2, 1),
    end=datetime(2024, 5, 31),
    mean_freq='1h',
)
Console Output:
╔════════════════════════════════════════════════════════════════════════════════╗
║ Reading AE33 RAW DATA from 2024-02-01 00:00:00 to 2024-05-31 23:59:59 ║
╚════════════════════════════════════════════════════════════════════════════════╝
▶ Reading AE33 files ━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 AE33_AE33-S07-00599_20240225.dat
AE33_AE33-S07-00599_20240704.dat may not be a whole daily data. Make sure the file is correct. # example warning or
AE33_AE33-S07-00599_20240711.dat may not be a whole daily data. Make sure the file is correct. # error message
▶ Processing: 2024-02-01 to 2024-02-29
▶ BC Mass Conc. (880 nm)
├─ Sample Rate : 26.3%
├─ Valid Rate : 99.5%
└─ Total Rate : 26.1%
▶ Processing: 2024-03-01 to 2024-03-31
▶ BC Mass Conc. (880 nm)
├─ Sample Rate : 100.0%
├─ Valid Rate : 100.0%
└─ Total Rate : 100.0%
▶ Processing: 2024-04-01 to 2024-04-30
▶ BC Mass Conc. (880 nm)
├─ Sample Rate : 100.0%
├─ Valid Rate : 100.0%
└─ Total Rate : 100.0%
▶ Processing: 2024-05-01 to 2024-05-31
▶ BC Mass Conc. (880 nm)
├─ Sample Rate : 100.0%
├─ Valid Rate : 100.0%
└─ Total Rate : 100.0%
Expected Output:
- Hourly AE33 data with quality control applied monthly.
- Includes black carbon concentrations and absorption coefficients.
- Will generate a CSV file with the processed data.
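The "Processing: ... to ..." lines in the console output suggest that qc='1MS' splits the requested range into calendar months and reports rates for each one. The sketch below reproduces that splitting with the standard library, under that assumption. `month_periods` is a hypothetical helper, and clipping the first and last period to the requested range is a design choice made here, not confirmed library behavior.

```python
import calendar
from datetime import date


def month_periods(start, end):
    """Yield (first_day, last_day) for each calendar month overlapping [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        days_in_month = calendar.monthrange(year, month)[1]
        # Clip each month to the requested range.
        first = max(start, date(year, month, 1))
        last = min(end, date(year, month, days_in_month))
        yield first, last
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


for first, last in month_periods(date(2024, 2, 1), date(2024, 5, 31)):
    print(f"Processing: {first} to {last}")
```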
Scenario 3: SMPS with Specific Time Range
smps_data = RawDataReader(
    instrument='SMPS',
    path=Path('/path/to/your/data/folder'),
    start=datetime(2024, 2, 1),
    end=datetime(2024, 8, 31),
    mean_freq='30min',
    size_range=(11.8, 593.5)  # user-specified size range in nm
)
Console Output:
╔════════════════════════════════════════════════════════════════════════════════╗
║ Reading SMPS RAW DATA from 2024-02-01 00:00:00 to 2024-08-31 23:59:59 ║
╚════════════════════════════════════════════════════════════════════════════════╝
▶ Reading SMPS files ━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 240817.txt
SMPS file: 240816.txt is not match the default size range (11.8, 593.5), it is (11.0, 593.5) # reports files whose size range does not match
▶ Bins
├─ Sample Rate : 1.7%
├─ Valid Rate : 93.3%
└─ Total Rate : 1.6%
Expected Output:
- SMPS data from February through August 2024.
- 30-minute averaged data points.
- Includes particle size distribution information.
Advanced Features
Size Range Filtering
For size-resolved instruments (SMPS, APS, GRIMM):
data = RawDataReader(
    instrument="SMPS",
    path="data/",
    start="2024-01-01",
    end="2024-01-31",
    size_range=(10, 500)  # nm
)
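Conceptually, size_range keeps only the size bins whose diameter falls inside the given bounds. The sketch below shows that idea on a plain dictionary keyed by bin diameter in nanometers. The inclusive bounds and the name `filter_bins` are assumptions, and the sample values are made up for illustration; the reader's actual filtering may differ.

```python
def filter_bins(bins, size_range):
    """Keep the entries of bins (diameter in nm -> value) that lie within size_range."""
    min_size, max_size = size_range
    if min_size > max_size:
        raise ValueError("size_range must be given as (min_size, max_size)")
    # Assumption: both bounds are inclusive.
    return {d: v for d, v in bins.items() if min_size <= d <= max_size}


# Made-up bin values, for illustration only.
sample = {8.0: 120.0, 11.8: 340.0, 100.0: 560.0, 593.5: 12.0, 700.0: 1.0}
print(filter_bins(sample, (10, 500)))
```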
Quality Control and Rate Calculation
data = RawDataReader(
    instrument='AE33',
    path=Path('/path/to/data'),
    reset=True,
    qc='1MS',  # Calculate and print QC rates monthly
    start=datetime(2024, 1, 1),
    end=datetime(2024, 12, 31),
)
Example console output:
▶ Processing: 2024-02-01 to 2024-02-29
▶ BC Mass Conc. (880 nm)
├─ Sample Rate : 26.3%
├─ Valid Rate : 99.5%
└─ Total Rate : 26.1%
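In the outputs shown on this page, Total Rate is close to the product of Sample Rate and Valid Rate. The sketch below shows one plausible definition consistent with that pattern. It is an assumption, not AeroViz's documented formula: sample rate is the share of expected time slots that contain data, valid rate is the share of recorded slots that pass QC, and total rate is the share of expected slots that pass QC. `qc_rates` is a hypothetical function, and the counts in the usage line are illustrative values, not measurements.

```python
def qc_rates(expected, recorded, valid):
    """Return (sample_rate, valid_rate, total_rate) in percent, rounded to one decimal.

    expected: number of time slots in the period
    recorded: slots that contain any data
    valid:    slots that pass quality control
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    sample_rate = 100 * recorded / expected
    valid_rate = 100 * valid / recorded if recorded else 0.0
    total_rate = 100 * valid / expected
    return round(sample_rate, 1), round(valid_rate, 1), round(total_rate, 1)


# Illustrative counts: February 2024 has 29 * 24 = 696 hourly slots.
print(qc_rates(expected=696, recorded=183, valid=182))
```

Under this definition the total rate can never exceed either of the other two, which matches every example on this page.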
Output Files
After processing, the following files are generated in the {instrument}_outputs directory:
- _read_{instrument}_raw.csv: Merged raw data with original time resolution
- _read_{instrument}_raw.pkl: Raw data in pickle format
- _read_{instrument}.csv: Quality controlled data
- _read_{instrument}.pkl: QC data in pickle format
- Output_{instrument}: Final processed data file
- {instrument}.log: Processing log file
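To locate these files programmatically, the names can be assembled with pathlib. This sketch assumes that the {instrument}_outputs directory is created inside the path passed to RawDataReader and that the instrument name appears in file names exactly as written; neither detail is stated on this page. `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path


def expected_outputs(path, instrument):
    """Build the output file paths listed above (the location is an assumption)."""
    out_dir = Path(path) / f"{instrument}_outputs"
    return {
        "raw_csv": out_dir / f"_read_{instrument}_raw.csv",
        "raw_pkl": out_dir / f"_read_{instrument}_raw.pkl",
        "qc_csv": out_dir / f"_read_{instrument}.csv",
        "qc_pkl": out_dir / f"_read_{instrument}.pkl",
        # The extension of the final file is not given in the list above.
        "final": out_dir / f"Output_{instrument}",
        "log": out_dir / f"{instrument}.log",
    }


for label, file_path in expected_outputs("/path/to/data", "AE33").items():
    print(label, file_path.name, file_path.exists())
```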
Supported Instruments
For detailed specifications of supported instruments, see Instruments API Reference.
See Also
- Base Class API - Documentation for the abstract base class
- Quality Control API - Details about quality control implementation