
Scanning Mobility Particle Sizer (SMPS)

The SMPS is an instrument used for measuring particle size distributions in the nanometer range.

AeroViz.rawDataReader.script.SMPS.Reader

Reader(path: Path | str, reset: bool | str = False, qc: bool | str = True, **kwargs)

Bases: AbstractReader

SMPS (Scanning Mobility Particle Sizer) Data Reader

A specialized reader for SMPS data files. The instrument measures particle size distributions over 11.8–593.5 nm on the default AIM 10.3 grid (11.34–615.27 nm on AIM 11.x).

See full documentation at docs/source/instruments/SMPS.md for detailed information on supported formats and QC procedures.

Attributes

nam class-attribute instance-attribute

nam = 'SMPS'

MIN_HOURLY_COUNT class-attribute instance-attribute

MIN_HOURLY_COUNT = 5

MIN_TOTAL_CONC class-attribute instance-attribute

MIN_TOTAL_CONC = 2000

MAX_TOTAL_CONC class-attribute instance-attribute

MAX_TOTAL_CONC = 10000000.0

MAX_LARGE_BIN_CONC class-attribute instance-attribute

MAX_LARGE_BIN_CONC = 4000

LARGE_BIN_THRESHOLD class-attribute instance-attribute

LARGE_BIN_THRESHOLD = 400

STATUS_COLUMN class-attribute instance-attribute

STATUS_COLUMN = 'Status Flag'

STATUS_OK class-attribute instance-attribute

STATUS_OK = 'Normal Scan'

SECONDARY_STATUS_COLUMN class-attribute instance-attribute

SECONDARY_STATUS_COLUMN = 'Instrument Errors'

METADATA_ALIASES class-attribute instance-attribute

METADATA_ALIASES = {'Total Concentration (#/cm³)': 'Total Conc. (#/cm)', 'Aerosol Temperature (C)': 'Sample Temp (C)', 'Aerosol Humidity (%)': 'Relative Humidity (%)', 'Aerosol Density (g/cm³)': 'Density (g/cm)', 'Impactor D50 (nm)': 'D50 (nm)', 'Test Name': 'Title', 'Geo. Std. Dev': 'Geo. Std. Dev.', 'DMA Column transit time Tf (s)': 'tf (s)', 'DMA Exit to Optical Detector Td (s)': 'td + 0.5 (s)'}

Functions

__call__

__call__(start=None, end=None, mean_freq=None)

Return the dN/dlogDp distribution, and write the surface/volume distributions plus a statistics sidecar file.

The parent pipeline produces the QC-applied, resampled dN/dlogDp frame (diameters in nm as columns) and stamps df.attrs. We then write the number / surface / volume distributions and a QC-aligned statistics file next to the main output. Pass append_stats=True to also append the statistics columns to the returned frame (default keeps it a clean PSD matrix for psd_stats / merge_psd / SizeDist).
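For spherical particles, the surface and volume distributions follow from the number distribution as dS/dlogDp = π·Dp²·dN/dlogDp and dV/dlogDp = (π/6)·Dp³·dN/dlogDp. The following minimal sketch applies those two conversions to a single scan. The function name is hypothetical, and this is not AeroViz's own implementation. With Dp in nm and dN/dlogDp in #/cm³, the results come out in nm²/cm³ and nm³/cm³.

```python
import math


def number_to_surface_volume(diameters_nm, dndlogdp):
    """Convert one dN/dlogDp scan to dS/dlogDp and dV/dlogDp (spherical particles).

    Hypothetical helper for illustration.
    diameters_nm : bin diameters in nm
    dndlogdp     : dN/dlogDp values in #/cm³, same order as diameters_nm
    Returns (dS/dlogDp in nm²/cm³, dV/dlogDp in nm³/cm³).
    """
    if len(diameters_nm) != len(dndlogdp):
        raise ValueError("diameters and concentrations must have the same length")
    # Surface of a sphere: pi * Dp^2
    ds = [math.pi * d ** 2 * n for d, n in zip(diameters_nm, dndlogdp)]
    # Volume of a sphere: (pi / 6) * Dp^3
    dv = [math.pi / 6 * d ** 3 * n for d, n in zip(diameters_nm, dndlogdp)]
    return ds, dv


ds, dv = number_to_surface_volume([100.0, 200.0], [1000.0, 500.0])
```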

_raw_reader

_raw_reader(file)

Read and parse raw SMPS data files.

Returns all columns from the raw file. Column selection is deferred to _QC() and _process() stages.

Supported formats:
  • S80 TXT (older AIM): tab-separated, header row at 'Sample #'
  • S82 TXT (AIM 10.3): tab-separated, header row at 'Sample #'
  • CSV (AIM 11.x): comma-separated, header row at 'Scan Number'
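The three layouts differ only in the header marker and the delimiter. The sketch below shows one way to locate the header row and choose the delimiter from a file's lines. The helper name and the BOM handling are assumptions made for illustration; the real _raw_reader may detect formats differently.

```python
def detect_layout(lines):
    """Return (header_row_index, delimiter) for an SMPS export.

    Hypothetical helper: 'Sample #' marks the tab-separated TXT exports
    (S80 / S82) and 'Scan Number' marks the comma-separated AIM 11.x CSV.
    """
    markers = {"Sample #": "\t", "Scan Number": ","}
    for i, line in enumerate(lines):
        # Drop a possible UTF-8 byte-order mark before matching
        stripped = line.lstrip("\ufeff").strip()
        for marker, delimiter in markers.items():
            if stripped.startswith(marker):
                return i, delimiter
    raise ValueError("no recognised SMPS header row found")
```

Because the preamble lines above the header vary between AIM versions, the function searches for the header row instead of assuming a fixed number of lines to skip.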

_bin_signature staticmethod

_bin_signature(df)

Stable fingerprint of a file's size-bin grid (sorted tuple of diameter columns, rounded to 2 decimals so trivial float jitter doesn't split otherwise-identical scans into different groups).
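Here is a minimal sketch of that fingerprint. To stay self-contained, it works on plain column labels instead of a DataFrame, and it treats any label that parses as a finite float as a size bin. That parsing rule is an assumption for illustration, not the reader's documented behaviour.

```python
import math


def bin_signature(columns):
    """Sorted tuple of size-bin diameters, rounded to 2 decimals (hypothetical helper)."""
    diameters = []
    for col in columns:
        try:
            value = float(col)
        except (TypeError, ValueError):
            continue  # metadata column such as 'Date' or 'Total Conc. (#/cm)'
        if math.isfinite(value):
            diameters.append(round(value, 2))
    return tuple(sorted(diameters))
```

Rounding absorbs tiny float jitter. However, two values that straddle a rounding boundary can still land in different groups, so the fingerprint is a practical heuristic rather than an exact equivalence test.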

_partition_compatible_scans

_partition_compatible_scans(df_list, files)

Keep files whose size-bin grid matches the dominant group; drop the rest so the concat sees one consistent schema.

The grouping fingerprint is the file's sorted size-bin tuple. The "dominant" group is picked by total row count, not file count — this stops a swarm of tiny files from outvoting one large-but-typical file. The minority files are not silently discarded: every dropped file is named in a warning, so the user can re-run them in isolation (different folder, or with size_range=) if both grids are wanted.
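The sketch below shows the row-count vote. Each scan file is reduced to a hypothetical (filename, signature, n_rows) tuple, which stands in for the reader's actual DataFrame/file pairs. The signature with the most total rows wins, and every other file is named in a warning.

```python
import warnings
from collections import defaultdict


def partition_compatible(scans):
    """Split scans into (kept, dropped) file names by the dominant bin grid.

    scans: list of (filename, signature, n_rows) tuples (hypothetical stand-in).
    The dominant grid is the one with the most total rows, not the most files.
    """
    if not scans:
        return [], []
    rows_per_sig = defaultdict(int)
    for _, sig, n_rows in scans:
        rows_per_sig[sig] += n_rows
    # max() over total rows; on a tie, the signature seen first wins
    dominant = max(rows_per_sig, key=rows_per_sig.get)
    kept = [name for name, sig, _ in scans if sig == dominant]
    dropped = [name for name, sig, _ in scans if sig != dominant]
    for name in dropped:
        warnings.warn(f"skipping {name}: size-bin grid differs from the dominant group")
    return kept, dropped
```

In this example, three small files share one grid but lose to a single large export:

```python
scans = [("t1.txt", ("B",), 100), ("big.csv", ("A",), 5000),
         ("t2.txt", ("B",), 100), ("t3.txt", ("B",), 100)]
kept, dropped = partition_compatible(scans)  # kept == ["big.csv"]
```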

_QC

_QC(_df)

Perform quality control on SMPS particle size distribution data.

QC Rules Applied
  1. Status Error: Status Flag is not "Normal Scan", or Instrument Errors is non-empty (instrument error)
  2. Insufficient: fewer than 5 measurements per hour
  3. Invalid Number Conc: total number concentration outside the valid range (2000–1e7 #/cm³)
  4. DMA Water Ingress: any bin > 400 nm with concentration > 4000 dN/dlogDp (indicates water in the DMA)
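The Insufficient rule depends on how many scans fall into each hour. The following sketch counts scans per calendar hour and returns the hours below the threshold. Bucketing by calendar hour is an assumption here; the reader may group timestamps differently. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

MIN_HOURLY_COUNT = 5  # same value as the reader's class attribute


def insufficient_hours(timestamps, min_count=MIN_HOURLY_COUNT):
    """Return the set of hour buckets that hold fewer than min_count scans."""
    per_hour = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    return {hour for hour, n in per_hour.items() if n < min_count}


# A full hour at the typical 6-minute interval (10 scans) passes;
# an hour with only 3 scans is flagged.
scans = [datetime(2024, 1, 1, 0, m) for m in range(0, 60, 6)]
scans += [datetime(2024, 1, 1, 1, m) for m in (0, 6, 12)]
bad = insufficient_hours(scans)
```

Hours with no scans at all do not appear in the result, because there are no rows to flag.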

_process

_process(_df)

Return the QC'd dN/dlogDp size bins (plus QC_Flag).

The size distribution itself is the canonical SMPS product. Summary statistics (total, GMD, GSD, mode, and mode fractions) and the surface and volume distributions are derived quantities. Compute them on demand with AeroViz.psd_stats / AeroViz.psd_distributions rather than baking them into the reader output. This keeps the reader's return type a plain dN/dlogDp DataFrame (diameters as columns), which is exactly what psd_stats / merge_psd / SizeDist consume.

Data Format

  • File format:
    • .txt files (tab-delimited) from AIM 8.x / 9.x / 10.3
    • .csv files (comma-delimited) from AIM 11.x
  • Sampling interval: 6 minutes per scan (typical)
  • File naming pattern: *.txt or *.csv
  • Timestamp formats:
    • mm/dd/yy HH:MM:SS (US format, older versions)
    • mm/dd/yyyy HH:MM:SS (US format, newer versions)
    • dd/mm/yyyy HH:MM:SS (EU format)
  • Default size grid: 11.8–593.5 nm (110 bins) on AIM 10.3; 11.34–615.27 nm (112 bins) on AIM 11.x. A folder mixing both versions cannot be outer-joined safely — see "Mixed AIM versions" below.
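Trying each timestamp pattern in turn is one simple way to accept all three formats. The sketch below is hypothetical and is not the reader's parser. The order of the patterns matters: an ambiguous date such as 01/02/2024 matches the US pattern first and parses as January 2, so a real dd/mm/yyyy file would need its format fixed explicitly.

```python
from datetime import datetime

# Tried in order; the first pattern that parses wins.
TIMESTAMP_FORMATS = (
    "%m/%d/%y %H:%M:%S",  # mm/dd/yy   (US, older AIM)
    "%m/%d/%Y %H:%M:%S",  # mm/dd/yyyy (US, newer AIM)
    "%d/%m/%Y %H:%M:%S",  # dd/mm/yyyy (EU)
)


def parse_timestamp(text):
    """Parse an SMPS timestamp string using the first matching format."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next pattern
    raise ValueError(f"unrecognised timestamp: {text!r}")
```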

Measurement Parameters

The SMPS provides particle size distribution measurements:

| Parameter | Value | Description |
|---|---|---|
| Size range | 11.8–593.5 nm | Default particle diameter range (AIM 10.3; 11.34–615.27 nm on AIM 11.x) |
| Output | dN/dlogDp | Number concentration per size bin |
| Unit | #/cm³ | Particle number concentration |
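The bin counts quoted under Data Format (110 and 112) are consistent with a grid spaced evenly in log10(Dp) at 64 channels per decade. That resolution is an assumption here, not a value stated by the reader. The sketch below checks the arithmetic. It rounds the result because the listed endpoints are themselves rounded midpoints.

```python
import math


def geometric_bin_count(d_min_nm, d_max_nm, channels_per_decade=64):
    """Bins on a log-spaced grid from d_min to d_max inclusive.

    Assumes equal spacing in log10(Dp); the 64 channels/decade default is an
    assumption for illustration, not a setting read from the instrument.
    """
    return round(channels_per_decade * math.log10(d_max_nm / d_min_nm)) + 1


print(geometric_bin_count(11.8, 593.5))    # 110 (AIM 10.3 default grid)
print(geometric_bin_count(11.34, 615.27))  # 112 (AIM 11.x default grid)
```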

Data Processing

Data Reading

  • Automatically detects and skips header rows
  • Supports multiple date formats based on AIM version
  • Handles transposed data formats
  • Extracts and sorts particle size columns numerically
  • Validates size range against expected settings
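Numeric sorting matters because size-bin labels are strings. Sorted as text, '100.0' comes before '11.8'. The sketch below picks out the labels that parse as numbers and orders them by value. The helper name is hypothetical.

```python
def size_columns_sorted(columns):
    """Return the size-bin column labels in ascending diameter order.

    Labels that do not parse as numbers (metadata columns) are skipped.
    """
    sizes = []
    for col in columns:
        try:
            sizes.append((float(col), col))
        except (TypeError, ValueError):
            pass  # metadata column such as 'Date' or 'Sample #'
    return [label for _, label in sorted(sizes)]
```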

Quality Control

The SMPS reader uses the declarative QCFlagBuilder system with the following rules:

+-----------------------------------------------------------------------+
|                         QC Thresholds                                 |
+-----------------------------------------------------------------------+
| MIN_HOURLY_COUNT  = 5        measurements per hour                    |
| MIN_TOTAL_CONC    = 2000     #/cm³                                    |
| MAX_TOTAL_CONC    = 1e7      #/cm³                                    |
| MAX_LARGE_BIN_CONC= 4000     dN/dlogDp (DMA water ingress indicator)  |
| LARGE_BIN_THRESH  = 400      nm                                       |
| STATUS_OK         = "Normal Scan"   (for `Status Flag` column)        |
| SECONDARY status  = empty or "Normal Scan" (`Instrument Errors` col)  |
+-----------------------------------------------------------------------+

+-----------------------------------------------------------------------+
|                            _QC() Pipeline                             |
+-----------------------------------------------------------------------+
|                                                                       |
|  [Pre-process] Apply size range filter, calculate total concentration |
|       |                                                               |
|       v                                                               |
|  +-----------------------------------------+                          |
|  | Rule: Status Error  (OR of two columns) |                          |
|  +-----------------------------------------+                          |
|  | `Status Flag` != "Normal Scan", OR      |                          |
|  | `Instrument Errors` non-empty           |                          |
|  | (each empty form '' / 'nan' / 'None'    |                          |
|  | and "Normal Scan" exempted; tokens in   |                          |
|  | `ignored_status_errors` exempted)       |                          |
|  +-----------------------------------------+                          |
|           |                                                           |
|           v                                                           |
|  +-------------------------+    +-------------------------+           |
|  | Rule: Insufficient      |    | Rule: Invalid Number    |           |
|  +-------------------------+    |       Conc              |           |
|  | < 5 measurements        |    +-------------------------+           |
|  | per hour                |    | Total conc. outside     |           |
|  +-------------------------+    | range (2000-1e7 #/cm³)  |           |
|                                 +-------------------------+           |
|           |                              |                            |
|           v                              v                            |
|           |                     +-------------------------+           |
|           |                     | Rule: DMA Water Ingress |           |
|           |                     +-------------------------+           |
|           |                     | Bins > 400nm with       |           |
|           |                     | conc. > 4000 dN/dlogDp  |           |
|           |                     | (indicates water in DMA)|           |
|           |                     +-------------------------+           |
|                                                                       |
+-----------------------------------------------------------------------+
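The pre-process step computes a total concentration for each scan. One standard way to get it from a dN/dlogDp row is N = Σ (dN/dlogDp)_i · Δlog10(Dp)_i. The sketch below assumes ascending bins spaced uniformly in log10(Dp), so a single Δlog10(Dp) derived from the grid endpoints is used for every bin. The reader's own computation may differ.

```python
import math


def total_number_conc(diameters_nm, dndlogdp):
    """Integrate dN/dlogDp over a log-spaced grid: N = sum(dN/dlogDp * dlogDp).

    Assumes ascending, uniformly log10-spaced bins (an assumption of this sketch).
    Returns the total number concentration in #/cm³.
    """
    if len(diameters_nm) < 2 or len(diameters_nm) != len(dndlogdp):
        raise ValueError("need matching sequences with at least two bins")
    dlogdp = math.log10(diameters_nm[-1] / diameters_nm[0]) / (len(diameters_nm) - 1)
    return sum(n * dlogdp for n in dndlogdp)
```

The result is then compared against MIN_TOTAL_CONC and MAX_TOTAL_CONC by the Invalid Number Conc rule.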

QC Rules Applied

| Rule | Condition | Description |
|---|---|---|
| Status Error | Status Flag ≠ "Normal Scan" OR Instrument Errors non-empty | Older AIM 10.3 sub-versions report problems in Status Flag (e.g. "Conditioner Temperature Error"); newer AIM 10.3 and AIM 11.x put hardware warnings in Instrument Errors (e.g. "Low aerosol flow"). Both columns are checked and OR'd. Empty cells ('', 'nan', and 'None', the string form of Python None) and the positive "Normal Scan" sentinel are never errors; some instruments (e.g. FS) write "Normal Scan" into Instrument Errors instead of leaving it empty. Tokens listed in the ignored_status_errors kwarg are exempted with comma-split semantics: "Low aerosol flow,Neutralizer not active" passes when both tokens are whitelisted. |
| Insufficient | < 5 measurements/hour | Fewer than 5 measurements in the hour |
| Invalid Number Conc | Total < 2000 OR > 1e7 #/cm³ | Total number concentration outside the valid range |
| DMA Water Ingress | Any bin > 400 nm with > 4000 dN/dlogDp | Water contamination in the DMA column |
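The two concentration-based rules can be evaluated per scan from the thresholds above. This sketch returns the names of the rules a single scan fails. The function is hypothetical and mirrors the table, not the QCFlagBuilder code. A NaN total fails the range check because comparisons with NaN are false.

```python
MIN_TOTAL_CONC = 2000        # #/cm³
MAX_TOTAL_CONC = 1e7         # #/cm³
MAX_LARGE_BIN_CONC = 4000    # dN/dlogDp
LARGE_BIN_THRESHOLD = 400    # nm


def scan_flags(total_conc, diameters_nm, dndlogdp):
    """Return the rules one scan fails (an empty list means it passes)."""
    flags = []
    # Totals inside [MIN_TOTAL_CONC, MAX_TOTAL_CONC] are valid
    if not (MIN_TOTAL_CONC <= total_conc <= MAX_TOTAL_CONC):
        flags.append("Invalid Number Conc")
    # Any large bin above the water-ingress ceiling flags the scan
    if any(n > MAX_LARGE_BIN_CONC
           for d, n in zip(diameters_nm, dndlogdp) if d > LARGE_BIN_THRESHOLD):
        flags.append("DMA Water Ingress")
    return flags
```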

Whitelisting benign status warnings

If an instrument runs in a known-low-aerosol mode and reports "Low aerosol flow" on every scan, every row trips Status Error. Pass ignored_status_errors=[...] to RawDataReader to suppress those tokens:

from AeroViz import RawDataReader
df = RawDataReader(
    instrument='SMPS', path='/data/TP_SMPS',
    start='2026-01-01', end='2026-05-31',
    ignored_status_errors=['Low aerosol flow', 'Neutralizer not active'],
)

Token-level matching: a row passes only when EVERY comma-split token is either the OK sentinel or in the whitelist. "Low aerosol flow,Sheath flow error" still fails because "Sheath flow error" is not whitelisted.
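The following minimal sketch reproduces the Status Error semantics described above: the empty forms are exempt, cells are split on commas, every token must be the "Normal Scan" sentinel or whitelisted, and the two columns are OR'd. The function names are hypothetical. This illustrates the documented rule; it is not the reader's code.

```python
OK_SENTINEL = "Normal Scan"
EMPTY_FORMS = {"", "nan", "None"}


def cell_is_error(cell, ignored=()):
    """True if a status cell reports at least one non-whitelisted token."""
    text = str(cell).strip()  # None -> 'None', float('nan') -> 'nan'
    if text in EMPTY_FORMS:
        return False
    allowed = {OK_SENTINEL, *ignored}
    tokens = [tok.strip() for tok in text.split(",") if tok.strip()]
    return not all(tok in allowed for tok in tokens)


def status_error(status_flag, instrument_errors, ignored=()):
    """OR of the two status columns."""
    return cell_is_error(status_flag, ignored) or cell_is_error(instrument_errors, ignored)
```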

Size Distribution QC Visualization

    dN/dlogDp
       ^
       |
  4000 +                              ........... MAX_LARGE_BIN_CONC
       |      ___                     :          (DMA water ingress)
       |     /   \                    :
       |    /     \____               :
       +---+------+-------+-------+---+---> Dp (nm)
          11.8   100     400    593.5
                          ^
                    LARGE_BIN_THRESHOLD

Output Data

The processed data contains:

| Column | Unit | Description |
|---|---|---|
| Size bins | dN/dlogDp | Number concentration for each particle size |

QC_Flag Handling

  • The intermediate file (_read_smps_qc.pkl/csv) contains the QC_Flag column
  • The final output has invalid data set to NaN and QC_Flag column removed
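Going from the intermediate file to the final output amounts to masking flagged rows and then dropping the flag column. The sketch below does this on plain dicts, one per scan. It assumes, for illustration only, that a row is valid when its QC_Flag is empty; the reader's actual flag values may differ.

```python
def finalize(rows, flag_key="QC_Flag"):
    """Set data in flagged rows to NaN and drop the flag column.

    rows: list of dicts (one per scan). Assumption of this sketch: an empty
    flag (e.g. '' or None) marks a valid row.
    """
    out = []
    for row in rows:
        valid = not row.get(flag_key)
        out.append({k: (v if valid else float("nan"))
                    for k, v in row.items() if k != flag_key})
    return out
```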

Mixed AIM versions

A single folder is allowed to contain a mix of AIM versions, but the reader treats each scan grid as its own logical instrument — concatenating mismatched grids would create NaN-only columns that downstream completeness checks read as 100% Insufficient.

Partition by bin grid (auto-applied): Files are bucketed by their sorted size-bin tuple. The bucket with the most rows wins; the others are dropped before concat with a warning naming every skipped file. The picked-by-row-count rule means a swarm of tiny minority files cannot outvote one large export just by file count.

To process the dropped bucket explicitly, either move those files to a separate folder, or pass size_range= to force _raw_reader to reject files outside that exact range:

from AeroViz import RawDataReader
mixed_folder = '/data/mixed_SMPS'  # placeholder path
# Process only AIM 11.x scans in a mixed folder:
df = RawDataReader('SMPS', path=mixed_folder, size_range=(11.34, 615.27))

Metadata column aliases: AIM 11.x renames many metadata columns that carry the same physical quantity as AIM 10.3. The reader rewrites the AIM 11.x form to the AIM 10.3 form on every parsed file, so a folder of either version (or a partitioned-down folder) produces a consistent schema downstream. The 9 renamed pairs:

| AIM 11.x | AIM 10.3 (canonical) |
|---|---|
| Total Concentration (#/cm³) | Total Conc. (#/cm) |
| Aerosol Temperature (C) | Sample Temp (C) |
| Aerosol Humidity (%) | Relative Humidity (%) |
| Aerosol Density (g/cm³) | Density (g/cm) |
| Impactor D50 (nm) | D50 (nm) |
| Test Name | Title |
| Geo. Std. Dev | Geo. Std. Dev. |
| DMA Column transit time Tf (s) | tf (s) |
| DMA Exit to Optical Detector Td (s) | td + 0.5 (s) |
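The rename is a plain lookup in METADATA_ALIASES, and any column without an entry keeps its name. The sketch below applies a subset of the mapping to a list of labels. With a pandas DataFrame, the equivalent call would be df.rename(columns=METADATA_ALIASES). The helper name is hypothetical.

```python
# Subset of the reader's METADATA_ALIASES (AIM 11.x name -> AIM 10.3 name).
METADATA_ALIASES = {
    "Aerosol Temperature (C)": "Sample Temp (C)",
    "Aerosol Humidity (%)": "Relative Humidity (%)",
    "Test Name": "Title",
}


def canonical_columns(columns, aliases=METADATA_ALIASES):
    """Rename AIM 11.x metadata labels to their AIM 10.3 form; keep all others."""
    return [aliases.get(col, col) for col in columns]
```

Columns that only exist in AIM 11.x, such as Classifier Errors, pass through unchanged. This matches the policy of keeping them under their AIM 11.x names.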

AIM 11.x columns that have no AIM 10.3 equivalent — the 4-way error split (Classifier Errors / Detector Status / Communication Status / Neutralizer Status), granular DMA timings (THIGH / TLOW / TUP / TDOWN), Sheath Pressure/Temp/Humidity, etc. — are intentionally kept under their AIM 11.x names. Collapsing them onto AIM 10.3's coarser Instrument Errors / Scan Time would lose information.

Notes

  • Different AIM software versions may produce different file formats — see "Mixed AIM versions" above for how the reader isolates and reconciles them
  • Size-range validation checks each file's size grid against the expected range (configurable with size_range=)
  • DMA water ingress detection: high concentrations (> 4000 dN/dlogDp) in bins > 400 nm indicate water contamination in the DMA column
  • Automatic format detection and parsing