SMEAR-EE/SCD41_DATA

SMEAR Estonia CO2 Sensor Processing

Configurable, reproducible preprocessing pipeline for SCD41 and LGR CO2 sensor data collected at SMEAR Estonia (smear.emu.ee). It transforms raw JSON message logs into quality-controlled 10-minute aggregates in CSV and Parquet formats. Accompanies the paper "Fourteen-month co-located SCD41 low-cost IoT CO2 sensor dataset from hemiboreal Estonia", Scientific Data (Nature), status: in review.

Status: v0.4.0 - Stable release with comprehensive validation suite and enhanced data quality reporting

Quick Start

# 1. Setup (one-time)
./setup.sh

# 2. Activate environment
source venv/bin/activate

# 3. Configure sensors (copy template, then edit paths)
cp config/sensor_config.yaml config/sensors.yaml
# edit config/sensors.yaml  # Set your data paths

# 4. Run processing
jupyter notebook process_data.ipynb

Processed data appears in output/.

What It Does

Processes multiple sensors (4x SCD41, 1x LGR reference) with:

  • Time-synced resampling: 10-minute intervals, epoch-aligned
  • Quality control: uptime tracking, data quality flags, validation report
  • Validation suite: JSON/Markdown reports and coverage metrics
  • Multi-format output: CSV and Parquet
  • Visualizations: uptime, validation, and QC plots
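
The epoch-aligned resampling and coverage metric listed above can be illustrated with a minimal, standard-library-only sketch. It is not the pipeline's own code: the function names, the choice of the mean as the aggregate, and the sample values are all assumptions made for illustration. "Epoch-aligned" here means that bin edges fall on multiples of 600 seconds since the Unix epoch, so every sensor shares the same bin boundaries regardless of when it started logging.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

BIN_SECONDS = 600  # 10-minute bins


def bin_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to its epoch-aligned 10-minute bin."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BIN_SECONDS, tz=timezone.utc)


def aggregate(samples):
    """Group (timestamp, co2_ppm) pairs by bin; return {bin_start: (mean, n)}."""
    bins = defaultdict(list)
    for ts, value in samples:
        bins[bin_start(ts)].append(value)
    return {start: (mean(vals), len(vals)) for start, vals in sorted(bins.items())}


def coverage(aggregated, first: datetime, last: datetime) -> float:
    """Fraction of expected bins from first to last (inclusive) that hold data."""
    lo, hi = bin_start(first), bin_start(last)
    expected = int((hi - lo).total_seconds()) // BIN_SECONDS + 1
    present = sum(1 for start in aggregated if lo <= start <= hi)
    return present / expected


# Hypothetical readings (ppm); real input comes from the raw JSON message logs.
samples = [
    (datetime(2024, 10, 1, 12, 1, 30, tzinfo=timezone.utc), 420.0),
    (datetime(2024, 10, 1, 12, 8, 0, tzinfo=timezone.utc), 424.0),
    (datetime(2024, 10, 1, 12, 31, 0, tzinfo=timezone.utc), 430.0),
]
result = aggregate(samples)
for start, (avg, n) in result.items():
    print(start.isoformat(), avg, n)
print("coverage:", coverage(result, samples[0][0], samples[-1][0]))
```

The first two readings land in the 12:00 bin and the third in the 12:30 bin, so two of the four expected bins between 12:00 and 12:30 contain data. Keeping the per-bin sample count next to the mean makes it possible to flag sparsely populated bins later.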

(Default) Output Structure

Output       Location
Sensors      output/sensors/
Merged       output/merged/
Validation   output/validation/
Logs         output/logs/

Configuration

Copy the template to config/sensors.yaml:

cp config/sensor_config.yaml config/sensors.yaml

Then edit config/sensors.yaml to set the input data paths:

sensors:
  - name: CO2_SCT1_2M
    input_dir: /path/to/your/data  # ← Change this
    sensor_type: scd41
    # Other settings auto-configured
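
Before running the pipeline, it can help to confirm that every configured input_dir actually exists. The following is a minimal sketch, not part of the repository: the helper name missing_input_dirs and the second sensor entry are hypothetical, and the entries are written as plain dictionaries mirroring the YAML structure above rather than parsed from config/sensors.yaml.

```python
import tempfile
from pathlib import Path


def missing_input_dirs(sensors):
    """Return the names of sensors whose input_dir is not an existing directory."""
    return [s["name"] for s in sensors if not Path(s["input_dir"]).is_dir()]


# Hypothetical entries: one points at a real (temporary) directory, one does not.
with tempfile.TemporaryDirectory() as existing:
    sensors = [
        {"name": "CO2_SCT1_2M", "input_dir": existing, "sensor_type": "scd41"},
        {
            "name": "EXAMPLE_MISSING",
            "input_dir": str(Path(existing) / "no_such_dir"),
            "sensor_type": "scd41",
        },
    ]
    missing = missing_input_dirs(sensors)

print(missing)
```

An empty list means all paths resolve. Any names returned point to the entries to fix in config/sensors.yaml.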

Documentation

Troubleshooting

Import errors? Ensure the virtual environment is active: source venv/bin/activate

No data found? Check paths in config/sensors.yaml and logs: tail -f output/logs/<logfile>.log

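Reprocess data:

# Assumes SensorPipeline and config are already defined, e.g. inside process_data.ipynb.
pipeline = SensorPipeline(config, force=True)  # force=True: rerun even if output exists (assumed meaning)
pipeline.run_all(year=2024, month=10)          # October 2024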

Container Deployment

docker compose up
# or
podman-compose up

See DOCKER.md for full setup instructions: volume mounts, config path notes, Docker/Podman commands, and output structure.

Edit the volume paths in docker-compose.yaml before the first run.

Requirements

  • Python 3.10+
  • 8 GB RAM minimum, 16 GB recommended (pandas loads large DataFrames day-by-day)
  • Modern multi-core CPU (e.g. Intel Core i5/Ultra 5 or equivalent)
  • ~500 MB disk space for dependencies, plus space for raw and processed data
  • Dependencies installed via requirements.txt: pandas, numpy, pyarrow, matplotlib, seaborn

Citation

Zorec, M. et al. (2026). SMEAR Estonia CO2 Sensor Processing Pipeline.
Version 0.4.0. GitHub: SMEAR-EE/SCD41_DATA
Dataset available at: 10.5281/zenodo.18984845

Version

Current Release: v0.4.0 (Stable)

License
