
[RFC] Reproducibility Metadata: Auto-generate STUDY.json in Generated Artifacts #502

@GaneshPatil7517


Problem

When concore init, concore setup, and concore compose generate study
artifacts (concore.toml, docker-compose.yml, runner.py), they don't record
how and when they were generated. Five years from now, a researcher
reproducing a historical study won't know:

  • What tool versions were used when this study was generated?
  • Which compiler/runtime was detected on the original machine?
  • What concore version created these files?
  • Were any manual edits made after generation?

This violates concore's core mission: reproducible science.

Proposed Solution

When concore init --interactive, concore setup, or concore compose
generate a study, automatically create a STUDY.json metadata file:

{
  "generated_by": "concore setup",
  "concore_version": "1.0.0",
  "timestamp": "2026-03-15T10:30:00Z",
  "python_version": "3.13.12",
  "tools_detected": {
    "g++": "6.3.0 at /usr/bin/g++",
    "python3": "3.13.12 at /usr/bin/python3",
    "docker": "not found"
  },
  "platform": "Windows-10",
  "study_name": "my_control_study",
  "checksums": {
    "concore.toml": "sha256:abc123...",
    "docker-compose.yml": "sha256:def456..."
  }
}
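The example above could be produced by a small helper run as the last step of generation. What follows is a minimal sketch, assuming a Python implementation. `build_study_metadata`, `detect_tool`, `sha256_of`, `write_study_json`, and the `CONCORE_VERSION` constant are hypothetical names, not existing concore APIs. Detecting a tool by running it with `--version` and keeping the first dotted number is also an assumption.

```python
import hashlib
import json
import platform
import re
import shutil
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Assumption: concore would expose its own version somewhere; hard-coded here.
CONCORE_VERSION = "1.0.0"
DEFAULT_TOOLS = ("g++", "python3", "docker")
DEFAULT_ARTIFACTS = ("concore.toml", "docker-compose.yml", "runner.py")


def detect_tool(name: str) -> str:
    """Return '<version> at <path>' for an executable on PATH, else 'not found'."""
    path = shutil.which(name)
    if path is None:
        return "not found"
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True,
            errors="replace", timeout=10,
        )
        output = result.stdout or result.stderr
    except (OSError, subprocess.TimeoutExpired):
        output = ""
    # Keep the first dotted number, e.g. "g++ (GCC) 6.3.0" -> "6.3.0".
    match = re.search(r"\d+(?:\.\d+)+", output)
    version = match.group(0) if match else "unknown version"
    return f"{version} at {path}"


def sha256_of(path: Path) -> str:
    """Hash a file in chunks and return it in the 'sha256:<hex>' form."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def build_study_metadata(study_dir, study_name, generated_by,
                         tools=DEFAULT_TOOLS, artifacts=DEFAULT_ARTIFACTS):
    """Collect the fields proposed for STUDY.json; missing artifacts are skipped."""
    study_dir = Path(study_dir)
    return {
        "generated_by": generated_by,
        "concore_version": CONCORE_VERSION,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "python_version": platform.python_version(),
        "tools_detected": {tool: detect_tool(tool) for tool in tools},
        "platform": f"{platform.system()}-{platform.release()}",
        "study_name": study_name,
        "checksums": {
            name: sha256_of(study_dir / name)
            for name in artifacts
            if (study_dir / name).is_file()
        },
    }


def write_study_json(study_dir, metadata) -> Path:
    """Write STUDY.json next to the generated artifacts and return its path."""
    out = Path(study_dir) / "STUDY.json"
    out.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return out
```

A generator might call it as `write_study_json(study_dir, build_study_metadata(study_dir, "my_control_study", "concore setup"))`. Writing STUDY.json only after the other artifacts are on disk means the checksums cover their final contents; STUDY.json itself is left out of `checksums`, since a file cannot contain its own digest.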

Researchers can then verify that the study is reproducible by checking:

  1. Tool versions match what was originally available
  2. File integrity (generated files still match their recorded checksums)
  3. concore version compatibility
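The three checks above could be automated along these lines, again assuming a Python implementation. `verify_study` and its `current_tools` / `current_concore_version` parameters are hypothetical; the caller is assumed to re-detect tools with the same method used at generation time, so recorded and current strings are directly comparable. Treating any change in the concore version as a finding is a deliberate simplification, because the RFC does not yet define what "compatible" means.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks and return it in the 'sha256:<hex>' form."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def verify_study(study_dir, current_tools=None, current_concore_version=None):
    """Compare a study directory against its STUDY.json.

    Returns a list of human-readable findings; an empty list means no drift was found.
    """
    study_dir = Path(study_dir)
    metadata = json.loads((study_dir / "STUDY.json").read_text(encoding="utf-8"))
    findings = []

    # 1. Tool versions: compare recorded strings with what the caller detected now.
    #    Tools the caller did not re-check are skipped.
    for tool, recorded in metadata.get("tools_detected", {}).items():
        now = (current_tools or {}).get(tool)
        if now is not None and now != recorded:
            findings.append(f"{tool}: recorded {recorded!r}, now {now!r}")

    # 2. File integrity: every recorded artifact must exist and hash to the same value.
    for name, recorded in metadata.get("checksums", {}).items():
        path = study_dir / name
        if not path.is_file():
            findings.append(f"{name}: missing")
        elif sha256_of(path) != recorded:
            findings.append(f"{name}: checksum mismatch (edited after generation?)")

    # 3. concore version: exact match only (simplified compatibility policy).
    recorded_version = metadata.get("concore_version")
    if current_concore_version is not None and recorded_version != current_concore_version:
        findings.append(
            f"concore: generated with {recorded_version}, running {current_concore_version}"
        )
    return findings
```

A non-empty result is a report for the researcher rather than an automatic failure: a newer compiler or a moved binary shows up as a finding even if it would produce identical results.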

Benefits:

  1. Generated artifacts become self-documenting
  2. Reproducibility is auditable
  3. Historical studies can be traced and verified
  4. Fits the principle: "Generated files should be independent of the generator"
