feat: Add Schema.undo_aliases #293

Open
gab23r wants to merge 8 commits into Quantco:main from gab23r:undo-aliases

Conversation

@gab23r gab23r commented Mar 3, 2026

fixes #292

I went ahead and implemented a fix before approval, mainly to provide a concrete basis for discussion. I am curious about your thoughts and would be happy to change the API or anything else.

Motivation

Users coming from pydantic may find the current alias behavior surprising. In pydantic, the alias is used for input parsing, but the attribute name is used when accessing the data:

from pydantic import BaseModel, Field

class Model(BaseModel):
    price: int = Field(alias="price ($)")

m = Model(**{"price ($)": 2})

print(repr(m))
# Model(price=2)

print(m.model_dump())
# {'price': 2}

Users might want the same behavior for column names in a DataFrame.
This PR provides tools to achieve similar behavior in dataframely when desired.

Changes

This PR adds two features for working with column aliases:

  1. use_attribute_names class parameter - Controls how column.name is resolved when accessing columns via schema attributes
  2. undo_aliases() method - Renames DataFrame columns from their alias names to their attribute names

Features

use_attribute_names class parameter

When set to True, accessing a column via a schema attribute returns the attribute name instead of the alias:

import dataframely as dy

class MySchema(dy.Schema, use_attribute_names=True):
    price = dy.Int64(alias="price ($)")

MySchema.price.name  # Returns "price" (attribute name)
MySchema.column_names() # Returns ['price']

This setting is inherited by child schemas.
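
For readers wondering how a class keyword like this can propagate to child schemas, here is a minimal sketch in plain Python. It is not the PR's implementation (which extends the schema metaclass); `SketchSchema` and `_use_attribute_names` are hypothetical names used only for illustration.

```python
from typing import Optional


class SketchSchema:
    """Hypothetical stand-in for a schema base class (not dataframely's)."""

    # Default mirrors the PR's stated default of False.
    _use_attribute_names: bool = False

    def __init_subclass__(cls, use_attribute_names: Optional[bool] = None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only set the flag when the keyword is given explicitly. Otherwise
        # normal attribute lookup falls back to the parent class, which is
        # what makes the setting inherited.
        if use_attribute_names is not None:
            cls._use_attribute_names = use_attribute_names


class Parent(SketchSchema, use_attribute_names=True):
    pass


class Child(Parent):  # no keyword: picks up the parent's setting
    pass


class Plain(SketchSchema):  # no keyword anywhere: keeps the default
    pass
```

In this sketch a child can still opt out again by passing `use_attribute_names=False` explicitly; whether the PR allows that is not stated in the description.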

undo_aliases() method

Renames columns from their alias names to their attribute names:

import dataframely as dy
import polars as pl

class MySchema(dy.Schema, use_attribute_names=True):
    price = dy.Int64(alias="price ($)")
    production_rank = dy.Int64(alias="Production rank")

df = pl.DataFrame({"price ($)": [100], "Production rank": [1]})
MySchema.undo_aliases(df)
# shape: (1, 2)
# ┌───────┬─────────────────┐
# │ price ┆ production_rank │
# │ ---   ┆ ---             │
# │ i64   ┆ i64             │
# ╞═══════╪═════════════════╡
# │ 100   ┆ 1               │
# └───────┴─────────────────┘

Works with both DataFrame and LazyFrame.
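
Conceptually, the renaming applies an alias-to-attribute mapping to the column names. The sketch below shows that step in plain Python, without polars. `ALIAS_TO_ATTRIBUTE`, `rename_mapping`, and `renamed_columns` are hypothetical names, and leaving columns without a known alias unchanged is an assumption that the PR description does not confirm.

```python
from typing import Dict, List

# Hypothetical alias -> attribute mapping for the schema above.
ALIAS_TO_ATTRIBUTE: Dict[str, str] = {
    "price ($)": "price",
    "Production rank": "production_rank",
}


def rename_mapping(columns: List[str], alias_to_attribute: Dict[str, str]) -> Dict[str, str]:
    """Build an {old: new} mapping restricted to the columns actually present."""
    return {c: alias_to_attribute[c] for c in columns if c in alias_to_attribute}


def renamed_columns(columns: List[str], alias_to_attribute: Dict[str, str]) -> List[str]:
    """Apply the mapping; columns without a known alias keep their name (assumption)."""
    mapping = rename_mapping(columns, alias_to_attribute)
    return [mapping.get(c, c) for c in columns]


columns = ["price ($)", "Production rank", "comment"]
print(renamed_columns(columns, ALIAS_TO_ATTRIBUTE))
# ['price', 'production_rank', 'comment']
```

With polars, a mapping of this shape can be passed to `DataFrame.rename` or `LazyFrame.rename`, which is one plausible reason the method can support both frame types.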

Important Notes

  • validate() does NOT rename columns - The validated DataFrame keeps the original column names (aliases)
  • Renaming is explicit - Use undo_aliases() when you want to rename columns to attribute names
  • No breaking changes - use_attribute_names defaults to False

Example Workflow

import dataframely as dy
import polars as pl

class MySchema(dy.Schema, use_attribute_names=True):
    price = dy.Int64(alias="price ($)")


# Read messy Excel file
df = pl.read_excel("data.xlsx")  # columns: ["price ($)"]

df_renamed = MySchema.undo_aliases(df)
df_renamed.columns  # ["price"]

# Validate
validated = MySchema.validate(df_renamed)
validated.columns  # ["price"]

Copilot AI review requested due to automatic review settings March 3, 2026 23:14
@gab23r gab23r requested a review from delsner as a code owner March 3, 2026 23:14
@github-actions github-actions bot added the enhancement New feature or request label Mar 3, 2026
Copilot AI left a comment

Pull request overview

Adds opt-in tooling to work with column aliases vs attribute names in Dataframely schemas, enabling users to explicitly rename aliased DataFrame columns back to schema attribute identifiers (Pydantic-like ergonomics).

Changes:

  • Add use_attribute_names schema class parameter to control whether schema column identifiers use attribute names vs aliases.
  • Track attribute→alias metadata and expose alias→attribute mapping via _alias_mapping().
  • Add Schema.undo_aliases() to rename DataFrame/LazyFrame columns from alias names to attribute names, plus tests.
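
The alias→attribute mapping mentioned in the list above can be derived by inverting attribute→alias metadata. The following is a rough sketch of that idea only: `build_alias_mapping` is a hypothetical helper, not the PR's `_alias_mapping()`, and the duplicate-alias check is an added assumption rather than documented behavior.

```python
from typing import Dict, Optional


def build_alias_mapping(attribute_to_alias: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Invert attribute -> alias metadata into an alias -> attribute mapping."""
    mapping: Dict[str, str] = {}
    for attribute, alias in attribute_to_alias.items():
        # Columns without an alias (or whose alias equals the attribute
        # name) need no renaming, so they are left out of the mapping.
        if alias is None or alias == attribute:
            continue
        # Assumption: two attributes sharing one alias is treated as an error,
        # because the inverse mapping would be ambiguous.
        if alias in mapping:
            raise ValueError(
                f"alias {alias!r} is used by both {mapping[alias]!r} and {attribute!r}"
            )
        mapping[alias] = attribute
    return mapping


print(build_alias_mapping({"price": "price ($)", "quantity": None}))
# {'price ($)': 'price'}
```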

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| dataframely/_base_schema.py | Extends schema metaclass/metadata to support use_attribute_names and alias mappings. |
| dataframely/schema.py | Adds Schema.undo_aliases() public helper to rename columns using alias mapping. |
| tests/columns/test_alias.py | Adds tests for inheritance behavior, alias mapping, and undo_aliases() on eager/lazy frames. |

codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (b3edd6a) to head (52f3b0b).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #293   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           54        54           
  Lines         3121      3127    +6     
=========================================
+ Hits          3121      3127    +6     

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: Option to rename columns from alias to attribute name after validation

2 participants