RFC: Should main be patched to latest datafusion periodically? #1399

@timsaucer

A question came up on the Discord server that would benefit from a wider audience. We have two PRs that are nearly ready to merge, but they rely on features that will be in the next upstream release:

The question is: Should we allow a [patch.crates-io] section in Cargo.toml?

Reasons I know of why we should patch on main:

  • It will allow us to reduce the time between an upstream DataFusion release and our Python release by keeping us closer to upstream changes as they land.
  • It may reduce churn for developers who write a feature against the next release and then have to rebase their PR to get it working again once upstream finally releases.
  • It's nice for the developer to get all their work in faster while it's fresh in their mind.

Reasons I know of why we should not patch on main:

  • It pushes additional work onto whoever wants to update to the latest upstream, since every bump of main pulls in unrelated upstream changes.
  • Sometimes it creates additional churn, as upstream changes are not always stable until the release happens. There is definite potential for rework.
  • We cannot release to crates.io with a patch section. It will be ignored. We would need to add a CI check to our release workflow to ensure there are no patches.
  • We would need to ensure the patches point to a specific commit and that commit should be on main upstream. This isn't a reason against it per se, but it is something we would want to remain cognizant of.

Personally, I don't have a very strong opinion on which is the better route, which is why I'm requesting comments.
