Make it easier for agents to generate datafusion-python code #1394

@timsaucer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Users increasingly reach for LLMs to generate code and solve problems. We should add instructions to our repository that help LLMs write datafusion-python code.

Describe the solution you'd like

According to my very quick research into the topic, an llms.txt file seems to be one emerging standard. I know some repositories have opted for a CLAUDE.md file as well. I think part of this issue will be to investigate what the emerging standards are and what we need to do to ensure the major agents are able to use these instructions.
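For context, the llms.txt proposal describes a Markdown file with an H1 project name, a blockquote summary, and H2 sections that hold link lists. The sketch below shows roughly what generating such a file could look like. The `build_llms_txt` helper, the section names, and the `example.invalid` URLs are all placeholders for illustration, not existing project files or pages.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


def build_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt-style document: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content: the URLs are placeholders, not real documentation pages.
LLMS_TXT = build_llms_txt(
    "datafusion-python",
    "Python bindings for the Apache DataFusion query engine.",
    {
        "Docs": [
            Link("User guide", "https://example.invalid/user-guide.md", "core DataFrame concepts"),
            Link("API reference", "https://example.invalid/api.md"),
        ],
        "Migration": [
            Link("From pandas", "https://example.invalid/from-pandas.md"),
        ],
    },
)

print(LLMS_TXT)
```

Generating the file from a small data structure, rather than writing it by hand, would make it easier to keep in sync with the docs, though a hand-written file may be perfectly adequate to start with.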

Additionally, since we will have users coming from different communities, it is probably helpful to have LLM-oriented instructions for how to rewrite queries from other dataframe APIs into DataFusion.

We should cover:

  • Spark
  • Pandas
  • Polars
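As a rough illustration of what such rewrite instructions might contain, here is a sketch that renders a small translation table as Markdown. The selection of operations and the `to_markdown` helper are my own assumptions. The DataFusion column assumes `from datafusion import col, lit, functions as f`, and the equivalences are approximate (for example, the pandas grouped sum returns a Series rather than a DataFrame).

```python
# Each row: (operation, pandas, PySpark, Polars, DataFusion).
# Assumed aliases: F = pyspark.sql.functions, pl = polars,
# and for DataFusion: from datafusion import col, lit, functions as f
CHEAT_SHEET = [
    ("filter rows",
     'df[df["a"] > 1]',
     'df.where(F.col("a") > 1)',
     'df.filter(pl.col("a") > 1)',
     'df.filter(col("a") > lit(1))'),
    ("select columns",
     'df[["a", "b"]]',
     'df.select("a", "b")',
     'df.select("a", "b")',
     'df.select(col("a"), col("b"))'),
    ("grouped sum",
     'df.groupby("k")["v"].sum()',
     'df.groupBy("k").agg(F.sum("v"))',
     'df.group_by("k").agg(pl.col("v").sum())',
     'df.aggregate([col("k")], [f.sum(col("v"))])'),
    ("first 10 rows",
     'df.head(10)',
     'df.limit(10)',
     'df.head(10)',
     'df.limit(10)'),
]

HEADER = ("Operation", "pandas", "PySpark", "Polars", "DataFusion")


def to_markdown(rows):
    """Render the cheat sheet as a Markdown table with code-formatted cells."""
    out = ["| " + " | ".join(HEADER) + " |",
           "|" + " --- |" * len(HEADER)]
    for op, *snippets in rows:
        cells = [op] + [f"`{s}`" for s in snippets]
        out.append("| " + " | ".join(cells) + " |")
    return "\n".join(out)


TABLE = to_markdown(CHEAT_SHEET)
print(TABLE)
```

A side-by-side table like this is only one possible shape for the instructions. Per-library migration pages with longer worked examples may turn out to serve agents better.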

There are probably also recommendations for how to update our docstrings so that they are easily usable by LLMs.

Labels: enhancement (New feature or request)
