Skip to content

GH-48846: [C++] Read message metadata and body in one go in IPC file reader#48975

Merged
pitrou merged 5 commits intoapache:mainfrom
abhishek593:feature/ipc-read-message-optimization
Feb 17, 2026
Merged

GH-48846: [C++] Read message metadata and body in one go in IPC file reader#48975
pitrou merged 5 commits intoapache:mainfrom
abhishek593:feature/ipc-read-message-optimization

Conversation

@abhishek593
Copy link
Contributor

@abhishek593 abhishek593 commented Jan 24, 2026

Rationale for this change

ReadMessageAsync takes a body_length parameter and reads Message metadata + body in one go, but the blocking version ReadMessage reads the body length from the Message and issues a second read for the body. This PR adds a ReadMessage overload that takes the body length as parameter and does a single read like the async version does. This reduces the number of IOs issued by the IPC file reader.

What changes are included in this PR?

  1. Add ReadMessage overload accepting body_length
  2. Update IPC file reader to use the new ReadMessage overload when reading full record batches

Are these changes tested?

Yes, added TestReadMessage.ReadBodyWithLength and updated other tests to use the new overload.

Are there any user-facing changes?

No.

@github-actions
Copy link

⚠️ GitHub issue #48846 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the submission @abhishek593. This looks good generally but I've made a number of suggestions below, feel free to update your PR when you have some time!

const int64_t offset, const int32_t metadata_length, io::RandomAccessFile* file,
const FieldsLoaderFunction& fields_loader = {});

/// \brief Read encapsulated RPC message from position in file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think "RPC" is a typo, it should be "IPC", could you perhaps fix all occurrences in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines -3006 to -3010
// there are 3 read IOs before reading body:
// 1) read magic and footer length IO
// 2) read footer IO
// 3) read record batch metadata IO
EXPECT_EQ(read_ranges.size(), 3 + expected_body_read_lengths.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, can we keep a less strict assertion to check that we can at least access the two first elements without segfaulting?
Something like:

  // there are at least 2 read IOs before reading body:
  // 1) read magic and footer length IO
  // 2) read footer IO
  EXPECT_GE(read_ranges.size(), 2);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 3045 to 3048
// there are 3 read IOs before reading body:
// 1) read magic and footer length IO
// 2) read footer IO
// 3) read record batch metadata IO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment only matches one of the two code paths. Perhaps fix it or use a separate comment per code path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int64_t total_body = 0;
for (auto len : expected_body_read_lengths) total_body += len;

EXPECT_GT(read_ranges[2].length, total_body);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a small comment explaining why? Also can take inspiration from the other code path and add:

  EXPECT_LE(read_ranges[2].length, total_body + footer_length);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 16, 2026
@abhishek593 abhishek593 force-pushed the feature/ipc-read-message-optimization branch from 10990b2 to cc7dd86 Compare February 16, 2026 17:00
@abhishek593 abhishek593 requested a review from pitrou February 16, 2026 20:27
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking excellent now, thank you @abhishek593 . I'm just going to push a minor cosmetic change and we'll be able to merge if CI passes.

@pitrou pitrou force-pushed the feature/ipc-read-message-optimization branch from cc7dd86 to ff6a9e5 Compare February 17, 2026 11:23
@pitrou pitrou changed the title GH-48846: [C++] Optimize ReadMessage to read metadata and body in one go GH-48846: [C++] Read message metadata and body in one go in IPC file reader Feb 17, 2026
@pitrou pitrou merged commit 7c45228 into apache:main Feb 17, 2026
55 of 57 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Feb 17, 2026
@pitrou
Copy link
Member

pitrou commented Feb 17, 2026

This is now merged, thanks again @abhishek593 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants