Skip to content

[Core] Fix TASK_PROFILE_EVENT aggregation for multiple phases#61559

Open
Anarion-zuo wants to merge 2 commits intoray-project:masterfrom
Anarion-zuo:aaronzuo/fix_multi_phase
Open

[Core] Fix TASK_PROFILE_EVENT aggregation for multiple phases#61559
Anarion-zuo wants to merge 2 commits intoray-project:masterfrom
Anarion-zuo:aaronzuo/fix_multi_phase

Conversation

@Anarion-zuo
Copy link
Contributor

@Anarion-zuo Anarion-zuo commented Mar 7, 2026

Fix TASK_PROFILE_EVENT aggregation to preserve all phases

Description

Fixes an issue where the Ray Event Export API only exported a single profile event per task attempt instead of all the execution phases (like task:deserialize_arguments , task:execute , task:store_outputs , etc.). The problem was in TaskProfileEvent::ToRpcRayEvents , which unconditionally overwrote the task_profile_event in the RayEventsTuple instead of reusing the existing one and appending new events.

To test this fix:

bazel test //src/ray/core_worker/tests:task_event_buffer_test --test_timeout=300

Related issues

#61520

Additional information

  • Fix applied : Modified TaskProfileEvent::ToRpcRayEvents in src/ray/core_worker/task_event_buffer.cc to check if a profile event already exists and append new phases instead of overwriting
  • Test added : Added TestTaskProfileEventToRpcRayEventsMultipleEvents in src/ray/core_worker/tests/task_event_buffer_test.cc to verify all phases are exported
  • All relevant tests pass
    Now the Ray Event Export API will export the full set of task profile phases (just like the Dashboard timeline), so you can reconstruct the complete task execution timeline from the exported events!

Signed-off-by: aaronzuo <anarionzuo@outlook.com>
@Anarion-zuo Anarion-zuo requested a review from a team as a code owner March 7, 2026 15:51
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes an issue where only a single profile event was being exported per task attempt, instead of aggregating all execution phases. The modification in TaskProfileEvent::ToRpcRayEvents to append new profile events is sound, and the new test TestTaskProfileEventToRpcRayEventsMultipleEvents provides good coverage for this fix. I have one suggestion to simplify the implementation in task_event_buffer.cc for improved readability and maintainability.

Comment on lines +433 to +449
rpc::events::RayEvent *ray_event_ptr;
if (!ray_events_tuple.task_profile_event) {
auto &ray_event = ray_events_tuple.task_profile_event.emplace();
PopulateRpcRayEventBaseFields(ray_event, timestamp);
ray_event_ptr = &ray_event;

// Populate the task profile event base fields
auto *task_profile_events = ray_event_ptr->mutable_task_profile_events();
task_profile_events->set_task_id(task_id_.Binary());
task_profile_events->set_job_id(job_id_.Binary());
task_profile_events->set_attempt_number(attempt_number_);
} else {
ray_event_ptr = &ray_events_tuple.task_profile_event.value();
}

// Add this profile event to the events list
auto *task_profile_events = ray_event_ptr->mutable_task_profile_events();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic to get or create the RayEvent and then retrieve the TaskProfileEvents can be simplified. Using an intermediate raw pointer ray_event_ptr makes the code a bit indirect. You can work directly with a pointer to TaskProfileEvents to make the logic more straightforward and reduce redundancy.

  rpc::events::TaskProfileEvents *task_profile_events;
  if (!ray_events_tuple.task_profile_event) {
    auto &ray_event = ray_events_tuple.task_profile_event.emplace();
    PopulateRpcRayEventBaseFields(ray_event, timestamp);
    task_profile_events = ray_event.mutable_task_profile_events();
    task_profile_events->set_task_id(task_id_.Binary());
    task_profile_events->set_job_id(job_id_.Binary());
    task_profile_events->set_attempt_number(attempt_number_);
  } else {
    task_profile_events = ray_events_tuple.task_profile_event->mutable_task_profile_events();
  }

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: aaronzuo <anarionzuo@outlook.com>
@Anarion-zuo Anarion-zuo force-pushed the aaronzuo/fix_multi_phase branch from 911464b to db4bf3f Compare March 7, 2026 17:14
@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling community-contribution Contributed by the community labels Mar 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community core Issues that should be addressed in Ray Core observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant