| author | Stefan Boberg <[email protected]> | 2026-03-17 11:31:59 +0100 |
|---|---|---|
| committer | Stefan Boberg <[email protected]> | 2026-03-17 11:31:59 +0100 |
| commit | 642c09cdae77c1d4c4f46b1145a7156c878ef2d6 (patch) | |
| tree | 8b1278ab72f0377fe921628ae209c6b8786910ad | |
| parent | Clarify StripAnsiSgrSequences comment to note CSI non-SGR sequences are not s... (diff) | |
| download | zen-sb/compute-batch.tar.xz zen-sb/compute-batch.zip | |
Fix queue ActiveCount race in HandleActionUpdates terminal path
NotifyQueueActionComplete (which decrements ActiveCount) was called after
releasing m_ResultsLock. GetActionResult acquires m_ResultsLock to consume
the result, so a caller could observe ActiveCount still at 1 immediately
after GetActionResult returned OK if the scheduler thread was preempted
between releasing m_ResultsLock and reaching NotifyQueueActionComplete.

Fix by calling NotifyQueueActionComplete before the m_ResultsLock block
that publishes the result into m_ResultsMap. This guarantees that by the
time GetActionResult can return OK, the queue counters are already updated.
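
The ordering argument above can be illustrated with a minimal, standalone sketch. It is not the zencompute code: `MiniSession`, `CompleteAction`, `TakeResult` and `ObservedActiveCountAfterResult` are hypothetical names, and a plain `std::mutex` stands in for `m_ResultsLock`. The only point it makes is that when the counter is updated before the result is published under the lock, a caller that obtains the result under the same lock must already see the updated counter.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the session; none of these names exist in zen.
class MiniSession
{
public:
    void Enqueue() { m_ActiveCount.fetch_add(1); }

    // Scheduler side: update the counter FIRST, then publish under the lock.
    void CompleteAction(int Lsn, std::string Result)
    {
        m_ActiveCount.fetch_sub(1);  // plays the role of NotifyQueueActionComplete
        std::lock_guard<std::mutex> Lock(m_ResultsLock);
        m_Results.emplace(Lsn, std::move(Result));
    }

    // Caller side: consumes (erases) the result under the same lock.
    std::optional<std::string> TakeResult(int Lsn)
    {
        std::lock_guard<std::mutex> Lock(m_ResultsLock);
        auto It = m_Results.find(Lsn);
        if (It == m_Results.end())
        {
            return std::nullopt;
        }
        std::string Out = std::move(It->second);
        m_Results.erase(It);
        return Out;
    }

    int ActiveCount() const { return m_ActiveCount.load(); }

private:
    std::atomic<int>                     m_ActiveCount{0};
    std::mutex                           m_ResultsLock;
    std::unordered_map<int, std::string> m_Results;
};

// Runs one action through the sketch and returns the ActiveCount a caller
// observes immediately after the result became available.
int ObservedActiveCountAfterResult()
{
    MiniSession Session;
    Session.Enqueue();

    std::thread Scheduler([&] { Session.CompleteAction(1, "done"); });

    // Poll until the result has been published.
    while (!Session.TakeResult(1))
    {
        std::this_thread::yield();
    }
    const int Observed = Session.ActiveCount();

    Scheduler.join();
    return Observed;
}
```

Because the decrement is sequenced before the mutex release in `CompleteAction`, and `TakeResult` can only find the entry after acquiring that same mutex, `ObservedActiveCountAfterResult()` returns 0 in this sketch. Moving the `fetch_sub` after the locked block would reintroduce the window described in the commit message, where it could return 1.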
| -rw-r--r-- | src/zencompute/computeservice.cpp | 8 |
1 files changed, 7 insertions, 1 deletions
diff --git a/src/zencompute/computeservice.cpp b/src/zencompute/computeservice.cpp
index 2e49c1114..43031955a 100644
--- a/src/zencompute/computeservice.cpp
+++ b/src/zencompute/computeservice.cpp
@@ -1896,6 +1896,13 @@ ComputeServiceSession::Impl::HandleActionUpdates()
 	RemoveActionFromActiveMaps(ActionLsn);
 
+	// Update queue counters BEFORE publishing the result into
+	// m_ResultsMap. GetActionResult erases from m_ResultsMap
+	// under m_ResultsLock, so if we updated counters after
+	// releasing that lock, a caller could observe ActiveCount
+	// still at 1 immediately after GetActionResult returned OK.
+	NotifyQueueActionComplete(Action->QueueId, ActionLsn, TerminalState);
+
 	m_ResultsLock.WithExclusiveLock([&] {
 		m_ResultsMap[ActionLsn] = Action;
@@ -1927,7 +1934,6 @@ ComputeServiceSession::Impl::HandleActionUpdates()
 		Action->ActionId,
 		ActionLsn,
 		TerminalState == RunnerAction::State::Completed ? "SUCCESS" : "FAILURE");
-	NotifyQueueActionComplete(Action->QueueId, ActionLsn, TerminalState);
 	break;
 	}
 	}