[STF] stackable stf resources #2674

Draft
caugonnet wants to merge 569 commits into NVIDIA:main from caugonnet:stackable_ctx_data

Conversation

@caugonnet
Contributor

@caugonnet caugonnet commented Oct 31, 2024

Description

This introduces helper methods that improve how we nest contexts, to better leverage CUDA Graphs.
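As a rough illustration of what nesting contexts means here, the sketch below models a stack of scopes on the host only. `toy_stackable_ctx` and its members are hypothetical names, not the API added by this PR, and no CUDA graph is created. The only detail taken from the diff is the naming scheme visible in the review hunks, where a nested level is labelled with the parent symbol, a dot, and the depth.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-only model: each nesting level is just a named scope.
// In the PR, a nested level is described as being backed by a local CUDA graph.
class toy_stackable_ctx
{
public:
  explicit toy_stackable_ctx(std::string symbol)
      : symbol_(std::move(symbol))
  {
    levels_.push_back(symbol_); // level 0 is the root context
  }

  // Number of nested levels currently open above the root.
  std::size_t depth() const
  {
    return levels_.size() - 1;
  }

  // Open a nested level named "<symbol>.<new depth>".
  void push()
  {
    levels_.push_back(symbol_ + "." + std::to_string(levels_.size()));
  }

  // Close the innermost nested level; the root cannot be popped.
  void pop()
  {
    assert(depth() > 0);
    levels_.pop_back();
  }

  // Name of the innermost open level.
  const std::string& current() const
  {
    return levels_.back();
  }

private:
  std::string symbol_;
  std::vector<std::string> levels_;
};
```

In this toy model, every `push()` must be matched by a `pop()`, mirroring the `ctx.pop();` call that appears in the test code under review.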

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Oct 31, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

🟩 CI finished in 23m 04s: Pass: 100%/54 | Total: 4h 36m | Avg: 5m 06s | Max: 17m 44s | Hits: 89%/224
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 36m | Avg: 5m 06s | Max: 17m 44s | Hits: 89%/224

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 20m | Avg:  5m 13s | Max: 17m 44s | Hits:  89%/224   
      🟩 arm64              Pass: 100%/4   | Total: 15m 16s | Avg:  3m 49s | Max:  4m 54s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 40m | Avg:  5m 16s | Max: 17m 32s | Hits:  89%/112   
      🟩 12.5               Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s
      🟩 12.6               Pass: 100%/33  | Total:  2h 46m | Avg:  5m 03s | Max: 17m 44s | Hits:  89%/112   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 40m | Avg:  5m 16s | Max: 17m 32s | Hits:  89%/112   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 46m | Avg:  5m 03s | Max: 17m 44s | Hits:  89%/112   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 36m | Avg:  5m 06s | Max: 17m 44s | Hits:  89%/224   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  7m 21s | Avg:  3m 40s | Max:  4m 02s
      🟩 Clang10            Pass: 100%/2   | Total:  6m 59s | Avg:  3m 29s | Max:  3m 41s
      🟩 Clang11            Pass: 100%/4   | Total: 13m 28s | Avg:  3m 22s | Max:  3m 34s
      🟩 Clang12            Pass: 100%/4   | Total: 13m 16s | Avg:  3m 19s | Max:  3m 27s
      🟩 Clang13            Pass: 100%/4   | Total: 13m 11s | Avg:  3m 17s | Max:  3m 23s
      🟩 Clang14            Pass: 100%/4   | Total: 27m 40s | Avg:  6m 55s | Max: 17m 27s
      🟩 Clang15            Pass: 100%/2   | Total:  7m 01s | Avg:  3m 30s | Max:  3m 37s
      🟩 Clang16            Pass: 100%/4   | Total: 13m 48s | Avg:  3m 27s | Max:  3m 48s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 19s | Avg:  3m 39s | Max:  3m 42s
      🟩 Clang18            Pass: 100%/2   | Total: 19m 21s | Avg:  9m 40s | Max: 15m 57s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 29s | Avg:  3m 44s | Max:  3m 56s
      🟩 GCC10              Pass: 100%/4   | Total: 15m 40s | Avg:  3m 55s | Max:  4m 13s
      🟩 GCC11              Pass: 100%/4   | Total: 14m 55s | Avg:  3m 43s | Max:  3m 53s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 07m | Avg:  9m 34s | Max: 17m 44s
      🟩 GCC13              Pass: 100%/3   | Total: 12m 09s | Avg:  4m 03s | Max:  4m 54s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 04s | Avg: 10m 04s | Max: 10m 04s | Hits:  89%/112   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 06s | Avg: 10m 06s | Max: 10m 06s | Hits:  89%/112   
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 09m | Avg:  4m 18s | Max: 17m 27s
      🟩 GCC                Pass: 100%/20  | Total:  1h 57m | Avg:  5m 51s | Max: 17m 44s
      🟩 MSVC               Pass: 100%/2   | Total: 20m 10s | Avg: 10m 05s | Max: 10m 06s | Hits:  89%/224   
      🟩 NVHPC              Pass: 100%/2   | Total:  9m 25s | Avg:  4m 42s | Max:  4m 47s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 36m | Avg:  5m 06s | Max: 17m 44s | Hits:  89%/224   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 10m | Avg:  3m 53s | Max: 10m 06s | Hits:  89%/224   
      🟩 Test               Pass: 100%/5   | Total:  1h 25m | Avg: 17m 04s | Max: 17m 44s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 90a                Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 12m | Avg:  4m 33s | Max: 17m 32s
      🟩 20                 Pass: 100%/25  | Total:  2h 24m | Avg:  5m 45s | Max: 17m 44s | Hits:  89%/224   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

github-actions bot commented Nov 4, 2024

🟩 CI finished in 1h 12m: Pass: 100%/54 | Total: 4h 28m | Avg: 4m 58s | Max: 23m 23s | Hits: 89%/224
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 28m | Avg: 4m 58s | Max: 23m 23s | Hits: 89%/224

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 15m | Avg:  5m 06s | Max: 23m 23s | Hits:  89%/224   
      🟩 arm64              Pass: 100%/4   | Total: 13m 38s | Avg:  3m 24s | Max:  4m 29s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 35m | Avg:  5m 02s | Max: 22m 16s | Hits:  89%/112   
      🟩 12.5               Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  4m 55s
      🟩 12.6               Pass: 100%/33  | Total:  2h 43m | Avg:  4m 56s | Max: 23m 23s | Hits:  89%/112   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 35m | Avg:  5m 02s | Max: 22m 16s | Hits:  89%/112   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  4m 55s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 43m | Avg:  4m 56s | Max: 23m 23s | Hits:  89%/112   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 28m | Avg:  4m 58s | Max: 23m 23s | Hits:  89%/224   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  6m 47s | Avg:  3m 23s | Max:  3m 29s
      🟩 Clang10            Pass: 100%/2   | Total:  7m 03s | Avg:  3m 31s | Max:  3m 42s
      🟩 Clang11            Pass: 100%/4   | Total: 12m 35s | Avg:  3m 08s | Max:  3m 26s
      🟩 Clang12            Pass: 100%/4   | Total: 12m 22s | Avg:  3m 05s | Max:  3m 12s
      🟩 Clang13            Pass: 100%/4   | Total: 12m 54s | Avg:  3m 13s | Max:  3m 25s
      🟩 Clang14            Pass: 100%/4   | Total: 27m 30s | Avg:  6m 52s | Max: 17m 45s
      🟩 Clang15            Pass: 100%/2   | Total:  6m 52s | Avg:  3m 26s | Max:  3m 37s
      🟩 Clang16            Pass: 100%/4   | Total: 13m 29s | Avg:  3m 22s | Max:  3m 40s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 10s | Avg:  3m 35s | Max:  3m 38s
      🟩 Clang18            Pass: 100%/2   | Total: 20m 58s | Avg: 10m 29s | Max: 17m 48s
      🟩 GCC9               Pass: 100%/2   | Total:  6m 10s | Avg:  3m 05s | Max:  3m 17s
      🟩 GCC10              Pass: 100%/4   | Total: 12m 43s | Avg:  3m 10s | Max:  3m 21s
      🟩 GCC11              Pass: 100%/4   | Total: 12m 17s | Avg:  3m 04s | Max:  3m 12s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 16m | Avg: 10m 58s | Max: 23m 23s
      🟩 GCC13              Pass: 100%/3   | Total: 10m 06s | Avg:  3m 22s | Max:  4m 29s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  6m 56s | Avg:  6m 56s | Max:  6m 56s | Hits:  89%/112   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  6m 30s | Avg:  6m 30s | Max:  6m 30s | Hits:  89%/112   
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  4m 55s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 07m | Avg:  4m 15s | Max: 17m 48s
      🟩 GCC                Pass: 100%/20  | Total:  1h 58m | Avg:  5m 54s | Max: 23m 23s
      🟩 MSVC               Pass: 100%/2   | Total: 13m 26s | Avg:  6m 43s | Max:  6m 56s | Hits:  89%/224   
      🟩 NVHPC              Pass: 100%/2   | Total:  9m 39s | Avg:  4m 49s | Max:  4m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 28m | Avg:  4m 58s | Max: 23m 23s | Hits:  89%/224   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  2h 49m | Avg:  3m 27s | Max:  6m 56s | Hits:  89%/224   
      🟩 Test               Pass: 100%/5   | Total:  1h 39m | Avg: 19m 56s | Max: 23m 23s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 46s | Avg:  2m 46s | Max:  2m 46s
      🟩 90a                Pass: 100%/1   | Total:  2m 45s | Avg:  2m 45s | Max:  2m 45s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 14m | Avg:  4m 39s | Max: 23m 23s
      🟩 20                 Pass: 100%/25  | Total:  2h 13m | Avg:  5m 21s | Max: 18m 29s | Hits:  89%/224   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

@caugonnet caugonnet added the stf Sequential Task Flow programming model label Nov 7, 2024
@caugonnet caugonnet changed the title stackable stf resources [STF] stackable stf resources Jan 14, 2025
@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

🟩 CI finished in 40m 53s: Pass: 100%/20 | Total: 3h 17m | Avg: 9m 53s | Max: 24m 35s | Hits: 582%/312
  • 🟩 cudax: Pass: 100%/20 | Total: 3h 17m | Avg: 9m 53s | Max: 24m 35s | Hits: 582%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  2h 45m | Avg: 10m 19s | Max: 24m 35s | Hits: 582%/312   
      🟩 arm64              Pass: 100%/4   | Total: 32m 37s | Avg:  8m 09s | Max:  8m 52s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 33s | Avg: 11m 33s | Max: 11m 33s | Hits: 582%/156   
      🟩 12.5               Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 42s
      🟩 12.6               Pass: 100%/17  | Total:  2h 55m | Avg: 10m 18s | Max: 24m 35s | Hits: 582%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 33s | Avg: 11m 33s | Max: 11m 33s | Hits: 582%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 42s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  2h 55m | Avg: 10m 18s | Max: 24m 35s | Hits: 582%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  3h 17m | Avg:  9m 53s | Max: 24m 35s | Hits: 582%/312   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  9m 26s | Avg:  9m 26s | Max:  9m 26s
      🟩 Clang15            Pass: 100%/1   | Total:  9m 50s | Avg:  9m 50s | Max:  9m 50s
      🟩 Clang16            Pass: 100%/1   | Total:  9m 15s | Avg:  9m 15s | Max:  9m 15s
      🟩 Clang17            Pass: 100%/1   | Total:  9m 56s | Avg:  9m 56s | Max:  9m 56s
      🟩 Clang18            Pass: 100%/4   | Total: 41m 39s | Avg: 10m 24s | Max: 16m 15s
      🟩 GCC10              Pass: 100%/1   | Total:  9m 42s | Avg:  9m 42s | Max:  9m 42s
      🟩 GCC11              Pass: 100%/1   | Total:  9m 13s | Avg:  9m 13s | Max:  9m 13s
      🟩 GCC12              Pass: 100%/2   | Total: 34m 07s | Avg: 17m 03s | Max: 24m 35s
      🟩 GCC13              Pass: 100%/4   | Total: 30m 57s | Avg:  7m 44s | Max:  8m 52s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 33s | Avg: 11m 33s | Max: 11m 33s | Hits: 582%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 06s | Avg: 11m 06s | Max: 11m 06s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 42s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total:  1h 20m | Avg: 10m 00s | Max: 16m 15s
      🟩 GCC                Pass: 100%/8   | Total:  1h 23m | Avg: 10m 29s | Max: 24m 35s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 39s | Avg: 11m 19s | Max: 11m 33s | Hits: 582%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 42s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  3h 17m | Avg:  9m 53s | Max: 24m 35s | Hits: 582%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  2h 36m | Avg:  8m 43s | Max: 11m 33s | Hits: 582%/312   
      🟩 Test               Pass: 100%/2   | Total: 40m 50s | Avg: 20m 25s | Max: 24m 35s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  6m 57s | Avg:  6m 57s | Max:  6m 57s
      🟩 90a                Pass: 100%/1   | Total:  7m 24s | Avg:  7m 24s | Max:  7m 24s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 28m 13s | Avg:  7m 03s | Max:  7m 50s
      🟩 20                 Pass: 100%/16  | Total:  2h 49m | Avg: 10m 35s | Max: 24m 35s | Hits: 582%/312   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

🟩 CI finished in 42m 04s: Pass: 100%/20 | Total: 4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312
  • 🟩 cudax: Pass: 100%/20 | Total: 4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  3h 22m | Avg: 12m 39s | Max: 22m 12s | Hits: 582%/312   
      🟩 arm64              Pass: 100%/4   | Total: 44m 46s | Avg: 11m 11s | Max: 11m 49s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 10m 50s | Avg: 10m 50s | Max: 10m 50s | Hits: 582%/156   
      🟩 12.5               Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 51s
      🟩 12.6               Pass: 100%/17  | Total:  3h 44m | Avg: 13m 13s | Max: 22m 12s | Hits: 582%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 50s | Avg: 10m 50s | Max: 10m 50s | Hits: 582%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 51s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  3h 44m | Avg: 13m 13s | Max: 22m 12s | Hits: 582%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 12m 12s | Avg: 12m 12s | Max: 12m 12s
      🟩 Clang15            Pass: 100%/1   | Total: 13m 19s | Avg: 13m 19s | Max: 13m 19s
      🟩 Clang16            Pass: 100%/1   | Total: 13m 09s | Avg: 13m 09s | Max: 13m 09s
      🟩 Clang17            Pass: 100%/1   | Total: 13m 12s | Avg: 13m 12s | Max: 13m 12s
      🟩 Clang18            Pass: 100%/4   | Total: 54m 39s | Avg: 13m 39s | Max: 18m 11s
      🟩 GCC10              Pass: 100%/1   | Total: 13m 50s | Avg: 13m 50s | Max: 13m 50s
      🟩 GCC11              Pass: 100%/1   | Total: 14m 10s | Avg: 14m 10s | Max: 14m 10s
      🟩 GCC12              Pass: 100%/2   | Total: 36m 46s | Avg: 18m 23s | Max: 22m 12s
      🟩 GCC13              Pass: 100%/4   | Total: 42m 15s | Avg: 10m 33s | Max: 11m 49s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 50s | Avg: 10m 50s | Max: 10m 50s | Hits: 582%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 18s | Avg: 11m 18s | Max: 11m 18s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 51s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total:  1h 46m | Avg: 13m 18s | Max: 18m 11s
      🟩 GCC                Pass: 100%/8   | Total:  1h 47m | Avg: 13m 22s | Max: 22m 12s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 08s | Avg: 11m 04s | Max: 11m 18s | Hits: 582%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 51s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  3h 26m | Avg: 11m 29s | Max: 14m 34s | Hits: 582%/312   
      🟩 Test               Pass: 100%/2   | Total: 40m 23s | Avg: 20m 11s | Max: 22m 12s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  8m 53s | Avg:  8m 53s | Max:  8m 53s
      🟩 90a                Pass: 100%/1   | Total: 10m 48s | Avg: 10m 48s | Max: 10m 48s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 36m 10s | Avg:  9m 02s | Max: 10m 45s
      🟩 20                 Pass: 100%/16  | Total:  3h 31m | Avg: 13m 11s | Max: 22m 12s | Hits: 582%/312   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

{
  if (this != &other)
  {
    assert(l.shape() == other.l.shape());
Contributor

Probably assert that the contexts are equal too.
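One possible reading of this suggestion, as a minimal sketch: `toy_ctx`, `toy_buffer` and `toy_text` are hypothetical stand-ins rather than the classes in this PR, and context equality is assumed to mean "both handles share the same underlying state". The copy assignment checks the shape, as in the hunk above, and additionally checks that both operands belong to the same context.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical context handle: copies share one state object.
struct toy_ctx
{
  std::shared_ptr<int> state = std::make_shared<int>(0);

  // Assumption: two contexts are equal when they share the same state.
  friend bool operator==(const toy_ctx& a, const toy_ctx& b)
  {
    return a.state == b.state;
  }
};

// Hypothetical data handle that remembers its context and its payload.
struct toy_buffer
{
  toy_ctx ctx;
  std::vector<char> bytes;

  std::size_t shape() const
  {
    return bytes.size();
  }
};

struct toy_text
{
  toy_buffer l;

  toy_text& operator=(const toy_text& other)
  {
    if (this != &other)
    {
      assert(l.shape() == other.l.shape());
      assert(l.ctx == other.l.ctx); // the additional check suggested in the review
      l.bytes = other.l.bytes;
    }
    return *this;
  }
};
```

Both checks are debug-only `assert`s, so they disappear when `NDEBUG` is defined.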

};

template <typename T>
class stackable_logical_data
Contributor Author

We need an == operator too.
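A minimal sketch of what such an operator could look like for a handle type. `toy_logical_data` is a hypothetical class, and comparing by identity (two handles are equal when they alias the same underlying state) is an assumption; the PR does not say which semantics are intended.

```cpp
#include <memory>
#include <utility>

// Hypothetical handle type: copies alias the same underlying state.
template <typename T>
class toy_logical_data
{
public:
  explicit toy_logical_data(T value)
      : impl_(std::make_shared<T>(std::move(value)))
  {}

  const T& get() const
  {
    return *impl_;
  }

  // Identity comparison: true only when both handles alias the same state.
  friend bool operator==(const toy_logical_data& a, const toy_logical_data& b)
  {
    return a.impl_ == b.impl_;
  }

  friend bool operator!=(const toy_logical_data& a, const toy_logical_data& b)
  {
    return !(a == b);
  }

private:
  std::shared_ptr<T> impl_;
};
```

With these semantics, two handles created independently compare unequal even when they hold equal values, which is usually what a precondition check between handles needs.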

* @brief This class defines a context that behaves as a context which can have nested subcontexts (implemented as local
* CUDA graphs)
*/
class stackable_ctx
Contributor Author

We need an == operator too.

return stackable_task_dep(*this, get_ld().rw(::std::forward<Pack>(pack)...));
}

auto shape() const
Contributor Author

const auto& ?
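To make the trade-off concrete, here is a small sketch with hypothetical types (`toy_shape`, `toy_shaped`). A plain `auto` return type copies the shape on every call, while `const auto&` returns a reference to the stored member. The reference form is only safe when the returned expression names an object that outlives the call; if the accessor forwards to a function that returns by value, returning `const auto&` would dangle.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical shape type, large enough that copies are worth avoiding.
struct toy_shape
{
  std::vector<std::size_t> extents;
};

class toy_shaped
{
public:
  explicit toy_shaped(toy_shape s)
      : shape_(std::move(s))
  {}

  // Deduced as toy_shape: every call returns a fresh copy.
  auto shape_by_value() const
  {
    return shape_;
  }

  // Deduced as const toy_shape&: no copy, valid while *this is alive.
  const auto& shape() const
  {
    return shape_;
  }

private:
  toy_shape shape_;
};
```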

s.back().set_symbol(symbol + "." + ::std::to_string(depth()));
}

auto& get_sctx()
Contributor Author

This is suspicious: either provide a const version, or use friend classes/methods.
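A sketch of how the two options mentioned in this comment could be combined, using hypothetical names (`toy_ctx_stack`, `toy_handle`, `push_level`): the public accessor is const and hands out a read-only reference, while mutation goes through an explicitly named friend function instead of a public non-const getter.

```cpp
// Hypothetical nested-context state.
class toy_ctx_stack
{
public:
  int depth() const
  {
    return depth_;
  }

  void push()
  {
    ++depth_;
  }

private:
  int depth_ = 0;
};

// Hypothetical handle that refers to a context stack it does not own.
class toy_handle
{
public:
  explicit toy_handle(toy_ctx_stack& s)
      : sctx_(&s)
  {}

  // Read-only access for every caller.
  const toy_ctx_stack& get_sctx() const
  {
    return *sctx_;
  }

  // Mutation is limited to this friend, which is found through
  // argument-dependent lookup when called as push_level(handle).
  friend void push_level(toy_handle& h)
  {
    h.sctx_->push();
  }

private:
  toy_ctx_stack* sctx_;
};
```

Callers can still inspect the stack through `get_sctx()`, but cannot modify it except through the functions the class chooses to befriend.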


ciphertext encrypt() const;

stackable_logical_data<slice<char>> l;
Contributor Author

rename to stackable_ld

Contributor Author

This means we cannot write generic code here. Maybe there should be a context::logical_data_t defined in the different backends, including the stackable "backend"?
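A sketch of the idea, under the assumption that each backend would expose a member alias template named `logical_data_t`. The backend and data types below (`plain_backend`, `stackable_backend`, `plain_data`, `stacked_data`, `toy_plaintext`) are hypothetical placeholders, not existing STF types. Generic code then names its data member through the context type instead of hard-coding the stackable variant.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical data types for two backends.
template <typename T>
struct plain_data
{
  T value;
};

template <typename T>
struct stacked_data
{
  std::vector<T> levels;
};

// Each hypothetical backend publishes its own logical data type.
struct plain_backend
{
  template <typename T>
  using logical_data_t = plain_data<T>;
};

struct stackable_backend
{
  template <typename T>
  using logical_data_t = stacked_data<T>;
};

// Generic code selects the data type through the context type.
template <typename Ctx>
struct toy_plaintext
{
  typename Ctx::template logical_data_t<char> l;
};

static_assert(std::is_same<decltype(toy_plaintext<plain_backend>::l), plain_data<char>>::value,
              "plain backend selects plain_data");
static_assert(std::is_same<decltype(toy_plaintext<stackable_backend>::l), stacked_data<char>>::value,
              "stackable backend selects stacked_data");
```

The same class template then works with either backend, at the cost of turning the class into a template parameterized on the context type.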


ctx.pop();
}

Contributor Author

TODO check results.

@caugonnet
Contributor Author

/ok to test

@github-actions
Contributor

🟨 CI finished in 39m 08s: Pass: 85%/20 | Total: 4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522
  • 🟨 cudax: Pass: 85%/20 | Total: 4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  81%/16  | Total:  3h 21m | Avg: 12m 34s | Max: 17m 58s | Hits: 388%/522   
      🟩 arm64              Pass: 100%/4   | Total: 49m 07s | Avg: 12m 16s | Max: 13m 08s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s | Hits: 388%/261   
      🟩 12.5               Pass: 100%/2   | Total: 14m 30s | Avg:  7m 15s | Max:  7m 17s
      🔍 12.6               Pass:  82%/17  | Total:  3h 46m | Avg: 13m 20s | Max: 17m 58s | Hits: 388%/261   
    🔍 cudacxx: nvcc12.6 🔍
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s | Hits: 388%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 14m 30s | Avg:  7m 15s | Max:  7m 17s
      🔍 nvcc12.6           Pass:  82%/17  | Total:  3h 46m | Avg: 13m 20s | Max: 17m 58s | Hits: 388%/261   
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/4   | Total: 41m 03s | Avg: 10m 15s | Max: 11m 56s
      🔍 20                 Pass:  81%/16  | Total:  3h 29m | Avg: 13m 04s | Max: 17m 58s | Hits: 388%/522   
    🟨 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
      🟩 Clang15            Pass: 100%/1   | Total: 13m 43s | Avg: 13m 43s | Max: 13m 43s
      🟩 Clang16            Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
      🟩 Clang17            Pass: 100%/1   | Total: 15m 37s | Avg: 15m 37s | Max: 15m 37s
      🟨 Clang18            Pass:  75%/4   | Total: 57m 22s | Avg: 14m 20s | Max: 17m 25s
      🟥 GCC10              Pass:   0%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
      🟩 GCC11              Pass: 100%/1   | Total: 14m 26s | Avg: 14m 26s | Max: 14m 26s
      🟨 GCC12              Pass:  50%/2   | Total: 34m 39s | Avg: 17m 19s | Max: 17m 58s
      🟩 GCC13              Pass: 100%/4   | Total: 46m 27s | Avg: 11m 36s | Max: 13m 08s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 06s | Avg:  9m 06s | Max:  9m 06s | Hits: 388%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 07s | Avg: 12m 07s | Max: 12m 07s | Hits: 388%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 14m 30s | Avg:  7m 15s | Max:  7m 17s
    🟨 cxx_family
      🟨 Clang              Pass:  87%/8   | Total:  1h 54m | Avg: 14m 20s | Max: 17m 25s
      🟨 GCC                Pass:  75%/8   | Total:  1h 39m | Avg: 12m 28s | Max: 17m 58s
      🟩 MSVC               Pass: 100%/2   | Total: 21m 13s | Avg: 10m 36s | Max: 12m 07s | Hits: 388%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 14m 30s | Avg:  7m 15s | Max:  7m 17s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  85%/20  | Total:  4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522   
    🟨 gpu
      🟨 v100               Pass:  85%/20  | Total:  4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522   
    🟨 jobs
      🟨 Build              Pass:  94%/18  | Total:  3h 34m | Avg: 11m 56s | Max: 16m 41s | Hits: 388%/522   
      🟥 Test               Pass:   0%/2   | Total: 35m 23s | Avg: 17m 41s | Max: 17m 58s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 10m 09s | Avg: 10m 09s | Max: 10m 09s
      🟩 90a                Pass: 100%/1   | Total: 11m 14s | Avg: 11m 14s | Max: 11m 14s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 20)

# Runner
12 linux-amd64-cpu16
4 linux-arm64-cpu16
2 windows-amd64-cpu16
2 linux-amd64-gpu-v100-latest-1

Contributor Author

Moved to #7591 (with extra tests)

Contributor Author

Moved to #7592

Contributor Author

Moved to #7593

Contributor Author

Moved to #7593


// Allocation using uncached allocator directly
auto& allocator = ctx.get_uncached_allocator();
::std::ptrdiff_t buffer_size = blocks * sizeof(redux_vars<deps_tup_t, ops_and_inits>);
Contributor Author

This probably belongs to #7592

{
  ::std::lock_guard<::std::mutex> lock(graph_mutex);

  event_list prereqs = acquire(ctx);
Contributor Author

moved to #7592

@caugonnet
Contributor Author

/ok to test 293f992

@caugonnet
Contributor Author

/ok to test c6ef2e2

@caugonnet
Contributor Author

/ok to test b4d57ee

@caugonnet
Contributor Author

/ok to test 43c13c6

@github-actions
Contributor

😬 CI Workflow Results

🟥 Finished in 31m 11s: Pass: 20%/48 | Total: 12h 41m | Max: 30m 48s | Hits: 57%/1304

See results here.

Labels

stf Sequential Task Flow programming model

Projects

Status: In Progress

5 participants