1a454b0 to dcb75d2

/ok to test
🟩 CI finished in 23m 04s: Pass: 100%/54 | Total: 4h 36m | Avg: 5m 06s | Max: 17m 44s | Hits: 89%/224

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

🏃 Runner counts (total jobs: 54)
| # | Runner |
|---|---|
| 43 | linux-amd64-cpu16 |
| 5 | linux-amd64-gpu-v100-latest-1 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |

/ok to test
🟩 CI finished in 1h 12m: Pass: 100%/54 | Total: 4h 28m | Avg: 4m 58s | Max: 23m 23s | Hits: 89%/224
🏃 Runner counts (total jobs: 54)
| # | Runner |
|---|---|
| 43 | linux-amd64-cpu16 |
| 5 | linux-amd64-gpu-v100-latest-1 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
cudax/include/cuda/experimental/__stf/utility/stackable_ctx.cuh (3 outdated review threads, resolved)
ff0ba38 to fb3de98

/ok to test
🟩 CI finished in 40m 53s: Pass: 100%/20 | Total: 3h 17m | Avg: 9m 53s | Max: 24m 35s | Hits: 582%/312
🏃 Runner counts (total jobs: 20)
| # | Runner |
|---|---|
| 12 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-v100-latest-1 |

/ok to test
🟩 CI finished in 42m 04s: Pass: 100%/20 | Total: 4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312
🏃 Runner counts (total jobs: 20)
| # | Runner |
|---|---|
| 12 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-v100-latest-1 |

    {
      if (this != &other)
      {
        assert(l.shape() == other.l.shape());

Probably assert that the contexts are equal too.
cudax/include/cuda/experimental/__stf/utility/stackable_ctx.cuh (1 outdated review thread, resolved)

    };

    template <typename T>
    class stackable_logical_data

We need an == operator too.

     * @brief This class defines a context that behaves as a context which can have nested subcontexts (implemented as local
     * CUDA graphs)
     */
    class stackable_ctx

We need an == operator too.
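
To make the two `==` requests concrete, here is a minimal sketch of one way equality could be defined for handle-like classes. It assumes that copies of a handle share one implementation object through a `shared_ptr`, which this page does not confirm for `stackable_ctx` or `stackable_logical_data`. The names `toy_ctx`, `toy_ld`, `pimpl`, `ctx_` and `data_` are hypothetical stand-ins, not the PR's code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a context handle: copies share one implementation
// object, so equality can mean "refers to the same underlying state".
class toy_ctx
{
public:
  toy_ctx()
      : pimpl(::std::make_shared<impl>())
  {}

  friend bool operator==(const toy_ctx& a, const toy_ctx& b)
  {
    return a.pimpl == b.pimpl; // pointer identity, not deep comparison
  }

  friend bool operator!=(const toy_ctx& a, const toy_ctx& b)
  {
    return !(a == b);
  }

private:
  struct impl
  {};

  ::std::shared_ptr<impl> pimpl;
};

// Hypothetical stand-in for a logical data handle bound to a context.
template <typename T>
class toy_ld
{
public:
  explicit toy_ld(toy_ctx ctx)
      : ctx_(::std::move(ctx))
      , data_(::std::make_shared<T>())
  {}

  // Equal only if both the owning context and the shared data state match.
  friend bool operator==(const toy_ld& a, const toy_ld& b)
  {
    return a.ctx_ == b.ctx_ && a.data_ == b.data_;
  }

  friend bool operator!=(const toy_ld& a, const toy_ld& b)
  {
    return !(a == b);
  }

private:
  toy_ctx ctx_;
  ::std::shared_ptr<T> data_;
};
```

With this design a copy compares equal to its source, while two independently constructed handles compare unequal even when they live in the same context. An assignment operator could then assert `ctx_ == other.ctx_` alongside a shape check.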

      return stackable_task_dep(*this, get_ld().rw(::std::forward<Pack>(pack)...));
    }

    auto shape() const

      s.back().set_symbol(symbol + "." + ::std::to_string(depth()));
    }

    auto& get_sctx()

This is suspicious: either provide a const version, or use friend classes/methods.
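
The two alternatives named in this comment can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `toy_sctx`, `toy_handle`, `get_sctx_mut` and `push_level` are hypothetical names, and the real `get_sctx()` may return something different.

```cpp
#include <cassert>

// Hypothetical stand-in for the stackable context state.
struct toy_sctx
{
  int depth = 0;
};

class toy_handle
{
public:
  explicit toy_handle(toy_sctx& sctx)
      : sctx_(&sctx)
  {}

  // Option 1: a const accessor, so arbitrary callers get read-only access.
  const toy_sctx& get_sctx() const
  {
    return *sctx_;
  }

private:
  // Option 2: mutable access stays private and is granted to selected
  // functions (or classes) through friendship.
  friend void push_level(toy_handle& h);

  toy_sctx& get_sctx_mut()
  {
    return *sctx_;
  }

  toy_sctx* sctx_;
};

// The only free function allowed to mutate the context through a handle.
inline void push_level(toy_handle& h)
{
  ++h.get_sctx_mut().depth;
}
```

Either option avoids handing out a public non-const reference, so mutation of the context stays confined to code that was explicitly granted access.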

    ciphertext encrypt() const;

    stackable_logical_data<slice<char>> l;

Rename to stackable_ld.

This means we cannot write generic code here. Maybe there should be a context::logical_data_t defined in the different backends, including the stackable "backend"?
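
One way to read the `context::logical_data_t` suggestion is as a nested alias template that each backend defines, so generic code names the data type through the context type. The sketch below assumes that reading. Every name in it (`plain_backend`, `stackable_backend`, `plain_ld`, `stacked_ld`, `ciphertext_like`) is hypothetical and only mirrors the shape of the quoted member declaration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical data handle types for two backends.
template <typename T>
struct plain_ld
{
  T value{};
};

template <typename T>
struct stacked_ld
{
  T value{};
  int depth = 0; // nesting level, present only in the stackable flavour
};

// Each backend publishes its own logical data type under the same alias name.
struct plain_backend
{
  template <typename T>
  using logical_data_t = plain_ld<T>;
};

struct stackable_backend
{
  template <typename T>
  using logical_data_t = stacked_ld<T>;
};

// Generic code never spells out a concrete data type; it asks the context.
template <typename Ctx>
struct ciphertext_like
{
  typename Ctx::template logical_data_t<char> l;
};

static_assert(::std::is_same<decltype(ciphertext_like<plain_backend>::l), plain_ld<char>>::value,
              "plain backend selects plain_ld");
static_assert(::std::is_same<decltype(ciphertext_like<stackable_backend>::l), stacked_ld<char>>::value,
              "stackable backend selects stacked_ld");
```

The member declaration is then identical for every backend, and switching to the stackable flavour only changes the template argument.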
cudax/include/cuda/experimental/__stf/internal/logical_data.cuh (1 outdated review thread, resolved)

      ctx.pop();
    }

TODO: check results.

/ok to test
🟨 CI finished in 39m 08s: Pass: 85%/20 | Total: 4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522
🏃 Runner counts (total jobs: 20)
| # | Runner |
|---|---|
| 12 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-v100-latest-1 |

    // Allocation using uncached allocator directly
    auto& allocator = ctx.get_uncached_allocator();
    ::std::ptrdiff_t buffer_size = blocks * sizeof(redux_vars<deps_tup_t, ops_and_inits>);

    {
      ::std::lock_guard<::std::mutex> lock(graph_mutex);

      event_list prereqs = acquire(ctx);

/ok to test 293f992
…ctively dead code adding unnecessary complexity

/ok to test c6ef2e2

/ok to test b4d57ee

/ok to test 43c13c6
😬 CI Workflow Results
🟥 Finished in 31m 11s: Pass: 20%/48 | Total: 12h 41m | Max: 30m 48s | Hits: 57%/1304
Description
This PR introduces helper methods that improve how contexts are nested, in order to better leverage CUDA Graphs.
Checklist