Programming Languages
Showing new listings for Friday, 27 March 2026
- [1] arXiv:2603.24595 [pdf, other]
Title: Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)
The widespread adoption of large language models (LLMs) has made GPU-accelerated inference a critical part of modern computing infrastructure. Production inference systems rely on CUDA kernels to implement core transformer operations, yet these kernels are highly susceptible to memory-safety bugs due to model-dependent tensor layouts, intricate memory indexing, and massive thread-level parallelism. Such bugs can corrupt model weights, crash inference services, or even enable adversarial attacks. Existing techniques either depend on unavailable hardware, incur high overhead, or fail to handle kernel inputs with variable lengths, and none can effectively detect CUDA memory bugs in LLM inference systems. This paper presents Model2Kernel, the first practical system for automatically verifying the memory safety of CUDA kernels used in LLM inference. Model2Kernel performs model-aware dynamic analysis to determine how each model invokes kernels and to classify kernel arguments as either fixed by the model architecture or controlled by model users. Using this information, Model2Kernel then applies CUDA-specialized symbolic execution, supported by new abstractions for dynamic tensor memory and thread identifiers, to accurately pinpoint memory bugs in kernels. In the evaluation on CUDA kernels and models from vLLM, Hugging Face, and recent LLM research papers, Model2Kernel discovers 353 previously unknown bugs while producing only nine false positives, demonstrating its effectiveness.
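To make the bug class concrete, here is a minimal sketch of an index check in which one argument is fixed by the model and another is controlled by the user. It is not Model2Kernel's method: a bounded brute-force search stands in for symbolic execution, and every name and constant (`HEAD_DIM`, `BUFFER_TOKENS`, `kernel_index`, `find_out_of_bounds`) is hypothetical.

```python
# Toy stand-in for checking a kernel's index arithmetic.
# Assumption: all names and sizes below are illustrative, not from the paper.

HEAD_DIM = 64          # assumed fixed by the model architecture
BUFFER_TOKENS = 128    # assumed capacity the buffer was allocated for


def kernel_index(token: int, lane: int) -> int:
    """Flat offset a thread would touch: one row of HEAD_DIM values per token."""
    return token * HEAD_DIM + lane


def find_out_of_bounds(max_seq_len: int):
    """Search user-controlled sequence lengths for an access past the buffer.

    Returns the first (seq_len, token, lane) whose offset is out of bounds,
    or None if every checked access stays inside the allocation.
    """
    buffer_len = BUFFER_TOKENS * HEAD_DIM
    for seq_len in range(1, max_seq_len + 1):   # user-controlled argument
        token = seq_len - 1                     # last token gives the largest offset
        for lane in range(HEAD_DIM):            # stands in for the thread identifier
            if not 0 <= kernel_index(token, lane) < buffer_len:
                return seq_len, token, lane
    return None
```

With these assumed sizes, `find_out_of_bounds(128)` finds nothing, while any bound above 128 reports a sequence length of 129 as the first violating input. A symbolic executor would reason about all lengths at once instead of enumerating them.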
- [2] arXiv:2603.24624 [pdf, html, other]
Title: ReSyn: A Generalized Recursive Regular Expression Synthesis Framework
Comments: Submitted to IJCAI 2026
Subjects: Programming Languages (cs.PL); Software Engineering (cs.SE)
Existing Programming-By-Example (PBE) systems often rely on simplified benchmarks that fail to capture the high structural complexity (such as deeper nesting and frequent Unions) of real-world regexes. To overcome the resulting performance drop, we propose ReSyn, a synthesizer-agnostic divide-and-conquer framework that decomposes complex synthesis problems into manageable sub-problems. We also introduce Set2Regex, a parameter-efficient synthesizer capturing the permutation invariance of examples. Experimental results demonstrate that ReSyn significantly boosts accuracy across various synthesizers, and its combination with Set2Regex establishes a new state-of-the-art on a challenging real-world benchmark.
- [3] arXiv:2603.24812 [pdf, html, other]
Title: Numerical Superoptimization for Library Learning
Subjects: Programming Languages (cs.PL)
Numerical software depends on fast, accurate implementations of mathematical primitives like sin, exp, and log. Modern superoptimizers can optimize floating-point kernels against a given set of such primitives, but a more fundamental question remains open: which new primitives are worth implementing in the first place?
We formulate this as numerical library learning: given a workload of floating-point kernels, identify the mathematical primitives whose expert implementations would most improve speed and accuracy. Our key insight is that numerical superoptimizers already have the machinery well-suited to this problem. Their search procedures happen to enumerate candidate primitives, their equivalence procedures can generalize and deduplicate candidates, and their cost models can estimate counterfactual utility: how much the workload would improve if a given primitive were available.
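As a small illustration of why a dedicated primitive can beat a composition of existing ones, the sketch below compares `log(1 + x)` with the standard library's `math.log1p`. The choice of `log1p` is an assumption made for illustration only; the abstract does not say which primitives are learned. The helper names `naive_log1p` and `relative_error` are hypothetical.

```python
import math


def naive_log1p(x: float) -> float:
    """Compose existing primitives: log(1 + x). Loses digits when x is tiny."""
    return math.log(1.0 + x)


def relative_error(approx: float, exact: float) -> float:
    """Relative distance between an approximation and a reference value."""
    return abs(approx - exact) / abs(exact)


x = 1e-10
# Reference from the Taylor series x - x^2/2; the truncated x^3/3 term is
# far below double-precision resolution at this x.
reference = x - x * x / 2.0

naive_error = relative_error(naive_log1p(x), reference)
primitive_error = relative_error(math.log1p(x), reference)
```

Here `naive_error` comes out many orders of magnitude larger than `primitive_error`, because the rounding in `1.0 + x` discards most of the digits of `x` before the logarithm is taken. A cost model that compared the two could, under these assumptions, credit the dedicated primitive with the accuracy gain.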
We present GrowLibm, which repurposes the Herbie superoptimizer as a numerical library learner. GrowLibm mines candidate primitives from the superoptimizer's intermediate search results, ranks them by counterfactual utility, and prunes redundant candidates. Across three scientific applications (PROJ, CoolProp, and Basilisk), GrowLibm identifies compact, reusable primitives that can be implemented effectively using standard numerical techniques. When Herbie is extended with these expert implementations, kernel speed improves by up to 2.2x at fixed accuracy, and maximum achievable accuracy also improves, in one case from 56.0% to 93.5%. We also prototype an LLVM matcher that recognizes learned primitives in optimized IR, recovering 26 replacement sites across five PROJ projections and improving end-to-end application performance by up to 5%.
- [4] arXiv:2603.24940 [pdf, other]
Title: Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system
Journal-ref: Computers and Education: Artificial Intelligence, Volume 10, June 2026, 100526
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)
This paper introduces the design and development of a framework that integrates a large language model (LLM) with a retrieval-augmented generation (RAG) approach leveraging both a knowledge graph and user interaction history. The framework is incorporated into a previously developed adaptive learning support system to assess learners' code, generate formative feedback, and recommend exercises. Moreover, this study examines learner preferences across three instructional modes: adaptive, Generative AI (GenAI), and hybrid GenAI-adaptive. An experimental study was conducted to compare the learning performance and perception of the learners, and the effectiveness of these three modes using four key log features derived from 4956 code submissions across all experimental groups. The analysis results show that learners receiving feedback from GenAI modes had significantly more correct code and fewer code submissions missing essential programming logic than those receiving feedback from adaptive mode. In particular, the hybrid GenAI-adaptive mode achieved the highest number of correct submissions and the fewest incorrect or incomplete attempts, outperforming both the adaptive-only and GenAI-only modes. Questionnaire responses further indicated that GenAI-generated feedback was widely perceived as helpful, while all modes were rated positively for ease of use and usefulness. These results suggest that the hybrid GenAI-adaptive mode outperforms the other two modes across all measured log features.
- [5] arXiv:2603.25337 [pdf, html, other]
Title: On Representability of Multiple-Valued Functions by Linear Lambda Terms Typed with Second-order Polymorphic Type System
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
We show that any multiple-valued function can be represented by a linear lambda term typed in a second-order polymorphic type system, using two distinct styles. The first is a circuit style, which mimics combinational circuits in switching theory. The second is an inductive style, which follows a more traditional mathematical approach. We also discuss several optimizations for these representations. Furthermore, we present a case study that demonstrates the potential applications of our approach across various domains.
- [6] arXiv:2603.25414 [pdf, html, other]
Title: Decidable By Construction: Design-Time Verification for Trustworthy AI
Comments: 18 pages, 1 figure
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
A prevailing assumption in machine learning is that model correctness must be enforced after the fact. We observe that the properties determining whether an AI model is numerically stable, computationally correct, or consistent with a physical domain do not necessarily demand post hoc enforcement. They can be verified at design time, before training begins, at marginal computational cost, with particular relevance to models deployed in high-leverage decision support and scientifically constrained settings. These properties share a specific algebraic structure: they are expressible as constraints over finitely generated abelian groups $\mathbb{Z}^n$, where inference is decidable in polynomial time and the principal type is unique. A framework built on this observation composes three prior results (arXiv:2603.16437, arXiv:2603.17627, arXiv:2603.18104): a dimensional type system carrying arbitrary annotations as persistent codata through model elaboration; a program hypergraph that infers Clifford algebra grade and derives geometric product sparsity from type signatures alone; and an adaptive domain model architecture preserving both invariants through training via forward-mode coeffect analysis and exact posit accumulation. We believe this composition yields a novel information-theoretic result: Hindley-Milner unification over abelian groups computes the maximum a posteriori hypothesis under a computable restriction of Solomonoff's universal prior, placing the framework's type inference on the same formal ground as universal induction. We compare four contemporary approaches to AI reliability and show that each imposes overhead that can compound across deployments, layers, and inference requests. This framework eliminates that overhead by construction.
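The claim that constraints over $\mathbb{Z}^n$ have unique solutions can be illustrated with a deliberately simplified sketch of dimensional analysis, where each dimension is a vector of integer exponents. This is not the framework's type system: the basis (length, mass, time) and the helpers `mul`, `inv`, and `solve_unknown` are assumptions made for illustration.

```python
from typing import Tuple

# A dimension is a vector of integer exponents over an assumed basis
# (length, mass, time). Multiplying quantities adds the vectors.
Dim = Tuple[int, int, int]

LENGTH: Dim = (1, 0, 0)
MASS: Dim = (0, 1, 0)
TIME: Dim = (0, 0, 1)


def mul(a: Dim, b: Dim) -> Dim:
    """Dimension of a product: componentwise sum of exponents."""
    return tuple(x + y for x, y in zip(a, b))


def inv(a: Dim) -> Dim:
    """Dimension of a reciprocal: negated exponents."""
    return tuple(-x for x in a)


def solve_unknown(known: Dim, result: Dim) -> Dim:
    """Solve known * ? = result. In Z^n the answer is result - known, and it is unique."""
    return mul(result, inv(known))


velocity = mul(LENGTH, inv(TIME))
force = mul(MASS, mul(LENGTH, inv(mul(TIME, TIME))))
acceleration = solve_unknown(MASS, force)   # the one dimension with mass * ? = force
```

Because every element of the group has an inverse, an equation with a single unknown dimension has exactly one solution, found by integer subtraction. That is the simplest case of the decidable, principal-type inference the abstract refers to.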
New submissions (showing 6 of 6 entries)
- [7] arXiv:2603.25111 (cross-list from cs.LG) [pdf, html, other]
Title: SEVerA: Verified Synthesis of Self-Evolving Agents
Comments: Formally Verified Self-Evolving LLM Agents
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE)
Recent advances have shown the effectiveness of self-evolving LLM agents on tasks such as program repair and scientific discovery. In this paradigm, a planner LLM synthesizes an agent program that invokes parametric models, including LLMs, which are then tuned per task to improve performance. However, existing self-evolving agent frameworks provide no formal guarantees of safety or correctness. Because such programs are often executed autonomously on unseen inputs, this lack of guarantees raises reliability and security concerns. We formulate agentic code generation as a constrained learning problem, combining hard formal specifications with soft objectives capturing task utility. We introduce Formally Guarded Generative Models (FGGM), which allow the planner LLM to specify a formal output contract for each generative model call using first-order logic. Each FGGM call wraps the underlying model in a rejection sampler with a verified fallback, ensuring every returned output satisfies the contract for any input and parameter setting. Building on FGGM, we present SEVerA (Self-Evolving Verified Agents), a three-stage framework: Search synthesizes candidate parametric programs containing FGGM calls; Verification proves correctness with respect to hard constraints for all parameter values, reducing the problem to unconstrained learning; and Learning applies scalable gradient-based optimization, including GRPO-style fine-tuning, to improve the soft objective while preserving correctness. We evaluate SEVerA on Dafny program verification, symbolic math synthesis, and policy-compliant agentic tool use ($\tau^2$-bench). Across tasks, SEVerA achieves zero constraint violations while improving performance over unconstrained and SOTA baselines, showing that formal behavioral constraints not only guarantee correctness but also steer synthesis toward higher-quality agents.
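The rejection-sampler-with-fallback idea behind an FGGM call can be sketched as follows. This is an assumption-laden illustration, not the paper's implementation: a Python predicate stands in for a first-order-logic contract, the fallback is assumed to have been verified ahead of time, and `guarded_call` and its parameters are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def guarded_call(
    model: Callable[[], T],
    contract: Callable[[T], bool],
    fallback: T,
    max_tries: int = 5,
) -> T:
    """Rejection-sample `model` until `contract` holds; otherwise return `fallback`.

    Every value this function returns satisfies `contract`, whatever the
    model produces, provided the fallback itself satisfies it.
    """
    for _ in range(max_tries):
        candidate = model()
        if contract(candidate):
            return candidate
    # Assumption: the fallback was checked against the contract in advance.
    # The assertion re-checks it defensively at run time.
    assert contract(fallback), "fallback must satisfy the contract"
    return fallback
```

For example, with the contract `lambda x: x >= 0` and fallback `0`, a model that only ever returns negative numbers yields `0`, while a model that eventually returns `7` yields `7`. Either way the caller never sees a value that violates the contract, which is what lets the surrounding program be verified independently of the model's parameters.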
- [8] arXiv:2603.25710 (cross-list from cs.LO) [pdf, html, other]
Title: Stone Duality for Monads
Comments: 29 pages
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL); Category Theory (math.CT)
We introduce a contravariant idempotent adjunction between (i) the category of ranked monads on $\mathsf{Set}$; and (ii) the category of internal categories and internal retrofunctors in the category of locales. The left adjoint takes a monad $T$ (viewed as a notion of computation, following Moggi) to its localic behaviour category $\mathsf{LB}T$. This behaviour category is understood as "the universal transition system" for interacting with $T$: its "objects" are states and the "morphisms" are transitions. On the other hand, the right adjoint takes a localic category $\mathsf{LC}$ (similarly understood as a transition system) to the monad $\Gamma\mathsf{LC}$ where $(\Gamma\mathsf{LC})A$ is the set of $A$-indexed families of local sections to the source map which jointly partition the locale of objects. The fixed points of this adjunction consist of (i) hyperaffine-unary monads, i.e., those monads where each term $t$ admits a read-only operation $\bar{t}$ predicting the output of $t$; and (ii) ample localic categories, i.e., those whose source maps are local homeomorphisms and whose locale of objects is strongly zero-dimensional. The hyperaffine-unary monads arise in earlier works by Johnstone and Garner as a syntactic characterization of those monads with Cartesian closed Eilenberg-Moore categories. This equivalence is the Stone duality for monads, so called because it further restricts to the classical Stone duality by viewing a Boolean algebra $B$ as a monad of $B$-partitions and the corresponding Stone space as a localic category with only identity morphisms.
Cross submissions (showing 2 of 2 entries)
- [9] arXiv:2505.17703 (replaced) [pdf, html, other]
Title: Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces
Subjects: Programming Languages (cs.PL); Machine Learning (cs.LG); Software Engineering (cs.SE)
Automatic program repair seeks to generate correct code from buggy programs, with most approaches searching for the correct program in a discrete, symbolic space of source code tokens. This symbolic search is fundamentally limited by its inability to directly reason about program behavior. We introduce Gradient-Based Program Repair (GBPR), a new approach that recasts program repair as continuous optimization in a differentiable numerical program space. Our core insight is to compile symbolic programs into differentiable numerical representations, enabling search in the numerical program space directly guided by program behavior. To evaluate GBPR, we present RaspBugs, a new benchmark of 1,466 buggy symbolic RASP programs and their respective numerical representations. Our experiments demonstrate that GBPR can effectively repair buggy symbolic programs by gradient-based optimization in the numerical program space, with convincing repair trajectories. To our knowledge, we are the first to state program repair as continuous optimization in a numerical program space. Our work demonstrates the feasibility of this direction for program repair research, bridging continuous optimization and program behavior.
- [10] arXiv:2511.20782 (replaced) [pdf, html, other]
Title: Optimism in Equality Saturation
Subjects: Programming Languages (cs.PL)
Equality saturation is a technique for program optimization based on non-destructive rewriting and a form of abstract interpretation called e-class analysis. Existing e-class analyses are pessimistic and therefore ineffective at analyzing cyclic programs, such as those in SSA form. We show that a straightforward optimistic variant of e-class analysis can result in unsoundness, due to a subtlety in how e-graphs represent programs. We propose an abstract interpretation algorithm that circumvents this issue and can optimistically analyze e-graphs during equality saturation. This results in a unified algorithm for optimistic analysis and non-destructive rewriting. To demonstrate the practicality of our approach, we implement a prototype abstract interpreter and equality saturation tool for SSA programs using a new semantics of SSA. Our tool exhibits precision improvements over pure abstract interpretation (without rewriting) and pessimistic e-class analysis on example programs. Additionally, its performance is comparable to existing abstract interpretation and e-class analysis techniques.
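The gap between pessimistic and optimistic analysis on cyclic programs can be seen in a minimal constant-propagation sketch over a three-variable SSA loop. It uses plain abstract interpretation, not e-graphs or the paper's algorithm, and the lattice encoding and function names (`join`, `add`, `analyze`) are assumptions made for illustration.

```python
BOTTOM, TOP = "bottom", "top"   # lattice order: BOTTOM < any constant < TOP


def join(a, b):
    """Least upper bound of two abstract values (used for phi nodes)."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def add(a, b):
    """Abstract addition over the constant lattice."""
    if BOTTOM in (a, b):
        return BOTTOM
    if TOP in (a, b):
        return TOP
    return a + b


def analyze(initial):
    """Iterate to a fixpoint for: x0 = 0; loop: x1 = phi(x0, x2); x2 = x1 + 0."""
    env = {"x0": 0, "x1": initial, "x2": initial}
    while True:
        new = dict(env)
        new["x1"] = join(env["x0"], env["x2"])
        new["x2"] = add(new["x1"], 0)
        if new == env:
            return env
        env = new
```

Starting from `TOP` (pessimistic), the unknown value flowing around the back edge keeps `x1` at `TOP`, so the analysis learns nothing about the loop. Starting from `BOTTOM` (optimistic), the back edge contributes nothing on the first pass, `x1` becomes `0`, and the next pass confirms it, so the analysis concludes that `x1` and `x2` are the constant `0`.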
- [11] arXiv:2406.07737 (replaced) [pdf, html, other]
Title: The Future of AI-Driven Software Engineering
Comments: **Note** Published in ACM Transactions on Software Engineering and Methodology (TOSEM)
Journal-ref: ACM Transactions on Software Engineering and Methodology, Volume 34, Issue 5, Article 120 (May 2025)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)
A paradigm shift is underway in Software Engineering, with AI systems such as LLMs playing an increasingly important role in boosting software development productivity. This trend is anticipated to persist. In the coming years, we expect a growing symbiotic partnership between human software developers and AI. The Software Engineering research community cannot afford to overlook this trend; we must address the key research challenges posed by the integration of AI into the software development process. In this paper, we present our vision of the future of software development in an AI-driven world and explore the key challenges that our research community should address to realize this vision.
- [12] arXiv:2512.01678 (replaced) [pdf, other]
Title: Morphling: Fast, Fused, and Flexible GNN Training at Scale
Comments: The algorithm present in the paper is incorrect and the results are also not proper. So I want to take this down until we figure something out
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)
Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations. While frameworks such as PyTorch Geometric (PyG) and Deep Graph Library (DGL) prioritize high-level usability, they fail to address these divergent execution characteristics. As a result, they rely on generic kernels that suffer from poor cache locality, excessive memory movement, and substantial intermediate allocations. To address these limitations, we present Morphling, a domain-specific code synthesizer designed to bridge this gap. Morphling compiles high-level GNN specifications into portable, backend-specialized implementations targeting OpenMP, CUDA, and MPI. It achieves this by instantiating a library of optimized, architecture-aware primitives tailored to each execution environment. Morphling also incorporates a runtime sparsity-aware execution engine that dynamically selects dense or sparse execution paths using input feature statistics, reducing unnecessary computation on zero-valued entries. We evaluate Morphling on eleven real-world datasets spanning diverse graph structures, feature dimensionalities, and sparsity regimes. Morphling improves per-epoch training throughput by an average of 20X on CPUs, 19X on GPUs, and 6X in distributed settings over PyG and DGL, with peak speedups reaching 66X. Morphling's memory-efficient layouts further reduce peak memory consumption by up to 15X, enabling large-scale GNN training on commodity hardware. These findings demonstrate that specialized, architecture-aware code synthesis provides an effective and scalable path toward high-performance GNN execution across diverse parallel and distributed platforms.