Accepted Papers

Broken Access Control Risks in Open Source JavaScript Projects: A Security Analysis

Rima Ayusinta, Rodi Jolak and Raja Khurram Shahzad

Context: Broken Access Control (BAC) is ranked by OWASP as the most critical web security risk. Open-source JavaScript projects, with their openness and diverse contributors, are particularly exposed. Objective: This study investigates the prevalence and patterns of BAC vulnerabilities in open-source JavaScript projects, addressing the lack of empirical evidence beyond enterprise systems. Method: A hybrid approach combined Semgrep static analysis with manual penetration testing. A curated set of 166 GitHub repositories was scanned using custom rules for Insecure Direct Object Reference (IDOR), unprotected routes, forced browsing, and token/session flaws; flagged cases were validated in Docker with Postman. Results: Of 166 repositories, 33 were flagged and 5 confirmed exploitable. Vulnerabilities included unauthenticated endpoints, parameter-based privilege escalation, and insecure token or Cross-Origin Resource Sharing (CORS) handling. Static analysis was useful but showed high false positives and limited contextual accuracy. Conclusion: BAC vulnerabilities recur in open-source JavaScript projects. Static tools aid detection but require manual validation for reliable assessment.
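For readers unfamiliar with the vulnerability classes studied, the IDOR pattern at the heart of this abstract can be sketched in plain JavaScript. The function and data names below are illustrative only and not taken from the paper:

```javascript
// Minimal sketch of an Insecure Direct Object Reference (IDOR) and its fix.
// The data model and function names are hypothetical.

const invoices = {
  1: { owner: "alice", total: 42 },
  2: { owner: "bob", total: 7 },
};

// Vulnerable: any caller can fetch any invoice simply by changing the id,
// because the handler never checks who owns the object.
function getInvoiceInsecure(user, id) {
  return invoices[id] ?? null;
}

// Fixed: verify that the requested object belongs to the caller;
// deny by default when the check fails.
function getInvoiceSecure(user, id) {
  const invoice = invoices[id];
  if (!invoice || invoice.owner !== user) return null;
  return invoice;
}
```

The insecure variant is exactly the kind of pattern a custom Semgrep rule can flag (a route handler that reads an object by request-supplied id without an ownership or role check), while confirming exploitability still requires the manual validation step the paper describes.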

Can Developers Rely on LLMs for Secure IaC Development?

Ehsan Firouzi, Shardul Bhatt, Mohammad Ghafari

We investigated the capabilities of GPT-4o and Gemini 2.0 Flash for secure Infrastructure as Code (IaC) development. On the Stack Overflow dataset, which primarily contains small, simplified code snippets, the models detected at least 71% of security smells when prompted to analyze code from a security perspective (general prompt). With a guided prompt (adding clear, step-by-step instructions), this increased to 78%. In GitHub repositories, which contain complete, real-world project scripts, a general prompt was less effective, leaving more than half of the smells undetected. However, with the guided prompt, the models uncovered at least 67% of security smells and achieved promising F1 scores of 74% or higher. To assess how cautious LLMs are about security risks, we then asked the models to generate code for synthetic scenarios based on 89 insecure patterns identified from GitHub IaC scripts. For both models, only 7% of the generated scripts were secure. Adding an explicit instruction to generate secure code increased GPT-4o’s secure output rate to 17%, while Gemini 2.0 Flash changed little (8%); however, the number of warnings increased. This study highlights the need for further research to improve LLMs’ capabilities in assisting developers with secure IaC development.

Policy-as-Code in the Wild: A Taxonomy and Dataset for Open Policy Agent from GitHub

Can Önal, Indika Kumara, Filippo Scaramuzza and Marco Tonnarelli

Policy-as-Code (PaC) enables the definition, management, and enforcement of policies and regulations using source code, thereby supporting the application of software engineering practices and tools to policy management. While the gray literature and industrial tools indicate PaC is increasingly used in organizations, there is little academic literature on the topic. Hence, to understand how PaC is used in practice and to create a representative dataset of PaC scripts, we conducted a repository mining study. We selected Open Policy Agent (OPA) as the PaC framework because it is widely used in academia and industry and serves as a de facto standard for PaC. By mining 73 public GitHub repositories using OPA, we created a multi-level policy taxonomy comprising 5 top-level categories and 28 unique subtype categories at the lowest level. We also created a curated dataset of 101 OPA policy files. We believe the taxonomy and dataset can help both researchers and practitioners understand the use cases of PaC and develop methods and tools to support automated policy management using PaC.
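To make the Policy-as-Code idea concrete, a minimal OPA policy written in Rego might look like the sketch below. The package name, input fields, and rules are illustrative assumptions, not drawn from the paper's dataset:

```rego
package authz

import rego.v1

# Deny by default: requests are only allowed when a rule below matches.
default allow := false

# Admins may perform any action.
allow if input.user.role == "admin"

# Regular users may read resources they own.
allow if {
	input.method == "GET"
	input.resource.owner == input.user.name
}
```

Because policies like this are plain source files, they can be versioned, reviewed, and tested like any other code, which is precisely what makes repository mining a viable way to study PaC in practice.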

Evaluating Large Language Models for Security Bug Report Prediction

Farnaz Soltaniani, Shoaib Razzaq and Mohammad Ghafari

Early detection of security bug reports (SBRs) is critical for timely vulnerability mitigation. We present an evaluation of prompt engineering and fine-tuning approaches for predicting SBRs using Large Language Models (LLMs). Our results reveal a distinct trade-off between the two approaches. Prompted proprietary models showed the highest sensitivity to SBRs, with an average G-measure of 0.77 across all tested datasets, albeit at the cost of a higher false-positive rate, resulting in a precision of only 0.22. In contrast, fine-tuned models offered more viable solutions, achieving an average G-measure of 0.51 with a 10- to 50-fold reduction in inference latency compared to proprietary LLMs. While proprietary models offer accessibility, specialized fine-tuned models remain the optimal choice for privacy-preserving and cost-effective SBR prediction.

Towards Project-Aware Actionability Detection for Coding Rule Violations

Széles Csoma Lázár and Gergő Balogh

In our experience as developers and software engineering researchers, static analysis tools often produce overwhelming numbers of warnings, many of which prove irrelevant within a specific project context. Despite their intent to improve code quality, these tools frequently generate diagnostic noise that erodes developer trust and leads to alert fatigue. Through this work, we explore how project-aware actionability detection can help bridge the gap between automated analysis and developers’ practical needs. We present a preliminary study using an active learning–based framework to distinguish actionable from non-actionable coding rule violations. Experiments across 21 diverse projects from the NASCAR dataset show that a Random Forest–based active learner can achieve stable F1-scores with only 20–40% of labeled data, significantly reducing manual annotation effort. While computational efficiency remains a limitation, our findings suggest that such models could be integrated into IDEs and project management environments, where lightweight developer feedback would enable continuous refinement of project-specific actionability models.

TriGuard: LLMs, Security Analyzers, and Human-Feedback for Secure Code Generation and Vulnerability Detection

Ehsan Firouzi and Mohammad Ghafari

Many developers increasingly rely on LLMs to accelerate software development, yet the code they generate can introduce security risks. We survey publications in top-tier software engineering venues in 2025 and find that they heavily rely on security analyzer tools to evaluate the security of LLM-generated code. To empirically assess the reliability of these tools, we manually review 1,080 LLM-generated code samples to construct a ground-truth dataset and evaluate CodeQL and Semgrep on this corpus. Both tools exhibit low recall across several CWE categories and high false-positive rates, raising concerns about prior studies that rely solely on security analysis tools and highlighting the continued necessity of expert manual review, which remains a key gap in the literature. Experiments with GPT further show that security analysis tools and LLM-based vulnerability detection are complementary but still leave critical blind spots, underscoring the ongoing need for human oversight. To address this gap, we propose a framework that integrates security analyzers, LLMs, and expert feedback through a dual-source retrieval-augmented generation (RAG) pipeline for secure code generation.

Don’t Mind the Mesh: An Empirical Study of Istio Service Mesh Security in GitHub

Kohsuke Sonoda, Jose Luis Martin-Navarro and Tuomas Aura

Service meshes like Istio provide built-in security mechanisms for microservices-based systems, but how they are configured and used in practice remains unclear. This study presents a large-scale empirical analysis of Istio security configurations across 7,429 GitHub repositories containing Istio-related YAML files. We propose a method to systematically extract configurations across multiple API versions and analyze the adoption and combinations of key features, including mTLS, authentication, and authorization. We further map missing or incomplete configurations to potential security risks and quantify affected repositories. Our analysis shows that while Istio provides basic security by default, comprehensive configurations are only partially adopted. Although the combination of Strict mTLS, Peer Authentication, and HTTP Authorization is most common, many repositories disable mTLS, rely on deprecated Alpha APIs, or lack critical authorization mechanisms. As a result, over 6,000 repositories, including highly starred projects, remain potentially vulnerable to attack. Overall, this study demonstrates that merely deploying a service mesh does not guarantee sufficient security; effective protection depends on comprehensive and correct configuration of Istio’s security features.
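As an illustration of the feature combination the abstract identifies as most common, an Istio configuration enforcing strict mTLS plus an HTTP authorization rule might look like the following sketch. The namespaces, labels, and service-account names are hypothetical placeholders, not taken from the mined repositories:

```yaml
# Mesh-wide strict mTLS: only mutually authenticated traffic is accepted.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# HTTP authorization: only the frontend service account may issue GETs
# to workloads labeled app=backend in the prod namespace.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-reads
  namespace: prod
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/prod/sa/frontend"]
    to:
    - operation:
        methods: ["GET"]
```

Repositories that omit the PeerAuthentication resource fall back to permissive mTLS, and those without any AuthorizationPolicy leave workloads reachable by any in-mesh client, which are exactly the misconfiguration classes the study quantifies.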