publications | Jiangnan HUANG

2025

ICPC2025

Revisiting Security Practices for GitHub Actions Workflows

Jiangnan Huang and Bin Lin

In Proceedings of the 33rd International Conference on Program Comprehension 2025

GitHub Actions, a built-in CI/CD service of GitHub released in 2019, has become one of the most widely adopted tools among developers for automating software development workflows. This popularity, however, brings security challenges, as vulnerable workflows can expose repositories and software supply chains to significant risks. Existing studies have highlighted several types of potential security issues. Over the past few years, GitHub has been constantly promoting better security practices and developers have gained experience in using GitHub Actions. Investigating how developers’ practices for handling GitHub Actions security have changed over time could offer valuable insights for further strengthening the security of these workflows. In this study, we analyzed non-optimal security practices in 18,938 workflows from 5,246 active GitHub repositories. By comparing the prevalence of issues spotted in two different years (2022 and 2024), we find that some undesirable practices remain widespread in repositories. However, some progress has been observed, such as a significant reduction in permission misconfigurations.

2023

SCAM2023

CIGAR: Contrastive Learning for GitHub Action Recommendation

Jiangnan Huang and Bin Lin

Source Code Analysis and Manipulation (Best paper award) 2023

GitHub Actions was introduced in 2019 as an integrated solution for CI/CD to automate software development workflows. Since then, it has gained tremendous popularity among developers. In a GitHub Actions workflow, actions refer to custom applications for performing complex but frequently repeated tasks. Actions can typically be found in GitHub Marketplace or public GitHub repositories. Prior studies have already disclosed that developers often reuse actions to avoid duplicated work and improve productivity. However, it is not trivial for developers, especially novices, to figure out which action to reuse due to the large number of actions available and the limited search functionality GitHub Marketplace provides. To address this issue, we propose CIGAR (ContrastIve learning for GitHub Action Recommendation). Given the textual description of a task developers want to execute, CIGAR will recommend the most relevant actions. CIGAR exploits a pre-trained RoBERTa model to convert sequences of words into high-dimensional vector representations, and is fine-tuned through a contrastive learning objective. The performance of CIGAR was evaluated on a novel dataset curated based on prior research, and the results demonstrate that CIGAR can reliably recommend actions needed by developers and significantly outperforms the GitHub Marketplace search engine. Our study indicates the promise of employing contrastive learning for GitHub action recommendation. The promising performance achieved can potentially drive a wider adoption of GitHub Actions and facilitate the automation of software development workflows.
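
The contrastive objective and the ranking step described in the abstract above can be illustrated with a minimal sketch. It assumes an in-batch InfoNCE-style loss over cosine similarities, which is a common contrastive formulation but is not confirmed here to be the exact loss used by CIGAR. The function names, the temperature value, and the toy two-dimensional vectors (standing in for RoBERTa embeddings) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(task_vecs, action_vecs, temperature=0.05):
    """Mean in-batch contrastive loss (assumed InfoNCE form).

    task_vecs[i] is the embedding of a task description and
    action_vecs[i] the embedding of its matching action; every other
    action in the batch serves as a negative example.
    """
    total = 0.0
    for i, task in enumerate(task_vecs):
        logits = [cosine(task, act) / temperature for act in action_vecs]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]
    return total / len(task_vecs)


def recommend(task_vec, action_vecs, names, top_k=1):
    """Rank candidate actions by cosine similarity to the task embedding."""
    ranked = sorted(
        zip(names, action_vecs),
        key=lambda pair: cosine(task_vec, pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Toy usage with hypothetical action names and embeddings.
tasks = [[1.0, 0.0], [0.0, 1.0]]
actions = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce_loss(tasks, actions))
print(recommend([1.0, 0.1], actions, ["action-a", "action-b"]))
```

The loss is small when each task embedding sits closest to its own action and large when the pairs are mismatched, which is the signal fine-tuning would use to pull matching descriptions and actions together.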

2022

Algorithms
LTU Attacker for Membership Inference

Joseph Pedersen, Rafael Muñoz-Gómez, Jiangnan Huang, Haozhe Sun, Wei-Wei Tu, and Isabelle Guyon

Algorithms 2022

We address the problem of defending predictive models, such as machine learning classifiers (Defender models), against membership inference attacks, in both the black-box and white-box settings, when the trainer and the trained model are publicly released. The Defender aims at optimizing a dual objective: utility and privacy. Privacy is evaluated with the membership prediction error of a so-called “Leave-Two-Unlabeled” (LTU) Attacker, having access to all of the Defender and Reserved data, except for the membership label of one sample from each, giving the strongest possible attack scenario. We prove that, under certain conditions, even a “naïve” LTU Attacker can achieve lower bounds on privacy loss with simple attack strategies, leading to concrete necessary conditions to protect privacy, including: preventing over-fitting and adding some amount of randomness. This attack is straightforward to implement against any model trainer, and we demonstrate its performance against MemGuard. However, we also show that such a naïve LTU Attacker can fail to attack the privacy of models known to be vulnerable in the literature, demonstrating that knowledge must be complemented with strong attack strategies to turn the LTU Attacker into a powerful means of evaluating privacy. The LTU Attacker can incorporate any existing attack strategy to compute individual privacy scores for each training sample. Our experiments on the QMNIST, CIFAR-10, and Location-30 datasets validate our theoretical results and confirm the roles of over-fitting prevention and randomness in the algorithms to protect against privacy attacks.
@article{a15070254,
  author = {Pedersen, Joseph and Muñoz-Gómez, Rafael and Huang, Jiangnan and Sun, Haozhe and Tu, Wei-Wei and Guyon, Isabelle},
  title = {LTU Attacker for Membership Inference},
  journal = {Algorithms},
  volume = {15},
  year = {2022},
  number = {7},
  article-number = {254},
  url = {https://www.mdpi.com/1999-4893/15/7/254},
  issn = {1999-4893},
  doi = {10.3390/a15070254}
}
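
To make the LTU setting from the abstract above more concrete, here is a rough sketch of one simple pairwise attack. It assumes the attacker is handed one Defender (training) sample and one Reserved sample with their membership labels hidden, and guesses that the sample with the lower model loss is the training member. This loss-comparison rule, the function name, and the toy loss values are illustrative assumptions; the paper's own attack strategies and its exact privacy score are not reproduced here.

```python
def ltu_pairwise_attack(defender_losses, reserved_losses):
    """Evaluate a loss-comparison attack on (Defender, Reserved) pairs.

    For each pair, the hypothetical attacker guesses that the sample
    with the lower loss belongs to the Defender (training) set.
    Ties are counted as a coin flip (half a correct guess).
    Returns (attack accuracy, membership prediction error).
    """
    pairs = list(zip(defender_losses, reserved_losses))
    correct = 0.0
    for d_loss, r_loss in pairs:
        if d_loss < r_loss:
            correct += 1.0
        elif d_loss == r_loss:
            correct += 0.5
    accuracy = correct / len(pairs)
    return accuracy, 1.0 - accuracy


# Toy usage: an over-fitted model has near-zero loss on its training data.
overfit = ltu_pairwise_attack([0.0, 0.01, 0.02], [1.0, 0.9, 1.2])
# A model whose losses do not separate the two sets leaks nothing to this rule.
balanced = ltu_pairwise_attack([0.5, 0.5], [0.5, 0.5])
print(overfit, balanced)
```

In this toy setup the over-fitted model is attacked perfectly, while identical loss distributions leave the attacker at chance level, which is consistent with the abstract's point that preventing over-fitting is a necessary condition for privacy.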

2021

OLA2021
Comparing Local Search Initialization for K-Means and K-Medoids Clustering in a Planar Pareto Front, a Computational Study

Jiangnan Huang, Zixi Chen, and Nicolas Dupin

In Optimization and Learning 2021

Given N points in a planar Pareto front (2D PF), k-means and k-medoids clustering are solvable in O(N^3) time by dynamic programming algorithms. Standard local search approaches, PAM and Lloyd’s heuristics, are investigated in the 2D PF case to solve large instances faster. Specific initialization strategies related to 2D PF cases are implemented alongside the generic ones (Forgy’s, Hartigan’s, k-means++). After applying PAM and Lloyd’s local search iterations, the quality of the local minima is compared with optimal values. Numerical results are computed on generated instances, which were made public. This study highlights that local minima of poor quality exist for 2D PF cases. A parallel or multi-start heuristic using four initialization strategies improves accuracy by avoiding poor local optima. Perspectives remain open for improving local search heuristics for the specific 2D PF cases.
@inproceedings{10.1007/978-3-030-85672-4_2,
  author = {Huang, Jiangnan and Chen, Zixi and Dupin, Nicolas},
  title = {Comparing Local Search Initialization for K-Means and K-Medoids Clustering in a Planar Pareto Front, a Computational Study},
  booktitle = {Optimization and Learning},
  year = {2021},
  publisher = {Springer International Publishing},
  address = {Cham},
  pages = {14--28},
  isbn = {978-3-030-85672-4}
}
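
The local search setup described in the abstract above can be sketched as follows: Lloyd's heuristic for k-means, started from a Forgy initialization (k data points drawn at random as initial centers) and wrapped in a simple multi-start loop that keeps the best local minimum. This is a generic illustration only. It does not reproduce the 2D PF-specific initializations, the PAM variant, or the four-strategy combination studied in the paper, and the function names and toy instance are assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def lloyd(points, centers, max_iter=100):
    """Lloyd's heuristic: alternate assignment and centroid update."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to its cluster mean
        # (an empty cluster keeps its previous center).
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # Local minimum reached.
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


def forgy_init(points, k, rng):
    """Forgy initialization: pick k distinct data points as centers."""
    return rng.sample(points, k)


def multi_start(points, k, starts=10, seed=0):
    """Run Lloyd's heuristic from several Forgy starts; keep the best."""
    rng = random.Random(seed)
    runs = (lloyd(points, forgy_init(points, k, rng)) for _ in range(starts))
    return min(runs, key=lambda run: run[1])


# Toy planar Pareto front: x increases while y decreases.
front = [(0.0, 10.0), (1.0, 9.0), (9.0, 1.0), (10.0, 0.0)]
centers, cost = multi_start(front, k=2)
print(centers, cost)
```

On larger instances, different starts can end in local minima of very different quality, which is why the multi-start wrapper keeps only the lowest-cost run.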