IEEE ICAD 2026

2026 IEEE International Conference on AI and Data Analytics
(ICAD 2026)

June 11 – 12, 2026

Boston, Massachusetts

Virtual Session Abstracts and Final Papers

55 Quality-Driven Agentic Reasoning for LLM-Assisted Software Design: Questions-of-Thoughts (QoT) as a Time-Series Self-QA Chain presented by Yen-Ku Liu, Yun-Cheng Tsai (National Taiwan Normal University)

Recent advances in large language models (LLMs) have accelerated AI-assisted software development, yet practical deployment remains constrained by incomplete implementations, weak modularization, and inconsistent security practices. We introduce Questions-of-Thoughts (QoT), a quality-driven inference-time scaffold that turns a user goal into (i) an ordered sequence of engineering steps and (ii) stepwise self-questioning to verify constraints and reduce omission errors, while maintaining a lightweight reasoning record that stabilizes subsequent design decisions.
 
We evaluate QoT across three representative backend engineering domains: API Design, Data Communication, and File Systems. Each task requires multi-module decomposition and exposes standard failure modes in LLM-generated systems. To enable data-driven comparison, we score generated artifacts using an ISO/IEC-inspired quality rubric that measures Scalability, Completeness, Modularity, and Security. We report domain-wise gains as the change in total quality score, defined as the QoT score minus the NoQoT score. Results show capacity-dependent improvements: QoT yields substantial gains for larger models and more complex domains, while smaller models may exhibit trade-offs under tight context and planning budgets.
 

We release an open artifact with prompts, scoring guidelines, raw generations, and scripts that reproduce the reported tables and figures to support applied AI and data analytics research.
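
The gain metric described in this abstract (QoT score minus NoQoT score, per domain) can be sketched as follows. This is a minimal illustration that assumes one total rubric score per domain and condition; the function name `domain_gains` and all numbers are hypothetical placeholders, not results from the paper.

```python
def domain_gains(qot_scores, noqot_scores):
    """Return the gain per domain: QoT total score minus NoQoT total score."""
    mismatched = set(qot_scores) ^ set(noqot_scores)
    if mismatched:
        raise ValueError(f"domains missing from one condition: {sorted(mismatched)}")
    return {domain: qot_scores[domain] - noqot_scores[domain] for domain in qot_scores}


# Placeholder totals (hypothetical, for illustration only).
qot = {"API Design": 34, "Data Communication": 31, "File Systems": 29}
noqot = {"API Design": 30, "Data Communication": 32, "File Systems": 24}
gains = domain_gains(qot, noqot)
for domain, gain in gains.items():
    print(f"{domain}: {gain:+d}")
```

A negative gain corresponds to the trade-off case the abstract mentions for smaller models.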

184 Evaluating Player Roles in the NBA: Benchmarking Machine Learning Models Against a Rule-Based Rank Score System presented by J-neil Bagamasbad, John Paul Vergara (Ateneo de Manila University)

This study evaluates whether a transparent, rule-based classification framework can serve as a viable alternative to machine learning (ML) models for identifying NBA player roles. Accurate role classification is essential for roster construction, scouting, and strategic decision-making, yet commonly used advanced metrics such as Player Efficiency Rating (PER) and DARKO Daily Plus-Minus (DPM) are designed to estimate player value rather than explicitly distinguish functional roles. Using regular-season NBA data from 2020–2025, this work compares a rule-based Rank Score system against supervised ML classifiers—k-Nearest Neighbors, Decision Trees, and Naive Bayes—as well as PER- and DARKO-based role tiering. Player roles were defined as Starters, Role Players, or Benchwarmers using a deployment-based ground truth derived from games started and minutes played. Rank Score integrates multi-season box-score statistics with position- and minutes-aware thresholds and was evaluated in baseline, accuracy-adjusted, and F-score–adjusted configurations. Results show that while k-Nearest Neighbors achieved the highest overall agreement with ground truth labels (69.02 percent), adjusted Rank Score variants substantially improved classification balance, particularly for Role Players, narrowing the performance gap while preserving interpretability. In contrast, PER and DARKO exhibited limited effectiveness as standalone classifiers. These findings demonstrate that interpretable, rule-based systems can meaningfully support role classification when appropriately calibrated and highlight the value of combining transparent frameworks with higher-performing ML models in applied basketball analytics.

121 Recognizing Multiple Emotions in Bangla Paragraphs Using Multi-Label Classification presented by Sadia Islam, Md Minhajul Karim, Noortaz Rezoana (Premier University)

Multi-label sentiment analysis targets multiple emotions in a single text and provides richer signals than single-label approaches. Research in Bangla remains sparse at the paragraph level, where emotional states overlap and depend on cultural and contextual cues. This work addresses multi-label emotion recognition in Bangla paragraphs using a dataset of over one thousand texts annotated with five emotions. The task was approached with a range of models, including a CNN-LSTM architecture and several transformer-based systems such as BanglaBERT, mBERT, SBERT-MLP, and XLM-RoBERTa. Performance was measured using precision, recall, F1 score, and confusion matrices. SBERT-MLP reached an average F1 of 0.75 and mBERT achieved the highest validation accuracy at 75.49 percent. XLM-RoBERTa with LoRA adaptation delivered the strongest results, reaching a macro F1 of 0.7968, a micro F1 of 0.7967, and a micro accuracy of 79.15 percent.

183 A Robust Deep Learning Framework for Bangla License Plate Recognition Using YOLO and Vision–Language OCR presented by Nayeb Hasin, Md. Arafath Rahman Nishat, Mainul Islam, Khandakar Shakib Al Hasan, Asif Newaz (Islamic University of Technology, BUET)

An Automatic License Plate Recognition (ALPR) system constitutes a crucial element in an intelligent traffic management system. However, the detection of Bangla license plates remains challenging because of the complicated character scheme and uneven layouts. This paper presents a robust Bangla License Plate Recognition system that integrates a deep learning–based object detection model for license plate localization with Optical Character Recognition (OCR) for text extraction. Multiple object detection architectures, including U-Net and several YOLO (You Only Look Once) variants, are compared for license plate localization. A key contribution is a novel two-stage adaptive training strategy for YOLOv8, incorporating phase-aware augmentation and progressive layer unfreezing to improve robustness under real-world variations. Extensive experiments show that the proposed approach achieves 97.83% accuracy and 91.3% IoU, outperforming multiple YOLO variants. Additionally, a VisionEncoderDecoder-based OCR framework with BanglaBERT achieves superior character-level performance (CER 0.1323). The proposed system also shows consistent performance when tested on an external dataset curated for this study, highlighting its practical applicability. This dataset has completely different environmental and lighting conditions from the training sample. Overall, our proposed system provides a robust and reliable solution for Bangla license plate recognition and performs effectively across diverse real-world scenarios, including variations in lighting, noise, and plate styles. These strengths make it well suited for deployment in intelligent transportation applications such as automated law enforcement and access control.
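
The IoU figure quoted above measures the overlap between a predicted plate box and the annotated box. A minimal sketch of the standard computation follows; the `(x_min, y_min, x_max, y_max)` box layout and the function name `iou` are assumptions for illustration, not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max) in the same pixel coordinates.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero when the boxes are disjoint).
    overlap_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    overlap_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = overlap_w * overlap_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
annotated = (1, 1, 3, 3)
print(f"IoU = {iou(predicted, annotated):.4f}")
```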

156 Revify: An Agentic Opinion Mining System for Product Reviews presented by Soorya Sivaramakrishnan, Harshal Vhatkar, Siddhesh Shrawne, Shaily Goyal, Siddhartha Chandra (Sardar Patel Institute of Technology)

In the rapidly evolving e-commerce landscape, the sheer volume and variability of high-velocity customer feedback present a significant challenge for enterprise-level Voice of the Customer (VoC) analytics. This paper introduces Revify, an agentic opinion mining system designed to autonomously collect, structure, and analyze online product reviews to generate actionable Customer Experience (CX) intelligence. Leveraging Large Language Models (LLMs) within a Thought–Action–Observation (TAO) framework, the system coordinates specialized agents to perform granular, feature-wise sentiment analysis while maintaining strict evidence-based grounding. The architecture incorporates chunked processing to ensure scalability and artifact persistence to mitigate the strategic risks of model hallucination during business decision-making. We present the methodology and implementation of Revify, demonstrating its effectiveness in transforming unstructured textual data into prioritized design insights that empower product development teams to optimize the customer journey and streamline product iteration.

162 Validating Concept Neighborhood Proxies for Scalable Concept Mapping in Explainable AI presented by Sultaana Lawal, Bilkisu Muhammad-Bello (Nile University of Nigeria)

Concept-based explanations are important for building trust in machine learning, yet a key problem remains unaddressed. Most current methods rely on crowd-sourced labels, which are expensive, subjective, and prone to human bias. This paper introduces a formalized pipeline for “concept-borrowing”. The pipeline removes the need for humans to annotate concepts manually, or to be in the loop at all, and thereby offers a scalable way to generate human-driven concepts automatically. Two versions were developed and tested: a standard baseline pipeline and an optimized pipeline. They were evaluated on accuracy, speed, and memory usage. The optimized version demonstrated that automatic concept extraction is feasible even on mid-scale datasets, and its results were validated against a human-annotated subset. Selecting the right data representation and similarity metrics considerably improved alignment with human logic and yielded “neighborhood coherence”, even though matching human logic perfectly remains difficult. This research represents the first systematic validation of concept-borrowing pipelines. The results show that these systems can act as a reliable proxy for cognition without manual labeling. Finally, design choices are shown to matter in balancing interpretability and technical scalability, which paves the way for the use of explainable AI in more real-world systems.

180 Optimizing Lexicon Design for Micro-Resource Code-Mixed Disfluent Speech Recognition presented by Anuran Mitra (Jadavpur University); Tapabrata Mondal (Jadavpur University); Sivaji Bandyopadhyay (Jadavpur University)*

Automatic Speech Recognition (ASR) for low-resource, disfluent code-mixed languages presents significant challenges in micro-resource regimes (nearly one hour) where deep learning models often fail to generalize. This paper addresses the unexplored domain of disfluency-aware Bengali-English code-mixed ASR by establishing a reproducible Gaussian Mixture Model–Hidden Markov Model (GMM-HMM) baseline using the Kaldi toolkit. Unlike conventional systems that filter hesitations, our framework explicitly models disfluencies (e.g., filled pauses) to maintain semantic fidelity for downstream tasks like sentiment analysis. To address the complete absence of acoustic data, we curated a 1.3-hour purely synthetic corpus using a Large Language Model (LLM) and the Indic Parler Text-to-Speech (TTS) model. Experiments revealed that naive augmentation initially degraded Word Error Rate (WER) from 39.90% to 42.74% due to transliteration inconsistencies. However, introducing a principled phonetic normalization pipeline with optimized wide-beam decoding recovered performance, achieving a best-case WER of 37.74%. Crucially, a simple Triphone model trained on consistent data (WER 39.23%) significantly outperformed advanced Linear Discriminant Analysis – Maximum Likelihood Linear Transform (LDA-MLLT) models trained on noisy data (WER 42.74%). These results empirically confirm that in very low-resource scenarios, lexicon consistency and data quality are more critical determinants of ASR performance than model complexity.
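
The WER values above follow the usual definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a plain illustration of that definition, not the Kaldi scoring script; the function name and the sample transcript, in which the filled pause "um" is kept as an ordinary token, are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# The disfluency is scored like any other word, so dropping it counts as an error.
reference = "ami um office e jabo"
hypothesis = "ami office e jabo"
print(f"WER = {word_error_rate(reference, hypothesis):.2f}")
```

Because disfluencies are part of the reference, a recognizer that silently filters them is penalized, which matches the abstract's goal of modeling them explicitly.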

159 A Novel Multi-Stage Deep Architecture with SIFT-VGG Fusion Net for Muzzle based animal Identification presented by Pranshu Tiwari (AI Thinking Labs)*; Vanshika Mehlawat (BMU); Suyash Patel (North Carolina State University); Pratima K (BMU); Swapnadip Nandi (AI Thinking Labs)

Reliable cow identification through muzzle biometrics has emerged as a practical, non-intrusive alternative to RFID tags and manual record keeping. Earlier studies have focused primarily on the US market [1], leaving a gap in understanding performance under diverse field conditions. Real-world farms, however, present substantial illumination variability, which degrades the accuracy of both deep learning models and hand-crafted feature extractors when used in isolation and in a localized context. Our study verified that a generalized model is preferable to traditional models for improving validation accuracy on unseen data. This work also introduces a fusion-based cow muzzle recognition framework that leverages the complementary strengths of VGG16 deep convolutional features and SIFT with Bag of Visual Words descriptors to achieve illumination-robust identification. By integrating these two modalities through feature-level fusion, the proposed system captures both global texture patterns and fine-grained local keypoints of muzzle prints. Our evaluation demonstrates that the fused architecture achieves 99.5% accuracy, which is 2% higher than conventional models on illumination-diverse datasets. The results highlight the effectiveness of hybrid feature fusion for practical, field-deployable livestock biometric systems.

8 Computational modeling of Bispecific T-Cell Engager Targeting B7H3 (CD276) for Immunotherapy in Medulloblastoma presented by Monica Gude, Gaurav Sharma (Fort Mill High School, Eigen Sciences LLC)

Medulloblastoma is the most prevalent malignant brain tumor in children, with current treatments such as surgery, chemotherapy, and radiation often leading to severe neurodevelopmental side effects. To explore safer therapeutic alternatives, this study investigates the feasibility of Bispecific T-cell Engager (BiTE) immunotherapy targeting the B7H3 (CD276) receptor in medulloblastoma. Using an integrated computational pipeline involving UniProt, AlphaFold, P2Rank, HDOCK, PLIP, PRODIGY, and GROMACS, protein structures were modeled, docked, and dynamically simulated to assess molecular interactions between the T-cell receptor (TCR/CD3) and B7H3. Among the tested single-chain variable fragments (scFvs), 9B6T and 9EHL exhibited the strongest binding affinities and interaction stability, with binding energies reaching up to −15.0 kcal/mol. These results highlight the potential of BiTE constructs as targeted immunotherapies capable of selectively engaging T-cells to medulloblastoma cells while minimizing off-target toxicity. Finally, the AlphaFold-modeled BiTE molecule exhibited strong thermal and structural stability, making it a promising alternative to conventional antibodies. Future in vitro and in vivo validation is required to confirm the predicted efficacy and optimize BiTE stability for clinical translation.

37 Autonomous Bias Mitigation in Talent Acquisition: An Intelligent Redaction Framework for Fair Hiring presented by Sneh Lata (Rchilli Inc.)

Algorithmic bias in recruitment systems poses significant risks to workplace diversity, regulatory compliance, and organizational reputation. Research indicates that traditional resume screening processes exhibit measurable disparities across protected demographic categories, with callback rate differentials reaching 50% for equivalent qualifications. This paper introduces the Fair AI Intelligent Redaction (FAIR) framework, a multi-agent system designed for autonomous bias mitigation in enterprise talent acquisition platforms. FAIR implements intelligent redaction of 57 bias-inducing fields across seven protected categories while preserving job-relevant qualifications essential for merit-based evaluation. The framework incorporates four specialized agents: Bias Detection Agent for identifying protected attributes, Classification Agent for categorizing field types, Intelligent Redaction Agent for context-aware removal, and Fairness Validation Agent for compliance verification. Experimental evaluation across 180,000+ resumes demonstrates significant improvements: 94% demographic parity score (from 62% baseline), 96.8% average field redaction accuracy, and 73% reduction in disparate impact indicators. The framework supports 40+ languages and maintains compliance with EEOC guidelines, GDPR, NYC Local Law 144, and emerging EU AI Act requirements. These results establish FAIR as a viable approach for achieving algorithmic fairness in high-stakes employment decisions while maintaining hiring efficiency.
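
The abstract does not define its "demographic parity score" or "disparate impact indicators". As a hedged illustration only, the sketch below computes two commonly used quantities: per-group selection rates and the disparate impact ratio (lowest rate divided by highest rate), which is often compared against the four-fifths rule of thumb associated with EEOC guidance. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def selection_rates(records):
    """records: iterable of (group, selected) pairs -> {group: selection rate}."""
    totals = defaultdict(int)
    selected_counts = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        selected_counts[group] += int(bool(selected))
    return {group: selected_counts[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical screening outcomes: 5 of 10 selected in group A, 3 of 10 in group B.
records = [("A", i < 5) for i in range(10)] + [("B", i < 3) for i in range(10)]
rates = selection_rates(records)
ratio = disparate_impact_ratio(rates)
print(rates, f"ratio = {ratio:.2f}", "below four-fifths" if ratio < 0.8 else "ok")
```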

103 Access Control Mechanisms for Agentic AI in Multi-tenant Cloud Environments presented by Arun Ganapathi, Dishant Banga (Oracle)

Agentic AI systems are being integrated rapidly into multi-tenant cloud platforms, yet existing Identity and Access Management (IAM) frameworks were designed for human users with static identities and cannot adequately handle autonomous agents that shift personas, delegate tasks recursively, and operate across tenant boundaries. This paper proposes a persona-aware, attribute-based access control (ABAC) architecture that extends OAuth 2.1 token semantics with verifiable persona, tenant, and delegation-chain claims to enable fine-grained, context-sensitive authorization for AI agents. We compare this approach against a realistic IAM baseline (RBAC augmented with conditional policies, session attributes, and short-lived credentials) and report results from a controlled prototype evaluated in a simulated multi-tenant enterprise environment. The prototype eliminates cross-tenant data exposure and reduces policy enforcement errors, with a modest token-validation overhead of 2 ms. We identify remaining research gaps in behavioral intent modeling and legacy-system integration, and outline a roadmap for production validation.
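
A persona-aware ABAC decision over persona, tenant, and delegation-chain claims might look roughly like the sketch below. Everything here is an assumption made for illustration: the claim names, the `AgentClaims` and `Resource` types, the depth limit, and the rule order are hypothetical, and token signature verification is deliberately left out. It is not the authors' prototype.

```python
from dataclasses import dataclass

MAX_DELEGATION_DEPTH = 3  # hypothetical policy limit


@dataclass(frozen=True)
class AgentClaims:
    agent_id: str
    tenant: str
    persona: str
    # (agent_id, tenant) pairs, root delegator first; assumed already verified.
    delegation_chain: tuple = ()


@dataclass(frozen=True)
class Resource:
    tenant: str
    allowed_personas: frozenset


def authorize(claims, resource, max_depth=MAX_DELEGATION_DEPTH):
    """Return (allowed, reason) for an agent request against a resource."""
    if claims.tenant != resource.tenant:
        return False, "cross-tenant access"
    if claims.persona not in resource.allowed_personas:
        return False, "persona not permitted"
    if len(claims.delegation_chain) > max_depth:
        return False, "delegation chain too deep"
    if any(tenant != resource.tenant for _, tenant in claims.delegation_chain):
        return False, "delegation crosses tenant boundary"
    return True, "ok"


payroll = Resource(tenant="acme", allowed_personas=frozenset({"payroll-analyst"}))
agent = AgentClaims("agent-7", "acme", "payroll-analyst", (("agent-1", "acme"),))
print(authorize(agent, payroll))
```

Checking the tenant first means a cross-tenant request is rejected regardless of persona or delegation state.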

124 Explainable Customer Churn Prediction with Gradient Boosting, SHAP Insights, and Business-Aware Thresholding presented by Trishita Dhara, Siddhesh Sheth, Aishwarya Budhkar (Upper Hand, Ace Rent a Car, Indiana University)

Customer churn prediction is a critical task for subscription-based businesses, where predictive models must support actionable and economically meaningful retention decisions. While ensemble models such as gradient boosting achieve strong performance on tabular customer data, their limited transparency and reliance on fixed decision thresholds hinder real-world deployment. 
 
This paper presents an applied churn prediction framework that integrates gradient boosting with SHAP-based explanations and decision-aware threshold selection. Using the IBM Telco Customer Churn dataset, we demonstrate how calibrated probability estimates and interpretable feature attributions can be translated into operational retention decisions under asymmetric business costs. Experimental results show that the proposed approach maintains strong predictive performance while improving expected business outcomes compared to naïve thresholding strategies. 
 

The framework emphasizes deployability and transparency, illustrating best practices for explainable, decision-oriented AI in enterprise customer analytics. 
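
Decision-aware threshold selection under asymmetric costs can be sketched as a search for the cutoff that minimizes total misclassification cost. This is a minimal illustration, assuming a fixed cost per false positive (an unnecessary retention offer) and per false negative (a missed churner); the function names, cost values, and toy probabilities are hypothetical and unrelated to the paper's experiments.

```python
def total_cost(y_true, probabilities, threshold, cost_fp, cost_fn):
    """Sum of misclassification costs when flagging customers at `threshold`."""
    cost = 0.0
    for churned, probability in zip(y_true, probabilities):
        flagged = probability >= threshold
        if flagged and not churned:
            cost += cost_fp  # retention offer sent to a customer who would stay
        elif not flagged and churned:
            cost += cost_fn  # churner missed, revenue lost
    return cost


def best_threshold(y_true, probabilities, cost_fp, cost_fn, candidates=None):
    """Return the candidate threshold with the lowest total cost (first on ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return min(
        candidates,
        key=lambda t: total_cost(y_true, probabilities, t, cost_fp, cost_fn),
    )


y_true = [1, 0, 1, 0, 0, 1]
probabilities = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]
# A missed churner is assumed to cost five times an unnecessary offer.
print(best_threshold(y_true, probabilities, cost_fp=1, cost_fn=5, candidates=[0.3, 0.5, 0.7]))
```

When false negatives are the expensive error the chosen cutoff moves down, and it moves up when false positives dominate. A fixed 0.5 threshold ignores this.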

146 AI-Powered Merchant Named Entity Recognition in Financial Transactions: A Production-Scale Transformer System presented by Rama Mohan Reddy Pilli, Joe Koch, Chris Wright (Kard Financial)

Extracting structured merchant information from abbreviated financial transaction descriptions is a critical challenge for AI-powered financial services. This paper demonstrates a production-scale transformer-based NER system processing over 20 million daily transactions at 3,821 transactions per second, deployed at Kard Financial, Inc. for 8 months across over 60 financial institutions with consistent cross-institution performance. We extract merchant names, locations, payment facilitators, and identifiers from inconsistent transaction strings to enable fraud detection, transaction categorization, and personalized recommendations. The system achieves 21.8% F1 improvement over hand-engineered rule-based baselines while reducing engineering effort by 93% (8 hours annotation vs 120 hours rule engineering), and has maintained stable performance over 8 months serving real-world financial applications. Our deployment demonstrates transformers generalize across diverse institutions without per-institution fine-tuning, processing volumes that grew 25–30% with <1% F1 drift through automated monitoring and retraining. Key results: (1) domain pretraining on 5 million transactions yields 3.6% F1 gain; (2) confidence calibration achieves 0.957 precision at 0.85 threshold; (3) lightweight DistilBERT matches larger models with 2.6× faster inference; (4) MLOps automation enables stable long-term operation.

Index Terms: AI Applications, Financial Technology, Production AI Systems, Transformer Models, Named Entity Recognition, Domain Adaptation, MLOps, High-Throughput AI, Scalable AI Systems

196 A Risk-Based Governance Framework for Higher Education Analytics presented by Poorva Patil (Old Dominion University)*; Neha Niphadkar (Old Dominion University); Bhargav Narayanavaram (Old Dominion University)

Artificial intelligence (AI) is increasingly embedded in higher education analytics to support functions such as enrollment forecasting, student success monitoring, and institutional planning. Although often framed as decision-support tools, these systems can meaningfully influence high-stakes decisions affecting students and institutions, raising concerns related to fairness, transparency, accountability, and trust. While Responsible AI principles are widely articulated, higher education institutions frequently struggle to operationalize these principles through enforceable governance mechanisms. Existing approaches are often fragmented, overly generic, and disconnected from the AI lifecycle, resulting in inconsistent oversight and unclear accountability.
 
This paper proposes a risk-based governance framework for responsible AI deployment in higher education analytics environments. The framework integrates three core dimensions: (1) risk classification of AI use cases based on decision influence and potential harm; (2) lifecycle-embedded governance controls spanning design, development, deployment, and monitoring; and (3) clearly defined accountability roles and decision rights.
 

By aligning governance rigor proportionally with assessed risk, the framework avoids both under-governance of high-impact systems and excessive oversight of low-risk analytics. An illustrative student success analytics use case demonstrates how the framework operationalizes Responsible AI principles through concrete controls, review checkpoints, and governance artifacts. The proposed approach reframes AI governance as an integrated, enforceable system embedded within institutional decision-making rather than a static policy layer.

203 Policy-Aligned Autonomous Data Pipelines in Higher Education: A Governance-First Approach to Self-Healing Systems presented by Neha Suhas Niphadkar, Poorva Patil, Bhargav Narayanavaram (Old Dominion University)

The higher education data landscape is undergoing rapid transformation, driven by increasing demand for timely, institution-wide analytics and the growing diversity of data platforms. While institutions historically relied on centralized enterprise systems, modern environments now span multiple operational, instructional, and analytical tools, intensifying challenges in data integration, consistency, and governance.
 
In this context, institutions increasingly depend on complex data pipelines to support analytics and governance-driven decision making. These pipelines are often disrupted by schema drift, such as the introduction of new academic entities that violate legacy constraints, resulting in manual remediation processes that are time-consuming, error-prone, and difficult to audit. While recent approaches emphasize machine learning for pipeline automation, such methods introduce challenges in regulated environments where explainability, determinism, and policy compliance are critical.
 
This paper presents a cloud-native, metadata-driven pipeline that enables autonomous drift detection and remediation using deterministic symbolic logic. Through automated metadata inspection and reference table validation orchestrated via stored procedures, the system safely evolves schema definitions while maintaining a fully auditable change record. In a controlled deployment, the approach reduced remediation time from days to seconds while preserving governance compliance.
 

We demonstrate that this design generalizes across multiple classes of schema drift and enables “silent reliability,” where data availability and institutional trust are preserved without operational disruption. These findings suggest that autonomy in data pipelines can be achieved through system design rather than probabilistic models, offering a practical and explainable path toward self-healing analytics infrastructure in regulated institutional settings.
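
The deterministic drift handling described above can be illustrated with a small in-memory sketch: compare incoming column metadata against the known schema, apply a fixed rule, and record every decision. The paper's system uses stored procedures and reference tables in a cloud platform; the Python functions, the `ALLOWED_TYPES` policy, and the column names below are hypothetical stand-ins chosen only to show the control flow.

```python
from datetime import datetime, timezone

ALLOWED_TYPES = ("TEXT", "INTEGER", "DATE")  # hypothetical governance policy


def detect_drift(schema, incoming):
    """Return (added, missing) column names relative to the known schema."""
    known, seen = set(schema), set(incoming)
    return sorted(seen - known), sorted(known - seen)


def remediate(schema, incoming, audit_log):
    """Evolve `schema` in place using fixed rules and log every decision."""
    added, missing = detect_drift(schema, incoming)
    for column in added:
        column_type = incoming[column]
        action = "added" if column_type in ALLOWED_TYPES else "rejected"
        if action == "added":
            schema[column] = column_type
        audit_log.append({
            "column": column,
            "type": column_type,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        })
    for column in missing:
        # Missing columns are only flagged; nothing is dropped automatically.
        audit_log.append({
            "column": column,
            "type": schema[column],
            "action": "flagged_missing",
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return schema


schema = {"student_id": "INTEGER", "program": "TEXT"}
incoming = {"student_id": "INTEGER", "program": "TEXT", "micro_credential": "TEXT"}
log = []
remediate(schema, incoming, log)
print(schema, [entry["action"] for entry in log])
```

Because the rules are fixed and every decision is logged, the same input always yields the same schema change and the same audit trail.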

149 AI-Based Discovery of Silent Failures in Financial Software: Representation Learning for Semantic Data Integrity in CI Pipelines presented by Tetiana Afanasieva (Koyfin)*

Financial software frequently exhibits silent failures: defects that preserve schema validity and do not crash services, yet silently distort the economic meaning of outputs such as historical price/NAV charts, ratios, or performance series. These failures evade traditional test oracles because computed responses remain numerically plausible and API contracts remain intact. We present a practical system for AI-based discovery of silent failures in financial data processing and visualization pipelines. The approach formulates detection as a representation learning problem over time-series segments and financial invariants (e.g., split/dividend-adjusted continuity, return bounds, volatility regimes, and cross-field consistency). A self-supervised encoder learns a latent behavioral embedding of “financially plausible” segments; silent failures are flagged via a composite score combining embedding distance, reconstruction residual, and invariant-violation attribution. We implement the system as a CI-integrated quality gate that augments existing end-to-end tests without requiring ground-truth labels. In controlled experiments using a reproducible generator and an injection suite of 12 silent-failure modes, the proposed method improves F1 by 19–31 points over rule-based baselines at comparable false-positive budgets and reduces mean time-to-triage by 3.4× via localized explanations. We discuss deployment considerations for regulated environments, including determinism, auditability, and explainability.
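
One of the invariants named above, a return bound, and the idea of a composite score can be sketched as follows. The abstract does not give the bound, the weights, or the combination rule, so the 50% bound, the weighted sum, and the weights below are hypothetical. The encoder outputs are passed in as plain numbers because the learned model itself is out of scope here.

```python
def return_bound_violations(prices, max_abs_return=0.5):
    """Indices where the simple return between consecutive prices exceeds the bound."""
    return [
        i
        for i in range(1, len(prices))
        if prices[i - 1] > 0 and abs(prices[i] / prices[i - 1] - 1) > max_abs_return
    ]


def composite_score(embedding_distance, reconstruction_residual, n_violations,
                    weights=(0.4, 0.4, 0.2)):
    """Weighted sum of the three signals; the violation term is a 0/1 indicator."""
    w_embed, w_recon, w_invariant = weights
    return (
        w_embed * embedding_distance
        + w_recon * reconstruction_residual
        + w_invariant * (1.0 if n_violations else 0.0)
    )


# An unadjusted 4-for-1 split looks like a roughly 75% one-day loss.
prices = [100.0, 102.0, 25.0, 25.5]
violations = return_bound_violations(prices)
score = composite_score(0.5, 0.25, len(violations))
print(violations, round(score, 3))
```

The series stays schema-valid and numerically plausible point by point. Only the invariant exposes the jump, which is the kind of failure the abstract calls silent.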

239 Understanding Autonomous Public Safety Drone Operations through Transparent and Interpretable Visual Analytics presented by Swarnamouli Majumda, Anjali Awasthi (Concordia University, Zenext AI)

Autonomous drones are rapidly becoming integral to public safety operations, yet their deployment practices and patterns remain largely opaque to the public. This study presents a visual analytics framework for enhancing transparency and interpretability in police drone activity using open operational data. Drawing on publicly released San Francisco Police Department flight logs, the research demonstrates how spatial–temporal visualization and interpretable modeling can transform raw flight data into accessible, evidence-based insights about drone usage in urban environments. By integrating visual analytics with explainable modeling, the framework enables transparent assessment of deployment behavior and supports informed oversight of technology-assisted policing. The study contributes a reproducible, data-driven approach for examining autonomous systems in public safety contexts, emphasizing openness, interpretability, and accountability as foundations for trustworthy drone governance.

247 Machine Learning-based Prediction of Ciprofloxacin Resistance in Escherichia coli Using Antimicrobial Susceptibility Testing Metadata presented by Huanran Yu (Jordan HS)

Antimicrobial resistance (AMR) in Escherichia coli threatens the effectiveness of first-line therapies, yet many machine learning approaches for resistance prediction rely on whole-genome sequencing, which limits clinical adoption. This work presents a metadata-driven pipeline for predicting ciprofloxacin resistance in Escherichia coli using a large BV-BRC-derived AST dataset and explicitly quantifies how far routinely collected AST metadata alone can match the performance typically attributed to genomics-based models. Each record represents an isolate–antibiotic pair and includes the antibiotic name, categorical resistant phenotype, MIC measurements, and detailed laboratory metadata such as typing method, platform, and testing standard. After restricting labels to “Resistant” and “Susceptible” and selecting ciprofloxacin, 7,629 isolates remain (25% resistant), forming a realistic, moderately imbalanced binary classification task. Features are constructed from MIC values and laboratory descriptors via median imputation for numeric fields and one-hot encoding for categorical fields, yielding a sparse design matrix suitable for linear and tree-based models. A class-weighted logistic regression and a class-weighted random forest are trained and evaluated with stratified 5-fold cross-validation and a held-out test set, using accuracy, precision, recall, F1 score, and ROC–AUC as metrics. The best model, the random forest, achieves approximately 90% accuracy, 0.75 F1 for the resistant class, and 0.84 ROC–AUC on the held-out test cohort, substantially outperforming a majority-class and a shallow decision-tree baseline. These results demonstrate that high-quality ciprofloxacin resistance prediction is achievable from routinely collected AST metadata alone, providing a simpler and more deployable alternative to genomics-heavy AMR prediction pipelines.
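
The feature construction described above (median imputation for numeric fields, one-hot encoding for categorical fields) and class weighting can be sketched in plain Python. The abstract does not say which weighting scheme was used; the sketch assumes the common "balanced" heuristic, n_samples / (n_classes * class_count). Function names and sample values are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per sorted category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes = sorted(set(labels))
    return {c: len(labels) / (len(classes) * labels.count(c)) for c in classes}


mic = impute_median([0.5, None, 2.0, 8.0])
standards, encoded = one_hot(["CLSI", "EUCAST", "CLSI", "CLSI"])
weights = balanced_class_weights(["Susceptible", "Susceptible", "Susceptible", "Resistant"])
print(mic, standards, encoded, weights)
```

With roughly one resistant isolate in four, this weighting makes errors on the resistant class count about three times as much as errors on the susceptible class.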

229 AI-Based Detection of Financially Incorrect States in Visually Stable Web Dashboards presented by Tetiana Afanasieva (Koyfin)*

Web-based financial dashboards increasingly serve as decision-critical interfaces for investors, analysts, and portfolio managers [4]. Despite visually correct rendering and successful execution of automated end-to-end UI tests, dashboards may silently present financially incorrect states due to data staleness, multi-provider inconsistencies, aggregation window misalignment, or asynchronous update anomalies [1]. Existing UI testing frameworks such as Cypress validate structural and visual correctness but lack semantic awareness of financial invariants and temporal consistency constraints [8]. This paper presents a reproducible AI-based framework for detecting financially incorrect states in visually stable dashboards. The approach leverages execution traces collected during Cypress test runs, including network telemetry, DOM mutation dynamics, and domain-specific financial features, and applies unsupervised anomaly detection to identify deviations from learned normal behavior. Extensive experiments using public financial data sources demonstrate that the proposed method significantly outperforms assertion-only and rule-based baselines, achieving up to 0.82 F1 score while detecting failure modes invisible to conventional UI testing. The framework is designed for integration into continuous integration pipelines and is fully reproducible.

186 Agent-to-Agent-MCP Architecture for Intelligent Enterprise Payroll Management presented by John Selvaraj Arulappan (ADP Celergo)*; Velu Natarajan (GoodRx); Santosh Vasudevan (Caterpillar)

Traditional enterprise payroll systems suffer from monolithic architectures that tightly couple business functions, creating significant challenges in scalability, maintainability, and extensibility. Users must navigate complex technical interfaces, and adding new capabilities requires invasive code modifications. This paper presents an A2A-MCP multi-agent architecture for enterprise payroll management systems utilizing Google’s Agent-to-Agent (A2A) protocol combined with Large Language Model (LLM) driven intelligent routing and Model Context Protocol (MCP) integration. The proposed system addresses these critical limitations by implementing a modular, scalable agent-based architecture where specialized agents handle distinct payroll functions including payment processing, employee management, payslip generation, and reporting. Beyond query-response interactions, our system supports full CRUD (Create, Read, Update, Delete) operations and autonomous task triggering, enabling agents to execute business actions rather than merely providing information. Our approach introduces three key innovations: (1) an LLM-powered routing mechanism that analyzes natural language commands and directs them to appropriate specialized agents with confidence scoring, (2) dual LLM integration where both the orchestrator and individual agents leverage language models for intent understanding and response generation respectively, and (3) MCP server integration enabling standardized tool access for database operations including transactional writes. The architecture demonstrates significant advantages in modularity, extensibility, and user experience through natural language interaction, while enabling plug-and-play agent deployment without system modifications.

174 PromptOps: An End-to-End Architecture for Prompt Engineering Lifecycle Management in Large Language Model Applications presented by Kumar Kasimala (Salesforce Inc)*; Ashok Kumar (Independent Researcher)

Prompt engineering has become essential for controlling the behavior, reliability, and safety of large language model (LLM) applications. In most real-world deployments, however, prompts are created and adapted informally, without systematic lifecycle management, traceability, or governance. This paper introduces an end-to-end Prompt Engineering Lifecycle Management (PELM) architecture that formalizes prompt design, assessment, optimization, deployment, monitoring, and governance into a single framework with formalized feedback. Prompts are treated as lifecycle-managed engineering resources that can be versioned, evaluated automatically, monitored for drift, and optimized automatically without changing model parameters. Experiments on reasoning, summarization, and domain-specific generation tasks compare conventional prompting with lifecycle-managed prompts. Across these tasks, lifecycle-managed prompts show more stable performance, less prompt drift, greater resistance to semantic perturbation, and fewer governance violations. The findings indicate that organized lifecycle management enables consistent, trusted, and auditable prompt behavior, making prompt engineering a structured engineering procedure suitable for scalable, compliance-sensitive LLM deployments.

152: Agentic CDNs: A Multi-Agent Architecture for Edge-Native AI Inference and Control presented by Venkata Gopi Kolla (Salesforce Inc)*; Chintan Tank (Salesforce Inc); Luc Giavelli (Salesforce Inc)

This paper presents a multi-agent architecture that converts Content Delivery Networks (CDNs) into autonomous edge computing systems for distributed artificial intelligence inference and real-time control. The proposed hierarchical framework uses large language model-based decision-making, with specialized agents deployed on individual edge nodes. Coordination between agents is made secure and verifiable through a permissioned blockchain, which guarantees trustworthy execution. Privacy-preserving schemes, such as secure multi-party computation and differential privacy, keep sensitive information at the edge while still supporting collaborative aggregation. Experimental results show a 41.2% decrease in inference latency, a 34.8% increase in cache hit rate, service availability of 99.3%, coordination efficiency of 96%, and a 62% decrease in backbone bandwidth usage. These findings show that Agentic CDNs offer a scalable, secure, and intelligent framework for edge services with strong data sovereignty assurances.

101: Pattern recognition and prediction with a hidden Markov-linear regression model presented by Ekaterina Vedennikova (University of Latvia)*; Dmitry Gromov (University of Latvia)

While time series modeling and forecasting are well-developed with a wide range of statistical and machine learning-based tools, they often struggle with challenges such as unknown underlying time series structures, non-uniform time evolution of events, the need for long-term forecasting where pattern recognition is prioritized over point-in-time precision, as well as the need to train models using multiple time series simultaneously. The standard methods are typically applied to single time series, making them computationally expensive for large datasets and unreliable for short time series.
 
To address these challenges, a novel approach combining Hidden Markov Models (HMM) and linear regression is proposed. This architecture utilizes Dynamic Time Warping (DTW) for clustering multiple time series, allowing a single model to be fitted for a cluster instead of an individual time series. The underlying pattern is modeled using an HMM, which in turn determines the coefficients of a hidden state-specific linear regression model used for value prediction. 
 
We compare the forecasting accuracy of HMM-AR(4) against the established ARIMA model. Numerical results demonstrate that the HMM-AR(4) model significantly outperforms ARIMA models in predictive capability. The improvement over ARIMA becomes even more pronounced for longer time series.
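
The clustering step above relies on Dynamic Time Warping. As a minimal sketch of the textbook DTW distance (not the authors' code), the function below aligns two series of possibly different lengths using an absolute-difference local cost. The function name and the choice of local cost are assumptions for illustration.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook O(len(a) * len(b)) DTW with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both series
            )
    return cost[n][m]


# A series and a time-stretched copy of it align at zero cost.
stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
shifted = dtw_distance([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Because the warping path may repeat points, series that follow the same pattern at different speeds end up close together, which is what makes the distance useful for grouping series before fitting one model per cluster.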
 
243: An Agentic AI Framework for Severity-Driven Fire Risk Mitigation presented by Swarnamouli Majumdar (Concordia University)*
Sustainable public safety systems must increasingly balance risk mitigation effectiveness with constrained resources, operational capacity, and long-term societal impact. In fire risk management, however, success is still commonly evaluated using incident frequency, despite strong empirical evidence that reductions in fire occurrence do not consistently translate into proportional declines in fatalities, injuries, or economic loss. This disconnect undermines the sustainability of prevention strategies by obscuring where limited resources yield the greatest reduction in harm. This paper introduces a severity-aware, agentic AI framework for sustainable fire risk mitigation that explicitly prioritizes outcome-driven impact over incident suppression. Using longitudinal U.S. fire loss data from 2002 to 2011, we empirically demonstrate a weak coupling between fire frequency and fire severity, with a small number of high-impact events accounting for a disproportionate share of societal harm. To operationalize severity as a decision variable, we formalize a Fire Severity Index (FSI) that integrates fatalities, injuries, and inflation-adjusted economic losses into a unified, policy-controllable metric. Building on this formulation, we propose a Strategic Fire Risk Agent (SFRA) that combines agentic reasoning, reinforcement learning, and constrained optimization to select mitigation strategies under budgetary and operational limits. The framework is further extended through UAV-assisted sensing, which provides adaptive situational awareness to support pre-incident assessment and closed-loop severity reduction. By aligning AI-driven decision-making with consequence-focused risk governance, the proposed framework advances sustainable and resilient public safety systems.
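
The abstract does not give the functional form of the Fire Severity Index. One simple form consistent with the description, shown here purely as an assumption, is a weighted sum of normalized components, where the weights $w_F$, $w_I$, $w_E$ are hypothetical policy-set parameters and the tildes denote normalization to a common scale:

```latex
\mathrm{FSI}_t = w_F\,\tilde{F}_t + w_I\,\tilde{I}_t + w_E\,\tilde{E}_t,
\qquad w_F + w_I + w_E = 1, \quad w_F, w_I, w_E \ge 0
```

Here $\tilde{F}_t$, $\tilde{I}_t$, and $\tilde{E}_t$ stand for fatalities, injuries, and inflation-adjusted economic loss in year $t$. The authors' actual definition may differ.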

230 Machine Learning for Pakistan Stock Exchange (PSX) KSE-100 Index Prediction presented by Vijay Kumar (Brown University)*; Mohsin Raza (Sukkur IBA University); Pooja Hargun (Virtual University of Pakistan)


Accurate forecasting of stock index movements in emerging markets such as the Pakistan Stock Exchange (PSX) is crucial for risk management, asset allocation, and algorithmic trading, yet remains challenging due to high volatility and nonlinear dynamics in price series. This study proposes a deep learning framework based on a bidirectional Long Short-Term Memory (BiLSTM) and Gated Recurrent Unit (GRU) architecture for predicting the daily closing price of the PSX KSE-100 index. To assess the effectiveness of the proposed deep learning approach, its performance is compared with a classical machine learning baseline using Support Vector Regression (SVR). Using a 20-year KSE-100 historical dataset consisting of open, high, low, close, and volume (OHLCV) data, the framework constructs a multivariate feature space including technical indicators such as moving averages, rolling volatility, momentum, Relative Strength Index (RSI), Moving Average Convergence Divergence (MACD), Bollinger Bands, and lagged closing-price features. The deep learning branch employs a BiLSTM–GRU architecture trained on normalized multivariate sequences with a 60-day lookback window to capture temporal dependencies and nonlinear patterns in the KSE-100 time series. Experimental results on an 80/20 time-ordered train–test split show that the BiLSTM–GRU model achieves a mean absolute error (MAE) of 0.0233, a root mean squared error (RMSE) of 0.035, and a coefficient of determination R² of 0.97 on the test set, while the SVR model attains an MAE of 0.060, an RMSE of 0.069, and an R² of 0.89. Note that these metrics are reported on the normalized price scale and reflect the model’s ability to track index-level movements, a task distinct from return prediction. The high R² is consistent with the autocorrelated nature of price series and should not be interpreted as evidence of excess return predictability.
The findings highlight that deep recurrent architectures, when augmented with engineered technical indicators, can enhance prediction accuracy for PSX KSE-100 index levels. The proposed framework demonstrates practical applicability for algorithmic trading, portfolio management, and risk assessment in the Pakistani equity market, and contributes to a reproducible end-to-end prediction pipeline tailored to this market.
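
The lookback window and the time-ordered split described above can be sketched as follows. This is an illustrative simplification, not the authors' pipeline: it uses a single univariate series, a toy lookback of 3 instead of 60, and hypothetical helper names (`make_windows`, `time_ordered_split`).

```python
from typing import List, Sequence, Tuple


def make_windows(
    values: Sequence[float], lookback: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `lookback` consecutive values with the value after it."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    for end in range(lookback, len(values)):
        inputs.append(list(values[end - lookback:end]))
        targets.append(values[end])
    return inputs, targets


def time_ordered_split(items: Sequence, train_fraction: float = 0.8):
    """Split without shuffling so every test item is later than every train item."""
    cut = int(len(items) * train_fraction)
    return list(items[:cut]), list(items[cut:])


prices = [float(p) for p in range(100, 110)]  # toy stand-in for closing prices
x, y = make_windows(prices, lookback=3)
x_train, x_test = time_ordered_split(x)
y_train, y_test = time_ordered_split(y)
```

Keeping the split in time order avoids training on observations that come after the test period. For the same reason, any normalization would normally be fitted on the training portion only.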

 

178 Real-Time Retrieval-Augmented Meeting Intelligence: Knowledge-Enabled Assistance for Sales and Customer Success presented by Krishna Kishore Pilla (Adobe)*

Sales representatives and customer success agents frequently lack real-time access to organizational knowledge during customer conversations. When a prospect asks about product roadmap features or an enterprise customer reports a critical bug, agents must either defer to follow-up communications or provide incomplete answers, resulting in lost deals and missed service level agreements. We present Clueless, a real-time retrieval-augmented conversational intelligence system that surfaces relevant knowledge during live conversations. Our system introduces three key contributions: (1) a streaming RAG pipeline that extracts queries from live transcripts and retrieves contextual answers from organizational knowledge bases with sub-second latency; (2) real-time engagement tracking that monitors customer sentiment and triggers pivot recommendations when disinterest is detected; and (3) template-driven talking points with live coverage tracking to guide conversations toward successful outcomes. We implement Clueless as a macOS desktop application using a parallel dual-agent architecture for simultaneous transcription and analysis. Evaluation demonstrates knowledge retrieval accuracy of 87% for sales scenarios and 82% for customer support, with engagement prediction achieving 86% accuracy. The system enables sales representatives to answer product questions in real time and customer success agents to surface troubleshooting steps during calls, reducing average call resolution time and improving customer satisfaction metrics.

38: TraceX: Central Finite-Difference Explainability for AI-Based Financial Credit Evaluation presented by Memoona Aziz (Western University Ontario); Muhammad Umair Danish (Western University Ontario)*; Katarina Grolinger (Western University Ontario); Umair Rehman (Western University Ontario)

Financial credit evaluation requires accuracy and interpretability due to its high-stakes impact on individuals and institutions. While deep learning (DL) models excel in predictive performance, they lack transparency. Existing explainability methods, such as SHAP and LIME, provide feature attributions but are computationally expensive, unstable, and limited to input-output mappings. This paper introduces TraceX, a deterministic and gradient-free method for generating output-level and layer-wise feature attributions using finite-difference perturbations. TraceX eliminates the need for background data or surrogate models, making it highly scalable and stable for financial datasets. We evaluate TraceX across MLP, CNN, Transformer, and Autoencoder models on a loan approval dataset, comparing it with SHAP and LIME using fidelity, sensitivity, and sparsity metrics. Experimental results show that TraceX achieves the lowest infidelity scores and fastest computation times while presenting sub-layer interpretability. These contributions make TraceX suitable for regulated financial environments. Future directions include adapting TraceX to temporal data and broader domains.
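
As a rough illustration of the finite-difference idea named in the title, the sketch below estimates the sensitivity of a model output to each input feature with central differences, using no gradients, background data, or surrogate model. It is not the TraceX implementation: the function name, the step size `eps`, and the toy scoring function are assumptions for illustration.

```python
from typing import Callable, List, Sequence


def central_difference_attribution(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Estimate d(model)/d(x_i) per feature as (f(x+e_i) - f(x-e_i)) / (2*eps).

    Hypothetical helper for illustration; `eps` is an assumed step size.
    """
    attributions: List[float] = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += eps   # perturb feature i upward
        minus[i] -= eps  # perturb feature i downward
        attributions.append((model(plus) - model(minus)) / (2.0 * eps))
    return attributions


def toy_credit_score(features: Sequence[float]) -> float:
    """Toy stand-in for a trained model: income, debt, and an interaction term."""
    income, debt = features
    return 0.5 * income - 2.0 * debt + 0.1 * income * debt


scores = central_difference_attribution(toy_credit_score, [3.0, 1.0])
```

The procedure is deterministic, so repeated calls on the same input return the same attributions, and it needs two model evaluations per feature.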

