Research - I-DSLA

Security and Privacy for Biomedical Data Repositories. We have identified a dangerous inference attack against the naive suppression-based approaches used to protect sensitive information, and demonstrated it against real data provided by the Healthcare Cost and Utilization Project. The attack applies in general to any medical database that provides a query capability, and it does not require matching against another dataset to produce a breach of privacy. This work received the 2013 Clinical Research Informatics (CRI) Distinguished Paper Award. We have also worked on detecting inappropriate access to electronic healthcare records and on query auditing. More recently, we have developed techniques to enable privacy-preserving linkage of patient records in different fields, as well as a privacy-preserving synthetic data generation methodology for exploratory analysis.

Asif, Hafiz, Periklis A. Papakonstantinou, Stephanie Shiau, Vivek Singh, and Jaideep Vaidya. “Intelligent Pandemic Surveillance via Privacy-Preserving Crowdsensing.” IEEE Intelligent Systems 37, no. 4 (2022): 88-96.
Chen, F., Jiang, X., Wang, S., Schilling, L. M., Meeker, D., Ong, T., Matheny, M., Doctor, J., Ohno-Machado, L., & Vaidya, J. (2018). Perfectly Secure and Efficient Two-Party Electronic-Health-Record Linkage. IEEE internet computing, 22(2), 32-41.
Lazrig, I., Ong, T. C., Ray, I., Ray, I., Jiang, X., & Vaidya, J. (2018, August). Privacy Preserving Probabilistic Record Linkage Without Trusted Third Party. In 2018 16th Annual Conference on Privacy, Security and Trust (PST) (pp. 1-10). IEEE.
Vaidya, J., Shafiq, B., Asani, M., Adam, N., Jiang, X., & Ohno-Machado, L. (2017). A Scalable Privacy-preserving Data Generation Methodology for Exploratory Analysis. In AMIA Annual Symposium Proceedings (Vol. 2017, p. 1695). American Medical Informatics Association.
Vaidya, J., Shafiq, B., Jiang, X., & Ohno-Machado, L. (2013). Identifying Inference Attacks against Healthcare Data Repositories. AMIA Summit on Clinical Research Informatics (CRI), March 20-22, 2013.
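
To make the risk of naive suppression concrete, the following minimal sketch shows a generic differencing inference against a query interface that hides small counts. It is not the attack from the paper cited above: the threshold, the `query` interface, the category names, and all counts are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence

# Hypothetical rule: counts strictly between 0 and this threshold are hidden.
SUPPRESSION_THRESHOLD = 10

# Invented counts by (diagnosis, age group); not real data.
RECORDS = {
    ("condition_a", "0-17"): 3,
    ("condition_a", "18-64"): 120,
    ("condition_a", "65+"): 45,
}


def query(diagnosis: str, age_group: Optional[str] = None) -> Optional[int]:
    """Return a count, or None when the count is small enough to be suppressed."""
    if age_group is None:
        count = sum(v for (d, _), v in RECORDS.items() if d == diagnosis)
    else:
        count = RECORDS.get((diagnosis, age_group), 0)
    return None if 0 < count < SUPPRESSION_THRESHOLD else count


def infer_suppressed(diagnosis: str, age_groups: Sequence[str], target: str) -> Optional[int]:
    """Recover a suppressed cell as (unsuppressed total) minus (other unsuppressed cells)."""
    total = query(diagnosis)
    others = [query(diagnosis, g) for g in age_groups if g != target]
    if total is None or any(c is None for c in others):
        return None  # the differencing trick needs every other answer to be visible
    return total - sum(others)


groups = ["0-17", "18-64", "65+"]
hidden = query("condition_a", "0-17")                       # suppressed by the interface
recovered = infer_suppressed("condition_a", groups, "0-17")  # obtained from other answers
```

Only answers the interface itself returns are used here, which is why no external dataset is needed for this kind of inference.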

Privacy-preserving data analysis. This research area develops algorithms that perform distributed knowledge discovery while providing guarantees on the non-disclosure of data. We have proposed novel methods to efficiently analyze vertically partitioned data while preserving data privacy. We have published numerous papers proposing privacy-preserving solutions for widely used data analysis tasks such as classification, clustering, association rule mining, top-k analysis, and outlier detection. This work has been published in the premier data mining and database conferences as well as top-tier journals, and has led to a flurry of research, making this one of the most active areas of research in data mining. A novelty of our research is the toolkit approach: identifying sub-components that can be composed to create a privacy-preserving method for any data mining problem, thus reducing the entire mass of problems to a few key building blocks.

Asif, Hafiz, Jaideep Vaidya, and Periklis Papakonstantinou. “Identifying Anomalies while Preserving Privacy.” IEEE Transactions on Knowledge and Data Engineering (2021).
Asif H, Papakonstantinou PA, Vaidya J. A Guide for Private Outlier Analysis. IEEE Lett Comput Soc. 2020 Jan-Jun;3(1):29-33.
Asif H, Papakonstantinou PA, Vaidya J. How to Accurately and Privately Identify Anomalies. Conf Comput Commun Secur. 2019 Nov;2019:719-736.
Asif, H., Vaidya, J., Shafiq, B., & Adam, N. (2017, May). Secure and Efficient k-NN Queries. In IFIP International Conference on ICT Systems Security and Privacy Protection (pp. 155-170). Springer, Cham.
Vaidya, J., Shafiq, B., Fan, W., Mehmood, D., & Lorenzi, D. (2014). A Random Decision Tree Framework for Privacy-preserving Data Mining. Dependable and Secure Computing, IEEE Transactions on, 11(5), 399-411.
Saptarshi De Chaudhury, Likhith Reddy, Matta Varun, Tirthankar Sengupta, Sandip Chakraborty, Shamik Sural, Jaideep Vaidya, Vijayalakshmi Atluri: Incentivized Federated Learning with Local Differential Privacy Using Permissioned Blockchains. DBSec 2024: 301-319
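
As an illustration of the kind of small building block the toolkit approach composes, here is a minimal sketch of a secure sum based on additive secret sharing. The text above does not enumerate the toolkit's components, so the choice of secure sum is an assumption; the modulus and function names are hypothetical, and the parties are simulated in a single process rather than over a network.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # hypothetical modulus; must exceed any possible true sum


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties random shares that add up to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Simulate a secure sum: each party only ever sees one share of each input."""
    n = len(private_values)
    # Party i splits its value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Combining the published partial sums yields the total and nothing else.
    return sum(partials) % MODULUS


total = secure_sum([12, 30, 7])
```

Each individual share is uniformly random, so no single party learns another party's input from what it receives. A real protocol would additionally need secure channels and an adversary model, which this sketch leaves out.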

Adversarial Machine Learning and Protection. As part of a broader approach towards robust learning, we have investigated attacks and defenses in reinforcement learning (RL). On the attack side, we have improved on prior state-of-the-art attacks by using RL exploration techniques to find weaknesses in the defender's policy. On the defense side, we have explored approaches based on partial observability for robust defense. We also study more robust privacy guarantees.

R. Belaire, A. Sinha, P. Varakantham. On Minimizing Adversarial Counterfactual Error in Adversarial Reinforcement Learning, to appear in Proceedings of Thirteenth International Conference on Learning Representations (ICLR), April 2025.
C. Gong, Z. Yang, Y. Bai, J. Shi, J. He, K. Li, B. Xu, A. Sinha, X. Hou, D. Lo, T. Wang. BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets, in Proceedings of 2024 IEEE Symposium on Security and Privacy (SP), May 2024.
C. Gong, Z. Yang, Y. Bai, J. Shi, A. Sinha, B. Xu, D. Lo, X. Hou, G. Fan. Curiosity-Driven and Victim-Aware Adversarial Policies, in the Proceedings of the Annual Computer Security Applications Conference (ACSAC), Dec 2022.
Aurelien Bellet, Edwige Cyffers, and Jalaj Upadhyay. Differentially Private Decentralized Learning with Random Walks. In ICML, 2024.
Arun Ganesh, Abhradeep Thakurta, and Jalaj Upadhyay. Langevin Diffusion: An Almost Universal Algorithm for Private Euclidean (Convex) Optimization. In COLT, 2023.

Access control configuration and analysis. Role Based Access Control (RBAC) and Attribute Based Access Control (ABAC) are the de facto models used for advanced access control. RBAC in particular has been widely deployed in healthcare organizations of all sizes. In recent years, we have developed bottom up and hybrid approaches that can enable automatic configuration of the RBAC/ABAC policy, which is the most important step in deploying and implementing advanced access control. We have also applied software engineering techniques such as model checking and program verification tools to perform security analysis, which is extremely important to ensure that inappropriate access does not occur.

Akhtar A, Shafiq B, Vaidya J, Afzal A, Shamail S, Rana O. Blockchain Based Auditable Access Control for Distributed Business Processes. Proc Int Conf Distrib Comput Syst. 2020 Nov-Dec;2020:12-22. doi: 10.1109/ICDCS47774.2020.00015.
Das, S., Sural, S., Vaidya, J., & Atluri, V. (2018). HyPE: A Hybrid Approach toward Policy Engineering in Attribute-Based Access Control. IEEE Letters of the Computer Society.
Jha, S., Sural, S., Atluri, V., & Vaidya, J. (2018). Security analysis of ABAC under an administrative model. IET Information Security.
Uzun, E., Parlato, G., Atluri, V., Ferrara, A. L., Vaidya, J., Sural, S., & Lorenzi, D. (2017, July). Preventing unauthorized data flows. In IFIP Annual Conference on Data and Applications Security and Privacy (pp. 41-62). Springer, Cham.
Mian Yang, Vijayalakshmi Atluri, Shamik Sural, Jaideep Vaidya: A Graph-Based Framework for ABAC Policy Enforcement and Analysis. DBSec 2024: 3-23
H. O. Sai Varshith, Shamik Sural, Jaideep Vaidya, Vijayalakshmi Atluri: Enabling Attribute-Based Access Control in Linux Kernel. AsiaCCS 2022: 1237-1239
Samir Talegaon, Gunjan Batra, Vijayalakshmi Atluri, Shamik Sural, Jaideep Vaidya: Contemporaneous Update and Enforcement of ABAC Policies. SACMAT 2022: 31-42
Gaurav Madkaikar, Shamik Sural, Jaideep Vaidya, Vijayalakshmi Atluri: Queuing Theoretic Analysis of Dynamic Attribute-Based Access Control Systems. SEC 2024: 323-337
Amshumaan Pericherla, Proteet Paul, Shamik Sural, Jaideep Vaidya, Vijay Atluri: Towards Supporting Attribute-Based Access Control in Hyperledger Fabric Blockchain. SEC 2022: 360-376
Gunjan Batra, Vijayalakshmi Atluri, Jaideep Vaidya, Shamik Sural: Incremental Maintenance of ABAC Policies. CODASPY 2021: 185-196
Eeshan Gupta, Shamik Sural, Jaideep Vaidya, Vijayalakshmi Atluri: Attribute-Based Access Control for NoSQL Databases. CODASPY 2021: 317-319
Gunjan Batra, Vijayalakshmi Atluri, Jaideep Vaidya, Shamik Sural: Deploying ABAC policies using RBAC systems. J. Comput. Secur. 27(4): 483-506 (2019)
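
For readers unfamiliar with ABAC, the following minimal sketch shows how an access request can be checked against attribute-based rules. It is a generic illustration, not the policy model or enforcement mechanism of any paper listed above; the `Rule` structure, attribute names, and example policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """Permit `operation` when the user and object attributes match the conditions."""
    user_conditions: Dict[str, str]
    object_conditions: Dict[str, str]
    operation: str


def satisfies(attributes: Dict[str, str], conditions: Dict[str, str]) -> bool:
    """True when every required attribute is present with the required value."""
    return all(attributes.get(name) == value for name, value in conditions.items())


def is_authorized(user: Dict[str, str], obj: Dict[str, str],
                  operation: str, policy: List[Rule]) -> bool:
    """Grant access if at least one rule covers the request (deny by default)."""
    return any(
        rule.operation == operation
        and satisfies(user, rule.user_conditions)
        and satisfies(obj, rule.object_conditions)
        for rule in policy
    )


# Hypothetical policy: cardiology doctors may read cardiology records.
POLICY = [
    Rule({"role": "doctor", "dept": "cardiology"},
         {"type": "record", "dept": "cardiology"},
         "read"),
]

doctor = {"role": "doctor", "dept": "cardiology"}
nurse = {"role": "nurse", "dept": "cardiology"}
record = {"type": "record", "dept": "cardiology"}

doctor_can_read = is_authorized(doctor, record, "read", POLICY)
nurse_can_read = is_authorized(nurse, record, "read", POLICY)
```

In this framing, policy configuration amounts to choosing the rule list so that it reproduces the intended set of authorizations.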

Data Analytics and its Applications. We are primarily interested in the problem of effective summarization and visualization of data, as well as the applications of data analytics and visualization to digital government and bioinformatics. We have developed several data summarization techniques based on Boolean matrix decomposition, including weighted rank-one factorization and extended Boolean matrix decomposition, as well as query clustering. These methods can extract interpretable data patterns. We are also collaborating with colleagues in the Rutgers New Jersey Medical School Cancer Center, Rutgers Biomedical and Health Sciences to use data analytics techniques for improving precision medicine in the areas of neurology and oncology. We have investigated the issue of enabling information sharing and interoperability among the different incident management applications and systems used by the various state, county, and local agencies within New Jersey and the neighboring states. We have also contributed to the Newark City Government Twitter project, which enables the Newark City Government to be aware of citizens' interests, issues, and desires.

Yaqub, U., Chun, S. A., Atluri, V., & Vaidya, J. (2017). Analysis of political discourse on twitter in the context of the 2016 US presidential elections. Government Information Quarterly, 34(4), 613-626.
Hong, Y., Vaidya, J., Lu, H., & Liu, W. M. (2016, January). Accurate and efficient query clustering via top ranked search results. In Web Intelligence (Vol. 14, No. 2, pp. 119-138). IOS Press.
Lorenzi, D., Chun, S. A., Vaidya, J., Shafiq, B., Atluri, V., & Adam, N. R. (2015). Peer: a framework for public engagement in emergency response. International Journal of E-Planning Research (IJEPR), 4(3), 29-46.
Vaidya, J., Yakut, I., & Basu, A. (2014). Efficient Integrity Verification for Outsourced Collaborative Filtering. In Proceedings of the IEEE International Conference on Data Mining (ICDM), December 14 – 17, 2014.
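
To show what a Boolean matrix decomposition expresses, the sketch below reconstructs a 0/1 matrix as the Boolean product of two smaller 0/1 matrices and counts mismatched cells. It only checks a given decomposition; it does not implement any of the factorization algorithms mentioned above, and the matrices and function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(U: Matrix, V: Matrix) -> Matrix:
    """Boolean product: entry (i, j) is 1 if U[i][l] and V[l][j] are both 1 for some l."""
    k, m = len(V), len(V[0])
    return [
        [int(any(U[i][l] and V[l][j] for l in range(k))) for j in range(m)]
        for i in range(len(U))
    ]


def reconstruction_error(A: Matrix, U: Matrix, V: Matrix) -> int:
    """Number of cells where the Boolean product of U and V differs from A."""
    B = boolean_product(U, V)
    return sum(a != b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


# Invented data: 3 rows (e.g. users) by 3 columns (e.g. items).
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
# Each row of V is a pattern over the columns.
V = [[1, 1, 0],
     [0, 1, 1]]
# Each row of U says which patterns that row of A uses.
U = [[1, 0],
     [1, 1],
     [0, 1]]

error = reconstruction_error(A, U, V)
```

Because the factors stay binary, every row of `A` reads as a union of the patterns in `V`, which is the sense in which such decompositions are interpretable.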

Differential privacy under continual observation. Differential privacy is a rigorous notion of statistical data privacy that places a guarantee on the algorithm that processes the data. At a high level, it says that if a single data point changes, the change cannot be discerned from the output of a differentially private algorithm. In the continual release setting, we want to compute a statistic or perform a task over streaming data, releasing an updated output each time a new input arrives. This simple primitive is used in many deployments, including the recent deployment in Google’s Gboard, where next-word prediction is done under the guarantee of differential privacy. In this setting, we have several papers:

Jingcheng Liu, Jalaj Upadhyay, and Zongrui Zou. Optimality of Matrix Mechanism. In ICLR, 2025.
Monika Henzinger and Jalaj Upadhyay. Improved Differentially Private Continual Observation Using Group Algebra. In SODA, 2025.
Joel Andersson, Monika Henzinger, Rasmus Pagh, Teresa Steiner, and Jalaj Upadhyay. Continual Counting with Gradual Privacy Expiration. In NeurIPS, 2024.
Monika Henzinger, Jalaj Upadhyay, and Sarvagya Upadhyay. A Unifying Framework for Differentially Private Sums Under Continual Observation. In SODA, 2024.
Hendrik Fichtenberger, Monika Henzinger and Jalaj Upadhyay. Constant matters: Fine-grained Complexity of Differentially Private Continual Observation. In ICML, 2023.
Monika Henzinger, Jalaj Upadhyay, and Sarvagya Upadhyay. Almost Exact Error Bound on Differentially Private Continual Counting. In SODA, 2023.
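
To illustrate the continual release setting, here is a minimal sketch of the classic binary-tree mechanism for private continual counting, a standard baseline in this literature. It is not the mechanism of any paper listed above. The function names are hypothetical, stream values are assumed to lie in [0, 1], and the stream length is assumed to be known in advance.

```python
import random
from typing import List, Optional, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def binary_mechanism(stream: Sequence[float], epsilon: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Release a noisy running sum after every input (epsilon-DP for values in [0, 1])."""
    rng = rng or random.Random()
    levels = max(1, len(stream).bit_length())  # each input touches at most one node per level
    scale = levels / epsilon                   # privacy budget split evenly across levels
    exact = [0.0] * levels                     # exact partial sum held at each tree level
    noisy = [0.0] * levels                     # noisy copy that is actually released
    outputs = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1          # index of the lowest set bit of t
        exact[i] = sum(exact[:i]) + x          # merge the lower levels into level i
        for j in range(i):
            exact[j] = 0.0
            noisy[j] = 0.0
        noisy[i] = exact[i] + laplace(scale, rng)
        # The set bits of t name the dyadic blocks that together cover inputs 1..t.
        outputs.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return outputs


estimates = binary_mechanism([1, 0, 1, 1, 0, 1, 1, 1], epsilon=1.0, rng=random.Random(0))
```

Each released count adds up only a logarithmic number of noisy blocks, so the error grows polylogarithmically in the stream length rather than linearly, as it would if fresh noise were added to every input.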

Private synthetic data generation. Sometimes it is more useful to release synthetic data that can be used to answer statistical queries in the future. However, it is important to ensure that the synthetic data generation does not violate privacy and that answering queries on the synthetic data does not require significantly more resources than computing the same estimates on the raw data. One focus of this line of work is generating a synthetic graph that preserves the sparsity of the input graph, so that the synthetic graph can be used to answer various cut-related queries. This line of research has been published in various machine learning and algorithms conferences. Some of these works published in the last three years are as follows:

Chengyuan Deng, Jie Gao, Jalaj Upadhyay, Chen Wang, and Samson Zhou. On the Price of Differential Privacy for Hierarchical Clustering. In ICLR, 2025.
Jingcheng Liu, Jalaj Upadhyay, and Zongrui Zou. Almost Linear Time Differentially Private Release of Synthetic Graphs. In AISTATS, 2025 (oral presentation).
Greg Bodwin, Chengyuan Deng, Gary Hoppenworth, Jie Gao, Jalaj Upadhyay, and Chen Wang. Discrepancy of Shortest Path. In ICALP, 2024.
Jingcheng Liu, Jalaj Upadhyay, and Zongrui Zou. Optimal Bounds on Private Graph Approximation. In SODA, 2024.
Chengyuan Deng, Jie Gao, Jalaj Upadhyay, and Chen Wang. Differentially Private Range Query on Shortest Paths. In WADS, 2023.

Collaborative Business Process Composition and Optimization. We have been working on efficient ways for organizations to optimize the allocation of global resources while preserving the privacy of local information. As part of this, we developed efficient approaches for distributed linear programming where constraints are arbitrarily partitioned and every agent privately holds a set of variables. We further identified inference problems in existing work on horizontally partitioned linear programs and proposed an inference-proof approach. Recently, we have developed solutions for collaboratively developing business processes, and have also designed, implemented, and evaluated a semantics-based service mapping approach to resolve heterogeneity in business process composition.

Afzal, A., Shafiq, B., Shamail, S., Elahraf, A., Vaidya, J., & Adam, N. R. (2018). Assemble: Attribute, structure and semantics based service mapping approach for collaborative business process development. IEEE Transactions on Services Computing.
Irshad, H., Shafiq, B., Vaidya, J., Bashir, M. A., Shamail, S., & Adam, N. (2015). Preserving Privacy in Collaborative Business Process Composition. In Proceedings of the 12th International Conference on Security and Cryptography (SECRYPT), July 20-22, 2015. Best Paper Award.
Hong, Y., & Vaidya, J. (2014). An inference–proof approach to privacy-preserving horizontally partitioned linear programs. Optimization Letters, 8(1), 267-277.
Hong, Y., Vaidya, J., & Lu, H. (2012). Secure and Efficient Distributed Linear Programming. Journal of Computer Security, 20(5), 583-634.

Research Projects and Support

Authenticated Machine Learning (Cisco Research)

The goal of this project is to build tools and techniques for authenticated machine learning. Specifically, it aims to build mechanisms and tools that can provide provenance in all aspects of the workflow, starting from the different data exploration and cleaning steps, the model building and parameter tuning process, followed by the validation of results, and make each and every step auditable.

Robust Decision-Making in Changing Games (ARO)

This project studies games between a defender and an adversary. Its goal is decision-making that remains robust when the parameters of the game change or are uncertain, such as uncertain game payoffs and newly acquired attacker actions.

Generation of Machine-Enforceable Security Policies from Natural Language Text (Cisco Research)

The goal of this project is to extract attribute-based access control policies from natural language security policies. This leverages the power of large language models and generates code in a readily implementable format.

Workshop: Establishing the Vision and Creating a Roadmap for Security, Privacy and Ethics Research in Healthcare (NSF)

The goal of this project is to organize a workshop that establishes the vision and creates a roadmap for security, privacy, and ethics research in healthcare.

Developing Novel Technologies That Ensure Privacy And Security In Biomedical Data Science Research (NIGMS)

This is an R35 MIRA Outstanding Investigator Award from NIGMS. The overall objective of this program of research is to develop complementary solutions for risk inference, distributed learning, and access control that can enable different modalities of data sharing. The problems studied are general in nature and will evolve depending on research successes and new impediments that arise. The project will result in open-source, freely available software tools that will be integrated into widely used data collection, cohort identification, and distributed analytics platforms.

RAPID: Privacy-Preserving Crowdsensing of COVID-19 and its Sociological and Epidemiological Implications (NSF)

The goal of this study is to develop an infrastructure and platform to collect symptomatic data from the population and distill it into aggregate information to provide insight to both users and policymakers while protecting privacy. The project also aims to gain a broader understanding of privacy and decision making in extreme situations and learn how humans value their privacy and the choices they make in such situations.

(Recent) Completed Research Support

TWC SBE: Medium: Collaborative: Building a Privacy-Preserving Social Networking Platform from a Technological and Sociological Perspective (NSF)

The goal of this study is to develop a privacy-preserving social network (Trusted-Space) where user data are protected from the social network itself, other social network users, and advertisers. The project synthesizes solutions from a technological and sociological perspective to ensure that all of the required functionalities for both users and advertisers to participate effectively in the social network are available.

Secure And Private Collaborative Environments (Spaces) For Biomedical Analytics (NIGMS)

The goal of this study is to facilitate biomedical research in collaborative environments by developing technologies for secure and privacy-preserving exploratory analysis. The project develops technologies that enable exploratory analysis of data to determine its usability and relevance to specific biomedical research tasks, as well as technologies to enable the measurement and mitigation of additional privacy/security risk due to accessing this data. The developed tools are being integrated into REDCap, a data collection and management system used widely for providing translational research informatics support.