
Security and Privacy for Biomedical Data Repositories. We have identified a dangerous inference attack against the naive suppression-based approaches used to protect sensitive information, and we demonstrated the attack on real data provided by the Healthcare Cost and Utilization Project. The attack applies in general to any medical database that provides a query capability, and it does not require linking with another dataset to breach privacy. This work received the 2013 Clinical Research Informatics (CRI) Distinguished Paper Award. We have also worked on detecting inappropriate access to electronic health records and on query auditing. More recently, we have developed techniques for privacy-preserving linkage of patient records held by different parties, as well as a privacy-preserving synthetic data generation methodology for exploratory analysis.

  1. Asif, H., Papakonstantinou, P. A., Shiau, S., Singh, V., & Vaidya, J. (2022). Intelligent Pandemic Surveillance via Privacy-Preserving Crowdsensing. IEEE Intelligent Systems, 37(4), 88-96.
  2. Chen, F., Jiang, X., Wang, S., Schilling, L. M., Meeker, D., Ong, T., Matheny, M., Doctor, J., Ohno-Machado, L., & Vaidya, J. (2018). Perfectly Secure and Efficient Two-Party Electronic-Health-Record Linkage. IEEE Internet Computing, 22(2), 32-41.
  3. Lazrig, I., Ong, T. C., Ray, I., Ray, I., Jiang, X., & Vaidya, J. (2018, August). Privacy Preserving Probabilistic Record Linkage Without Trusted Third Party. In 2018 16th Annual Conference on Privacy, Security and Trust (PST) (pp. 1-10). IEEE.
  4. Vaidya, J., Shafiq, B., Asani, M., Adam, N., Jiang, X., & Ohno-Machado, L. (2017). A Scalable Privacy-preserving Data Generation Methodology for Exploratory Analysis. In AMIA Annual Symposium Proceedings (Vol. 2017, p. 1695). American Medical Informatics Association.
  5. Vaidya, J., Shafiq, B., Jiang, X., & Ohno-Machado, L. (2013, March). Identifying Inference Attacks against Healthcare Data Repositories. In AMIA Summit on Clinical Research Informatics (CRI), March 20-22, 2013.
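Why suppression alone can fail is easiest to see with a simple differencing attack. The sketch below is not the attack from the CRI 2013 paper. It assumes a hypothetical query interface that hides any count below an assumed threshold of 5, a handful of synthetic toy records, and an attacker who knows the full list of diagnoses in a group. Under those assumptions, a released group total minus the released per-diagnosis counts reveals the one suppressed count, with no outside dataset involved.

```python
from typing import Optional

THRESHOLD = 5  # hypothetical suppression threshold: counts below this are hidden
FIELDS = ("age_group", "sex", "diagnosis")

# Synthetic toy records for illustration only (NOT real HCUP data).
RECORDS = (
    [("40-49", "F", "asthma")] * 6
    + [("40-49", "F", "hypertension")] * 7
    + [("40-49", "F", "HIV")] * 2
)

def count_query(**filters: str) -> Optional[int]:
    """Hypothetical query interface: a count, or None if it falls below THRESHOLD."""
    n = sum(
        all(dict(zip(FIELDS, rec))[key] == val for key, val in filters.items())
        for rec in RECORDS
    )
    return n if n >= THRESHOLD else None

def infer_suppressed(age_group: str, sex: str, diagnoses: list[str]) -> dict[str, int]:
    """Recover a suppressed cell by subtracting released cells from a released total.

    Assumes `diagnoses` lists every diagnosis present in the group.
    """
    total = count_query(age_group=age_group, sex=sex)
    released = {d: count_query(age_group=age_group, sex=sex, diagnosis=d) for d in diagnoses}
    hidden = [d for d, c in released.items() if c is None]
    if total is None or len(hidden) != 1:
        return {}  # this simple differencing attack needs exactly one hidden cell
    known = sum(c for c in released.values() if c is not None)
    return {hidden[0]: total - known}

print(infer_suppressed("40-49", "F", ["asthma", "hypertension", "HIV"]))  # {'HIV': 2}
```

Suppressing a single small cell is therefore not enough; the complementary cells and marginal totals that remain visible must also be taken into account.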

 

Privacy-preserving data analysis. This research area develops algorithms for distributed knowledge discovery that provide guarantees against disclosure of the underlying data. We have proposed novel methods to efficiently analyze vertically partitioned data while preserving data privacy. We have published numerous papers proposing privacy-preserving solutions for widely used data analysis tasks, such as classification, clustering, association rule mining, top-k analysis, and outlier detection. This work has appeared in premier data mining and database conferences as well as top-tier journals, and it has spurred a large body of follow-on research, helping make this one of the most active research areas in data mining. A novelty of our research is the toolkit approach: identifying sub-components that can be composed to create a privacy-preserving method for any data mining problem, thus reducing the entire range of problems to a few key building blocks.

  1. Asif, H., Vaidya, J., & Papakonstantinou, P. (2021). Identifying Anomalies while Preserving Privacy. IEEE Transactions on Knowledge and Data Engineering.
  2. Asif, H., Papakonstantinou, P. A., & Vaidya, J. (2020). A Guide for Private Outlier Analysis. IEEE Letters of the Computer Society, 3(1), 29-33.
  3. Asif, H., Papakonstantinou, P. A., & Vaidya, J. (2019, November). How to Accurately and Privately Identify Anomalies. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS) (pp. 719-736).
  4. Asif, H., Vaidya, J., Shafiq, B., & Adam, N. (2017, May). Secure and Efficient k-NN Queries. In IFIP International Conference on ICT Systems Security and Privacy Protection (pp. 155-170). Springer, Cham.
  5. Vaidya, J., Shafiq, B., Fan, W., Mehmood, D., & Lorenzi, D. (2014). A Random Decision Tree Framework for Privacy-preserving Data Mining. IEEE Transactions on Dependable and Secure Computing, 11(5), 399-411.
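One of the simplest building blocks of the kind the toolkit approach relies on is secure sum via additive secret sharing. The sketch below is a generic textbook protocol simulated in a single process. It is not the specific construction used in the papers above. It assumes semi-honest, non-colluding parties, nonnegative inputs smaller than an assumed public modulus, and hypothetical per-site counts. Each party splits its private value into random shares that add up to the value modulo the public modulus, so any single share looks uniformly random. Only the final total is revealed.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus; must exceed any possible true sum

def share(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values: list[int]) -> int:
    """Simulate n parties computing the sum of their inputs without revealing any input.

    Party i sends its j-th share to party j. Each party publishes only the sum of the
    shares it received, and those partial sums add up to the true total.
    """
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]  # row i: shares made by party i
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS

# Three hypothetical sites, each holding a local count it does not want to disclose.
print(secure_sum([120, 45, 310]))  # 475
```

Richer primitives, such as secure scalar products for vertically partitioned data, are built under similar assumptions and can then be composed in the same way.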

 

Data Analytics and its Applications. We are primarily interested in the effective summarization and visualization of data, as well as in applications of data analytics and visualization to digital government and bioinformatics. We have developed several data summarization techniques based on Boolean matrix decomposition, including weighted rank-one factorization and extended Boolean matrix decomposition, as well as techniques for query clustering. These methods can extract interpretable patterns from data. We are also collaborating with colleagues in the Rutgers New Jersey Medical School Cancer Center, Rutgers Biomedical and Health Sciences, to use data analytics techniques to improve precision medicine in neurology and oncology. We have investigated how to enable information sharing and interoperability among the incident management applications and systems used by state, county, and local agencies within New Jersey and neighboring states. We also contributed to the Newark City Government Twitter project, which enables the Newark City Government to be aware of citizens' interests, issues, and desires.

  1. Yaqub, U., Chun, S. A., Atluri, V., & Vaidya, J. (2017). Analysis of political discourse on Twitter in the context of the 2016 US presidential elections. Government Information Quarterly, 34(4), 613-626.
  2. Hong, Y., Vaidya, J., Lu, H., & Liu, W. M. (2016, January). Accurate and efficient query clustering via top ranked search results. In Web Intelligence (Vol. 14, No. 2, pp. 119-138). IOS Press.
  3. Lorenzi, D., Chun, S. A., Vaidya, J., Shafiq, B., Atluri, V., & Adam, N. R. (2015). Peer: a framework for public engagement in emergency response. International Journal of E-Planning Research (IJEPR), 4(3), 29-46.
  4. Vaidya, J., Yakut, I., & Basu, A. (2014, December). Efficient Integrity Verification for Outsourced Collaborative Filtering. In Proceedings of the IEEE International Conference on Data Mining (ICDM), December 14-17, 2014.
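Boolean matrix decomposition approximates a binary data matrix A by the Boolean product of binary factors. A single rank-one pattern x y^T marks which rows (x) share which columns (y), which is what makes the result interpretable. The sketch below is a generic alternating heuristic that reduces the number of mismatched cells for one rank-one pattern. It is not the weighted rank-one factorization mentioned above. The toy matrix and the densest-row initialization are assumptions.

```python
Matrix = list[list[int]]

def rank_one_boolean(A: Matrix, max_iters: int = 20) -> tuple[list[int], list[int]]:
    """Approximate binary matrix A by x y^T (x, y binary) by alternately
    choosing x for fixed y and y for fixed x to reduce mismatched cells."""
    m, n = len(A), len(A[0])
    y = list(max(A, key=sum))  # assumed initialization: start from the densest row
    x = [0] * m
    for _ in range(max_iters):
        # Row i joins the pattern if matching y costs fewer mismatches than an all-zero row.
        new_x = [int(sum(a != yj for a, yj in zip(row, y)) < sum(row)) for row in A]
        # Column j joins the pattern by the same criterion, given the new x.
        new_y = [
            int(sum(A[i][j] != new_x[i] for i in range(m)) < sum(A[i][j] for i in range(m)))
            for j in range(n)
        ]
        if new_x == x and new_y == y:
            break  # no change: a local optimum of this heuristic
        x, y = new_x, new_y
    return x, y

def hamming_error(A: Matrix, x: list[int], y: list[int]) -> int:
    """Number of cells where A differs from the Boolean outer product x y^T."""
    return sum(A[i][j] != (x[i] & y[j]) for i in range(len(A)) for j in range(len(A[0])))

# Hypothetical toy data: three similar rows and one unrelated row.
A = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
x, y = rank_one_boolean(A)
print(x, y, hamming_error(A, x, y))  # [1, 1, 1, 0] [1, 1, 1, 0, 0] 3
```

The pattern reads as "rows 1-3 share columns 1-3". The remaining mismatches, notably the last row's own pattern, would be captured by additional rank-one patterns in a full decomposition.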

 

Access control configuration and analysis. Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC) are the de facto models used for advanced access control. RBAC in particular has been widely deployed in healthcare organizations of all sizes. In recent years, we have developed bottom-up and hybrid approaches that enable automatic configuration of RBAC/ABAC policies; policy configuration is the most important step in deploying and implementing advanced access control. We have also applied software engineering techniques such as model checking and program verification tools to perform security analysis, which is essential to ensure that inappropriate access does not occur.

  1. Akhtar, A., Shafiq, B., Vaidya, J., Afzal, A., Shamail, S., & Rana, O. (2020). Blockchain Based Auditable Access Control for Distributed Business Processes. In Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS) (pp. 12-22). doi: 10.1109/ICDCS47774.2020.00015.
  2. Das, S., Sural, S., Vaidya, J., & Atluri, V. (2018). HyPE: A Hybrid Approach toward Policy Engineering in Attribute-Based Access Control. IEEE Letters of the Computer Society.
  3. Jha, S., Sural, S., Atluri, V., & Vaidya, J. (2018). Security analysis of ABAC under an administrative model. IET Information Security.
  4. Uzun, E., Parlato, G., Atluri, V., Ferrara, A. L., Vaidya, J., Sural, S., & Lorenzi, D. (2017, July). Preventing unauthorized data flows. In IFIP Annual Conference on Data and Applications Security and Privacy (pp. 41-62). Springer, Cham.
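Bottom-up role engineering (role mining) starts from the existing user-permission assignment (UPA) and derives roles from it. The sketch below shows only the most naive bottom-up baseline: it creates one candidate role per distinct permission set, then checks that the resulting user-role (UA) and role-permission (PA) assignments grant every user exactly their original permissions. It is not one of the approaches referenced above, which aim for smaller or more meaningful role sets. The users, permission names, and role names are hypothetical.

```python
# Hypothetical user-permission assignment (UPA).
UPA = {
    "alice": {"read_chart", "write_chart"},
    "bob": {"read_chart", "write_chart"},
    "carol": {"read_chart"},
    "dave": {"read_billing", "write_billing"},
}

def mine_roles(upa: dict[str, set[str]]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Naive bottom-up role mining: one role per distinct permission set."""
    role_for_perms: dict[frozenset[str], str] = {}
    ua: dict[str, set[str]] = {}
    for user in sorted(upa):  # sorted for deterministic (hypothetical) role names
        key = frozenset(upa[user])
        if key not in role_for_perms:
            role_for_perms[key] = f"role{len(role_for_perms) + 1}"
        ua[user] = {role_for_perms[key]}
    pa = {role: set(perms) for perms, role in role_for_perms.items()}
    return ua, pa

def reconstructs(upa, ua, pa) -> bool:
    """True iff the permissions each user gets through roles equal the original ones."""
    return all(
        set().union(*(pa[r] for r in ua.get(user, set()))) == perms
        for user, perms in upa.items()
    )

ua, pa = mine_roles(UPA)
print(len(pa), reconstructs(UPA, ua, pa))  # 3 True
```

The reconstruction check matters because a configured policy that grants even one extra permission is exactly the kind of inappropriate access that security analysis is meant to rule out.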

 

Collaborative Business Process Composition and Optimization. We have been devising efficient ways for organizations to optimize the allocation of global resources while preserving the privacy of local information. As part of this work, we developed efficient approaches for distributed linear programming in which constraints are arbitrarily partitioned and every agent privately holds a set of variables. We further identified inference problems in existing work on horizontally partitioned linear programs and proposed an inference-proof approach. Recently, we have developed solutions for collaboratively developing business processes, and we have also designed, implemented, and evaluated a semantics-based service mapping approach to resolve heterogeneity in business process composition.

  1. Afzal, A., Shafiq, B., Shamail, S., Elahraf, A., Vaidya, J., & Adam, N. R. (2018). Assemble: Attribute, structure and semantics based service mapping approach for collaborative business process development. IEEE Transactions on Services Computing.
  2. Irshad, H., Shafiq, B., Vaidya, J., Bashir, M. A., Shamail, S., & Adam, N. (2015). Preserving Privacy in Collaborative Business Process Composition. In Proceedings of the 12th International Conference on Security and Cryptography (SECRYPT), July 20-22, 2015. Best Paper Award.
  3. Hong, Y., & Vaidya, J. (2014). An inference-proof approach to privacy-preserving horizontally partitioned linear programs. Optimization Letters, 8(1), 267-277.
  4. Hong, Y., Vaidya, J., & Lu, H. (2012). Secure and Efficient Distributed Linear Programming. Journal of Computer Security, 20(5), 583-634.
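One well-known idea for disguising a linear program (minimize c x subject to A x <= b, x >= 0) is to substitute x = Q y, where Q is a secret positive monomial matrix, that is, a permutation with positive scale factors. Because Q maps nonnegative vectors to nonnegative vectors and is invertible, the masked problem (minimize (cQ) y subject to (AQ) y <= b, y >= 0) has the same optimal value, and the original solution is recovered as x = Q y. The sketch below is a textbook illustration, not the specific protocols in the papers above. It uses hypothetical problem data and no LP solver, and it only checks with exact arithmetic that the transformation preserves objective values, constraint values, and nonnegativity. On its own it hides only the column structure of A and c; it does not address the inference problems mentioned above.

```python
import random
from fractions import Fraction as F

def matmul(A, B):
    """Exact matrix product for small lists-of-lists of Fractions."""
    return [
        [sum((A[i][k] * B[k][j] for k in range(len(B))), F(0)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def random_monomial(n, rng):
    """Secret Q: one positive entry per row and column (permutation times positive scaling)."""
    cols = list(range(n))
    rng.shuffle(cols)
    Q = [[F(0)] * n for _ in range(n)]
    for row, col in enumerate(cols):
        Q[row][col] = F(rng.randint(1, 9))
    return Q

def monomial_inverse(Q):
    """Inverse of a monomial matrix: transpose the pattern and invert each nonzero entry."""
    n = len(Q)
    Q_inv = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if Q[i][j] != 0:
                Q_inv[j][i] = 1 / Q[i][j]
    return Q_inv

# Hypothetical problem data: minimize c x subject to A x <= b, x >= 0.
A = [[F(1), F(2), F(1)], [F(3), F(1), F(2)]]
c = [[F(-2), F(-3), F(-1)]]
x = [[F(2)], [F(3)], [F(1)]]  # a feasible point for b = (10, 12)

rng = random.Random(7)
Q = random_monomial(3, rng)
A_masked, c_masked = matmul(A, Q), matmul(c, Q)  # shared instead of A and c
y = matmul(monomial_inverse(Q), x)               # the same point in masked coordinates

print(matmul(c_masked, y) == matmul(c, x))  # True: same objective value
print(matmul(A_masked, y) == matmul(A, x))  # True: same constraint left-hand sides
print(all(v >= 0 for (v,) in y))            # True: nonnegativity preserved
print(matmul(Q, y) == x)                    # True: x is recovered as Q y
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities hold exactly rather than up to floating-point error.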

 

Research Projects and Support

 

Authenticated Machine Learning (CISCO Research)

The goal of this project is to build tools and techniques for authenticated machine learning. Specifically, it aims to build mechanisms and tools that provide provenance for every stage of the workflow, from data exploration and cleaning through model building and parameter tuning to the validation of results, making each step auditable.
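Making each workflow step auditable can be illustrated with a hash-chained log, a standard tamper-evidence technique. Each entry stores a SHA-256 hash of its contents together with the previous entry's hash, so editing any recorded step breaks verification from that point on. This is a hedged sketch, not the project's actual design. The step names and details are hypothetical. A real system would also need to protect the latest hash, for example by signing or publishing it, since anyone able to rewrite the entire chain could recompute every hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_entry(log: list, step: str, details: dict) -> None:
    """Record a workflow step, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"step": step, "details": details, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})

def verify(log: list) -> bool:
    """Recompute every hash and link; any edited entry makes verification fail."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("step", "details", "prev_hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True

# Hypothetical workflow steps.
log: list = []
append_entry(log, "clean", {"dropped_rows": 12})
append_entry(log, "train", {"model": "logistic_regression", "C": 1.0})
append_entry(log, "validate", {"split": "holdout"})
print(verify(log))  # True
log[1]["details"]["C"] = 10.0  # tamper with a recorded hyperparameter
print(verify(log))  # False
```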

 

Workshop: Establishing the Vision and Creating a Roadmap for Security, Privacy and Ethics Research in Healthcare (NSF)

The goal of this project is to organize a workshop that establishes the vision and creates a roadmap for security, privacy, and ethics research in healthcare.

 

Developing Novel Technologies That Ensure Privacy And Security In Biomedical Data Science Research (NIGMS)

This is an R35 MIRA Outstanding Investigator Award from NIGMS. The overall objective of this program of research is to develop complementary solutions for risk inference, distributed learning, and access control that can enable different modalities of data sharing. The problems studied are general in nature and will evolve depending on research successes and new impediments that arise. The project will result in open-source, freely available software tools that will be integrated into widely used data collection, cohort identification, and distributed analytics platforms.

 

RAPID: Privacy-Preserving Crowdsensing of COVID-19 and its Sociological and Epidemiological Implications (NSF)

The goal of this study is to develop an infrastructure and platform that collects symptom data from the population and distills it into aggregate information, providing insight to both users and policymakers while protecting privacy. The project also aims to gain a broader understanding of privacy and decision making in extreme situations, including how people value their privacy and what choices they make in such situations.
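One hedged illustration of collecting individual symptom reports privately while still obtaining useful aggregates is classic randomized response, a standard local differential privacy technique. It is not necessarily the mechanism used in this project. Each participant reports the truth with probability p and a fair coin flip otherwise, so any single report is deniable. The population rate can still be estimated from the observed yes-rate q as (q - (1 - p)/2) / p, and the mechanism satisfies local differential privacy with epsilon = ln((1 + p)/(1 - p)). The choice p = 0.5 and the simulated 10% symptom rate are assumptions.

```python
import math
import random

def randomize(truth: bool, p: float, rng: random.Random) -> bool:
    """Report the truth with probability p, otherwise a fair coin flip."""
    if rng.random() < p:
        return truth
    return rng.random() < 0.5

def estimate_rate(reports: list[bool], p: float) -> float:
    """Unbiased estimate of the true 'yes' rate from randomized reports."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p) / 2) / p

def epsilon(p: float) -> float:
    """Local differential privacy level of this mechanism."""
    return math.log((1 + p) / (1 - p))

rng = random.Random(42)
p = 0.5  # assumed truth-telling probability
true_answers = [rng.random() < 0.1 for _ in range(100_000)]  # simulated population
reports = [randomize(t, p, rng) for t in true_answers]
print(round(estimate_rate(reports, p), 3), round(epsilon(p), 3))
```

Lowering p strengthens privacy (smaller epsilon) but increases the variance of the aggregate estimate, which is the basic trade-off such a platform has to tune.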

 

Recently Completed Research Support

TWC SBE: Medium: Collaborative: Building a Privacy-Preserving Social Networking Platform from a Technological and Sociological Perspective (NSF)

The goal of this study is to develop a privacy-preserving social network (Trusted-Space) in which user data are protected from the social network itself, other social network users, and advertisers. The project synthesizes solutions from technological and sociological perspectives to ensure that users and advertisers still have all the functionality they need to participate effectively in the social network.

 

Secure And Private Collaborative Environments (Spaces) For Biomedical Analytics (NIGMS)

The goal of this study is to facilitate biomedical research in collaborative environments by developing technologies for secure and privacy-preserving exploratory analysis. The project develops technologies that enable exploratory analysis of data to determine its usability and relevance to specific biomedical research tasks, as well as technologies to measure and mitigate the additional privacy and security risks of accessing these data. The developed tools are being integrated into REDCap, a data collection and management system widely used to provide translational research informatics support.