Projects – Jie Gao

Current Projects

Promoting Social Learning Amid Interference in the Age of Social Media

Information acquisition is embedded in a social setting. This distorts – or at least changes – the incentives individuals face when they are uncertain about the truth and communicate with others. Social learning, an increasingly impactful topic in the computer science/economics literature, formally studies when and how dispersed and self-interested agents aggregate information. A potential, but unrealized, goal of the social-learning literature is to enable the building of socio-computational systems that promote social learning. A growing volume of literature in social media and computational social science is deeply concerned that, at present, incentives are not aligned with truth-seeking/truth-telling and that discussion is becoming increasingly polarized. This leads to an acrimonious public discourse rife with conflicting information and theories, where the truth is hard to locate. Building on and using theoretical computer science techniques, this project adds to the fundamental understanding of how societies learn.

The social learning system itself, with given parameters, can be seen as a computational process. This project considers two interesting perspectives in this family of problems that involve computational complexity and algorithm design: 1) the computational complexity required for agents to best respond or to determine the properties of different systems; 2) considering social learning as a complex system where the models of social interactions, input signals, and self-regulating/evolving nature can be viewed as constraints, and the design parameters can be optimized to encourage social learning towards truth discovery. This work includes the analysis of models with relevant first-order features to learn which conditions are sufficient and necessary for crowds to quickly and reliably converge on the truth in both the sequential social learning and social learning with repeated updating settings. In addition, the project includes design of algorithms and insights to optimize certain parameters, corresponding to platform design choices, to promote fast and robust social learning in each of these settings. A key feature is augmenting the social-learning literature to explicitly consider agents’ social embeddedness including their mixed incentives and the reality of polarized environments. Additionally, with carefully crafted empirical research, the project develops models for learning more complex truths amid social pressure.

This is a joint project with Univ. of Michigan and Stony Brook University and is currently supported by NSF CCF-2208663, 2022-2025.

Modeling Human Brain Development as a Dynamic Multi-Scale Network Optimization Process

Over a period of almost two decades (from birth to young adulthood), the human brain undergoes profound changes, driven by genetic, environmental and experiential factors. These changes are part of a maturation process that leads to optimally organized neural circuits that support complex behaviors and cognitive processes, and facilitate learning across the lifespan. Fundamental questions remain about how developing brain circuits become optimally organized. Specifically, the underlying biophysical mechanisms — the interval drivers of this process are incompletely understood at the macroscale of the human brain. This is in part due to the complexity of some developmental periods, such as adolescence, during which a constellation of endogenous and exogenous factors contribute to an avalanche of partially unique physiological changes that are difficult to track. Using neuroimaging data collected over years of development from almost 12,000 adolescents, advanced computational tools and engineering principles, the overarching goal of this project is to understand how internal mechanisms in the brain control its functional circuits to optimally support cognitive function. Research activities aim to quantify these mechanisms and their inherent changes, as the brain becomes increasingly optimally connected with age, and to map these changes onto fundamental aspects of cognitive processing.

This research aims to transform mechanistic understanding of the optimization of human brain circuits during the uniquely complex developmental period of adolescence. For this purpose, it will integrate a historically large, longitudinal neuroimaging dataset with novel tools from network science and computer science, and principles of control theory. The primary hypothesis is that the brain’s topological optimization is partly driven by an internal control process, which has a quantifiable, age-varying impact on network topology and dynamics. Thus, neural maturation leads to parsimonious network topologies that maximize efficiency of information processing but also optimal network controllability, both of which are reflected on the efficiency and flexibility of cognitive processing. Findings from this project may have a transformative impact on the understanding of mechanistic principles underlying the emergence of the adult brain circuitry, and the impact of adolescence on its development. They may also provide critical insights towards the development of targeted therapies for improving cognitive outcomes in the diseased or atypically developing brain. Given cross-disciplinary and highly computational activities, this project also involves significant tool development for use by the neuroscience research community.

This is a joint project with Harvard Medical School, University of Minnesota and is currently supported by NSF IIS-2207440, 2022-2026.

Principles and Infrastructure of Extreme Scale Edge Learning for Computational Screening and Surveillance for Health Care

This project investigates a completely new cross-disciplinary concept of “Computational Screening and Surveillance (CSS)” that utilizes edge learning to detect early indicators of diseases, and monitor health changes in both individuals and populations. CSS analyzes and interprets continuous and heterogeneous physical and physiologic sensing-data streams of human subjects to produce real-time information, knowledge, and insights about their health status. The project’s novelty is a data-driven paradigm that revolutionizes the understanding, prediction, intervention, treatment, and management of acute/infectious, chronic physical and psychological diseases. The project’s impacts are enormous social and economic benefits to individuals, organizations, and the healthcare system: early detection, preemptive intervention and management can lead to greatly improved quality of care, and huge savings for multiple diseases each costing hundreds of billions of dollars every year.

The investigators design, develop and evaluate principles and solutions for CSS enabled by extreme-scale edge learning spanning four dimensions: data modalities, health conditions and data patterns, Artificial Intelligence/Machine Learning (AI/ML) algorithms and models, and individuals/populations. The design is guided by four principles: exploit scale and heterogeneity, design for uncertainty, privacy as a first-class citizen, and faults and attacks as a norm. The investigators will 1) design AI/ML algorithms for learning data patterns and correlations for diverse health conditions in both individuals and populations at extreme scales; 2) quantify theoretical bounds on the tradeoffs between security, privacy protection, and learning accuracy in order to protect against various attacks on data and models at both the edge and cloud; 3) develop programming abstractions for automated exploration of competing AI/ML methods under uncertainty, and system mechanisms to protect stream processing integrity against sensitive data disclosure and faulty/malicious analytics; and 4) devise neural architectures and accelerators for computation efficiency at the constrained edge, data efficiency using limited training sets, and human efficiency utilizing AutoML.

This is a joint project with Stony Brook University, MIT, Pennsylvania State University and is currently supported by NSF CCF-2118953, 2021-2026.

Past Projects

From Brains to Society: Neural Underpinnings of Collective Behaviors Via Massive Data and Experiments

Despite thousands of investigations on the neural basis of individual behaviors and even more studies on collective behaviors, a clear bridge between the organization of individual brains and their combinational impact on group behaviors, such as cooperation and conflict and ultimately collective action, is lacking. To address the grand challenge of inferring group cooperation from the functional neuroarchitecture of individual brains, this project will harness advances in data, experiment and computation. Specifically, it will integrate, for the first time, existing large-scale human functional neuroimaging data, prospectively collected individual and group behavioral data from a large cohort, with cutting-edge machine learning tools, hierarchical models and large-scale simulations. This is a collaborative effort between a team of neuroscientists, social scientists and data scientists, that aims to elucidate the neural basis of cooperation, a fundamental process in a functioning society and at the core of social environments.

This was a joint project with Harvard Medical School, MIT, University of Massachusetts at Amherst, University of Minnesota and University of Chicago and is currently supported by NSF OAC-1939459, 2019-2021.

Theory and Algorithms for Discrete Curvatures on Network Data

The new developments in technologies of embedded systems, sensors and wireless communications provide great potentials to improve the safety and security of the physical and social environment we live in, when we face unfortunate accidents and emergency events as well as malicious attacks as in crimes and terrorism. Our focus in this project is on developing mathematical tools and algorithms based on discrete curvatures for understanding and detecting community structures and anomalies in networks that can be of crucial value in many applications. We are interested in understanding high level patterns, community structures, and anomalies using geometric tools. The mathematical tools to be developed will be useful for many other networks like social or biological networks (e.g, protein-protein interaction networks).

This was a joint project with Prof David Gu (Stony Brook), and Prof Feng Luo from Rutgers University, 2017-2020, supported by NSF DMS-1737812.

Social Contagions and Influence

Information, beliefs, diseases, technologies, and behaviors propagate through social interactions as a contagion. Understanding of how these contagions spread is crucial in encouraging beneficial and healthy behaviors and discouraging the ones that are destructive and damaging. Rigorous, mathematical understanding of complex social contagions is not just an abstraction, but will guide applications from healthcare to word-of-mouth advertising. The technical content of this project is inherently interdisciplinary, and its lessons will apply to related fields such as probability, economics, sociology, and statistical physics. The research efforts are integrated with the educational and outreach activities of the PIs, who have strong records of broadly disseminating cutting-edge research to high school, undergraduate, and graduate students through teaching, outreach programs, and personal mentoring.

This project will transform our understanding of social contagions by: 1) Developing a suite of technical tools to enable improved understanding of specific complex processes; 2) Determining how various parameters of cascade and social structure together impact the chances of a cascade’s success or failure; and 3) Obtaining empirical evidence to both corroborate the theoretical findings, and uncover the space of realistic setting for certain parameters.

Many existing models of contagion assume that increasing the number of infected (or affected) neighbors marginally decreases the chance of infection. Many contagions, such as adoption of expensive new technology, fail to have this property, but instead have more complex rules for infection. This leads to different spreading behaviors even on the same networks. Motivated by sociology research findings, this project will greatly enhance our understanding of social contagions in three aspects. First this project will provide rigorous study of the spreading behavior of a simplified theoretical model called k-complex contagions and its interactions with structures in the underlying graph such as tie strength, unusually influential nodes, and community structures. Second, this project presents a general model for studying cascades that is both theoretically tractable and practically motivated. The general model generalizes most previous theoretical models of complex and simple contagions and includes homophily and environmental factors on cascades. Finally, this project will use post-hoc analysis as well as real world social experiments to verify the veracity of the model and fit the parameters in different settings.

This (2015-2018) was supported by NSF. Check out news coverage here.

Geometric and Topological Analysis for Trajectory Privacy

The maturing of mobile devices and systems provide an unprecedented opportunity to collect a large amount of data about real world human motion at all scales. The rich knowledge contained in these data sets can have a huge impact in many fields ranging from transportation to health care, from civil engineering to energy management, from e-commerce to social networking. While the applications are paradigm-transforming, recent studies show that the trajectory data can raise serious privacy concerns in revealing personally sensitive information such as frequently visited locations or social ties. These concerns become the major hurdle in utilizing these data sets. This project systematically studies the issue of anonymizing trajectory data, from the bottom layer of trajectory sensing and data collection, to the middle layer of trajectory representation and anonymity, to the application layer of how the anonymized trajectory data can be used.

By the nature of trajectories as being time stamped sequence of points, in this project novel geometric and topological algorithms that directly work on distributed sensors collecting the trajectories are developed for achieving the objective. Queries to such decentralized sensors are made to ensure no sensitive information is released. The intellectual contribution lies in the following aspects. 1) The topological representation of trajectories, i.e., how trajectories pass around obstacles and landmarks in the domain is adopted. The topological representation is compact and descriptive, introducing novel discrete and combinatorial problems to study. 2) A novel framework is developed for distributed sensors to directly learn, classify and compare the topological types of the target trajectories, using harmonic one-forms and Hodge decomposition from algebraic topology. The new framework can substantially reduce the communication cost within the network, while maintaining the requirement of user privacy from the very beginning of sensing and data collection. 3) A family of anonymization algorithms using different ideas are developed, by altering the way to connect the time-stamped points into trajectories, by adjusting the topological resolution to reach a balance between data anonymity and utility, and by sensing and recording randomized hash data to answer popular trajectory queries. 4) The trajectory data sets are often huge, so algorithms for handling large scale trajectory data sets are developed in both centralized and decentralized settings.

This project (2015-2019) was supported by NSF.

Geometric Analysis of Computer and Social Networks

This project is to develop theoretic foundations and practical algorithms for geometric analysis of massive, weighted graphs arising from computer networking applications. The main challenge of understanding large scale computer networking is the decentralized management and operations on the network. How the global behaviors (for example, congestion) emerge from decentralized, local operations (routing) is still a mystery. Differential geometry studies the connection of local structures (such as curvatures) and global properties (such as topology and geodesics). Geometric analysis theorems often naturally lead to distributed algorithms that achieve global objectives, which is ideal in networking applications.

This project generalizes classical geometric analysis methods to massive graphs, the theoretic exploration focuses on the curvatures on graphs and the heat kernel estimates, relation between optimal transportation on graphs and curvatures, Ricci flow on graphs, and homology/cohomology/homotopy groups on directed graphs. The theoretic results will be applied for studying fundamental problems in computer networks, including: 1) Geodesics and network congestion: which aims to understand the connection of network congestion (e.g., on the Internet) with network curvature, and try to apply Ricci flow to alleviate network congestion by modifying local curvature; 2) Graph embedding and efficient routing: which investigates how to find an embedding of the network in geometric space in order to support greedy routing; and 3) Resource allocation in wireless networks: which applies optimal transport theory to the problem of capacitated base station allocation. The research results will be useful for applications in a broad range of fields, from pure mathematics research to theoretic physics, from telecommunication in engineering to brain imaging in medicine.

This was supported by NSF and AFSOR.

Non-Isotropic Networked Sensor Deployment for Smart Buildings

The objective of this proposal is to thoroughly investigate the deployment of non-isotropic sensors for smart building applications. Various sensing and control applications in smart building require accurate and efficient localization and tracking of human and activities. To achieve this goal, the non-isotropic networked sensor deployment is essential.

This proposal addresses a new set of problems arising from the deployment of four types of sensors: cameras, tripwire, motion sensors and microphones. The set of deployment problems would strive for full sensor coverage and wireless connectivity with a complex floor plan, and involve one or more of the following constraints: (i) Non-isotropic model using visibility (cameras and tripwires), (ii) Non-overlapping sensing range, (iii) Robust 2-coverage for accurate localization. The sensor deployment problem will heavily involve the geometric properties of the environment (building layout), as well as the geometric properties of the sensors (non-isotropic sensing ranges). Ideas from computational geometry will be applied to tackle these problems and approximation algorithms with worst-case guarantee for theoretical rigor as well as practical algorithms suitable for implementations will be developed.

A clear understanding on how to deploy sensors for smart building systems will allows for efficient planning and effective practice of networked sensor deployment to achieve the high quality sensing desired by various indoor applications. For developers, this work will significantly reduce the design and development costs for designing and building smart building systems. For end users, the resultant testbed will lead to exciting applications that will significantly improve the quality of life. It also enriches engineering curriculum to integrate the theory of computational geometry with the system building practice. The PIs are committed in improving female presence in computer science and exposure of research for high school students.

This project was carried out from 2012-2016, supported by NSF.

Algorithmic Aspects of Geometry for Using LIDAR and Wireless Sensor Networks for Combating Chemical Terror Attacks

The project considers using a variety of sensors to monitor an unknown signal field over a spatial domain with unknown geometry, which may bear elevational variations. The main objective of this project is to develop mathematical algorithms that can integrate sensor readings from a large number of distributed devices and reconstruct both the signal field and the terrain. Specifically, the sensors involved in the project include the Light Detection And Ranging technology (LIDAR) and networked wireless sensors, for their complementary capabilities. LIDAR uses a constant number of powerful sensors, each takes an image from a distance. On the other hand, distributed wireless sensors use a large number of inexpensive sensors (possibly piggybacking on cellular phones), each takes a density reading at its position. The approach in this project is to use conformal and hyperbolic geometry, discrete curvature flows, topology and numerical analysis to develop novel algorithms to generate accurate density map. From the networking perspective, this project addresses the problems of LIDAR image fusion, network localization, sensor deployment, and integration of LIDAR and sensor data.

The project is motivated by the potentially devastating chemical terror attacks. Biological and chemical agents, used by adversaries, could potentially spread to a large region in a short time. This project focuses on development of algorithmic tools and techniques for gathering timely, accurate and useful information about the chemical or biological spread in case of such an attack. By exploiting the geometric properties of LIDAR data and sensor network data, the project provides critical observations on fundamental questions of how to organize such a large scale network; how to manage sensor data; and how to use the network for acting on the environment and the users.

This project was carried out from 2012-2015, supported by NSF.

Algorithmic Foundations for Joint Information Processing and Optimization in a Hybrid Mobile Sensor Network

The objective of this project is to develop efficient distributed algorithms for joint information processing, optimization and coordination of static, pervasive sensor nodes and mobile, autonomous agents that altogether monitor and act on the environment. The application scenarios include tracking, searching and planning mobile agents with the underlying sensor network as a supporting communication infrastructure, and online resource management and allocation guided by real-time sensor detections.

This project focuses on three research problems: distributed min-cost bipartite matching, kinetic minimum Steiner tree, and facility location for mobile nodes, by using two non-trivial technical approaches, namely, embedding of the network metric into tree metrics, and distributed primal dual framework. There are two central intellectual questions investigated in this project. 1) How to make use of the sensor data to best serve user requests? This involves developing communication efficient schemes for sensors detecting interesting local events and the mobile users seeking such information to find each other. 2) How to best make use of the continuity and coherence in mobility, either in the presence of mobility enabled sensor data (e.g., detections of continuously moving targets), as well as the locations of mobile users? The project provides solutions that adapt to the system configuration with low update cost, avoiding drastic sudden changes or any level of reconstruction.

This project helps to extend the current Internet to the physical world, encouraging seamless integrations of sensing and control of the physical environment. The PI integrates the research agenda with new and existing curriculum development for both undergraduate and graduate education, and continues her efforts in improving female presence in computer science and exposure of research for high school students.

This project was carried out from 2011-2014, supported by NSF. See this page for publications from this project.

Analyzing Dynamics in Online Social Networks

Social groups often exhibit a high degree of dynamism. Some groups thrive, while many others die over time. Modeling group dynamics and understanding whether/when a group will remain stable or shrink over time can be important in a number of social domains. Specifically, we look at the following problems in this area. In the first problem, we build models to predict if and when an individual is going to quit his/her group, and whether this quitting event will inflict substantial damage to the group. We take the World of Warcraft game as an exemplar platform for studying this problem. Our proposed solution starts from in-game census data and extracts features from multiple perspectives such as individual-level, guild-level, game activity, and social interaction features. These features are then used to build predictors. Our study shows that destructive group dynamics can often be predicted with modest to high accuracy, and feature diversity is critical to prediction performance. A related problem that we tackle looks at churn/attrition within an organization and tries to model this churn. Our approach provides us with valuable insights into understanding attrition within an organization. Lastly, we extend our analysis to be able to predict group stability. We build models to predict if a group is going to remain stable or is likely to shrink over a period of time. Results indicate that both the level of member diversity and social activities are critical in maintaining the stability of groups. We also find that certain ’prolific’ members play a more important role in maintaining group stability.

Many eCommerce websites and consumer review websites provide both reviews of products and a social network structure among the reviewers. Users can review products they purchased as well as the reviews others wrote. Users can also rate each other as trusted or untrusted relationships. By studying a data set from Epinions, we examine and quantify how the trust/distrust relationships among the users influence their ratings of the reviews. We discover that the opinions of one’s friends has an influence on his/her ratings, while the opinions of one’s foes seem to have no influence. We also find that a user’s friends play a significant role in guiding his/her future relationships.

This is part of the DARPA project titled GLAD-PC: Graph Learning for Anomaly Detection using Psychological Context, ADAMS program under contract W911NF-11-C-0216, 06/2011-06/2013.

Large Scale Sensor Network Routing using Conformal Geometry

This project develops efficient routing schemes for large scale multi-hop wireless sensor networks with a complex shape. As sensor networks grow large in size, terrain features or non-uniform energy usage may leave the network with holes. Efficient routing becomes difficult as some knowledge of how to “get around” the holes is needed.

The novelty of this project is to use conformal geometry to compute a proper embedding of the network such that simple, distributed greedy routing schemes achieve desirable properties. Conformal geometry shows that any surface can be deformed to three canonical shapes: the sphere, the plane and the disk. Thus, one can “regulate” any sensor field shape to be of a canonical, simple form. The complexity of the routing problem and the domain specifics are encapsulated in the embedding such that routing decision becomes trivial. The conformal mapping of a sensor network is computed using Ricci curvature flow, an intrinsically distributed computation routine that can easily incorporate the dynamic changes of wireless links. This project focuses on conformal mapping and the companion greedy routing solutions that guarantee delivery, achieve good load balancing, facilitate in-network storage and data-centric routing, are resilient to network failures, and generalize to both 2D and 3D networks.

This project explores the unique cross-disciplinary area of wireless networking and differential geometry. Although the main thrust of this proposal is theoretical, it is expected that the project can also provide practical routing solutions for large scale sensor networks.

This project was carried out from 2010-2013, supported by NSF. See this page for publications from this project.

CAREER: Geometric Algorithms for Wireless Sensor Networks

Embedded networked sensing devices are becoming ubiquitous across many activities that are important to our economy and life. The physical locations of the sensor nodes greatly impact the system design in all aspects from low-level networking and organization to high-level information processing and applications.

This project takes a geometric approach to study algorithms in sensor networks and investigates a number of fundamental ideas about 1) how the geometric embedding of sensor networks influences how the network can operate? 2) how to exploit the geometric characteristics for efficient and scalable network design?

The project addresses a number of important architectural components where geometry has a fundamental influence, including localization, topology discovery, naming and routing, information discovery and brokerage, and mobility, in a harsh environment in which sensor networks operate:inaccuracy or unavailability in location information, limited resources such as bandwidth and energy, noisy or incomplete data, preferable distributed computation and high link or node failure rate. Expected results include the exploration of new models and abstractions, and novel geometric algorithms with provable performance guarantee in a probabilistic, dynamics and information incomplete world.

We expect that the exploration of sensor networks’ rich geometric properties will reveal key insights that enable the creation of large-scale sensor networks. In addition, this interdisciplinary project provides a bridge between communities of computational geometry and wireless networking – identifying important geometric problems for the computational geometry community as well as supplying efficient algorithmic solutions for the networking community.

This project was carried out from 2007-2010, supported by NSF.