Machine Learning: Algorithms and Guarantees:


Reinforcement Learning with Non-Linear Function of Multiple Objectives:

Reinforcement learning algorithms such as DQN owe their success to the Markov decision process framework, in which maximizing the sum of rewards permits backward induction and reduces to the Bellman optimality equation. However, many real-world problems require optimizing an objective that is non-linear in the cumulative rewards, to which dynamic programming cannot be applied directly. For example, in a resource allocation problem, one objective is to maximize long-term fairness among the users. When a non-linear function of the sum of rewards is optimized, the problem loses its Markov nature. In this theme, we propose novel model-based and model-free algorithms with provable guarantees. As shown alongside, for a network traffic optimization problem, the proposed Actor-Critic (IFRL AC) and Policy Gradient (IFRL PG) approaches significantly outperform standard approaches that do not explicitly account for the non-linearity.
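
As a concrete illustration of the idea (a minimal sketch of our own, not the IFRL AC/PG algorithms), the snippet below applies a policy-gradient step to a proportionally fair objective, the sum of logarithms of per-user throughputs, on a toy two-user allocation problem. The chain rule weights each REINFORCE update by the derivative of the fairness function at the current throughput estimates; the rates, step sizes, and averaging constant are illustrative assumptions.

# Minimal sketch (not the IFRL algorithms): policy gradient for the
# proportionally fair objective f(J) = sum_u log(J_u), where J_u is user u's
# long-run average throughput. The chain rule gives
# grad f = sum_u (1/J_u) grad J_u, and each grad J_u is estimated with a
# REINFORCE score-function term. All numbers below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([1.0, 0.2])       # mean reward when user 0 / user 1 is served
theta = np.zeros(2)                # softmax logits over which user to serve
J = np.array([0.5, 0.5])           # running estimates of per-user throughput
lr, avg = 0.02, 0.01

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(50000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                  # serve one user this slot
    r = rng.exponential(rates[a])               # that user's realized reward
    served = np.eye(2)[a] * r                   # per-user reward vector
    J = (1 - avg) * J + avg * served            # update throughput estimates
    score = np.eye(2)[a] - probs                # grad_theta log pi(a)
    theta += lr * (served / J).sum() * score    # chain rule: f'(J) . grad J

# a policy maximizing total reward would always serve user 0; the fair
# objective drives the serving probabilities toward an even split
print("serving probabilities:", softmax(theta))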


Constrained Reinforcement Learning:

Many engineering problems have constraints, e.g., on power, bandwidth, or the amount of movement. In the presence of constraints, decision making is challenging since the optimal policy may no longer be deterministic. In this direction, we aim to develop efficient algorithms with provable guarantees that can handle constraints in decision making.
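
One standard way to handle such constraints, shown in the minimal sketch below (an illustration, not necessarily our algorithm), is a Lagrangian primal-dual scheme: the policy ascends the Lagrangian r - λc while the multiplier λ ascends on the constraint violation. For the illustrative numbers used here, the constrained optimum is a randomized policy (choosing the costly action about 37.5% of the time), which is exactly why deterministic policies can be suboptimal under constraints.

# Minimal primal-dual sketch for a constrained decision problem: maximize
# expected reward subject to an expected-cost budget. Actions, rewards,
# costs, and step sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
reward = np.array([1.0, 0.6])   # action 0 earns more ...
cost   = np.array([1.0, 0.2])   # ... but also consumes more (e.g., power)
budget = 0.5                    # constraint: expected cost <= 0.5
theta, lam = np.zeros(2), 0.0   # softmax logits and dual variable
avg_p = np.zeros(2)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(30000):
    p = softmax(theta)
    avg_p += p
    a = rng.choice(2, p=p)
    r, c = reward[a], cost[a]
    score = np.eye(2)[a] - p                    # grad_theta log pi(a)
    theta += 0.02 * (r - lam * c) * score       # primal: ascend the Lagrangian
    lam = max(0.0, lam + 0.005 * (c - budget))  # dual: ascend on violation

avg_p /= 30000
# the time-averaged policy is randomized and approximately meets the budget
print("time-averaged policy:", avg_p, "expected cost:", avg_p @ cost)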


Combinatorial Bandits:

Exploiting structure in the data across multiple dimensions can lead to efficient techniques for sampling and fingerprinting. The figure alongside shows how different network service attributes depend on time, spatial location, and data service quality measurements, thus motivating a tensor structure. Further, the map shows that the received signal strength indoors exhibits spatial dependence and that the signals from different APs are correlated, so adaptive sampling can reduce the number of fingerprints needed for localization.



Federated and Fog Learning:

Federated learning has generated significant interest, with nearly all works focused on a "star" topology where nodes/devices are each connected to a central server. We migrate away from this architecture and extend it through the network dimension to the case where there are multiple layers of nodes between the end devices and the server. Specifically, we develop multi-stage hybrid federated learning (MH-FL), a hybrid of intra- and inter-layer model learning that considers the network as a multi-layer cluster-based structure, each layer of which consists of multiple device clusters. MH-FL considers the topology structures among the nodes in the clusters, including local networks formed via device-to-device (D2D) communications. It orchestrates the devices at different network layers in a collaborative/cooperative manner (i.e., using D2D interactions) to form local consensus on the model parameters, and combines it with multi-stage parameter relaying between layers of the tree-shaped hierarchy.
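
The sketch below illustrates the flavor of this multi-stage aggregation (a simplified illustration, not the exact MH-FL protocol): devices in each bottom-layer cluster run a few rounds of D2D averaging over a ring topology, a cluster head relays the resulting consensus one layer up, and the procedure repeats until the server forms the global model. Cluster sizes, the number of consensus rounds, and the ring D2D topology are assumptions made for the example; with equal cluster sizes the hierarchical average matches what a star-topology FedAvg server would compute.

# Minimal sketch of multi-stage hybrid aggregation (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
dim = 4                                            # model parameter dimension

def d2d_consensus(models, rounds=10):
    """Cluster head's parameters after ring-topology D2D averaging rounds."""
    x = np.array(models, dtype=float)
    for _ in range(rounds):
        # each device mixes its parameters with its two ring neighbours
        x = (x + np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)) / 3.0
    return x[0]                                    # head's copy ≈ cluster average

# layer 1: device clusters (after local training; random values stand in here)
clusters = [[rng.normal(size=dim) for _ in range(5)] for _ in range(6)]
cluster_heads = [d2d_consensus(c) for c in clusters]      # intra-cluster stage

# layer 2: cluster heads themselves form clusters and repeat the procedure
mid_clusters = [cluster_heads[:3], cluster_heads[3:]]
mid_heads = [d2d_consensus(c) for c in mid_clusters]

global_model = np.mean(mid_heads, axis=0)                 # server aggregation
flat = np.mean([m for c in clusters for m in c], axis=0)  # star-topology FedAvg
print(np.allclose(global_model, flat, atol=1e-2))         # hierarchical ≈ flat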


Parallel Reinforcement Learning:

We consider M parallel reinforcement learning agents and wish to achieve the same regret as if they were fully collaborative, while communicating for only a logarithmic number of rounds. Though this seems impossible, we show that even with such limited communication, optimal regret bounds can be achieved. For the special case of bandits, we show that not only is the number of communication rounds logarithmic in the time horizon, but the agents also only need to share the index of the best arm, thus achieving better privacy among agents.
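
The snippet below illustrates the communication pattern only (a simplified illustration, not our algorithm or its guarantees): M agents run UCB1 independently on a common multi-armed bandit, synchronize at exponentially spaced times so the number of communication rounds grows logarithmically in the horizon, and at each synchronization broadcast nothing but the index of their empirically best arm, which the other agents then try once. The arm means, horizon, and synchronization schedule are illustrative assumptions.

# Minimal sketch of limited-communication parallel bandits (illustrative).
import numpy as np

rng = np.random.default_rng(3)
K, M, T = 10, 4, 20000
means = rng.uniform(0.2, 0.8, size=K)           # unknown Bernoulli arm means

counts = np.ones((M, K))                         # one initial pull per arm
sums = rng.binomial(1, means, size=(M, K)).astype(float)
forced = [set() for _ in range(M)]               # peer-recommended arms to try
sync_points = {2 ** j for j in range(1, int(np.log2(T)) + 1)}
comm_rounds = 0

for t in range(K, T):
    for m in range(M):
        if forced[m]:
            a = forced[m].pop()                  # honour a peer recommendation
        else:
            ucb = sums[m] / counts[m] + np.sqrt(2 * np.log(t) / counts[m])
            a = int(np.argmax(ucb))
        sums[m, a] += rng.binomial(1, means[a])
        counts[m, a] += 1
    if t in sync_points:                         # O(log T) communication rounds
        comm_rounds += 1
        best = [int(np.argmax(sums[m] / counts[m])) for m in range(M)]
        for m in range(M):
            forced[m].update(best)               # share only best-arm indices

print("communication rounds:", comm_rounds, "for horizon", T)
print("most-pulled arm per agent:", counts.argmax(axis=1),
      "true best arm:", int(means.argmax()))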


Gaussian Process Bandits:

Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and it has gained prominence for decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem where the payoff function is sampled from a Gaussian process (GP). We focus on action selection via the upper confidence bound (UCB) or expected improvement (EI) rules due to their prevalent use in practice. To control complexity, the GP posterior is compressed, and we derive sublinear regret bounds for the resulting GP bandit algorithms, up to factors depending on the compression parameter, for both discrete and continuous action sets. Experimentally, we observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization, suggesting the merits of compressed GPs in bandit settings.
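
A minimal GP-UCB loop is sketched below using an off-the-shelf GP regressor; it omits the compression machinery studied in this work, and the kernel, confidence schedule, and test function are illustrative choices rather than the settings used in the paper.

# Minimal GP-UCB sketch on a discrete action set (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)
f = lambda x: np.sin(3 * x) + 0.5 * np.cos(5 * x)        # unknown payoff
actions = np.linspace(0, 2, 200).reshape(-1, 1)          # discrete action set

X, y = [actions[0]], [f(actions[0, 0]) + 0.1 * rng.normal()]
for t in range(1, 40):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-2)
    gp.fit(np.vstack(X), np.array(y))
    mu, sigma = gp.predict(actions, return_std=True)
    beta = 2.0 * np.log(len(actions) * (t + 1) ** 2)     # confidence width
    a = actions[np.argmax(mu + np.sqrt(beta) * sigma)]   # UCB action choice
    X.append(a)
    y.append(f(a[0]) + 0.1 * rng.normal())               # noisy payoff

best = actions[np.argmax(f(actions[:, 0]))]
print("queried optimum ≈", X[int(np.argmax(y))][0], "true optimum ≈", best[0])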


Mean-Field Games:

We consider a multi-agent Markov strategic interaction over an infinite horizon where agents can be of multiple types. We model the strategic interaction as a mean-field game in the asymptotic limit when the number of agents of each type becomes infinite. Each agent has a private state; the state evolves depending on the agent's action and on the distribution of the states of the agents of the different types. Each agent wants to maximize the discounted sum of rewards over the infinite horizon, which depends on the agent's own state and on the state distributions of the different types. We seek to characterize and compute a stationary multi-type mean-field equilibrium (MMFE) in this game and characterize the conditions under which a stationary MMFE exists. We also extend the problem to a leader-follower setup with multiple leaders and multiple followers.
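
In its simplest single-type form (notation ours, omitting the multi-type indexing for brevity), a stationary mean-field equilibrium is a policy-distribution pair (π*, μ*) satisfying a simultaneous optimality and consistency fixed point:

\[
\pi^{*} \in \arg\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t, \mu^{*})\Big],
\qquad
\mu^{*}(s') = \sum_{s,a} \mu^{*}(s)\, \pi^{*}(a \mid s)\, P(s' \mid s, a, \mu^{*}) \quad \forall s'.
\]

The multi-type MMFE imposes these two conditions simultaneously for every type, with each type's reward and transition kernel depending on the state distributions of all types.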



Tensor Networks:

Tensors are generalizations of vectors and matrices; a vector is a first-order tensor and a matrix is a second-order tensor. Most of the data around us is better represented with multiple orders that capture correlations across different attributes. For example, a color image can be considered a third-order tensor, two of the dimensions (rows and columns) being spatial and the third being spectral (color), while a color video sequence can be considered a fourth-order tensor, with time as the fourth dimension besides the spatial and spectral ones. Similarly, a colored 3-D MRI image across time can be considered a fifth-order tensor. Exploiting this additional structure leads to better embedding algorithms for subspace analysis and reduces the number of observed elements needed for data completion (as shown alongside).
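
The short snippet below simply makes the orders concrete and shows why low-rank structure helps with completion (the shapes and the CP rank are illustrative assumptions): a rank-constrained factorization of a video-sized tensor stores orders of magnitude fewer numbers than the dense array, which is what makes recovery from few observed entries plausible.

# Tensor orders and the storage saving from a low-rank factorization
# (all shapes and the rank are illustrative).
import numpy as np

image_shape = (480, 640, 3)            # rows x columns x colour: order 3
video_shape = (100,) + image_shape     # time added as a fourth mode: order 4
mri_shape   = (20, 128, 128, 128, 3)   # time x three spatial modes x channel: order 5

# a rank-5 CP-style factorization of the video tensor stores one thin factor
# matrix per mode instead of the full dense array
rank = 5
dense_entries  = int(np.prod(video_shape))
factor_entries = sum(n * rank for n in video_shape)
print(dense_entries, "vs", factor_entries)   # ≈ 92 million vs ≈ 6 thousand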
