Multi-Armed Bandits: Combinatorial/Submodular/Gaussian:


Discrete Submodular Bandits:

We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline approximation algorithms into sub-linear regret methods that only require bandit feedback. The framework only requires the offline algorithms to be robust to small errors in function evaluation. The adaptation procedure does not even require explicit knowledge of the offline approximation algorithm -- the offline algorithm can be used as a black-box subroutine. To demonstrate its utility, the framework is applied to several problems in submodular maximization, adapting approximation algorithms for cardinality and knapsack constraints, as well as to non-submodular maximization. Further, we study the impact of delayed composite anonymous feedback on submodular bandits with bandit feedback, and show that the delay adds an additive term to the achievable regret bound.
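To illustrate the idea, here is a minimal, hypothetical sketch (not the paper's actual procedure) of wrapping a black-box offline greedy algorithm for a cardinality constraint so that it only sees averaged bandit feedback; the function names `greedy_cardinality`, `bandit_adaptation`, and `noisy_reward` are illustrative assumptions.

```python
import numpy as np

def greedy_cardinality(value_oracle, ground_set, k):
    """Standard greedy for cardinality-constrained submodular maximization;
    it only needs (approximate) value-oracle access."""
    selected = set()
    for _ in range(k):
        gains = {e: value_oracle(selected | {e}) - value_oracle(selected)
                 for e in ground_set - selected}
        selected.add(max(gains, key=gains.get))
    return selected

def bandit_adaptation(noisy_reward, ground_set, k, samples_per_query=50, rng=None):
    """Explore-then-commit style wrapper: every value-oracle call made by the
    black-box offline algorithm is answered by averaging repeated noisy bandit
    pulls of the same set, so the offline routine only sees small evaluation
    errors (the robustness the framework requires)."""
    rng = rng or np.random.default_rng()
    def averaged_oracle(S):
        if not S:
            return 0.0
        return float(np.mean([noisy_reward(S) for _ in range(samples_per_query)]))
    # The offline algorithm is used purely as a black box.
    return greedy_cardinality(averaged_oracle, ground_set, k)

# Toy usage: a noisy coverage-style submodular reward (hypothetical example).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.random(10)
    def noisy_reward(S):
        return weights[list(S)].sum() ** 0.5 + rng.normal(scale=0.1)
    print(bandit_adaptation(noisy_reward, set(range(10)), k=3, rng=rng))
```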


Continuous Submodular Bandits:

Discrete submodular bandits assume that the choice of each arm is binary. Continuous submodular bandits instead consider N actions, each taking values in a continuous range and jointly constrained by a convex set. We present a unified approach for maximizing continuous DR-submodular functions that encompasses a range of settings and oracle access types. Our approach includes a Frank-Wolfe type offline algorithm for both monotone and non-monotone functions, with different restrictions on the general convex set. We consider settings where the oracle provides access to either the gradient of the function or only the function value, and where the oracle access is either deterministic or stochastic. We are working on both stochastic and adversarial setups.
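The following is a minimal sketch of the Frank-Wolfe (continuous-greedy) template for monotone DR-submodular maximization with a stochastic gradient oracle, assuming for illustration a simple budgeted-box constraint set; the names `lmo_budget`, `frank_wolfe_dr_submodular`, and `stoch_grad` are assumptions, not the paper's interface.

```python
import numpy as np

def lmo_budget(grad, budget):
    """Linear maximization oracle over K = {x in [0,1]^n : sum(x) <= budget}:
    put full mass on the coordinates with the largest positive gradient."""
    v = np.zeros_like(grad)
    for idx in np.argsort(-grad)[:int(budget)]:
        if grad[idx] > 0:
            v[idx] = 1.0
    return v

def frank_wolfe_dr_submodular(stoch_grad, n, budget, iters=100, batch=20, rng=None):
    """Frank-Wolfe / continuous-greedy sketch for monotone DR-submodular
    maximization: average a small batch of noisy gradients, take a linear
    maximization step, and move by 1/iters so the iterate stays feasible."""
    rng = rng or np.random.default_rng()
    x = np.zeros(n)
    for _ in range(iters):
        g = np.mean([stoch_grad(x) for _ in range(batch)], axis=0)
        v = lmo_budget(g, budget)
        x = x + v / iters
    return x

# Toy usage: noisy gradients of a monotone concave (hence DR-submodular) function.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.random(8)
    def stoch_grad(x):
        return w / (2.0 * np.sqrt(w @ x + 1.0)) + rng.normal(scale=0.05, size=8)
    print(frank_wolfe_dr_submodular(stoch_grad, n=8, budget=3, rng=rng))
```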


Combinatorial Bandits:

We consider the bandit problem of selecting K out of N arms at each time step. The reward can be a nonlinear function of the rewards of the selected individual arms. Directly applying a multi-armed bandit algorithm requires choosing among N-choose-K options, making the action space prohibitively large. To simplify the problem, existing works on combinatorial bandits typically assume the feedback is a linear function of individual rewards. We prove a lower bound for top-K subset selection with bandit feedback and possibly correlated rewards. We present novel algorithms for the combinatorial setting that do not use individual arm feedback or require linearity of the reward function, and that handle correlated rewards across individual arms. These works assume that good arms generate good actions, which is valid in multiple setups. Even though the guarantees need this assumption, we also validate the results on social influence maximization, where the assumption may not hold.
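As a rough illustration of algorithms that use only set-level feedback under the "good arms generate good actions" assumption, here is a hypothetical explore-then-commit sketch (not the paper's algorithm); `play_subset` and `etc_top_k` are illustrative names.

```python
import numpy as np

def etc_top_k(play_subset, n, k, explore_rounds=500, rng=None):
    """Explore-then-commit sketch: during exploration, play random K-subsets
    and credit each arm with the average reward of the subsets that contained
    it; then commit to the K arms with the highest credited score.  Only
    set-level (bandit) feedback is used, never individual-arm rewards."""
    rng = rng or np.random.default_rng()
    sums = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(explore_rounds):
        subset = rng.choice(n, size=k, replace=False)
        r = play_subset(subset)
        sums[subset] += r
        counts[subset] += 1
    scores = sums / np.maximum(counts, 1)
    return np.argsort(-scores)[:k]

# Toy usage: a nonlinear (max-style) reward over possibly correlated arms.
if __name__ == "__main__":
    rng = np.random.default_rng(2)
    means = rng.random(12)
    def play_subset(subset):
        draws = means[subset] + 0.3 * rng.normal(size=len(subset))
        return float(np.max(draws))  # nonlinear in the individual rewards
    print(sorted(etc_top_k(play_subset, n=12, k=4, rng=rng)))
```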


Gaussian Process Bandits:

Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and has gained prominence for decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem, where the payoff function is sampled from a Gaussian process (GP). Further, we focus on action selection via upper confidence bound (UCB) or expected improvement (EI) due to their prevalent use in practice. We derive sublinear regret bounds for GP bandit algorithms, up to factors depending on the compression parameter, for both discrete and continuous action sets. Experimentally, we observe state-of-the-art accuracy and complexity tradeoffs for GP bandit algorithms applied to global optimization, suggesting the merits of compressed GPs in bandit settings.
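For concreteness, below is a minimal GP-UCB sketch over a discrete candidate set, assuming a plain (uncompressed) GP posterior with an RBF kernel; the compression the text refers to would additionally prune past observations, which is not implemented here. The names `gp_ucb`, `rbf_kernel`, `beta`, and `noise` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_ucb(objective, candidates, rounds=30, beta=2.0, noise=1e-2, rng=None):
    """GP-UCB sketch: at each round, compute the GP posterior from past
    (x, y) pairs and play the candidate maximizing mean + sqrt(beta) * std."""
    rng = rng or np.random.default_rng()
    X, y = [], []
    for _ in range(rounds):
        if not X:
            idx = int(rng.integers(len(candidates)))
        else:
            Xa = np.array(X)
            K = rbf_kernel(Xa, Xa) + noise * np.eye(len(X))
            Ks = rbf_kernel(candidates, Xa)
            mean = Ks @ np.linalg.solve(K, np.array(y))
            var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
            idx = int(np.argmax(mean + np.sqrt(beta) * np.sqrt(np.maximum(var, 0))))
        x = candidates[idx]
        X.append(x)
        y.append(objective(x))
    best = int(np.argmax(y))
    return X[best], y[best]

# Toy usage: noisy 1-D objective evaluated on a grid of candidate actions.
if __name__ == "__main__":
    rng = np.random.default_rng(3)
    grid = np.linspace(0, 1, 101)[:, None]
    f = lambda x: float(np.sin(6 * x[0]) - (x[0] - 0.7) ** 2 + 0.05 * rng.normal())
    print(gp_ucb(f, grid, rng=rng))
```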
