Multi-Armed Bandits: Combinatorial/Submodular/Gaussian:


Discrete Submodular Bandits:

We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline approximation algorithms into sub-linear regret methods that only require bandit feedback. The framework only requires the offline algorithms to be robust to small errors in function evaluation. The adaptation procedure does not even require explicit knowledge of the offline approximation algorithm -- the offline algorithm can be used as a black-box subroutine. To demonstrate its utility, the framework is applied to several problems in submodular maximization, adapting approximation algorithms for cardinality and knapsack constraints, as well as to non-submodular maximization. Further, we study the impact of delayed composite anonymous feedback on submodular bandits with bandit feedback, and show that the delay adds an additive term to the achievable regret bound.
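To illustrate the idea, here is a minimal, hypothetical sketch (not the paper's actual procedure) of wrapping a black-box offline greedy algorithm for a cardinality constraint so that it only sees averaged bandit feedback; the function names `greedy_cardinality`, `bandit_adaptation`, and `noisy_reward` are illustrative assumptions.

```python
import numpy as np

def greedy_cardinality(value_oracle, ground_set, k):
    """Standard greedy for cardinality-constrained submodular maximization;
    it only needs (approximate) value-oracle access."""
    selected = set()
    for _ in range(k):
        gains = {e: value_oracle(selected | {e}) - value_oracle(selected)
                 for e in ground_set - selected}
        selected.add(max(gains, key=gains.get))
    return selected

def bandit_adaptation(noisy_reward, ground_set, k, samples_per_query=50, rng=None):
    """Explore-then-commit style wrapper: every value-oracle call made by the
    black-box offline algorithm is answered by averaging repeated noisy bandit
    pulls of the same set, so the offline routine only sees small evaluation
    errors (the robustness the framework requires)."""
    rng = rng or np.random.default_rng()
    def averaged_oracle(S):
        if not S:
            return 0.0
        return float(np.mean([noisy_reward(S) for _ in range(samples_per_query)]))
    # The offline algorithm is used purely as a black box.
    return greedy_cardinality(averaged_oracle, ground_set, k)

# Toy usage: a noisy coverage-style submodular reward (hypothetical example).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.random(10)
    def noisy_reward(S):
        return weights[list(S)].sum() ** 0.5 + rng.normal(scale=0.1)
    print(bandit_adaptation(noisy_reward, set(range(10)), k=3, rng=rng))
```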


Continuous Submodular Bandits:

Discrete submodular bandits assume that the choice of each arm is binary. Continuous submodular bandits instead consider N actions, each taking values in a continuous range and jointly constrained by a convex set. We present a unified approach for maximizing continuous DR-submodular functions that encompasses a range of settings and oracle access types. Our approach includes a Frank-Wolfe type offline algorithm for both monotone and non-monotone functions, with different restrictions on the general convex set. We consider settings where the oracle provides access to either the gradient of the function or only the function value, and where the oracle access is either deterministic or stochastic. We are working on both stochastic and adversarial setups.
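The following is a minimal sketch of the Frank-Wolfe (continuous-greedy) template for monotone DR-submodular maximization with a stochastic gradient oracle, assuming for illustration a simple budgeted-box constraint set; the names `lmo_budget`, `frank_wolfe_dr_submodular`, and `stoch_grad` are assumptions, not the paper's interface.

```python
import numpy as np

def lmo_budget(grad, budget):
    """Linear maximization oracle over K = {x in [0,1]^n : sum(x) <= budget}:
    put full mass on the coordinates with the largest positive gradient."""
    v = np.zeros_like(grad)
    for idx in np.argsort(-grad)[:int(budget)]:
        if grad[idx] > 0:
            v[idx] = 1.0
    return v

def frank_wolfe_dr_submodular(stoch_grad, n, budget, iters=100, batch=20, rng=None):
    """Frank-Wolfe / continuous-greedy sketch for monotone DR-submodular
    maximization: average a small batch of noisy gradients, take a linear
    maximization step, and move by 1/iters so the iterate stays feasible."""
    rng = rng or np.random.default_rng()
    x = np.zeros(n)
    for _ in range(iters):
        g = np.mean([stoch_grad(x) for _ in range(batch)], axis=0)
        v = lmo_budget(g, budget)
        x = x + v / iters
    return x

# Toy usage: noisy gradients of a monotone concave (hence DR-submodular) function.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.random(8)
    def stoch_grad(x):
        return w / (2.0 * np.sqrt(w @ x + 1.0)) + rng.normal(scale=0.05, size=8)
    print(frank_wolfe_dr_submodular(stoch_grad, n=8, budget=3, rng=rng))
```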


Combinatorial Bandits:

We consider the bandit problem of selecting K out of N arms at each time step. The reward can be a nonlinear function of the rewards of the selected individual arms. Directly applying a multi-armed bandit algorithm requires choosing among N-choose-K options, making the action space prohibitively large. To simplify the problem, existing works on combinatorial bandits typically assume the feedback is a linear function of individual rewards. We prove a lower bound for top-K subset selection with bandit feedback and possibly correlated rewards. We present novel algorithms for the combinatorial setting that do not use individual arm feedback or require linearity of the reward function, and that handle correlated rewards across individual arms. These works assume that good arms generate good actions, which is valid in multiple setups. Even though the guarantees need this assumption, we also validate the results on social influence maximization, where the assumption may not hold.
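As a rough illustration of algorithms that use only set-level feedback under the "good arms generate good actions" assumption, here is a hypothetical explore-then-commit sketch (not the paper's algorithm); `play_subset` and `etc_top_k` are illustrative names.

```python
import numpy as np

def etc_top_k(play_subset, n, k, explore_rounds=500, rng=None):
    """Explore-then-commit sketch: during exploration, play random K-subsets
    and credit each arm with the average reward of the subsets that contained
    it; then commit to the K arms with the highest credited score.  Only
    set-level (bandit) feedback is used, never individual-arm rewards."""
    rng = rng or np.random.default_rng()
    sums = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(explore_rounds):
        subset = rng.choice(n, size=k, replace=False)
        r = play_subset(subset)
        sums[subset] += r
        counts[subset] += 1
    scores = sums / np.maximum(counts, 1)
    return np.argsort(-scores)[:k]

# Toy usage: a nonlinear (max-style) reward over possibly correlated arms.
if __name__ == "__main__":
    rng = np.random.default_rng(2)
    means = rng.random(12)
    def play_subset(subset):
        draws = means[subset] + 0.3 * rng.normal(size=len(subset))
        return float(np.max(draws))  # nonlinear in the individual rewards
    print(sorted(etc_top_k(play_subset, n=12, k=4, rng=rng)))
```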


Gaussian Process Bandits:

Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and has gained prominence for decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem, where the payoff function is sampled from a Gaussian process (GP). Further, we focus on action selection via upper confidence bound (UCB) or expected improvement (EI) due to their prevalent use in practice. We derive sublinear regret bounds for GP bandit algorithms, up to factors depending on the compression parameter, for both discrete and continuous action sets. Experimentally, we observe state-of-the-art accuracy and complexity tradeoffs for GP bandit algorithms applied to global optimization, suggesting the merits of compressed GPs in bandit settings.
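For concreteness, below is a minimal GP-UCB sketch over a discrete candidate set, assuming a plain (uncompressed) GP posterior with an RBF kernel; the compression the text refers to would additionally prune past observations, which is not implemented here. The names `gp_ucb`, `rbf_kernel`, `beta`, and `noise` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_ucb(objective, candidates, rounds=30, beta=2.0, noise=1e-2, rng=None):
    """GP-UCB sketch: at each round, compute the GP posterior from past
    (x, y) pairs and play the candidate maximizing mean + sqrt(beta) * std."""
    rng = rng or np.random.default_rng()
    X, y = [], []
    for _ in range(rounds):
        if not X:
            idx = int(rng.integers(len(candidates)))
        else:
            Xa = np.array(X)
            K = rbf_kernel(Xa, Xa) + noise * np.eye(len(X))
            Ks = rbf_kernel(candidates, Xa)
            mean = Ks @ np.linalg.solve(K, np.array(y))
            var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
            idx = int(np.argmax(mean + np.sqrt(beta) * np.sqrt(np.maximum(var, 0))))
        x = candidates[idx]
        X.append(x)
        y.append(objective(x))
    best = int(np.argmax(y))
    return X[best], y[best]

# Toy usage: noisy 1-D objective evaluated on a grid of candidate actions.
if __name__ == "__main__":
    rng = np.random.default_rng(3)
    grid = np.linspace(0, 1, 101)[:, None]
    f = lambda x: float(np.sin(6 * x[0]) - (x[0] - 0.7) ** 2 + 0.05 * rng.normal())
    print(gp_ucb(f, grid, rng=rng))
```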
