Title: Log-optimal Investment in Markovian Environments
Address: Budapest, Morgan Stanley Quantitative and Financial Mathematics Conference, 20-21 October, 2005
Joint work with: Antos András, László Gerencsér, Remi Munos, Miklós Rásonyi, Zsuzsanna Vágó
Downloads: handouts, slides
Title: Results on Fitted Value Iteration
Address: U. Alberta, RLAI Papers and Presentations, October 11, 2005
Joint work with: Antos András, Remi Munos
Abstract: For MDPs where the state space is large or infinite, value iteration that keeps a separate value for each state is not tractable. A straightforward idea is then to project the iterates onto a selected function space. This basic idea is the cornerstone of fitted value iteration algorithms. In this talk we will consider several variants of such iterative procedures and examine the conditions under which they can be proven useful for approximating optimal policies. In addition to looking at previous results where special function approximators (averagers) are used, we will discuss some recent results where fitting is done as in regression problems (minimizing mean-square error) and the function space is less restricted. We will explore the connections of this procedure to Monte-Carlo integration, function approximation and regression. We shall also discuss at some length the issue of how best to use a given fixed number of samples, where some open questions will be raised.
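As a rough illustration of the procedure described in the abstract, the following minimal sketch runs sampling-based fitted value iteration on a toy problem. The one-dimensional MDP (`step`), the piecewise-constant function space, and all constants are assumptions made for this example and are not taken from the talk or the associated paper. Each iteration estimates the Bellman backup at sampled states by Monte-Carlo averaging and then fits the next iterate by least squares, which for piecewise-constant functions reduces to per-bin means.

```python
import random

GAMMA = 0.9             # discount factor (assumed for this example)
ACTIONS = (-0.1, 0.1)   # hypothetical actions: step left or step right
N_BINS = 10             # size of the piecewise-constant function space


def step(s, a, rng):
    """Hypothetical generative model on [0, 1]: noisy move, reward = next state."""
    s_next = min(1.0, max(0.0, s + a + rng.gauss(0.0, 0.02)))
    return s_next, s_next


def bin_index(s):
    return min(N_BINS - 1, int(s * N_BINS))


def evaluate(weights, s):
    """Value of state s under the current piecewise-constant iterate."""
    return weights[bin_index(s)]


def backup_target(weights, s, rng, n_next):
    """Monte-Carlo estimate of max_a E[r + GAMMA * V(s')] at state s."""
    best = float("-inf")
    for a in ACTIONS:
        total = 0.0
        for _ in range(n_next):
            s_next, r = step(s, a, rng)
            total += r + GAMMA * evaluate(weights, s_next)
        best = max(best, total / n_next)
    return best


def fit(states, targets):
    """Least-squares fit over piecewise-constant functions: per-bin mean."""
    sums = [0.0] * N_BINS
    counts = [0] * N_BINS
    for s, y in zip(states, targets):
        b = bin_index(s)
        sums[b] += y
        counts[b] += 1
    # An empty bin falls back to 0.0 (a simplification for this sketch).
    return [sums[b] / counts[b] if counts[b] else 0.0 for b in range(N_BINS)]


def fitted_value_iteration(n_iters=50, n_states=200, n_next=10, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * N_BINS  # start from the zero function
    for _ in range(n_iters):
        states = [rng.random() for _ in range(n_states)]
        targets = [backup_target(weights, s, rng, n_next) for s in states]
        weights = fit(states, targets)
    return weights


if __name__ == "__main__":
    print(fitted_value_iteration())
```

The sample budget here is split between the number of base states (`n_states`) and the number of next-state draws per state and action (`n_next`); how to divide a fixed budget between the two is the kind of question the abstract raises.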
The associated paper is here.
Downloads: handouts, slides
Title: Universal parameter optimisation in games -- SPSA and related algorithms
Address: U. Alberta, Games Group Presentation, October 12, 2005
Joint work with: Levente Kocsis
Abstract: We will consider issues related to the optimisation of parameters of game programs using gradient methods. We start by mapping this problem into the domain of simulation optimisation. We briefly review the framework and the relevant classical gradient estimation procedures (the likelihood-ratio method and infinitesimal perturbation analysis), as well as the more recent gradient-free simultaneous perturbation stochastic approximation (SPSA) method. We compare the relative merits of these algorithms, focussing on their relevance to game parameter optimisation. Rather intriguingly, we find that there exist cases when the gradient-free method (SPSA) is more efficient than the likelihood-ratio method, which requires knowledge of the gradient. We continue by describing methods that can help improve the performance of SPSA. The proposed methods were tested in tuning the parameters of our Omaha Hi-Lo poker program, McRaise, and the LOA program MIA. We present the results of these experiments and discuss several possibilities for future research.
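For readers unfamiliar with SPSA, the basic update can be sketched as follows. This is a minimal illustration of the plain method only, not of the improved variants discussed in the talk. The objective `noisy_loss` is a hypothetical stand-in for a simulated game outcome, and the gain constants are assumptions chosen for this example (the decay exponents 0.602 and 0.101 are values commonly recommended in the SPSA literature). At every iteration all parameters are perturbed simultaneously by random signs, so the gradient estimate needs only two noisy evaluations regardless of the number of parameters.

```python
import random


def noisy_loss(theta, rng):
    """Hypothetical objective: noisy quadratic with its minimum at (1, -2)."""
    target = (1.0, -2.0)
    exact = sum((t - g) ** 2 for t, g in zip(theta, target))
    return exact + rng.gauss(0.0, 0.1)


def spsa_minimize(loss, theta0, n_iters=2000, a=0.2, c=0.2, big_a=100.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Plain SPSA; the gain constants are assumptions made for this sketch."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(n_iters):
        a_k = a / (k + 1 + big_a) ** alpha  # step-size sequence
        c_k = c / (k + 1) ** gamma          # perturbation-size sequence
        # One random sign per parameter, drawn afresh at each iteration.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two noisy evaluations give an estimate of every gradient component.
        diff = loss(plus, rng) - loss(minus, rng)
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


if __name__ == "__main__":
    print(spsa_minimize(noisy_loss, [0.0, 0.0]))
```

In a game-tuning setting the loss would presumably be replaced by the negated result of one or more simulated games played with the perturbed parameters; that substitution is an assumption here, not a description of the experiments reported in the talk.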
The associated paper can be found here.
Downloads: handouts, slides