[Talk #1] Introduction to discrete active inference

Martin Biehl (Araya Co.)

In this talk I will give an overview of the formal ingredients and assumptions of Friston’s active inference. Starting from a Markov chain, I will clarify the relation of the “standard” or “internal” formulation of active inference to the “external” free energy principle associated with Markov blankets. The internal version can be seen as an approach to reinforcement learning for partially observable Markov decision processes (POMDPs), i.e. POMDPs whose transition function is not known to the agent. The external version claims that a particular dependency structure (called a Markov blanket) between subsets of variables in a dynamical system always leads to one subset “appearing” to perform Bayesian inference on another.

I will then present the internal active inference procedure in detail. This procedure relies on a powerful but computationally intractable graphical generative model. The model includes parameters for the transition function of the environment states and for the dependency of the sensor values on these states. The generative model then specifies a joint probability distribution over these parameters as well as all past and future actions, sensor values, and environment states. Conditioning on past taken actions, observed sensor values, and future actions yields an up-to-date posterior (we call it the active posterior), which represents what the model predicts about the consequences of future actions. A policy can then be derived by evaluating its consequences according to some given criterion. The standard criterion in active inference is the expected free energy, but other choices are equally compatible with the overall approach; in the reinforcement learning setting, for example, the criterion is the expected sum of a particular sensor value called the reward. Obtaining the active posterior and choosing the optimal future action exactly are generally intractable problems. Active inference therefore proposes to turn both into a single optimization problem. This is achieved by introducing two new objects. The first is a variational active posterior that approximates the true active posterior; associated with it is an optimal policy, the variational optimal policy. The second is a policy that approximates this variational optimal policy. The single optimization procedure then minimizes the sum of two divergences: one between the variational active posterior and the true active posterior, and one between the approximate policy and the variational optimal policy.
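The ingredients above can be sketched for a toy discrete case. The sketch below uses the common discrete-state formulation (likelihood matrix A, action-conditioned transitions B, preferred observations C, and a one-step risk-plus-ambiguity decomposition of the expected free energy); all numerical values are illustrative assumptions, and an exact Bayesian update stands in for the variational approximations discussed in the talk.

```python
import numpy as np

# Illustrative 2-state, 2-observation, 2-action POMDP; all numbers
# are assumptions for this sketch, not taken from the talk.
A = np.array([[0.9, 0.2],        # A[o, s] = p(o | s), observation likelihood
              [0.1, 0.8]])
B = np.array([[[0.9, 0.1],       # B[a, s', s] = p(s' | s, a), transitions
               [0.1, 0.9]],
              [[0.1, 0.9],
               [0.9, 0.1]]])
C = np.array([0.8, 0.2])         # preferred distribution over observations

def bayes_update(prior, obs):
    """Exact posterior over hidden states after observing `obs`."""
    post = A[obs] * prior
    return post / post.sum()

def expected_free_energy(q_s, a):
    """One-step expected free energy G(a) = risk + ambiguity."""
    q_s_next = B[a] @ q_s                    # predicted next-state beliefs
    q_o = A @ q_s_next                       # predicted observations
    risk = np.sum(q_o * np.log(q_o / C))     # KL[q(o|a) || C]
    ambiguity = -np.sum(A * np.log(A), axis=0) @ q_s_next
    return risk + ambiguity

q_s = bayes_update(np.array([0.5, 0.5]), obs=0)
G = np.array([expected_free_energy(q_s, a) for a in (0, 1)])
policy = np.exp(-G) / np.exp(-G).sum()       # softmax over -G
```

Here the policy is read off directly from the expected free energies of the candidate actions; in the full procedure described above, this evaluation is itself folded into the single variational optimization.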

[Talk #3] The free energy principle: elucidating the basic mechanisms from perception to thought

Toshio Inui (Otemon Gakuin University)

The free energy principle is an important theory that can explain a wide range of brain functions. It is notable in particular as a theory that deals with subjectivity in the form of so-called posterior beliefs. The name “free energy principle” first appears in Friston et al. (2006). In 2009 the principle developed substantially with the introduction of the idea of active inference (Friston, 2009), which turned it into a theory capable of explaining many functions beyond perception. The theory regards the brain as, fundamentally, a predictor of all sensations. There are no motor commands as such; movement, too, is treated as a prediction (expectation) of sensation. The approach is essentially that of the “Bayesian brain”: based on incoming sensory signals, the brain infers the states of the external environment and of the internal environment (interoceptive systems such as the viscera). This is fundamentally Bayesian estimation, and the brain is assumed to perform Bayes-optimal inference through learning. In this talk I will show how functions such as perception, movement, emotion, decision making, and thought can be explained by the various relations derived from a single equation.

[Talk #4] Biological plausibility of variational free energy as a cost function for neural networks

Takuya Isomura (RIKEN Center for Brain Science)

This presentation comprises two parts. In the first part, I will review the free-energy principle proposed by Karl Friston. This theory aims to explain various functions and behaviors of neural networks and biological organisms in terms of the minimization of variational free energy, as a proxy for surprise. Variational free energy minimization provides a unified mathematical formulation of inference and learning in terms of self-organizing neural networks that function as Bayes-optimal encoders. Moreover, biological organisms can use the same cost function to control their surrounding environment by sampling predicted (i.e., preferred) inputs, a process known as active inference. The free-energy principle suggests that active inference and learning are mediated by changes in neural activity, synaptic strengths, and the behavior of an organism, all serving to minimize variational free energy. I will mathematically describe how neural and synaptic update rules are derived from variational free energy minimization.

In the second part, I will introduce our recent work. We consider a class of biologically plausible cost functions for neural networks, where the same cost function is minimized by both neural activity and plasticity. We show analytically that such cost functions can be cast as variational free energy under an implicit generative model. Our results suggest that any neural network minimizing its cost function implicitly minimizes variational free energy, indicating that variational free energy minimization is an apt explanation for a canonical neural network.
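As a concrete illustration of how neural and synaptic update rules both fall out of a single cost function, here is a minimal Gaussian predictive-coding sketch. The single-layer architecture, shapes, and learning rates are assumptions of this example, not the networks analyzed in the talk: neural activity x performs gradient descent on F (inference), and a Hebbian-like weight update performs gradient descent on the same F (learning).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear predictive-coding model: hidden cause x generates observation
# o ~ N(W x, I), with prior x ~ N(mu_prior, I). The Gaussian variational
# free energy, up to constants, is:
#   F = 0.5*||o - W x||^2 + 0.5*||x - mu_prior||^2
W = 0.1 * rng.normal(size=(4, 2))   # synaptic weights (assumed shapes)
mu_prior = np.zeros(2)
o = rng.normal(size=4)              # a fixed sensory observation

def free_energy(x, W):
    return 0.5 * np.sum((o - W @ x) ** 2) + 0.5 * np.sum((x - mu_prior) ** 2)

x = mu_prior.copy()
F0 = free_energy(x, W)
for _ in range(200):                # inference: neural activity descends F
    eps_o = o - W @ x               # sensory prediction error
    eps_x = x - mu_prior            # prior prediction error
    x += 0.1 * (W.T @ eps_o - eps_x)
W = W + 0.05 * np.outer(o - W @ x, x)   # learning: Hebbian-like step on the same F
F1 = free_energy(x, W)
```

Both updates are gradient steps on the identical quantity F, which is the sense in which activity and plasticity share one cost function.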

[Talk #5] In-vivo imaging of telencephalic neural activity in a closed-loop virtual reality environment reveals active inference in decision making

Hitoshi Okamoto (Laboratory for Neural Circuit Dynamics of Decision Making, RIKEN Center for Brain Science)

Selecting an appropriate behavioral choice from the available options, i.e. decision making, is essential for animals. We aimed to address this process directly by combining a closed-loop virtual reality system for head-tethered adult zebrafish with two-photon calcium imaging. Adult zebrafish expressing G-CaMP7 in excitatory neurons were trained to perform a visually based active avoidance task while neural activity was simultaneously imaged at cellular resolution. Furthermore, after learning had been established in the closed-loop condition, we suddenly removed the visual feedback to make the system open-loop. Non-negative matrix factorization revealed one ensemble of neurons whose activity was suppressed by the recognized backward movement of the landscape, and another ensemble whose activity was suppressed by reaching the goal compartment. The activities of both ensembles recovered over the course of trials under the open-loop condition. These results suggest that the two ensembles encode the prediction errors between the state represented by the actual sensory inputs and the favorable predicted state of a successful escape from danger, and that behaviors are chosen so as to minimize these errors. Our results support the view that adult zebrafish make decisions through active inference under the free energy principle, in which agents act to suppress prediction errors by bringing the internal representation of bottom-up sensory states into line with top-down predictions, and demonstrate that this basic principle of decision making is strongly conserved through evolution.

[Talk #6] Decoding animal behavioral strategies by inverse reinforcement learning

Naoki Honda (Research Center for Dynamic Living Systems, Graduate School of Biostudies, Kyoto University)

C. elegans cultivated at a constant temperature memorizes that cultivation temperature and, when placed on a thermal gradient, migrates toward it; conversely, worms that have experienced starvation at a constant temperature avoid that starvation temperature on the gradient. What behavioral strategy the worms follow, however, was entirely unknown. By tracking individual worms on a thermal gradient we obtained behavioral time-series data, and by inverse reinforcement learning we estimated the reward function as experienced by the worm. The results revealed that worms raised with sufficient food experience reward as a function of both the absolute temperature and the temporal derivative of temperature. The strategy based on this reward consists of two distinct modes: one accounts for efficient migration toward the cultivation temperature, and the other for movement along an isotherm at a constant temperature. Worms that had experienced starvation, in contrast, were found to avoid the starvation temperature using a reward that depends only on the absolute temperature. Furthermore, simulating worm behavior with the estimated reward reproduced thermotactic behavior, demonstrating the validity of the inverse reinforcement learning approach.
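The reward-estimation step can be sketched with a toy maximum-entropy inverse reinforcement learning example on a 1D “temperature” chain. The dynamics, the synthetic expert trajectories, the one-hot state features, and all hyperparameters below are assumptions of this illustration, not the actual worm-tracking analysis or its feature set.

```python
import numpy as np

# Toy maximum-entropy IRL on a 1D "temperature" chain of 5 states.
n_s, horizon, lr = 5, 15, 0.1
moves = (-1, 0, 1)                                   # cooler / stay / warmer

def step(s, a):
    return min(max(s + moves[a], 0), n_s - 1)

# Synthetic "expert": worms reared at temperature index 3 climb to it and stay.
expert_traj = [[min(s0 + t, 3) for t in range(horizon)] for s0 in range(4)]
phi = np.eye(n_s)                                    # one-hot state features
f_expert = np.mean([phi[traj].sum(0) for traj in expert_traj], axis=0)

theta = np.zeros(n_s)                                # reward weights, r(s) = theta[s]
for _ in range(300):
    r = phi @ theta
    # Backward pass: soft (max-ent) value iteration over the finite horizon.
    V = np.zeros(n_s)
    for _ in range(horizon):
        Q = np.stack([r + V[[step(s, a) for s in range(n_s)]]
                      for a in range(3)], axis=1)
        Qmax = Q.max(axis=1, keepdims=True)
        V = (Qmax + np.log(np.exp(Q - Qmax).sum(axis=1, keepdims=True)))[:, 0]
    pi = np.exp(Q - V[:, None])                      # soft-optimal policy
    # Forward pass: expected state-visitation counts under pi.
    d = np.array([0.25, 0.25, 0.25, 0.25, 0.0])      # expert start distribution
    f_learner = np.zeros(n_s)
    for _ in range(horizon):
        f_learner += d
        d_next = np.zeros(n_s)
        for s in range(n_s):
            for a in range(3):
                d_next[step(s, a)] += d[s] * pi[s, a]
        d = d_next
    theta += lr * (f_expert - f_learner)             # gradient: match features
```

With these synthetic demonstrations the recovered reward peaks at the “cultivation temperature” (state 3), mirroring how the estimated reward in the study explains migration toward the remembered temperature.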


[Talk #7] Considering the stepwise emergence of recognition through the deformation of temporal structure

[Talk #8] Thermodynamics of learning and recognition: what is a neural engine?