Guide to Approximate Inference
Variational Inference
The API defines the set of algorithms and methods used to perform inference in a probabilistic model \(p(x,z,\theta)\) (where \(x\) are the observations, \(z\) the local hidden variables, and \(\theta\) the global parameters of the model). More precisely, the inference problem reduces to computing the posterior probability over the latent variables given a data sample, i.e., \(p(z,\theta | x_{train})\), because from these posteriors we can uncover the hidden structure in the data. Let us consider the following model:
@inf.probmodel
def pca(k, d):
    w = inf.Normal(loc=tf.zeros([k, d]), scale=1, name="w")  # shape = [k, d]
    with inf.datamodel():
        z = inf.Normal(tf.ones([k]), 1, name="z")  # shape = [N, k]
        x = inf.Normal(z @ w, 1, name="x")  # shape = [N, d]
In this model, the posterior over the local hidden variables, \(p(z_n|x_{train})\), encodes the latent vector representation of the sample \(x_n\), while the posterior over the global variables, \(p(w|x_{train})\), reveals the affine transformation between the latent and the observable spaces.
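As a quick shape check, the generative step of this model can be sketched in plain Python (no InferPy required); the sizes N, k and d and the Gaussian draws below are illustrative only:

```python
import random

N, k, d = 5, 1, 2  # illustrative sizes: N samples, latent dim k, observed dim d

# w ~ Normal(0, 1), shape [k, d]: the global transformation between spaces
w = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
# z ~ Normal(1, 1), shape [N, k]: one latent vector per sample
z = [[random.gauss(1.0, 1.0) for _ in range(k)] for _ in range(N)]

# the mean of x is z @ w, which has shape [N, d]
x_mean = [[sum(z[n][i] * w[i][j] for i in range(k)) for j in range(d)]
          for n in range(N)]
```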
InferPy inherits Edward’s approach and considers approximate inference solutions,
in which the task is to approximate the posterior \(p(z,\theta | x_{train})\) using a family of distributions, \(q(z,\theta; \lambda)\), indexed by a parameter vector \(\lambda\).
For doing inference, we must define a model ‘Q’ for approximating the posterior distribution. This is also done by defining a function decorated with @inf.probmodel:
@inf.probmodel
def qmodel(k, d):
    qw_loc = inf.Parameter(tf.ones([k, d]), name="qw_loc")
    qw_scale = tf.math.softplus(inf.Parameter(tf.ones([k, d]), name="qw_scale"))
    qw = inf.Normal(qw_loc, qw_scale, name="w")
    with inf.datamodel():
        qz_loc = inf.Parameter(tf.ones([k]), name="qz_loc")
        qz_scale = tf.math.softplus(inf.Parameter(tf.ones([k]), name="qz_scale"))
        qz = inf.Normal(qz_loc, qz_scale, name="z")
In the ‘Q’ model we should include a q distribution for each non-observed variable in the ‘P’ model. These variables are also objects of class inferpy.RandomVariable. However, their parameters might be of type inf.Parameter, which are objects encapsulating TensorFlow trainable variables.
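The softplus wrapping of the scale parameters matters because a Normal scale must be strictly positive, while an inf.Parameter is unconstrained. A minimal plain-Python sketch of the transformation (mirroring what tf.math.softplus computes):

```python
import math

def softplus(x):
    # softplus(x) = log(1 + e^x): smooth and strictly positive for any real x
    return math.log1p(math.exp(x))

# an unconstrained trainable parameter may drift negative during optimization...
raw = -3.0
# ...but its softplus image is always a valid (positive) scale
scale = softplus(raw)
```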
Then, we set the parameters of the inference algorithm. In the case of variational inference (VI), we must specify an instance of the ‘Q’ model and the number of epochs (i.e., iterations). For example:
# set the inference algorithm
VI = inf.inference.VI(qmodel(k=1,d=2), epochs=1000)
VI can be further configured by setting the parameter optimizer, which indicates the TensorFlow optimizer to be used (AdamOptimizer by default).
Stochastic Variational Inference (SVI) is similarly specified but has an additional input parameter for setting the batch size:
SVI = inf.inference.SVI(qmodel(k=1,d=2), epochs=1000, batch_size=200)
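Under SVI, each epoch processes the training data in mini-batches of the given size rather than all at once. The batching itself can be sketched in plain Python (the dataset size of 1000 rows is illustrative, not taken from the example above):

```python
batch_size = 200
x_train = list(range(1000))  # stand-in for 1000 training rows

# split the data into consecutive mini-batches of batch_size rows;
# SVI-style inference updates the parameters once per mini-batch
batches = [x_train[i:i + batch_size]
           for i in range(0, len(x_train), batch_size)]
```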
Then we must instantiate the ‘P’ model and fit the data with the inference algorithm previously defined.
# create an instance of the model
m = pca(k=1,d=2)
# run the inference
m.fit({"x": x_train}, VI)
The output generated will be similar to:
0 epochs 44601.14453125....................
200 epochs 44196.98046875....................
400 epochs 50616.359375....................
600 epochs 41085.6484375....................
800 epochs 30349.79296875....................
Finally, we can access the parameters of the posterior distributions:
>>> m.posterior("w").parameters()
{'name': 'w',
'allow_nan_stats': True,
'validate_args': False,
'scale': array([[0.9834974 , 0.99731755]], dtype=float32),
'loc': array([[1.7543027, 1.7246702]], dtype=float32)}
Custom Loss function
Following InferPy's guiding principles, users can further configure the inference algorithm. For example, we might be interested in defining our own function to minimize when using VI. As an example, we define the following function, which takes as input parameters the random variables of the P and Q models (we assume that their sample sizes are consistent with the plates in the model). Note that the output of this function must be a tensor.
def custom_elbo(pvars, qvars, **kwargs):
    # compute energy
    energy = tf.reduce_sum([tf.reduce_sum(p.log_prob(p.value)) for p in pvars.values()])
    # compute entropy
    entropy = - tf.reduce_sum([tf.reduce_sum(q.log_prob(q.value)) for q in qvars.values()])
    # compute ELBO
    ELBO = energy + entropy
    # This function will be minimized. Return minus ELBO
    return -ELBO
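To see the sign convention in custom_elbo with concrete numbers, here is the same arithmetic on toy scalar log-probabilities (the values below are made up and stand in for the tf.reduce_sum terms):

```python
# stand-ins for tf.reduce_sum(p.log_prob(p.value)) over the P-model variables
p_log_probs = [-3.2, -1.5]
# stand-ins for the corresponding Q-model log-probabilities
q_log_probs = [-2.0, -0.7]

energy = sum(p_log_probs)    # sum of P log-probs
entropy = -sum(q_log_probs)  # minus the sum of Q log-probs
elbo = energy + entropy
loss = -elbo                 # optimizers minimize, so minimizing -ELBO maximizes the ELBO
```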
In order to use our defined loss function, we simply have to pass it to the input parameter loss in the inference method constructor. For example:
# set the inference algorithm
VI = inf.inference.VI(qmodel(k=1,d=2), loss=custom_elbo, epochs=1000)
# run the inference
m.fit({"x": x_train}, VI)
After this, the rest of the code remains unchanged.
Markov Chain Monte Carlo
Relying on Edward functionality, Markov Chain Monte Carlo (MCMC) is also available for doing inference on InferPy models. To this end, an object of class inf.inference.MCMC is created and passed to the model when fitting the data. Unlike variational inference, no ‘Q’ model is created for doing inference.
# set the inference algorithm
MC = inf.inference.MCMC()
# run the inference
m.fit({"x": x_train}, MC)
Now the posterior is represented as a set of samples. So we might need to aggregate them, e.g., using the mean:
# extract the posterior of z
hidden_encoding = m.posterior("z").parameters()["samples"].mean(axis=0)
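The axis-0 mean above averages over the sample dimension of the posterior samples. In plain Python the same aggregation looks like this (three made-up samples of a 2-dimensional z, not actual MCMC output):

```python
# hypothetical MCMC samples of z, shape [num_samples, dim]
samples = [[0.9, 1.2],
           [1.1, 0.8],
           [1.0, 1.0]]

# mean over axis 0 (the sample axis), as in samples.mean(axis=0)
z_mean = [sum(col) / len(samples) for col in zip(*samples)]
```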
Queries
The syntax of queries allows using probabilistic models by specifying a type of knowledge: prior, posterior or posterior predictive. That means that, for example, we can generate new instances from the prior knowledge (using the initial model definition), or from the posterior/posterior predictive knowledge (once the model has been trained using input data).
There are two well-differentiated parts: the query definition and the action function. The action functions can be applied on Query objects to:

- sample: samples new data.
- log_prob: computes the log prob given some evidence (observed variables).
- sum_log_prob: the same as log_prob, but computes the sum of the log prob for all the variables in the probabilistic model.
- parameters: returns the parameters of the random variables (e.g., loc and scale for Normal distributions).
Building Query objects
Given a probabilistic model object, e.g. model, we can build Query objects by calling the prior(), posterior() or posterior_predictive() methods of the probmodel class. All these accept the same two arguments:

- target_names: a string or list of strings corresponding to random variable names. These random variables are the targets of the queries (in other words, the random variables that we want to use when calling an action).
- data: a dict whose keys are the names of the random variables and whose values are the observed data for those random variables. By default, it is an empty dict.
Each function is defined as follows:

- prior(): returns Query objects that use the random variables initially defined in the model when applying the actions. It just uses prior knowledge and can be invoked once the model object is created.
- posterior(): returns Query objects that use the expanded random variables defined and fitted after the training process. It uses the posterior knowledge and can be invoked only after calling the fit function. The target variables allowed are those not observed during the training process.
- posterior_predictive(): similar to posterior(), but the target variables permitted are those observed during the training process.
Action functions
Action functions allow getting the desired information from the Query objects. As described before, there are four functions:

- sample(size): generates size instances (by default size=1). It returns a dict where the keys are the random variable names and the values are the sampled data. If there is only one target name, only the sampled data is returned.
- log_prob(): computes the log prob given the evidence specified in the Query object. It returns a dict where the keys are the random variable names and the values are the log probs. If there is only one target name, only the log prob is returned.
- sum_log_prob(): the same as log_prob, but computes the sum of the log prob for all the variables in the probabilistic model.
- parameters(names): returns the parameters of the random variables. If names is None (the default), it returns all the parameters of all the random variables. If names is a string or a list of strings corresponding to parameter names, it returns the parameters of the random variables that match any name provided in the names argument. It returns a dict where the keys are the random variable names and the values are dicts of parameters (parameter name: parameter value). If there is only one target name, only the dict of parameters for that random variable is returned.
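The filtering behavior of parameters(names) can be sketched with plain dicts (the stored values and the helper below are hypothetical, not InferPy's internal representation):

```python
# hypothetical parameters of two Normal random variables
stored = {
    "w": {"loc": 0.5, "scale": 1.2},
    "z": {"loc": 0.0, "scale": 0.9},
}

def parameters(names=None):
    # None -> all parameters of all variables;
    # a string or list of parameter names -> only the matching entries
    if names is None:
        return stored
    if isinstance(names, str):
        names = [names]
    return {rv: {k: v for k, v in params.items() if k in names}
            for rv, params in stored.items()}
```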
Example
The following example illustrates the usage of queries.
import inferpy as inf
import tensorflow as tf

@inf.probmodel
def linear_reg(d):
    w0 = inf.Normal(0, 1, name="w0")
    w = inf.Normal(tf.zeros([d, 1]), 1, name="w")
    with inf.datamodel():
        x = inf.Normal(tf.ones(d), 2, name="x")
        y = inf.Normal(w0 + x @ w, 1.0, name="y")

m = linear_reg(2)

# Generate 100 samples of the x and y random variables,
# with the random variables w and w0 observed
data = m.prior(["x", "y"], data={"w0": 0, "w": [[2], [1]]}).sample(100)

# Define the qmodel and train
@inf.probmodel
def qmodel(d):
    qw0_loc = inf.Parameter(0., name="qw0_loc")
    qw0_scale = tf.math.softplus(inf.Parameter(1., name="qw0_scale"))
    qw0 = inf.Normal(qw0_loc, qw0_scale, name="w0")
    qw_loc = inf.Parameter(tf.zeros([d, 1]), name="qw_loc")
    qw_scale = tf.math.softplus(inf.Parameter(tf.ones([d, 1]), name="qw_scale"))
    qw = inf.Normal(qw_loc, qw_scale, name="w")

x_train = data["x"]
y_train = data["y"]

# set and run the inference
VI = inf.inference.VI(qmodel(2), epochs=10000)
m.fit({"x": x_train, "y": y_train}, VI)

# Obtain the parameters of the hidden variables (after training)
m.posterior(["w", "w0"]).parameters()

# Generate new samples from the posterior predictive distribution of x and y
post_data = m.posterior_predictive(["x", "y"]).sample()

# Check the log prob of the hidden variables, given the posterior sampled data
m.posterior(data=post_data).log_prob()