Guide to Approximate Inference

Variational Inference

The inference API defines the set of algorithms and methods used to perform inference in a probabilistic model \(p(x,z,\theta)\) (where \(x\) are the observations, \(z\) the local hidden variables, and \(\theta\) the global parameters of the model). More precisely, the inference problem reduces to computing the posterior probability over the latent variables given a data sample, i.e., \(p(z,\theta | x_{train})\), because from these posteriors we can uncover the hidden structure in the data. Let us consider the following model:

import inferpy as inf
import tensorflow as tf

@inf.probmodel
def pca(k,d):
    w = inf.Normal(loc=tf.zeros([k,d]), scale=1, name="w")      # shape = [k,d]
    with inf.datamodel():
        z = inf.Normal(tf.ones([k]), 1, name="z")               # shape = [N,k]
        x = inf.Normal(z @ w, 1, name="x")                      # shape = [N,d]

In this model, the posterior over the local hidden variables, \(p(z_n|x_{train})\), encodes the latent vector representation of the sample \(x_n\), while the posterior over the global variables, \(p(w|x_{train})\), reveals the affine transformation between the latent and the observable spaces.

InferPy inherits Edward’s approach and considers approximate inference solutions,

\[q(z,\theta) \approx p(z,\theta | x_{train})\]

in which the task is to approximate the posterior \(p(z,\theta | x_{train})\) using a family of distributions, \(q(z,\theta; \lambda)\), indexed by a parameter vector \(\lambda\).
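In variational inference, the parameter vector \(\lambda\) is found by maximizing the evidence lower bound (ELBO), which is equivalent to minimizing the KL divergence between \(q\) and the exact posterior:

\[ELBO(\lambda) = E_{q(z,\theta;\lambda)}\left[\log p(x_{train},z,\theta) - \log q(z,\theta;\lambda)\right]\]

This is the objective optimized by the default loss; the custom loss example further below reimplements a Monte Carlo estimate of it explicitly.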

To perform inference, we must define a 'Q' model that approximates the posterior distribution. This is also done by defining a function decorated with @inf.probmodel:

@inf.probmodel
def qmodel(k,d):
    qw_loc = inf.Parameter(tf.ones([k,d]), name="qw_loc")
    qw_scale = tf.math.softplus(inf.Parameter(tf.ones([k, d]), name="qw_scale"))
    qw = inf.Normal(qw_loc, qw_scale, name="w")

    with inf.datamodel():
        qz_loc = inf.Parameter(tf.ones([k]), name="qz_loc")
        qz_scale = tf.math.softplus(inf.Parameter(tf.ones([k]), name="qz_scale"))
        qz = inf.Normal(qz_loc, qz_scale, name="z")

In the ‘Q’ model we should include a q distribution for each non-observed variable in the ‘P’ model. These variables are also objects of class inferpy.RandomVariable. However, their parameters might be of type inf.Parameter, which are objects encapsulating TensorFlow trainable variables.

Then, we set the parameters of the inference algorithm. In the case of variational inference (VI), we must specify an instance of the 'Q' model and the number of epochs (i.e., iterations). For example:

# set the inference algorithm
VI = inf.inference.VI(qmodel(k=1,d=2), epochs=1000)

VI can be further configured by setting the parameter optimizer, which indicates the TensorFlow optimizer to be used (AdamOptimizer by default).
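For instance, a different optimizer can be selected at construction time. The following is a minimal sketch, assuming that optimizer accepts the name of a tf.train optimizer as a string, mirroring the AdamOptimizer default mentioned above:

# sketch: configure VI with a different optimizer (assuming a tf.train
# optimizer name is accepted here, as with the "AdamOptimizer" default)
VI = inf.inference.VI(qmodel(k=1, d=2),
                      optimizer="GradientDescentOptimizer",
                      epochs=1000)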

Stochastic Variational Inference (SVI) is similarly specified but has an additional input parameter for setting the batch size:

SVI = inf.inference.SVI(qmodel(k=1,d=2), epochs=1000, batch_size=200)

Then we must instantiate the 'P' model and fit the data with the previously defined inference algorithm.

# create an instance of the model
m = pca(k=1,d=2)
# run the inference
m.fit({"x": x_train}, VI)

The output generated will be similar to:

 0 epochs	 44601.14453125....................
 200 epochs	 44196.98046875....................
 400 epochs	 50616.359375....................
 600 epochs	 41085.6484375....................
 800 epochs	 30349.79296875....................

Finally, we can access the parameters of the posterior distributions:

>>> m.posterior("w").parameters()
{'name': 'w',
 'allow_nan_stats': True,
 'validate_args': False,
 'scale': array([[0.9834974 , 0.99731755]], dtype=float32),
 'loc': array([[1.7543027, 1.7246702]], dtype=float32)}

Custom Loss function

Following InferPy's guiding principles, users can further configure the inference algorithm. For example, we might be interested in defining our own function to minimize when using VI. As an example, we define the following function, which takes as input the random variables of the P and Q models (we assume that their sample sizes are consistent with the plates in the model). Note that the output of this function must be a tensor.

def custom_elbo(pvars, qvars, **kwargs):

    # compute energy
    energy = tf.reduce_sum([tf.reduce_sum(p.log_prob(p.value)) for p in pvars.values()])

    # compute entropy
    entropy = - tf.reduce_sum([tf.reduce_sum(q.log_prob(q.value)) for q in qvars.values()])

    # compute ELBO
    ELBO = energy + entropy

    # This function will be minimized. Return minus ELBO
    return -ELBO

To use our custom loss function, we simply pass it to the loss parameter in the inference method constructor. For example:

# set the inference algorithm
VI = inf.inference.VI(qmodel(k=1,d=2), loss=custom_elbo, epochs=1000)

# run the inference
m.fit({"x": x_train}, VI)

After this, the rest of the code remains unchanged.

Markov Chain Monte Carlo

Relying on Edward's functionality, Markov Chain Monte Carlo (MCMC) is also available for doing inference on InferPy models. To this end, an object of class inf.inference.MCMC is created and passed to the model when fitting the data. Unlike in variational inference, no Q-model is created for doing inference.

# set the inference algorithm
MC = inf.inference.MCMC()
# run the inference
m.fit({"x": x_train}, MC)

With MCMC, the posterior is represented as a set of samples, so we might need to aggregate them, e.g., using the mean:

# extract the posterior of z
hidden_encoding = m.posterior("z").parameters()["samples"].mean(axis=0)

Queries

The query syntax allows using a probabilistic model while specifying the type of knowledge to rely on: prior, posterior, or posterior predictive. That means that, for example, we can generate new instances from the prior knowledge (using the initial model definition), or from the posterior/posterior-predictive knowledge (once the model has been trained using input data). There are two well-differentiated parts: the query definition and the action function. Action functions can be applied to Query objects to:

  • sample: samples new data.

  • log_prob: computes the log prob given some evidence (observed variables).

  • sum_log_prob: the same as log_prob, but computes the sum of the log prob for all the variables in the probabilistic model.

  • parameters: returns the parameters of the Random Variables (e.g., loc and scale for Normal distributions).

Building Query objects

Given a probabilistic model object (e.g., model), we can build Query objects by calling the prior(), posterior() or posterior_predictive() methods of the probmodel class. All these accept the same two arguments:

  • target_names: A string or list of strings that correspond to random variable names. These random variables are the targets of the queries (in other words, the random variables that we want to use when calling an action).

  • data: A dict whose keys are the names of the random variables and whose values are the observed data for those variables. By default, it is an empty dict.

Each function is defined as follows:

  • prior(): This function returns Query objects that use the random variables initially defined in the model when applying the actions. It just uses prior knowledge and can be invoked once the model object is created.

  • posterior(): This function returns Query objects that use the expanded random variables defined and fitted after the training process. It utilizes the posterior knowledge and can be used only after calling the fit function. The target variables allowed are those not observed during the training process.

  • posterior_predictive(): This function is similar to posterior(), but the target variables permitted are those observed during the training process (see the sketch after this list).
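As a quick sketch of the three query types, using the pca model instance m created above (the value fixed for "w" is arbitrary and only for illustration):

# prior query: sample 10 instances of "x" with "w" clamped to a given value
x_prior = m.prior("x", data={"w": [[2., 1.]]}).sample(10)

# posterior query (only valid after calling m.fit): parameters of the latent "z"
z_params = m.posterior("z").parameters()

# posterior predictive query: sample new data for the observed variable "x"
x_post = m.posterior_predictive("x").sample()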

Action functions

Action functions allow getting the desired information from the Query objects. As described above, there are four such functions:

  • sample(size): Generates size instances (by default size=1). It returns a dict, where the keys are the random variable names and the values are the sample data. If there is only one target name, only the sample data is returned.

  • log_prob(): computes the log prob given the evidence specified in the Query object. It returns a dict, where the keys are the random variable names and the values are the log probs. If there is only one target name, only the log prob is returned.

  • sum_log_prob(): the same as log_prob, but computes the sum of the log prob for all the variables in the probabilistic model.

  • parameters(names): returns the parameters of the Random Variables. If names is None (by default) it returns all the parameters of all the random variables. If names is a string or a list of strings that corresponds to parameter names, then it returns the parameters of the random variables matching any name provided in the names argument. It returns a dict, where the keys are the random variable names and the values are the dict of parameters (name of parameter: parameter value). If there is only one target name, only the dict of parameters for that random variable is returned.

Example

The following example illustrates the usage of queries.

import inferpy as inf
import tensorflow as tf

@inf.probmodel
def linear_reg(d):
    w0 = inf.Normal(0, 1, name="w0")
    w = inf.Normal(tf.zeros([d, 1]), 1, name="w")
    with inf.datamodel():
        x = inf.Normal(tf.ones(d), 2, name="x")
        y = inf.Normal(w0 + x @ w, 1.0, name="y")

m = linear_reg(2)

# Generate 100 samples for x and y random variables, with random variables w and w0 observed
data = m.prior(["x", "y"], data={"w0": 0, "w": [[2], [1]]}).sample(100)

# Define the qmodel
@inf.probmodel
def qmodel(d):
    qw0_loc = inf.Parameter(0., name="qw0_loc")
    qw0_scale = tf.math.softplus(inf.Parameter(1., name="qw0_scale"))
    qw0 = inf.Normal(qw0_loc, qw0_scale, name="w0")
    qw_loc = inf.Parameter(tf.zeros([d, 1]), name="qw_loc")
    qw_scale = tf.math.softplus(inf.Parameter(tf.ones([d, 1]), name="qw_scale"))
    qw = inf.Normal(qw_loc, qw_scale, name="w")

x_train = data["x"]
y_train = data["y"]

# set and run the inference
VI = inf.inference.VI(qmodel(2), epochs=10000)
m.fit({"x": x_train, "y": y_train}, VI)

# Now we can obtain the parameters of the hidden variables (after training)
m.posterior(["w", "w0"]).parameters()

# We can also generate new samples from the posterior predictive distribution of the observed variables x and y
post_data = m.posterior_predictive(["x", "y"]).sample()

# and we can check the log prob of the hidden variables, given the posterior sampled data
m.posterior(data=post_data).log_prob()