Guide to Bayesian Deep Learning

Models Containing Neural Networks

InferPy inherits Edward’s approach of representing probabilistic models as (stochastic) computational graphs. As described above, a random variable \(x\) is associated with a tensor \(x^*\) in the computational graph handled by TensorFlow, where the computations take place. This tensor \(x^*\) contains the samples of the random variable \(x\), i.e. \(x^* \sim p(x|\theta)\). In this way, random variables can take part in complex deterministic operations involving deep neural networks, mathematical operations and other libraries compatible with TensorFlow (such as Keras).
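For example, a random variable created with InferPy can be passed directly to TensorFlow operations. The snippet below is a minimal illustrative sketch of this idea (the variable names are our own):

import inferpy as inf
import tensorflow as tf

# a Normal random variable; its associated tensor holds samples from p(x|theta)
x = inf.Normal(0., 1., name="x")

# the variable can take part in deterministic TensorFlow computations
y = tf.exp(x) + 1.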

Bayesian deep learning or deep probabilistic programming embraces the idea of employing deep neural networks within a probabilistic model in order to capture complex non-linear dependencies between variables. This can be done by combining InferPy with tf.layers, tf.keras or tfp.layers.

InferPy’s API gives support to this powerful and flexible modeling framework. Let us start by showing how to define a non-linear PCA.

import inferpy as inf
import tensorflow as tf


# number of components
k = 1
# size of the hidden layer in the NN
d0 = 100
# dimensionality of the data
dx = 2
# number of observations (dataset size)
N = 1000


@inf.probmodel
def nlpca(k, d0, dx, decoder):

    with inf.datamodel():
        z = inf.Normal(tf.ones([k])*0.5, 1., name="z")    # shape = [N,k]
        output = decoder(z,d0,dx)                  # NN output, shape = [N,2*dx]
        x_loc = output[:,:dx]                      # first dx columns: means
        x_scale = tf.nn.softmax(output[:,dx:])     # last dx columns mapped to positive scales
        x = inf.Normal(x_loc, x_scale, name="x")   # shape = [N,dx]


def decoder(z,d0,dx):
    h0 = tf.layers.dense(z, d0, tf.nn.relu)
    return tf.layers.dense(h0, 2 * dx)


# Q-model approximating P

@inf.probmodel
def qmodel(k):
    with inf.datamodel():
        qz_loc = inf.Parameter(tf.ones([k])*0.5, name="qz_loc")
        qz_scale = tf.math.softplus(inf.Parameter(tf.ones([k]),name="qz_scale"))

        qz = inf.Normal(qz_loc, qz_scale, name="z")


# create an instance of the model
m = nlpca(k,d0,dx, decoder)

# set the inference algorithm
VI = inf.inference.VI(qmodel(k), epochs=5000)

# learn the parameters
m.fit({"x": x_train}, VI)

# extract the hidden representation
hidden_encoding = m.posterior("z")
print(hidden_encoding.sample())
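Note that the example above assumes a training dataset x_train of shape [N, dx]. As a minimal sketch, a synthetic dataset of that shape could be generated as follows (the data-generating process is purely illustrative):

import numpy as np

# illustrative two-dimensional observations lying close to a non-linear curve
t = np.random.uniform(-3, 3, size=N)
x_train = np.stack([t, np.sin(t)], axis=1).astype(np.float32)
x_train += np.random.normal(scale=0.1, size=(N, dx)).astype(np.float32)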

In this case, the parameters of the decoder neural network (i.e., its weights) are automatically managed by TensorFlow. They are treated as model parameters and are not exposed to the user; as a consequence, we cannot be Bayesian about them by placing prior distributions over them.

Alternatively, we could use Keras layers by simply defining an alternative decoder function as follows.

def decoder_keras(z,d0,dx):
    h0 = tf.keras.layers.Dense(d0, activation=tf.nn.relu)
    h1 = tf.keras.layers.Dense(2*dx)
    return h1(h0(z))

# create an instance of the model
m = nlpca(k,d0,dx, decoder_keras)
m.fit({"x": x_train}, VI)

InferPy is also compatible with Keras models such as tf.keras.Sequential:

def decoder_seq(z,d0,dx):
    return tf.keras.Sequential([
        tf.keras.layers.Dense(d0, activation=tf.nn.relu),
        tf.keras.layers.Dense(2 * dx)
    ])(z)

# create an instance of the model and fit the data
m = nlpca(k,d0,dx, decoder_seq)
m.fit({"x": x_train}, VI)

Bayesian Neural Networks

InferPy allows the definition of Bayesian neural networks using the same dense variational layers that are available in tfp.layers, i.e.:

  • DenseFlipout: Densely-connected layer class with Flipout estimator.

  • DenseLocalReparameterization: Densely-connected layer class with local reparameterization estimator.

  • DenseReparameterization: Densely-connected layer class with reparameterization estimator.

The weights of these layers are drawn from distributions whose posteriors are computed using variational inference. For more details, check the official tfp documentation. To use them, we simply need to include them in an InferPy sequential model, inf.layers.Sequential, as follows.

import tensorflow_probability as tfp

def decoder_bayesian(z,d0,dx):
    return inf.layers.Sequential([
        tfp.layers.DenseFlipout(d0, activation=tf.nn.relu),
        tfp.layers.DenseLocalReparameterization(d0, activation=tf.nn.relu),
        tfp.layers.DenseReparameterization(d0, activation=tf.nn.relu),
        tf.keras.layers.Dense(2 * dx)
    ])(z)


# create an instance of the model
m = nlpca(k,d0,dx, decoder_bayesian)
m.fit({"x": x_train}, VI)

Note that this inf.layers.Sequential model differs from the one provided by Keras (tf.keras.Sequential). A more detailed example with Bayesian layers is given here.