Guide to Probabilistic Models

Getting Started with Probabilistic Models

InferPy focuses on hierarchical probabilistic models structured in two different layers:

A prior model defining a joint distribution \(p(\mathbf{w})\) over the global parameters of the model. \(\mathbf{w}\) can be a single random variable or a set of random variables with any given dependency structure.
A data or observation model defining a joint conditional distribution \(p(\mathbf{x},\mathbf{z}|\mathbf{w})\) over the observed quantities \(\mathbf{x}\) and the local hidden variables \(\mathbf{z}\) governing the observation \(\mathbf{x}\). This data model is specified on a per-sample basis. Many models of interest have no local hidden variables; in that case, we simply specify the conditional \(p(\mathbf{x}|\mathbf{w})\). Similarly, either \(\mathbf{x}\) or \(\mathbf{z}\) can be a single random variable or a set of random variables with any given dependency structure.
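
Taken together, the two layers imply a factorization of the joint distribution over a data set. The expression below is a sketch that follows from the definitions above, assuming \(N\) samples that are conditionally independent given \(\mathbf{w}\); the sample index \(n\) is notation introduced here for illustration:

\[
p(\mathbf{w}, \mathbf{z}_{1:N}, \mathbf{x}_{1:N}) = p(\mathbf{w}) \prod_{n=1}^{N} p(\mathbf{x}_n, \mathbf{z}_n \mid \mathbf{w})
\]

For models without local hidden variables, each factor inside the product reduces to \(p(\mathbf{x}_n \mid \mathbf{w})\).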

For example, consider a Bayesian PCA model. In its graphical structure, the prior model is composed of the variables \(\mathbf{w}_k\), and the data model is the part of the model enclosed by the box (plate) indexed by \(N\).

This Bayesian PCA model is defined in InferPy as follows:

import numpy as np
import inferpy as inf

# definition of a generic model
@inf.probmodel
def pca(k,d):
    w = inf.Normal(loc=np.zeros([k,d]), scale=1, name="w")      # shape = [k,d]
    with inf.datamodel():
        z = inf.Normal(np.ones(k),1, name="z")                # shape = [N,k]
        x = inf.Normal(z @ w , 1, name="x")                     # shape = [N,d]


# create an instance of the model
m = pca(k=1,d=2)

The with inf.datamodel() syntax is used to replicate the random variables contained within this construct. It follows the so-called plate notation for defining the data generation part of a probabilistic model. Every replicated variable is conditionally independent given the random variables (if any) defined outside the with statement. The plate size is calculated automatically later on, so there is no need to specify it. However, this construct has an optional input parameter for specifying its size, e.g., with inf.datamodel(size=N). This size should be consistent with the size of the data.
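
To make the replication concrete, the following is a minimal sketch of the generative process that the PCA model describes, written in plain Python rather than InferPy. The function sample_pca and its arguments are hypothetical names chosen for illustration; it only mimics the structure (one global draw of w, then one draw of z and x per replica) and is not how InferPy implements the construct.

```python
import random

def sample_pca(n, k, d, seed=0):
    """Ancestral sampling for the PCA model: w once, then (z, x) per replica."""
    rng = random.Random(seed)
    # Global variable (outside the data model): a single draw shared by all samples.
    w = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]        # shape [k, d]
    z, x = [], []
    for _ in range(n):  # replicas are conditionally independent given w
        z_n = [rng.gauss(1.0, 1.0) for _ in range(k)]                      # shape [k]
        # Mean of x_n is the vector-matrix product z_n @ w.
        mean = [sum(z_n[i] * w[i][j] for i in range(k)) for j in range(d)]
        x_n = [rng.gauss(mean[j], 1.0) for j in range(d)]                  # shape [d]
        z.append(z_n)
        x.append(x_n)
    return w, z, x

w, z, x = sample_pca(n=5, k=1, d=2)
print(len(w), len(w[0]))   # dimensions of w: k, d
print(len(z), len(z[0]))   # dimensions of z: n, k
print(len(x), len(x[0]))   # dimensions of x: n, d
```

The loop plays the role of the replicated block: its length corresponds to the size that InferPy would otherwise infer from the data.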

Random Variables

Any random variable in InferPy encapsulates an equivalent one in Edward 2, and hence it also has an associated distribution object from tensorflow-probability. These can be accessed using the properties var and distribution, respectively:

>>> x = inf.Normal(loc = 0, scale = 1)

>>> x.var
<ed.RandomVariable 'randvar_0/' shape=() dtype=float32>

>>> x.distribution
<tfp.distributions.Normal 'randvar_0/' batch_shape=() event_shape=() dtype=float32>

InferPy random variables inherit all the properties and methods from Edward 2 variables or TensorFlow Probability distributions (in this order of priority). For example:

>>> x.value
<tf.Tensor 'randvar_0/sample/Reshape:0' shape=() dtype=float32>

>>> x.sample()
-0.05060442

>>> x.loc
<tf.Tensor 'randvar_0/Identity:0' shape=() dtype=float32>

In this code, value is inherited from the encapsulated Edward 2 object, while sample() and the parameter loc are obtained from the distribution object. Note that the method sample() returns evaluated tensors by default. To obtain the unevaluated tensor instead, set the input parameter tf_run to False as follows.

>>> x.sample(tf_run=False)
<tf.Tensor 'randvar_0/sample/Reshape:0' shape=() dtype=float32>

Following Edward’s approach, we (conceptually) partition a random variable’s shape into three groups:

Batch shape describes independent, not identically distributed draws. Namely, we may have a set of (different) parameterizations to the same distribution.
Sample shape describes independent, identically distributed draws from the distribution.
Event shape describes the shape of a single draw (event space) from the distribution; it may be dependent across dimensions.

These attributes can be accessed via x.batch_shape, x.sample_shape and x.event_shape, respectively. When declaring random variables, the batch_shape is obtained from the distribution parameters. Whenever possible, the parameters are broadcast. With this in mind, all the definitions in the following code are equivalent.

import tensorflow as tf

x = inf.Normal(loc = [[0.,0.],[0.,0.],[0.,0.]], scale=1)  # x.shape = [3,2]

x = inf.Normal(loc = np.zeros([3,2]), scale=1)            # x.shape = [3,2]

x = inf.Normal(loc = 0, scale=tf.ones([3,2]))             # x.shape = [3,2]
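
The equivalence of these definitions comes down to shape broadcasting. The sketch below reimplements the usual NumPy-style broadcasting rule in plain Python to show why a scalar parameter combined with a [3,2] parameter yields a [3,2] batch shape. The helper broadcast_shape is a hypothetical name written for this illustration; it is not an InferPy function, and it assumes the parameters are combined with the standard trailing-axis alignment rule.

```python
def broadcast_shape(*shapes):
    """Return the shape obtained by broadcasting the given shapes together."""
    ndim = max((len(s) for s in shapes), default=0)
    result = []
    for axis in range(ndim):
        dim = 1
        for s in shapes:
            # Align shapes on their trailing axes; missing leading axes count as 1.
            idx = axis - (ndim - len(s))
            size = s[idx] if idx >= 0 else 1
            if size != 1:
                if dim != 1 and dim != size:
                    raise ValueError(f"shapes {shapes} cannot be broadcast")
                dim = size
        result.append(dim)
    return tuple(result)

# loc has shape [3, 2] and scale is a scalar (shape []):
shape_a = broadcast_shape((3, 2), ())
# loc is a scalar and scale has shape [3, 2]:
shape_b = broadcast_shape((), (3, 2))
print(shape_a, shape_b)  # (3, 2) (3, 2)
```

Incompatible parameter shapes, such as (3,) and (2,), raise a ValueError in this sketch.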

The sample_shape can be explicitly stated using the input parameter sample_shape, but this can only be done outside a model definition. Inside an inf.probmodel, the sample_shape is fixed by with inf.datamodel(size = N) (using the size argument when provided, or determined at runtime from the observed data).

x = inf.Normal(tf.ones([3,2]), 0, sample_shape=100)     # x.shape = [100,3,2]

with inf.datamodel(100):
    x = inf.Normal(tf.ones([3, 2]), 0)                  # x.shape = [100,3,2]

Finally, the event shape is only relevant for some distributions, such as the multivariate Gaussian:

x = inf.MultivariateNormalDiag(loc=[1., -1], scale_diag=[1, 2.])

>>> x.event_shape
TensorShape([Dimension(2)])

>>> x.batch_shape
TensorShape([])

>>> x.sample_shape
TensorShape([])
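
As a rough illustration of how the three groups combine, the sketch below assumes that the overall shape of a variable is the concatenation sample_shape + batch_shape + event_shape, which is the convention followed by sample() in TensorFlow Probability. The helper full_shape is a hypothetical name and not part of InferPy.

```python
def full_shape(sample_shape=(), batch_shape=(), event_shape=()):
    """Concatenate the three shape groups in the order sample, batch, event."""
    return tuple(sample_shape) + tuple(batch_shape) + tuple(event_shape)

# Normal with a [3, 2] loc replicated 100 times.
normal_shape = full_shape(sample_shape=(100,), batch_shape=(3, 2))
# MultivariateNormalDiag with a 2-dimensional event and no batch or sample dimensions.
mvn_shape = full_shape(event_shape=(2,))

print(normal_shape)  # (100, 3, 2)
print(mvn_shape)     # (2,)
```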

Note that indexing over all the defined dimensions is supported:

with inf.datamodel(size=10):
    x = inf.models.Normal(loc=tf.zeros(5), scale=1.)       # x.shape = [10,5]

y = x[7,4]                                              # y.shape = []

y2 = x[7]                                               # y2.shape = [5]

y3 = x[7,:]                                             # y3.shape = [5]

y4 = x[:,4]                                             # y4.shape = [10]

Moreover, indexing can be used to define new variables whose indices are themselves other (discrete) random variables.

i = inf.Categorical(logits= tf.zeros(3))        # shape = []
mu = inf.Normal([5,1,-2], 0.)                   # shape = [3]
x = inf.Normal(mu[i], scale=1.)                 # shape = []
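
Conceptually, indexing with a discrete variable means that the index is sampled first and then selects which parameter the dependent variable uses. The plain-Python sketch below mimics that process under simplifying assumptions: sample_indexed is a hypothetical name, equal logits are treated as a uniform choice over the three categories, and mu is held at its mean values [5, 1, -2].

```python
import random

def sample_indexed(seed=0):
    """Sample an index i, then sample x from a Normal centred at mu[i]."""
    rng = random.Random(seed)
    # Equal logits correspond to a uniform distribution over {0, 1, 2}.
    i = rng.randrange(3)
    # Component means, held fixed here for simplicity (an assumption of this sketch).
    mu = [5.0, 1.0, -2.0]
    x = rng.gauss(mu[i], 1.0)
    return i, x

i, x = sample_indexed()
print(i, x)
```

Repeating the call with different seeds produces draws clustered around the three means, which is the behaviour of a simple mixture.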

Probabilistic Models

A probabilistic model defines a joint distribution over observable and hidden variables, i.e., \(p(\mathbf{w}, \mathbf{z}, \mathbf{x})\). Note that a variable might be observable or hidden depending on the fitted data, so this is not specified when defining the model.

A probabilistic model is defined by decorating any function with @inf.probmodel. The model consists of every random variable defined inside this function. A simple example is shown below.

@inf.probmodel
def simple(mu=0):
    # global variables
    theta = inf.Normal(mu, 0.1, name="theta")

    # local variables
    with inf.datamodel():
        x = inf.Normal(theta, 1, name="x")

Note that any variable in a model can be given a name at initialization. Otherwise, an automatically generated name is used. However, it is advisable to name random variables explicitly, because the name is how a variable is referenced in some inference stages.

The model must be instantiated before it can be used. This is done by simply invoking the function (which will return a probmodel object).

>>> m = simple()
>>> type(m)
<class 'inferpy.models.prob_model.ProbModel'>

Now we are ready to use the model with the prior probabilities. For example, we might get a sample or access the distribution parameters:

>>> m.prior().sample()
{'theta': -0.074800275, 'x': array([0.07758344], dtype=float32)}

>>> m.prior().parameters()
{'theta': {'name': 'theta',
  'allow_nan_stats': True,
  'validate_args': False,
  'scale': 0.1,
  'loc': 0},
 'x': {'name': 'x',
  'allow_nan_stats': True,
  'validate_args': False,
  'scale': 1,
  'loc': 0.116854645}}

We can also extract individual variables by name:

>>> m.vars["theta"]
<inf.RandomVariable (Normal distribution) named theta/, shape=(), dtype=float32>

We can create new and different instances of our model:

>>> m2 = simple(mu=5)
>>> m==m2
False
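
The pattern of decorating a function and calling it to obtain independent model objects can be sketched in plain Python. Everything in this block is hypothetical and written only for illustration: ModelSketch and probmodel_sketch are stand-ins, not InferPy classes, and the real decorator does considerably more (for instance, building the random variables).

```python
import functools

class ModelSketch:
    """Hypothetical stand-in for a model object: stores the builder and its arguments."""
    def __init__(self, builder, args, kwargs):
        self.builder = builder
        self.args = args
        self.kwargs = kwargs

def probmodel_sketch(builder):
    """Hypothetical decorator: each call to the decorated function returns a new ModelSketch."""
    @functools.wraps(builder)
    def wrapper(*args, **kwargs):
        return ModelSketch(builder, args, kwargs)
    return wrapper

@probmodel_sketch
def simple(mu=0):
    # Placeholder body; a real model would define random variables here.
    return {"theta_loc": mu}

m = simple()
m2 = simple(mu=5)
print(type(m).__name__)  # ModelSketch
print(m == m2)           # False: two distinct instances compared by identity
```

Because every call constructs a fresh object, instances created with different arguments never compare equal in this sketch.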

Supported Probability Distributions

Supported probability distributions are located in the package inferpy.models. All of them have inferpy.models.RandomVariable as the superclass. A list with all the supported distributions can be obtained as follows.

>>> inf.models.random_variable.distributions_all
['Autoregressive', 'BatchReshape', 'Bernoulli', 'Beta', 'BetaWithSoftplusConcentration',
 'Binomial', 'Categorical', 'Cauchy', 'Chi2', 'Chi2WithAbsDf', 'ConditionalTransformedDistribution',
  'Deterministic', 'Dirichlet', 'DirichletMultinomial', 'ExpRelaxedOneHotCategorical',
  'Exponential', 'ExponentialWithSoftplusRate', 'Gamma', 'GammaGamma', 
  'GammaWithSoftplusConcentrationRate', 'Geometric', 'GaussianProcess', 
  'GaussianProcessRegressionModel', 'Gumbel', 'HalfCauchy', 'HalfNormal', 
  'HiddenMarkovModel', 'Horseshoe', 'Independent', 'InverseGamma',
   'InverseGammaWithSoftplusConcentrationRate', 'InverseGaussian', 'Kumaraswamy',
   'LinearGaussianStateSpaceModel', 'Laplace', 'LaplaceWithSoftplusScale', 'LKJ',
  'Logistic', 'LogNormal', 'Mixture', 'MixtureSameFamily', 'Multinomial',
   'MultivariateNormalDiag', 'MultivariateNormalFullCovariance', 'MultivariateNormalLinearOperator',
   'MultivariateNormalTriL', 'MultivariateNormalDiagPlusLowRank', 'MultivariateNormalDiagWithSoftplusScale',
   'MultivariateStudentTLinearOperator', 'NegativeBinomial', 'Normal', 'NormalWithSoftplusScale', 
   'OneHotCategorical', 'Pareto', 'Poisson', 'PoissonLogNormalQuadratureCompound', 'QuantizedDistribution',
   'RelaxedBernoulli', 'RelaxedOneHotCategorical', 'SinhArcsinh', 'StudentT', 'StudentTWithAbsDfSoftplusScale', 
   'StudentTProcess', 'TransformedDistribution', 'Triangular', 'TruncatedNormal', 'Uniform', 'VectorDeterministic',
   'VectorDiffeomixture', 'VectorExponentialDiag', 'VectorLaplaceDiag', 'VectorSinhArcsinhDiag', 'VonMises', 
   'VonMisesFisher', 'Wishart', 'Zipf']

Note that these are the distributions available in Edward 2, and hence in tensorflow-probability, and that they take the same input parameters.