Guide to Probabilistic Models
Getting Started with Probabilistic Models
InferPy focuses on hierarchical probabilistic models structured in two different layers:
A prior model defining a joint distribution \(p(\mathbf{w})\) over the global parameters of the model. \(\mathbf{w}\) can be a single random variable or a set of random variables with any given dependency structure.
A data or observation model defining a joint conditional distribution \(p(\mathbf{x},\mathbf{z}|\mathbf{w})\) over the observed quantities \(\mathbf{x}\) and the local hidden variables \(\mathbf{z}\) governing the observation \(\mathbf{x}\). This data model is specified on a single-sample basis. Many models of interest have no local hidden variables; in that case, we simply specify the conditional \(p(\mathbf{x}|\mathbf{w})\). Similarly, either \(\mathbf{x}\) or \(\mathbf{z}\) can be a single random variable or a set of random variables with any given dependency structure.
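For \(N\) observations, the two layers combine into the joint distribution below. This is the standard factorization implied by the description above, written out for clarity. The data model is replicated once per sample, and the samples are conditionally independent given \(\mathbf{w}\). The second form covers models without local hidden variables.

\[
p(\mathbf{w}, \mathbf{z}_{1:N}, \mathbf{x}_{1:N}) = p(\mathbf{w}) \prod_{n=1}^{N} p(\mathbf{x}_n, \mathbf{z}_n \mid \mathbf{w}),
\qquad
p(\mathbf{w}, \mathbf{x}_{1:N}) = p(\mathbf{w}) \prod_{n=1}^{N} p(\mathbf{x}_n \mid \mathbf{w}).
\]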
For example, consider a Bayesian PCA model. In its graphical structure, the prior model is composed of the variables \(\mathbf{w}_k\), while the data model is the part of the model surrounded by the box (plate) indexed by N.
This is how the Bayesian PCA model is defined in InferPy:
import numpy as np
import inferpy as inf

# definition of a generic model
@inf.probmodel
def pca(k, d):
    w = inf.Normal(loc=np.zeros([k, d]), scale=1, name="w")  # shape = [k,d]
    with inf.datamodel():
        z = inf.Normal(np.ones(k), 1, name="z")  # shape = [N,k]
        x = inf.Normal(z @ w, 1, name="x")  # shape = [N,d]

# create an instance of the model
m = pca(k=1, d=2)
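To make the generative process of this model concrete, here is a minimal sketch in plain NumPy. It is not InferPy code, and the helper name sample_pca is hypothetical. It draws the global parameters w once (prior model) and then draws n independent rows of z and x given w (data model), reproducing the shapes noted in the comments above.

```python
import numpy as np

def sample_pca(k, d, n, seed=0):
    """Hypothetical helper: ancestral sampling from the Bayesian PCA model in plain NumPy.

    Mirrors the InferPy definition: w ~ N(0, 1) with shape [k, d] (prior model),
    then, replicated n times, z ~ N(1, 1) with shape [k] and x ~ N(z @ w, 1).
    """
    rng = np.random.default_rng(seed)
    # Prior model: global parameters, drawn once.
    w = rng.normal(loc=0.0, scale=1.0, size=(k, d))
    # Data model: each of the n rows is drawn independently given w.
    z = rng.normal(loc=1.0, scale=1.0, size=(n, k))
    x = rng.normal(loc=z @ w, scale=1.0)  # loc has shape [n, d]
    return w, z, x

w, z, x = sample_pca(k=1, d=2, n=100)
print(w.shape, z.shape, x.shape)  # (1, 2) (100, 1) (100, 2)
```

Here n plays the role of the plate size N, which InferPy infers from the data rather than taking as an argument.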
The with inf.datamodel() syntax is used to replicate the random variables contained within this construct. It follows the so-called plate notation to define the data generation part of a probabilistic model. Every replicated variable is conditionally independent given the previous random variables (if any) defined outside the with statement. The plate size is calculated automatically later, so there is no need to specify it. Still, this construct has an optional input parameter for specifying its size, e.g., with inf.datamodel(size=N). This size should be consistent with the size of the data.
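Conditional independence inside the plate means the log-likelihood of the replicated variables is a sum of per-sample terms. The sketch below illustrates this in plain NumPy for i.i.d. x_n ~ N(theta, scale). The function names are hypothetical, not InferPy API. As with inf.datamodel(), the plate size is not passed in but read from the data.

```python
import numpy as np

def normal_logpdf(x, loc, scale):
    """Elementwise log-density of a univariate Normal."""
    return -0.5 * np.log(2 * np.pi * scale**2) - 0.5 * ((x - loc) / scale) ** 2

def plate_log_likelihood(x, theta, scale=1.0):
    """Hypothetical helper: log p(x_1, ..., x_N | theta) for i.i.d. x_n ~ N(theta, scale)."""
    n = x.shape[0]  # plate size inferred from the data
    # Conditional independence given theta: the joint log-likelihood is a sum over the plate.
    return sum(normal_logpdf(x[i], theta, scale) for i in range(n))

data = np.array([0.3, -1.2, 0.8, 2.0])
print(plate_log_likelihood(data, theta=0.5))
```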
Random Variables
Any random variable in InferPy encapsulates an equivalent one in Edward2 and, hence, also has an associated distribution object from TensorFlow Probability. These can be accessed using the properties var and distribution, respectively:
>>> x = inf.Normal(loc = 0, scale = 1)
>>> x.var
<ed.RandomVariable 'randvar_0/' shape=() dtype=float32>
>>> x.distribution
<tfp.distributions.Normal 'randvar_0/' batch_shape=() event_shape=() dtype=float32>
InferPy random variables inherit all the properties and methods from Edward2 variables or TensorFlow Probability distributions (in this order of priority). For example:
>>> x.value
<tf.Tensor 'randvar_0/sample/Reshape:0' shape=() dtype=float32>
>>> x.sample()
-0.05060442
>>> x.loc
<tf.Tensor 'randvar_0/Identity:0' shape=() dtype=float32>
In the code, value is inherited from the encapsulated Edward2 object, while sample() and the parameter loc are obtained from the distribution object. Note that the method sample() returns evaluated tensors. This can be avoided by setting the input parameter tf_run to False, as follows:
>>> x.sample(tf_run=False)
<tf.Tensor 'randvar_0/sample/Reshape:0' shape=() dtype=float32>
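The lookup order described above (Edward2 variable first, then the TensorFlow Probability distribution) can be illustrated with a small pure-Python sketch of attribute delegation. This is only one plausible way to get such behavior and is not InferPy's actual implementation. All three classes are hypothetical stand-ins.

```python
class FakeEdward2Var:
    """Hypothetical stand-in for an Edward2 random variable (not the real class)."""
    def __init__(self):
        self.value = "value-from-edward2"
        self.name = "name-from-edward2"

class FakeDistribution:
    """Hypothetical stand-in for a TensorFlow Probability distribution."""
    def __init__(self):
        self.loc = "loc-from-tfp"
        self.name = "name-from-tfp"

    def sample(self):
        return "sample-from-tfp"

class WrappedRandomVariable:
    """Hypothetical wrapper: unknown attributes are looked up on `var`, then on `distribution`."""
    def __init__(self, var, distribution):
        self.var = var
        self.distribution = distribution

    def __getattr__(self, name):
        # Called only when normal attribute lookup on the wrapper itself fails.
        for target in (self.__dict__.get("var"), self.__dict__.get("distribution")):
            if target is not None and hasattr(target, name):
                return getattr(target, name)
        raise AttributeError(name)

x = WrappedRandomVariable(FakeEdward2Var(), FakeDistribution())
print(x.value)     # value-from-edward2
print(x.loc)       # loc-from-tfp
print(x.name)      # name-from-edward2 (the Edward2 side takes priority)
print(x.sample())  # sample-from-tfp
```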
Following Edward’s approach, we (conceptually) partition a random variable’s shape into three groups:
Batch shape describes independent, not identically distributed draws. Namely, we may have a set of (different) parameterizations of the same distribution.
Sample shape describes independent, identically distributed draws from the distribution.
Event shape describes the shape of a single draw (event space) from the distribution; it may be dependent across dimensions.
The previous attributes can be accessed by x.batch_shape, x.sample_shape and x.event_shape, respectively. When declaring random variables, the batch_shape is obtained from the distribution parameters, which are broadcast whenever possible. With this in mind, all the definitions in the following code are equivalent.
x = inf.Normal(loc = [[0.,0.],[0.,0.],[0.,0.]], scale=1) # x.shape = [3,2]
x = inf.Normal(loc = np.zeros([3,2]), scale=1) # x.shape = [3,2]
x = inf.Normal(loc = 0, scale=tf.ones([3,2])) # x.shape = [3,2]
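The batch shape in these three definitions is the broadcast shape of loc and scale. The sketch below checks this with NumPy, assuming NumPy-style broadcasting rules, which TensorFlow Probability also follows for parameter shapes. It uses np.broadcast_shapes, available in NumPy 1.20 and later.

```python
import numpy as np

# The (loc, scale) pairs used in the three definitions above.
param_pairs = [
    ([[0., 0.], [0., 0.], [0., 0.]], 1),
    (np.zeros([3, 2]), 1),
    (0, np.ones([3, 2])),
]

rng = np.random.default_rng(0)
batch_shapes = []
draw_shapes = []
for loc, scale in param_pairs:
    # Batch shape = broadcast of the parameter shapes.
    batch_shapes.append(np.broadcast_shapes(np.shape(loc), np.shape(scale)))
    # A single draw has one value per batch member.
    draw_shapes.append(rng.normal(loc, scale).shape)

print(batch_shapes)  # [(3, 2), (3, 2), (3, 2)]
```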
The sample_shape can be explicitly stated using the input parameter sample_shape, but this can only be done outside a model definition. Inside an inf.probmodel, the sample_shape is fixed by with inf.datamodel(size=N) (using the size argument when provided, or at runtime depending on the observed data).
x = inf.Normal(tf.ones([3,2]), 1, sample_shape=100)  # x.shape = [100,3,2]
with inf.datamodel(100):
    x = inf.Normal(tf.ones([3, 2]), 1)  # x.shape = [100,3,2]
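The sample shape is prepended to the batch shape: each of the 100 samples is an i.i.d. draw of the whole [3,2] batch. The NumPy sketch below shows that drawing with size = sample_shape + batch_shape gives the same shape as stacking 100 independent draws, which is conceptually what replicating a variable inside the plate does. It is an illustration in NumPy, not InferPy code.

```python
import numpy as np

rng = np.random.default_rng(0)
sample_shape = (100,)
batch_shape = (3, 2)
loc = np.ones(batch_shape)

# Explicit sample_shape: one call, the full shape is sample_shape + batch_shape.
draws = rng.normal(loc=loc, scale=1.0, size=sample_shape + batch_shape)

# Plate-style replication: 100 independent draws of the [3, 2] batch, stacked.
stacked = np.stack([rng.normal(loc=loc, scale=1.0) for _ in range(100)])

print(draws.shape, stacked.shape)  # (100, 3, 2) (100, 3, 2)
```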
Finally, the event shape is only considered in some distributions. This is the case for the multivariate Gaussian:
>>> x = inf.MultivariateNormalDiag(loc=[1., -1], scale_diag=[1, 2.])
>>> x.event_shape
TensorShape([Dimension(2)])
>>> x.batch_shape
TensorShape([])
>>> x.sample_shape
TensorShape([])
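The practical difference between batch and event dimensions shows up in log-probabilities. Batch members get one log-density each, whereas the event axis is summed out, so one draw of shape [2] has a single scalar log-density. The NumPy sketch below illustrates this for a diagonal multivariate normal with the same loc and scale_diag as above. The helper names are hypothetical.

```python
import numpy as np

def normal_logpdf(x, loc, scale):
    """Elementwise log-density of a univariate Normal (batch semantics)."""
    return -0.5 * np.log(2 * np.pi * scale**2) - 0.5 * ((x - loc) / scale) ** 2

def mvn_diag_logpdf(x, loc, scale_diag):
    """Log-density of a diagonal multivariate normal: the last (event) axis is summed out."""
    return np.sum(normal_logpdf(x, loc, scale_diag), axis=-1)

loc = np.array([1.0, -1.0])
scale_diag = np.array([1.0, 2.0])
x = np.array([0.5, 0.0])

print(normal_logpdf(x, loc, scale_diag).shape)        # (2,)  one density per batch member
print(np.shape(mvn_diag_logpdf(x, loc, scale_diag)))  # ()    one density per event
```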
Note that indexing over all the defined dimensions is supported:
with inf.datamodel(size=10):
    x = inf.models.Normal(loc=tf.zeros(5), scale=1.)  # x.shape = [10,5]

y = x[7,4]   # y.shape = []
y2 = x[7]    # y2.shape = [5]
y3 = x[7,:]  # y3.shape = [5]
y4 = x[:,4]  # y4.shape = [10]
Moreover, we may use indexing to define new variables whose indices are other (discrete) random variables.
i = inf.Categorical(logits= tf.zeros(3)) # shape = []
mu = inf.Normal([5,1,-2], 0.) # shape = [3]
x = inf.Normal(mu[i], scale=1.) # shape = []
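Indexing by a categorical variable turns x into a mixture model. Since the scale of mu in the example is 0, mu is effectively fixed at [5, 1, -2], and logits of zeros make i uniform over {0, 1, 2}. The NumPy sketch below (not InferPy code) samples x ancestrally and evaluates its marginal density, where i is summed out and leaves an equally weighted mixture of three unit-variance Gaussians.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([5.0, 1.0, -2.0])  # treated as fixed (scale 0 in the example above)
n = 10_000

# i ~ Categorical(logits=zeros(3)): uniform over {0, 1, 2}.
i = rng.integers(3, size=n)
# x ~ Normal(mu[i], 1): the location is selected by the sampled index.
x = rng.normal(loc=mu[i], scale=1.0)

def mixture_density(value):
    """Marginal density p(x): average of the three component densities."""
    value = np.asarray(value, dtype=float)
    comps = np.exp(-0.5 * (value[..., None] - mu) ** 2) / np.sqrt(2 * np.pi)
    return comps.mean(axis=-1)

print(x.mean())  # close to mu.mean() = 4/3
```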
Probabilistic Models
A probabilistic model defines a joint distribution over observable and hidden variables, i.e., \(p(\mathbf{w}, \mathbf{z}, \mathbf{x})\). Note that whether a variable is observed or hidden depends on the data used for fitting, so this is not specified when defining the model.
A probabilistic model is defined by decorating any function with @inf.probmodel. The model is made up of all the variables defined inside this function. A simple example is shown below.
@inf.probmodel
def simple(mu=0):
    # global variables
    theta = inf.Normal(mu, 0.1, name="theta")
    # local variables
    with inf.datamodel():
        x = inf.Normal(theta, 1, name="x")
Note that any variable in a model can be given a name when it is created; otherwise, automatically generated names are used. However, it is highly convenient to specify the name of a random variable explicitly, because it can then be referenced in some inference stages.
The model must be instantiated before it can be used. This is done by simply invoking the function (which will return a probmodel object).
>>> m = simple()
>>> type(m)
<class 'inferpy.models.prob_model.ProbModel'>
Now we are ready to use the model with the prior probabilities. For example, we might get a sample or access the distribution parameters:
>>> m.prior().sample()
{'theta': -0.074800275, 'x': array([0.07758344], dtype=float32)}
>>> m.prior().parameters()
{'theta': {'name': 'theta',
'allow_nan_stats': True,
'validate_args': False,
'scale': 0.1,
'loc': 0},
'x': {'name': 'x',
'allow_nan_stats': True,
'validate_args': False,
'scale': 1,
'loc': 0.116854645}}
or to extract the variables:
>>> m.vars["theta"]
<inf.RandomVariable (Normal distribution) named theta/, shape=(), dtype=float32>
We can create new and different instances of our model:
>>> m2 = simple(mu=5)
>>> m==m2
False
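In the parameters output above, the loc of x is not mu but a sampled value of theta. Prior sampling is ancestral: theta is drawn first, then x is drawn given that value. The plain NumPy sketch below illustrates this for the simple model. The helper sample_simple_prior is hypothetical and only mimics the structure of the dictionary returned by m.prior().sample().

```python
import numpy as np

def sample_simple_prior(mu=0.0, size=1, rng=None):
    """Hypothetical helper: ancestral sampling from the `simple` model's prior.

    theta ~ N(mu, 0.1) is drawn once; then `size` local x values are drawn from N(theta, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.normal(mu, 0.1)            # global variable, drawn once
    x = rng.normal(theta, 1.0, size=size)  # local variables, one per plate entry
    return {"theta": theta, "x": x}

sample = sample_simple_prior(rng=np.random.default_rng(0))
print(sample)
```

Changing mu (as in simple(mu=5)) shifts the prior over theta and, through it, the distribution of every x.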
Supported Probability Distributions
Supported probability distributions are located in the package inferpy.models. All of them have inferpy.models.RandomVariable as their superclass. A list of all the supported distributions can be obtained as follows.
>>> inf.models.random_variable.distributions_all
['Autoregressive', 'BatchReshape', 'Bernoulli', 'Beta', 'BetaWithSoftplusConcentration',
'Binomial', 'Categorical', 'Cauchy', 'Chi2', 'Chi2WithAbsDf', 'ConditionalTransformedDistribution',
'Deterministic', 'Dirichlet', 'DirichletMultinomial', 'ExpRelaxedOneHotCategorical',
'Exponential', 'ExponentialWithSoftplusRate', 'Gamma', 'GammaGamma',
'GammaWithSoftplusConcentrationRate', 'Geometric', 'GaussianProcess',
'GaussianProcessRegressionModel', 'Gumbel', 'HalfCauchy', 'HalfNormal',
'HiddenMarkovModel', 'Horseshoe', 'Independent', 'InverseGamma',
'InverseGammaWithSoftplusConcentrationRate', 'InverseGaussian', 'Kumaraswamy',
'LinearGaussianStateSpaceModel', 'Laplace', 'LaplaceWithSoftplusScale', 'LKJ',
'Logistic', 'LogNormal', 'Mixture', 'MixtureSameFamily', 'Multinomial',
'MultivariateNormalDiag', 'MultivariateNormalFullCovariance', 'MultivariateNormalLinearOperator',
'MultivariateNormalTriL', 'MultivariateNormalDiagPlusLowRank', 'MultivariateNormalDiagWithSoftplusScale',
'MultivariateStudentTLinearOperator', 'NegativeBinomial', 'Normal', 'NormalWithSoftplusScale',
'OneHotCategorical', 'Pareto', 'Poisson', 'PoissonLogNormalQuadratureCompound', 'QuantizedDistribution',
'RelaxedBernoulli', 'RelaxedOneHotCategorical', 'SinhArcsinh', 'StudentT', 'StudentTWithAbsDfSoftplusScale',
'StudentTProcess', 'TransformedDistribution', 'Triangular', 'TruncatedNormal', 'Uniform', 'VectorDeterministic',
'VectorDiffeomixture', 'VectorExponentialDiag', 'VectorLaplaceDiag', 'VectorSinhArcsinhDiag', 'VonMises',
'VonMisesFisher', 'Wishart', 'Zipf']
Note that these are all the distributions available in Edward2 and, hence, in TensorFlow Probability. Their input parameters are the same.