Representing evidence for Bayesian updating: compositional evidence, privacy and calibration

Speaker: Paul-Gauthier Noé.

Abstract: Attribute privacy in multimedia technology aims to hide only one or a few personal characteristics, or attributes, of an individual rather than their full identity. Examples of such attributes are the sex, nationality, or health state of the individual. When the attribute to hide is discrete with a finite number of possible values, the attacker’s belief about the attribute is represented by a discrete probability distribution over the set of possible values. Bayes’ rule is known as an information acquisition paradigm: it describes how the likelihood function turns the prior belief into a posterior belief. In the binary case, i.e. when the attribute can take only two values, the likelihood function can be written as a Log-Likelihood-Ratio (LLR). The LLR is known as the weight-of-evidence and is considered a good candidate for indicating which hypothesis the data support and how strongly. Bayes’ rule can then be written as a sum: the log-ratio of the posterior probabilities equals the LLR plus the log-ratio of the prior probabilities. This form decouples the initial personal belief from the evidence provided by the data.
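A minimal Python sketch of Bayes’ rule in log-odds form for two hypotheses. The likelihood values are made up for illustration and the function names are hypothetical. It checks that adding the LLR to the prior log-odds gives the same posterior as the direct application of Bayes’ rule, and that a zero LLR leaves the prior unchanged.

```python
import math

def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def posterior_from_llr(prior_h1: float, llr: float) -> float:
    """Bayes' rule in log-odds form: posterior log-odds = LLR + prior log-odds."""
    return sigmoid(llr + logit(prior_h1))

# Hypothetical likelihoods of one observation x under each hypothesis.
p_x_given_h1, p_x_given_h2 = 0.6, 0.2
llr = math.log(p_x_given_h1 / p_x_given_h2)  # weight-of-evidence, here log(3)

prior = 0.5
post = posterior_from_llr(prior, llr)

# Direct application of Bayes' rule, for comparison.
direct = p_x_given_h1 * prior / (p_x_given_h1 * prior + p_x_given_h2 * (1.0 - prior))
print(round(post, 4), round(direct, 4))  # 0.75 0.75

# Zero evidence: an LLR of 0 leaves the prior belief unchanged.
print(round(posterior_from_llr(0.3, 0.0), 4))  # 0.3
```

The prior only enters through `logit(prior_h1)`, so the same LLR can be reused by observers holding different priors.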
This thesis proposes to represent the sensitive information disclosed by the data by a likelihood function. In the binary case, the LLR is a good candidate for expressing the likelihood function. However, this appealing form of Bayes’ rule cannot be generalized straightforwardly to cases with more than two hypotheses. To get around this issue, this thesis proposes to treat discrete probability distributions and likelihood functions as compositional data. The sample space of compositional data is a simplex on which a Euclidean vector space structure, known as the Aitchison geometry, can be defined. In the coordinate representation given by the Isometric-Log-Ratio (ILR) transformation, Bayes’ rule becomes the translation of the prior distribution by the likelihood function.
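The sketch below illustrates, under stated assumptions, why Bayes’ rule becomes a translation in ILR coordinates. It assumes a Helmert-type orthonormal basis (one common choice; the thesis may use another) and hypothetical prior and likelihood values over three hypotheses. Because the centred log-ratio of a product is the sum of the centred log-ratios, and renormalization does not change it, the ILR coordinates of the posterior are the ILR coordinates of the prior plus those of the likelihood vector.

```python
import numpy as np

def helmert_basis(D: int) -> np.ndarray:
    """Return a D x (D-1) matrix whose orthonormal columns span the plane sum(v) = 0.

    This Helmert-type basis is one common choice; any orthonormal basis of
    that plane gives a valid ILR transformation.
    """
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i                 # first i coordinates
        V[i, i - 1] = -1.0                     # contrasted against coordinate i
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))  # normalize to unit length
    return V

def clr(x: np.ndarray) -> np.ndarray:
    """Centred log-ratio; unchanged if x is multiplied by a positive constant."""
    logx = np.log(x)
    return logx - logx.mean()

def ilr(x: np.ndarray, V: np.ndarray) -> np.ndarray:
    """ILR coordinates of a positive vector x (it need not sum to one)."""
    return V.T @ clr(x)

def ilr_inv(z: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Map ILR coordinates back to a probability vector on the simplex."""
    y = np.exp(V @ z)
    return y / y.sum()

V = helmert_basis(3)
prior = np.array([0.5, 0.3, 0.2])       # hypothetical prior over 3 values
likelihood = np.array([0.1, 0.4, 0.4])  # hypothetical p(x | H_k), not normalized

# Standard Bayes' rule: multiply and renormalize.
posterior = prior * likelihood / np.sum(prior * likelihood)

# Same update as a translation in ILR coordinates.
z_post = ilr(prior, V) + ilr(likelihood, V)
print(np.allclose(ilr_inv(z_post, V), posterior))  # True
```

With this particular basis, the single ILR coordinate of a binary likelihood vector (p1, p2) is (log p1 - log p2)/√2, i.e. the LLR rescaled by 1/√2, which is how the two-hypothesis case is recovered.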
Within this space, the likelihood function, expressed as the ILR transformation of the likelihood vector and called the Isometric-Log-Ratio-Likelihood (ILRL), is considered in this thesis as the extension of the LLR to multiple hypotheses and multiple dimensions. The norm of the ILRL is the strength-of-evidence; it measures the distance between the prior and the posterior distributions and can therefore be seen as a measure of the information disclosed by the data. This measure of information is referred to as evidence information. Perfect privacy, derived from Claude Shannon’s perfect secrecy, is reached when the attacker’s belief does not change when observing the data: its posterior probabilities remain equal to its prior ones. In other words, the data should provide no evidence about the value the attribute takes. This idea, also known as zero-evidence, is theoretically reached when the LLR is zero in the binary setting and, by extension, when the ILRL is the zero vector in the non-binary case, which corresponds to no strength-of-evidence. The information about an attribute contained in an observation is thus represented by an ILRL. However, to represent this information properly, the ILRLs have to be calibrated. Calibration has mostly been discussed for probabilities, but the concept also applies to likelihood functions. The idempotence of calibrated LLRs and the constraint it imposes on the distributions of normally distributed LLRs are well-known properties. In this thesis, these properties are generalized to the ILRL for applications with multiple hypotheses. Based on these properties and on the compositional nature of the likelihood function, a new discriminant analysis approach is proposed. First, for binary applications, the proposed discriminant analysis maps the input feature vectors into a space where the discriminant component forms a calibrated LLR. The mapping is learned with a Normalizing Flow (NF), a cascade of invertible neural networks.
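The following sketch, using hypothetical values and illustrative function names, checks three statements from the paragraph above. First, the norm of the ILRL equals the Aitchison distance between prior and posterior. Second, a constant likelihood vector has a zero ILRL (zero-evidence). Third, for the well-known binary Gaussian case, assuming the LLR is distributed as N(μ, σ²) under H1 and N(-μ, σ²) under H2, recomputing the LLR from the LLR value itself gives 2μℓ/σ², which equals ℓ (idempotence) only when μ = σ²/2. The multiclass generalization from the thesis is not reproduced here.

```python
import numpy as np

def helmert_basis(D: int) -> np.ndarray:
    """D x (D-1) orthonormal basis of the plane sum(v) = 0 (Helmert-type)."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(x: np.ndarray, V: np.ndarray) -> np.ndarray:
    """ILR coordinates via the centred log-ratio of a positive vector x."""
    logx = np.log(x)
    return V.T @ (logx - logx.mean())

def strength_of_evidence(likelihood: np.ndarray, V: np.ndarray) -> float:
    """Euclidean norm of the ILRL (the ILR coordinates of the likelihood vector)."""
    return float(np.linalg.norm(ilr(likelihood, V)))

def gaussian_llr_of_llr(l: float, mu: float, sigma2: float) -> float:
    """LLR recomputed from the LLR value l itself, assuming
    l | H1 ~ N(mu, sigma2) and l | H2 ~ N(-mu, sigma2)."""
    return ((l + mu) ** 2 - (l - mu) ** 2) / (2.0 * sigma2)  # = 2 * mu * l / sigma2

V = helmert_basis(3)
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.1, 0.4, 0.4])  # hypothetical p(x | H_k)
posterior = prior * likelihood / np.sum(prior * likelihood)

# Strength-of-evidence equals the Aitchison distance between prior and posterior.
soe = strength_of_evidence(likelihood, V)
dist = float(np.linalg.norm(ilr(posterior, V) - ilr(prior, V)))
print(np.isclose(soe, dist))  # True

# Zero-evidence: a constant likelihood vector has a zero ILRL.
print(np.allclose(ilr(np.full(3, 0.25), V), 0.0))  # True

# Calibration: with mu = sigma2 / 2 the LLR is idempotent; otherwise it is not.
print(gaussian_llr_of_llr(1.5, mu=2.0, sigma2=4.0))  # 1.5 (calibrated)
print(gaussian_llr_of_llr(1.5, mu=1.0, sigma2=4.0))  # 0.75 (not calibrated)
```

The idempotence check reads: an LLR is calibrated when treating its own value as an observation yields that same value as evidence.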
This discriminant analysis can be used for standard pattern recognition but also for privacy purposes. Since the mapping is invertible, the LLR can be set to zero, which is consistent with the zero-evidence formulation of privacy, and the data can then be mapped back to the feature space. This protection strategy is tested on concealing the speaker’s sex in neural network-based speaker embeddings. The resulting protected embeddings are evaluated for Automatic Speaker Verification (ASV) and for voice conversion applications. Since the properties of the LLR naturally extend to the ILRL thanks to the Aitchison geometry of the simplex, the proposed discriminant analysis is easily generalized to cases involving more than two classes, or hypotheses. We call this new approach Compositional Discriminant Analysis (CDA). It maps the data into a space where the discriminant components form calibrated likelihood functions expressed as ILRLs. The family of invertible transformations given by the NF can also be used to learn a calibration mapping for LLRs; this is briefly discussed at the end of the thesis. Although this work is first presented in the context of privacy preservation, we believe it opens several research directions in pattern recognition, in the calibration of probabilities and likelihoods for multiclass applications, and in learning interpretable representations of information.
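The thesis learns the invertible mapping with a normalizing flow. The sketch below does not implement a flow: it swaps in a fixed invertible linear map that is only valid under a hypothetical toy model, two Gaussian classes with identity covariance and means m and -m, for which the exact LLR is 2 m·x. The names `forward`, `inverse` and `protect` are illustrative, not the thesis's code. It shows the zero-evidence step: map a feature vector to the latent space, set the LLR coordinate to zero, and map back.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
# Hypothetical toy model: H1 ~ N(m, I) and H2 ~ N(-m, I) in a D-dimensional feature space.
m = np.array([1.0, -0.5, 0.25, 0.0])

# Orthonormal basis Q whose first column points along m (QR of [m | random columns]).
Q, _ = np.linalg.qr(np.column_stack([m, rng.standard_normal((D, D - 1))]))
Q[:, 0] *= np.sign(Q[:, 0] @ m)  # QR may return -m/||m||; flip it to +m/||m||

# Scale the first coordinate so that it equals the exact LLR 2 * m.x of this model.
scale = np.ones(D)
scale[0] = 2.0 * np.linalg.norm(m)

def forward(x: np.ndarray) -> np.ndarray:
    """Invertible linear map; z[0] is the discriminant component (an LLR)."""
    return scale * (Q.T @ x)

def inverse(z: np.ndarray) -> np.ndarray:
    """Exact inverse of forward()."""
    return Q @ (z / scale)

def true_llr(x: np.ndarray) -> float:
    """log N(x; m, I) - log N(x; -m, I) for the toy model."""
    return float(2.0 * m @ x)

def protect(x: np.ndarray) -> np.ndarray:
    """Zero-evidence protection: set the LLR coordinate to zero and map back."""
    z = forward(x)
    z[0] = 0.0
    return inverse(z)

x = m + rng.standard_normal(D)  # one sample drawn from H1
x_protected = protect(x)
print(np.isclose(forward(x)[0], true_llr(x)))  # True
print(abs(true_llr(x_protected)) < 1e-9)       # True
```

In this toy model the first coordinate is also a calibrated LLR: under H1 it is normally distributed with mean 2‖m‖² and variance 4‖m‖², so the mean equals half the variance. Protection here reduces to projecting x onto the hyperplane orthogonal to m while keeping the other latent coordinates unchanged. Outside this linear-Gaussian case the map has to be learned, which is the role the NF plays in the thesis.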