Not all data is equal. But how much information is a piece of data likely to contain? This question is at the heart of medical testing, the design of scientific experiments, and even everyday human learning and thinking. MIT researchers have developed a new way to solve this problem, opening up new applications in medicine, scientific discovery, cognitive science and artificial intelligence.

In his seminal 1948 paper, “A Mathematical Theory of Communication,” the late MIT Professor Emeritus Claude Shannon answered this question definitively. One of Shannon’s breakthrough results is the idea of entropy, which lets us quantify the amount of information inherent in any random object, including the random variables that model observed data. Shannon’s results laid the foundation for modern information theory and telecommunications. The concept of entropy has also proven central to computer science and machine learning.
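For a discrete random variable $X$ with probability mass function $p$, Shannon defined the entropy (in bits) as:

```latex
H(X) = -\sum_{x} p(x) \log_2 p(x)
```

The sum runs over every possible outcome $x$, which is exactly what makes the quantity hard to compute when the outcome space is enormous.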

**The challenge of estimating entropy**

Unfortunately, using Shannon’s formula can quickly become computationally intractable. It requires accurately calculating the probability of the data, which in turn requires summing over all the possible ways the data could have arisen under a probabilistic model. If the data-generating process is very simple – a single coin toss or a biased die roll, say – calculating the entropy is easy. But consider the problem of medical testing, where a positive test result may be the product of hundreds of interacting variables, all unknown. With just 10 unknown binary variables, there are already more than 1,000 possible explanations for the data. With a few hundred, there are more possible explanations than atoms in the known universe, which makes calculating the entropy exactly an unmanageable problem.
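A short sketch makes the contrast concrete: for tiny distributions the entropy sum is trivial, but the number of joint outcomes to sum over grows exponentially with the number of unknown binary variables.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Simple generative processes: the entropy sum is easy to evaluate exactly.
fair_coin = [0.5, 0.5]
biased_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
print(entropy(fair_coin))   # 1.0 bit
print(entropy(biased_die))  # ~2.16 bits

# But with n unknown binary variables, the sum ranges over 2**n joint
# outcomes, so exact enumeration quickly becomes hopeless:
for n in (10, 100, 300):
    print(f"{n} unknowns -> {2**n} possible explanations")
```

With 300 binary unknowns, 2^300 already exceeds the roughly 10^80 atoms estimated to be in the observable universe.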

MIT researchers have developed a new method for estimating good approximations to information quantities such as Shannon entropy, using probabilistic inference. The work appears in a paper presented at AISTATS 2022 by authors Feras Saad ’16, MEng ’16, a PhD student in electrical engineering and computer science; Marco Cusumano-Towner PhD ’21; and Vikash Mansinghka ’05, MEng ’09, PhD ’09, a principal investigator in the Department of Brain and Cognitive Sciences. The key insight is, rather than enumerating all explanations, to use probabilistic inference algorithms to first infer which explanations are probable, and then to use those probable explanations to construct high-quality entropy estimates. The paper shows that this inference-based approach can be much faster and more accurate than previous approaches.

Estimating entropy and information in a probabilistic model is fundamentally hard because it often requires solving a high-dimensional integration problem. Many previous works have developed estimators of these quantities for certain special cases, but the new estimators of entropy via inference (EEVI) offer the first approach that can deliver sharp upper and lower bounds on a broad set of information-theoretic quantities. Having both an upper and a lower bound means that although we don’t know the true entropy, we can obtain one number that is below it and another that is above it.
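A minimal sketch of one half of such a sandwich (this is not the authors’ EEVI estimators, just a standard illustration): by Gibbs’ inequality, the cross-entropy of samples from the true distribution p under any approximating distribution q upper-bounds H(p), and the bound tightens as q gets closer to p.

```python
import math
import random

random.seed(0)

p = [0.7, 0.2, 0.1]            # true distribution (unknown in practice)
q_rough = [1/3, 1/3, 1/3]      # crude approximation of p
q_good = [0.65, 0.25, 0.10]    # better approximation of p

def sample(dist, n):
    """Draw n outcomes (indices) from a discrete distribution."""
    return random.choices(range(len(dist)), weights=dist, k=n)

def cross_entropy_estimate(q, xs):
    """Monte Carlo estimate of E_p[-log2 q(x)], an upper bound on H(p)."""
    return sum(-math.log2(q[x]) for x in xs) / len(xs)

xs = sample(p, 100_000)
true_H = -sum(pi * math.log2(pi) for pi in p)
print("true entropy:         ", true_H)
print("upper bound (rough q):", cross_entropy_estimate(q_rough, xs))
print("upper bound (good q): ", cross_entropy_estimate(q_good, xs))
```

The better the inferred approximation q, the tighter the upper bound; the EEVI estimators also construct matching lower bounds via inference, which this toy sketch omits.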

“The upper and lower entropy bounds provided by our method are particularly useful for three reasons,” Saad explains. “First, the difference between the upper and lower bounds gives a quantitative sense of how confident we should be in the estimates. Second, by spending more computational effort, we can drive the difference between the two bounds to zero, ‘squeezing’ the true value with a high degree of precision. Third, we can compose these bounds to form estimates of many other quantities that tell us how informative different variables in a model are about one another.”

**Solving fundamental problems with data-driven expert systems**

Saad says he’s most excited about the ability this method provides to interrogate probabilistic models in areas such as machine-assisted medical diagnosis. He says one goal of the EEVI method is to enable new queries of rich generative models of things like liver disease and diabetes that have already been developed by experts in the medical domain. For example, suppose we have a patient with a set of observed attributes (height, weight, age, etc.) and observed symptoms (nausea, blood pressure, etc.). Given these attributes and symptoms, EEVI can be used to help determine which medical tests the physician should conduct to maximize information about the absence or presence of a given liver disease (such as cirrhosis or primary biliary cholangitis).
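This kind of test selection can be illustrated with a deliberately tiny example (the numbers, test names, and two-variable model are hypothetical, and this closed-form calculation is far simpler than the EEVI estimators, which handle rich generative models): each candidate binary test is scored by its mutual information with the disease variable, i.e., the expected reduction in uncertainty about the diagnosis.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_info_gain(prior, sens, spec):
    """Mutual information I(Disease; Test), in bits, for a binary test
    with the given sensitivity and specificity."""
    # P(test positive), marginalizing over disease status.
    p_pos = prior * sens + (1 - prior) * (1 - spec)
    # Posterior P(disease | result), by Bayes' rule.
    post_pos = prior * sens / p_pos
    post_neg = prior * (1 - sens) / (1 - p_pos)
    # I(D; T) = H(D) - E_T[ H(D | T) ]
    return h2(prior) - (p_pos * h2(post_pos) + (1 - p_pos) * h2(post_neg))

prior = 0.1  # hypothetical prior probability of the disease
tests = {    # (sensitivity, specificity) -- illustrative numbers only
    "liver enzyme panel": (0.90, 0.85),
    "ultrasound": (0.70, 0.95),
    "antibody test": (0.60, 0.99),
}
for name, (se, sp) in tests.items():
    print(f"{name}: {expected_info_gain(prior, se, sp):.3f} bits")
best = max(tests, key=lambda t: expected_info_gain(prior, *tests[t]))
print("most informative test:", best)
```

Note that the most informative test need not be the one with the highest sensitivity; with a low prior, a highly specific test can rule the disease in more decisively and thus carry more expected information.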

For insulin diagnosis, the authors showed how to use the method to compute optimal times for taking blood glucose measurements that maximize information about a patient’s insulin sensitivity, given a probabilistic model developed by experts in insulin metabolism and the patient’s personalized meal and medication schedule. As routine medical monitoring such as glucose tracking moves away from doctors’ offices and toward wearable devices, there are even more opportunities to improve data acquisition – if the value of the data can be accurately estimated in advance.
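To see the flavor of choosing measurement times by information value, here is a toy linear-Gaussian stand-in (not the authors’ insulin-metabolism model; the response curve, noise levels, and units are all assumptions): when a reading at time t is a noisy linear function of a latent Gaussian parameter, the mutual information is available in closed form, so we can simply score candidate times.

```python
import math

# Hypothetical patient model: a glucose reading at time t (hours after a
# meal) depends linearly on a latent insulin sensitivity s ~ Normal(0, sigma_s^2),
# observed with sensor noise of std sigma_n.
sigma_s = 1.0  # prior std of insulin sensitivity (assumed units)
sigma_n = 0.5  # glucose sensor noise std

def sensitivity_gain(t):
    """How strongly a reading at time t reflects s: a toy response curve
    that peaks a couple of hours after the meal."""
    return t * math.exp(-t / 2.0)

def info_gain_bits(t):
    """I(s; y_t) for y_t = gain(t)*s + noise: the Gaussian closed form
    0.5 * log2(1 + gain^2 * sigma_s^2 / sigma_n^2)."""
    g = sensitivity_gain(t)
    return 0.5 * math.log2(1 + (g * sigma_s) ** 2 / sigma_n ** 2)

candidate_times = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0]
for t in candidate_times:
    print(f"t = {t:.1f} h: {info_gain_bits(t):.3f} bits")
best_t = max(candidate_times, key=info_gain_bits)
print("most informative measurement time:", best_t)
```

In realistic models the integrals behind this score have no closed form, which is where sandwich bounds such as EEVI’s come in.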

Vikash Mansinghka, senior author of the paper, adds: “We have shown that probabilistic inference algorithms can be used to estimate rigorous bounds on measures of information that AI engineers often consider hard to compute. This opens up many new applications. It also shows that inference may be more computationally fundamental than we thought. It also helps explain how human minds might be able to estimate the value of information so pervasively, as a central building block of everyday cognition, and helps us engineer AI expert systems with these abilities.”

The paper, “Estimators of Entropy and Information via Inference in Probabilistic Models”, was presented at AISTATS 2022.