Use of Bayesian networks as a massive data mining technique—application to care utilisation data
Context. Bayesian networks are used in two distinct ways, both resting on the same principles of Bayesian analysis: as an a priori modelling tool that encodes the researcher's hypotheses, and as a data mining tool that requires no a priori hypothesis from the researcher. The first approach has spread through the biomedical community. The second comes primarily from artificial intelligence and, to our knowledge, is not used in epidemiology. This second application is nevertheless promising, especially for massive data, and could lead to the discovery of unsuspected causal relationships. This remains to be demonstrated.
Method. We used the 2010 data from the SIRS cohort, which is based on a representative sample of the adult population of Greater Paris. Several publications in social epidemiology draw on this cohort, including one that studies care utilisation and the social characteristics associated with it. We re-analysed the data from that study with several data mining algorithms that i) automatically identify the structure of the Bayesian network representing the data (the graph), and ii) estimate the network parameters from the data. We compared the data mining results with classical multivariate analyses and with data from the literature.
Results. Multivariate analysis identifies relationships between variables that are already known from the literature. Bayesian network analyses identify more complex, directed relationships among variables with simple connotations. Most analyses show a separation between social variables and care utilisation variables.
Discussion. Large-scale data mining with Bayesian networks relies on a set of theoretically well-established techniques that have been applied successfully in different domains. Our example, based on results obtained from known data in social epidemiology, suggests that the value of this type of approach still needs to be clarified. In particular, in view of our results, applying it blindly, without prior hypotheses, appears to have little relevance.
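The two steps named in the Method, identifying the graph and then estimating its parameters, can be illustrated with a minimal sketch. The abstract does not name the algorithms that were used, so the greedy, additions-only hill climbing with a BIC score shown here is an assumption chosen for illustration, not the authors' method. The variables `x`, `y`, `z` and their counts are synthetic and have no connection to the SIRS data; all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def family_counts(rows, child, parents):
    """Count (parent configuration, child value) pairs and parent configurations."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    marg = Counter()
    for (pa, _), n in joint.items():
        marg[pa] += n
    return joint, marg

def fit_cpt(rows, child, parents):
    """Step ii: maximum-likelihood estimate of P(child | parents)."""
    joint, marg = family_counts(rows, child, parents)
    return {(pa, v): n / marg[pa] for (pa, v), n in joint.items()}

def family_bic(rows, child, parents, levels):
    """BIC contribution of one node: log-likelihood minus a complexity penalty."""
    joint, marg = family_counts(rows, child, parents)
    loglik = sum(n * math.log(n / marg[pa]) for (pa, _), n in joint.items())
    n_params = (levels[child] - 1) * math.prod(levels[p] for p in parents)
    return loglik - 0.5 * math.log(len(rows)) * n_params

def creates_cycle(parents, src, dst):
    """True if dst is src or an ancestor of src, so src -> dst would close a cycle."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(rows, levels):
    """Step i: greedily add the edge with the largest BIC gain until none helps."""
    parents = {v: [] for v in levels}
    improved = True
    while improved:
        improved = False
        best_gain, best_edge = 1e-9, None
        for src, dst in permutations(levels, 2):
            if src in parents[dst] or creates_cycle(parents, src, dst):
                continue
            gain = (family_bic(rows, dst, parents[dst] + [src], levels)
                    - family_bic(rows, dst, parents[dst], levels))
            if gain > best_gain:
                best_gain, best_edge = gain, (src, dst)
        if best_edge:
            parents[best_edge[1]].append(best_edge[0])
            improved = True
    return parents

# Synthetic toy data (assumption): x and y are associated, z is independent of both.
rows = []
for x, y, n in [(0, 0, 40), (1, 1, 40), (0, 1, 10), (1, 0, 10)]:
    rows += [{"x": x, "y": y, "z": i % 2} for i in range(n)]
levels = {"x": 2, "y": 2, "z": 2}  # number of categories per variable

dag = hill_climb(rows, levels)
edges = [(p, c) for c, ps in dag.items() for p in ps]
cpts = {c: fit_cpt(rows, c, ps) for c, ps in dag.items()}
print(edges)
```

On this toy data the search links `x` and `y` and leaves `z` isolated. The direction of the `x`/`y` edge carries no information here: both orientations receive the same BIC score, so observational data alone cannot choose between them, which is one reason to read learned arrows cautiously.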