Secondary data analysis using Evidence-Based Bayesian Networks with an application to investigate the determinants of childhood stunting

2024-12-05
Yet, Barbaros
Öykü Başerdem, Elif
Rosenstock, Todd
Secondary data (data previously collected by other researchers for a different purpose) offers a cost-effective and readily available resource for research and for policy or program design, but it presents challenges because the analyst has no control over the sampling design or the data. Bayesian Networks (BNs) are well suited to guiding secondary data analysis: their graphical structure can encode domain knowledge about the causal relationships among factors, and secondary data can be used to learn the nature and strength of these relationships. To build BNs from a combination of knowledge and secondary data, the causal structure is first built from expert knowledge and published evidence, and the parameters are then learned from the data. However, the variables in secondary data often match the variables in the causal BN structure only imperfectly. When ad hoc structural modifications are made to reconcile the structure with the data, the link between the parameterized model and the supporting knowledge and evidence is lost. This paper presents a systematic method for building BNs from secondary data. We build the BN structure from published evidence and expert interviews, carefully documenting the origin of the evidence for each relation in the BN. We use formal BN abstraction operations to match the expert structure with the secondary data. The causal and associational implications of each abstraction operation are traced, making it possible to link the original BN with the parameterized model and to trace the model back to more detailed versions when additional data become available. The method is demonstrated by building a BN model of the drivers of childhood stunting. The model organizes the rich published evidence in this domain into a BN structure and an accompanying evidence base, while its parameters are learned from the Demographic and Health Survey (DHS) datasets for India and Senegal.
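The two steps described above, learning parameters for a fixed expert structure and abstracting away a node that the secondary data lack, can be illustrated with a minimal sketch. It assumes complete, discrete data and uses a hypothetical chain A → B → C with toy records; none of these names or values come from the paper. Parameters are estimated by maximum-likelihood counting, and the abstraction shown is node removal (summing out B so that P(C | A) = Σ_b P(C | b) P(b | A)), one standard operation that preserves the joint distribution over A and C. It is not necessarily the exact set of operations the authors use.

```python
from collections import Counter
from itertools import product


def learn_cpt(records, child, parents, states):
    """Maximum-likelihood CPT P(child | parents) from complete discrete records.

    records: list of dicts mapping variable name -> observed state
    states:  dict mapping variable name -> list of possible states
    Returns a dict: tuple of parent states -> {child_state: probability}.
    """
    joint = Counter()
    parent_counts = Counter()
    for r in records:
        key = tuple(r[p] for p in parents)
        joint[(key, r[child])] += 1
        parent_counts[key] += 1
    cpt = {}
    for key in product(*(states[p] for p in parents)):
        n = parent_counts[key]
        if n == 0:
            # Parent configuration never observed: fall back to a uniform distribution.
            cpt[key] = {s: 1 / len(states[child]) for s in states[child]}
        else:
            cpt[key] = {s: joint[(key, s)] / n for s in states[child]}
    return cpt


def absorb_intermediate(p_b_given_a, p_c_given_b):
    """Remove node B from the chain A -> B -> C by summing it out.

    Returns P(C | A) with entries sum_b P(C | b) * P(b | A).
    """
    result = {}
    for a_key, dist_b in p_b_given_a.items():
        out = {}
        for b, pb in dist_b.items():
            for c, pc in p_c_given_b[(b,)].items():
                out[c] = out.get(c, 0.0) + pb * pc
        result[a_key] = out
    return result


# Hypothetical toy records for the chain A -> B -> C.
states = {v: ["yes", "no"] for v in ("A", "B", "C")}
records = [
    {"A": "yes", "B": "yes", "C": "yes"},
    {"A": "yes", "B": "yes", "C": "no"},
    {"A": "yes", "B": "no", "C": "no"},
    {"A": "no", "B": "no", "C": "no"},
    {"A": "no", "B": "no", "C": "yes"},
    {"A": "no", "B": "yes", "C": "yes"},
]

p_b_given_a = learn_cpt(records, "B", ["A"], states)
p_c_given_b = learn_cpt(records, "C", ["B"], states)
p_c_given_a = absorb_intermediate(p_b_given_a, p_c_given_b)
# P(C=yes | A=yes) = 5/9 and P(C=yes | A=no) = 4/9 for these records.
print(p_c_given_a)
```

If a secondary dataset recorded only A and C, one would instead fit P(C | A) directly on the abstracted structure A → C; summing out B simply shows which distribution that abstracted edge stands for in the fuller expert model.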
We compared the BNs built with our approach with BNs learned purely from secondary data using structure learning algorithms. None of the learning algorithms produced structures close to the evidence-based model. In contrast, the link between our models and the evidence is clearly established through the abstraction operations. The stunting case study demonstrates the advantages of having a clear evidence base and of building a formal link between the evidence and the secondary data using abstraction. The resulting models and supporting evidence can be browsed in an online tool.
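The record does not say how closeness between learned and evidence-based structures was measured. One common measure, assumed here purely for illustration, is the structural Hamming distance (SHD): the number of node pairs whose edge is missing, extra, or reversed in one graph relative to the other. The sketch below compares two DAGs given as sets of directed (parent, child) edges; the edge sets are hypothetical, and it ignores the fact that many structure learners return equivalence classes (CPDAGs) rather than DAGs.

```python
def structural_hamming_distance(edges_a, edges_b):
    """SHD between two DAGs given as iterables of (parent, child) edges."""
    a, b = set(edges_a), set(edges_b)
    # Every unordered node pair that carries an edge in either graph.
    pairs = {frozenset(edge) for edge in a | b}
    distance = 0
    for pair in pairs:
        x, y = tuple(pair)
        oriented = {(x, y), (y, x)}
        # A missing, extra, or reversed edge each costs 1.
        if (oriented & a) != (oriented & b):
            distance += 1
    return distance


# Hypothetical expert and learned structures.
expert = {("A", "B"), ("B", "C"), ("A", "C")}
learned = {("B", "A"), ("B", "C"), ("C", "D")}
print(structural_hamming_distance(expert, learned))  # 3: one reversed, one missing, one extra edge
```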
Expert Systems with Applications
B. Yet, E. Öykü Başerdem, and T. Rosenstock, “Secondary data analysis using Evidence-Based Bayesian Networks with an application to investigate the determinants of childhood stunting,” Expert Systems with Applications, vol. 256, 2024. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85200644235&origin=inward.