Understanding visually grounded spoken language via multi-tasking
An alternative approach for intelligent systems to understand human speech
Modern scientific challenges are often tackled with (Deep) Neural Networks (DNNs). Despite their high predictive accuracy, DNNs lack inherent explainability, and many scientists do not harness their power because they neither trust nor understand how the networks work. Meanwhile, eXplainable AI (XAI) research offers post-hoc (after training) interpretability methods that provide insight into a DNN's reasoning by quantifying the relevance of individual features (image pixels, words in text, etc.) with respect to the prediction. The resulting relevance heatmaps indicate how the network reached its decision directly in the input modality (images, text, speech, etc.) of the scientific data. Visually representing the knowledge captured by the AI system can become a source of scientific insight. Many Open Source Software (OSS) implementations of these methods exist, but each supports only a single DNN format, even though standards such as the Open Neural Network eXchange (ONNX) are available. These libraries are also known mostly to AI experts. Adoption by the wider scientific community requires both an understanding of the XAI methods and well-documented, standardized OSS. The DIANNA project aims to determine the best XAI methods for scientific use and to provide OSS implementations of them based on the ONNX standard, together with demonstrations on benchmark datasets.
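
To make the idea of a relevance heatmap concrete, the following is a minimal sketch of one simple perturbation-based approach (occlusion): each input feature is replaced by a baseline value, and the resulting drop in the model score is recorded as that feature's relevance. This is an illustration only and not DIANNA's API or necessarily one of the methods the project selects; `toy_model` and `occlusion_relevance` are hypothetical names, and the "model" is a hand-written stand-in for a trained DNN.

```python
from copy import deepcopy


def toy_model(image):
    """Hypothetical stand-in for a trained DNN.

    Returns the mean value of the central 2x2 patch of a 4x4 image,
    so only those four pixels can influence the score.
    """
    centre = [image[r][c] for r in (1, 2) for c in (1, 2)]
    return sum(centre) / len(centre)


def occlusion_relevance(model, image, baseline=0.0):
    """Compute a per-pixel relevance heatmap by occlusion.

    The relevance of a pixel is the drop in the model score when that
    single pixel is replaced by `baseline` (an assumed "uninformative" value).
    """
    reference = model(image)
    heatmap = [[0.0] * len(row) for row in image]
    for r, row in enumerate(image):
        for c in range(len(row)):
            occluded = deepcopy(image)
            occluded[r][c] = baseline
            heatmap[r][c] = reference - model(occluded)
    return heatmap


# A uniform 4x4 "image"; the heatmap lives in the same space as the input.
image = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_relevance(toy_model, image)
```

In this toy setting the heatmap is non-zero only for the four central pixels, which are exactly the ones the stand-in model uses. With a real network, the `model` callable would instead wrap an inference call on the trained DNN.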