
Technical Program, Day 3

Wednesday, November 16

Oral 5: Natural Language Processing
Wednesday, 16 November 2022 (9:00-10:40)
Chair: Zoraida Callejas Carrión

O5.1
09:00 – 09:20
An Attentional Extractive Summarization Framework
In contrast to abstractive approaches, extractive methods can be especially adequate for some applications, and they can help with other tasks such as Question Answering or Information Extraction. In this paper, we propose a general framework for extractive summarization, the Attentional Extractive Summarization framework. The proposed approach is based on the interpretation of the attention mechanisms of hierarchical neural networks, which compute document-level representations of documents and summaries from sentence-level representations, which, in turn, are computed from word-level representations. The models proposed under this framework are able to automatically learn the relationships among document and summary sentences, without requiring oracle systems to compute reference labels for each sentence before the training phase. We evaluate two different systems, formalized under the proposed framework, on the CNN/DailyMail and NewsRoom corpora, which are among the reference corpora of the most relevant works on text summarization. The results obtained in the evaluation support the adequacy of our proposal and suggest that there is still room for improvement of our attentional framework.
José Ángel González, Encarna Segarra, Fernando García-Granada, Emilio Sanchis and Lluis-F Hurtado
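
As a rough illustration of the kind of model the framework describes, the sketch below (a toy, not the authors' implementation; dimensions and top-k selection are placeholders) aggregates sentence-level representations into a document representation with additive attention and reuses the attention weights as sentence-salience scores for extraction:

```python
# Toy sketch of an attentional extractive scorer, not the paper's model:
# sentence embeddings are pooled into a document embedding via additive
# attention, and the attention weights double as extraction scores.
import torch
import torch.nn as nn

class AttentionalExtractor(nn.Module):
    def __init__(self, sent_dim: int, attn_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(sent_dim, attn_dim)
        self.context = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, sent_embs: torch.Tensor):
        # sent_embs: (num_sentences, sent_dim), themselves computed from
        # word-level representations (e.g. by another attention layer).
        scores = self.context(torch.tanh(self.proj(sent_embs)))
        weights = torch.softmax(scores.squeeze(-1), dim=0)
        doc_emb = (weights.unsqueeze(-1) * sent_embs).sum(dim=0)
        return doc_emb, weights

# Select the three most attended sentences as the extractive summary.
model = AttentionalExtractor(sent_dim=300)
sents = torch.randn(12, 300)  # stand-in for real sentence representations
_, salience = model(sents)
summary_idx = torch.topk(salience, k=3).indices.sort().values
```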
O5.2
09:20 – 09:40
SUMBot: Summarizing Context in Open-Domain Dialogue Systems
In this paper, we investigate the problem of including relevant information as context in open-domain dialogue systems. Most models struggle to identify and incorporate important knowledge from dialogues and simply use the entire turns as context, which pads the input fed to the model with unnecessary information. Additionally, due to the input limit of a few hundred tokens in large pre-trained models, parts of the history cannot be included and informative portions of the dialogue may be omitted. To overcome this problem, we introduce a simple method that substitutes part of the context with a summary instead of the whole history, which improves the models' ability to keep track of all the previous relevant information. We show that the inclusion of a summary may improve the answer generation task, and we discuss some examples to further understand the system's weaknesses.
Rui Ribeiro and Luísa Coheur
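
A minimal sketch of the general idea, assuming a generic Hugging Face summarizer (facebook/bart-large-cnn is an arbitrary stand-in here, and the turn counts and length limits are illustrative, not the paper's settings): older turns are replaced by a generated summary while the most recent turns are kept verbatim.

```python
# Minimal sketch (not the SUMBot code): replace all but the last few turns
# of a dialogue with an automatically generated summary, so the context fed
# to the response generator stays within the model's input limit.
from transformers import pipeline  # assumed dependency

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def compress_context(turns: list[str], keep_last: int = 2,
                     max_summary_tokens: int = 60) -> str:
    if len(turns) <= keep_last:
        return "\n".join(turns)
    history = " ".join(turns[:-keep_last])
    summary = summarizer(history, max_length=max_summary_tokens,
                         min_length=10, do_sample=False)[0]["summary_text"]
    # The summary stands in for the full history; recent turns stay verbatim.
    return summary + "\n" + "\n".join(turns[-keep_last:])
```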
O5.3
09:40 – 10:00
Automatic Detection of Inconsistencies in Open-Domain Chatbots
Current pre-trained Large Language Models applied to chatbots are capable of producing good-quality sentences, handling different conversation topics, and sustaining longer interactions. Unfortunately, the generated responses depend heavily on the data on which the chatbot has been trained, the specific dialogue history and current turn used to guide the response, the internal decoding mechanisms, and the ranking strategies, among other factors. Therefore, the chatbot may provide different answers to the same question asked by the user, which in a long-term interaction may produce confusion. In this paper, we propose a new methodology based on three phases: a) automatic detection of dialogue topics using zero-shot learning approaches, b) automatic clustering of distinctive questions, and c) detection of inconsistent answers using K-Means clustering and the Silhouette coefficient. To test our proposal, we used the DailyDialog dataset to detect up to 13 different topics. To detect inconsistencies, we manually generated multiple paraphrased questions and then used multiple pre-trained chatbots to answer them. Our results show a weighted F1 score of 0.658 for topic detection and an MSE of 3.4 when predicting the number of distinct responses.
Jorge Mira Prats, Marcos Estecha-Garitagoitia, Mario Rodríguez-Cantelar and Luis Fernando D’Haro
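
Phase c) is concrete enough to sketch. The toy below embeds a chatbot's answers to paraphrases of one question, then selects the K-Means cluster count that maximizes the Silhouette coefficient; the sentence-embedding model and the search range for k are assumptions, not the paper's settings.

```python
# Illustrative sketch of phase c) of the methodology: cluster the answers
# a chatbot gives to paraphrased versions of the same question and pick
# the number of K-Means clusters with the best Silhouette coefficient.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sentence_transformers import SentenceTransformer  # assumed dependency

def count_distinct_answers(answers: list[str], max_k: int = 5) -> int:
    embs = SentenceTransformer("all-MiniLM-L6-v2").encode(answers)
    best_k, best_score = 1, -1.0
    for k in range(2, min(max_k, len(answers) - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(embs)
        score = silhouette_score(embs, labels)
        if score > best_score:
            best_k, best_score = k, score
    # More than one well-separated cluster suggests inconsistent answers.
    return best_k
```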
O5.4
10:00 – 10:20
Ethics Guidelines for the Development of Virtual Assistants for e-Health
The use of intelligent virtual assistants for human-machine communication is spreading across multiple applications. The latest breakthroughs in fields such as Natural Language Processing (NLP) and Natural Language Generation (NLG) make it possible to communicate with machines in a more natural and fluent way and in broader contexts, normalizing voice-based interaction with machines. These advances also give rise to issues never seen before, especially when this technology extends to public services such as administration, education or health. The transfer of personal data, the opacity of decisions, the presence of bias and the exclusion of groups are critical aspects that cannot be governed exclusively by economic interests. The design of conversational assistant solutions must take place within an ethical, legal, socio-economic and cultural (ELSEC) framework, and it must be ensured that these solutions preserve the dignity, freedom and autonomy of their users. In this paper, we analyse the European regulatory framework for Artificial Intelligence (AI) and the issues that arise when designing and developing AI-based conversational solutions for e-health, and we present recommendations based on our experience and on reflection from an ethical point of view.
Andrés Piñeiro Martín, Carmen García Mateo, Laura Docío Fernández and María del Carmen López Pérez
O5.5
10:20 – 10:40
esCorpius: A Massive Spanish Crawling Corpus
In recent years, transformer-based models have led to significant advances in language modelling for natural language processing. However, they require vast amounts of data to be (pre-)trained, and there is a lack of corpora in languages other than English. Recently, several initiatives have presented multilingual datasets obtained from automatic web crawling. However, the results for Spanish present important shortcomings: they are either too small in comparison with other languages, or of low quality as a result of sub-optimal cleaning and deduplication. In this paper, we introduce esCorpius, a Spanish crawling corpus obtained from nearly 1 PB of Common Crawl data. It is the most extensive corpus in Spanish with this level of quality in the extraction, purification and deduplication of web textual content. Our data curation process involves a novel, highly parallel cleaning pipeline and encompasses a series of deduplication mechanisms that together ensure the integrity of both document and paragraph boundaries. Additionally, we keep both the source web page URL and the WARC shard origin URL in order to comply with EU regulations. esCorpius has been released under a CC BY-NC-ND 4.0 license and is available on HuggingFace.
Asier Gutiérrez-Fandiño, David Pérez-Fernández, Jordi Armengol-Estapé, David Griol and Zoraida Callejas
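
The deduplication idea can be illustrated in miniature. The sketch below (a simplification, not the esCorpius pipeline) drops exact-duplicate paragraphs via a normalized hash while carrying each document's source URL and WARC shard along, so document and paragraph boundaries plus provenance survive the pass.

```python
# Miniature paragraph-level deduplication with provenance, not the actual
# esCorpius pipeline: exact duplicates are dropped via a normalized hash,
# and each surviving document keeps its source URL and WARC shard.
import hashlib

def dedup_paragraphs(docs):
    # docs: iterable of dicts {"url": ..., "warc": ..., "paragraphs": [...]}
    seen = set()
    for doc in docs:
        kept = []
        for para in doc["paragraphs"]:
            # Normalize whitespace and case before hashing.
            key = hashlib.sha1(" ".join(para.lower().split()).encode()).hexdigest()
            if key not in seen:
                seen.add(key)
                kept.append(para)
        if kept:  # a document survives only if some content remains
            yield {"url": doc["url"], "warc": doc["warc"], "paragraphs": kept}
```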

 

Keynote 3
Wednesday, 16 November 2022 (11:00-12:00)

KN3
11:00 – 12:00
Utilizing context information in speech recognition for voice assistants
Utilizing contextual information plays a key role in achieving accurate speech recognition in the voice assistant domain. Context can be available in a number of ways, such as the acoustic environment, conversational context, or personalized information about the user. Other sources of context information are content trending at the time the user is speaking and the speaker's location. While there are established methods for utilizing some of this information in traditional statistical speech recognition systems, contextualizing all-neural speech recognition systems is an active area of research. In my talk, I will present ongoing research at Amazon Alexa on this problem and discuss some of the challenges.
Simon Wiesler
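
One established contextualization technique in this space is biasing the decoder toward user-specific phrases. The sketch below is a generic shallow-fusion-style rescoring toy, not Amazon's system: beam hypotheses receive a log-score bonus when they contain context phrases such as contact names.

```python
# Toy contextual biasing via beam rescoring (a generic illustration, not
# the speaker's system): hypotheses that contain user-specific phrases,
# e.g. contact names, get a bonus added to their log-probability.

def bias_score(hypothesis: str, context_phrases: set[str],
               bonus: float = 2.0) -> float:
    # One fixed log-score bonus per context phrase found in the hypothesis.
    hits = sum(1 for p in context_phrases if p in hypothesis.lower())
    return bonus * hits

def rescore_beam(beam: list[tuple[str, float]],
                 context_phrases: set[str]) -> list[tuple[str, float]]:
    rescored = [(hyp, logp + bias_score(hyp, context_phrases))
                for hyp, logp in beam]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

beam = [("call john smith", -3.2), ("call jon smyth", -3.0)]
print(rescore_beam(beam, {"john smith"}))  # biased toward the known contact
```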

 

Entrepreneurship
Wednesday, 16 November 2022 (12:00-13:00)
Chair: Dayana Rivas

EN
12:00 – 13:00
Entrepreneurship Round Table