Signal Processing in Medicine and Biology: Innovations in Big Data Processing (ISBN 3031212355, 9783031212352)

Table of contents:
Preface
Contents
Hyper-Enhanced Feature Learning System for Emotion Recognition
1 Introduction
1.1 Emotion Work and Its Relation to Affective States
1.2 Qualitative Approach to Emotion Recognition
2 Related Background
2.1 Literature Review and Applications of Machine/Deep Learning to Emotion Recognition
3 Databases and Emotion State Modeling
3.1 Valence-Arousal Emotional State Modeling
4 Hyper-Enhanced Learning System Methodology
5 Experimentation
5.1 Data Preprocessing and Feature Learning
6 Results and Discussion
6.1 Multimodal Classification
6.2 Results of the Hybrid Neuro-single and Neuro-multimodal Network Classification
7 Conclusions
References
Monitoring of Auditory Discrimination Therapy for Tinnitus Treatment Based on Event-Related (De-) Synchronization Maps
1 Introduction
1.1 What Is Tinnitus?
1.2 Sort of Tinnitus
1.3 Tinnitus Affectation
1.4 How Can Be Over-Synchronization of Neurons Due to Tinnitus Detected?
1.5 Event-Related (De-) Synchronization (ERD/ERS)
1.6 How Can Be Tinnitus Treated?
1.7 Auditory Discrimination Therapy (ADT)
1.8 How Can Be Auditory Discrimination Therapy for Tinnitus Treatment Monitored?
1.9 Methods to Evaluate Auditory Discrimination Therapy
2 Methodology
2.1 EEG Database
2.2 EEG Signal Pre-processing
2.3 ERD/ERS Maps
2.4 Statistical Evaluation
3 Results
3.1 ERD/ERS Maps Grouped by the THI Outcome
3.2 Individual Analysis of the ERD/ERS Maps in Tinnitus Subjects
3.3 Quantification of ERD/ERS Responses
3.4 Cross-Sectional Analysis (Tinnitus Versus Control Group)
4 Discussion
4.1 ERD/ERS Maps Grouped by the THI Outcome
4.2 Individual Analysis of the ERD/ERS Maps
4.3 Quantification of ERD/ERS Responses
4.4 Cross-Sectional Analysis (Tinnitus Versus Control Group)
4.5 Comparison Analysis
5 Conclusions
References
Investigation of the Performance of fNIRS-based BCIs for Assistive Systems in the Presence of Acute Pain
1 Introduction
1.1 fNIRS
1.2 BCI
1.3 Input Data for BCI in Assistive Systems
1.4 Pain and BCI
1.5 Objective
2 Experiment
3 Data Preprocessing
4 Classification
4.1 SVM
4.2 Convolutional Neural Network
5 Results and Discussions
6 Conclusions
References
Spatial Distribution of Seismocardiographic Signal Clustering
1 Introduction
2 Methods
2.1 Experimental Data
2.2 Preprocessing
2.2.1 Filtering
2.2.2 Lung Volume Signal
2.2.3 Segmentation
2.3 SCG Clustering
2.3.1 Distance Measure
Dynamic Time Warping (DTW)
Euclidian and Cross-correlation-based Distance (Ecorr)
2.3.2 Initial Conditions
2.3.3 K-medoid Clustering Algorithm
2.4 Decision Boundary Between Clusters in the Standardized Flow Rate-Lung Volume Feature Space
2.4.1 Consistency of Clustering Spatial Distribution
2.5 Heart Rates in the Clusters
2.6 Intra-cluster Variability Reduction After Clustering
3 Results and Discussion
3.1 Clustering Accuracy
3.2 Decision Boundary Angle
3.2.1 Intra-subject and Inter-subject Angle Variability
3.3 Clusters Locations in Relation to the Respiratory Phase
3.4 Heart Rates in the Clusters
3.5 Intra-cluster Variability Reduction After Clustering
3.6 Computational Cost of the Different Distance Measures
4 Conclusions
Appendix A Heart Rate Distribution in the FL-LV Feature Space
References
Non-invasive ICP Monitoring by Auditory System Measurements
1 Introduction
2 Auditory System-Based Measurements
3 Evoked Tympanic Membrane Displacement
4 Spontaneous Tympanic Membrane Pulsation (TMp)
5 Tympanometry
6 Otoacoustic Emissions
7 Discussion
8 Conclusion
References
Index


Iyad Obeid, Joseph Picone, Ivan Selesnick (Editors)

Signal Processing in Medicine and Biology Innovations in Big Data Processing


Editors Iyad Obeid ECE Temple University Philadelphia, PA, USA

Joseph Picone ECE Temple University Philadelphia, PA, USA

Ivan Selesnick Tandon School of Engineering New York University Brooklyn, NY, USA

ISBN 978-3-031-21235-2    ISBN 978-3-031-21236-9 (eBook) https://doi.org/10.1007/978-3-031-21236-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This edited volume consists of the expanded versions of the exceptional papers presented at the 2021 IEEE Signal Processing in Medicine and Biology Symposium (IEEE SPMB) held at Temple University in Philadelphia, Pennsylvania, USA. This was the second time the symposium was held as a virtual conference, which has been a popular format since it allows greater participation from the international community. We had 180 participants from 36 countries. IEEE SPMB promotes interdisciplinary papers across a wide range of topics including applications from many areas of the health sciences. The symposium was first held in 2011 at New York University Polytechnic (now known as NYU Tandon School of Engineering). Since 2014, it has been hosted by the Neural Engineering Data Consortium at Temple University as part of a broader mission to promote machine learning and big data applications in bioengineering. The symposium typically consists of 18 highly competitive full paper submissions that include oral presentations, and 12–18 single-page abstracts that are presented as posters. Two plenary lectures are included – one focused on research and the other focused on emerging technology. The symposium provides a stimulating environment where multidisciplinary research in the life sciences is presented. More information about the symposium can be found at www.ieeespmb.org. This edited volume contains five papers selected from the symposium by the technical committee. Authors were encouraged to expand their original submissions into book chapters. The papers represented in this volume all focus on signal processing applications in the health sciences. The first paper, titled “Hyper-Enhanced Feature Learning System for Emotion Recognition,” focuses on the problem of automatically detecting the emotional state of a person based on five signals: electroencephalogram (EEG), galvanic skin response (GSR), respiration (RES), electromyogram (EMG), and electrocardiograph (ECG). The authors explore techniques to automatically identify key features for characterizing the emotional state of the subject. The second paper, titled “Monitoring of Auditory Discrimination Therapy for Tinnitus Treatment Based on Event-Related (De-) Synchronization Maps,” deals with a well-known auditory condition known as tinnitus, and evaluates the effect of
auditory discrimination therapy (ADT) by monitoring the level of neural synchronization before and after the ADT-based treatment. Using event-related desynchronization (ERD) and event-related synchronization (ERS) maps, the authors suggest that ADT reduces attention towards tinnitus if incremental alpha-ERS responses elicited after the ADT-based treatment during an auditory encoding task are found. The third paper, titled “Investigation of the Performance of fNIRS-based BCIs for Assistive Systems in the Presence of Acute Pain,” investigates the impact of the presence of acute pain conditions on the performance of fNIRS-based brain–computer interfaces (BCIs), exploring the use of this technology as an assistive device for patients with motor and communication disabilities. The authors found that the presence of acute pain negatively impacts the performance of the BCI. This study suggests that it is critical to consider the presence of pain when designing BCIs in assistive systems for patients. The fourth paper, titled “Spatial Distribution of Seismocardiographic Signal Clustering,” studies the use of seismocardiographic (SCG) signals to monitor cardiac activity. The authors study distance measures used to cluster these signals and suggest that Euclidean distances with flow-rate condition would be the method of choice if a single distance measure is used for all patients. The final paper, titled “Non-invasive ICP Monitoring by Auditory System Measurements,” explores monitoring of intracranial pressure (ICP) for diagnosing various neurological conditions. Elevated ICP can complicate the pre-existing clinical disorders and can result in headaches, nausea, vomiting, obtundation, seizures, and even death. This chapter focuses on non-invasive approaches to ICP monitoring linked to the auditory system. The limitation of these non-invasive techniques is their inability to provide absolute ICP values. Hence, these methods are unlikely to substitute for gold standard invasive procedures in the near term, but due to the reduced risks and ease of use, the methods might provide clinical utility in a broad range of inpatient, emergency department, and outpatient settings. We are indebted to all of our authors who contributed to making IEEE SPMB 2021 a great success. The authors represented in this volume worked very diligently to provide excellent expanded chapters of their conference papers, making this volume a unique contribution. We are also indebted to the technical committee for volunteering to review submissions. IEEE SPMB is known for its constructive review process. Our technical committee works closely with authors to improve the quality of their submissions.

Iyad Obeid, Philadelphia, PA, USA
Ivan Selesnick, Brooklyn, NY, USA
Joseph Picone, Philadelphia, PA, USA
December 2021

Contents

Hyper-Enhanced Feature Learning System for Emotion Recognition
Hayford Perry Fordson, Xiaofen Xing, Kailing Guo, Xiangmin Xu, Adam Anderson, and Eve DeRosa

Monitoring of Auditory Discrimination Therapy for Tinnitus Treatment Based on Event-Related (De-) Synchronization Maps
Ingrid G. Rodríguez-León, Luz María Alonso-Valerdi, Ricardo A. Salido-Ruiz, Israel Román-Godínez, David I. Ibarra-Zarate, and Sulema Torres-Ramos

Investigation of the Performance of fNIRS-based BCIs for Assistive Systems in the Presence of Acute Pain
Ashwini Subramanian, Foroogh Shamsi, and Laleh Najafizadeh

Spatial Distribution of Seismocardiographic Signal Clustering
Sherif Ahdy, Md Khurshidul Azad, Richard H. Sandler, Nirav Raval, and Hansen A. Mansy

Non-invasive ICP Monitoring by Auditory System Measurements
R. Dhar, R. H. Sandler, K. Manwaring, J. L. Cosby, and H. A. Mansy

Index


Hyper-Enhanced Feature Learning System for Emotion Recognition Hayford Perry Fordson, Xiaofen Xing, Kailing Guo, Xiangmin Xu, Adam Anderson, and Eve DeRosa

1 Introduction

We all experience strong feelings that are necessary for any living being. These include fear, surprise, joy, happiness, anger, and disgust (Ekman, 1992). Emotion recognition plays a vital role in human-computer interactions (HCI) and enables computers to comprehend the emotional states of human beings, in an attempt to make computers more "compassionate" in the HCI (Corive et al., 2001; Song et al., 2020). Emotion recognition can be broadly divided into two classes. The first class is based on physical signals such as facial expressions (Anderson & McOwan, 2006), body movement (Yan et al., 2014), and speech signals (Khalil et al., 2019). The second class is based on physiological signals such as electroencephalography (EEG) (Kong et al., 2021), electrocardiogram (ECG) (Hasnul et al., 2021), and electromyogram (EMG) (Mithbavkar & Shah, 2021). Some studies in emotion analysis use unimodal signals for emotion recognition (Hajarolasvadi et al., 2020). Other studies focus on combining different physiological signals in order to model a multimodal paradigm (Abdullah et al., 2021).

H. Perry Fordson (*) Centre for Human Body Data Science, South China University of Technology, Guangzhou, China Affect and Cognition Lab, Cornell University, Ithaca, NY, USA e-mail: [email protected] X. Xing · K. Guo · X. Xu Centre for Human Body Data Science, South China University of Technology, Guangzhou, China A. Anderson · E. DeRosa Affect and Cognition Lab, Cornell University, Ithaca, NY, USA © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 I. Obeid et al. (eds.), Signal Processing in Medicine and Biology, https://doi.org/10.1007/978-3-031-21236-9_1


Deep neural network structures and deep learning have been used practically in many fields and have achieved notable successes in many applications, such as artificial intelligence-based technologies (Buhrmester et al., 2021). Some of the most popular deep neural networks include deep belief networks (DBN) (Hinton et al., 2006; Hinton & Salakhutdinov, 2006), deep Boltzmann machines (DBM) (Taherkhani et al., 2018), artificial neural networks (ANN) (Yegnanarayana, 1994), and convolutional neural networks (CNN). These deep structures are very powerful but often suffer from a time-consuming training process due to the large number of hyperparameters, which makes the structures highly complicated. In addition, the complications involving the numerous hyperparameters make it hard to analyze the deep structures theoretically. Therefore, most works resort to tuning parameters and adding more layers for better accuracy. This, however, requires increasingly powerful computational resources to improve training performance, as in gradable representation structures and ensemble learning structures (Tang et al., 2016; Chen et al., 2012, 2015; Gong et al., 2015; Feng & Chen, 2018; Yu et al., 2015, 2016a, b). Feature extraction for emotion recognition is a difficult task (Zhao & Chen, 2021). Reliable features are needed to classify emotions correctly. Because of their universal approximation capabilities, single-layer feedforward neural networks (SLFNN) have been widely applied to solve classification problems (Leshno et al., 1993). However, they usually suffer from long training times and low convergence rates, as they are sensitive to hyperparameter settings such as the learning rate. The random vector functional-link neural network (RVFLNN) (Pao & Takefuji, 1992; Pao et al., 1994) was proposed to offer a different learning approach that eliminates long training times and provides generalization ability in function approximation. Its limitation is that it does not work well for remodeling large data. The broad learning system (BLS) algorithm was proposed to handle large data sizes and model them in a dynamically stepwise manner. The BLS can also feed raw high-dimensional data directly into a neural network; it takes raw features as inputs. The proposed hyper-enhanced learning system (HELS) is constructed based on the idea of the BLS. In contrast, the HELS takes extracted physiological features as inputs. These features can effectively and simultaneously generate enhanced feature nodes serving as weights for the originally extracted features, and are more informative for emotion state classification.

1.1 Emotion Work and Its Relation to Affective States

The management of one's personal feelings is defined as emotion work (Zapf et al., 2021). The two types of emotion work are evocation and suppression. Emotion evocation requires obtaining and bringing up subjective feelings (Chu et al., 2017). Emotion suppression requires withholding or hiding certain feelings (Chiang et al., 2021; Schouten et al., 2020). These feelings may be positive or negative (Fresco et al., 2014). Emotion work is performed by a person, by others upon the person, or by the


person upon others. This is done to achieve a certain level of belief satisfactory to oneself. Emotion work can be categorized into three specific types: cognitive, bodily, and expressive. Cognitive emotion work relates to or involves images, bodily emotion work relates to physical changes of the body, and expressive emotion work relates to gestures. For example, a fearful person uses expressive emotion work to enhance their confidence and strength by lifting their shoulders high and putting on a smile. A stressed person may use bodily emotion work by breathing more slowly to lower stress levels. Emotion work allows us to regulate our feelings so that the emotions suit our current state of mind and are viewed as appropriate. Since we want to maintain a good relationship with our colleagues, we are constantly working on our feelings to suit the situations we find ourselves in. This study on emotion recognition aims to identify evocative and suppressive emotions by extracting relevant features from physiological signals and empowering the features through the proposed hyper-enhanced learning system. This will be useful in the development and design of systems and biomarkers for early clinical assessments and interventions. Emotions are fundamental to the daily life of a human being and play a very crucial role in human cognition, namely rational decision-making, perception, human interaction, and human intelligence (Johnson et al., 2020; Luo et al., 2020). In recent decades, research on emotion has increased and contributed immensely to fields such as psychology, medicine, history, the sociology of emotions, and computer science. These attempts to explain the origin, purpose, and other aspects of emotion have promoted more intense study of this topic, though more needs to be done to address key issues. Furthermore, emotions have been widely ignored, especially in the field of HCI (Ren & Bao, 2020; Yun et al., 2021).

1.2 Qualitative Approach to Emotion Recognition

Emotion recognition involves the process of identifying human affect. The recognition task can be summarized as the automatic classification of human emotions from images or video sequences. People vary greatly in their accuracy at recognizing other people's emotions. Current research in emotion recognition involves designing and using technologies to help understand and predict the human state of mind. These technologies work best when multiple modalities are investigated. Most work on emotion recognition to date has been conducted on automating facial expression recognition (FER) from videos, vocal expressions from audio, written expressions from texts, and physiology measured from wearable biomarkers. These signals have been tested over decades of scientific research, and methods for automatic emotion classification have been developed and evaluated. Also, due to the potential of HCI systems and the attention drawn to their importance, researchers around the world are making efforts to find better and more appropriate ways to build relationships between the way computers and humans interact. To build a system for HCI, the emotional states of subjects must be known. Again, interest in emotion recognition is


traditionally from physical modalities, for example, facial expressions, body posture, speech, and text (Khalil et al., 2019; Liu et al., 2017; Reed et al., 2020; Araño et al., 2021; Batbaatar et al., 2019; Erenel et al., 2020; Alswaidan & Menai, 2020; Salmam et al., 2018; Nithya Roopa, 2019; Imani & Montazer, 2019; Wu et al., 2021; Khenkar & Jarraya, 2022; Rovetta et al., 2021). These traditional ways are still gaining attention from scholars today, even though their reliability and effectiveness may be questioned because they can be deliberately altered. Emotions are time-varying affective phenomena that are elicited by a stimulus. When we are introduced to a particular stimulus, how we respond is necessary to assess our emotional intelligence (Hajncl & Vučenović, 2020; Issah, 2018; Drigas & Papoutsi, 2018). Physiological signals can assist in obtaining a better understanding of a person's response and expression at the time of observation. These involve multiple recordings from both the central and the autonomic nervous systems. Emotional stimuli in short music/video excerpts are introduced to elicit emotions (Song et al., 2019; Masood & Farooq, 2019; Baveye et al., 2018). They are shown to persons in an experimental setting and signals are taken from various parts of their body, which enables detecting emotional traces instantaneously. Emotions require physiological responses that are modulated by the brain (Hagemann et al., 2003). The central nervous system comprises the brain and the spinal cord (Richardson & Li, 2021), while the autonomic nervous system (Fesas et al., 2021) is a control system that acts unconsciously and regulates bodily functions like heart rate, pupillary response, vagal nerves, and sexual arousal. Consequently, these signals can hardly be falsified. Physiological signals that are spontaneous and highly correlated with human emotion include the electroencephalogram (EEG), electromyogram (EMG), electrocardiograph (ECG), galvanic skin response (GSR), heart rate (HR), temperature (T), functional magnetic resonance imaging (fMRI), blood volume pulse (BVP), positron emission tomography (PET), and respiration (RES). This is evident in the work of Shu et al. (2018), which conducted an extensive review on physiological parameters and their relation to human emotion. In our previous work (Fordson & Xu, 2018), we used the broad learning system (BLS) (Chen & Liu, 2018) without the enhancement nodes as a classifier to train the physiological signals for emotion recognition. In our recent work (Fordson et al., 2021), we introduced the hyper-enhanced learning system to improve classification performance. This chapter aims to explain the approach in more detail.


Self-assessment reports provide valuable information but generate issues with their validity, certification, and corroboration. For example, Mr. Karikari, because of his status as a strong man in society, may not tell his wife about his fear of turbulence during a flight. He may look okay physically but be physiologically stunned. When asked if he is okay, he will give the answer that he expects will preserve his prestige as a strong man. In recent times, the extensive literature on emotion recognition has proposed and evaluated different kinds of methodology, adapting techniques from multiple domains such as deep learning, machine learning, signal processing, and computer vision. Bayesian networks (BN) (Friedman et al., 1997), Gaussian mixture models (GMM) (Shahin et al., 2019), hidden Markov models (HMM) (Mao et al., 2019), and deep neural networks (DNN) (Yu & Sun, 2020) are a few of the techniques and methods employed to interpret emotions. The emotion recognition framework requires collecting data, removing noise, extracting relevant features, and classifying emotions. There are many applications of emotion recognition. These can be seen in the health sector (Martínez et al., 2020), the marketing sector (Thomas et al., 2016), the defense sector (Singla et al., 2020), social media (Nivetha et al., 2016), etc. Figure 1 shows the fundamental components of emotion recognition and some of its applications.

Fig. 1  Fundamental modules for emotion recognition: The process includes collection of data, preprocessing, feature extraction and selection, then classification. (Source of photo top left: Center for Human Body Data Science, School of Electronic and Information Engineering, South China University of Technology)


stimuli to subjects and collecting their signals. Where there are anomalies, systems and algorithms can be designed to address subjects' emotions, even helping with issues of stress and depression. EMG signals can also be used to treat depression, as they are important in building HCI systems. They require the use of instruments to record the electrical activity associated with skeletal muscles. Respiration is defined as the process by which organisms obtain energy from organic molecules. The process is quite complicated but necessary. It involves processes that take place in tissues and cells, the release of energy, and the production of carbon dioxide, which is carried by the blood and transported to the lungs. These processes occur in all humans and can affect our emotional states. Frameworks are designed in which emotional features are extracted from RES to classify emotions. GSR involves calculating or measuring the electrical activity arising from the sweat glands. It can also be described as the change in the electrical properties of the human skin in direct response to stress and/or anxiety. It is measured either by recording the electrical resistance of the skin or by measuring weak currents generated by the body. In this chapter, we give sufficient technical details on the hyper-enhanced learning system, its construction, and its use to classify human affective states in the valence-arousal dimension using EEG, GSR, EMG, ECG, and RES signals. Prior works have either focused on single modalities or have not put emphasis on the strength and individual contributions the extracted features make in improving final classification results. Yang et al. employed a deep neural network to extract features and recognize emotions from EEG signals (Yang et al., 2019). They designed a model that comprises many modules for recognizing valence and arousal emotions from EEG inputs. Jerritta et al. researched the application of higher order statistics (HOS) methods to extract features from facial electromyogram (fEMG) data to classify emotions. In addition, they preprocessed their data using a moving average filter and also extracted traditional statistical features (SF) in order to analyze the efficacy of HOS features over SF using principal component analysis (Jerritta et al., 2014). Goshvarpour et al. examined the reliability of using matching pursuit (MP) algorithms to recognize emotions (Goshvarpour et al., 2017). Their work further extracted features based on the MP coefficients and applied three dimensionality reduction techniques: principal component analysis (PCA), linear discriminant analysis (LDA), and kernel PCA. The reduced features were then fed into a probabilistic neural network (PNN) in participant-independent and participant-dependent validation modes. While these approaches achieved reliable results in their own respect, they fail to utilize multimodal signals in validating their approach. Further, the features used in these approaches were not well investigated or enhanced to make sure their individual contributions to emotion recognition are further improved. We report our results in three ways: two classes, with positive and negative for valence and high and low for arousal; three classes using the self-reported feedback values, which range from 1 to 9 on the valence and arousal axes; and finally another three classes defined from coded emotional keywords.
We present our results based on single-signal contributions and on combining all signals for multimodal classification. We employed the Database for Emotion Analysis using Physiological signals (DEAP) (Koelstra et al., 2012) and a Multimodal


Database for Affect Recognition and Implicit Tagging (MAHNOB-HCI) (Soleymani et al., 2012). We applied our enhanced feature learning system after the feature extraction and feature selection stages and used artificial neural networks (ANN) for classification. The rest of this chapter is structured as follows. Section 2 presents the related background. Section 3 describes the databases and emotion state modeling. Section 4 presents the proposed method and the steps needed for accurate classification results. Experimental procedures, including preprocessing, feature extraction, feature enhancement techniques, and the ANN classifier, are introduced in Sect. 5. Section 6 reports and discusses the obtained results, and we conclude the chapter in Sect. 7.

2 Related Background

Emotion recognition based on features is critical for achieving robust performance. It is very difficult to develop features based on emotionally induced signals due to the ambiguity of the ground truth. Several feature learning algorithms have been applied in efforts to learn task-specific features using labeled and unlabeled data. These algorithms include K-means clustering (Sinaga & Yang, 2020), the sparse auto-encoder (Zhang et al., 2021), sparse restricted Boltzmann machines (Wang et al., 2020b), and other promising deep learning techniques. Numerous works have been conducted with respect to feature learning for emotion recognition.

2.1 Literature Review and Applications of Machine/Deep Learning to Emotion Recognition

Deep learning techniques that obtain deeper features, in addition to preprocessing and traditional feature extraction from labeled data, are required for an accurate emotion recognition task. Time-domain, frequency-domain, and time-frequency-domain features are normally extracted for emotion recognition (Topic & Russo, 2021; Shukla et al., 2019; Yu et al., 2019). Also, works on affect and emotion recognition normally combine these features to obtain reliable, technically sound, and robust results. The algorithms used in the classification process include support vector machines (SVM) (Wei & Jia, 2016), random forest (RF) (Wang et al., 2020c), linear discriminant analysis (LDA) (Chen et al., 2019a), k-nearest neighbor (Xie & Xue, 2021), the Bayesian classifier (Zhang et al., 2020b), etc. Recently, more and more physiological data have been made publicly available for emotion recognition. They include a large number of multimodal signals. In emotion classification, researchers have recently tended to use deep neural networks for feature extraction (Dara & Tumma, 2018). This is because deep learning can automatically and effectively extract features. One advantage of deep neural features over traditional features is that deep neural features try to learn more high-level features from data in an


incremental manner. This eliminates the need for handcrafted, hard-coded feature extraction. Regarding physiological feature extraction for emotion recognition, deep neural networks can be used to learn more informative and deep features from physiological signals that can more accurately discriminate between different sets of complex emotions. The deep learning techniques used in emotion recognition include the convolutional neural network (CNN) (Kollias & Zafeiriou, 2021), deep belief networks (DBN) (Xia & Liu, 2016), long short-term memory (LSTM) (Ahmad et al., 2019), the recurrent neural network (RNN) (Huang et al., 2021), the multilayer perceptron neural network (MLPNN) (Özerdem & Polat, 2017), and the artificial neural network (ANN) (Subasi et al., 2021). The work of Topic and Russo (2021) proposes novel models for emotion recognition that rest on the creation of feature maps based on topographic (TOPO-FM) and holographic (HOLO-FM) representations of signal properties. They utilized deep learning techniques as automatic feature extraction methods on the feature maps. They also combined all extracted features in the classification process to recognize different types of emotions. Their results prove that deep feature learning is vital for improving emotion recognition results even on multiple datasets of different sizes. For speech emotion recognition, the accuracy is determined by the feature selection methods. Careful selection of parameters such as the learning rate, the number of hidden layers, the epoch size, and the classifier is important in reducing computational error (Jermsittiparsert et al., 2020). The broad learning system (BLS) (Chen & Liu, 2018) introduced a novel alternative for learning deep features. Because of the large number of parameters connecting the nodes between layers, deep structures are mostly time-consuming and consume a lot of energy during the training process. They also require a new training process altogether if the deep structures cannot initially model the system efficiently. The BLS reduces this constraint, and its algorithms are faster in remodeling without the need for retraining. This is evident in the paper by Chen et al. (2019b), where the authors compared the BLS with other existing learning algorithms on the performance of regression models. Because of its efficiency in extracting features and its high computational efficiency, the BLS is further extended in Zhao et al. (2020) to construct an objective function capable of effectively solving ridge regression and obtaining promising outputs. In the present study, we further extend the algorithms of the BLS to learn deep features from physiological signals. We propose a hyper-enhanced learning system where physiologically extracted features are taken as inputs. These physiological feature inputs are then mapped and matched to enhancement nodes accordingly in an attempt to generate more effective and richer output nodes named enhanced feature nodes. These enhanced feature nodes are then taken as input to an artificial neural network to classify emotions in the valence and arousal dimensions.


3 Databases and Emotion State Modeling

Data is vital for scientific research. In addition, freely available data helps researchers test their ideas, validate the work of others, and improve upon it. Authors and respondents who provide data must also be credited and protected. There are several established databases for affective state classification. Nevertheless, only a few established databases have been shown to be effective. In this chapter, we utilized two publicly available databases, DEAP (Koelstra et al., 2012) and MAHNOB-HCI (Soleymani et al., 2012). The MAHNOB-HCI database was recorded in response to affective stimuli with the common goal of recognizing emotions and implicit tagging. It consists of 30 subjects (13 male, 17 female). The subjects are aged between 19 and 40 years (mean age 26.06). Unfortunately, 3 subjects' data were lost due to technical errors; thus, the data of 27 subjects (11 male, 16 female) were considered for processing. The subjects watched 20 emotional movie videos and self-reported their felt emotions using arousal, valence, dominance, and predictability, in addition to emotional keywords. The experiment was set up in a multimodal arrangement for synchronized recording of face videos, audio signals, eye gaze, and central/peripheral nervous system physiological signals. The dataset comprises 32-channel EEG signals. The database also includes peripheral physiological signals such as galvanic skin response (GSR), electrocardiogram (ECG), respiration amplitude, and skin temperature (ST). In a second experiment, subjects were presented with short videos and images as stimuli. The videos and images come with correct and incorrect tags, and subjects had to assess them with agreement or disagreement. Two modalities were employed to predict the trueness of displayed tags: facial expression was captured by a camera and eye gaze by an eye gaze tracker. Participants' facial videos and bodily responses were segmented and stored. The DEAP database was recorded for the analysis of human affective states. Thirty-two subjects participated in this experiment (16 male, 16 female). They are aged between 19 and 37 years (mean age 26.9). The subjects watched 40 one-minute-long music video excerpts as stimuli while their physiological signals were being collected. After each trial, participants rated each music video in terms of their level of arousal, valence, dominance, liking, and familiarity. The rating values comprise a continuous scale of 1–9 for arousal, valence, dominance, and liking, and a discrete scale of 1–5 for familiarity. The multimodal dataset comprises EEG and peripheral physiological signals. The EEG signals were recorded with 32 channels. EEG was recorded at a sampling rate of 512 Hz and down-sampled to 128 Hz, using 32 active silver chloride (AgCl) electrodes placed in accordance with the international 10–20 system. In addition, galvanic skin response (GSR), respiration amplitude, skin temperature, electrocardiogram, blood volume by plethysmograph, electromyograms of the zygomaticus and trapezius muscles, and electrooculogram (EOG) peripheral signals were also recorded. Of the 32 participants, frontal face videos of 22 subjects were also recorded for researchers interested in facial emotion analysis. Table 1 sums up the important technical specifications of the two databases. We, in this chapter, are only focused on the EEG, GSR, RES, and EMG data.
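To make the data setup concrete, the following is a minimal sketch (not the authors' code) of loading one subject from DEAP's publicly released preprocessed Python files, which store each subject as a pickled dictionary containing a 'data' array of shape 40 trials x 40 channels x 8064 samples (63 s at 128 Hz, with the first 32 channels being EEG and the rest peripheral) and a 'labels' array of shape 40 x 4 (valence, arousal, dominance, liking). The file name and the way the peripheral channels are grouped below are illustrative and should be checked against the dataset documentation.

```python
# Minimal sketch of reading DEAP's "data_preprocessed_python" files (assumption:
# one pickled dict per subject with 'data' and 'labels' keys, as in the public release).
import pickle

def load_deap_subject(path):
    """Return (data, labels) for one DEAP subject file, e.g. 's01.dat' (illustrative name)."""
    with open(path, "rb") as f:
        subject = pickle.load(f, encoding="latin1")  # files were pickled under Python 2
    data = subject["data"]      # shape (40, 40, 8064): trials x channels x samples
    labels = subject["labels"]  # shape (40, 4): valence, arousal, dominance, liking
    return data, labels

data, labels = load_deap_subject("s01.dat")
eeg = data[:, :32, :]           # first 32 channels are EEG
peripheral = data[:, 32:, :]    # remaining channels: EOG, EMG, GSR, RES, plethysmograph, temperature
valence, arousal = labels[:, 0], labels[:, 1]
print(eeg.shape, peripheral.shape, valence.shape)
```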


Table 1  Database summary of DEAP and MAHNOB-HCI

Content                     DEAP                                       MAHNOB-HCI
No. of participants         32 (16 male, 16 female)                    27 (11 male, 16 female)
Signals used                EEG, GSR, RES, and EMG                     EEG, GSR, RES, and ECG
No. of videos               40                                         20
Stimuli selection method    Subset of online annotated music videos    Subset of online annotated movie videos
Sampling rate               256 Hz to 128 Hz                           1024 Hz to 256 Hz
Self-report                 Emotional keywords, arousal, and valence   Emotional keywords, arousal, and valence
Rating scale                Discrete scale of 1–9                      Discrete scale of 1–9

Table 2  Two defined classes in valence-arousal model

Assortment
Valence     Arousal     Rating (r)
Negative    High        1 ≤ r < 4.5
Positive    Low         4.5 ≤ r

Table 3  Three defined classes in valence-arousal model

Assortment
Valence       Arousal     Rating (r)
Unpleasant    Calm        1 ≤ r ≤ 3
Neutral       Average     4 ≤ r ≤ 6
Pleasant      Excited     7 ≤ r ≤ 9

3.1 Valence-Arousal Emotional State Modeling

Emotions should be well defined so that their interpretation can be effective in emotion recognition systems across the globe. The most common ways to model emotions in emotion recognition research are through discrete and multi-dimensional approaches (Liu et al., 2018; Yao et al., 2020; Mano et al., 2019). Because emotions can be interpreted differently in different cultures, it is important to identify ways by which relationships can be built between basic emotions. The distinct classes, using 1–9 discrete scales within the valence-arousal dimension, are presented here. This is necessary to find correlations among different discrete emotions that correspond to higher levels of a particular emotion. Tables 2 and 3 show the modeling of the two and three defined classes. The participants reported their feelings after watching the affective music video clips. Firstly, for two classes, we assigned "High" and "Low" for arousal and "Positive" and "Negative" for valence. Secondly, for the three-class modeling, we assigned "Calm", "Average", and "Activated" for arousal and "Unpleasant", "Neutral", and "Pleasant" for valence. Finally, we defined valence-arousal classes using 6 affective coded keywords.


Table 4  Classes in valence-arousal model using emotional keywords

Dimension    Affective class    Emotion tagging
Valence      Unpleasant         Angry, sad
             Neutral            Neutral, surprise
             Pleasant           Happy, amuse
Arousal      Calm               Sad, neutral
             Average            Happy, amuse
             Activated          Surprise, angry

These include (1) Happy, (2) Amuse, (3) Sad, (4) Neutral, (5) Surprise, and (6) Angry. This is shown in Table 4.
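As a small illustration (our own helper functions, not the authors' code), the rating-to-class mappings of Tables 2 and 3 can be expressed as simple threshold rules. We adopt the common reading that ratings below 4.5 correspond to negative valence / low arousal for the two-class case, and the 1–3 / 4–6 / 7–9 bins of Table 3 for the three-class case.

```python
# Illustrative mapping of 1-9 self-assessment ratings to class labels
# (assumption: ratings below 4.5 -> negative valence / low arousal).
import numpy as np

def two_class(rating):
    """0 = negative valence / low arousal, 1 = positive valence / high arousal (Table 2)."""
    return int(rating >= 4.5)

def three_class(rating):
    """0 = unpleasant/calm (1-3), 1 = neutral/average (4-6), 2 = pleasant/excited (7-9) (Table 3)."""
    if rating <= 3:
        return 0
    if rating <= 6:
        return 1
    return 2

ratings = np.array([2.0, 4.8, 5.5, 8.1])
print([two_class(r) for r in ratings])    # [0, 1, 1, 1]
print([three_class(r) for r in ratings])  # [0, 1, 1, 2]
```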

4 Hyper-Enhanced Learning System Methodology

In this section, we introduce the proposed hyper-enhanced learning system method for emotion recognition. As shown in Fig. 2, the system retains but improves the structure of the BLS by replacing the feature nodes of the BLS with groups of physiologically extracted data from EEG, EMG, GSR, ECG, and respiration to form a hybrid neuro-multimodal network. Also, it should be noted that the BLS takes data directly, whereas the hyper-enhanced learning system takes the extracted features as inputs to reduce the structural complexity, and this helps preserve computational memory. Let us assume that our input feature X, projected using ϕi(XWei + βei), gives the ith group of mapped physiological features, Fi, where Wei are randomly generated weights, βei are biases, and ϕ is a linear transformation. The first i groups of mapped physiological features are concatenated and denoted Fi ≡ [F1, F2, …, Fi]. Similarly, the enhancement feature nodes for the jth group, ζj(FiWhj + βhj), are denoted Ej. The first j groups of enhanced nodes are concatenated and denoted Ej ≡ [E1, E2, …, Ej]. We then applied linear inverse problems (Goldstein et al., 2014) to fine-tune the initial weights Wei so as to obtain richer features. Therefore, assuming an input signal X with N samples, each with M dimensions, the output is Y ∈ ℝ^(N×C). For the n physiological feature groups, each mapping randomly yields k nodes, which can be represented in the form:

Fi = ϕ(XWei + βei),   i = 1, 2, …, n                  (1)

We denote the feature nodes as Fn ≡ [F1, …, Fn], and denote the mth group of enhancement nodes as:

Em ≡ ζ(FiWhm + βhm)                  (2)

Hence, the hyper-enhanced structure, Y = [F1, F2, …, Fn | ζ(FiWh1 + βh1), …, ζ(FiWhm + βhm)]Wm, i.e., Y = [F1, F2, …, Fn | E1, E2, …, Em]Wm, is represented as:


Fig. 2  Construction of the hyper-enhanced learning system: The input features X are physiologically extracted features from EEG, EMG, GSR, RES, and ECG data. F stands for the feature nodes. F1, F2, …, Fn are, for example, features extracted from EEG data. E1, E2, …, En are likewise enhancement nodes corresponding to the input features. This process is done synchronously. The enhancement nodes, serving as weights for the feature nodes, in turn produce the enhanced output features Y. The output enhanced nodes are more informative and are finally used as input features to an artificial neural network for the final valence-arousal emotion classification



Y =  F n |E m  W m

(3)

where Fi, i = 1, …, n, are the mapped physiological features obtained from Eq. (1). Emotion recognition systems have respective steps that need to be carefully considered to obtain accurate classification results, as detailed below. The block diagram of our study is presented in Fig. 3.
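A minimal NumPy sketch of Eqs. (1)-(3) is given below, assuming random Gaussian mapping weights, a linear ϕ, a tanh ζ, and a ridge-regularized least-squares solve for the output weights Wm as in the original BLS. The fine-tuning of Wei via linear inverse problems mentioned above, and the subsequent ANN classifier of Sect. 5, are not reproduced here; all sizes and function names are illustrative.

```python
# Sketch of the hyper-enhanced feature construction (Eqs. 1-3), under the
# assumptions stated in the text above; not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def hyper_enhanced_features(X, n_groups=5, k_nodes=20, m_enh=10):
    """X: (N, M) matrix of extracted physiological features.
    Returns the concatenated feature/enhancement node matrix [F^n | E^m]."""
    N, M = X.shape
    F_groups = []
    for _ in range(n_groups):                 # Eq. (1): F_i = phi(X W_ei + beta_ei)
        W_e = rng.standard_normal((M, k_nodes))
        beta_e = rng.standard_normal(k_nodes)
        F_groups.append(X @ W_e + beta_e)     # phi taken here as a linear map
    F = np.hstack(F_groups)                   # F^n = [F_1, ..., F_n]
    W_h = rng.standard_normal((F.shape[1], m_enh))
    beta_h = rng.standard_normal(m_enh)
    E = np.tanh(F @ W_h + beta_h)             # Eq. (2): E_m = zeta(F^n W_hm + beta_hm)
    return np.hstack([F, E])                  # [F^n | E^m]

def fit_output_weights(A, Y, lam=1e-3):
    """Eq. (3): solve Y ~ A W^m with a ridge-regularized least-squares fit."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)

# Toy usage: 100 samples, 24 extracted features, one-hot targets for 2 classes.
X = rng.standard_normal((100, 24))
Y = np.eye(2)[rng.integers(0, 2, size=100)]
A = hyper_enhanced_features(X)
W = fit_output_weights(A, Y)
print(A.shape, W.shape)   # (100, 110) (110, 2)
```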

5 Experimentation

5.1 Data Preprocessing and Feature Learning

To obtain robust results in the emotion recognition task, the data preprocessing, feature extraction and selection, and classification steps must be given special attention. Firstly, for DEAP, the data were down-sampled to 128 Hz. For EEG data,


Fig. 3  Overall framework of the proposed method using the hyper-enhanced learning system approach to multimodal emotion recognition: The framework takes physiological features of each modality (EEG, EMG, GSR, ECG, and RES) as mapped inputs. The physiological features are then trained in the hyper-enhanced learning neural network, which simultaneously has enhancement nodes. The output of each training pass in the hyper-enhanced learning system is a set of richer, more enhanced feature nodes, which are merged to form combined feature nodes. The combined feature nodes are then processed through a fully connected layer for valence-arousal emotion recognition. (Source of photo top left: Roots Emoji. Source of photo top right: Neurolite Advanced Medical Solutions)

electrooculogram (EOG) noise is removed. Then a bandpass frequency filter from 4.0 to 45.0 Hz is applied, after which the data are averaged to the common reference. The EEG data are then segmented into 60-second trials, and a 3-second pre-trial baseline is removed. The EMG, GSR, and RES data are also down-sampled to 128 Hz, segmented into 60-second trials, and the pre-trial baseline is removed. Following the settings of (Zhang et al., 2020a; Soleymani et al., 2017), different features were extracted for each physiological signal. For the MAHNOB-HCI data, a 30-second recording is included at the beginning and at the end of each trial, before and after the affective stimuli. These segments are removed so that only the relevant physiological information recorded during the experiment remains. Artifacts were then removed by Butterworth filters (Mahata et al., 2021) for the EEG, GSR, RES, and ECG signals. The data are finally down-sampled to 256 Hz. The features extracted are shown in Table 5.


Table 5  Feature extraction of each modality

Database      Modality    Extracted features
DEAP          EEG         Power spectral density in different bands
              GSR         Number of peaks, amplitude of peaks, rise time, and statistical moments
              RES         Main frequency, power spectral density, and statistical moments
              EMG         Power and statistical moments
MAHNOB-HCI    EEG         Power spectral density in different bands
              GSR         Number of peaks, amplitude of peaks, rise time, and statistical moments
              RES         Main frequency, power spectral density, and statistical moments
              ECG         Inter-beat interval, multiscale entropy, tachogram power, and power spectral density
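The sketch below illustrates the kind of EEG preprocessing and band-power feature extraction summarized above and in Table 5. The 4-45 Hz Butterworth band-pass follows the text, while the band edges, filter order, and Welch settings are our illustrative assumptions rather than the authors' published configuration.

```python
# Hedged sketch of EEG band-pass filtering and band-power features (assumed settings).
import numpy as np
from scipy.signal import butter, filtfilt, welch

FS = 128  # DEAP preprocessed sampling rate (Hz)
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}  # assumed band edges

def bandpass(signal, low=4.0, high=45.0, fs=FS, order=4):
    """Zero-phase Butterworth band-pass, as applied to the EEG channels."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)

def band_powers(signal, fs=FS):
    """Mean power spectral density per frequency band for one EEG channel."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)
    return np.array([psd[(freqs >= lo) & (freqs < hi)].mean() for lo, hi in BANDS.values()])

# Toy usage on a 60-second single-channel trial.
trial = np.random.randn(60 * FS)
features = band_powers(bandpass(trial))
print(features.shape)  # (4,): one PSD feature per band
```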

After feature extraction, we introduce the hyper-enhanced learning system to take inputs from each feature of each modality and produce a more informative feature for classification. In the final execution, this study employed a three-layer artificial neural network to model each modality for single-signal classification, and then combined all signals for multimodal classification. We set the dimension of the hidden layer to 16. We chose the ReLU activation function and used a dropout rate of 0.5 for all layers to avoid overfitting. We used the binary cross-entropy loss as the criterion and the Adam optimizer with a learning rate of 0.001. The experiment was conducted in a subject-independent setting: we used one subject's data for testing and the remaining subjects' data for training, repeated this for all subjects, and averaged the results.
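For concreteness, a minimal PyTorch sketch of such a classifier and its leave-one-subject-out training loop is shown below. Reading "three-layer" as input, one 16-unit hidden layer, and output is our assumption, and the function and variable names are illustrative; the dropout of 0.5, ReLU activation, binary cross-entropy loss, and Adam optimizer with a 0.001 learning rate follow the settings stated above.

```python
# Sketch of the ANN classifier and subject-independent evaluation (assumed architecture).
import torch
import torch.nn as nn

class EmotionANN(nn.Module):
    def __init__(self, n_features, n_outputs=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 16),  # 16-unit hidden layer
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(16, n_outputs),
        )

    def forward(self, x):
        return self.net(x)

def train_leave_one_subject_out(features, labels, test_subject, epochs=50):
    """features/labels: dicts keyed by subject id, holding float tensors; one subject held out."""
    x_train = torch.cat([features[s] for s in features if s != test_subject])
    y_train = torch.cat([labels[s] for s in labels if s != test_subject])
    model = EmotionANN(x_train.shape[1])
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()  # binary cross-entropy on logits
    for _ in range(epochs):
        optim.zero_grad()
        loss = loss_fn(model(x_train).squeeze(1), y_train.float())
        loss.backward()
        optim.step()
    with torch.no_grad():
        preds = (model(features[test_subject]).squeeze(1) > 0).float()
    return (preds == labels[test_subject].float()).float().mean().item()
```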

6 Results and Discussion

This section summarizes and assesses the obtained results for emotion classification in the valence-arousal dimension. Our classification of emotional states is presented in two and three defined classes using the 1–9 discrete self-rating scale, and 6 emotional keywords are used to emphasize the three defined areas in the valence-arousal dimension. The results are reported in four forms. Firstly, a classification report on each single modality is given; then we combine two modalities and report the results. We also report the results for combinations of three modalities, and finally we report the results for the combination of all four modalities used in the study. We do this to demonstrate the effectiveness of using multimodal physiological signals for building an accurate emotion recognition system. Most emotion recognition systems depend on single physiological signal classification, as acquiring multimodal signals is time-consuming and expensive (Song et al., 2020; Zheng, 2017; Zhong et al., 2020; Shangguan et al., 2014; Yang et al., 2014; Hsu et al., 2020; Zhang et al., 2017). They are usually based on EEG, ECG, EMG, RES, and GSR


signals, among others. These signals are quite significant, popular, and effective in detecting emotions. We must note that the majority of works focused on single-modality classification follow the traditional emotion recognition pipeline. The traditional emotion recognition procedure means collecting data from scratch, preprocessing the collected data, extracting features, and classifying. None of these works focused on enhancing features with a broad neural system where weights are automatically generated. Our work uses this strategy. After feature extraction, features are fed into an enhanced learning system where the extracted features are further enhanced and have more predictive power to better detect patterns for emotion recognition. For single-modality classification, the best results are obtained from two-class classification. For the evaluation on the DEAP data, the best result on the valence scale is 73.3% and on the arousal scale 75.6%, both obtained from the GSR physiological signal. For the evaluation on the MAHNOB-HCI data, the best result on the valence scale is 73.1% and on the arousal scale 74.7%, both obtained from the ECG physiological signal.

6.1 Multimodal Classification

Because different physiological signals are of different origins, it is possible that they reflect different functions of the body. Combining these multiple modalities and mining all relevant features will be useful in accurately detecting the emotional states of people. The work of Bota et al. (2020) evaluated valence-arousal emotions in a supervised manner using ECG, electrodermal activity (EDA), RES, or blood volume pulse (BVP). The results obtained indicate that multimodal physiological data can be used to classify emotions more reliably. Emotional features from ECG and GSR signals have also been extracted in (Goshvarpour et al., 2017) to examine the effectiveness of matching pursuit (MP) algorithms in emotion detection. Studies also utilize signals from facial movements in addition to EEG and GSR (Cimtay et al., 2020) to learn the different representation abilities of their features in correctly analyzing emotions. Because of this, our study reports results from the combination of numerous physiological signals in an attempt to improve classification accuracy. Our results include the combination of EEG, EMG, GSR, and RES signals from the DEAP dataset and also the combination of EEG, GSR, RES, and ECG from the MAHNOB-HCI dataset.


Table 6  DEAP database classification accuracy (two classes)

Physiological signal       Valence (%)   Arousal (%)
EEG                        70.1          72.2
GSR                        73.3          75.6
RES                        72.8          74.5
EMG                        69.8          72.4
EMG + EEG                  69.2          71.1
GSR + EEG                  71.3          72.2
RES + EEG                  68.7          70.2
GSR + EMG                  68.8          70.3
RES + EMG                  65.9          70.2
RES + GSR                  69.4          70.9
GSR + EMG + EEG            71.3          71.4
RES + EMG + EEG            70.8          69.9
RES + GSR + EEG            71.0          71.2
RES + GSR + EMG            71.1          68.9
EEG + GSR + RES + EMG      78.6          79.9

Table 7  DEAP database classification accuracy (three classes)

Physiological signal       Valence (%)   Arousal (%)
EEG                        67.4          69.6
GSR                        67.3          70.2
RES                        68.4          69.7
EMG                        66.6          68.0
EMG + EEG                  66.2          68.5
GSR + EEG                  67.4          70.1
RES + EEG                  65.2          67.4
GSR + EMG                  67.5          66.4
RES + EMG                  67.9          65.3
RES + GSR                  68.4          69.1
GSR + EMG + EEG            68.4          70.1
RES + EMG + EEG            67.1          69.8
RES + GSR + EEG            68.2          70.0
RES + GSR + EMG            69.1          69.9
EEG + GSR + RES + EMG      69.7          71.9


Table 8  DEAP database classification accuracy (emotional keywords)

Physiological signal       Valence (%)   Arousal (%)
EEG                        68.1          70.6
GSR                        69.2          72.9
RES                        69.7          71.1
EMG                        68.5          70.8
EMG + EEG                  67.3          69.1
GSR + EEG                  68.6          70.6
RES + EEG                  66.2          68.1
GSR + EMG                  67.9          68.4
RES + EMG                  68.4          67.8
RES + GSR                  69.2          66.6
GSR + EMG + EEG            69.2          71.3
RES + EMG + EEG            68.9          70.5
RES + GSR + EEG            69.0          71.1
RES + GSR + EMG            70.2          71.6
EEG + GSR + RES + EMG      72.2          75.4

6.2 Results of the Hybrid Neuro-single and Neuro-multimodal Network Classification

Tables 6, 7, and 8 present the classification results for the two defined classes, the three defined classes, and the emotional keywords in the valence-arousal dimension on the DEAP database. In the tables, it is observed that single-signal classification produces results consistent with state-of-the-art methods. In fact, the GSR results are higher compared to the other modalities. When we compare the results of single modalities to the two-modality classification results, we observe a decrease in classification accuracy. We also compare the results of physiological signals combined in threes to the single-modality and dual-modality results. We observe that the results of the three-modality classification are notably higher than those of the two-modality classification. Finally, we combined all four signals together. The results obtained by combining all four physiological signals are superior to the previous single-, dual-, and three-modality combinations. Previous studies have pointed out the difficulties of training on multimodal signals compared to training on single-modality signals (Wang et al., 2020d). However, because of our approach of first extracting features and then feeding them into our hyper-enhanced learning system, our system reduces the computational burden and hence alleviates the issue of model overfitting.


Table 9  MAHNOB-HCI database classification accuracy (two classes)

Physiological signal       Valence (%)   Arousal (%)
EEG                        67.4          70.8
GSR                        72.6          74.2
RES                        70.2          72.5
ECG                        73.1          74.7
ECG + EEG                  67.9          70.1
GSR + EEG                  70.0          73.9
RES + EEG                  72.4          74.3
GSR + ECG                  73.7          75.1
RES + ECG                  72.9          75.0
RES + GSR                  72.4          74.6
GSR + ECG + EEG            68.9          70.3
RES + ECG + EEG            70.3          72.1
RES + GSR + EEG            69.4          70.3
RES + GSR + ECG            68.8          69.9
EEG + GSR + RES + ECG      76.2          78.8

Table 10  MAHNOB-HCI database classification accuracy (three classes)

Physiological signal       Valence (%)   Arousal (%)
EEG                        67.5          69.8
GSR                        66.1          67.9
RES                        67.7          69.9
ECG                        68.5          70.1
ECG + EEG                  65.3          67.6
GSR + EEG                  64.4          66.8
RES + EEG                  65.8          67.5
GSR + ECG                  64.9          65.9
RES + ECG                  66.7          68.2
RES + GSR                  67.5          69.5
GSR + ECG + EEG            69.3          70.1
RES + ECG + EEG            68.3          70.6
RES + GSR + EEG            67.5          69.8
RES + GSR + ECG            66.9          68.6
EEG + GSR + RES + ECG      68.8          70.7


Table 11  MAHNOB-HCI database classification accuracy (emotional keywords)

Physiological signal       Valence (%)   Arousal (%)
EEG                        69.1          69.9
GSR                        70.9          71.1
RES                        69.4          70.2
ECG                        69.1          71.9
ECG + EEG                  66.8          70.3
GSR + EEG                  67.4          68.8
RES + EEG                  67.7          69.2
GSR + ECG                  66.8          68.9
RES + ECG                  65.6          66.5
RES + GSR                  64.9          67.8
GSR + ECG + EEG            68.5          70.0
RES + ECG + EEG            69.7          70.4
RES + GSR + EEG            70.0          71.8
RES + GSR + ECG            70.6          72.5
EEG + GSR + RES + ECG      71.9          74.2

Tables 9, 10, and 11 likewise report results for MAHNOB-HCI for the two defined classes, the three defined classes, and the 6 emotional keywords. The tables report the best single-modality results for the ECG signal, especially on the arousal scale. This shows that the heart rate variability (Shaffer & Ginsberg, 2017) calculated from ECG is an important feature in the emotion recognition task. Also, the dual combinations of features produce lower classification accuracy compared to the single-modality results. The three-modality combinations on the MAHNOB-HCI data again produce higher accuracy than the dual-modality classification results. Subsequently, when all signals are fused together and fed into our hyper-enhanced learning system, the obtained results are superior to the unimodal, dual-, and three-modality results.

7 Conclusions

The rationale behind the proposed multimodal fusion is that GSR, for instance, is known to correlate well with the arousal scale but poorly with valence, while HRV derived from ECG is widely used to assess the autonomic nervous system (ANS) and is effective in detecting human emotion (Choi et al., 2017). Using different multimodal signal combinations for arousal and valence separately may therefore improve classification accuracy. As described above, we first classified each physiological signal individually to obtain single-modality results and to find which signal best classifies human emotion. We then fused all signals to obtain a multimodal fusion, and finally compared our results to related works.


Fig. 4  Influence of the enhancement node parameter N on feature learning and classification performance

Table 12  Experimental comparison with related works in two and three classes (D = DEAP, M = MAHNOB-HCI)

Two classes
  Dimension   Ours (D)   Ours (M)
  Valence     78.6       76.2
  Arousal     79.9       78.8

Three classes, 1–9 rating values
  Dimension   Ours (D)   Ours (M)   Koelstra et al. (2012)   Soleymani et al. (2012)
  Valence     69.7       68.8       57.0                     69.6
  Arousal     71.9       70.7       52.3                     70.1

Three classes, 6 coded keywords
  Dimension   Ours (D)   Ours (M)
  Valence     72.2       71.9
  Arousal     75.4       74.2

Table 13  Comparison with related work

Works                      Valence (%)   Arousal (%)
Koelstra et al. (2012)     62.7          57.0
Soleymani et al. (2017)    76.1          67.7
Soleymani et al. (2012)    57.0          52.3
Zhang et al. (2020a)       69.9          70.1
Ours (MAHNOB-HCI)          76.2          78.8
Ours (DEAP)                78.6          79.9

The classification performance of both the DEAP and MAHNOB-HCI datasets on our hyper-enhanced feature system is influenced by the number of enhancement nodes, N. For the DEAP data, performance is notably stable over a broad range of N, as can be seen in Fig. 4. In the arousal dimension, when N ∈ [1, 4], performance improves as N, the number of enhancement feature nodes, increases; when N is small, the higher-order relationships between features cannot be mined. The figure also shows a decline in performance when N is greater than 5, after which performance stabilizes. The valence dimension behaves similarly: classification performance increases with the number of enhancement nodes up to about N = 8, and declines when N is greater than 8.

In Tables 12 and 13, we compare our results with recently published work on two- and three-class emotion recognition in the valence-arousal dimension. Our results are promising and more robust, which indicates that the proposed enhancement learning system generates enhanced feature nodes that are more informative than those chosen in earlier studies. The results also show that it is easier to classify emotions into two classes than into three classes or emotional keywords.

Finally, this chapter presented the hyper-enhanced learning system, our feature learning approach to multimodal emotion recognition using physiological signals. Given physiological signal recordings, we preprocess the data by removing artifacts and noise. Several features are then extracted and mapped as inputs to construct an enhanced hybrid neuro-multimodal learning network that automatically updates weights with enhancement nodes to generate more informative feature nodes. The model then learns complex relationships within signals and explores the importance of different modalities through a fully connected neural network. We evaluated our approach on the DEAP and MAHNOB-HCI datasets and compared our results with related works. We demonstrated the strength of the proposed method by establishing two- and three-class modeling in the valence-arousal dimension using discrete rating values from 1 to 9, and by using 6 emotionally coded keywords to define the three areas in the valence-arousal dimension. Results were reported for single, dual, and multimodal signals. Fusing multimodal signals and training them with the hyper-enhanced learning system proved more robust than using a single modality for emotion recognition.
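As a rough illustration of the enhancement-node idea described above, the sketch below expands extracted features with randomly weighted feature nodes and nonlinearly activated enhancement nodes, then trains a closed-form ridge readout on the expanded representation. The array shapes, the tanh activation, and the ridge-regression solution are illustrative assumptions in the spirit of broad-learning-style systems, not the authors' exact network.

```python
import numpy as np

rng = np.random.default_rng(0)

def enhancement_mapping(X, n_feature_nodes=40, n_enhance_nodes=8):
    """Map extracted physiological features to feature nodes plus
    nonlinear enhancement nodes (broad-learning-style expansion)."""
    d = X.shape[1]
    Wf = rng.standard_normal((d, n_feature_nodes))           # random feature-node weights
    Z = X @ Wf                                                # linear feature nodes
    We = rng.standard_normal((n_feature_nodes, n_enhance_nodes))
    H = np.tanh(Z @ We)                                       # enhancement nodes (nonlinear)
    return np.hstack([Z, H])                                  # expanded representation

def ridge_readout(A, y, lam=1e-2):
    """Closed-form ridge-regression output weights for the expanded features."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

# Toy usage: 200 trials, 32 handcrafted features, binary valence labels.
X = rng.standard_normal((200, 32))
y = rng.integers(0, 2, size=(200, 1)).astype(float)
A = enhancement_mapping(X, n_enhance_nodes=4)                 # N = 4 enhancement nodes
W = ridge_readout(A, y)
pred = (A @ W > 0.5).astype(int)
print("training accuracy:", (pred == y).mean())
```

Varying `n_enhance_nodes` in such a sketch mimics the sweep over N whose effect on accuracy is summarized in Fig. 4.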

References

Abdullah, S. M. S. A., Ameen, S. Y. A., Sadeeq, M. A. M., & Zeebaree, S. (2021). Multimodal emotion recognition using deep learning. Journal of Applied Science and Technology Trends, 2(02), 52–58. https://doi.org/10.38094/jastt20291 Ahmad, J., Farman, H., & Jan, Z. (2019). Deep learning methods and applications. In Springer briefs in computer science, pp. 31–42. Springer. Alswaidan, N., & Menai, M. E. B. (2020). A survey of state-of-the-art approaches for emotion recognition in text. Knowledge and Information Systems, 62(8), 2937–2987. https://doi.org/10.1007/s10115-020-01449-0 Anderson, K., & McOwan, P. W. (2006). A real-time automated system for the recognition of human facial expressions. IEEE Transactions on Systems, Man, and Cybernetics, Part B, Cybernetics, 36(1), 96–105. https://doi.org/10.1109/TSMCB.2005.854502


Araño, K. A., Gloor, P., Orsenigo, C., & Vercellis, C. (2021). When old meets new: Emotion recognition from speech signals. Cognitive Computation, 13(3), 771–783. https://doi.org/10.1007/ s12559-­021-­09865-­2 Batbaatar, E., Li, M., & Ryu, K. H. (2019). Semantic-emotion neural network for emotion recognition from text. IEEE Access, 7, 111866–111878. https://doi.org/10.1109/ACCESS.2019.2934529 Baveye, Y., Chamaret, C., Dellandrea, E., & Chen, L. (2018). Affective video content analysis: A multidisciplinary insight. IEEE Transactions on Affective Computing, 9(4), 396–409. https:// doi.org/10.1109/TAFFC.2017.2661284 Bota, P., Wang, C., Fred, A., & Silva, H. (2020). Emotion assessment using feature fusion and decision fusion classification based on physiological data: Are we there yet? Sensors (Switzerland), 20(17), 4723. https://doi.org/10.3390/s20174723 Buhrmester, V., Münch, D., & Arens, M. (2021). Analysis of explainers of black box deep neural networks for computer vision: A survey. Machine Learning and Knowledge Extraction, 3(4), 966–989. https://doi.org/10.3390/make3040048 Chen, C. L. P., & Liu, Z. (2018). Broad learning system: An effective and efficient incremental learning system without the need for deep architecture. IEEE Transactions on Neural Networks and Learning Systems, 29(1), 10–24. https://doi.org/10.1109/TNNLS.2017.2716952 Chen, M., Xu, Z., Weinberger, K. Q., & Sha, F. (2012). Marginalized denoising autoencoders for domain adaptation. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, vol. 1, pp. 767–774. Chen, C. L. P., Zhang, C. Y., Chen, L., & Gan, M. (2015). Fuzzy restricted Boltzmann machine for the enhancement of deep learning. IEEE Transactions on Fuzzy Systems, 23(6), 2163–2173. https://doi.org/10.1109/TFUZZ.2015.2406889 Chen, D. W., et al. (2019a). A feature extraction method based on differential entropy and linear discriminant analysis for emotion recognition. Sensors (Switzerland), 19(7), 1631. https://doi. org/10.3390/s19071631 Chen, C. L. P., Liu, Z., & Feng, S. (2019b). Universal approximation capability of broad learning system and its structural variations. IEEE Transactions on Neural Networks and Learning Systems, 30(4), 1191–1204. https://doi.org/10.1109/TNNLS.2018.2866622 Chiang, J. T. J., Chen, X. P., Liu, H., Akutsu, S., & Wang, Z. (2021). We have emotions but can’t show them! Authoritarian leadership, emotion suppression climate, and team performance. Human Relations, 74(7), 1082–1111. https://doi.org/10.1177/0018726720908649 Choi, K. H., Kim, J., Kwon, O. S., Kim, M. J., Ryu, Y. H., & Park, J. E. (2017). Is heart rate variability (HRV) an adequate tool for evaluating human emotions? – A focus on the use of the International Affective Picture System (IAPS). Psychiatry Research, 251, 192–196. https://doi. org/10.1016/j.psychres.2017.02.025 Chu, W. L., Huang, M. W., Jian, B. L., & Cheng, K. S. (2017). Analysis of EEG entropy during visual evocation of emotion in schizophrenia. Annals of General Psychiatry, 16(1), 1–9. https:// doi.org/10.1186/s12991-­017-­0157-­z Cimtay, Y., Ekmekcioglu, E., & Caglar-Ozhan, S. (2020). Cross-subject multimodal emotion recognition based on hybrid fusion. IEEE Access, 8, 168865–168878. https://doi.org/10.1109/ ACCESS.2020.3023871 Corive, R., et  al. (2001). Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine, 18(1), 32–80. https://doi.org/10.1109/79.911197 Dara, S., & Tumma, P. (2018, September). Feature extraction by using deep learning: A survey. 
In Proceedings of the 2nd International Conference on Electronics, Communication and Aerospace Technology, ICECA 2018, pp. 1795–1801. https://doi.org/10.1109/ICECA.2018.8474912 Drigas, A. S., & Papoutsi, C. (2018). A new layered model on emotional intelligence. Behavioral Sciences (Basel), 8(5), 1–17. https://doi.org/10.3390/bs8050045 Ekman, P. (1992). An argument for basic emotions. Cognition & Emotion, 6(3–4), 169–200. https://doi.org/10.1080/02699939208411068 Erenel, Z., Adegboye, O.  R., & Kusetogullari, H. (2020). A new feature selection scheme for emotion recognition from text. Applied Sciences, 10(15), 1–13. https://doi.org/10.3390/ APP10155351


Feng, S., & Chen, C. L. P. (2018). A fuzzy restricted Boltzmann machine: Novel learning algorithms based on the crisp possibilistic mean value of fuzzy numbers. IEEE Transactions on Fuzzy Systems, 26(1), 117–130. https://doi.org/10.1109/TFUZZ.2016.2639064 Fesas, A., et al. (2021). Cardiac autonomic nervous system and ventricular arrhythmias: The role of radionuclide molecular imaging. Diagnostics, 11(7), MDPI, 1273. https://doi.org/10.3390/ diagnostics11071273 Fordson, P., & Xu, X. (2018). Research on emotion recognition and feature learning method based on Multimodal human data. Dissertation, South China University of Technology. https://cdmd. cnki.com.cn/Article/CDMD-­10561-­10118875306.htm, pp. 1–53. Fordson, H. P., Xing, X., Guo, K., & Xu, X. (2021). A feature learning approach based on multimodal human body data for emotion recognition. In 2021 IEEE Signal Processing in Medicine and Biology Symposium, SPMB 2021  - Proceedings, pp.  1–6. https://doi.org/10.1109/ SPMB52430.2021.9672303 Fresco, D. M., Mennin, D. S., Moore, M. T., Heimberg, R. G., & Hambrick, J. (2014). Changes in explanatory flexibility among individuals with generalized anxiety disorder in an emotion evocation challenge. Cognitive Therapy and Research, 38(4), 416–427. https://doi.org/10.1007/ s10608-­014-­9601-­4 Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2–3), 131–163. https://doi.org/10.1023/a:1007465528199 Goldstein, T., O’Donoghue, B., Setzep, S., & Baraniuk, R. (2014). Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences, 7(3), 1588–1623. https://doi. org/10.1137/120896219 Gong, M., Liu, J., Li, H., Cai, Q., & Su, L. (2015). A multiobjective sparse feature learning model for deep neural networks. IEEE Transactions on Neural Networks and Learning Systems, 26(12), 3263–3277. https://doi.org/10.1109/TNNLS.2015.2469673 Goshvarpour, A., Abbasi, A., & Goshvarpour, A. (2017). An accurate emotion recognition system using ECG and GSR signals and matching pursuit method. Biomedical Journal, 40(6), 355–368. https://doi.org/10.1016/j.bj.2017.11.001 Hagemann, D., Waldstein, S.  R., & Thayer, J.  F. (2003). Central and autonomic nervous system integration in emotion. Brain and Cognition, 52(1), 79–87. https://doi.org/10.1016/ S0278-­2626(03)00011-­3 Hajarolasvadi, N., Ramirez, M.  A., Beccaro, W., & Demirel, H. (2020). Generative adversarial networks in human emotion synthesis: A review. IEEE Access, 8, 218499–218529. https://doi. org/10.1109/ACCESS.2020.3042328 Hajncl, L., & Vučenović, D. (2020). Effects of measures of emotional intelligence on the relationship between emotional intelligence and transformational leadership. Psihološke teme, 29(1), 119–134. https://doi.org/10.31820/pt.29.1.7 Hasnul, M. A., Aziz, N. A. A., Alelyani, S., Mohana, M., & Aziz, A. A. (2021). Electrocardiogram-­ based emotion recognition systems and their applications in healthcare—A review. Sensors, 21(15), MDPI AG. https://doi.org/10.3390/s21155015 Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science (80-), 313(5786), 504–507. https://doi.org/10.1126/science.1127647 Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527 Hsu, Y. L., Wang, J. S., Chiang, W. C., & Hung, C. H. (2020). Automatic ECG-based emotion recognition in music listening. 
IEEE Transactions on Affective Computing, 11(1), 85–99. https:// doi.org/10.1109/TAFFC.2017.2781732 Huang, J., Liu, B., & Tao, J. (2021). Learning long-term temporal contexts using skip RNN for continuous emotion recognition. Virtual Reality & Intelligent Hardware, 3(1), 55–64. https:// doi.org/10.1016/j.vrih.2020.11.005 Imani, M., & Montazer, G. A. (2019). A survey of emotion recognition methods with emphasis on E-Learning environments. Journal of Network and Computer Applications, 147. Academic Press, 102423. https://doi.org/10.1016/j.jnca.2019.102423


Issah, M. (2018). Change leadership: The role of emotional intelligence. SAGE Open, 8(3), 1–6. https://doi.org/10.1177/2158244018800910 Jermsittiparsert, K., et al. (2020). Pattern recognition and features selection for speech emotion recognition model using deep learning. International Journal of Speech Technology, 23(4), 799–806. https://doi.org/10.1007/s10772-­020-­09690-­2 Jerritta, S., Murugappan, M., Wan, K., & Yaacob, S. (2014). Emotion recognition from facial EMG signals using higher order statistics and principal component analysis. Journal of the Chinese Institute of Engineers, 37(3), 385–394. https://doi.org/10.1080/02533839.2013.799946 Johnson, E. L., Kam, J. W. Y., Tzovara, A., & Knight, R. T. (2020). Insights into human cognition from intracranial EEG: A review of audition, memory, internal cognition, and causality. Journal of Neural Engineering, 17(5), 051001. https://doi.org/10.1088/1741-­2552/abb7a5 Khalil, R. A., Jones, E., Babar, M. I., Jan, T., Zafar, M. H., & Alhussain, T. (2019). Speech emotion recognition using deep learning techniques: A review. IEEE Access, 7, 117327–117345. https:// doi.org/10.1109/ACCESS.2019.2936124 Khenkar, S., & Jarraya, S.  K. (2022). Engagement detection based on analyzing micro body gestures using 3D CNN. Computers, Materials & Continua, 70(2), 2655–2677. https://doi. org/10.32604/cmc.2022.019152 Koelstra, S., et al. (2012). DEAP: A database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3(1), 18–31. https://doi.org/10.1109/T-­AFFC.2011.15 Kollias, D., & Zafeiriou, S. (2021). Exploiting multi-CNN features in CNN-RNN based dimensional emotion recognition on the OMG in-the-wild dataset. IEEE Transactions on Affective Computing, 12(3), 595–606. https://doi.org/10.1109/TAFFC.2020.3014171 Kong, T., Shao, J., Hu, J., Yang, X., Yang, S., & Malekian, R. (2021). Eeg-based emotion recognition using an improved weighted horizontal visibility graph. Sensors, 21(5), 1–22. https://doi. org/10.3390/s21051870 Leshno, M., Lin, V. Y., Pinkus, A., & Schocken, S. (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6), 861–867. https://doi.org/10.1016/S0893-­6080(05)80131-­5 Li, Y., Kumar, R., Lasecki, W. S., & Hilliges, O. (2020). “Artificial intelligence for HCI: A modern approach. In Conference on Human Factors in Computing Systems  - Proceedings, pp.  1–8. https://doi.org/10.1145/3334480.3375147 Liu, Z., et  al. (2017). A facial expression emotion recognition based human-robot interaction system. IEEE/CAA Journal of Automatica Sinica, 4(4), 668–676. https://doi.org/10.1109/ JAS.2017.7510622 Liu, Y.  J., Yu, M., Zhao, G., Song, J., Ge, Y., & Shi, Y. (2018). Real-time movie-induced discrete emotion recognition from EEG signals. IEEE Transactions on Affective Computing, 9(4), 550–562. https://doi.org/10.1109/TAFFC.2017.2660485 Luo, Y., et  al. (2020). EEG-based emotion classification using spiking neural networks. IEEE Access, 8, 46007–46016. https://doi.org/10.1109/ACCESS.2020.2978163 Mahata, S., Herencsar, N., & Kubanek, D. (2021). Optimal approximation of fractional-order butterworth filter based on weighted sum of classical butterworth filters. IEEE Access, 9, 81097–81114. https://doi.org/10.1109/ACCESS.2021.3085515 Mano, L. Y., et al. (2019). Using emotion recognition to assess simulation-based learning. Nurse Education in Practice, 36, 13–19. https://doi.org/10.1016/j.nepr.2019.02.017 Mao, S., Tao, D., Zhang, G., Ching, P. 
C., & Lee, T. (2019, May). Revisiting hidden Markov models for speech emotion recognition. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing  - Proceedings, vol. 2019-May, pp.  6715–6719. https://doi. org/10.1109/ICASSP.2019.8683172 Martínez, A., Pujol, F. A., & Mora, H. (2020). Application of texture descriptors to facial emotion recognition in infants. Applied Sciences, 10(3), 1–15. https://doi.org/10.3390/app10031115 Masood, N., & Farooq, H. (2019). Investigating EEG patterns for dual-stimuli induced human fear emotional state. Sensors (Switzerland), 19(3), 522. https://doi.org/10.3390/s19030522


Mishra, P., & Salankar, N. (2020). Automation of emotion quadrant identification by using second order difference plots and support vector machines. In Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), pp.  1–4. https://doi.org/10.1109/ SPMB50085.2020.9353637 Mithbavkar, S.  A., & Shah, M.  S. (2021). Analysis of EMG based emotion recognition for multiple people and emotions. In 3rd IEEE Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability, ECBIOS 2021, pp. 1–4. https://doi.org/10.1109/ ECBIOS51820.2021.9510858 Nithya Roopa, S. (2019). Emotion recognition from facial expression using deep learning. International Journal of Engineering and Advanced Technology, 8. 6 Special Issue, 91–95. https://doi.org/10.35940/ijeat.F1019.0886S19 Nivetha, K., Ragavi Ram, G., & Ajitha, P. (2016, November). Opinion mining from social media using Fuzzy Inference System (FIS). In International Conference on Communication and Signal Processing, ICCSP 2016, pp. 2171–2175. https://doi.org/10.1109/ICCSP.2016.7754566 Özerdem, M.  S., & Polat, H. (2017). Emotion recognition based on EEG features in movie clips with channel selection. Brain Informatics, 4(4), 241–252. https://doi.org/10.1007/ s40708-­017-­0069-­3 Pao, Y. H., & Takefuji, Y. (1992). Functional-link net computing: Theory, system architecture, and functionalities. Computer (Long. Beach. Calif), 25(5), 76–79. https://doi.org/10.1109/2.144401 Pao, Y.  H., Park, G.  H., & Sobajic, D.  J. (1994). Learning and generalization characteristics of the random vector functional-link net. Neurocomputing, 6(2), 163–180. https://doi. org/10.1016/0925-­2312(94)90053-­1 Reed, C. L., Moody, E. J., Mgrublian, K., Assaad, S., Schey, A., & McIntosh, D. N. (2020). Body matters in emotion: Restricted body movement and posture affect expression and recognition of status-related emotions. Frontiers in Psychology, 11, 1961. https://doi.org/10.3389/ fpsyg.2020.01961 Ren, F., & Bao, Y. (2020). A review on human-computer interaction and intelligent robots. International Journal of Information Technology and Decision Making, 19(1), 5–47. https:// doi.org/10.1142/S0219622019300052 Richardson, B., & Li, H. Y. (2021). Designing wearable electronic textiles to detect early signs of neurological injury and disease: A review. In Textile Bioengineering and Informatics Symposium Proceedings 2021 - 14th Textile Bioengineering and Informatics Symposium, TBIS 2021, pp. 11–18. Rovetta, S., Mnasri, Z., Masulli, F., & Cabri, A. (2021). Emotion recognition from speech: An unsupervised learning approach. International Journal of Computational Intelligence Systems, 14(1), 23–35. https://doi.org/10.2991/ijcis.d.201019.002 Salmam, F. Z., Madani, A., & Kissi, M. (2018). Emotion recognition from facial expression based on fiducial points detection and using neural network. International Journal of Electrical and Computer Engineering, 8(1), 52–59. https://doi.org/10.11591/ijece.v8i1.pp52-­59 Schouten, A., Boiger, M., Kirchner-Häusler, A., Uchida, Y., & Mesquita, B. (2020). Cultural differences in emotion suppression in Belgian and Japanese couples: A social functional model. Frontiers in Psychology, 11, 1–12. https://doi.org/10.3389/fpsyg.2020.01048 Shaffer, F., & Ginsberg, J. P. (2017. September 28). An overview of heart rate variability metrics and norms. Frontiers in Public Health, 5. Frontiers Media S.A., 258. https://doi.org/10.3389/ fpubh.2017.00258 Shahin, I., Nassif, A. B., & Hamsa, S. (2019). 
Emotion recognition using hybrid Gaussian mixture model and deep neural network. IEEE Access, 7, 26777–26787. https://doi.org/10.1109/ ACCESS.2019.2901352 Shangguan, P., Liu, G., & Wen, W. (2014). The emotion recognition based on GSR signal by curve fitting. Journal of Information and Computing Science, 11(8), 2635–2646. https://doi. org/10.12733/jics20103685 Shu, L., et  al. (2018). A review of emotion recognition using physiological signals. Sensors (Switzerland), 18(7), 2074. https://doi.org/10.3390/s18072074


Shukla, J., Barreda-Angeles, M., Oliver, J., Nandi, G. C., & Puig, D. (2019). Feature extraction and selection for emotion recognition from electrodermal activity. IEEE Transactions on Affective Computing, 3045, 1–1. https://doi.org/10.1109/TAFFC.2019.2901673 Sinaga, K. P., & Yang, M. S. (2020). Unsupervised K-means clustering algorithm. IEEE Access, 8, 80716–80727. https://doi.org/10.1109/ACCESS.2020.2988796 Singla, C., Singh, S., & Pathak, M. (2020). Automatic audio based emotion recognition system: Scope and challenges. SSRN Electronic Journal, 6. https://doi.org/10.2139/ssrn.3565861 Soleymani, M., Lichtenauer, J., Pun, T., & Pantic, M. (2012). A multimodal database for affect recognition and implicit tagging. IEEE Transactions on Affective Computing, 3(1), 42–55. https:// doi.org/10.1109/T-­AFFC.2011.25 Soleymani, M., Villaro-Dixon, F., Pun, T., & Chanel, G. (2017). Toolbox for emotional feature extraction from physiological signals (TEAP). Frontiers in ICT, 4, 1. https://doi.org/10.3389/ fict.2017.00001 Song, T., Zheng, W., Lu, C., Zong, Y., Zhang, X., & Cui, Z. (2019). MPED: A multi-modal physiological emotion database for discrete emotion recognition. IEEE Access, 7, 12177–12191. https://doi.org/10.1109/ACCESS.2019.2891579 Song, T., Zheng, W., Song, P., & Cui, Z. (2020). EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Transactions on Affective Computing, 11(3), 532–541. https://doi.org/10.1109/TAFFC.2018.2817622 Subasi, A., Tuncer, T., Dogan, S., Tanko, D., & Sakoglu, U. (2021). EEG-based emotion recognition using tunable Q wavelet transform and rotation forest ensemble classifier. Biomedical Signal Processing and Control, 68, 102648. https://doi.org/10.1016/j.bspc.2021.102648 Taherkhani, A., Cosma, G., & McGinnity, T. M. (2018). Deep-FS: A feature selection algorithm for Deep Boltzmann Machines. Neurocomputing, 322, 22–37. https://doi.org/10.1016/j. neucom.2018.09.040 Tang, J., Deng, C., & Bin Huang, G. (2016). Extreme learning machine for multilayer perceptron. IEEE Transactions on Neural Networks and Learning Systems, 27(4), 809–821. https://doi. org/10.1109/TNNLS.2015.2424995 Thomas, A.  R., Pop, N.  A., Iorga, A.  M., & Ducu, C. (2016). Ethics and neuromarketing: Implications for market research and business practice. Springer International Publishing. Topic, A., & Russo, M. (2021). Emotion recognition based on EEG feature maps through deep learning network. Engineering Science and Technology, an International Journal, 24(6), 1442–1454. https://doi.org/10.1016/j.jestch.2021.03.012 Wang, S., Li, J., Cao, T., Wang, H., Tu, P., & Li, Y. (2020a). Dance emotion recognition based on Laban motion analysis using convolutional neural network and long short-term memory. IEEE Access, 8, 124928–124938. https://doi.org/10.1109/ACCESS.2020.3007956 Wang, G., Qiao, J., Bi, J., Jia, Q. S., & Zhou, M. C. (2020b). An adaptive deep belief network with sparse restricted Boltzmann machines. IEEE Transactions on Neural Networks and Learning Systems, 31(10), 4217–4228. https://doi.org/10.1109/TNNLS.2019.2952864 Wang, J., Liu, H., Liu, F., & Wang, Q. (2020c). Human-computer interaction speech emotion recognition based on random forest and convolution feature learning. Xitong Fangzhen Xuebao / Journal of System Simulation, 32(12), 2388–2400. https://doi.org/10.16182/j.issn1004731x. joss.20-­FZ0494E Wang, W., Tran, D., & Feiszli, M. (2020d). What makes training multi-modal classification networks hard?. 
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 12692–12702, https://doi.org/10.1109/CVPR42600.2020.01271 Wei, W., & Jia, Q. (2016). Weighted feature Gaussian Kernel SVM for emotion recognition. Computational Intelligence and Neuroscience, 2016, 1–7. https://doi. org/10.1155/2016/7696035 Wu, J., Zhang, Y., Sun, S., Li, Q., & Zhao, X. (2021). Generalized zero-shot emotion recognition from body gestures. Applied Intelligence, 52, 1–12. https://doi.org/10.1007/s10489-­021-­02927-­w Xia, R., & Liu, Y. (2016, September). DBN-ivector framework for acoustic emotion recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2016, vol. 08–12, pp.  480–484. https://doi.org/10.21437/ Interspeech.2016-­488


Xie, W., & Xue, W. (2021). WB-KNN for emotion recognition from physiological signals. Optoelectronics Letters, 17(7), 444–448. https://doi.org/10.1007/s11801-­021-­0118-­2 Yan, J., Zheng, W., Xin, M., & Yan, J. (2014). Integrating facial expression and body gesture in videos for emotion recognition. IEICE Transactions on Information and Systems, E97-D(3), 610–613. https://doi.org/10.1587/transinf.E97.D.610 Yang, Z., Wang, J., & Chen, Y. (2014). Surface EMG based emotion recognition model for body language of head movements. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-­ Aided Design and Computer, 26(9), 1396–1402. Yang, H., Han, J., & Min, K. (2019). A multi-column CNN model for emotion recognition from EEG signals. Sensors (Switzerland), 19(21), 1–12. https://doi.org/10.3390/s19214736 Yao, Z., Wang, Z., Liu, W., Liu, Y., & Pan, J. (2020). Speech emotion recognition using fusion of three multi-task learning-based classifiers: HSF-DNN, MS-CNN and LLD-RNN. Speech Communication, 120, 11–19. https://doi.org/10.1016/j.specom.2020.03.005 Yegnanarayana, B. (1994). Artificial neural networks for pattern recognition. Sadhana, 19(2), 189–238. https://doi.org/10.1007/BF02811896 Yu, D., & Sun, S. (2020). A systematic exploration of deep neural networks for EDA-based emotion recognition. Information, 11(4), 212–212. https://doi.org/10.3390/INFO11040212 Yu, Z., Li, L., Liu, J., & Han, G. (2015). Hybrid adaptive classifier ensemble. IEEE Transactions on Cybernetics, 45(2), 177–190. https://doi.org/10.1109/TCYB.2014.2322195 Yu, Z., et al. (2016a). Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Transactions on Knowledge and Data Engineering, 28(3), 701–714. https:// doi.org/10.1109/TKDE.2015.2499200 Yu, Z., et al. (2016b). Hybrid k -nearest neighbor classifier. IEEE Transactions on Cybernetics, 46(6), 1263–1275. Yu, M., et  al. (2019). A review of EEG features for emotion recognition. Scientia Sinica Informationis, 49(9), 1097–1118. https://doi.org/10.1360/n112018-­00337 Yun, Y., Ma, D., & Yang, M. (2021). Human–computer interaction-based decision support system with applications in data mining. Future Generation Computer Systems, 114, 285–289. https:// doi.org/10.1016/j.future.2020.07.048 Zapf, D., Kern, M., Tschan, F., Holman, D., & Semmer, N. K. (2021). Emotion work: A work psychology perspective. Annual Review of Organizational Psychology and Organizational Behavior, 8. Annual Reviews Inc., 139–172. https://doi.org/10.1146/annurev-­orgpsych-­012420-­062451 Zhang, Q., Chen, X., Zhan, Q., Yang, T., & Xia, S. (2017). Respiration-based emotion recognition with deep learning. Computers in Industry, 92–93, 84–90. https://doi.org/10.1016/j. compind.2017.04.005 Zhang, X., et al. (2020a). Emotion recognition from multimodal physiological signals using a regularized deep fusion of kernel machine. IEEE Transactions on Cybernetics, 59(9), 4386–4399. https://doi.org/10.1109/tcyb.2020.2987575 Zhang, J., Yin, Z., Chen, P., & Nichele, S. (2020b). Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review. Information Fusion, 59, 103–126. https://doi.org/10.1016/j.inffus.2020.01.011 Zhang, Y., Zhao, C., Chen, M., & Yuan, M. (2021). Integrating stacked sparse auto-encoder into matrix factorization for rating prediction. IEEE Access, 9, 17641–17648. https://doi. org/10.1109/ACCESS.2021.3053291 Zhao, Y., & Chen, D. (2021). 
Expression EEG multimodal emotion recognition method based on the bidirectional LSTM and attention mechanism. Computational and Mathematical Methods in Medicine, 2021, 1–12. https://doi.org/10.1155/2021/9967592 Zhao, H., Zheng, J., Deng, W., & Song, Y. (2020). Semi-supervised broad learning system based on manifold regularization and broad network. IEEE Transactions on Circuits and Systems I: Regular Papers, 67(3), 983–994. https://doi.org/10.1109/TCSI.2019.2959886 Zheng, W. (2017). Multichannel EEG-based emotion recognition via group sparse canonical correlation analysis. IEEE Transactions on Cognitive and Developmental Systems, 9(3), 281–290. https://doi.org/10.1109/TCDS.2016.2587290 Zhong, P., Wang, D., & Miao, C. (2020). EEG-based emotion recognition using regularized graph neural networks. IEEE Transactions on Affective Computing, 13, 1–1. https://doi.org/10.1109/ taffc.2020.2994159

Monitoring of Auditory Discrimination Therapy for Tinnitus Treatment Based on Event-Related (De-) Synchronization Maps

Ingrid G. Rodríguez-León, Luz María Alonso-Valerdi, Ricardo A. Salido-Ruiz, Israel Román-Godínez, David I. Ibarra-Zarate, and Sulema Torres-Ramos

1 Introduction

1.1 What Is Tinnitus?

Tinnitus is the perception of sound in the absence of an external source (Eggermont & Roberts, 2012). It affects between 5% and 15% of the world's population (Meyer et al., 2014). Tinnitus is caused by exposure to loud noise, fever, ototoxicity, or a transient disturbance in the middle ear (Eggermont & Roberts, 2012). It can be perceived by people of all ages, either those with normal hearing or those with hearing loss (Alonso-Valerdi et al., 2017).

1.2 Sort of Tinnitus

Tinnitus has been classified in different ways according to several criteria, such as cause, duration, and symptom characteristics (Haider et al., 2018). Objective tinnitus is associated with peripheral vascular abnormalities detectable by stethoscope inspection, whereas subjective tinnitus is an acoustic perception experienced only by the patient (Lenhardt, 2004).

I. G. Rodríguez-León (*) · R. A. Salido-Ruiz · I. Román-Godínez · S. Torres-Ramos
División de Tecnologías para la Integración Ciber-Humana, Universidad de Guadalajara (UDG), Jalisco, México
e-mail: [email protected]

L. M. Alonso-Valerdi · D. I. Ibarra-Zarate
Escuela de Ingeniería y Ciencias, Tecnológico de Monterrey, Nuevo León, México


Another causal classification strategy is based on the origin of the tinnitus relative to the site of impairment in the auditory pathway, categorizing tinnitus into peripheral and central types (Haider et al., 2018). Tinnitus is called chronic when it has been present for at least 6 months. On the other hand, tinnitus lasting less than 6 months is called acute (Henry, 2016). Another classification of symptoms is based on a description of the tinnitus sound, for example, whether it is continuous or intermittent, pulsatile or non-pulsatile (Henry, 2016).

1.3 Tinnitus Affectation

Chronic tinnitus is caused by the over-synchronization of neurons, which affects cognitive, attentional, emotional, and even motor processes (Eggermont & Roberts, 2012). In particular, cognitive impairment has frequently been reported in patients with tinnitus in recent years (Hallam et al., 2004). Working memory and attentional processes, such as deficits in (1) executive control of attention (Heeren et al., 2014), (2) attentional changes (Hallam et al., 2004), and (3) selective and divided attention (Rossiter et al., 2006), have been studied in particular.

Tinnitus is related to abnormal changes at one or more levels along the auditory pathway. Imaging studies of the human brain have identified altered tinnitus-related activity in auditory areas, including the inferior colliculus and the auditory cortex (Finlayson & Kaltenbach, 2009; Henry & Wilson, 1995; Kaltenbach et al., 2005). The central auditory system appears to increase its activity to compensate for reduced sensorineural input from the cochlea induced by acoustic trauma, ototoxic agents, or other causes. Such hyperactivity can show up as phantom sound, hyperacusis, or intolerance to loud sounds. In addition to hyperactivity, tinnitus-related changes in the auditory system also include increased neural synchrony and burst activity. Animal models have corroborated this explanation, pointing to rearranged tonotopic maps and increased spontaneous neuronal activity or synchrony in the auditory cortex at the origin of tinnitus (Eggermont & Roberts, 2012).

1.4 How Can Over-Synchronization of Neurons Due to Tinnitus Be Detected?

The attentional neurophysiological mechanisms occurring at the cortical level can be recorded over the human scalp using electroencephalography (EEG) (Henry, 2016). EEG allows monitoring of the brain's rhythmic, ongoing electrical activity, which is made up of several simultaneous oscillations at different frequencies (Basar et al., 1999a, 2000; Krause, 2003). Neural oscillations have traditionally been studied in event-related experiments, where event-related potentials and (de-) synchronization levels are estimated (Krause, 2003). Specifically, event-related oscillatory responses in different frequency bands reflect different stages of neural information processing (Basar et al., 1999a, b, 2000). Event-related oscillations are typically studied as: (1) event-related desynchronization (ERD), which refers to a phasic relative power decrease in a certain frequency band, and (2) event-related synchronization (ERS), which implies a relative power increase. As the terms indicate, both ERD and ERS are neural patterns occurring in relation to emotional, cognitive, motor, sensory, and/or perceptual events (Aranibar & Pfurtscheller, 1978; Pfurtscheller, 1977, 1992; Pfurtscheller & Aranibar, 1979). In tinnitus patients, power changes in various frequency bands reflect changes in neural synchrony. The levels of synchronization related to auditory stimuli are used here to evaluate the effect of auditory discrimination therapy (ADT).

1.5 Event-Related (De-) Synchronization (ERD/ERS)

ERD/ERS maps have been extensively used to study auditory information processing at the cortical level. For instance, Klimesch and colleagues have reported theta and alpha oscillatory responses during cognitive processing in different studies (Klimesch, 1999). On the functional level, attention and semantic memory (the cognitive processes responsible for accessing and/or retrieving information from long-term memory) are strongly related to brain oscillatory responses within alpha frequencies (Klimesch, 1997), whereas working memory functions (the memory needed to retain an auditory stimulus for some time) are associated with brain activity in the theta frequency range (Bastiaansen & Hagoort, 2003; Duncan Milne et al., 2003). Krause and colleagues have extensively studied auditorily elicited ERD/ERS responses using a wide variety of cognitive tasks. One consistent finding from these studies is that the encoding of acoustic information typically elicits widespread alpha-ERS responses, whereas recognition of the same acoustic material elicits widespread alpha-ERD responses (Krause, 2006).

The mapping of synchronization and desynchronization mechanisms related to auditory events allows us to visualize the neural processing of the auditory cortex from a couple of milliseconds up to several seconds. Auditory stimuli have been found to produce changes in EEG signal oscillations primarily in the temporal and parietal lobes. In addition, these changes have latencies ranging from a couple of milliseconds up to 2 s after the appearance of the stimulus (Krause et al., 1994). In general, the mapping of neuronal synchronization and desynchronization mechanisms is a widely used tool in neuroscience, since it reflects the coupling and uncoupling processes experienced by the neuronal oscillatory system when decoding sensory, cognitive, and/or perceptual events.

There is evidence showing that patients suffering from tinnitus have psychological and electrophysiological abnormalities (Gabr et al., 2011). These abnormalities should be reflected when quantifying the neuronal synchronization and desynchronization mechanisms elicited by auditory stimuli. Additionally, if the treatment of tinnitus through acoustic therapies produces any significant change at the cortical level, it should be equally quantifiable by the same means.

1.6 How Can Tinnitus Be Treated?

To date, there is no potentially curative medical, neurological, or neurophysiological therapy that addresses the underlying causes of the disorder. There is therefore a wide variety of treatments for tinnitus, including pharmacological, psychological, magnetic and electrical stimulation interventions, and sound-based therapies (McFerran et al., 2019). In particular, acoustic therapies such as ADT aim to reverse the tinnitus-related neuroplasticity phenomenon by appropriately stimulating the auditory pathway, inducing habituation and/or suppression (Formby et al., 2013). Despite being the seventh of the twenty-five most used treatments for tinnitus (Simoes et al., 2019), its effect is still not well understood (Alonso-Valerdi et al., 2017). For this reason, the scientific community has identified several areas of opportunity in tinnitus evaluation (McFerran et al., 2019), such as the search for objective measures to quantify the effect of treatments, since the most widely used evaluations are currently entirely subjective, such as audiometry and self-report questionnaires.

1.7 Auditory Discrimination Therapy (ADT)

As discussed in the previous section, one consequence of stimulation-dependent neuroplasticity is the possibility of reversing tinnitus through appropriate acoustic therapies. The objective of these therapies is to produce neuroplastic changes in patients that can lead to habituation and residual inhibition. Habituation acts mainly on the limbic and autonomic systems, so that, although the patient still perceives the tinnitus, he or she is able to cope with it and improve quality of life. Residual inhibition is the sensation of the tinnitus diminishing when the stimulus is stopped; the inhibition can last from a few seconds to days. To date, the stimulus parameters, such as frequency, intensity, and duration, are still being studied (Mohebbi et al., 2019). For this reason, this chapter analyzes auditory discrimination therapy in more detail.

Given the theories of tinnitus origin described previously, it is important to consider a change in neuronal activity in the brain areas involved in the neural circuits responsible for tinnitus. In this way, both auditory and non-auditory areas can be reached. Fundamentally, a distinction is made between invasive and non-invasive neuromodulation such as ADT, whose goal is to normalize brain activity related to tinnitus. With the advancement of structural and functional neuroimaging techniques, it has become possible to find the areas of the brain that are mainly responsible for the perception of tinnitus or everything related to it (Husain et al., 2011).

Auditory discrimination therapy provides targeted acoustic neurostimulation of damaged areas of the cochlear nucleus to enhance activity in the auditory cortex corresponding to the impaired frequencies and their adjacent zones, effectively reducing tinnitus. The treatment is based on the ability of the central nervous system to reorganize the tonotopic distribution of the cerebral cortex as a result of neuronal plasticity phenomena after a process of peripheral deafferentation. It consists of pure tones presented discontinuously and randomly mixed with short broadband noises. The aim is to stimulate the frequencies immediately below and above the cochlear injury to redirect the process of cortical reorganization, that is, to distribute the deafferented cortical area among the regions of perilesional frequencies. In other acoustic therapies it is not necessary for the patient to pay attention to the stimulus, only to listen to it; in contrast, ADT requires the patient's attention to the stimulus and the discrimination of some of its characteristics. The vast majority of published works on ADT use oddball paradigms as stimuli (Alonso-Valerdi et al., 2021). These paradigms consist of sound stimuli composed of standard and deviant pulses, presented randomly; the patient has to identify whether each is a standard or a deviant pulse. Figure 1 shows an example of these stimuli, in which the pulse duration t, the inter-latency between the impulses of each pair isi, the inter-latency between pairs of impulses isi12, and the probabilities of appearance of standard and deviant pulse pairs, p1 and p2 respectively, are variable parameters that can be controlled. Figure 2a shows one of the impulses of a pair of stimuli of the oddball signal. As can be seen, it is a sinusoidal signal windowed by a Hanning function of the same duration as the pulse (100 ms in this case). Figure 2b shows the magnitude spectrum of the oddball signal; two clear peaks can be seen at the frequencies of the standard (4000 Hz) and deviant (4500 Hz) pulses.

Fig. 1 (a) Parameters of the stimuli of the Oddball Paradigm. Duration, inter-latency of individual impulses, and inter-latency of pairs of impulses. (b) Probabilities of each of the pulse pairs


Fig. 2 (a) Waveform of one of the pulses of the oddball stimuli, (b) Magnitude spectrum of the oddball signal

As mentioned above, two types of stimuli are used in this procedure: standard (s) and deviant (d). Note that the sound impulses work within a very narrow range of frequencies around the fundamental frequency f0, where the tinnitus occurs. Both pulses xs(t) and xd(t), the standard and deviant sinusoidal signals respectively, originate from a periodic wave whose expression is:

$$x(t) = \cos(2\pi f t) \quad (1)$$

The frequency region on which the stimulus must act is very close or equal to the fundamental frequency (f0, the tinnitus frequency). Let fs be the standard frequency and fd the deviant frequency of the pulses:

$$f_s = f_0 \pm \Delta f, \qquad f_d = f_s \pm \Delta f \quad (2)$$

Depending on the characteristics of the tinnitus, and with the aim of improving the hearing disorder, the deviant frequency can be 5–10% higher or lower than the standard frequency. Thus, from Eqs. (1) and (2) the excitatory signals of Eq. (3) are obtained, to which a Hanning window is applied before they are delivered to the patient:

$$x_s(t) = \cos(2\pi f_s t), \qquad x_d(t) = \cos(2\pi f_d t) \quad (3)$$

Therefore, the auditory discrimination therapy stimuli are sinusoids of very similar frequency that act on the focus of the symptom, in most cases using the tinnitus frequency, as seen in Eq. (4):

$$x_s(t) = \cos(2\pi f_0 t), \qquad x_d(t) = \cos\big(2\pi (f_0 \pm \Delta f)\, t\big) \quad (4)$$

ADT is an acoustic therapy based on the oddball paradigm, which is designed to reduce attention toward the tinnitus, thereby reducing its perception (Herraiz et al., 2007). The oddball paradigm consists of a pair of stimuli, standard and deviant pulses, which are presented randomly; the patient must distinguish deviant (40%) from standard (60%) pulses. The standard pulse is white noise with a duration of 500 ms, while the deviant pulse can range between 4 and 8 kHz with a duration between 50 and 100 ms. Training on tones that differ from the dominant tinnitus tone is beneficial due to the effect of lateral inhibition. Furthermore, stimulating regions of specific frequencies close to, but not within, the tinnitus frequency region is likely to promote or strengthen lateral inhibitory activity and thus disrupt the synchronous pathological activity of the tinnitus-generating region (Wang et al., 2020).
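The sketch below, following Eqs. (3) and (4) and the 100 ms Hanning-windowed pulses of Fig. 2, generates a hypothetical standard/deviant pulse sequence. The 4000/4500 Hz frequencies and the 60/40 presentation probabilities are taken from the text, while the 44.1 kHz sampling rate, inter-stimulus silence, and sequence length are illustrative assumptions rather than the therapy's exact settings.

```python
import numpy as np

FS = 44_100  # audio sampling rate in Hz (assumed)

def adt_pulse(freq_hz, duration_s=0.1, fs=FS):
    """Hanning-windowed sinusoidal pulse, x(t) = cos(2*pi*f*t), per Eqs. (3)-(4)."""
    t = np.arange(int(duration_s * fs)) / fs
    return np.hanning(t.size) * np.cos(2 * np.pi * freq_hz * t)

def oddball_sequence(f0=4000.0, delta_f=500.0, n_pulses=50, p_deviant=0.4,
                     isi_s=0.5, seed=0):
    """Random sequence of standard (f0) and deviant (f0 + delta_f) pulses
    separated by silent inter-stimulus intervals."""
    rng = np.random.default_rng(seed)
    silence = np.zeros(int(isi_s * FS))
    chunks, labels = [], []
    for _ in range(n_pulses):
        is_deviant = rng.random() < p_deviant
        freq = f0 + delta_f if is_deviant else f0
        chunks += [adt_pulse(freq), silence]
        labels.append("deviant" if is_deviant else "standard")
    return np.concatenate(chunks), labels

signal, labels = oddball_sequence()
print(f"{labels.count('deviant')} deviant pulses out of {len(labels)}")
```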

1.8 How Can Auditory Discrimination Therapy for Tinnitus Treatment Be Monitored?

ERD/ERS monitoring has become a widely applied electroencephalography monitoring tool. ERD/ERS maps are currently employed as quantitative measures derived from the EEG (EEG biomarkers) to assess cognitive engagement in stroke patients during motor rehabilitation (Osuagwu et al., 2016; Park et al., 2015; Sebastian-Romagosa et al., 2020), to monitor deterioration of the attention network in people with Alzheimer's and Parkinson's disease (Missonnier et al., 2007), and to estimate the effect of sporadic seizures on cognitive abilities in very young people with non-symptomatic focal epilepsy receiving antiepileptic medication (Krause et al., 2008). Furthermore, Caimmi and colleagues (Caimmi et al., 2016) used neural synchronization maps to evaluate the recovery of patients who had suffered a stroke and were undergoing rehabilitation assisted by robotic devices; the researchers showed that assisted rehabilitation is just as efficient as conventional rehabilitation. Another application of synchronization and desynchronization maps has been pain perception, specifically the representation, encoding, evaluation, and integration of nociceptive sensory inputs. It has been found that the subjective perception of pain and the intensity of the stimulus are correlated with the latency, frequency, magnitude, spatial distribution, phase, neural generator, and frequency coupling of the oscillatory activities of the neural networks (Peng & Tang, 2016).

Although this method has been used to monitor cognitive engagement during certain therapies, with applications in stroke patients, patients with epilepsy, and so on, it has not yet been applied to assess acoustic therapies in tinnitus patients, as far as we know. Based on this evidence, we propose to evaluate the effect of auditory discrimination therapy for tinnitus treatment by mapping ERD and ERS responses before and after the therapy, and to decide whether this electroencephalography technique could be feasible for monitoring sound effects. In particular, the effects of auditory discrimination therapy on attentional and memory processes are of special interest, since ADT seeks to reduce attention toward the tinnitus in order to increase attention to everyday acoustic environments.

1.9 Methods to Evaluate Auditory Discrimination Therapy

Currently, the most widely used way to evaluate auditory discrimination therapy is through subjective methods such as visual analog scales and ad hoc questionnaires (Alonso-Valerdi et al., 2017). For instance, Herraiz et al. (2007) evaluated the effectiveness of ADT in 27 tinnitus patients for 1 month and, according to the Tinnitus Handicap Inventory (THI) test and a visual analog scale, found an improvement in tinnitus perception in 40% of patients. The same authors (Herraiz et al., 2010) evaluated the effect of ADT using two paradigms: stimulating 20 patients at the tinnitus frequency itself, and stimulating 21 patients at one octave band below the tinnitus pitch. According to the responses to the THI test, an analog scale, and a questionnaire, the perception of tinnitus decreased in 42.2% of the patients under the second paradigm. The study presented by Alonso-Valerdi et al. (2021) compared music-based sound therapies, retraining, neuromodulation (ADT), and binaural beats using neuroaudiological and psychological assessments. The first evaluation revealed that the entire frequency structure of the neural networks showed a higher level of activity in the tinnitus patients than in the control individuals. Based on the psychological evaluation, retraining was the most effective sound-based therapy in reducing tinnitus perception and relieving stress and anxiety after 60 days of treatment. However, binaural beats and ADT produced very similar effects, and ADT was shown to have fewer side effects.

In light of the above discussion, the present work aims to establish a methodology based on electroencephalography analysis to objectively evaluate the effectiveness of ADT in redirecting the attention of patients with tinnitus. For this purpose, the database "Acoustic therapies for tinnitus treatment: An EEG database" (Ibarra-Zarate et al., 2022) was used; only the control and ADT groups were selected. Afterward, ERD and ERS responses were mapped for two study cases, (1) before and (2) after applying the ADT, and two events, (1) encoding and (2) recognition of auditory material. For the ERD/ERS maps, the continuous wavelet transform (CWT) was used. The resulting scalogram images were then analyzed to investigate performance in terms of cognitive changes, specifically those related to attention and memory. The foregoing may provide solid evidence of the feasibility of ADT to treat subjective, chronic tinnitus. The conduct of the investigation is described below.


2 Methodology

The methodology for this work comprised four steps: (1) analyze and select the electroencephalography signals of interest from the aforementioned database, (2) pre-process the electroencephalography signals to increase the signal-to-noise ratio, (3) estimate the ERD/ERS maps based on the CWT, and (4) statistically compare the sessions before and after the acoustic treatment. This methodology is shown in Fig. 3 and described in detail in the following paragraphs.

Fig. 3  Pipeline of the EEG analysis to monitor the effectiveness of ADT to treat subjective, chronic tinnitus

2.1 EEG Database

The database for this research is available at Mendeley Data under the title "Acoustic therapies for tinnitus treatment: An EEG database" (Ibarra-Zarate et al., 2022). This database was created following a protocol formally approved by the Ethics Committee of the National School of Medicine of the Tecnologico de Monterrey, and described, published, and registered under trial number ISRCTN14553550.

From the cohort, two groups were selected: tinnitus patients treated with auditory discrimination therapy, and controls. There were eleven participants per group. Both groups were treated for 8 weeks and were instructed to use the sound-based therapy for 1 h every day, at any time of the day. Note that controls were acoustically stimulated with relaxing music. The therapy was monitored before and after the 8-week treatment. At each monitoring session, an adapted version of the Tinnitus Handicap Inventory created by the National Institute of Rehabilitation was applied and an electroencephalography recording was produced. The THI was applied to report the perception of tinnitus during the sound-based treatment. The questionnaire responses were categorized as (1) normal, (2) borderline normal, and (3) abnormal condition before and after treatment.

For the electroencephalography recording, two different soundscapes were played, while five associated auditory stimuli were randomly played. Whenever participants identified an auditory stimulus, they pressed a keyboard button. The soundscapes, along with their related auditory stimuli in each monitoring session, comprised: (1) restaurant sounds: human sound (tasting food), microwave sound, glass breaking, door closing, and a soda can being opened; and (2) sounds of construction in progress: human sound (yelling), police siren, mobile phone dialing, bang, and hit. All stimuli lasted 1 s and were repeated 50 times at a random rate. Participants kept their eyes closed during the stimulation. Every monitoring session was around 60 min long (Alonso-Valerdi et al., 2017). The experimental timing protocol is shown in Fig. 4. To record the electroencephalographic data, a g.USBamp amplifier was used, configured as stated in Table 1.

2.2 EEG Signal Pre-processing

The electroencephalographic signals were pre-processed as follows in order to increase the signal-to-noise ratio. First, the signals were band-pass filtered between 0.1 and 30 Hz with a zero-phase, sixth-order Butterworth digital filter to remove slow drifts and high-frequency noise. Second, channels were removed according to the criteria reported in Chang et al. (2020): flat for more than 5 s, a maximum acceptable high-frequency noise standard deviation of 4, and a minimum acceptable correlation with nearby channels of 0.8. Third, artifact subspace reconstruction (ASR) burst correction was performed to remove bad data periods with transient or large-amplitude artifacts exceeding 20 times the standard deviation of the calibrated data (Chang et al., 2020). Fourth, independent component analysis (ICA) was applied with the RunICA function. Finally, the independent components identified as non-brain sources were rejected by the ICLabel classifier; the probability range for components flagged for rejection was set between 0.6 and 1. There were five non-brain source categories: (1) muscular, (2) ocular, and (3) electrocardiographic artifacts, (4) line noise, and (5) channel noise.
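A minimal sketch of the first two pre-processing steps is shown below, assuming the recordings are available as a NumPy array of shape (channels, samples) at 256 Hz. The zero-phase sixth-order Butterworth band-pass follows the description above; the flat-channel and neighbor-correlation checks only approximate the criteria of Chang et al. (2020), and the ASR, RunICA, and ICLabel stages (typically run with EEGLAB-style tooling) are not reproduced here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256  # sampling rate from Table 1

def bandpass_eeg(data, low=0.1, high=30.0, order=6, fs=FS):
    """Zero-phase sixth-order Butterworth band-pass (0.1-30 Hz),
    implemented as second-order sections for numerical stability."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)

def flag_bad_channels(data, flat_seconds=5, min_corr=0.8, fs=FS):
    """Rough approximation of the channel-rejection criteria: channels that
    stay flat for more than 5 s or correlate poorly with the other channels."""
    bad, win = [], flat_seconds * fs
    for ch in range(data.shape[0]):
        x = data[ch]
        flat = np.any(np.lib.stride_tricks.sliding_window_view(x, win).std(axis=1) < 1e-8)
        others = np.delete(data, ch, axis=0).mean(axis=0)
        corr = np.corrcoef(x, others)[0, 1]
        if flat or corr < min_corr:
            bad.append(ch)
    return bad

# Toy usage: 16 correlated channels standing in for a one-minute recording.
rng = np.random.default_rng(0)
common = rng.standard_normal(60 * FS)
eeg = common + 0.3 * rng.standard_normal((16, 60 * FS))
filtered = bandpass_eeg(eeg)
print("channels flagged as bad:", flag_bad_channels(filtered))
```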


Fig. 4  Timing protocol for the EEG data in use. Each trial was around 60 min long. In each trial, participants listened to a soundscape and identified five randomly played auditory stimuli by pressing a button on the keyboard. There were two types of induced events: (1) auditory material encoding and (2) auditory material retrieval

Table 1  EEG recording system configuration

Sampling rate                256 Hz
Number of channels           16
Channels used by region      Prefrontal (FP1, FP2), Frontal (F7, F3, Fz, F4, F8), Temporal (T3, T4, T5, T6), Central (C3, C4), Parietal (Pz), Occipital (O1, O2)
Reference method             Monopolar @ Cz
Electrode placement system   International 10–20 system

2.3 ERD/ERS Maps

Most of the changes in neural activity due to tinnitus have been observed over the frontal lobe. In Eggermont and Roberts (2012), delta and theta band oscillations were enhanced as a function of tinnitus loudness and tinnitus-related distress; patients in greater distress from their tinnitus showed larger theta oscillations. According to Adamchic et al. (2014), most tinnitus patients showed a decrease in alpha power, since their attention had been redirected to their tinnitus as their minds wandered off, resulting in low alpha synchronization. Based on this evidence, the electroencephalography signals over the frontal lobe (Fp1, Fp2, F7, F3, Fz, F4, F8) were averaged to monitor the effect of auditory discrimination therapy on tinnitus sufferers.

The epochs were extracted from 500 ms before to 1 s after the stimulus onset, in line with the timing protocol presented in Fig. 4. There were two types of events: (1) auditory material encoding and (2) auditory material retrieval. Regarding auditory memory mechanisms, the first event induces long-lasting alpha ERS responses, while the second is associated with long-lasting alpha ERD responses (Krause, 2006).

The CWT was the time-frequency analysis applied to each epoch. A wavelet of the complex Gaussian family (Eq. 5) was selected, since these wavelets are based on complex-valued sinusoids constituting an analytic signal and possess the shift-invariance property. The sampling frequency was 256 Hz and the frequency range spanned 0.1 to 30 Hz.

$$f(x) = C_p\, e^{-ix} e^{-x^2} \quad (5)$$

The integer p is the parameter of this family built from the complex Gaussian function, and C_p is such that ||f^{(p)}||^2 = 1, where f^{(p)} is the pth derivative of f. The baseline correction was carried out using the subtraction method given in Eq. (6):

BC = P(t, f) - R(f)     (6)

where P(t, f) is the power value at a given time-frequency point and R(f) is the average of the baseline values from −400 to −100 ms at each frequency, prior to the appearance of an auditory encoding or recognition event (Zhang, 2019). The coefficient matrices resulting from the CWT of each epoch were averaged and their absolute value was taken to obtain real-valued estimates. CWT scalograms were plotted over time windows from −500 ms to 1 s and frequencies ranging from 0.1 to 30 Hz, in order to represent the auditory synchronization and desynchronization activity over the frontal lobe before and after the ADT-based procedure. Based on the reference and the two experimental conditions (encoding and recognition of acoustic material), the ERD/ERS values were determined for each tinnitus subject in the 4–8 Hz (θ), 8–13 Hz (α), and 13–30 Hz (β) frequency bands using the following expression (Eq. 7) (Krause et al., 2008):

ERD/ERS = 100% × (power during reference − power during experiment) / (power during reference)     (7)
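To make Eqs. (5)–(7) concrete, the sketch below computes a complex-Gaussian CWT scalogram with PyWavelets, applies the subtractive baseline of Eq. (6) over the −400 to −100 ms window, and converts it to the ERD/ERS percentage of Eq. (7). It is a minimal sketch under assumptions: the wavelet order ('cgau8'), the frequency grid, and the single-epoch frontal-average input are illustrative choices, not the chapter's exact implementation.

```python
import numpy as np
import pywt

FS = 256.0            # sampling rate (Table 1)
T_PRE = 0.5           # epoch starts 500 ms before the stimulus onset

def erd_ers_map(epoch, fs=FS, fmin=0.5, fmax=30.0, n_freqs=60, wavelet="cgau8"):
    """CWT scalogram of one frontal-average epoch with baseline correction
    (Eq. 6) and ERD/ERS expressed in percent (Eq. 7)."""
    freqs = np.linspace(fmin, fmax, n_freqs)
    fc = pywt.central_frequency(wavelet)        # dimensionless centre frequency
    scales = fc * fs / freqs                    # map target frequencies to scales
    coefs, out_freqs = pywt.cwt(epoch, scales, wavelet, sampling_period=1.0 / fs)
    power = np.abs(coefs)                       # magnitude used as power estimate

    t = np.arange(epoch.size) / fs - T_PRE      # time axis relative to the onset
    base = (t >= -0.4) & (t <= -0.1)            # reference window: -400..-100 ms
    R = power[:, base].mean(axis=1, keepdims=True)

    bc = power - R                              # Eq. (6): subtractive baseline
    erd_ers = 100.0 * (R - power) / R           # Eq. (7): positive = power drop
    return out_freqs, t, bc, erd_ers
```

In the chapter's pipeline the CWT coefficients are first averaged across the epochs of a session before the absolute value is taken; the sketch processes a single epoch only to keep the example short.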


2.4 Statistical Evaluation The statistical analyses were conducted separately for each frequency band under two conditions: (1) ERD/ERS responses before and after the treatment, and (2) the two auditory processes, encoding and recognition of auditory material. The Lilliefors test was used to assess the data distribution. Since the data were found to be non-normally distributed, the statistical significance of differences between the ERD/ERS values of the two groups was evaluated with the Kruskal-Wallis test. The significance level was set at 5% for both tests.
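The snippet below sketches this decision flow for one subject and one frequency band, assuming the ERD/ERS values of the two sessions are already collected in arrays; the function name and the usage lines are hypothetical, not part of the chapter's code.

```python
import numpy as np
from scipy.stats import kruskal
from statsmodels.stats.diagnostic import lilliefors

def compare_sessions(erd_before, erd_after, alpha=0.05):
    """Normality check (Lilliefors) followed by a Kruskal-Wallis comparison of
    the ERD/ERS values recorded before and after the treatment."""
    _, p_norm_before = lilliefors(np.asarray(erd_before), dist="norm")
    _, p_norm_after = lilliefors(np.asarray(erd_after), dist="norm")

    h_stat, p_value = kruskal(erd_before, erd_after)
    return {
        "normal_before": p_norm_before > alpha,
        "normal_after": p_norm_after > alpha,
        "kruskal_H": h_stat,
        "p_value": p_value,
        "significant": p_value < alpha,
    }

# Hypothetical usage for one subject, the alpha band, and the encoding task:
# result = compare_sessions(alpha_band_before, alpha_band_after)
```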

3 Results 3.1 ERD/ERS Maps Grouped by the THI Outcome ERD/ERS maps of 11 tinnitus subjects, delimited by brain frequency bands (delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), and beta (13–30 Hz)), were obtained for the encoding and recognition of auditory material events before and after the ADT-based treatment, resulting in a total of 22 ERD/ERS maps linked with the encoding task and 18 ERD/ERS maps linked with the recognition task. The four missing maps were due to the lack of auditory material recognition responses from four subjects in the initial monitoring session during the acoustic therapy. In the first analysis, shown in Figs. 5, 6, and 7, event-related (de)synchronization maps extracted during the auditory encoding task before and after the ADT-based treatment were grouped according to the perception of tinnitus reported by the patients at the end of the therapy (THI questionnaire). Figure 5a, b show the median of the ERD/ERS responses obtained before and after the ADT-based treatment for six subjects who reported no therapeutic benefit. During the first session, high-frequency energy is observed in the beta band before the encoding stimulus; after the stimulus, the energy is concentrated mainly in the lower beta band and barely in the alpha band. During the final monitoring session, on the other hand, there were no significant changes. Figure 6a, b show the ERD/ERS responses elicited in both monitoring sessions for one subject who perceived a worsening of his tinnitus. During the first session, according to his ERD/ERS responses, high-frequency energy is observed in the beta and alpha bands before the encoding stimulus, followed by a power decrease in the same bands after the stimulus; finally, 750 ms after the stimulus onset, high-frequency energy appears in the alpha and lower beta bands. During the final monitoring session, in contrast, most of the energy is concentrated after the stimulus onset between 25 and 30 Hz. Figure 7a, b show the ERD/ERS responses of one subject who perceived a decrease in his tinnitus. During the first session, high-frequency energy is observed before and after the stimulus, while during the final monitoring session the energy is spread over the alpha and beta bands before the stimulus and, finally, 250 ms after the stimulus onset, high levels of synchronization are concentrated mainly in the alpha band and barely in the theta band.


Fig. 5  (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event. Median of 6 patients who exhibited a normal condition in the THI

Fig. 6  (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event. A patient who exhibited an abnormal condition in the THI

Regarding the three remaining subjects, their THI outcomes were not obtained, so their ERD/ERS maps could not be grouped by THI outcome.

3.2 Individual Analysis of the ERD/ERS Maps in Tinnitus Subjects ERD/ERS maps for encoding and recognition of auditory material events before and after the ADT-based treatment were analyzed for each tinnitus subject.


Fig. 7  (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event. A patient who exhibited a borderline condition in the THI

Fig. 8  Subject 1. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event

Figures 8 and 9 correspond to the first subject. In Fig. 8, during the first session, high-frequency energy is observed between 13 and 20 Hz from 400 ms after the encoding stimulus onset, whereas during the final monitoring session the high-frequency power after the encoding stimulus is more attenuated. In Fig. 9, during the first session, the power is more attenuated than in the final monitoring session, where high levels of synchronization were maintained before the recognition stimulus. Figures 10 and 11 correspond to the second subject. In Fig. 10, during the first session, low-frequency energy is observed between 8 and 14 Hz before the encoding stimulus onset, whereas during the final monitoring session the high-frequency power after the encoding stimulus is more attenuated. In Fig. 11, during the first session, the power is more attenuated than in the final monitoring session, where high levels of synchronization were maintained at higher frequencies before the recognition stimulus.


Fig. 9  Subject 1. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material recognition event

Fig. 10  Subject 2. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event

Figures 12 and 13 correspond to the third subject. In Fig. 12, during the first session, high-frequency energy is observed after the encoding stimulus onset, while during the final monitoring session there is attenuated low-frequency power before and after the encoding stimulus. Meanwhile, in Fig. 13, during the first session, high levels of synchronization are observed at low and high frequencies after the recognition stimulus, whereas in the final monitoring session the synchronization is observed at high frequencies, between 13 and 20 Hz, after the recognition stimulus. Figures 14 and 15 correspond to the fourth subject. In Fig. 14, high levels of synchronization are observed at high frequencies between 12 and 20 Hz from 200 ms after the encoding stimulus onset during the final monitoring session, while in the first monitoring session attenuated power is observed over the whole spectrum.


Fig. 11  Subject 2. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material recognition event

Fig. 12  Subject 3. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event

Furthermore, in Fig. 15, high levels of synchronization are observed at low and high frequencies between 8 and 20 Hz after the recognition stimulus onset during the final monitoring session, while in the first monitoring session high beta energy is observed from 700 ms after the recognition stimulus. Figures 16 and 17 correspond to the fifth subject. In Fig. 16, during the first monitoring session, high levels of synchronization are observed at high frequencies between 15 and 30 Hz before the auditory encoding stimulus and medium levels of synchronization between 20 and 30 Hz from 350 ms after the auditory encoding stimulus onset, while in the final monitoring session there are high levels of synchronization between 20 and 30 Hz from 100 ms after the auditory encoding event onset. In addition, in Fig. 17, high levels of synchronization are observed at high frequencies between 25 and 30 Hz before the auditory recognition stimulus during the final monitoring session, while in the first monitoring session medium beta energy is observed before and from 250 ms after the auditory recognition stimulus.


Fig. 13  Subject 3. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material recognition event

Fig. 14  Subject 4. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event

Fig. 15  Subject 4. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material recognition event


Fig. 16  Subject 5. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event

Fig. 17  Subject 5. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material recognition event

Figures 18 and 19 correspond to the sixth subject. In Fig. 18, high levels of synchronization are observed at high frequencies between 13 and 30 Hz after the auditory encoding stimulus during the final monitoring session, while in the first monitoring session there is attenuated power over the whole spectrum. Furthermore, in Figs. 19, 20, and 21, high levels of synchronization are observed at alpha and beta frequencies from 850 ms after the auditory recognition stimulus onset during the first monitoring session, while in the final monitoring session medium beta energy is observed before and after the auditory recognition stimulus.


Fig. 18  Subject 6. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event

Fig. 19  Subject 6. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material recognition event

Fig. 20  Subject 7. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material encoding event


Fig. 21  Subject 7. (ERD/ERS) responses over the frontal lobe before (a) and after (b) the ADT-based treatment during the auditory material recognition event

3.3 Quantification of ERD/ERS Responses Table 2 presents the p-values resulting from the Kruskal-Wallis test used to verify the existence of statistically significant differences, by frequency band, in the 11 patients with tinnitus before and after the ADT-based treatment under two experimental conditions: encoding and recognition of acoustic material. Values in bold font are p-values below 0.05 and indicate significant differences in the ERD/ERS responses between the sessions undertaken before and after the ADT-based treatment, for each task and brain frequency band.

3.4 Cross-Sectional Analysis (Tinnitus Versus Control Group) ERD/ERS maps over the frontal lobe for the encoding and recognition of auditory material events were averaged by study group (tinnitus and control) in order to analyze the central tendency of the levels of neural synchronization before and after the sound-based treatment through a cross-sectional analysis comparing the patient group with the control subjects. Figures 22 and 23 correspond to the mean of the tinnitus group. In Fig. 22, high levels of synchronization are observed at high frequencies between 12 and 25 Hz after the auditory encoding stimulus during the last monitoring session, while in the first monitoring session there are medium levels of synchronization between 12 and 30 Hz from 500 ms after the auditory encoding event onset. In addition, in Fig. 23, high levels of synchronization are observed at low and high frequencies between 8 and 30 Hz before and after the auditory recognition stimulus during the initial


Table 2  P-values resulting from the Kruskal-Wallis test for statistically significant differences between patients with tinnitus before and after the ADT-based treatment

Subjects      EEG rhythms                     Encoding of acoustic material
Subject 1     Theta / Alpha / Beta rhythms    P > 0.05 / P > 0.05 / P
Subject 2     Theta / Alpha / Beta rhythms
Subject 3     Theta / Alpha / Beta rhythms
Subject 4     Theta / Alpha / Beta rhythms
Subject 5     Theta / Alpha / Beta rhythms
Subject 6     Theta / Alpha / Beta rhythms
Subject 7     Theta / Alpha / Beta rhythms
Subject 8     Theta / Alpha / Beta rhythms
Subject 9     Theta / Alpha / Beta rhythms
Subject 10    Theta / Alpha / Beta rhythms
Subject 11    Theta / Alpha / Beta rhythms