Faouzi Derbel, Nabil Derbel, Olfa Kanoun (Eds.) Communication, Signal Processing and Information Technology

Advances in Systems, Signals and Devices

| Edited by Olfa Kanoun, University of Chemnitz, Germany

Volume 12

Communication, Signal Processing and Information Technology | Edited by Faouzi Derbel, Nabil Derbel, Olfa Kanoun

Editors of this Volume Prof. Dr.-Ing. Faouzi Derbel Leipzig University of Applied Sciences Chair of Smart Diagnostic and Online Monitoring Wächterstrasse 13 04107 Leipzig, Germany [email protected]

Prof. Dr.-Ing. Olfa Kanoun Technische Universität Chemnitz Chair of Measurement and Sensor Technology Reichenhainer Strasse 70 09126 Chemnitz [email protected]

Prof. Dr.-Eng. Nabil Derbel University of Sfax Sfax National Engineering School Control & Energy Management Laboratory 1173 BP, 3038 SFAX, Tunisia [email protected]

ISBN 978-3-11-059120-0
e-ISBN (PDF) 978-3-11-059400-3
e-ISBN (EPUB) 978-3-11-059206-1
ISSN 2365-7493

Library of Congress Control Number: 2019957928

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2020 Walter de Gruyter GmbH, Berlin/Boston
Typesetting: Newgen Publishing Europe
Printing and binding: CPI books GmbH, Leck
www.degruyter.com

Advances in Systems, Signals and Devices Editors in Chief: Systems, Analysis & Automatic Control Prof. Dr.-Eng. Nabil Derbel ENIS, University of Sfax, Tunisia [email protected]

Power Systems & Smart Energies Prof. Dr.-Ing. Faouzi Derbel Leipzig Univ. of Applied Sciences, Germany [email protected]

Communication, Signal Processing & Information Technology Prof. Dr.-Ing. Faouzi Derbel Leipzig Univ. of Applied Sciences, Germany [email protected]

Sensors, Circuits & Instrumentation Systems Prof. Dr.-Ing. Olfa Kanoun Technische Universität Chemnitz, Germany [email protected]

Editorial Board Members: Systems, Analysis & Automatic Control Dumitru Baleanu, Çankaya University, Ankara, Turkey Ridha Ben Abdennour, Engineering School of Gabès, Tunisia Naceur Benhadj Braïek, ESSTT, Tunis, Tunisia Mohamed Benrejeb, Engineering School of Tunis, Tunisia Riccardo Caponetto, Università degli Studi di Catania, Italy Yang Quan Chen, Utah State University, Logan, USA Mohamed Chtourou, Engineering School of Sfax, Tunisia Boutaïeb Dahhou, Univ. Paul Sabatier Toulouse, France Gérard Favier, Université de Nice, France Florin G. Filip, Romanian Academy, Bucharest, Romania Dorin Isoc, Tech. Univ. of Cluj Napoca, Romania Pierre Melchior, Université de Bordeaux, France Faïçal Mnif, Sultan Qaboos Univ., Muscat, Oman Ahmet B. Özgüler, Bilkent University, Bilkent, Turkey Manabu Sano, Hiroshima City Univ., Hiroshima, Japan Abdul-Wahid Saif, King Fahd University, Saudi Arabia José A. Tenreiro Machado, Engineering Institute of Porto, Portugal Alexander Pozniak, Instituto Politécnico Nacional, Mexico Herbert Werner, Univ. of Technology, Hamburg, Germany Ronald R. Yager, Machine Intelligence Inst., Iona College, USA Blas M. Vinagre, Univ. of Extremadura, Badajoz, Spain Lotfi Zadeh, Univ. of California, Berkeley, CA, USA

Power Systems & Smart Energies Sylvain Allano, Ecole Normale Sup. de Cachan, France Ibrahim Badran, Philadelphia Univ., Amman, Jordan Ronnie Belmans, University of Leuven, Belgium Frédéric Bouillault, University of Paris XI, France Pascal Brochet, Ecole Centrale de Lille, France Mohamed Elleuch, Tunis Engineering School, Tunisia Mohamed B. A. Kamoun, Sfax Engineering School, Tunisia Mohamed R. Mékidèche, University of Jijel, Algeria Bernard Multon, Ecole Normale Sup. Cachan, France Francesco Parasiliti, University of L’Aquila, Italy Manuel Pérez Donsión, University of Vigo, Spain Michel Poloujadoff, University of Paris VI, France Francesco Profumo, Politecnico di Torino, Italy Alfred Rufer, Ecole Polytech. Lausanne, Switzerland Junji Tamura, Kitami Institute of Technology, Japan

Communication, Signal Processing & Information Technology Til Aach, Aachen University, Germany Kasim Al-Aubidy, Philadelphia Univ., Amman, Jordan Adel Alimi, Engineering School of Sfax, Tunisia Najoua Benamara, Engineering School of Sousse, Tunisia Ridha Bouallegue, Engineering School of Sousse, Tunisia Dominique Dallet, ENSEIRB, Bordeaux, France Mohamed Deriche, King Fahd University, Saudi Arabia Khalifa Djemal, Université d’Evry, Val d’Essonne, France Daniela Dragomirescu, LAAS, CNRS, Toulouse, France Khalil Drira, LAAS, CNRS, Toulouse, France Noureddine Ellouze, Engineering School of Tunis, Tunisia Faouzi Ghorbel, ENSI, Tunis, Tunisia Holger Karl, University of Paderborn, Germany Berthold Lankl, Univ. Bundeswehr, München, Germany George Moschytz, ETH Zürich, Switzerland Radu Popescu-Zeletin, Fraunhofer Inst. Fokus, Berlin, Germany Basel Solimane, ENST Bretagne, France Philippe Vanheeghe, Ecole Centrale de Lille, France

Sensors, Circuits & Instrumentation Systems Ali Boukabache, Univ. Paul Sabatier, Toulouse, France Georg Brasseur, Graz University of Technology, Austria Serge Demidenko, Monash University, Selangor, Malaysia Gerhard Fischerauer, Universität Bayreuth, Germany Patrick Garda, Univ. Pierre & Marie Curie, Paris, France P. M. B. Silva Girão, Inst. Superior Técnico, Lisboa, Portugal Voicu Groza, University of Ottawa, Ottawa, Canada Volker Hans, University of Essen, Germany Aimé Lay-Ekuakille, Università degli Studi di Lecce, Italy Mourad Loulou, Engineering School of Sfax, Tunisia Mohamed Masmoudi, Engineering School of Sfax, Tunisia Subhas Mukhopadhyay, Massey University, Turitea, New Zealand Fernando Puente León, Technical Univ. of München, Germany Leonard Reindl, Inst. Mikrosystemtechnik, Freiburg, Germany Pavel Ripka, Tech. Univ. Praha, Czech Republic Abdulmotaleb El Saddik, SITE, Univ. Ottawa, Ontario, Canada Gordon Silverman, Manhattan College, Riverdale, NY, USA Rached Tourki, Faculty of Sciences, Monastir, Tunisia Bernhard Zagar, Johannes Kepler Univ. of Linz, Austria

Advances in Systems, Signals and Devices Volume 1 N. Derbel (Ed.) Systems, Automation, and Control, 2016 ISBN 978-3-11-044376-9, e-ISBN 978-3-11-044843-6, e-ISBN (EPUB) 978-3-11-044627-2, Set-ISBN 978-3-11-044844-3 Volume 2 O. Kanoun, F. Derbel, N. Derbel (Eds.) Sensors, Circuits and Instrumentation Systems, 2016 ISBN 978-3-11-046819-9, e-ISBN 978-3-11-047044-4, e-ISBN (EPUB) 978-3-11-046849-6, Set-ISBN 978-3-11-047045-1 Volume 3 F. Derbel, N. Derbel, O. Kanoun (Eds.) Power Systems & Smart Energies, 2016 ISBN 978-3-11-044615-9, e-ISBN 978-3-11-044841-2, e-ISBN (EPUB) 978-3-11-044628-9, Set-ISBN 978-3-11-044842-9 Volume 4 F. Derbel, O. Kanoun, N. Derbel (Eds.) Communication, Signal Processing & Information Technology, 2016 ISBN 978-3-11-044616-6, e-ISBN 978-3-11-044839-9, e-ISBN (EPUB) 978-3-11-043618-1, Set-ISBN 978-3-11-044840-5 Volume 5 N. Derbel, F. Derbel, O. Kanoun (Eds.) Systems, Automation, and Control, 2017 ISBN 978-3-11-046821-2, e-ISBN 978-3-11-047046-8, e-ISBN (EPUB) 978-3-11-046850-2, Set-ISBN 978-3-11-047047-5 Volume 6 O. Kanoun, N. Derbel, F. Derbel (Eds.) Sensors, Circuits and Instrumentation Systems, 2017 ISBN 978-3-11-044619-7, e-ISBN 978-3-11-044837-5, e-ISBN (EPUB) 978-3-11-044624-1, Set-ISBN 978-3-11-044838-2 Volume 7 F. Derbel, N. Derbel, O. Kanoun (Eds.) Power Systems & Smart Energies, 2018 ISBN 978-3-11-046820-5, e-ISBN 978-3-11-047052-9, e-ISBN (EPUB) 978-3-11-044628-9, Set-ISBN 978-3-11-047053-6

Volume 8 F. Derbel, N. Derbel, O. Kanoun (Eds.) Communication, Signal Processing & Information Technology, 2018 ISBN 978-3-11-046822-9, e-ISBN 978-3-11-047038-3, e-ISBN (EPUB) 978-3-11-046841-0, Set-ISBN 978-3-11-047039-0 Volume 9 N. Derbel, F. Derbel, O. Kanoun (Eds.) Systems, Automation, and Control, 2019 ISBN 978-3-11-059024-1, e-ISBN 978-3-11-059172-9, e-ISBN (EPUB) 978-3-11-059031-9 Volume 10 O. Kanoun, N. Derbel, F. Derbel (Eds.) Sensors, Circuits and Instrumentation Systems, 2019 ISBN 978-3-11-059025-8, e-ISBN 978-3-11-059256-6, e-ISBN (EPUB) 978-3-11-059128-6 Volume 11 F. Derbel, N. Derbel, O. Kanoun (Eds.) Power Systems & Smart Energies, 2020 ISBN 978-3-11-059117-0, e-ISBN 978-3-11-059392-1, e-ISBN (EPUB) 978-3-11-059211-5

Contents S. Bousselmi and K. Ouni The advent of Tight Frame-Based Time-frequency Representations on Speech Reconstruction Stability | 1 A. Moussa, M. Pouliquen, M. Frikel, S. Bedoui, K. Abderrahim and M. M’Saad Optimal Bounding Ellipsoid Algorithms for adaptive blind equalizations over AWGN and multipath fading Channels | 17 Omar Daoud,Qadri Hamarsheh, and Ahlam Damati DVB-T Systems Performance Enhancement | 37 A. Jedidi Trust History-based Routing Algorithm to Improve the Quality of Service in Wireless Sensor Network | 47 Omar Daoud,Qadri Hamarsheh, and Ahlam Damati Enhancing the Performance of OFDM Systems-Based PAPR Reduction | 57 I. Mahmoud, I. Chihi, A. Abdelkrim, M. Benrejeb Bi-axis control algorithm for the generation of manuscript shapes from mathematical handwriting model | 69 O. Triki, I. Hbiri and H. Trabelsi Design and Implementation of an Inspection Class of Underwater Vehicles | 89 L. Jacobs, M. Guenach and M. Moeneclaey MIMO Pre-Equalization With Decision Feedback for High-Speed Chip-to-Chip Communication | 103 A. Maali, H. Semlali, N. Boumaaz and A. Soulmani A Comparative Analysis between Energy and Maximum Eigenvalue based detection in Cognitive Radio Systems | 117 N. Smaoui and H. Amari Developing a New Method for the Detection of the Cancerous Breast Mass | 127 R. Abdelmalek and Z. Mnasri Prosody-based speech synthesis by unit selection for Arabic | 139

S. Bousselmi and K. Ouni

The advent of Tight Frame-Based Time-frequency Representations on Speech Reconstruction Stability
Abstract: The main advantage of tight frame-based time-frequency representations is to guarantee a robust and stable signal reconstruction. In this paper we propose to study the speech reconstruction stability, in the distortion sense, of three tight frame-based representations: the tight framelet transform, the higher density tight framelet transform and the tight framelet packet transform. The reconstruction stability is assessed using objective criteria. The results are compared with the critically sampled wavelet packet transform. We observe that the representations satisfying the frame theory conditions are more stable and more resistant to distortions.
Keywords: Speech reconstruction, wavelet frame theory, tight framelet packet transform, wavelet packet transform, tight framelet transform, higher density tight framelet transform.
Classification: 65C05, 62M20, 93E11, 62F15, 86A22

1 Introduction
The critically sampled wavelet packet transform is one of the most widely used time-frequency representations for analyzing non-stationary finite-energy signals such as speech and audio [1]. Despite the importance of this transformation, several researchers in applied mathematics and signal processing promote the use of wavelet-type frames instead of wavelet-type bases [2, 3]. This is due to the performance gain observed in many applications [4–6]. Wavelet frames can be obtained in a multitude of ways [7, 8]. The undecimated discrete wavelet transform generates a wavelet frame by removing the down-sampling operators of the critically sampled filter bank [9, 10]. The dual tree complex wavelet transform is another example of a wavelet frame, implemented with two filter banks operating in parallel [11]. The most effective method consists in using an iterated oversampled filter bank [12, 13]. In fact, with such a method we have more degrees of freedom for filter design, which offers more flexibility in attaining attractive properties: redundancy, symmetry, linear phase,

S. Bousselmi and K. Ouni: Research Laboratory Smart Electricity ICT, SEICT, National Engineering School of Carthage, University of Carthage, Tunisia, emails: [email protected], [email protected]

https://doi.org/10.1515/9783110594003-001

smoothness, better time-frequency localization, robust and stable reconstruction [8, 14]. The redundancy creates a dense time-frequency plane, which results in an approximate shift invariance. Symmetry improves the processing at the edges of blocks. The phase linearity suppresses frequency distortions. The smoothness entails a more compact time-frequency representation. In this paper, the type of wavelet frame considered is based on an oversampled tight frame filter bank. Tight frame representations generate more data at the output than at the input, where the redundancy rate depends on the number of filters and of filtering stages. The tight framelet transform is obtained by successive iteration on the low-pass filter output of a three-channel oversampled filter bank [8, 15]. It produces at the output twice the number of input coefficients. The higher density tight framelet transform is also based on a three-channel oversampled filter bank; however, unlike the tight framelet transform, its third channel is undecimated [16, 17]. These transforms permit significantly improved performance in many speech processing applications. However, they do not allow a sub-band decomposition matched to the behaviour of the human ear. For that reason a generalization which equally decomposes the high-pass bands is more adequate; it is called the tight framelet packet transform [18–22]. The purpose of this paper is to compare the speech reconstruction stability induced by three tight frame representations, namely the tight framelet transform, the higher density tight framelet transform and the tight framelet packet transform. A comparison with the critically sampled wavelet packet transform is also conducted. The reconstruction stability is studied in terms of distortion using different objective measures. In section 2 the tight wavelet frame theory is briefly introduced and the tight frame-based representations associated with a three-channel oversampled filter bank are presented. Their application to speech reconstruction is depicted in section 3. In section 4 the simulation results are discussed, and section 5 concludes the paper.

2 Proposed tight frame representation
2.1 Overview of wavelet frame
The sequence of functions {ψ^i_{mn}}, m, n ∈ Z, i = 1, …, N, where ψ^i_{mn}(t) = 2^{m/2} ψ^i(2^m t − n), is called a wavelet frame for any signal f ∈ L²(R) when, for 0 < A ≤ B < ∞, the following relation is established [14, 24, 25]:

$$ A\,\|f\|^2 \;\le\; \sum_{i=1}^{N}\sum_{m,n}\bigl|\langle f,\psi^i_{mn}\rangle\bigr|^2 \;\le\; B\,\|f\|^2 \qquad (1) $$

A and B are called frame bounds. We have a tight frame when A = B [26]. To generate a wavelet frame we propose to use the tight frame filter bank shown in Fig. 1. In this filter


Fig. 1: A three-band tight frame filter bank decimated by 2.

bank each band is decimated by 2, and since the frame is tight the synthesis filters are the time-reversed versions of the analysis filters [14].
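As a rough numerical companion to the frame condition (1), the sketch below measures how close a three-channel analysis bank decimated by 2 comes to being tight, by computing the coefficient-to-signal energy ratio over random inputs. The filters used here are Haar-like placeholders (not the tight frame filters of [8, 14]), and boundary effects make the estimate approximate; a genuinely tight bank would give a nearly constant ratio, i.e. A ≈ B.

```python
import numpy as np

# Placeholder 3-channel analysis filters; a real tight-frame design (e.g. from [8])
# would be substituted here. They are only meant to make the check runnable.
h = [
    np.array([0.5, 0.5]),          # low-pass (Haar-like)
    np.array([0.5, -0.5]),         # high-pass (Haar-like)
    np.array([0.70710678, 0.0]),   # third branch, arbitrary placeholder
]

def analysis(x, filters, decim=2):
    """One analysis stage: filter each channel, then keep every `decim`-th sample."""
    return [np.convolve(x, f, mode="full")[::decim] for f in filters]

def empirical_frame_bounds(filters, n=256, trials=500):
    """Estimate A and B in  A*||f||^2 <= sum |<f, psi>|^2 <= B*||f||^2  (cf. eq. (1))."""
    rng = np.random.default_rng(0)
    ratios = []
    for _ in range(trials):
        x = rng.standard_normal(n)
        coeffs = analysis(x, filters)
        ratios.append(sum(np.sum(c ** 2) for c in coeffs) / np.sum(x ** 2))
    return min(ratios), max(ratios)

A, B = empirical_frame_bounds(h)
print(f"empirical bounds: A ≈ {A:.3f}, B ≈ {B:.3f}, tight: {np.isclose(A, B, rtol=1e-2)}")
```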

2.2 Tight framelet transform
The tight framelet transform (TFT) is a redundant time-frequency representation generated by iterating the three-channel tight frame filter bank presented in Fig. 1. Similarly to the conventional wavelet transform, the tight framelet transform is associated with sub-band filtering; the only difference is that two high-pass bands are considered [8, 14]. This allows an effective separation of the high-pass frequency components from the low-pass frequency components and provides more flexibility in the design of tight frame filters. In Fig. 2 we present a tight framelet decomposition at level 3.
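Building on the previous snippet, a multi-level TFT analysis can be sketched by iterating the same three-channel bank on the low-pass output only, as just described; `analysis` and the placeholder filters `h` are reused, so this illustrates the iteration structure rather than the actual filters of [8].

```python
def tft_analysis(x, filters, levels=3):
    """Tight framelet analysis: iterate the 3-channel bank on channel 0 only."""
    subbands = []
    low = np.asarray(x, dtype=float)
    for _ in range(levels):
        low, band1, band2 = analysis(low, filters)   # channel 0 keeps being split
        subbands.extend([band1, band2])              # two high-pass outputs per level
    subbands.append(low)                             # final low-pass approximation
    return subbands

parts = tft_analysis(np.random.default_rng(1).standard_normal(512), h, levels=3)
print([len(p) for p in parts])   # roughly twice as many coefficients as input samples
```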

2.3 Higher density tight framelet transform The higher density tight framelet transform (HDTFT) is constructed using a three band tight frame filter bank illustrated in Fig. 3, where H1 , H2 and H3 are lowpass, band-pass, and high-pass filters, respectively [16]. The first two bands are decimated

Fig. 2: Tight framelet decomposition at level 3.


Fig. 3: Tight frame filter banks for higher-density framelet transform

by two while the third channel remains undecimated. This transformation is more redundant than the tight framelet transform and expands a signal of N samples into approximately 3N coefficients.

2.4 Tight framelet packet transform
The tight framelet packet transform (TFPT) is a time-frequency decomposition built from a repeated treatment of the low-pass band and of the two high-pass bands of the tight frame filter bank presented in Fig. 4 [19, 23]. Figure 5 represents the tight framelet packet decomposition at level 3, where the indices 0, 1 and 2 associated with each leaf/node of the tree correspond to the analysis filters {h0, h1, h2} decimated by 2. The auxiliary nodes, related to filter h1, are terminal and are not part of the analysis step. The nodes related to the filters h0 and h2 are called dyadic nodes. The principal condition for a framelet packet tree to be admissible is that each dyadic node has 0 or 3 children, and each auxiliary node has 0 children. For each decomposition level, each dyadic node represents an analysis step that divides each subsignal into two separate frequency bands. The frequency bandwidth is equal to half of the bandwidth of the preceding level.
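A minimal check of the admissibility rule just stated (dyadic nodes have 0 or 3 children, auxiliary nodes are leaves) is sketched below. The nested-tuple tree representation is an assumption made for the example only.

```python
def is_admissible(node):
    """node = (kind, children) with kind in {"dyadic", "auxiliary"}."""
    kind, children = node
    if kind == "auxiliary":                      # auxiliary nodes (filter h1) are terminal
        return len(children) == 0
    if kind == "dyadic":                         # dyadic nodes (filters h0, h2): 0 or 3 children
        return len(children) in (0, 3) and all(is_admissible(c) for c in children)
    return False

leaf_d, leaf_a = ("dyadic", []), ("auxiliary", [])
one_level = ("dyadic", [leaf_d, leaf_a, leaf_d])       # root split into h0, h1, h2
print(is_admissible(one_level))                        # True
print(is_admissible(("dyadic", [leaf_d, leaf_a])))     # False: dyadic node with 2 children
```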

Fig. 4: Higher density tight framelet decomposition at level 3.


Fig. 5: Tight framelets packet decomposition at level 3

3 Application in speech reconstruction
We propose to analyse the effect of tight frame-based time-frequency representations on speech reconstruction stability, in the distortion sense. Three types of representation are considered: the tight framelet transform (TFT), the tight framelet packet transform (TFPT) and the higher density tight framelet transform (HDTFT). A comparative study with the conventional wavelet packet transform is carried out. To compute the wavelet coefficients, the Daubechies wavelet of order 4 is adopted. We use the filters designed by Selesnick [8] to compute the framelet and framelet packet coefficients. The coefficients of the higher density tight framelet transform are calculated using a set of filters with 1 vanishing moment designed by Selesnick [16]. For all transformations the speech signal is decomposed at level 5. The estimation of the wavelet packet and framelet packet coefficients depends on the selection of the decomposition tree. Subjective assessment is important in speech processing, hence we must take the psychoacoustic properties into account in the choice of the decomposition. In fact, the human auditory system is based on critical band analysis, so we choose a decomposition as close as possible to the critical bands. Figures 6 and 7 show respectively the decomposition into 17 sub-bands for the TFPT and the WPT covering the frequency range [0-4000] Hz. We present in Fig. 8 an example of a speech segment reconstructed using all envisaged transformations with 60% of the most energetic coefficients. The same is presented in Fig. 9 but considering 80% of the coefficients. We note that all tight frame-based representations have a better reconstruction quality than the wavelet packet transform. Figure 10 presents an example of a speech sentence from the TIMIT database [27] reconstructed with 70% of the most energetic coefficients using respectively the WPT (a), the TFT (b), the HDTFT (c) and the TFPT (d). We discern that the shape of the reconstructed speech sentence is identical to the original for all tight frame transformations. However, the sentence reconstructed using the WPT presents some distortions. Indeed, its reconstruction error curve shows a higher level of error than the other transformations.
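The reconstruction experiments described above keep only a given percentage of the most energetic coefficients before inverting the transform. A minimal sketch of that selection step is shown below; the transforms themselves (WPT, TFT, HDTFT, TFPT) are not reproduced, so `coeffs` simply stands for the concatenated coefficient vector of any of them.

```python
import numpy as np

def keep_most_energetic(coeffs, percent):
    """Zero all but the `percent`% largest-magnitude coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    k = int(np.ceil(percent / 100.0 * coeffs.size))
    kept = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[-k:]        # indices of the k most energetic coefficients
    kept[idx] = coeffs[idx]
    return kept

c = np.random.default_rng(1).standard_normal(1000)
print(np.count_nonzero(keep_most_energetic(c, 60)) / c.size)   # -> 0.6
```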


Fig. 6: Level 5 decomposition tree corresponding to the TFPT with 17 critical bands covering [0-4000]Hz

Fig. 7: Level 5 decomposition tree corresponding to the WPT with 17 critical bands covering [0-4000]Hz


Tab. 1: Frequency intervals of the 17 sub-bands

Sub-band   Frequency interval (Hz)   Sub-band width (Hz)
1          0-125                     125
2          125-250                   125
3          250-375                   125
4          375-500                   125
5          500-625                   125
6          625-750                   125
7          750-875                   125
8          875-1000                  125
9          1000-1250                 250
10         1250-1500                 250
11         1500-1750                 250
12         1750-2000                 250
13         2000-2250                 250
14         2250-2500                 250
15         2500-3000                 500
16         3000-3500                 500
17         3500-4000                 500

Fig. 8: Original and reconstructed speech segments for WPT, TFPT, TFT and HDTFT using 60% of the most energetic coefficients



Fig. 9: Original and reconstructed speech segments for WPT, TFPT, TFT and HDTFT using 80% of the most energetic coefficients

Tab. 2: Values of NRMSE for different filters related to the WPT and the HDTFT

              HDTFT                        WPT
        1vm    3vm    4vm     db4    db16   coif4  coif16  sym4   sym16     TFPT    TFT
50%     0.42   0.38   0.37    0.89   0.89   0.89   0.88    0.88   0.89      0.47    0.44
60%     0.36   0.32   0.32    0.84   0.84   0.84   0.84    0.84   0.844     0.44    0.38
70%     0.29   0.27   0.27    0.79   0.80   0.79   0.80    0.79   0.80      0.43    0.31
80%     0.19   0.19   0.19    0.67   0.67   0.67   0.67    0.67   0.67      0.37    0.24
90%     0.11   0.12   0.12    0.51   0.51   0.51   0.51    0.52   0.51      0.29    0.19

Moreover, the spectrogram of this sentence reconstructed by the WPT, presented in Fig. 11, shows a degradation especially at low frequency. The spectrograms of the sentence reconstructed using respectively the TFT, the HDTFT and the TFPT reveal a good reconstruction quality and are very similar to the spectrogram of the original sentence. The shape of the reconstructed signal alone does not give a good indication of quality. For a more accurate assessment of the quality, a simulation based on objective criteria is more appropriate.
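Spectrograms such as those of Fig. 11 can be obtained with a standard short-time Fourier analysis; the window length and overlap below are assumed values, since the chapter does not specify its analysis parameters (speech is processed at 8 kHz, see the simulation section).

```python
import numpy as np
from scipy.signal import spectrogram

def speech_spectrogram_db(x, fs=8000, nperseg=256, noverlap=192):
    """Magnitude spectrogram in dB; nperseg/noverlap are assumptions, not the authors' values."""
    f, t, sxx = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return f, t, 10 * np.log10(sxx + 1e-12)
```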


Fig. 10: Speech sentence reconstructed respectively with 70% of the most energetic coefficients using the WPT (a), the TFT (b), the HDTFT(c) and the TFPT (d)

Tab. 3: Values of PSNR for different filters related to the WPT and the HDTFT

              HDTFT                        WPT
        1vm    3vm    4vm     db4    db16   coif4  coif16  sym4   sym16     TFPT    TFT
50%     68.9   70.8   70.8    61.3   61.4   61.3   61.4    61.3   61.4      67.1    68.7
60%     70.3   71.9   72.0    61.8   61.8   61.8   61.8    61.8   61.8      67.8    69.7
70%     72.1   73.4   73.4    62.5   62.5   62.5   62.5    62.5   62.5      68.3    71.1
80%     74.9   75.8   75.7    64.3   64.3   64.2   64.3    64.2   64.3      69.6    73.2
90%     80.1   80.4   80.2    67.2   67.2   67.2   67.3    67.1   67.3      72.1    75.3



Fig. 11: Spectrogram of the original and reconstructed signal for different representations: (a) original signal. (b) Spectrogram of the reconstructed signal using the TFT. (c) Spectrogram of the reconstructed signal using the HDTFT. (d) Spectrogram of the reconstructed signal using the TFPT. (e) Spectrogram of the reconstructed signal using the WPT


4 Simulation results
The purpose of this paper is to investigate the advantage of the time-frequency representations derived from tight frame theory in the improvement of speech reconstruction stability. Three tight frame-based transformations are considered, which are the tight framelet transform (TFT), the tight framelet packet transform (TFPT) and the higher density tight framelet transform (HDTFT). A comparison with the critically sampled wavelet packet transform (WPT) is conducted. The stability is studied in terms of distortion using different objective criteria: the NRMSE (normalized root mean square error), the PSNR (peak signal to noise ratio), the SNRseg (segmental signal to noise ratio), the fwSNRseg (frequency weighted segmental signal to noise ratio [28]), the PESQ score (perceptual evaluation of speech quality [29]) and the MOS-LQO score (MOS listening quality objective) [30]. We carried out this study considering the TIMIT database [27]. The speech sentences obtained from this database are resampled at a frequency of 8 kHz and divided into blocks of 256 samples. As in the previous section, the filters conceived by Selesnick [8] are used to compute the TFT and the TFPT. A set of filters with 1 vanishing moment is used for the HDTFT. The Daubechies wavelet of order 4 is used for the WPT. First, we studied the influence of the decomposition level on reconstruction stability, where levels between 2 and 5 are considered. Figure 12 shows the values of NRMSE, PSNR and SNRseg at different levels for the proposed transformations, where 90% of the most energetic coefficients are retained. We note that the performances of the tight frame representations are higher than those of the WPT. Besides, the HDTFT is more robust to distortion, which leads to a good reconstruction stability. Secondly, we examine the influence of the retained coefficients on the speech reconstruction quality. Different percentages of the most energetic coefficients are used in the speech signal synthesis. Figure 13 shows the values of the NRMSE, the PSNR, the SNRseg, the fwSNRseg, the PESQ and the MOS-LQO using several percentages of transform coefficients between 50% and 90% for all proposed representations. The obtained results reveal better performance for all percentages of coefficients using the tight frame representations. In fact, for these representations, all values of the objective criteria are well above those of the WPT. Moreover, for the small percentages of coefficients, these transformations already generate a fairly good reconstructed quality with a score of 3.5, while a very good reconstructed quality, with a score above 4, is obtained at 90%. The scores given by the WPT are less than 3, which corresponds to an unsatisfactory quality. Furthermore, for 90% of the coefficients the reconstruction quality using the TFPT exceeds that of the other transformations, where the best MOS-LQO score is obtained. Several filter choices can be considered for the WPT and the HDTFT. We present in Tabs. 2–7 the same objective criteria but for different filters. For the WPT the Daubechies wavelets of order 4 and 16, the Coiflet wavelets of order 4 and 16 and the Symlet wavelets of order


Fig. 12: NRMSE, PSNR and SNRseg for different decomposition level using the TFT, the HDTFT, the TFPT and WPT.

4 and 16 are used. For the HDTFT, tight frames with respectively 1, 3 and 4 vanishing moments are used. According to these tables, we notice that the results obtained by the WPT are almost identical for the different wavelet filters, and that a poor reconstruction quality is produced relative to the tight frame transformations. Besides, the reconstruction quality using the HDTFT increases with the number of vanishing moments.
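For reference, common definitions of three of the objective criteria used in this comparison are sketched below (NRMSE, PSNR and segmental SNR). The chapter does not spell out its exact formulas, so these follow the usual textbook definitions (frame length 256, per-frame clipping for SNRseg) and should be read as an approximation of the authors' setup.

```python
import numpy as np

def nrmse(ref, rec):
    return np.sqrt(np.mean((ref - rec) ** 2)) / np.sqrt(np.mean(ref ** 2))

def psnr(ref, rec):
    peak = np.max(np.abs(ref))                   # peak of the clean signal (assumed convention)
    return 10 * np.log10(peak ** 2 / np.mean((ref - rec) ** 2))

def snr_seg(ref, rec, frame=256, eps=1e-12):
    n = (len(ref) // frame) * frame
    r = ref[:n].reshape(-1, frame)
    e = (ref[:n] - rec[:n]).reshape(-1, frame)
    seg = 10 * np.log10(np.sum(r ** 2, axis=1) / (np.sum(e ** 2, axis=1) + eps) + eps)
    return float(np.mean(np.clip(seg, -10, 35)))  # clipping range commonly used for SNRseg
```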

5 Conclusion
In this paper, we studied speech reconstruction stability based on wavelet frame theory. Three types of tight wavelet frame representations are proposed: the tight framelet transform, the tight framelet packet transform and the higher density tight


Fig. 13: Values of NRMSE, PSNR, SNRseg, fwSNRseg, PESQ and MOS-LQO for different percentages of coefficients using the TFT, the HDTFT, the TFPT and the WPT.


Tab. 4: Values of SNRseg for different filters related to the WPT and the HDTFT

              HDTFT                        WPT
        1vm    3vm    4vm     db4    db16   coif4  coif16  sym4   sym16     TFPT    TFT
50%     9.6    11.7   11.8    1.39   1.41   1.39   1.41    1.40   1.41      7.1     9.4
60%     11.1   12.9   13.0    1.87   1.84   1.86   1.85    1.88   1.86      7.7     10.4
70%     12.7   14.4   14.4    2.47   2.40   2.45   2.39    2.46   2.40      8.0     11.6
80%     15.5   16.6   16.6    4.14   4.15   4.13   4.13    4.11   4.11      9.4     13.6
90%     20.8   21.5   21.3    6.90   7.11   7.06   7.06    6.88   7.04      12.0    15.8

Tab. 5: Values of fwSNRseg for different filters related to the WPT and the HDTFT

              HDTFT                        WPT
        1vm    3vm    4vm     db4    db16   coif4  coif16  sym4   sym16     TFPT    TFT
50%     10.3   11.5   11.9    1.01   0.14   0.87   0.05    0.61   0.15      15.1    8.1
60%     14.4   15.6   15.7    3.11   1.93   3.05   2.22    3.01   2.11      15.9    14.1
70%     18.6   19.5   19.4    3.47   2.35   3.44   2.40    3.44   2.33      16.3    18.8
80%     22.2   23.0   22.7    8.36   7.61   8.42   7.44    8.37   7.57      18.1    22.3
90%     27.5   28.1   27.9    14.8   14.1   14.9   14.1    14.9   13.8      21.7    28.4

Tab. 6: Values of PESQ for different filters related to the WPT and the HDTFT

              HDTFT                        WPT
        1vm    3vm    4vm     db4    db16   coif4  coif16  sym4   sym16     TFPT    TFT
50%     3.00   3.31   3.25    2.25   2.46   2.29   2.39    2.33   2.46      3.21    3.58
60%     3.26   3.53   3.42    2.50   2.77   2.58   2.67    2.56   2.72      3.36    3.45
70%     3.48   3.72   3.59    2.75   2.92   2.78   2.86    2.80   2.92      3.41    3.58
80%     3.65   3.85   3.71    2.65   2.90   2.68   2.91    2.65   2.94      3.70    3.68
90%     3.96   4.07   4.03    3.03   3.16   3.05   3.19    3.05   3.15      4.08    3.89

Tab. 7: Values of MOS-LQO for different filters related to the WPT and the HDTFT

              HDTFT                        WPT
        1vm    3vm    4vm     db4    db16   coif4  coif16  sym4   sym16     TFPT    TFT
50%     2.82   3.29   3.20    1.85   2.09   1.90   2.01    1.94   2.09      3.14    3.66
60%     3.22   3.60   3.45    2.14   2.49   2.23   2.35    2.21   2.42      3.36    3.49
70%     3.53   3.84   3.68    2.47   2.71   2.51   2.62    2.53   2.71      3.44    3.66
80%     3.76   4.00   3.84    2.32   2.68   2.37   2.70    2.33   2.73      3.82    3.79
90%     4.11   4.22   4.18    2.86   3.07   2.90   3.11    2.90   3.05      4.23    4.04

framelet transform. First, we investigate the effect of the retained coefficients on the reconstruction quality, where different percentages of the most energetic coefficients are used. Secondly, we examine the effect of the decomposition level on the reconstruction


stability. To evaluate the distortion in speech quality we used objective criteria. A comparison with the classical wavelet packet transform was conducted. We showed that the time-frequency decompositions satisfying the frame conditions ensure a better reconstruction stability than the critically sampled wavelet packet transform. Furthermore, these new representations can be very advantageous in speech enhancement and coding.

Bibliography [1] S. Mallat. A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, SanDiego, 3rd edition, 2008. [2] I. Daubechies. Ten Lectures on Wavelets. vol. 61 of CBMS-NSF Regional Conf. Series in Applied Mathematics, SIAM, Philadelphia, 1992. [3] I. Daubechies. The Wavelet Transform Time-Frequency Localization and Signal Analysis. IEEE Transaction on Information Theory, 36:961–1005, September 1990. [4] J. Kovacevic and A. Chebira. Life Beyond Bases: The Advent of Frames (Part I). IEEE Signal Processing Magazine, 24(4):86–104, July 2007. [5] J. Kovacevic and A. Chebira. Life Beyond Bases: The Advent of Frames (Part II). IEEE Signal Processing Magazine, 24(5):115–125, September, 2007. [6] Bin Han. Properties of Discrete Framelet Transforms. Mathematical Modelling of Natural Phenomena, 8(1):18–47, January 2013. [7] I. W. Selesnick. Smooth wavelet tight frames with zero moments. Applied and Computational Harmonic Analysis (ACHA), 10:163–181, 2001. [8] I. W. Selesnick and A. F. Abdelnour. Symmetric wavelet tight frames with two generators. Applied and Computational Harmonic Analysis (ACHA), 17:211–225, 2004. [9] M.J. Shensa. The Discrete Wavelet Transform: Wedding the A Trous and Mallat Algorithms. IEEE Trans. on Signal Processing, 40(10), October, 1992. [10] G. P. Nason and B. W. Silverman. Stationary wavelet transform and some statistical applications. Wavelets and Statistics, Springer-Verlag Lecture Notes, Springer-Verlag, Berlin, :281–299, 1995. [11] N. G. Kingsbury. The dual-tree complex wavelet transform: A new technique for shift invariance and directional filters. Proc. of the Eighth IEEE DSP Workshop, Utah, August 9–12, 1998. [12] H. Bölcskei, F. Hlawatsch and H. G. Feichtinger. Frame-theoretic analysis and design of oversampled filter banks. IEEE Trans. on Signal Processing, 2:409–412, 1996. [13] Z. Cvetkovic, M. Vetterli. Oversampled filter banks. IEEE Trans. on Signal Processing, 46:1245–1255, May 1998. [14] A. F. Abdelnour and I. W. Selesnick. Symmetric Nearly Shift-Invariant Tight Frame Wavelets. IEEE Trans. on Signal Processing, 53:231–239, 2005. [15] I. W. Selesnick. The double density DWT. A. Petrosian and F. G.Meyer editors, Wavelets in Signal and Image Analysis: From Theory to Practice. Kluwer, 2001. [16] I. W. Selesnick. A Higher Density Discrete Wavelet Transform. IEEE Trans. on Signal Processing, 54(8):3039–3048, Aug. 2006. [17] B. Dumitrescu. Optimization of the Higher Density Discrete Wavelet Transform and of Its Dual Tree. IEEE Trans. on Signal Processing, 58(2):583–590, February, 2010. [18] D. LU, and FAN. Q. A class of tight framelet packets. zechoslovak Mathematical Journal, 61:623–639, 2011.


[19] P. SUQI. Tight wavelet frame packet. PhD Thesis, Departement of Mathematics, National University of Singapore 2009. [20] F. A. Shah. Construction of shift invariant M-band tight framelet packets. Journal of Applied and Engineering Mathematics, 6(1):102, 2016. Academic One File, Accessed 28 December 2017. [21] F. A. Shah. On Stationary and Non-stationary M-band Framelet Packets. Journal of Mathematical Extension, 9(3):39–56, 2015. [22] L. Debnath, F. A. Shah. Explicit construction of M-band tight framelet packets. Analysis Int. Mathematical Journal of Analysis and its Applications, 32(4):281–294.(2012). [23] S. Parker. Cfs: Time-frequency representations of acoustic signals based on redundant wavelet methodologies. PhD Thesis, University of Winsconsin Madison 2005. [24] A. Petukhov. Symmetric framelets. Constructive Approximation, 19:309–328, January 2003. [25] A. Petukhov. Explicit construction of framelets. Applied and Computational Harmonic Analysis, 11:313–327, September 2001. [26] I. Daubechies, B. Han, A. Ron, and Z. Shen. Framelets: MRA-based constructions of wavelet frames. Applied and Computation Harmonic Analysis, 14:1–46, 2003. [27] W. Fisher, G. Dodington, and K. Goudie-Marshall. The TIMIT-DARPA speech recognition research database: Specification and status. DARPA Workshop on Speech Recognition, 1986. [28] P. Loizou. Speech enhancement: Theory and practice. CRC edition 2, 2013. [29] UIT-T P862. Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. 2000. [30] ITU-T P.862.1. Mapping function for transforming P.862 raw result scores to MOS-LQO. Geneva, 2003.

Biographies Souhir Bousselmi received his PhD degree in Electrical Engineering from National Engineering School of Tunis in 2016. He is also a member of the Research Unit Signals and Mechatronic Systems. His research interests include speech and audio coding, time-frequency representations, sparsity and frame theory.

Kaïs Ouni is full professor in electrical engineering at the National Engineering School of Carthage, ENICarthage, and the Director of the Signals and Mechatronics Systems Research Unit, SMS, at ENICarthage. He has supervised more than thirty PhD and postgraduate Master degree students, has worked on projects in areas related to electrical engineering and digital signal processing, and has published more than one hundred research journal and conference papers.

A. Moussa, M. Pouliquen, M. Frikel, S. Bedoui, K. Abderrahim and M. M’Saad

Optimal Bounding Ellipsoid Algorithms for adaptive blind equalizations over AWGN and multipath fading Channels
Abstract: Two adaptive blind equalization algorithms for M-QAM modulation of a Single Input Single Output (SISO) system are presented. The second blind equalization algorithm is an extension of the first algorithm. These methods are tested in a wireless environment where the noise is bounded and the channel is multipath fading. The two proposed equalizers are based on Optimal Bounding Ellipsoid (OBE) algorithms, which belong to the set membership identification methods. The objective of these approaches is to restore the transmitted data by estimating the coefficients of the equalizer. Thanks to their simplicity of implementation, these algorithms are easy to apply in real time. Through the simulation results, constellation and SER comparisons confirm the good performance of the second algorithm with respect to the first algorithm.
Keywords: Blind equalization, AWGN, Multipath Fading channel, Symbol Error Rate (SER).

1 Introduction
1.1 The works of literature for blind equalization
The transmission of data encounters many problems and constraints in wireless communication, arising generally from the following factors:
– The transmission channel: it introduces the propagation phenomenon called multipath.
– The external noise from the environment.
– The different types of interference, such as intersymbol interference (ISI) and frequency selective fading.

A. Moussa, M. Pouliquen, M. Frikel, S. Bedoui, K. Abderrahim and M. M’Saad: A. Moussa, University of Gabes and University of Caen Normandy, Tunisia and France, email: ali.moussa@ensicaen, M. Pouliquen, University of Caen Normandy, France, email: [email protected], M. Frikel, University of Caen Normandy, France, email: [email protected], S. Bedoui, University of Gabes, Tunisia, email: [email protected], K. Abderrahim, University of Gabes, Tunisia, email: [email protected], M. M’Saad, University of Caen Normandy, France, email: [email protected]

https://doi.org/10.1515/9783110594003-002

To overcome the problem of data recovery in the presence of these factors, a signal processing technique called an equalizer is placed before the receiver. The study of equalization has received great attention. In fact, many equalization methods have been proposed. Initially, conventional and traditional equalization methods were proposed, which required training sequences and periodic properties of the channels to realize the design of the equalizer. However, these types of algorithms are not appropriate in some systems and modern applications, since the periodic transmission of the training sequences uses the bandwidth resources. Due to the limitations of classic equalization, researchers have switched to another type of equalization called blind equalization, which does not require a known training sequence. Blind equalization has attracted the attention of many researchers over the last few decades. The first blind equalization algorithm was proposed in [25] by Sato in 1975. Two algorithms are derived from the Sato algorithm: the Godard algorithm [13] and the Constant Modulus Algorithm (CMA) [28]. CMA is the most widespread blind equalization method thanks to its simplicity of implementation and its low computational cost. However, to guarantee convergence it requires a long data sequence, which leads to a high residual error. Many extensions of CMA have therefore been developed to overcome these limitations. In fact, the first and most important algorithm is the Multi-Modulus Algorithm (MMA) [21], which performs blind equalization and carrier phase recovery at the same time. Improvements were made on MMA to finally give two algorithms [18]: min-l1-MMA and MGauss-MMA, which are better than MMA. In addition, a stochastic blind equalization approach that uses the Quadratic Distance (SQD) between the pdf at the equalizer output and the known constellation pdf as a cost function is proposed in [19]. It is an adaptive algorithm based on information-theoretic criteria and pdf estimation of the transmitted data, and it converges faster than CMA with a low computational cost. Another Multi-Modulus algorithm based on the SQD method (MSQD-lp) is presented in [11]. In recent years, most proposed blind equalization algorithms are based on CMA, SQD or extensions of these two algorithms, or are presented in the form of a hybrid algorithm that combines two criteria inspired by the previous methods, as indicated in the following contributions and references ([2–4, 9, 10, 12, 20, 24, 29, 30]). The classification of blind equalization methods is done by considering mainly:
– The nature of the a priori knowledge about the sequence of the transmitted data: Second Order Statistics (SOS) ([8, 27, 31]) and Higher Order Statistics (HOS) ([4, 19, 28, 30]).
– The principle of the algorithm: it can be broadly classified into two categories. The first category is adaptive methods that employ a stochastic gradient algorithm. The second category uses statistical methods, which exploit stationary statistics on a large block of received data.
– The channel and equalizer structures and the modulation used.


However, the channels of wireless communication are subject to the multipath fading propagation phenomenon. This phenomenon is an interesting challenge for researchers in the field of blind equalization. The multipath fading effect on the transmitted signal amplitude is mathematically introduced by certain distributions such as the Rayleigh, Rician, Nakagami and Weibull distributions. In this framework, many algorithms for multipath channels have been proposed in the literature ([1, 7, 14, 32]). The noise is usually modeled as a Gaussian process in the equalization algorithms that we have indicated before, due to the central limit theorem. In some real applications, however, the noise is non-Gaussian, such as low-frequency atmospheric noise and underwater acoustic noise. Furthermore, a Gaussian noise model can cause errors due to a poor choice of the parameters of the Gaussian distribution. Hence the interest of a bounded-noise assumption, which is less restrictive. It should be noted that blind equalization in this environment has received little attention ([15] and [16]).

1.2 Paper contributions
In the present work, two blind equalization algorithms (the SBME and e-SBME algorithms) are tested in two environments. Firstly, the channel is assumed to be Additive White Gaussian Noise (AWGN). Secondly, the channel is assumed to be multipath fading; in particular, we choose the Rayleigh and Rician models. The two algorithms are based on an Optimal Bounding Ellipsoid (OBE) algorithm, which is a recursive algorithm ([5, 6, 17, 23, 26]). Note that the two proposed approaches are easily implemented in real time thanks to their simplicity. Note also that few synthesis parameters have to be selected by the user.

1.3 Paper organisation The rest of this paper is organised as follows: section 2 is devoted to the System description: the data model is stated in 2.1, assumptions are presented in subsection 2.2 and the model of channel is discussed in subsection 2.3. In section 3, the SBME and e-SBME methods are introduced. To demonstrate and to compare the performance of the two algorithms under different channels, illustrative simulations are presented in section 4 and section 5 concludes the paper.

2 System description 2.1 Data model The block diagram for blind equalization is given in Fig. 1.


Fig. 1: Block diagram for blind equalization.

The transmitted data are modulated as a sequence of symbols s_k belonging to an M-QAM constellation. The complex data s_k are independent, identically distributed (i.i.d.) and are transported through a channel. The channel model is described in section 2.3. The channel output x_k is disturbed by the unknown noise denoted by {n_k}. The blind equalizer operates on the channel output to reduce the ISI. ŝ_k denotes the output of the equalizer.

2.2 Assumptions
The aim of the two algorithms presented here is to bring the data constellation at the equalizer output as close as possible to the transmitted data constellation. This operation is done by estimating the information symbols without knowledge of the channel parameters and of the transmitted symbols. The following assumptions hold in the sequel:
A.1 {n_k} is a bounded noise, i.e. |n_k| ≤ δ_n, where δ_n is an upper bound.
A.2 The two sequences {n_k} and {s_k} are independent.
A.3 If n_k = 0, the output of the equalizer can be written as

$$ s_k = \sum_{i=0}^{L} w_i\, x_{k-i} = \phi_k^T \theta^* \qquad (1) $$

where {w_k} is the impulse response of the equalizer, θ* ∈ ℂ^{n×1} is the parameter vector with n = L + 1 the number of parameters, and φ_k is the observation vector:

$$ \theta^* = \begin{pmatrix} w_0 \\ \vdots \\ w_L \end{pmatrix} \quad \text{and} \quad \phi_k = \begin{pmatrix} x_k \\ \vdots \\ x_{k-L} \end{pmatrix} $$

In this paper, we use the function Q(.) defined by

$$ Q(y) = \underset{z \in C_{QAM}}{\arg\min}\, |z - y| \qquad (2) $$

Q(y) corresponds to the closest symbol to y in C_QAM, where C_QAM denotes the QAM constellation.
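A small sketch of the decision function Q(.) of eq. (2) is given below: it builds a square M-QAM constellation and returns the nearest symbol. The (unnormalized) integer-grid constellation is an assumption; any scaling used in the actual simulations would be applied on top of it.

```python
import numpy as np

def qam_constellation(M):
    """Square M-QAM symbols on the odd-integer grid (M = 4, 16, 64, ...)."""
    m = int(np.sqrt(M))
    levels = np.arange(-(m - 1), m, 2)               # ..., -3, -1, 1, 3, ...
    return (levels[:, None] + 1j * levels[None, :]).ravel()

def Q(y, constellation):
    """Closest constellation point to y, i.e. eq. (2)."""
    return constellation[np.argmin(np.abs(constellation - y))]

C16 = qam_constellation(16)
print(Q(0.8 - 2.7j, C16))                            # -> (1-3j)
```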


2.3 Channels employed The aim of this article is the performance analysis of two OBE algorithms over an AWGN channel and multipath fading channels which are defined hereinafter.

2.3.1 AWGN channel
The AWGN channel adds white Gaussian noise to the signal flowing through it. The data passed through it do not undergo any amplitude loss or phase distortion. The AWGN channel is simply represented as

$$ x_k = s_k + n_k \qquad (3) $$

where n_k is the Gaussian random noise, s_k is the output of the QAM modulation and x_k is the output of the channel.

2.3.2 Multipath Fading Channels
The paths of multipath channels arise from electromagnetic wave-propagation phenomena such as reflection, diffraction and scattering occurring between the transmitter and the receiver. The two distributions most able to model the amplitudes of the signals arriving along the different paths are the Rayleigh and Rice distributions, which is why the Rayleigh and Rice channels are used in this paper.
– Rayleigh distribution: used when there is no line-of-sight (LOS) path between the transmitter and the receiver. In this condition, the magnitude of the received complex envelope, denoted by r, is distributed as follows:

$$ p(r) = \frac{r}{\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}}, \qquad r \ge 0 \qquad (4) $$

where E{r²} = 2σ².
– Rician distribution: used when there is a LOS path between the transmitter and the receiver. It is characterized by the factor K, defined as

$$ K = \frac{s^2}{2\sigma^2} \qquad (5) $$

where s² = m_I² + m_Q² is a non-centrality parameter. The magnitude of the received complex envelope r follows a Rician distribution:

$$ p(r) = \frac{r}{\sigma^2}\, \exp\Bigl(-\frac{r^2+s^2}{2\sigma^2}\Bigr)\, I_0\Bigl(\frac{rs}{\sigma^2}\Bigr), \qquad r \ge 0 \qquad (6) $$

where I_0(.) is the modified Bessel function of order 0, which can be written as

$$ I_0(r) = \frac{1}{\pi}\int_0^{\pi} \cosh(r\sin\xi)\,\mathrm{d}\xi = \frac{1}{2\pi}\int_0^{2\pi} \exp(r\sin\xi)\,\mathrm{d}\xi \qquad (7) $$

In the context of multipath channels, the displacement of the receiver causes a phenomenon known as the Doppler effect. This signal spreading shows up as a frequency shift called the Doppler frequency shift, which can be written as

$$ f_d = \frac{v}{\lambda}\cos\alpha = f_m \cos\alpha \qquad (8) $$

where v is the speed of the receiver, λ is the wavelength, α is the angle between the displacement direction of the receiver and the incident signal, and f_m = v/λ is the maximum Doppler frequency shift.
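Fading amplitudes consistent with the Rayleigh pdf (4) and the Rician pdf (6) can be drawn as the magnitude of a complex Gaussian, as sketched below; σ and the Rician factor K are free parameters, and placing the whole line-of-sight component on the in-phase branch is an arbitrary (but harmless) choice.

```python
import numpy as np

def rayleigh_amplitudes(n, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    i, q = rng.normal(0, sigma, n), rng.normal(0, sigma, n)   # no LOS component
    return np.abs(i + 1j * q)

def rician_amplitudes(n, K=3.0, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    s = np.sqrt(2 * K) * sigma                                # from K = s^2 / (2 sigma^2), eq. (5)
    i, q = rng.normal(s, sigma, n), rng.normal(0, sigma, n)   # LOS placed on the in-phase branch
    return np.abs(i + 1j * q)

r = rician_amplitudes(100_000, K=3.0)
print((r ** 2).mean())        # ≈ s^2 + 2*sigma^2 = 2*sigma^2*(K + 1) = 8
```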

3 Blind equalization methods
3.1 Principle
The main role of the two blind equalizers presented in this paper is to minimize or eliminate, as far as possible, the effect of the channel without access to the input source. This can be done by estimating a parameter vector θ such that ŝ_k = φ_k^T θ lies in a narrow area around a point belonging to C_QAM. In this case, θ is calculated such that

$$ Q(\phi_k^T \theta) = \phi_k^T \theta + \varepsilon_k \qquad (9) $$

where

$$ |\varepsilon_k| \le \delta \qquad (10) $$

with δ < 1. From assumptions A.1 and A.3, v_k is bounded, that is to say there exists δ_v > 0 such that |v_k| ≤ δ_v, with δ_v ≤ δ < 1. The SBME and e-SBME algorithms are mainly based on the hypotheses of section 2 (especially hypothesis A.3), on the two equations (9) and (10), and on the following property.



Fig. 2: If the noise can not be neglected: ϕ Tk θ∗ is in a neighborhood of Q(ϕ Tk θ∗ ). The neighborhood is defined by a circle of radius δ v < 1 around s k .

Property 1: if {n k } and {w i } satisfy δ v < 1, then we have Q (ϕ Tk θ∗ ) = s k

(11) ◻

From this property, s k can be estimated using Q(.). We have Q (ϕ Tk θ∗ ) = ϕ Tk θ∗ + v k

(12)

This property is more fully explained and illustrated in Figure 2.

3.2 The SBME algorithm
The first algorithm presented in this paper is the SBME (Set Membership Blind Equalization) algorithm, which has been proposed in [22]. The update of θ̂_k at each instant k is described by the following equation:

$$ \hat{\theta}_k = \hat{\theta}_{k-1} + \Gamma_k\, \varepsilon_{k/k-1} \qquad (13) $$

where Γ_k is a vectorial gain and P_k is the matrix that appears in Γ_k. They are defined as follows:

$$ \Gamma_k = \frac{P_{k-1}\phi_k \sigma_k}{\lambda + \phi_k^T P_{k-1}\phi_k \sigma_k}, \qquad P_k = \frac{1}{\lambda}\left(I_n - \Gamma_k \phi_k^T\right) P_{k-1} \qquad (14) $$

The a priori filter output and the a priori error are defined in the following manner:

$$ \hat{s}_{k/k-1} = \phi_k^T \hat{\theta}_{k-1}, \qquad \varepsilon_{k/k-1} = Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k-1} $$

Two weighting terms exist in equation (14): λ and σ_k. λ is the forgetting factor, which is bounded, i.e. 0 < λ ≤ 1. σ_k is a factor that monitors the update of θ̂_k; its value varies according to the value of the error ε_{k/k−1}. σ_k is defined by

$$ \sigma_k = \begin{cases} \dfrac{\lambda}{\phi_k^T P_{k-1}\phi_k}\left(\left|\dfrac{\varepsilon_{k/k-1}}{\delta}\right| - 1\right) & \text{if } \left|\dfrac{\varepsilon_{k/k-1}}{\delta}\right| > 1 \text{ and } \phi_k^T P_{k-1}\phi_k > 0 \\[2ex] 0 & \text{else} \end{cases} \qquad (15) $$

We define the a posteriori filter output and the a posteriori error as

$$ \hat{s}_{k/k} = \phi_k^T \hat{\theta}_k, \qquad \varepsilon_{k/k} = Q(\hat{s}_{k/k}) - \hat{s}_{k/k} $$

By multiplying equation (13) by φ_k^T, the following equation is obtained:

$$ \hat{s}_{k/k} = \hat{s}_{k/k-1} + \phi_k^T \Gamma_k\, \varepsilon_{k/k-1} \qquad (16) $$

According to (16), Q(ŝ_{k/k−1}) − ŝ_{k/k} can be calculated as follows:

$$ Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k} = Q(\hat{s}_{k/k-1}) - \left(\hat{s}_{k/k-1} + \phi_k^T \Gamma_k\, \varepsilon_{k/k-1}\right) \qquad (17) $$

After a factorization, we obtain

$$ Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k} = \left(1 - \phi_k^T \Gamma_k\right) \varepsilon_{k/k-1} \qquad (18) $$

By replacing Γ_k by its expression, equation (18) becomes

$$ Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k} = \left(1 - \frac{\phi_k^T P_{k-1}\phi_k \sigma_k}{\lambda + \phi_k^T P_{k-1}\phi_k \sigma_k}\right) \varepsilon_{k/k-1} \qquad (19) $$

After a simplification, we find

$$ Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k} = \frac{\lambda}{\lambda + \phi_k^T P_{k-1}\phi_k \sigma_k}\, \varepsilon_{k/k-1} \qquad (20) $$


The absolute value of this error varies in two cases:
C.1 The first case, where |ε_{k/k−1}/δ| > 1. Using (15), we obtain

$$ Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k} = \frac{\lambda}{\lambda + \phi_k^T P_{k-1}\phi_k \,\dfrac{\lambda}{\phi_k^T P_{k-1}\phi_k}\left(\left|\dfrac{\varepsilon_{k/k-1}}{\delta}\right| - 1\right)}\; \varepsilon_{k/k-1} \qquad (21) $$

(21) can be written as

$$ Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k} = \frac{\varepsilon_{k/k-1}}{\left|\dfrac{\varepsilon_{k/k-1}}{\delta}\right|} \qquad (22) $$

It follows that

$$ \left|Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k}\right| = \delta \qquad (23) $$

In addition δ < 1, thus we have

$$ \left|Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k}\right| < 1 \qquad (24) $$

Then Q(ŝ_{k/k}) = Q(ŝ_{k/k−1}). It follows that

$$ \left|\varepsilon_{k/k}\right| = \delta \le 1 \qquad (25) $$

C.2 The second case, where |ε_{k/k−1}/δ| ≤ 1 and σ_k = 0. This gives Γ_k = 0 and

$$ Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k} = \varepsilon_{k/k-1} \qquad (26) $$

It follows that

$$ \left|Q(\hat{s}_{k/k}) - \hat{s}_{k/k}\right| = \left|\varepsilon_{k/k-1}\right| \qquad (27) $$

Knowing that |ε_{k/k−1}| ≤ δ, then

$$ \left|Q(\hat{s}_{k/k}) - \hat{s}_{k/k}\right| \le \delta \qquad (28) $$
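One update of the SBME recursion (eqs. (13)–(15)) can be sketched as below. It is written for real-valued data to keep conjugation conventions out of the way (with complex M-QAM symbols the transposes would become Hermitian transposes), reuses the decision function Q() from the earlier snippet with a real (PAM-like) constellation, and treats the forgetting factor lam and the threshold delta as user-chosen parameters.

```python
import numpy as np

def sbme_step(theta, P, phi, constellation, lam=0.99, delta=0.5):
    """One SBME iteration following eqs. (13)-(15); a sketch, not the authors' code."""
    y = phi @ theta                              # a priori output  s_hat_{k/k-1}
    err = Q(y, constellation) - y                # a priori error   eps_{k/k-1}
    g = phi @ P @ phi                            # phi_k^T P_{k-1} phi_k
    if abs(err) / delta > 1 and g > 0:           # eq. (15)
        sigma = (lam / g) * (abs(err) / delta - 1)
    else:
        sigma = 0.0
    gain = (P @ phi) * sigma / (lam + g * sigma)          # Gamma_k, eq. (14)
    theta_new = theta + gain * err                        # eq. (13)
    P_new = (np.eye(len(theta)) - np.outer(gain, phi)) @ P / lam
    return theta_new, P_new
```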

3.3 The e-SBME algorithm
The second algorithm is an extension of the SBME algorithm, which we denote e-SBME (extended SBME). In this algorithm, σ_k takes three different expressions instead of two in the first algorithm. σ_k is defined by

$$ \sigma_k = \begin{cases} \dfrac{\lambda}{\phi_k^T P_{k-1}\phi_k}\left(\left|\dfrac{\varepsilon_{k/k-1}}{\delta}\right| - 1\right) & \text{if } \left|\dfrac{\varepsilon_{k/k-1}}{\delta}\right| > 1 \text{ and } \phi_k^T P_{k-1}\phi_k > 0 \\[2ex] 1 & \text{if } \left|\dfrac{\varepsilon_{k/k-1}}{\delta}\right| \le 1 \text{ and } \phi_k^T P_{k-1}\phi_k > 0 \\[1ex] 0 & \text{elsewhere} \end{cases} \qquad (29) $$
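The e-SBME variant only changes the rule for σ_k (eq. (29)); a corresponding sketch, reusing the quantities of sbme_step above, is:

```python
def sigma_esbme(err, g, lam, delta):
    """sigma_k of eq. (29): err = eps_{k/k-1}, g = phi_k^T P_{k-1} phi_k."""
    if g <= 0:
        return 0.0
    if abs(err) / delta > 1:
        return (lam / g) * (abs(err) / delta - 1)
    return 1.0                                   # least-squares-like update when the error is small
```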

It is clear that in the case where σ_k = 1, this method is similar to the least squares algorithm and the error Q(ŝ_{k/k−1}) − ŝ_{k/k} can be written as

$$ Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k} = \frac{\lambda}{\lambda + \phi_k^T P_{k-1}\phi_k}\, \varepsilon_{k/k-1} \qquad (30) $$

Since

$$ \left|\frac{\lambda}{\lambda + \phi_k^T P_{k-1}\phi_k}\right| < 1 $$

the magnitude of this error satisfies

$$ \left|Q(\hat{s}_{k/k-1}) - \hat{s}_{k/k}\right| < \left|\varepsilon_{k/k-1}\right| < 1 $$

It follows that Q(ŝ_{k/k}) = Q(ŝ_{k/k−1}) and

$$ \left|Q(\hat{s}_{k/k}) - \hat{s}_{k/k}\right| < \left|\varepsilon_{k/k-1}\right| \le \delta $$

3.4 Properties of the SBME and e-SBME algorithms
The two approaches presented here are modified least squares algorithms. The difference between the two algorithms resides basically in the expression of σ_k, more precisely in the case |ε_{k/k−1}/δ| ≤ 1 and φ_k^T P_{k−1}φ_k > 0; in this case, the e-SBME algorithm corresponds to the least squares algorithm. From equation (28), relating to the SBME algorithm, and equation (30), relating to the e-SBME algorithm, it can be concluded that the two algorithms guarantee the following property:
Property 2: If φ_k^T P_{k−1}φ_k > 0 and δ < 1, then the a posteriori error ε_{k/k} satisfies

$$ \left|\varepsilon_{k/k}\right| \le \delta \qquad (31) $$

◻

In these algorithms, the parameter δ must be chosen correctly so that the two methods remain stable and converge.

L. Jacobs, M. Guenach and M. Moeneclaey

MIMO Pre-Equalization With Decision Feedback for High-Speed Chip-to-Chip Communication

– Ttr = T: The length of the rectangular transmit pulses equals the symbol period T, although the pre-equalizer operates at a multiple 1/Tpr = Npr/T of the symbol

rate. When Npr > 1, it is readily verified using (13) that (Gtr)n1,n2 = 1 − |n1 − n2|/Npr if |n1 − n2| ⩽ Npr and 0 elsewhere. The frequency response is zero for f = 1/T. The receiver filters are unit-energy square-root raised-cosine filters with a 3 dB bandwidth of 1/(2T) and a roll-off factor β = 0.3. The latter filters are SRNFs, which implies that when spatially independent additive white Gaussian noise (AWGN) with spectral density N0/2 is applied at the receiver filters and sampled at rate 1/T, the resulting noise samples n(p)(k) are spatially and temporally independent real-valued zero-mean Gaussian random variables with variance N0/2: E[n(p1)(k1) n(p2)(k2)] = N0/2 δp1−p2 δk1−k2, such that the autocorrelation matrix Rn = N0/2 IL. When the sampling phase ε = 0, it is assumed that the impulse response corresponding to the frequency response Htr(f) Hch(1,1)(f) Hrec(f) is sampled at the instant it reaches its maximum value. Since the filter coefficients can be computed offline, the complexity of the proposed equalization systems is mainly determined by the discrete-time filter operations. Hence, the total number of filter taps can be considered as a valid complexity measure for both MIMO and SISO equalization systems. Assuming Es/N0 = 20 dB and a bit rate of Rb = 30 Gbit/s per lane, we display in Fig. 3 the 1/MSE curves as a function of the sampling phase ε for several equalization schemes, for Npr = 1 and Npr = 2. The rectangular transmit pulses are assumed to have length Ttr = T. We consider both the proposed pre-equalization scheme combined with decision feedback at the receiver side as well as a linear pre-equalization scheme without decision feedback (LFB = 0). In addition, we show the MSE performance resulting from both the DFE post-equalization scheme from Fig. 2 and the linear MIMO post-equalization scheme from [22]; in the latter two schemes, upsampling at the transmitter (Npr = 2) is replaced by oversampling at the receiver (Npo = 2) while the transmit and receiver filters remain unchanged. It is observed from Fig. 3.a that moving the feedforward filters from the transmitter to the receiver side and vice versa does not affect the MSE performance of the linear and DFE MIMO equalization schemes when Npr = 1. However, when Npr = 2, Fig. 3.b shows that the MIMO pre-equalization schemes slightly outperform their MIMO post-equalization counterparts. It also follows from Fig. 3.a that the proposed MIMO DFE scheme with Lpr = 7 (i.e., Lpr,min = Lpr,max = 3) and LFB = 4 achieves a performance improvement of about 1 dB as compared to an equivalent SISO DFE scheme with Lpr = 7 and LFB = 4, at the cost of increased complexity. However, even by increasing the number of filter taps of the SISO DFE scheme (Lpr = 28, LFB = 16) such that both schemes have the same total number of filter taps, the SISO DFE scheme cannot compete with the MIMO DFE scheme. Furthermore, the equalization schemes with DFE are less susceptible to variations of the sampling phase ε than the linear schemes. From Fig. 3.b, it follows that upsampling at the transmitter with a factor Npr = 2 improves the MSE performance

Fig. 3: 1/MSE (dB) versus sampling phase ε for SISO and MIMO equalization schemes with and without DFE (Rb = 30 Gbit/s). (a): Npr = 1, (b): Npr = 2.

For the MIMO DFE scheme, the performance gain due to upsampling amounts to about 1 dB. However, the difference in MSE performance compared to the other equalization schemes becomes much smaller than when Npr = 1.

In Fig. 4, we show the MSE performance of the equalization schemes from Fig. 3 for a bit rate of Rb = 60 Gbit/s. It is readily observed that increasing the bit rate deteriorates the MSE performance of all equalization schemes. However, for the MIMO DFE scheme, the degradation is limited to about 3 dB for both Npr = 1 and Npr = 2, whereas it is much larger for the linear schemes and the SISO DFE schemes. For instance, when Npr = 1, the MIMO DFE schemes outperform the SISO DFE schemes by about 3 dB and the linear MIMO schemes by about 5 dB; when Npr = 2, the MIMO DFE schemes outperform their SISO DFE counterparts and the linear MIMO post-equalization scheme by more than 3 dB, whereas the difference with the linear MIMO pre-equalization scheme amounts to more than 4 dB. Hence, MIMO pre-equalization with DFE at the receiver side is clearly a promising technique to help facilitate future high-speed communication over low-cost electrical interconnects.

In order to examine the impact of the transmit filters on the MSE, we display in Fig. 5 the MSE as a function of Es/N0 for different SISO and MIMO pre-equalization schemes with decision feedback, under the assumption that Npr = 2. The rectangular transmit pulses have length Ttr = T or Ttr = T/2. Since Figs. 3.b and 4.b showed that for Npr = 2 the impact of ε on the MSE is small, we set ε = 0. The resulting MSE for Rb = 30 Gbit/s and Rb = 60 Gbit/s is shown in Fig. 5.a and Fig. 5.b, respectively. It is easily observed that the differences between the MSE results for the different schemes grow with increasing Es/N0. Moreover, the MSE exhibits a floor at large Es/N0 due to the residual ISI. By using more pre-equalizer taps, however, the residual ISI can be reduced, which gives rise to a much larger 1/MSE floor.


Fig. 4: 1/MSE (dB) versus sampling phase ε for SISO and MIMO equalization schemes with and without DFE (Rb = 60 Gbit/s). (a): Npr = 1, (b): Npr = 2.

Fig. 5: 1/MSE (dB) versus Es/N0 (dB) for SISO and MIMO DFE schemes, for transmit pulse lengths Ttr = T and Ttr = T/Npr. (a): Rb = 30 Gbit/s, (b): Rb = 60 Gbit/s.

It follows from the figures that the transmit pulses with Ttr = T outperform the ones with Ttr = T/2 in terms of 1/MSE performance, although the difference is very small when the number of equalization coefficients is relatively large.

According to (8), the decision variables can be written as a function of the unknown data symbols and the noise when the equalization filters are known. Hence, in order to obtain the exact BER analytically, the conditional BER needs to be averaged over all transmitted symbols, which is computationally prohibitive. Therefore, we average the conditional BER over the 10 ISI terms with the largest magnitude and treat the remaining ISI as additive white Gaussian noise (AWGN).
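The semi-analytic BER evaluation described above (averaging the conditional BER over the dominant ISI terms and treating the rest as AWGN) can be sketched as follows for 2-PAM. This is only an illustrative reconstruction under our own assumptions (useful sample normalized to unit amplitude, a hypothetical vector isi of residual interference coefficients), not the authors' code.

```python
import itertools
import numpy as np
from math import erfc, sqrt

def ber_2pam_semianalytic(isi, sigma2_noise, n_dominant=10):
    """Average the conditional 2-PAM BER over the sign patterns of the
    largest-magnitude ISI taps; remaining ISI is treated as extra Gaussian noise."""
    isi = np.asarray(isi, dtype=float)
    order = np.argsort(np.abs(isi))[::-1]
    dom, rest = isi[order[:n_dominant]], isi[order[n_dominant:]]
    sigma_eff = sqrt(sigma2_noise + np.sum(rest ** 2))  # residual ISI -> AWGN
    ber = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=len(dom)):
        interference = np.dot(signs, dom)
        # Q(x) = 0.5 * erfc(x / sqrt(2)); useful amplitude assumed +1
        ber += 0.5 * erfc((1.0 + interference) / (sigma_eff * sqrt(2.0)))
    return ber / 2.0 ** len(dom)

# toy example with hypothetical residual ISI taps
print(ber_2pam_semianalytic(isi=[0.05, -0.03, 0.02], sigma2_noise=0.01))
```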

Fig. 6: BER versus Es/N0 (dB) for SISO and MIMO equalization schemes with DFE at the receiver, for Rb = 30 Gbit/s and Rb = 60 Gbit/s. (a): BER for DFE (pre-equalization), (b): BER for DFE (post-equalization).

In Fig. 6, we show the BER versus Es/N0 for the SISO and MIMO equalization schemes with decision feedback from Figs. 3 and 4, for a 2-PAM constellation and rectangular transmit pulses of length Ttr = T. Figs. 6.a and 6.b correspond to the pre-equalization and post-equalization schemes from Figs. 1 and 2, respectively. Note that we consider the SISO DFE scheme with Lpr = 28, such that all schemes have the same complexity in terms of total number of filter taps. For each scheme and for each value of Es/N0, we obtain the optimal sampling phase from the corresponding 1/MSE curves before computing the BER.

Fig. 6.a shows that, for a target BER of 10−12, MIMO DFE outperforms SISO DFE by more than 1 dB at a bit rate of Rb = 30 Gbit/s for both Npr = 1 and Npr = 2. In line with the results from Fig. 3, upsampling by a factor 2 clearly results in an improvement of the BER. At a bit rate of Rb = 60 Gbit/s, the SISO DFE schemes do not achieve the target BER due to an error floor, whereas the MIMO pre-equalization schemes with DFE still perform very well. At the target BER, the degradation of the MIMO DFE scheme with Npr = 2 is limited to about 4.5 dB when doubling the bit rate from 30 Gbit/s to 60 Gbit/s per lane. According to Fig. 6.b, the post-equalization schemes have a BER performance fairly similar to that of their pre-equalization counterparts, although in the case of Npr = 2, the pre-equalization schemes perform slightly better.

5 Conclusions

In this contribution, we derived compact closed-form expressions for the FIR filters of an MMSE MIMO pre-equalization scheme with decision feedback at the receiver. When high bit rates are targeted, the proposed MIMO equalization scheme is shown to greatly outperform its SISO counterpart, even for the same total number of filter taps. We also showed that the proposed MIMO pre-equalization scheme slightly outperforms a comparable MIMO post-equalization scheme, in which both the feedforward and the feedback equalization filters are located at the receiver side. Therefore, the proposed scheme can be considered a promising technique to help facilitate future high-speed communication over low-cost electrical interconnects.

Acknowledgements Part of this research has been funded by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office.

Bibliography [1] M. Kossel, T. Toifl, P.A. Francese, M. Brandli, C. Menolfi, P. Buchmann, L. Kull, T.M. Andersen, and T. Morf. A 10 gb/s 8-tap 6b 2-pam/4-pam tomlinson-harashima precoding transmitter for future memory-link applications in 22-nm SOI CMOS. IEEE J. Solid-State Circuits, 48(12):3268–3284, Dec 2013. [2] H. Kimura, P.M. Aziz, Tai Jing, A. Sinha, S.P. Kotagiri, R. Narayan, Hairong Gao, Ping Jing, G. Hom, Anshi Liang, E. Zhang, A. Kadkol, R. Kothari, G. Chan, Yehui Sun, B. Ge, J. Zeng, K. Ling, M.C. Wang, A. Malipatil, Lijun Li, C. Abel, and F. Zhong. A 28 gb/s 560 mw multi-standard serdes with single-stage analog front-end and 14-tap decision feedback equalizer in 28 nm CMOS. IEEE J. Solid-State Circuits, 49(12):3091–3103, Dec 2014. [3] M.H. Nazari and A. Emami-Neyestanak. A 15-gb/s 0.5-mw/gbps two-tap DFE receiver with far-end crosstalk cancellation. IEEE J. Solid-State Circuits, 47(10):2420–2432, Oct 2012. [4] Y. Iijima and Y. Yuminaka. Double-rate equalization using tomlinson-harashima precoding for multi-valued data transmission. In IEEE 46th Int. Symp. on Multiple-Valued Logic (ISMVL), :66–71, Sapporo, Japan, 18-20 May 2016. [5] John F. Bulzacchelli, Christian Menolfi, Troy J. Beukema, Daniel W. Storaska, Jürgen Hertle, David R. Hanson, Ping-Hsuan Hsieh, Sergy V. Rylov, Daniel Furrer, Daniele Gardellini, Andrea Prati, Thomas Morf, Vivek Sharma, Ram Kelkar, Herschel A. Ainspan, William R. Kelly, Leonard R. Chieco, Glenn A. Ritter, John A. Sorice, Jon D. Garlett, Robert Callan, Matthias Brandli, Peter Buchmann, Marcel Kossel, Thomas Toifl, and Daniel J. Friedman. A 28-gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32-nm SOI CMOS technology. IEEE J. Solid-State Circuits, 47(12):3232–3248, Dec 2012. [6] Kyu-Dong Hwang and Lee-Sup Kim. A 6.5-gb/s 1-mw/gb/s/ch simple capacitive crosstalk compensator in a 130-nm process. IEEE Trans. Circuits Syst. II: Express Briefs, 60(6):302–306, June 2013. [7] P. Amleshi and Cong Gao. NEXT and FEXT characteristics and suppressions in dense 25gbps+ backplane vias. In IEEE Int. Symp. on Electromagnetic Compatibility (EMC), :979–985, Raleigh, NC, USA, 4-8 Aug 2014. [8] D.R. Shilpa and B.V. Uma. A wavelet technique to minimize off-chip interconnect crosstalk. In Int. Conf. on Emerging Trends in Communication, Control, Signal Processing Computing Applications (C2SPCA), :1–5, Bangalore, India, 10-11 Oct 2013. [9] Taehyoun Oh and R. Harjani. A 12-gb/s multichannel I/O using MIMO crosstalk cancellation and signal reutilization in 65-nm CMOS. IEEE J. Solid-State Circuits, 48(6):1383–1397, June 2013.


[10] N. Al-Dhahir. FIR channel-shortening equalizers for MIMO ISI channels. IEEE Trans. Commun., 49(2):213–218, Feb 2001. [11] M. Joham, W. Utschick, and J. A. Nossek. Linear transmit processing in MIMO communications systems. IEEE Trans. Signal Process., 53(8):2700–2712, Aug 2005. [12] C. Toker, S. Lambotharan, and J.A. Chambers. Joint transceiver design for MIMO channel shortening. IEEE Trans. Signal Process., 55(7):3851–3866, Jul 2007. [13] M. L. Honig, P. Crespo, and K. Steiglitz. Suppression of near- and far-end crosstalk by linear pre- and post-filtering. IEEE J. Select. Areas Commun., 10(3):614–629, Apr 1992. [14] J. Salz. Digital transmission over cross-coupled linear channels. AT T Technical Journal, 64(6):1147–1159, July 1985. [15] A. Hjorungnes, M. L. R. de Campos, and P. S. R. Diniz. Jointly optimized transmitter and receiver FIR MIMO filters in the presence of near-end crosstalk. IEEE Trans. Signal Process., 53(1):346–359, Jan 2005. [16] B. Yuksekkaya and C. Toker. A general joint transceiver design for multiuser MIMO channel equalization. In IEEE Vehicular Technology Conference (VTC 2010-Fall), :1–5, Ottawa, Canada, 6-9 Sep 2010. [17] Tongtong Li and Zhi Ding. Joint transmitter-receiver optimization for partial response channels based on nonmaximally decimated filterbank precoding technique. IEEE Trans. Signal Process., 47(9):2407–2414, Sep 1999. [18] N. Al-Dhahir and A.H. Sayed. The finite-length multi-input multi-output MMSE-DFE. IEEE Trans. Signal Process., 48(10):2921–2936, Oct 2000. [19] M. Tomlinson. New automatic equaliser employing modulo arithmetic. Electronics Letters, 7(5):138–139, March 1971. [20] H. Harashima and H. Miyakawa. Matched-transmission technique for channels with intersymbol interference. IEEE Trans. Commun., 20(4):774–780, Aug 1972. [21] Jie Chen, Yongru Gu, and K.K. Parhi. Novel FEXT cancellation and equalization for high speed ethernet transmission. IEEE Trans. Circuits Syst., 56(6):1272–1285, June 2009. [22] L. Jacobs, M. Guenach, and M. Moeneclaey. Linear MIMO equalization for high-speed chip-to-chip communication. In IEEE Int. Conf. on Communications (ICC), :4978–4983, London, UK, 8-12 June 2015. [23] L. Jacobs, M. Guenach, and M. Moeneclaey. Application of MIMO DF equalization to high-speed off-chip communication. In IEEE Int. Conf. on Computer as a Tool (EUROCON), :1–4, Salamanca, Spain, 8-11 Sep 2015.


Biographies Lennert Jacobs was born in Ghent, Belgium, in 1983. He received the Master’s degree in electrical engineering and the Ph.D. degree in electrical engineering, both from Ghent University, Ghent, Belgium, in 2006 and 2012, respectively. He is currently serving as a post-doctoral researcher in the Department of Telecommunications and Information Processing at Ghent University. His main research interests are in fading channels, MIMO techniques, signal processing, and modulation and coding for digital communications.

Mamoun Guenach is a research scientist with the Nokia Bell Labs. He received the degree of engineer in electronics and communications from the EcoleMohamadia d’Ingénieurs inMorocco. Following that, he moved to the faculty of applied sciences at the Université Catholique de Louvain (UCL) Belgium,where he received a M.Sc. degree in electricity and a Ph.D. degree in applied sciences. He served as a post-doctoral researcher at Ghent University where,since 2015, he is a part-time visiting professor.

Marc Moeneclaey is Full Professor at the Telecommunications and Information Processing (TELIN) Department, Ghent University, teaching courses on various aspects of Digital Communications. His main research interests are in statistical communication theory, carrier and symbol synchronization, bandwidth-efficient modulation and coding, spread-spectrum, satellite and mobile communication. He is the author of more than 500 scientific papers in international journals and conference proceedings. Together with Prof. H. Meyr and Dr. S. Fechtel, he co-authors the book Digital communication receivers - Synchronization, channel estimation, and signal processing (J. Wiley, 1998). He is a Highly Cited Researcher 2001. In 2002 he was elected to the grade of Fellow of the IEEE.

A. Maali, H. Semlali, N. Boumaaz and A. Soulmani

A Comparative Analysis between Energy and Maximum Eigenvalue based detection in Cognitive Radio Systems

Abstract: Cognitive radio (CR) is a kind of "access technology" used to allow more than one radio technology to co-exist without (or with minimal) interference to each other. Typically, there is one primary user of the spectrum, whereas other users are "allowed" to communicate as long as they do not interfere with the primary user. One of the well-known techniques is to sense the spectrum before initiating the communication in order to analyze the occupancy of the radio frequency spectrum. In this paper, we present a comparative analysis between the Energy Detection (ED) and Maximum Eigenvalue Detection (MED) spectrum sensing techniques. The performance of these two methods is evaluated in terms of their Receiver Operating Characteristics (ROC) and their detection probability for different values of Signal to Noise Ratio (SNR).

Keywords: Cognitive Radio; Spectrum Sensing; Maximum Eigenvalue Detection; Energy Detection.

Classification: 65C05, 62M20, 93E11, 62F15, 86A22

1 Introduction

Software Defined Radio (SDR) is a multi-mode, multi-standard and reconfigurable wireless communication system in which the majority of the physical-layer functions are software defined. In such systems, software processing within the radio device implements the operating functions. The challenge of an SDR system is to deal with the need for high sample rates and for very selective filters treating high bands [1]. Cognitive Radio is an emergent technology which combines SDR and artificial intelligence. Its objective is to allow the equipment to choose the best conditions for communication in order to satisfy the user needs. To achieve

A. Maali, H. Semlali, N. Boumaaz and A. Soulmani: Asmaa Maali, Hayat Semlali, Najib Boumaaz and Abdallah Soulmani, Department of Physics, Laboratory of Electrical Systems and Telecommunications Faculty of Sciences and Technology, Cadi Ayyad University Marrakech, Morocco., Emails: [email protected], [email protected], [email protected], [email protected]

https://doi.org/10.1515/9783110594003-009

this goal, spectrum sensing plays a crucial part in obtaining the status of the spectrum (vacant/occupied), so that the spectrum can be accessed by a secondary user without interference with the primary user (PU). Several spectrum sensing techniques are presented in the literature, including matched filter detection (MF) [2, 3], energy detection (ED) [4–6] and cyclostationary feature detection (CSD) [2, 7]. Each one has its own pros and cons. Matched filter detection provides optimal detection but requires complete knowledge of the PU signal. Cyclostationary feature detection is based on exploiting cyclostationary features of the received signal; it offers good detection performance, but requires partial knowledge of the PU characteristics and a high computation time to complete sensing. Energy detection is a major and basic method due to its ease of implementation and low complexity. Unlike other methods, energy detection does not need any prior information about the PU signal. On the negative side, its performance at low SNR is not satisfactory. Maximum eigenvalue detection (MED) is a method widely studied in recent research [8–15]. It has been proposed to overcome the noise uncertainty difficulty while keeping the advantages of energy detection. In this work, we are interested in the energy detection and maximum eigenvalue detection methods since they do not require knowledge of the PU signal characteristics.

The rest of this paper is structured as follows. Section 2 presents an overview and a theoretical analysis of the energy detection and maximum eigenvalue detection methods. Simulation results and discussion are given in Section 3. Finally, the conclusion of this comparative study is drawn in Section 4.

2 Energy Detection and Maximum Eigenvalue Detection Basis

Suppose that the received signal has the following simple form:

x_n = s_n + ω_n    (1)

where s_n denotes the primary user signal (the signal to be detected), ω_n is an additive white Gaussian noise (AWGN) and n is the sample index. We note that s_n = 0 when there is no transmission by the primary user. The signal detection problem is equivalent to deciding between the following hypotheses [2]:

H0: x_n = ω_n
H1: x_n = s_n + ω_n    (2)

H0 represents the null hypothesis that the primary user is absent, while H1 states that a primary user is present in the channel of interest.


The performance of any spectrum sensing technique can be characterized by the probability of false alarm (P_FA) and the probability of detection (P_D), which are defined as follows [11]:

P_FA = Prob{T > λ | H0}
P_D  = Prob{T > λ | H1}    (3)

where P_D is the probability of detecting a signal on the considered band when it is truly present, and P_FA is the probability that the test incorrectly decides that the considered band is occupied when it is not. T is the test statistic, which is compared to the threshold λ to make the decision.

2.1 Energy Detection Technique

Energy detection is a basic spectrum sensing technique; it was proposed for the first time in [5]. It does not need any prior information about the signal to be detected in order to determine whether the channel is occupied or not. The principle of energy detection is summarized in the block diagram of Fig. 1. The input band-pass filter removes the out-of-band signals by selecting the central frequency f_c and the bandwidth of interest. After the signal is digitized by an analog-to-digital converter (ADC), a simple square-and-average block is used to estimate the received signal energy. Energy detection compares the decision statistic T_ED with a threshold λ_ED to decide whether a signal is present (H1) or not (H0) [1].

Fig. 1: Block diagram of energy detection.

The test statistic of energy detection is given by [9]:

T_ED = (1/N_ED) Σ_{n=1}^{N_ED} |X_n|²    (4)

where N_ED is the number of samples. To formulate the mathematical equation for the energy detector, the decision metric T_ED should be investigated. Under H0, T_ED follows a central chi-square (χ²) distribution with 2N degrees of freedom, whereas under H1 the decision statistic T_ED has a non-central distribution with the same degrees of freedom and a non-centrality parameter equal to 2Υ [15]. Υ denotes the SNR, defined as the ratio of the signal variance σ_s² to the noise variance σ_ω²:

SNR = σ_s² / σ_ω²    (5)

For a given P_FA, the threshold can be obtained as [12]:

λ_ED = √(2/N_ED) Q^{-1}(P_FA) + 1    (6)

where

Q(t) = ∫_t^{+∞} e^{-u²/2} du    (7)
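A minimal Python sketch of the energy detector of eqs. (4) and (6) is given below, assuming noise-normalized samples (unit noise variance); Q^{-1} is taken from SciPy, and the toy check and variable names are ours, not from the chapter.

```python
import numpy as np
from scipy.stats import norm

def energy_detector(x, p_fa):
    """Energy detection on noise-normalized samples x (eqs. (4) and (6)):
    returns (decide H1?, test statistic, threshold)."""
    n = len(x)
    t_ed = np.mean(np.abs(x) ** 2)                  # eq. (4)
    lam = np.sqrt(2.0 / n) * norm.isf(p_fa) + 1.0   # eq. (6); Q^{-1} = norm.isf
    return t_ed > lam, t_ed, lam

# toy check under H0 (noise only, unit variance): the false-alarm rate should be close to p_fa
rng = np.random.default_rng(0)
hits = sum(energy_detector(rng.standard_normal(1000), 0.1)[0] for _ in range(2000))
print(hits / 2000)
```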

2.2 Maximum Eigenvalue Detection

Maximum eigenvalue based detection is a recently developed method. It can be considered the most reliable one among the presented methods. It presents many advantages: it does not require any knowledge of the signal properties [9–14], allows good detection at low Signal to Noise Ratio (SNR) and makes it possible to overcome the noise uncertainty encountered in the energy detection technique. In the maximum eigenvalue detection technique, random matrix theory (RMT) is used to formulate the detection algorithm based on the sample covariance matrix of the received signal.

Let L be the number of consecutive samples, x̂(n) an estimate of the received signal, ŝ(n) an estimate of the primary signal to be detected and ω̂(n) an estimate of the noise. We define the following vectors:

x̂(n) = [x(n), x(n+1), ..., x(n+L−1)]^T
ŝ(n) = [s(n), s(n+1), ..., s(n+L−1)]^T
ω̂(n) = [ω(n), ω(n+1), ..., ω(n+L−1)]^T    (8)

The approximated statistical covariance matrix R̂_x is defined by [12] as:

R̂_x(N_s) =
[ ξ(0)     ξ(1)     ⋯   ξ(L−1) ]
[ ξ(1)     ξ(0)     ⋯   ξ(L−2) ]
[   ⋮        ⋮       ⋱     ⋮    ]
[ ξ(L−1)   ξ(L−2)   ⋯   ξ(0)   ]    (9)

where ξ(l) is the sample auto-correlation of the received signal, described as:

ξ(l) = (1/N_MED) Σ_{m=0}^{N_MED−1} x(m) x(m−l)    (10)

for l = 0, 1, ..., L−1, and with N_MED the number of available samples. Based on random matrix theory (RMT), the probability of false alarm for maximum eigenvalue detection is given as:

P_FA ≃ 1 − F1((λ_MED N_MED − μ) / ϑ)    (11)

where

μ = [√(N_MED − 1) + √L]²    (12)

ϑ = [√(N_MED − 1) + √L] [1/√(N_MED − 1) + 1/√L]^{1/3}    (13)

and F1 is the cumulative distribution function of the Tracy-Widom distribution of order 1. To compute F1^{−1} at certain points, Tab. 1 can be used.

Tab. 1: Numerical table for computing F1(t).

t       −3.90   −3.18   −2.78   −1.91   −1.27   −0.59   0.45   0.98   2.02
F1(t)    0.01    0.05    0.10    0.30    0.50    0.70   0.90   0.95   0.99

For given P_FA, N_MED and L, the sensing threshold used in the decision process is given by the following expression:

λ_MED = ((√N_MED + √L)² / N_MED) [1 + ((√N_MED + √L)^{−2/3} / (N_MED L)^{1/6}) F1^{−1}(1 − P_FA)]    (14)

The detection algorithm of the maximum eigenvalue detection is summarized as follows:
– Step 1: Compute the sample auto-correlations in (10) and form the sample covariance matrix defined in (9).
– Step 2: Find the maximum eigenvalue ξ_max of the sample covariance matrix by using eigenvalue decomposition techniques.
– Step 3: Decide: if ξ_max > λ_MED σ_ω², then the primary user exists; otherwise, it does not.
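The three steps above can be sketched as follows in Python; this is our own illustrative implementation, with F1^{−1}(1 − P_FA) read from Tab. 1 (e.g. 0.45 for P_FA = 0.1) and with the noise variance assumed known.

```python
import numpy as np

def med_detector(x, L, sigma2_noise, f1_inv=0.45):
    """Maximum eigenvalue detection following Steps 1-3.
    f1_inv is F1^{-1}(1 - P_FA) taken from Tab. 1 (0.45 corresponds to P_FA = 0.1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: sample auto-correlations (eq. (10)) and Toeplitz covariance matrix (eq. (9))
    xi = np.array([np.dot(x[l:], x[:n - l]) / n for l in range(L)])
    r_hat = np.array([[xi[abs(i - j)] for j in range(L)] for i in range(L)])
    # Step 2: maximum eigenvalue of the sample covariance matrix
    xi_max = np.max(np.linalg.eigvalsh(r_hat))
    # Step 3: threshold from eq. (14) and decision
    lam_med = ((np.sqrt(n) + np.sqrt(L)) ** 2 / n) * (
        1.0 + (np.sqrt(n) + np.sqrt(L)) ** (-2.0 / 3.0) / (n * L) ** (1.0 / 6.0) * f1_inv)
    return xi_max > lam_med * sigma2_noise

# toy example: white noise only (H0) with unit variance -> should usually print False
rng = np.random.default_rng(1)
print(med_detector(rng.standard_normal(5000), L=8, sigma2_noise=1.0))
```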

3 Application and Simulation Results

The purpose of this section is to compare the performance of maximum eigenvalue detection with the energy detection method. The simulation diagram is shown in Fig. 2. The test signal is a multi-band signal consisting of five carriers spaced by 80 Hz, modulated with QPSK and then filtered by a raised cosine filter with a roll-off coefficient of 0.5.


Fig. 2: Simulation diagram.

Each carrier has a symbol rate of R = 40 sym/s. The values considered are suitable for our computing power. After digitizing the generated signal, its frequency components are calculated. Then, the band of interest is selected by applying the SVD algorithm. The occupancy of the radio frequency spectrum is analyzed using the two studied spectrum sensing methods. In this application, we consider an AWGN (Additive White Gaussian Noise) channel. The performance of the compared spectrum sensing methods is evaluated in terms of the ROC curves, i.e., the graphical representation of the detection probability P_D versus the false alarm probability P_FA for different threshold values, and in terms of the detection probability for different values of SNR and smoothing factor L.

Figure 3 illustrates the ROC curves for both the energy detection and the maximum eigenvalue detection methods for SNR = −19 dB and a smoothing factor L = 8. From this figure, we can note that the MED method presents a good detection compared to the ED method.

Figure 4 represents the detection probability of the ED and MED methods as a function of the SNR with a false alarm probability fixed to P_FA = 0.1. From this figure, we can note that the detection probability is better in the case of maximum eigenvalue detection, whatever the value of the SNR.

Fig. 3: ROC curve (P D vs. P FA ) for SNR = −19dB, L = 8 with 10000 Monte-Carlo realizations.


Fig. 4: P D vs. SNR for L = 8 using 10000 Monte-Carlo realizations.

Fig. 5: P D vs. SNR for various values of smoothing factor L (with 10000 Monte-Carlo realizations).

In Fig. 5 and Fig. 6, we evaluate the detection probability of the two compared methods as a function of the SNR and for different values of the smoothing factor (L varying from 1 to 14) in order to analyze its effect. These figures show that the detection probability of the MED method increases with the SNR and with the smoothing factor, in contrast to the energy detection method, which is not affected by L.


Fig. 6: P D vs. the smoothing factor L, for P FA = 0.1 and SNR = −20dB, using 10000 Monte-Carlo realizations.

Indeed, we can note that the P_D of the maximum eigenvalue detection method is not very sensitive to the smoothing factor for L ≥ 8.

4 Conclusion

Cognitive radio is a novel technology that optimizes spectrum usage and makes devices more autonomous. In this paper, we were interested in spectrum sensing, which is the key function in cognitive radio. We have presented both the energy and the maximum eigenvalue detectors, considering their simplicity-detection trade-off. From the simulation results, we can conclude that the energy detector (ED) gives good detection at high signal-to-noise ratios (SNR); however, when the SNR drops to lower values, the sensing performance of the ED degrades. With the maximum eigenvalue detector, the sensing performance is improved when the smoothing factor L ≥ 8, and the noise uncertainty problem encountered by the ED is solved.

Bibliography
[1] H. Semlali, A. Maali, N. Boumaaz, A. Soulmani, A. Ghammaz, and J.-F. Diouris. Spectrum sensing operation based on a real signal of FM radio: Feasibility study using a random sampling mode. Int. Conf. on Information Technology for Organizations Development (IT4OD), :1–4, 2016.
[2] A. Sahai, R. Tandra, and M. Mishra. Spectrum sensing: Fundamental limits. Draft of the book chapter in Cognitive Radios: System Design Perspective, 2009.
[3] R. Tandra and A. Sahai. Fundamental limits on detection in low SNR under noise uncertainty. Int. Conf. on Wireless Networks, Communications and Mobile Computing, 1:464–469, 2005.
[4] T. Yucek and H. Arslan. A survey of spectrum sensing algorithms for cognitive radio applications. IEEE Communications Surveys & Tutorials, 11(1):116–130, 2009.
[5] H. Urkowitz. Energy detection of unknown deterministic signals. Proceedings of the IEEE, 55:523–531, 1967.
[6] L. Claudino and T. Abrão. Spectrum sensing methods for cognitive radio networks: A review. Wireless Personal Communications, 95(4):5003–5037, August 2017.
[7] Z. Khalaf, A. Nafkha, J. Palicot, and M. Ghozzi. Low complexity enhanced hybrid spectrum sensing architectures for cognitive radio equipment. Int. Journal on Advances in Telecommunications, 3(3-4):215–227, 2010.
[8] A. Ali and W. Hamouda. Advances on spectrum sensing for cognitive radio networks: Theory and applications. IEEE Communications Surveys & Tutorials, 19(2):1277–1304, 2017.
[9] Z. Li, H. Wang, and J. Kuang. A two-step spectrum sensing scheme for cognitive radio networks. Int. Conf. on Information Science and Technology (ICIST), :694–698, 2011.
[10] Y. Zeng and Y.-C. Liang. Eigenvalue-based spectrum sensing algorithms for cognitive radio. IEEE Trans. on Communications, 57(6):1784–1793, June 2009.
[11] Y. Zeng and Y.-C. Liang. Spectrum-sensing algorithms for cognitive radio based on statistical covariances. IEEE Trans. on Vehicular Technology, 58(4):1804–1815, May 2009.
[12] Y. Zeng, C. L. Koh, and Y.-C. Liang. Maximum eigenvalue detection: Theory and application. IEEE Int. Conf. on Communications (ICC'08), :4160–4164, 2008.
[13] S. K. Sharma, S. Chatzinotas, and B. Ottersten. Maximum eigenvalue detection for spectrum sensing under correlated noise. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), :7268–7272, 2014.
[14] M. Hamid, N. Bjorsell, and S. B. Slimane. Energy and eigenvalue based combined fully blind self adapted spectrum sensing algorithm. IEEE Transactions on Vehicular Technology, 65(2):630–642, February 2016.
[15] F. F. Digham, M.-S. Alouini, and M. K. Simon. On the energy detection of unknown signals over fading channels. IEEE Int. Conf. on Communications (ICC'03), 5:3575–3579, 2003.

Biographies

Asmaa Maali was born in 1989. She received the Bachelor degree in Networks and Telecommunications from Hassan II University, Casablanca, Morocco, in 2011. In 2013, she received the Master degree in Networks and Telecommunications from Chouaib Doukkali University, El Jadida, Morocco. She is pursuing the Ph.D. degree at the Department of Applied Physics as a candidate in the Electrical Systems and Telecommunications Laboratory of the Faculty of Sciences and Technologies of Cadi Ayyad University, Marrakech, Morocco. Her current research interests include telecommunications, signal processing, random sampling and cognitive radio.


Hayat Semlali was born in Morocco, in 1985. She received the Bachelor of sciences in electronic engineering from Cadi Ayyad University, Marrakech Morocco, in 2008. In 2010, she received the Engineer Diploma in Electrical Systems and Telecommunications fromCadi Ayyad University, Marrakech Morocco. She is currently working toward the Ph.D. degree at the Department of Applied Physics, ElectricalSystems and Telecommunications Laboratory, Cadi Ayyad University of Marrakech, Morocco. Her research interest includes electronic and telecommunications. Najib Boumaaz received theB.Tech. degree in electronics from Normal High School for Professor of Technology of Mohammedia (ENSET), Morocco and the M.S from University Cadi Ayyad of Marrakech, Morocco. In 2008, he received his Ph.D. in electronic from the Polytechnic School of Nantes, France. He is currently an Assistant Professor at the High School of Technology in Safi, Morocco. His research interests include electronics, telecommunications, statistical signal processing, signal sampling and interpolation.

Abdallah Soulmani was born in 1970. He received his Engineer Diploma in Electronics from the Polytechnic School of the University of Nantes in 1994. He joined the High School of Technology of Safi (EST Safi) as an Assistant Professor where he is now Professor. In 2003, he received his Ph.D. in Electronics and Telecommunications from the University ofNantes, France. His research interests cover software radio architectures, random sampling and electronics.

N. Smaoui and H. Amari

Developing a New Method for the Detection of the Cancerous Breast Mass

Abstract: Breast cancer is now one of the main sources of death among women throughout the world. The detection of cancerous tumors in mammographic images is a great challenge for radiologists and physicians: it requires precision, experience and time. Fortunately, the evolution of science has enabled the development of medical imaging techniques which are very effective for detecting any abnormality in the breast parenchyma. It is in this context that our work fits. In this paper, we have implemented a computer-aided diagnosis (CAD) system aiming at facilitating the early detection of this cancer. The developed method proceeds in three steps: pretreatment of the mammograms to enhance the segmentation step, segmentation based on morphological operators to detect breast masses, and finally segmentation of the breast contour based on the marching squares method. Our approach has been tested on the MIAS image database, showing its efficiency. In addition, a graphical interface was developed to facilitate the task for radiologists.

Keywords: Breast cancer, morphological operator, marching squares, graphical interface.

Classification: 65C05, 62M20, 93E11, 62F15, 86A22

1 Introduction

Cancers figure among the main causes of mortality throughout the world. For women, the five most commonly diagnosed cancers are breast, lung, colorectal, cervix and stomach cancer. According to the World Health Organization, breast cancer is the major cause of death, with 521,000 cases in 2012 [20]. This cancer is particularly menacing because its exact causes are still unknown. Experts in the field of medicine claim that some risk factors, such as a person's race or age, cannot be changed, but other risk factors related to personal behaviors (smoking, drinking, ...) or cancer-causing factors in the environment increase the risk more than others. In general, the risk of breast cancer can change over time, due to things like aging or lifestyle, but family inheritance is assumed to be the main origin for the moment [19].

N. Smaoui and H. Amari: N. Smaoui: University of Sfax, National School of Engineering of Sfax, Control and Energy Management Laboratory, Sfax, Tunisia, email: [email protected], H. Amari: Higher Institute of Computer and Multimedia of Gabes, Tunisia, email: [email protected]

https://doi.org/10.1515/9783110594003-010

So, screening a patient early is important, especially if she has a family history of breast cancer. Screening proceeds through the development of science via highly effective medical imaging techniques. The best known ones are mammography, ultrasound and MRI. These methods are complementary to each other and allow their users to obtain a better diagnosis of the areas affected by cancer and promote an effective treatment for the patient.

In this article, our study focuses specifically on mammography images. In fact, radiographic images acquired from mammography equipment are one of the most often used modalities for aiding early diagnosis, owing to factors related to professional experience and cost [1, 2]. In this context, image processing aims at extracting, from the acquired images, useful information for diagnosis to ease the therapeutic decisions about the disease. To act on the scanned image, the treatment uses various tools and algorithms. Segmentation is among the necessary processes in image processing and is considered a key step [3], especially in medical imaging, and even the first step in any process to extract areas of interest. The latter are the main parts on which any treatment is focused and are considered significant objects for analysis and diagnosis. These areas are precisely the tumors in our case.

In this paper, our approach is founded on a series of morphological operators aiming at segmenting the tumor. Next, the detection of the breast contour is performed based on the marching squares algorithm. Our paper is organized as follows: Section 2 presents a state of the art; Section 3 deals with the acquisition of mammography images; Section 4 details the developed application. Finally, we conclude the paper.

2 Background

In [5], the authors identified the tumor with the region growing method after filtering. This technique is effective since it is fast and easy to implement. Its main limitation resides in the choice of the seeds and the threshold; it can result in over-segmentation or under-segmentation. Since we segment the breast tumor, it is not easy to find a seed in the tumor region automatically and proceed to growth. Thus, we need user intervention for this type of algorithm, especially when it comes to determining the breast outline [6] on mammography images.

In medical imaging, the k-means algorithm is used for the specific purpose of tumor segmentation. It has proven effective in the extraction of mammary tumors, and a good number of studies have aimed at optimizing the choice of the initial center, especially with mammography images, which are noisy [7, 8]. However, the selection of the initial center remains a recurring problem.

Another method used in tumor detection is the "level set", belonging to the family of deformable curves. This is an implicit representation of the shape of the area to be


segmented. Topologically, it is based on the geometric properties (shape) of the region of interest. The level set forms the set of all points satisfying a predefined iso-contour function updated as a function of time. The evolution of the curve is compared to the iso-contour function. Widely used in medical imaging, the level set algorithm is suitable for the detection of tumors or the delimitation of the periphery of a target organ. For breast cancer detection, it defines the tumor and also the mammary path [4], especially with mammography images presenting a blurred outline. An extension of the "level set" algorithm is the "fast marching" algorithm [9], whose principle is to grow the initial contour gradually to fit the contour of the area of interest; this seed contour is defined from a germ inside the area. The main advantage of the "level set" algorithm can be summarized in the automatic handling of topology changes of the shapes in the image. Its main limitation lies in the fact that the iso-contour function must be reconstructed each time to initialize the zero level.

In this paper, we will try to develop an automatic and simple method to isolate the breast contour and the tumor from the other parts, in order to eliminate false positives and to define and locate the tumor exactly.

3 Mammography Images

In this section, we will mainly study the concepts of mammography imaging and the acquisition of such images.

3.1 Acquisition of mammography images

Image acquisition is a major step in the field of image processing, especially in medical imaging. In medicine, the choice of sensors and acquisition conditions significantly affects image quality: the more careful the acquisition, the more reliable and efficient the subsequent treatments will be. Mammography is a breast X-ray examination used to view the inside of the breast tissue using (low-dose) X-rays, in order to detect any anomaly, such as a tumor mass, even before it is palpable or clinically detectable. To get better results, several breast shots are taken from different angles. The examination is then performed by a radiology technician [10].

3.2 Mammographic database

Mammographic databases aim to make mammogram data available to the scientific community and to contribute to the development of decision-aid algorithms and to learning about diagnosis and automatic detection of cancer cases in the medical field. Most mammographic databases are not accessible to the public [12]. The most commonly used database belongs to the Mammographic Image Analysis Society (MIAS) [11].

3.3 Artifacts in mammography

A digitized mammogram usually has two distinct regions: the exposed breast area and the unexposed region (background). Visual interpretation of mammograms is often complicated by radiopaque artifacts, which can be strongly related to the subject, complicating breast tissue segmentation and the recognition of abnormal structures [14]. There are two types of radiopaque artifacts: high-intensity strips or corners, and opaque markers. The markers are labels whose text appears at high intensity; the corners are high-intensity bands located along the edge of the mammogram.

4 Developed Method

This section presents the procedures used for mammogram masses segmentation. In Fig. 1, the block diagram of the applied algorithms is presented.

4.1 Pretreatment step

The performance of segmentation methods can be affected by various factors such as personal patient information and artifacts, which appear on the image as lead blocks. In order to improve the performance of the segmentation techniques, an artifact elimination algorithm was applied to each image.

Fig. 1: Block diagram of the computational algorithms developed for segmentation of masses using mammographic images (Digital Mammogram → Artifacts Elimination → Tumor Extraction → Extraction of Breast Outline).


Fig. 2: Application of an artifact elimination algorithm: (a) mammographic image, (b) image obtained after application of a top-hat operator, (c) image resulting from the subtraction algorithm, (d) image obtained after thresholding, (e) final image without artifacts obtained after multiplication.

This procedure employs a top-hat morphological operator based on a structuring element with a 30-pixel radius (Fig. 2(b)). The top-hat is defined as the difference between an image f and the image obtained after the application of a morphological opening based on a structuring element B of size λ. Its mathematical representation is as follows:

OP_top = max[0, f(x, y) − (f ∘ B)(x, y)]    (1)

In the next step, we performed a subtraction between the resulting and the original image (Fig. 2(c)). After that, we applied Otsu’s method for automatic thresholding on the image obtained after subtraction (Fig. 2(d)). Finally a multiplication was performed between the original image and the image obtained after thresholding, producing an image with no background artifacts (Fig. 2(e)).
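A possible implementation of this pretreatment chain with scikit-image is sketched below; the exact subtraction order, the file name in the usage comment and the use of white_tophat/threshold_otsu are our assumptions, not the authors' code.

```python
from skimage import img_as_float
from skimage.morphology import disk, white_tophat
from skimage.filters import threshold_otsu

def remove_artifacts(mammogram):
    """Pretreatment sketch: top-hat, subtraction, Otsu thresholding, masking."""
    img = img_as_float(mammogram)
    tophat = white_tophat(img, disk(30))    # Fig. 2(b): top-hat with a 30-pixel disk
    diff = img - tophat                     # Fig. 2(c): subtraction
    mask = diff > threshold_otsu(diff)      # Fig. 2(d): Otsu thresholding
    return img * mask                       # Fig. 2(e): keep the breast region only

# usage (hypothetical MIAS file name):
# clean = remove_artifacts(skimage.io.imread("mdb001.pgm", as_gray=True))
```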

4.2 Tumor Extraction

In this step, to promote the detection of the regional maxima in the image, we use two morphological techniques called "opening by reconstruction" and "closing by reconstruction" with a disk structuring element (size 20). The normal morphological opening is an erosion followed by a dilation. The erosion "shrinks" an image according to the shape of the structuring element, eliminating objects that are smaller than this shape; the dilation then "regrows" the remaining objects by the same shape. This therefore does not allow finding all the shapes in the image, as presented in Fig. 3. For that reason, we use the opening by reconstruction, which is an erosion followed by a morphological reconstruction. To use reconstruction, we first need to define a "marker" image, which is the image containing the starting locations; in our algorithm, the marker image is the output of the erosion. Next, we need to define the mask image: the flood-filling will be constrained to extend only to foreground pixels in the mask image. In our approach, we have used the original image as the reconstruction mask.



Fig. 3: Application of normal opening and closing on a mammogram image.

To recap, opening by reconstruction uses the eroded image as the marker and the original image as the mask. Subsequently, closing by reconstruction is computed using the dilated image as the marker and the original image as the mask; this can be seen as a further step in the closing stage. The closing by reconstruction can eliminate imperfections without affecting the shape, that is, it preserves the contours of the considered image. This operation allows us to detect the region of interest in the image, which is actually the tumor, as shown in Fig. 4. Finally, in order to smooth the segmented object, i.e. the tumor, we erode the image twice with a diamond structuring element. The final result is shown in Fig. 5.
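A minimal sketch of these two reconstruction-based operators, using skimage.morphology.reconstruction with the marker/mask roles described above, could look as follows; the function name is ours, and the size-20 disk from the text is interpreted here as the radius.

```python
from skimage.morphology import disk, erosion, dilation, reconstruction

def opening_closing_by_reconstruction(img, radius=20):
    """Opening then closing by reconstruction: the eroded (resp. dilated) image
    is the marker, the image itself is the mask."""
    se = disk(radius)
    opened = reconstruction(erosion(img, se), img, method='dilation')
    closed = reconstruction(dilation(opened, se), opened, method='erosion')
    return closed
```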


Fig. 4: Extraction of the Tumor: (a) Cropped image, (b) Tumor.


Fig. 5: Smoothing the extracted tumor: a) Extracted Tumor, b) Smoothed tumor.

4.3 Extraction of Breast Outline

The extraction of the breast outline is based on extracting the geometry of the iso-contour using the marching squares principle. Marching squares is a computer graphics algorithm that generates contours for a two-dimensional scalar field. Marching squares [6] takes a similar approach to the 3D marching cubes algorithm:
1. Process each cell in the grid independently.
2. Calculate a cell index using comparisons of the contour level(s) with the data values at the cell corners.
3. Use a pre-built lookup table, keyed on the cell index, to describe the output geometry for the cell.
4. Apply linear interpolation along the boundaries of the cell to calculate the exact contour position.
The application of this approach to the mammogram image allows obtaining the results presented in Fig. 6.
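As an illustration, the breast outline can be obtained with scikit-image's find_contours, which implements marching squares; the iso-level value below is a placeholder assumption, not a parameter given by the authors.

```python
import numpy as np
from skimage.measure import find_contours

def breast_outline(clean_mammogram, level=0.05):
    """Return the longest iso-contour of the (artifact-free) mammogram as the
    breast outline, computed by marching squares (skimage find_contours)."""
    contours = find_contours(np.asarray(clean_mammogram, dtype=float), level)
    return max(contours, key=len)   # (row, col) coordinates of the outline
```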

4.4 Experimental Results

In order to see the tumor as a whole, the two figures with the breast contour and the tumor were superimposed. Two examples of the final result are presented in Fig. 7.


Fig. 6: Extraction of the breast contour based on the marching squares method.


Fig. 7: Mammogram images presenting a tumor.

The developed application was applied to the MIAS database in order to be evaluated according to several criteria: the classification rate and the error rate. The classification rate (CR) represents the proportion of correctly classified examples. It is calculated as follows:

CR = 100 · TC / NT    (2)

with TC the number of correctly classified images and NT the total number of images. The error rate (ER) refers to the proportion of misclassified examples. It is given by the following equation:

ER = 100 − CR    (3)

The obtained results are presented in Tab. 1.

Tab. 1: Evaluation results

Algorithm       CR     ER
Our approach    93%    7%
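For completeness, eqs. (2) and (3) amount to the following trivial computation; the counts in the example are hypothetical.

```python
def classification_rates(tc, nt):
    """Classification rate (CR) and error rate (ER) from eqs. (2) and (3)."""
    cr = 100.0 * tc / nt
    return cr, 100.0 - cr

print(classification_rates(tc=93, nt=100))   # -> (93.0, 7.0)
```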


Tab. 2: Comparison between our approach and the k-means algorithm

Algorithm       CR       ER
k-means         92.32%   7.77%
Our approach    93%      7%

4.5 Comparative Study

To evaluate our approach, we relied on the same MIAS database to compare the simulation results. The comparison is performed with the k-means algorithm, which is widely used in mass detection. According to [18], the k-means algorithm detects tumors at an acceptable level. Tab. 2 illustrates the comparison.

4.6 Graphical interface development

To facilitate human-machine interaction through a simple and user-friendly system, and to reduce the gap between mathematical algorithms and clinical applications, we have implemented a graphical interface. After accessing our application, we load the image and segment it to obtain the result, as presented in Fig. 8.

Fig. 8: Window presenting the final result (image loading, segmented image and diagnosis result).


5 Conclusion

The detection of breast tumors in mammographic images based on morphological operators was studied. A preprocessing step was performed before the detection of the tumors in the images, and the breast contour was detected using the marching squares algorithm. We have successfully detected the breast cancer area in raw mammogram images. The results indicate that this system can facilitate the detection of breast cancer at an early stage of diagnosis. Besides, the proposed method is low cost, as it can be implemented on a general-purpose computer. The results are displayed in a graphical user interface for easy access to our application.

Bibliography [1] S. M.Dua, R. J. Gray and M. Keshtgar..Strategies for localisation of impalpable breast lesions. The Breast, :246–253, 2011. [2] P. Lovrics, S. Cornacchi, R. Vora, C. Goldsmith and K. Kahnamoui. Systematic review of radioguided surgery for non-palpable breast cancer. European Journal of Surgical Oncology, :388–397, 2011. [3] A. Khanna and M. Shrivastva. Unsupervised techniques of seg-mentation on texture images: A comparison. IEEE Int. Conf. on Signal Processing, Computing and Control (ISPCC), :1–6, 2012. [4] A. Q. Al-Faris, U. K. Ngah, N. A. Mat Isa and I. L. Shuaib. Breast MRI Tumour Segmentation using Modified Automatic SeededRegion Growing Based on Particle Swarm Optimization Image Clustering. ADFA, :1–11, 2011. [5] B. Gayathri, C. Sumathi and T. Santhanam. Breastcancer diagnosis using machine learning algorithms ASUR-VEY. Int. Journal of Distributed and Parallel Systems (IJDPS), :8, 2013. [6] R. Marti, A. Oliver, D. Raba and J. Freixenet. Breast Skin-Line Segmentation Using Contour Growing. LNCS 4478, :564–571, 2007. [7] H. M.,Moftah, A. T. Azar, A. T. I., N. Al-Shammari, G. A. E. Hassanien and M. Shoman. Adaptative k-means clustering algorithm for MR breast image segmentationt. Neuronal Compute, 2013 [8] P. M. Patel, B. N. Shah and V. Shah. Image segmentationusing K-mean clustering for finding tumor in medical application. Int. Journal of Computer Trends and Technology (IJCTT), 2013. [9] R. D. Yapa and K. Harada. Breast Skin-Line Estimation and Breast Segmentation in Mammograms using Fast-Marching Method. Int. Journal of Biological and Life Sciences, :54–62, 2007. [10] E. Wael. Segmentation itrative dimages par propagation de connaissances dans le domaine possibiliste: application à la dtection de tumeurs en imageriemammographique. Image Processing. Telecom Bretagne, Université de Bretagne Occidentale, 2012. [11] http://www.mammoimage.org/databases/ [12] M. Boukhobza and M. Mim. Détection automatique de la présence d’anomalie sur une mammographie par réseau de neurones artificiel. Laboratoire Signaux et Applications. Département Electronique. Faculté des Sciences et de la Technologie. Alégrie, 2012 [13] J. Suckling et al. The Mammographic Image Analysis Society Digital Mammogram Database. Int. Congress Series 1069:375–378, 1994.


[14] H.Ismahen. Approche morphologique pour la segmentation d’Images Médicales, Applications à la détection des Lésions. Master de l’Université Abou Bakr Belkaid, Tlemcen, Algeria, 2011. [15] J. Stawiaski. Mathematical Morphology and Graphs: Application to Interactive Medical Image Segmentation. Tèhse de doctorat. Paris. [16] C. Maple. Geometric design and space planning using the marching squares and marching cube algorithms. Geometric Modeling and Graphics, 2003. [17] A. Jemal, R. Siegel, T. Murray, J. Xu, W. Elizabeth, and T. Michael. Cancer Statistics. Cancer Journal for Clinicians, :43–66, 2007. [18] K. Rezaee, J. Haddadnia. Designing an Algorithm for Cancerous Tissue Segmentation Using Adaptive k-means Cluttering and Discrete Wavelet Transform. Journal Biomed Phys Eng., 3(3):93–104, September 2013. [19] Breast Cancer. American Cancer Society, available at: (accessed October 2016) http://www.cancer.org/acs/groups/cid/documents/webcontent/003090-pdf.pdf [20] World Health Organization, Cancer, available at: (accessed October 2016) http://www.who.int/mediacentre/factsheets/fs297/en/

Biographies Nadia Smaoui received her master’s and PHD degree in electrical engineering both from ENIS (National School of Engineering in Sfax, Tunisia) respectively in 2006 and 2010. Her research interests are mainly medical imaging. In this domain, she has an article entitled “A developed system for melanoma diagnosis” published in the International Journal of Computer Vision and Signal Processing in 2013 and a paper entitled “Designing a New Approach for the Segmentation of the Cancerous Breast Mass” presented at the International Conference SSD 2016. Halima Amari obtained her master’s degree in 2016 from the ISIMG, (Higher Institute of Computer Science and Multimedia, Gabes, Tunisia). Her work focuses on medical imaging.

R. Abdelmalek and Z. Mnasri

Prosody-based speech synthesis by unit selection for Arabic

Abstract: This work aims to develop a high-quality Arabic concatenative speech synthesis system based on unit selection. The original unit selection algorithm was modified to integrate more phonological, linguistic and contextual features in order to improve the selection cost calculation on one side, and more prosodic parameters for a more exact concatenation cost estimation on the other side. The objective and subjective assessments, based respectively on SER (Signal-to-Error Ratio) and MOS (Mean Opinion Score) tests, show satisfactory results.

Keywords: Arabic speech synthesis; unit selection; selection cost; concatenation cost.

Classification: 65C05, 62M20, 93E11, 62F15, 86A22

1 Introduction

Text-to-speech (TTS), or speech synthesis from text, has emerged over the last few decades as the outcome of the development of many disciplines such as digital signal processing, computer science and computational linguistics. Nowadays, computers, smart phones and many other home appliances can talk and even answer questions, which is very useful for handicapped, sick and/or elderly persons. Thus, the market for this application is very promising and is growing continuously. However, a few problems, mainly related to naturalness, still affect the quality of the automatically generated sounds. Therefore, a variety of techniques were developed to cope with this issue. These techniques can be classified into two main families: parametric techniques and concatenative ones [5].

In parametric speech synthesis, the production of each sound unit is represented by a set of excitation source parameters and vocal tract parameters. Parametric synthesis offers a high degree of flexibility to change the parameters according to the preferences of the designer, but the quality of the parametric synthesized speech tends to be monotonic, because the parameters associated with each unit are fixed. In concatenative speech synthesis, a huge quantity of speech segments has to be stored in the speech database, and each speech unit has many instances with varying prosodic and

R. Abdelmalek and Z. Mnasri: R. Abdelmalek and Z. Mnasri, email: [email protected], [email protected]

https://doi.org/10.1515/9783110594003-011

context situations. The quality of the synthesized speech is close to natural speech, since natural waveforms are concatenated. A few years ago, it was difficult to implement this technique, due to the insufficiency of computer speed and memory. With the progress of computer hardware, large databases can be used in concatenative speech synthesis with a reasonable computational load [6]. On the other side, parametric speech synthesis requires a high computational load, and then fast and powerful DSP chips, to comply with real-time constraints. Concatenative speech synthesis thus seems to offer a better compromise between computational load and storage memory, which is a key feature for selecting the suitable TTS technique: whilst concatenative TTS is convenient for computers and server-based applications, parametric TTS is more adapted to embedded systems such as smart phones and other appliances (cars, ovens, washing machines, door opening systems, etc.).

Since its appearance in the mid 1990s, concatenative speech synthesis based on unit selection from a large speech database has been the most appreciated, as it provides the most natural and intelligible speech. Furthermore, using small units allows the required memory size to be reduced further, which makes it suitable for mobile devices. However, it needs to improve the accuracy and to eliminate the artifacts, which are caused by intra-segment coarticulation and spectral discontinuity between adjacent units. Consequently, longer synthesis units such as the diphone, half-syllable, syllable, triphone and polyphone are appropriately incorporated to reduce the effect of spectral distortion [13]. In fact, it is important to note that the speech database has to be designed to cover as much linguistic variability as possible for a particular language or speech domain. The characterization of the database is still an important research issue. Nevertheless, it is clear that the computational cost at synthesis time grows with the size of the database [14].

This paper presents a high-quality Arabic speech synthesis system based on concatenative TTS, particularly unit selection. It is organized as follows. Section 2 explains the reasons for choosing concatenative speech synthesis instead of rule-based or parametric TTS, Section 3 details the theory behind unit selection speech synthesis and the approach implemented in this work, Section 4 presents the Arabic speech database used in this system, and Section 5 shows the results of the conducted experiments. Finally, in the conclusion, the findings are discussed and new developments are announced.

2 Speech synthesis

2.1 Speech signal processing

Speech, from the physical point of view, is an acoustic phenomenon corresponding to a vocal signal which, like every signal, has an analytic representation. It


is therefore represented by a sound wave whose general form is described by the following mathematical expression in the complex domain:

$$\dot{S}(t) = S_m\, e^{j\rho(t)} \tag{1}$$

where $S_m$ is the amplitude of the signal and $\rho$ is its phase [1]. This expression is easily obtained through the application of the Hilbert transform (HT), giving the complex analytic signal $\dot{S}(t)$ [17]:

$$\dot{S}(t) = s(t) + j\,\mathrm{HT}[s(t)] \tag{2}$$

where the Hilbert transform is given as the Cauchy principal value (pv) of the integral:

$$\mathrm{HT}[s(t)] = \mathrm{pv}\int_{-\infty}^{+\infty} \frac{s(t-\tau)}{\pi\tau}\, d\tau \tag{3}$$
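As an illustration of eqs. (1)-(3), the analytic signal can be computed numerically with SciPy's Hilbert-transform routine; the sketch below uses an arbitrary toy sine frame of our own choosing, not data from the corpus described later in this chapter.

# Minimal sketch: analytic signal, instantaneous amplitude and phase (eqs. (1)-(3)).
import numpy as np
from scipy.signal import hilbert

fs = 16000                                  # sampling rate (Hz)
t = np.arange(0, 0.02, 1.0 / fs)            # 20 ms frame
s = np.sin(2 * np.pi * 120 * t)             # toy "voiced" frame with F0 = 120 Hz

s_dot = hilbert(s)                          # analytic signal S(t) = s(t) + j HT[s(t)]
amplitude = np.abs(s_dot)                   # instantaneous amplitude S_m(t)
phase = np.unwrap(np.angle(s_dot))          # instantaneous phase rho(t)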

Since the speech signal is neither deterministic nor stationary, it is difficult to characterize it using conventional signal parameters (frequency, magnitude, initial phase, etc.). Therefore, a set of characteristic parameters is defined to describe the speech signal, including prosodic parameters such as duration, fundamental frequency, pitch and energy, and spectral parameters related to frequency analysis such as MFCC (Mel-frequency cepstral coefficients), MGC, PLP, etc.

Fundamental frequency (F0)
Fundamental frequency is the frequency of vibration of the vocal cords. It characterizes the voiced segments of speech, within which it slowly evolves over time. The fundamental frequency, denoted F0, varies from one speaker to another. It is necessary to differentiate between fundamental frequency and pitch, even though the two are often confused.

Pitch
Pitch is a subjective attribute of sound; it is the perceptual correlate of the fundamental frequency (the physical attribute of the acoustic wave). The relationship between the pitch, measured on a non-linear mel scale in the frequency domain, and the fundamental frequency F0 of a signal is given by:

$$\mathrm{Pitch}_{\mathrm{mels}} = 1127 \ln\left(1 + \frac{F_0}{700}\right) \tag{4}$$

Energy
Energy characterizes the sound intensity of a speech segment. Speech consists of a succession of voiced and unvoiced sounds whose amplitudes differ significantly. Voiced sounds are considered as quasi-periodic signals with a fundamental frequency and harmonics. They are mainly represented by the vowels (/a/, /i/, /o/, /u/), the glides (/w/, /y/) and the nasal consonants (/m/, /n/). Unvoiced sounds correspond to turbulent air flow; thus, they show fairly high-frequency, rapid variations and are generally regarded as fricative noise [15]. For the voiced parts of the speech signal, the energy spectrum shows regular frequency components (F0 and its harmonics), whereas in unvoiced regions the energy spectrum rather resembles a noise spectrum (cf. Fig. 1).
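The following sketch is our own illustration (not the chapter's analysis procedure) of how frame-level energy, together with the zero-crossing rate, can serve as a crude voiced/unvoiced cue of the kind described above; the frame and hop lengths are assumptions.

import numpy as np

def frame_energy_zcr(signal, frame_len=320, hop=160):   # 20 ms / 10 ms at 16 kHz
    """Return per-frame energy and zero-crossing rate of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(np.sum(frame ** 2))
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0)
    return np.array(energies), np.array(zcrs)

# High energy together with a low zero-crossing rate typically indicates a voiced
# frame; low energy and a high zero-crossing rate an unvoiced, noise-like one.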

2.2 Speech signal synthesis methods

Rule-based (formant) synthesis was the first computational speech synthesis technique; the synthesizer of Klatt (1980) is still the best-known formant synthesizer [1]. This approach requires knowledge of the mechanisms of speech perception and production. Formant synthesis is the most prominent example of synthesis by rules. Formants denote the maxima of the vocal tract transfer function, which correspond to its resonance frequencies. In this approach, only the formant frequencies and bandwidths need to be specified, plus an overall gain factor and the rules of evolution between phonemes. This technique assumes that if an expert in phonetics can read the spectrogram of an utterance, speech generation rules can be deduced from it. These rules describe the coarticulation of the phonemes and the temporal evolution of the formants, creating an artificial voice signal spectrum. The signal is then generated by a formant synthesizer. Speech production can be explained by source-filter modeling; it is therefore a synthesis approach that involves only signal processing, without any recorded voice sample. The main advantage of the formant synthesis technique is its minimal storage requirement (only the target values are stored). However, its main disadvantage is the degraded overall quality and intelligibility of the synthesized speech.

2.3 Parametric speech synthesis

Parametric speech synthesis means that, starting from a small database of speech segments, i.e. phonemes, syllables or words, the TTS system should be able to calculate and modify the values of the prosodic parameters, i.e. duration, fundamental frequency (F0) and intensity, of the stored segments according to the text requirements. Then, an audio synthesis filter such as the MLSA filter [4] can generate


a waveform matching the sequence of modified segments. Hence, the major issue is the accurate prediction of the prosodic parameters from text and possibly from other environment features (gender of the speaker, nature of the speech, etc.). This is generally carried out by statistical learning systems such as neural networks [9] and HMMs [7]. HTS (HTK-based speech synthesis system) is currently one of the most successful parametric TTS systems [4]. However, HTS users usually notice a humming noise and a sort of envelope-like sound. This is due to the overly smooth prediction of the F0 and MFCC parameters, which does not capture the micro-prosodic variations that characterize natural human speech.

2.3.1 Concatenative speech synthesis

This approach generates speech by concatenating pre-recorded speech units to build larger segments (syllables, words, phrases, ...). The unit size can be as large as a word or a phrase, or as small as syllables, phonemes or diphones. Large quantities of speech waveforms should be stored in the speech corpus, and each speech sound has many instances with varying contextual and prosodic situations [3]. On the other hand, concatenative speech synthesis offers a better quality in terms of intelligibility and naturalness. In fact, the text segments, whatever their level, i.e. phonemes, syllables or words, are all selected from the system's database. The larger the database, the better the speech quality. However, an optimal selection process should be used to ensure that the best fitting units are selected. This technique, called unit-selection TTS, has been developed since the mid-1990s and applied to many languages (English, Japanese, German, Polish, etc.) [2]. However, works on Arabic TTS based on unit selection are very rare [8, 11]. Besides, Arabic is a multi-dialect language, and even if standard Arabic follows the same rules in every Arabic country, the pronunciation of standard Arabic is highly influenced by the native dialect of the speaker. Therefore, we aim to design a high-quality Arabic TTS system based on unit selection, mainly for North-African users. The designed system relies on the optimal selection of units through the minimization of a selection cost and a concatenation cost; a backtracking algorithm is then applied to choose the least costly sequence.

3 Speech synthesis using unit selection

Speech synthesis using unit selection is based on the selection of the best fitting units of speech (all of the same level, either phonemes, syllables or words). The selection must obey two main criteria:
– The selected units fit the context of the target utterance.
– The speech generated by the selected sequence offers the same level of prosodic variation as the target one.

If the first criterion is not respected, the resulting segment sequence would be heterogeneous, since it does not take into account the contextual features of the target units; e.g. if the target segment is placed at the beginning of the sequence, and if it is the accented part of the word, the unit selected from the database should comply with these constraints, in order to ensure maximum smoothness for the generated speech. In the same way, ignoring the second criterion would lead to a reverberating sound, resulting from the notable difference in the prosodic parameters, mainly F0 and intensity, between the target and the selected units (cf. eqs. (9) and (10) and Tab. 1). Therefore, to minimize the risk of selecting the wrong unit in the wrong place, a cost function is calculated to help select the best fitting sequence of units.

3.1 Cost function and unit selection

The selection of good units for synthesis requires an appropriate definition of the selection and concatenation costs and an effective training of these costs. As already described, each target segment has a target intensity and pitch. From the sequence of targets, we can also determine the linguistic, phonological and contextual characteristics of the previous and following segments. Thus, each target segment and each candidate in the synthesis database is characterized by a multidimensional feature vector [15].

3.2 Elementary unit choice

3.2.1 Particularities of the Arabic language

Arabic is a Semitic language of the same family as Hebrew and Aramaic. It is now used as a first or second language by more than 300 million people in the Arab world, in addition to the Arab diaspora. Due to its syntactic and morphological properties, Arabic is considered a challenging language to master in the field of automatic language processing. The first works in automatic processing of the Arabic language began in the 1970s, including research on morphology and lexicons [18]. The phonemic system of Arabic comprises 34 phonemes: 6 vowels and 28 consonants. The short vowels are marked above and below the consonants as diacritic signs. However, these symbols are absent from the majority of written texts, since they can easily be inferred by an educated Arabic speaker. Three diacritic signs are used to represent the short vowels, in the following way:
– Fat'ha "a": represented by a short straight line above the consonant, for example "ba":

Prosody-based speech synthesis by unit selection for Arabic |

145



Kasra "i": represented by a small straight line below the consonant, for example "bi":



Dhamma "u": represented by a small hook above the consonant, for example "bu":

There is also another diacritic sign, called soukoun, which indicates that the consonant is not followed by any vowel. It is symbolized by a small ring placed above the consonant, for example "b":
In addition to these three short vowels (/a/, /i/, /u/), there are three long vowels (/a:/, /i:/, /u:/), which are represented by the graphemes "Alif", "yaa" and "waw", respectively.

The Arabic word can be decomposed into syllables. This type of segment has a very important linguistic role. The number of syllable types is restricted to six: /CV/, /CV:/, /CVC/, /CV:C/, /CVCC/ and /CV:CC/, where C denotes a consonant, V a short vowel and V: a long vowel [16].
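As a small illustration of this syllable inventory, the sketch below maps a consonant/vowel pattern onto one of the six types; the 'C'/'V'/'V:' encoding of the input is our own assumption, not a format defined in the chapter.

# Illustrative mapping from a consonant/vowel pattern to an Arabic syllable type.
SYLLABLE_TYPES = {
    ("C", "V"): "CV",
    ("C", "V:"): "CV:",
    ("C", "V", "C"): "CVC",
    ("C", "V:", "C"): "CV:C",
    ("C", "V", "C", "C"): "CVCC",
    ("C", "V:", "C", "C"): "CV:CC",
}

def syllable_type(pattern):
    """pattern: sequence such as ('C', 'V', 'C') -> 'CVC', or None if not one of the six types."""
    return SYLLABLE_TYPES.get(tuple(pattern))

# e.g. syllable_type(("C", "V:", "C")) returns "CV:C"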

3.2.2 Elementary unit choice

The speech corpus has to be large enough to cover the maximum possible number of base units. Therefore, when selecting the base unit for the development of our synthesis system, we must consider the balance between the quality of the synthesized speech and the size of the speech corpus. Another factor in the choice of the base unit is the spoken language itself. For instance, most Arabic words are built from a common root from which other words are derived, either by adding certain prefixes or suffixes, or by changing a few vowels. In other words, the structures of Arabic words are similar and generally reflect the grammatical nature of the word [3]. The third factor is the segment level (phoneme, diphone or syllable). Actually, the Arabic language has a well-defined syllable structure: Arabic syllables have a vowel core surrounded by consonants on both sides. Syllables can preserve the coarticulation effect better than diphones or phonemes. In addition, the diphone may cause temporal discontinuities at the point of articulation between a consonant and a vowel; these discontinuities are less audible between two successive syllables [5]. For these reasons, we chose to develop our technique using syllables.


3.3 Cost calculation

The first criterion is captured by the selection cost, C_s, whilst the second one is captured by the concatenation cost, C_c. Their sum gives the total cost, C_t, defined as:

$$C_s(t_i, u_i) = \sum_{j=1}^{P} w_j^s\, C_j^s(t_i, u_i) \tag{5}$$

$$C_c(u_{i-1}, u_i) = \sum_{j=1}^{Q} w_j^c\, C_j^c(u_{i-1}, u_i) \tag{6}$$

$$C_t(t_1^N, u_1^N) = \sum_{i=1}^{N} C_s(t_i, u_i) + \sum_{i=2}^{N} C_c(u_{i-1}, u_i) + C_c(S, u_1) + C_c(u_N, S) \tag{7}$$

where:
– S denotes a silence segment;
– u_{i-1}, u_i are the previous and the current candidate units;
– t_i is the current target unit;
– P is the number of features used to calculate the selection cost;
– Q is the number of prosodic parameters used to calculate the concatenation cost;
– w_j^s and w_j^c are the selection and the concatenation weights, respectively.
The weights w_j^s and w_j^c are set either manually, by assigning a specific value to each feature or class of features according to its relevance, e.g. class of the segments (nasals, fricatives, etc.), voicing, accentuation, etc., or by regression training, which is more suitable for huge databases. In our work, all weights were set to unity, since all the features upon which the selection and the concatenation costs are calculated have the same relevance (cf. Tab. 1 and Tab. 2).
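A minimal sketch of eqs. (5)-(7) is given below, assuming each unit is represented as a dictionary of features and that the per-feature sub-costs C_j^s and C_j^c are supplied by the caller as functions; all names here are illustrative, not the authors' implementation.

def selection_cost(target, candidate, sub_costs, weights=None):
    """C_s(t_i, u_i) = sum_j w_j^s * C_j^s(t_i, u_i); sub_costs maps feature name -> cost function."""
    weights = weights or {name: 1.0 for name in sub_costs}      # unit weights, as in this chapter
    return sum(weights[name] * fn(target, candidate) for name, fn in sub_costs.items())

def concatenation_cost(prev_unit, unit, sub_costs, weights=None):
    """C_c(u_{i-1}, u_i) = sum_j w_j^c * C_j^c(u_{i-1}, u_i)."""
    weights = weights or {name: 1.0 for name in sub_costs}
    return sum(weights[name] * fn(prev_unit, unit) for name, fn in sub_costs.items())

def total_cost(targets, units, sel_costs, cat_costs, silence):
    """Eq. (7): selection costs plus concatenation costs, with silence at both ends."""
    cost = sum(selection_cost(t, u, sel_costs) for t, u in zip(targets, units))
    cost += sum(concatenation_cost(units[i - 1], units[i], cat_costs) for i in range(1, len(units)))
    cost += concatenation_cost(silence, units[0], cat_costs)
    cost += concatenation_cost(units[-1], silence, cat_costs)
    return cost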

3.4 Least-cost sequence selection

Once the total cost has been calculated for the whole sequence of segments, the units giving the least total cost should be selected. This is done by backtracking, where the selected units satisfy the following criterion:

$$\{U_s\} = \arg\min_{u_1, u_2, \ldots, u_N} C_t\{(t_1, \ldots, t_N), (u_1, \ldots, u_N)\} \tag{8}$$

where (t_1, ..., t_N) is the target sequence, (u_1, ..., u_N) is a candidate sequence, and U_s is the selected sequence. Many algorithms are able to solve this problem, including the Viterbi, Bellman-Ford and Dijkstra algorithms. However, we used another approach, which consists in selecting the least costly unit at each syllable position. Actually, the sequence of syllables is known a priori, and we only have to choose a candidate sample from a reduced set


of similar syllables. That is why this algorithm is faster than searching for each candidate syllable amongst the whole set of syllables. The algorithm proceeds as follows:
1. Fix the starting unit (the least costly first syllable).
2. Calculate the total cost of the next syllable, considering only the set of similar syllables.
3. Move forward until the end of the sequence.
In this way, the total cost of the sequence is minimized, since it is the sum of the minimum costs obtained at each step.
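The greedy, syllable-by-syllable procedure above can be sketched as follows; the candidates_by_label structure, the 'label' field and the cost callables are hypothetical names, not the authors' code.

def select_sequence(targets, candidates_by_label, sel_cost_fn, cat_cost_fn, silence):
    """Greedy per-syllable selection: for each target syllable, score only the candidates
    carrying the same syllable label and keep the cheapest one.
    sel_cost_fn(t, u) and cat_cost_fn(u_prev, u) implement eqs. (5) and (6)."""
    selected = []
    prev = silence
    for target in targets:
        candidates = candidates_by_label[target["label"]]   # reduced set of similar syllables
        best = min(candidates,
                   key=lambda u: sel_cost_fn(target, u) + cat_cost_fn(prev, u))
        selected.append(best)
        prev = best
    return selected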

3.5 Concatenation of units

Once the units have been selected from the database, the signal has to be reconstructed by concatenation. The criteria used in the selection phase have shown their relevance: the selected units are concatenated using the overlap-and-add (OLA) method [10], yielding a smooth signal without abrupt transitions in either the time or the frequency domain.
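A minimal overlap-and-add sketch with a short linear cross-fade at each joint is given below; the 10 ms overlap and the cross-fade shape are our assumptions, not parameters reported in the chapter.

import numpy as np

def ola_concatenate(waveforms, fs=16000, overlap_ms=10):
    """Concatenate a list of waveform segments, cross-fading over overlap_ms at each joint."""
    n_ov = int(fs * overlap_ms / 1000)
    fade_in = np.linspace(0.0, 1.0, n_ov)
    fade_out = 1.0 - fade_in
    out = waveforms[0].astype(float)
    for w in waveforms[1:]:
        w = w.astype(float)
        out[-n_ov:] = out[-n_ov:] * fade_out + w[:n_ov] * fade_in   # cross-faded joint
        out = np.concatenate([out, w[n_ov:]])
    return out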

4 Speech database

The speech database used for this study consists of a phonetically balanced Arabic speech corpus containing 105 utterances in standard Arabic [4]. This corpus was recorded by a male speaker at a 16 kHz sampling rate with 16-bit encoding. The speech data underwent a two-step preprocessing.

4.1 Speech segmentation and labeling

The segmentation of the speech signal can be performed either manually or automatically by means of a segmentation program. The segmentation step is very important for a unit-selection synthesis system, but it can be difficult when carried out manually. In our work, the segmentation was performed manually. During this stage, the different units were identified using the temporal form of the acoustic wave of the recording, which remains the main criterion for segmentation. After completing the segmentation, all the resulting segments were listened to, and some corrections were made to the poorly perceived units. The utterances were split into syllables, which were analyzed and stored. For each syllable, the prosodic parameters, i.e. duration, F0 and intensity, were extracted; Tab. 1 lists the prosodic parameters used for this purpose, as they are needed to calculate the concatenation cost.
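As an illustration of this step, the sketch below extracts the three per-syllable prosodic parameters from a waveform segment; the autocorrelation-based F0 estimate is a simplification of our own, not the analysis tool actually used for the corpus.

import numpy as np

def syllable_parameters(segment, fs=16000, f0_min=60, f0_max=400):
    """Return (duration in s, F0 estimate in Hz, intensity in dB) for one syllable segment.
    Assumes the segment is longer than one period of f0_min (about 17 ms at 60 Hz)."""
    segment = np.asarray(segment, dtype=float)
    duration = len(segment) / fs
    intensity = 10.0 * np.log10(np.mean(segment ** 2) + 1e-12)
    # crude F0 estimate: autocorrelation peak within the plausible lag range
    ac = np.correlate(segment, segment, mode="full")[len(segment) - 1:]
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    f0 = fs / (lo + int(np.argmax(ac[lo:hi])))
    return duration, f0, intensity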

Tab. 1: Prosodic parameters used for concatenation cost

Prosodic parameter              Difference measure
Intensity                       ∆(I)
Fundamental frequency (F0)      ∆(F0)

Tab. 2: Features used for selection cost

Linguistic features:
– Modality of the sentence
– Syllable's type
– Adjacent syllables' types
– Class of the syllable's nucleus
– Class of adjacent phonemes

Phonological features:
– Number of accented syllables
– Accentuation level of the syllable

Contextual features:
– Total number of syllables
– Position of the syllable in the sentence
– Number of remaining syllables in the sentence
– Number of phonemes in the syllable
– Nucleus position in the syllable
– Nucleus position in the sentence
– Last syllable in the sentence (Yes/No)
– Nucleus is last phoneme in the syllable (Yes/No)

4.2 Text-based features extraction

To calculate the selection cost, many features are involved. They were automatically extracted from the speech corpus transcription using dedicated programs. These features can be classified into linguistic, phonological and contextual features (cf. Tab. 2). The prosodic differences used for the concatenation cost (cf. Tab. 1) are computed with the following expressions:

$$\Delta(I) = \frac{|I_{\mathrm{mean},U_i} - I_{\mathrm{mean},U_{i-1}}|}{|I_{\mathrm{mean},U_i} + I_{\mathrm{mean},U_{i-1}}|} \tag{9}$$

$$\Delta(F_0) = \frac{|F_{0,\mathrm{mean},U_i} - F_{0,\mathrm{mean},U_{i-1}}|}{|F_{0,\mathrm{mean},U_i} + F_{0,\mathrm{mean},U_{i-1}}|} \tag{10}$$
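Both differences reduce to the same normalized expression, as in the following sketch (the variable names are ours):

def normalized_delta(x_curr, x_prev):
    """|x_i - x_{i-1}| / |x_i + x_{i-1}| for the mean intensity or mean F0 of adjacent units (eqs. (9)-(10))."""
    return abs(x_curr - x_prev) / abs(x_curr + x_prev)

# delta_I  = normalized_delta(I_mean_curr, I_mean_prev)
# delta_F0 = normalized_delta(F0_mean_curr, F0_mean_prev)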

5 Experiments and results

5.1 Experimental methodology

A set of sentences was generated from text using the method described in this paper, following a three-step strategy.


5.1.1 Pre-processing
The linguistic, phonological and contextual features described in Tab. 1 and Tab. 2 were extracted and stored as the target parameters used to calculate the selection cost (cf. eq. (5)).

5.1.2 Cost calculation
Each unit having the same label as the target unit is considered. Since the database contains all the data necessary to calculate the total cost, the program moves forward until all candidate units have been processed.

5.1.3 Backtracking
Once the total costs have been calculated for all the candidate units, the least-cost sequence is retrieved using the minimal-cost algorithm described above.

5.2 Results

The unit-selection-based speech synthesis was applied to many sentences taken from the speech corpus. Our aim was to assess the performance of the developed method subjectively, by comparing original and synthesized waveforms and sounds. This evaluation consists in comparing the waveforms and the quality of the generated sounds. The superposition of the waveforms is compared visually, whereas the sound quality is assessed through formal listening tests by auditors, who are not told whether the sound is original or synthesized.

Fig. 1: Original and synthesized waveforms (intensity vs. time) of the Arabic sentence /hal laDa'tahu biqawlin/.

Fig. 2: Original and synthesized waveforms (intensity vs. time) of the Arabic sentence /saqat'at ibratun/.

Many sample speech signals were synthesized using the unit-selection algorithm described above. The syllables were taken from a phonetically balanced speech corpus (cf. Section 4). Most of the generated signals show waveforms similar to the originals, except for a few syllables whose duration is shorter than in the original utterance (cf. Fig. 1 and Fig. 2).

5.2.1 Statistical evaluation
The signal-to-error ratio (SER) is commonly used in the objective evaluation of synthesized speech [12]:

$$\mathrm{SER\ (dB)} = 10 \log_{10} \frac{\sum_{-\infty}^{+\infty} |s_i(f)|^2}{\sum_{-\infty}^{+\infty} |s_i(f) - \hat{s}_i(f)|^2} \tag{11}$$
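Assuming the original and synthesized signals are available as arrays, eq. (11) can be evaluated on their spectra as in the following sketch (the zero-padding to a common length is our assumption):

import numpy as np

def ser_db(original, synthesized):
    """Signal-to-error ratio (dB) of eq. (11), computed on the spectra of the two signals."""
    n = max(len(original), len(synthesized))
    s = np.fft.rfft(original, n)
    s_hat = np.fft.rfft(synthesized, n)
    return 10.0 * np.log10(np.sum(np.abs(s) ** 2) / np.sum(np.abs(s - s_hat) ** 2))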

Our test sample set consists of 30 speech sentences, 24 of which are affirmative and 6 interrogative. Each sentence is approximately 20 ms long. The SER results are presented in Tab. 3.

Tab. 3: Unit selection performance

Sentence type              Number of sentences    SER (dB)
Affirmative sentences      24                     4.485
Interrogative sentences    6                      3.256


Tab. 4: MOS test results

Criterion         Unsatisfactory    Satisfactory    Excellent
Intelligibility   8.3 %             41.7 %          50 %
Naturalness       12.5 %            41.6 %          45.9 %

5.2.2 MOS scores
MOS (mean opinion score) values were obtained from a survey of 10 auditors, who were asked to rate the speech quality in terms of intelligibility and naturalness, without being told whether the signal was original or synthesized (cf. Tab. 4).

6 Discussion

The evaluation tests show a remarkably good quality of the speech synthesized with the syllable as concatenation unit. This is due to several reasons. First of all, the quality of the segmentation of the speech during the construction of the database influences the quality of the synthesized speech; indeed, using syllables as elementary units, we did not find any difficulty in specifying syllable boundaries for the Arabic language. Secondly, the number of concatenation points during synthesis is low when using syllables, unlike smaller units such as phonemes or diphones, thus minimizing the distortion of the speech signal at the concatenation points. At this stage, it can be said that the use of the syllable as the basic unit offers better quality for Arabic speech synthesis by unit selection. The number of possible syllables in Arabic is relatively high (more than one thousand), but with the growth of embedded hardware capabilities and online processing, this should not be a serious problem.

7 Conclusion

In this paper, a novel algorithm for Arabic speech synthesis using unit selection was presented. The adopted unit was the syllable. The implementation of this algorithm was based on the calculation of the selection cost and the concatenation cost. To calculate the selection cost, a set of phonological, linguistic and contextual features was extracted and processed for each candidate unit, whereas the concatenation cost depends on the mean difference of the prosodic parameters between adjacent units. Finally, the least costly sequence is kept. Many Arabic speech utterances were tested, giving satisfactory results both in objective evaluation, through SER measures, and in subjective assessment by means of MOS tests. Finally, this work could be continued to include more contextual situations and emotion-sensitive speech synthesis.


Bibliography
[1] D. H. Klatt. Synthesis by rule of segmental durations in English sentences. In B. Lindblom & S. Öhman (Eds.), Frontiers of Speech Communication Research. London: Academic Press, 1979.
[2] A. J. Hunt and A. W. Black. Unit selection in a concatenative speech synthesis system using a large speech database. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP-96), 1:373–376, 1996.
[3] A. Hunt and A. Black. Unit selection in a concatenative speech synthesis system using a large speech database. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP-96), 1:373–376, 1996.
[4] M. Boudraa, B. Boudraa and B. Guerin. Twenty lists of ten Arabic sentences for assessment. Acta Acustica united with Acustica, 86(5):870–882, 2000.
[5] W. Hamza and M. Rashwan. Concatenative Arabic speech synthesis using large database. ICSLP-2000, 2:182–185, Beijing, China, 2000.
[6] C. Henton. Challenges and rewards in using parametric or concatenative speech synthesis. Int. Journal of Speech Technology, 5(2):117–131, 2002.
[7] K. Tokuda, H. Zen and A. W. Black. An HMM-based speech synthesis system applied to English. IEEE Workshop on Speech Synthesis, 227–230, 2002.
[8] M. Elshafei, H. Al-Muhtaseb and M. Al-Ghamdi. Techniques for high quality Arabic speech synthesis. Information Sciences, 140(3):255–267, 2002.
[9] H. Mixdorff and O. Jokisch. Evaluating the quality of an integrated model of German prosody. Int. Journal of Speech Technology, 6(1):45–55, 2003.
[10] H. Kawai and T. Toda. An evaluation of automatic phoneme segmentation for concatenative speech synthesis. IEICE Tech. Rep., Shiga, January 2003 (in Japanese).
[11] Y. Hifny et al. ArabTalk®: An implementation for an Arabic text-to-speech system. 4th Conf. on Language Engineering, 2004.
[12] G. T. Beauregard, X. Zhu and L. Wyse. An efficient algorithm for real-time spectrogram inversion. 8th Int. Conf. on Digital Audio Effects (DAFX-05), 116–121, September 2005.
[13] D. Bigorgne, O. Boeffard, B. Cherbonnel, F. Emerard, D. Larreur, J. L. Le Saint-Milon, I. Metayer, C. Sorin and S. White. Multilingual PSOLA text-to-speech system. ICASSP, II:187–190, Minneapolis, Minnesota, USA, 1993.
[14] A. W. Black. Perfect synthesis for all of the people all of the time. IEEE TTS Workshop, Santa Monica, 2002.
[15] D. Alsteris and K. Paliwal. Iterative reconstruction of speech from short-time Fourier transform phase and magnitude spectra. Computer Speech & Language, 21(1):174–186, 2007.
[16] Z. Mnasri. Conception et réalisation d'un générateur de prosodie de parole arabe par réseaux de neurones. Application à la synthèse vocale. PhD Thesis, Ecole Nationale d'Ingénieurs de Tunis, Université Tunis El Manar, Tunisia, 2011.
[17] B. Boashash. Estimating and interpreting the instantaneous frequency of a signal. Part I: Fundamentals. Proceedings of the IEEE, 80(4):520–538, 1992.
[18] S. H. Al-Ani. Arabic Phonology: An Acoustical and Physiological Investigation. Walter de Gruyter, Series: Janua Linguarum, Series Practica 61, 2014.


Biographies

Raja Abdelmalek was born on November 30th, 1990. She obtained the B.Sc. degree in electrical engineering and the M.Sc. degree in automation and DSP from the Ecole Nationale d'Ingénieurs de Tunis (ENIT) in 2014 and 2015, respectively. In 2015, she started her Ph.D. research at the Laboratory of Signal, Image and Technology of Information (LSITI) at ENIT, on the minimal conditions for signal reconstruction. She has been working as a part-time instructor at the Institut Supérieur des Etudes Technologiques (ISET), teaching embedded systems and electronics, and teaching digital signal processing at ENIT.

Zied Mnasri obtained the B.Sc. degree in electrical engineering and the M.Sc. degree in automation and DSP from the Ecole Nationale d'Ingénieurs de Tunis (ENIT) in 2003 and 2004, respectively. After a four-year experience in industry, he pursued his Ph.D. in electrical engineering from September 2007 to February 2011. He has been working as a tenured assistant professor at ENIT since September 2011, teaching digital signal processing, embedded systems, and digital and analog electronics. He is also an active member of the Laboratory of Signal, Image and Technology of Information (LSITI) at ENIT. His research interests are related to speech processing, including speech synthesis, prosody modeling and the statistical learning techniques used in this field, especially deep neural networks (DNN) and hidden Markov models (HMM).