Speech and Language Technologies for Low-Resource Languages - First International Conference, SPELLL 2022, Kalavakkam, India, November 23–25, 2022, Proceedings [1 ed.] 9783031332302, 9783031332319

This book constitutes refereed proceedings from the First International Conference on Speech and Language Technologies f

English Pages XIII, 356 [362] Year 2023

Author / Uploaded
Anand Kumar M
Bharathi Raja Chakravarthi
Bharathi B
Colm O’Riordan
Hema Murthy
Thenmozhi Durairaj
Thomas Mandl

Categories
Linguistics

Table of contents:
Preface
Organization
Contents
Language Resources
KanSan: Kannada-Sanskrit Parallel Corpus Construction for Machine Translation
1 Introduction
2 Related Work
3 Kannada-Sanskrit Parallel Corpus Construction
4 Baseline Machine Translation Models
4.1 Preprocessing
4.2 Subword Tokenization
4.3 Statistical Machine Translation
4.4 Neural Machine Translation
4.5 Back-Translation
5 Experimental Setup and Results
6 Conclusion and Future Work
References
A Parsing Tool for Short Linguistic Constructions
1 Introduction
2 Related Work
3 Proposed IL TAG Parser
4 Customized TAG Grammar
5 Parsing Example for Short Linguistic Notation
6 Performance Analysis
7 Conclusion
References
TamilEmo: Fine-grained Emotion Detection Dataset for Tamil
1 Introduction
2 Related Work
2.1 Datasets for Emotion Detection
2.2 Emotion Classification
3 Tamil Emotion Dataset
3.1 Scraping Raw Data
3.2 Annotator Statistics
3.3 Inter-Annotator Agreement
3.4 Selecting and Curating YouTube Comments
4 Data Analysis
4.1 Keywords
5 Modeling
5.1 Data Preparation
5.2 Baseline Experiments
5.3 Experiment Settings
5.4 Results and Discussion
6 Conclusion
A Emotion Definitions
References
Context Sensitive Tamil Language Spellchecker Using RoBERTa
1 Introduction
2 Related Works
3 Model
3.1 Dictionary Creation
3.2 Test Dataset Creation
3.3 XLM-RoBERTa-base Model
4 Error Analysis
5 Experiments
5.1 Experiment: XLM-RoBERTa-base Model on Wikipedia Article
6 Comparison with Tamil Spellcheckers
7 Conclusion
References
Correlating Copula Constructions in Tamil and English for Machine Translation
1 Introduction
2 Zero Copula Construction
3 Copula Construction with 'aaku' as the Copula Verb
4 Copula Construction with 'iru' as the Copula Verb
5 Copula Construction with 'aaka-Iru' as the Copula Verb
6 Copula Construction as Embedded Sentences
7 Problematic Cases
8 Conclusion
References
Language Technologies
Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope
1 Introduction
2 Risks and Challenges in the Technological Development of Tamil
3 Language Resources: Data, Knowledge Base, and Resources
3.1 Text Corpora
3.2 Corpora for Speech
3.3 Parallel Corpora
3.4 Lexical Resources
3.5 Grammars
4 Language Technology: Tools, Grammatical Technologies and Applications
4.1 Word Segmentation or Tokenization
4.2 Stemming and Lemmatization
4.3 Morphological Analysis
4.4 Morphological Generation
4.5 Part-of-speech Tagging
4.6 Chunking
4.7 Named Entity Recognition (NER)
4.8 Shallow Parsing
4.9 Syntactic Parsing
4.10 Identification of the Clause Boundary
4.11 Speech Recognition
4.12 Speech Synthesis
4.13 Optical Character Recognition
5 Semantic Analysis
5.1 Word Sense Disambiguation
5.2 Question Answering
5.3 Relationship Extraction
5.4 Paraphrase Identification
5.5 Automatic Text Summarization
5.6 Co-reference Resolution
5.7 Text Generation
5.8 Machine Translation
6 Social Media Text Analysis
6.1 Sentiment Analysis
6.2 Offensive Content Identification
7 Conclusion and Future Scope
References
Contextualized Embeddings from Transformers for Sentiment Analysis on Code-Mixed Hinglish Data: An Expanded Approach with Explainable Artificial Intelligence
1 Introduction
2 Motivation
3 Related Work
3.1 Pre-trained Language Models
3.2 Sentiment Analysis on Hinglish
3.3 LIME
4 Methodology
4.1 Dataset
4.2 Experiment
4.3 Evaluation Metrics
5 Results and Analysis
5.1 Empirical Results
5.2 Statistical Testing
5.3 Explainability
6 Discussion
7 Conclusion and Future Work
References
Transformer Based Hope Speech Comment Classification in Code-Mixed Text
1 Introduction
2 Related Work
3 Dataset Description
4 Methodology
4.1 Feature Extraction
4.2 Machine Learning Model
4.3 Deep Learning Models
4.4 Approaches
5 Results and Evaluation
5.1 Results
5.2 Evaluations
6 Conclusion
References
Paraphrase Detection in Indian Languages Using Deep Learning
1 Introduction
2 Literature Survey
3 Methodology
3.1 System Architecture
3.2 System Description
3.3 Dataset
4 Implementation
4.1 BERT
4.2 Seq2Seq
4.3 USE
4.4 Ensembled Model
5 Experimental Results
5.1 Prediction Results
5.2 Task 1
5.3 Task 2
5.4 Comparison of Algorithms
5.5 Performance Comparison
5.6 Error Analysis
6 Conclusion
References
Opinion Classification on Code-mixed Tamil Language
1 Introduction
1.1 Sentiment Analysis
1.2 Sentiment Analysis on Code-mixed Language
2 Literature Review
2.1 Sentiment Analysis on Mono Lingual Data
2.2 Sentiment Analysis on Code-Mixed Data
3 Methodology
3.1 Data Description
3.2 System Design
3.3 Data Pre-processing
3.4 Word Embedding
3.5 Machine Learning Approach
4 Performance Evaluation
4.1 Evaluation Metrics
5 Results and Discussions
6 Conclusion
References
Analyzing Tamil News Tweets in the Context of Topic Identification
1 Introduction
2 Related Work
3 Corpus Creation
3.1 Extracting Tweets
3.2 Generating Labels Through Keyword-Based Distant Supervision
4 Corpus Analysis
5 Experiments and Results
5.1 Experiments
5.2 Results
6 Limitations and Future Work
7 Conclusion
References
Textual Entailment Recognition with Semantic Features from Empirical Text Representation
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Empirical Text Representation
3.2 Feature Extraction of Text-Hypothesis Pair
4 Experiments Results
4.1 Dataset
4.2 Experimental Settings
4.3 Performance Analysis of Entailment Recognition
4.4 Comparative Analysis
5 Conclusion with Future Direction
References
Impact of Transformers on Multilingual Fake News Detection for Tamil and Malayalam
1 Introduction
1.1 Motivation and Contribution
2 Multilingual Fake News Dataset Description
3 Methodology
3.1 Experimental Setup
4 Results and Discussions
5 Conclusion and Future Works
References
Development of Multi-lingual Models for Detecting Hope Speech Texts from Social Media Comments
1 Introduction
2 Literature Survey
3 Proposed Methodology
3.1 Preprocessing
3.2 Model Construction
4 Experimental Settings, Results and Findings
4.1 Experimental Results
4.2 Findings and Discussions
5 Conclusion and Future Work
References
Transfer Learning Based YouTube Toxic Comments Identification
1 Introduction
2 Related Work
3 Proposed System
3.1 Models
3.2 Classifiers
4 Performance Evaluation
5 Conclusion
References
Contextual Analysis of Tamil Proverbs for Automatic Meaning Extraction
1 Introduction
2 Background
3 Related Works
3.1 Works Related to Tamil Language and Literature
3.2 Works Related to Sentence Scoring Approach
4 Proposed Work
4.1 Dataset Creation
4.2 Meaning Extraction for Tamil Proverbs
5 Result and Discussion
6 Conclusions
References
Question Answering System for Tamil Using Deep Learning
1 Introduction
2 Related Works
3 System Architecture
3.1 Modules
4 Experiment and Results
4.1 Datasets
4.2 Models Used
4.3 BERT
4.4 XLM-RoBERTa
4.5 Results
5 Conclusion
References
Exploring the Opportunities and Challenges in Contributing to Tamil Wikimedia
1 Wikimedia Project: An Introduction
2 Tamil Wikimedia Project
2.1 Veteran Tamil Wikimedia Contributors
2.2 Being Part of Tamil Wikimedia
3 Opportunities in Tamil Wikimedia
4 Challenges in Tamil Wikimedia
5 Conclusion and Future Scope
References
Speech Technologies
Early Alzheimer Detection Through Speech Analysis and Vision Transformer Approach
1 Introduction
2 Related Work
3 Proposed Vision Transformer Approach for Alzheimer Detection
3.1 Log Mel Spectrogram
3.2 Vision Transformer Deep Learning Model
4 MFCC and Random Forest Approach for Alzheimer Detection
4.1 MFCC
5 Experimental Analysis
5.1 Experimental Setup
5.2 Experimental Analysis
5.3 Metrics
5.4 Experimental Results
6 Conclusion
References
Multimodal Data Analysis
Active Contour Segmentation and Deep Learning Based Hand Gesture Recognition System for Deaf and Dumb People
1 Introduction
2 Related Works
3 Proposed System
3.1 Image Segmentation
3.2 Deep Learning Based Hand Gesture Recognition Model
4 Implementation
5 Result and Discussions
6 Conclusion
Appendix 1
References
Multimodal Hate Speech Detection from Bengali Memes and Texts
1 Introduction
2 Related Work
3 Methods
3.1 Data Preprocessing
3.2 Neural Word Embeddings
3.3 Training of DNN Baseline Models
3.4 Training of Transformer-Based Models
3.5 Multimodal Fusion and Classification
4 Experiment Results
4.1 Datasets
4.2 Experiment Setup
4.3 Analysis of Hate Speech Detection
5 Conclusion
References
Workshop 1: Fake News Detection in Low-Resource Languages (Regional-Fake)
A Novel Dataset for Fake News Detection in Tamil Regional Language
1 Introduction
2 Related Work
3 Proposed Work
3.1 Data Scraping
3.2 Real News Data Collection
3.3 Fake News Data Collection
3.4 Challenges in Data Collection
3.5 Data Cleansing
3.6 Exploratory Data Analysis (EDA)
3.7 Corpus Statistics
4 Benchmark Models
4.1 Data Representation
4.2 Classifiers
4.3 Results
5 Conclusion
References
Fake News Detection in Low-Resource Languages
1 Introduction
2 Related Work
3 Fake News Dataset
4 Methodologies Used
4.1 Logistic Regression
4.2 BERT-Base Model
5 Implementation
6 Result and Analysis
7 Conclusion
References
Workshop 2: Low Resource Cross-Domain, Cross-Lingual and Cross-Modal Offensive Content Analysis (LC4)
MMOD-MEME: A Dataset for Multimodal Face Emotion Recognition on Code-Mixed Tamil Memes
1 Introduction
2 Related Works
3 Dataset Collection
4 Details of Dataset Construction
5 Dataset Analysis
6 Conclusion
References
End-to-End Unified Accented Acoustic Model for Malayalam-A Low Resourced Language
1 Introduction
2 Related Work
3 Proposed Methodology and Design
3.1 Dataset Construction
3.2 Feature Engineering
3.3 Building the Accented ASR System
4 Experimental Results
5 Conclusion and Future Scope
References
Author Index
