Computational Intelligence in Software Modeling 9783110709247, 9783110705430

Researchers, academicians, and professionals present in this book their research on the application of intelligent computing.


English | 216 pages | 2022





Table of contents :
Preface
Acknowledgment
Contents
Editors’ profile
Revolutionary transformations in twentieth century: making AI-assisted software development
Useful techniques and applications of computational intelligence
Machine learning-based attribute value search technique software component retrieval
Use of fuzzy logic approach for software quality evaluation in Agile software development environment
Metaheuristics for empirical software measurements
Use of genetic algorithms in software testing models
Insights into DevOps automation tools employed at different stages of software development
Study of computational techniques to deal with ambiguity in SRS documents
Utilization of images in an open source software to detect COVID-19
Designing of a tool for comparing and analyzing different test suites of open source software
Decision tree–based improved software fault prediction: a computational intelligence approach
Performance analysis for SDN POX and open daylight controller using network emulator Mininet under DDoS attack
Index


Vishal Jain, Jyotir Moy Chatterjee, Ankita Bansal, Utku Kose, Abha Jain (Eds.) Computational Intelligence in Software Modeling

De Gruyter Frontiers in Computational Intelligence

Edited by Siddhartha Bhattacharyya

Volume 13

Computational Intelligence in Software Modeling Edited by Vishal Jain, Jyotir Moy Chatterjee, Ankita Bansal, Utku Kose, Abha Jain

Editors

Dr. Vishal Jain
Sharda University, Knowledge Park III, 32, 34 APJ Abdul Kalam Rd, Greater Noida 201310, Uttar Pradesh, India
[email protected]

Jyotir Moy Chatterjee
Lord Buddha Education Foundation, Kathmandu 44600, Nepal
[email protected]

Dr. Ankita Bansal
Netaji Subhas University of Technology (NSUT), Azad Hind Fauj Marg, New Delhi 110078, India
[email protected]

Dr. Utku Kose
Faculty of Engineering, Department of Computer Engineering, Süleyman Demirel University, West Campus, 32260 Isparta, Turkey
[email protected]

Dr. Abha Jain
Deen Dayal Upadhyaya College, University of Delhi, Dwarka, Sector-3, New Delhi 110078, India
[email protected]

ISBN 978-3-11-070543-0 e-ISBN (PDF) 978-3-11-070924-7 e-ISBN (EPUB) 978-3-11-070934-6 ISSN 2512-8868 Library of Congress Control Number: 2021948048 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2022 Walter de Gruyter GmbH, Berlin/Boston Cover image: shulz/E+/getty images Typesetting: Integra Software Services Pvt. Ltd. Printing and binding: CPI books GmbH, Leck www.degruyter.com

Preface

In the present era, software is used in every sphere of our lives; we cannot imagine life without it. Consumers demand bug-free, good-quality software with high reliability. Before it can be deployed or released to consumers, software passes through a life cycle known as the software development life cycle, which consists of a number of steps starting from requirement elicitation, analysis, and documentation, followed by design and finally testing. In each of these phases, a number of activities or tasks are identified which, when conducted efficiently, lead to good-quality software. However, ever-changing customer requirements and the complex nature of software make these tasks challenging, costly, and error-prone. To overcome these challenges, we need to explore computational intelligence techniques for different software engineering tasks. Computational techniques such as optimization techniques, metaheuristic algorithms, and machine learning approaches constitute different types of intelligent behavior. Industries are using these intelligent techniques and have been successful in addressing various software engineering problems, for example, code generation, code recommendation, and bug fixing and repair. In particular, artificial intelligence (AI) can be leveraged to enhance software quality assurance: it can accelerate manual testing and remove the human errors that manual testing may introduce. The use of computational techniques leads to good-quality software in terms of high reliability and efficiency, which keeps consumers satisfied. On the other hand, if customers are not satisfied with the performance of software, they will probably not use it, which in turn wastes the large amount of resources the software organization spent on it. Thus, we should employ computational intelligence techniques at every phase of the software development life cycle to develop good-quality software.

The aim of this book is to focus on the application of computational intelligence techniques in the domain of software engineering. Researchers and academicians have contributed theoretical research articles and practical applications in the area of software engineering and intelligent techniques. This book will be an asset for researchers working on computational intelligence techniques in the field of software engineering; since it provides deep insight into the topic from diversified sources, beginning and intermediate researchers in this area will also find it highly beneficial.

Chapter 1 focuses on each phase of the software development life cycle along with the AI conjunction. Chapter 2 provides a review of computational intelligence methods and their uses, especially the following four: neural networks, swarm intelligence, artificial immune systems, and fuzzy systems. Chapter 3 discusses the storing and retrieval of software components.

https://doi.org/10.1515/9783110709247-202


Chapter 4 focuses on the approach of using soft computing techniques for software quality evaluation and also describes the major software quality evaluation parameters in agile methods. Chapter 5 proposes a novel metaheuristic technique for feature selection and brings more clarity for devising new methods; the chapter concludes with special remarks on future scope. Chapter 6 presents the applications of genetic algorithms in two major software testing models: white-box testing and black-box testing. Chapter 7 reviews the different automation tools adopted by DevOps at every stage of software development. Chapter 8 discusses some of the noteworthy works done by researchers to provide unambiguous software requirements specifications expressed in natural languages, using inspection techniques, checklists, controlled languages, and natural language processing tools. Chapter 9 compares convolutional neural network and support vector machine models using different performance measures. Chapter 10 explores the working of compilers and interpreters to create a tool that can compile various codes and report the total number of effective lines executed. Chapter 11 investigates the applications of computational intelligence in optimizing various phases of software development; the application of decision tree regression for improving fault percentage prediction in different scenarios is the main contribution of this chapter. Chapter 12 discusses the performance analysis of the SDN POX and OpenDaylight controllers using the network emulator Mininet under a distributed denial-of-service attack.

Acknowledgment

I would like to acknowledge the most important people in my life: my late grandfather Shri Gopal Chatterjee, late grandmother Smt. Subhankori Chatterjee, late mother Nomita Chatterjee, uncle Shri Moni Moy Chatterjee, and father Shri Aloke Moy Chatterjee. This book has been my long-cherished dream, which would not have turned into reality without the support and love of these amazing people. They have continuously encouraged me despite my failure to give them the proper time and attention. I am also grateful to my friends, who have encouraged and blessed this work with their unconditional love and patience.

Jyotir Moy Chatterjee
Department of IT, Lord Buddha Education Foundation (Asia Pacific University of Technology & Innovation), Kathmandu 44600, Nepal

https://doi.org/10.1515/9783110709247-203

Contents

Preface
Acknowledgment
Editors’ profile

Binayak Parashar, Inderjeet Kaur, Anupama Sharma, Pratima Singh, Deepti Mishra
Revolutionary transformations in twentieth century: making AI-assisted software development

Sujit Kumar, M. Balamurugan, Vikramaditya Dave, Bhivraj Suthar
Useful techniques and applications of computational intelligence

Dimple Nagpal, S. N. Panda, Priyadarshini A. Pattanaik
Machine learning-based attribute value search technique software component retrieval

Ashish Agrawal, Anju Khandelwal, Shruti Agarwal
Use of fuzzy logic approach for software quality evaluation in Agile software development environment

Somya Goyal
Metaheuristics for empirical software measurements

Srilakshmi Venugopal, Tarun Praveen Purohit, Ajay Sudhir Bale, Suhaas Veera Raghavan Reddy
Use of genetic algorithms in software testing models

Poonam Narang, Pooja Mittal, Preeti Gulia, Balkrishan
Insights into DevOps automation tools employed at different stages of software development

Mohd Shahid Husain
Study of computational techniques to deal with ambiguity in SRS documents

Ankita Bansal, Abha Jain
Utilization of images in an open source software to detect COVID-19

Abha Jain, Ankita Bansal
Designing of a tool for comparing and analyzing different test suites of open source software

Palak, Preeti Gulia
Decision tree–based improved software fault prediction: a computational intelligence approach

Sameer Ali, Muhammad Reazul Haque, Kashif Nisar, Ramani Kannan, Tanveer Ahmed Khan, Khalil-Ur-Rehman Shaikh, Basit Ali
Performance analysis for SDN POX and open daylight controller using network emulator Mininet under DDoS attack

Index

Editors’ profile

Vishal Jain, PhD, is an associate professor in the Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida, Uttar Pradesh, India. Before that, he worked for several years as an associate professor at Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi. He has more than 15 years of experience in academics. He has more than 400 research citations on Google Scholar (h-index 10 and i10-index 16). He has authored more than 85 research papers in reputed conferences and journals, including the Web of Science and Scopus. He has authored and edited more than 15 books with various reputed publishers, including Elsevier, Springer, De Gruyter, Apple Academic Press, CRC/Taylor and Francis Group, Scrivener, Wiley, Emerald, and IGI Global. His research areas include information retrieval, the semantic web, ontology engineering, data mining, ad hoc networks, and sensor networks. He received the Young Active Member Award for 2012–2013 from the Computer Society of India, the Best Faculty Award for 2017, and the Best Researcher Award for 2019 from BVICAM, New Delhi. He holds a PhD (CSE), MTech (CSE), MBA (HR), MCA, MCP, and CCNA.

Jyotir Moy Chatterjee is an assistant professor in the Information Technology Department at Lord Buddha Education Foundation (LBEF), Kathmandu, Nepal. Prior to joining LBEF, he worked as an assistant professor in the Computer Science and Engineering Department at GD Rungta College of Engineering and Technology (CSVTU), Bhilai, India. He received his MTech in computer science and engineering from Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha, and a BTech in computer science and engineering from Dr MGR Educational and Research Institute, Chennai. His research interests include cloud computing, big data, privacy preservation, data mining, the internet of things, machine learning, and blockchain technology. He is a member of various national and international professional societies, including STRA, IFERP, ASR, IRSS, IAA, MEACSE, MIAE, IRED, IAOIP, ICSES, SDIWC, ISRD, IS, SEI, IARA, and CCRI. He serves as an editorial board member of various reputed IGI Global journals and as a reviewer for various reputed journals and international conferences of Elsevier, Springer, and IEEE.

Ankita Bansal is an assistant professor at Netaji Subhas University of Technology (NSUT), Delhi, India. Prior to joining NSUT, Dr Bansal worked as a full-time research scholar at Delhi Technological University (DTU) (formerly Delhi College of Engineering). She received her master’s and doctoral degrees in computer science from DTU. Her research interests include software quality, soft computing, database management, machine learning, and meta-heuristic models.

Abha Jain is an assistant professor at Deen Dayal Upadhyaya College, Delhi University, India. Prior to joining the college, she worked as a full-time research scholar and received a doctoral research fellowship from DTU. She received her master’s and doctoral degrees in software engineering from DTU. Her research interests include data mining, software quality, and statistical and machine learning models. She has published papers in international journals and conferences.

Utku Kose received his BS in computer education in 2008 from Gazi University, Turkey, as a faculty valedictorian. He received his MS in 2010 from Afyon Kocatepe University, Turkey, in the field of computer, and his DS/PhD in 2017 from Selcuk University, Turkey, in the field of computer engineering. Between 2009 and 2011, he worked as a research assistant at Afyon Kocatepe University. He then worked as a lecturer and vice director of the Vocational School at Afyon Kocatepe University between 2011 and 2012, as a lecturer and research center director at Usak University between 2012 and 2017, and as an assistant professor at Suleyman Demirel University between 2017 and 2019. Currently, he is an associate professor at Suleyman Demirel University, Turkey.

https://doi.org/10.1515/9783110709247-205


He has more than 100 publications, including articles, authored and edited books, proceedings, and reports. He is also on the editorial boards of many scientific journals and serves as one of the editors of the Biomedical and Robotics Healthcare book series by CRC Press. His research interests include artificial intelligence, machine ethics, artificial intelligence safety, optimization, chaos theory, distance education, e-learning, computer education, and computer science.

Binayak Parashar, Inderjeet Kaur, Anupama Sharma, Pratima Singh, Deepti Mishra

Revolutionary transformations in twentieth century: making AI-assisted software development

Abstract: Artificial intelligence (AI) has a massive impact on all industries and digital advances, and software development is at the forefront of receiving this influence. According to industry experts, AI and machine learning technologies are anticipated to enhance every area of the software development life cycle (SDLC). AI and software engineering are two disciplines that have developed independently, one after the other. AI techniques strive to construct software systems that exhibit some sort of human intelligence. Software inspections have been used successfully to detect faults in various types of software documents, such as specifications. This study broadly covers the incorporation of human intelligence in software, training it to think and analyze like humans. Nowadays, AI assists practitioners in a variety of ways, from project timetable prediction to software delivery estimation, bug resolution, coding, and testing. This chapter focuses on each phase of the SDLC along with the AI conjunction.

Keywords: AI, requirement engineering, SDLC, software design, software testing, distributed artificial intelligence (DAI), agent-oriented software engineering (AOSE)

Binayak Parashar, Assistant Professor, Department of CSE, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, India, e-mail: [email protected]
Inderjeet Kaur, Professor, Department of CSE, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, India, e-mail: [email protected]
Anupama Sharma, Associate Professor, Department of IT, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, India, e-mail: [email protected]
Pratima Singh, Associate Professor, Department of CSE, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, India, e-mail: [email protected]
Deepti Mishra, Associate Professor, Department of CS, Norwegian University of Science and Technology, Norway, e-mail: [email protected]
https://doi.org/10.1515/9783110709247-001

1 Introduction

Artificial intelligence (AI) has a tremendous impact on retail, banking, healthcare, and a variety of other businesses all around the world, and the AI market is anticipated to be worth $60 million worldwide by 2025 [1]. Software is at the core of all innovations in our daily lives. Software development technologies have seen a significant transition in recent years, with the goal of improving human lives. The impact of AI on software development brings effective improvements in precision, speed, and efficiency across the entire software development life cycle (SDLC). By focusing on design and feature development using AI, customization becomes feasible for software development services at large scale, enabling error-free applications to be delivered on time. Software organizations use the most cutting-edge technologies for software development. AI in software development is expected to significantly improve problem identification, plan documentation, software prototyping, the code development stage, automated testing, and the software product as a whole. To make software development easier, faster, and more reliable, the most effective AI tools are used [13]. Various business intelligence (BI) tools, or machine learning (ML), assist decision-makers in problem identification. A BI dashboard provides access to business data, shown as key performance indicator cards with performance values for various business indicators. Alternatively, using ML, the algorithm analyzes the data and provides insights for problem identification [5]. Figure 1 shows the ML life cycle to illustrate the use of AI in various phases [23].

Figure 1: AI in various phases.

ML can be used to automate requirement gathering based on cost, resources, and so on. In the first step, the machine is fed with input data; in the next step, the system works on the remaining portion and projects the value for each benchmark [12, 13]. Resources are allocated in accordance with analysis of employee behavior and abilities, and a timeline is prepared for every resource based on the skills and past record with the company [15]. A software prototype is produced after the plan has been authorized by the relevant team in accordance with software development principles. During software development, the conceptual model is considered the first phase, and the software planning is constructed during the next stage. The output of the first phase is further examined against the predetermined calculated values, and the final output is forecast after comparing the results of the software prototype with historical data [3, 16]. The coding structure is created for all the required functionalities in a programming environment with ML capabilities; programs like Kite and Codota provide redundancy-check platforms [17]. Based on historical analysis and prior projects, the testing team and developers feed the computer with problems or faults. The machine learns the pattern, recognizes these faults, and corrects them automatically or provides autosuggestions so that the workload is reduced. Using ML on the deployment status, customer feedback is gathered and analyzed to provide new suggestions for developers, and the code can be further optimized with recommended human participation, known as hybrid intelligence [17]. During the maintenance phase, the software company aids the customer with the product application, provides frequent upgrades, and makes additional adjustments based on the client's needs. In this phase, AI helps to identify and direct software users based on their query patterns. This chapter describes how AI aids software development and the best AI tools for applying this intelligent technology in all SDLC phases.

2 The significance of AI

Today, organizations are becoming increasingly interested in AI technologies, and it is foreseen that AI will undoubtedly affect the future of software development. According to current estimates, almost 85% of organizations are investing in AI, and 45% of digitized companies are developing AI plans. AI tools are expected to provide almost $3 trillion in corporate value in the future. The key significance of AI in software development is to provide personalized products or services to customers across various applications [1, 16]. Figure 2 illustrates the applications of AI in various domains. AI is not a one-size-fits-all solution: its enormous value is derived from a variety of AI capabilities, and a proposed AI solution may require several of them. The main AI capabilities, selected per application requirements, are as follows:
a. ML
b. Natural language processing (NLP)
c. Expert systems
d. Vision
e. Speech


Figure 2: AI applications.

AI may enhance each phase of the SDLC; Figure 3 shows the phases where AI may be applied.

Figure 3: SDLC using AI.


3 AI in requirement engineering

Requirement engineering is the foremost phase in the whole SDLC. It is the basis of a successful project, whereas poor requirement engineering may delay a project or sometimes cause it to fail. If requirements lead to misunderstanding, unnecessary discussions follow, interpretations may be wrong, much effort is wasted, and design and development time increase; hence, cost is higher. Companies may face fatal consequences from delayed or failed projects in terms of reputation, revenue, and so on [6]. Today's competitive market makes this task more complex, as data and requirements become more dynamic and grow day by day. Now is the time to improve the engineering process, both to face the challenges of the current scenario and to be prepared for the future. AI may strengthen this engineering process while reducing redundancy and inconsistency [8]. It may help companies author requirements in a better way, and real-time responses may also be incorporated, which improves correctness. AI is a technique for incorporating intelligent behavior into any process, and thus it deals with automation [26] through ML or other techniques. Various researchers and practitioners are working in this field to improve AI support in requirement engineering. Feldt et al. [10] proposed the AI in SE Application Levels (AI-SEAL) taxonomy to systematize AI applications across the whole software engineering cycle. This taxonomy works on three facets and allows users to systematize the process, product, and runtime. The proposed AI-SEAL taxonomy also gives software developers a base for analyzing the risk associated with an AI application; hence, researchers and practitioners may analyze the trade-offs of applying AI techniques in different SDLC phases. AI-SEAL is a generic proposal that is not limited to a specific domain and includes all areas of software engineering. Yaqoob et al. [36] investigated previous research in the field of self-driving cars and the current uses of AI for improving their safety. The authors found that edge computing, AI, deep learning, and data analytics are the most important techniques assisting this technology. They also discussed the current challenges faced by autonomous cars and the deployment issues, and many related case studies are examined in detail to identify the future research scope. Binkhonain and Zhao [5] presented a systematic review of 24 current ML-based proposals. The authors' basic focus in using AI techniques for the classification of nonfunctional requirements (NFRs) is how NFRs can be collected and evaluated through AI techniques; they surveyed the state of ML approaches for assisting requirement engineering and tried to improve it. AI assists requirement engineering in many ways (Figure 4).


Figure 4: AI is a way to assist requirement engineering.

Requirement transformation: This is a very important part of requirement engineering (RE), as during requirement transformation practitioners need to identify requirement conflicts. Successful deployment depends directly on clear and concise requirements. Inconsistency in requirements leads to serious issues; hence, interactions and dependencies between different requirements should be correctly analyzed and evaluated. AI techniques that address requirement conflicts during transformation are discussed in [2].

Knowledge-based process: Such processes are based on rule-based systems that represent knowledge in a knowledge base. Researchers [11] tried to address this issue with the help of AI tools and techniques.

Automated computing: Many automated computing tools are available nowadays to manage requirements in the requirement engineering phase, such as Modern Requirements, Jama Software, Visure, and SpiraTeam, which can automatically manage requirements and generate requirement-related documents.

Requirement evaluation process: The above automated tools are also useful in the requirement evaluation and analysis process; hence, AI can directly support requirement evaluation.

Intelligent computing process: Current intelligent computing techniques include fog computing, fuzzy-based computing, genetic algorithm-based computing, and so on. These intelligent computing techniques assist the current dynamic requirement engineering process.
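Since the review by Binkhonain and Zhao centers on ML-based classification of requirements, a small illustration may help make the idea concrete. The sketch below is not from this chapter or from [5]: the example requirement sentences, their labels, and the pipeline choice (TF-IDF features with logistic regression) are assumptions used purely for illustration.

# Hypothetical sketch of ML-based requirement classification; the sentences,
# labels, and pipeline choices are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

requirements = [
    "The system shall respond to queries within 2 seconds.",
    "Users shall be able to export reports as PDF.",
    "All passwords must be stored using salted hashes.",
    "The operator can archive completed orders.",
]
labels = ["performance", "functional", "security", "functional"]

# TF-IDF turns each requirement sentence into a feature vector; a simple
# linear classifier then learns to separate the requirement categories.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(requirements, labels)

print(classifier.predict(["Pages must load in under one second."]))

A real study of this kind would of course train on hundreds of labeled requirements rather than four; the point here is only the shape of the pipeline.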


4 AI in software design

Software planning and design require comprehensive learning and experience to develop an effective solution; designing and integrating the design at each step manually is an error-prone approach. AI techniques help design software correctly and quickly. Sometimes clients require dynamic changes before finalizing the design to get the desired outputs, and automating the design process using AI tools makes it possible to design the project efficiently. AIDA (AI design assistant) is a website-building software tool that helps designers understand client expectations more quickly and correctly. The power of AI and its future are known to all practitioners, and software designers are among them. Automated, AI-oriented software design has many benefits: it reduces the manual design process, and it can produce multiple versions of a design in parallel so that the best one may be selected. AI-oriented software design draws inspiration from many previous software designs [14]. Uizard, Khroma, Let's Enhance, Remove.bg, AutoDraw, Fronty, Uibot, and Rosebud AI are some AI-oriented design tools. In July 2021, Heier [24] proposed a solution to the design challenges imposed by AI; this work may transform software design methods incorporating AI. Sarmah et al. [34] have written about the importance of AI in the software design phase. AI-based tools such as BigML, IBM Watson, and OpeNN that assist software design and development are discussed in the next section in detail [7, 9].

5 AI in the code development

The role of software developers has been reshaped by AI and may become very different in the next decade. It is expected that in a few years AI systems will be able to write code on their own, following input instructions, for various applications. The role of developers would then be to execute a different set of activities and to build the abilities to work effectively with AI. AI has the potential to automate repetitive, easy operations, thereby allowing developers to focus on more sophisticated tasks. It will strengthen and uplift the software development process [18–20]. In practice, taking a business idea and turning it into code for a large enterprise is time-consuming and labor-intensive. To save time and money, some developers are turning to systems that allow them to create code before beginning development, a task that would otherwise be time-consuming even before feasibility checks and requirement analysis. To alleviate these burdens, AI-based support helps by automating code generation and detecting faults in the code. AI-enabled coding tools like Tara, Kite (which supports 13 languages and 16 editors), and Deep TabNine are among the most recent developments in AI software development [21, 22]. To enhance coding speed and accuracy, these tools essentially automate the software development process, and they have the potential to democratize development by allowing developers to devote more time to it. To automate and evaluate the software design phase of the SDLC, various AI design and development tools are available in the market:
i. AI platforms such as Microsoft Azure and Google Cloud
ii. Chatbots, for example, IBM Watson
iii. Deep learning software such as BigML
iv. ML software such as Infosys Nia and TensorFlow

Microsoft Azure and Google Cloud are web services that allow models to be deployed both on premises and in the cloud. Their drag-and-drop features make scaling easier and do not require high-level programming knowledge. The IBM Watson chatbot is a question-answering system built on the Apache Hadoop framework and running on the SUSE Linux Enterprise Server 11 operating system; it provides data analysis using NLP and assists companies in making more informed, goal-oriented decisions [4]. Similarly, TensorFlow is an open-source ML software library allowing users to create applications in Python or C++. It is free to download and includes APIs that make application creation easier for beginners. H2O.ai, a deep learning platform, analyzes detailed data for business functionalities and generates possible solutions for gaps and risk factors; models are built using the R and Python programming languages. BigML, with its AutoML engine, is a deep learning platform that allows trained models to be deployed in different environments. Cortana by Microsoft serves as a virtual assistant or personal productivity helper; it works with the Bing search engine and can assist users with inquiries, reminders, and other tasks on almost all operating systems, including Windows, iOS, Android, and Xbox. Virtual assistants like Amazon Alexa and Google Assistant are very popular: Alexa works with iOS, Fire OS, and Android and is compatible with cameras, electrical gadgets, and entertainment systems, while Google Assistant works with KaiOS and allows two-way communication. Both platforms support multiple languages. Among other prominent AI tools, AIDA is a website design tool that combines numerous software components to create a solution meeting the requirements. Figure 5 showcases various AI tools, whereas Figure 6 depicts a comparison chart of various AI tools.

Figure 5: Open-source AI tools used in various applications.

5.1 Implementing a decision tree algorithm using Python

A sample dataset, "user_data.csv", is used as in previous classification models. Using this dataset, the decision tree classifier is compared with other classification models such as k-nearest neighbor, support vector machine, and logistic regression. The following snapshots show the preprocessing and modeling steps:
a. Importing libraries: Figure 7 shows the instructions to import the libraries in Python.
b. Importing the dataset and extracting the independent and dependent variables: the dataset and variables are described in Figure 8.
c. Feature selection: feature selection and transformation are shown in Figure 9.
d. Correlation features for prediction: Figure 10 depicts the correlation matrix.
e. Building, training, and testing the model: the classifier is trained on the dataset, and by using the learned pattern, the results for related input conditions can be predicted using the decision tree classifier algorithm (in our case), the random forest classifier algorithm, or the logistic regression algorithm, selecting the algorithm best suited to the data distribution and analysis. After building the model, it is validated using values that are unused and not exposed to the model. In our case, the data is divided into two parts using a class provided by scikit-learn with a distribution of 85–15; the model is trained on 78.89% of the data and validated against the other 21.11% [33].


Figure 6: Comparison chart of various AI software development tools.

Figure 7: Importing libraries in Python.


Figure 8: Describing dataset and variables.

Figure 9: Feature selection and transformation.

Figure 10: Correlation matrix.
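Since Figures 7–10 are reproduced only as screenshots here, a compact, self-contained version of the workflow they depict is sketched below. It is an illustrative reconstruction rather than the chapter's exact code: the column names ("Age", "EstimatedSalary", "Purchased") and the exact split are assumptions based on the description above.

# A compact reconstruction of the workflow in Figures 7-10 (illustrative only).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# (a)/(b) Import the dataset and extract independent and dependent variables;
# the column names below are assumptions, not the chapter's actual schema.
data = pd.read_csv("user_data.csv")
X = data[["Age", "EstimatedSalary"]]
y = data["Purchased"]

# (c) Feature transformation: scale features to zero mean and unit variance.
X = StandardScaler().fit_transform(X)

# (d) Inspect correlations between features (Figure 10 shows this as a matrix).
print(pd.DataFrame(X, columns=["Age", "EstimatedSalary"]).corr())

# (e) Build, train, and validate the model on an 85-15 split, as described.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0)
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)
print("Validation accuracy:", accuracy_score(y_test, model.predict(X_test)))

Swapping DecisionTreeClassifier for RandomForestClassifier or LogisticRegression reproduces the model comparison described in step (e).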

6 AI in software testing

Software testing and AI can be fused from different perspectives, and there is huge potential for seamless integration of the two. Figure 11 shows the extent of the gap that AI/ML fills in software testing [3, 35].


Figure 11: Software complexity gap (software complexity versus conventional and AI-driven test coverage over time).

The two can be fused together naturally at several points, which can be grouped into three major streams:
a. Software testing using AI-based tools
b. Testing an AI system
c. Other challenges of testing

6.1 Software testing using AI-based tools

This is also known as AI-driven testing. Numerous AI-based tools are available in the market to perform testing. These tools are very accurate and efficient on ample data (in terms of test cases) or on repetitive reruns of code; moreover, their usage has decisively shifted the testing industry toward automation. Several test automation tools are listed as follows:

Applitools: an AI-powered visual testing and monitoring platform that helps to increase test coverage and reduce maintenance [3].

Functionize: smart test automation that allows end-to-end tests to be created in minutes; tests are powered by AI/ML and run on any browser.

Sauce Labs: provides a comprehensive test infrastructure for automated and manual testing of desktop and mobile applications. It allows users to conduct tests in the cloud on a variety of browser platforms, operating systems, and devices.

Testim: Testim.io is a cloud platform that uses AI for fast authoring, execution, and maintenance of automated tests, without deep knowledge of a coding language.


6.2 Testing an AI system

Testing an AI system involves various challenges, including [27]:

Scarcity of test data: Presently, testing uses only a small portion of the accessible data, which is frequently insufficient to detect errors. A 70:30 training-and-testing data split leaves very scanty data for a rich testing experience, and this scarcity of data precludes "what-if" testing [27].

Non-interpretability of results: It takes a lot of effort and guesswork or justification to explain a result.

Phase of access for testing: Depending on the stage of the AI pipeline at which testing is done, the tester may take up black-box or white-box testing of the model.

6.3 Other challenges of testing an AI system [25]

Massive volumes of collected sensor data present storage and analytic challenges, in addition to creating noisy datasets [25]. AI model test scenarios should be equipped to identify and remove the human bias that often becomes part of training and testing datasets. AI TESTING [3] and CHECKLIST [31] are two popular tools for testing ML and neural network-based models.

7 AI in software maintenance

In comparison to software testing, far more AI-based automation is possible in the maintenance phase, thanks to several parameters that can be used to predict the maintainability of the product. There are three essential approaches to software maintenance:

Reactive maintenance (RM): Here, equipment is run to the point of failure and then fixed or replaced. AI- and ML-based algorithms may not be very useful in handling RM, which has nothing to do with early prediction; it is the corrective measure taken after the failure.

Preventative maintenance (PM): PM seeks to anticipate wear and damage by scheduling overhauls based on the time elapsed, informed by prior knowledge of tolerance limits.


Predictive maintenance (PdM, i.e., condition-based maintenance): PdM is the capacity to identify potential issues before any disruptions occur in operations, processes, services, or systems by analyzing large amounts of data [28–30].

8 Case studies

8.1 AI at the stage of software implementation

a. Software implementation refers to the software application’s actual coding procedure.
b. Neural networks can be used to aid software coding as part of a software implementation.
c. Contingent vectors can be applied to the gathered data.
d. Autoencoders can reduce feature sets by using codes trained on specific elements to achieve desired dataset characteristics, such as dimensionality reduction, redundant information reduction, or the selection and classification of specific classes or categories (a small sketch follows this list).
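As a concrete illustration of point (d), here is a minimal autoencoder sketch for dimensionality reduction. It is not taken from the chapter: the use of the Keras API, the layer sizes, and the random placeholder data are assumptions chosen only to show the encode-and-reconstruct idea.

# Minimal autoencoder sketch (assumes TensorFlow/Keras is installed); the
# layer sizes and the random placeholder data are illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 20).astype("float32")   # placeholder feature set

inputs = keras.Input(shape=(20,))
encoded = layers.Dense(8, activation="relu")(inputs)        # compressed code
decoded = layers.Dense(20, activation="sigmoid")(encoded)   # reconstruction

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)
autoencoder.compile(optimizer="adam", loss="mse")

# Training the network to reproduce its own input forces the 8-unit layer
# to learn a reduced representation of the 20 original features.
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)

X_reduced = encoder.predict(X)
print(X_reduced.shape)   # (1000, 8): the reduced feature set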

8.2 AI in software maintenance

In the maintenance phase, the product developer assists the customer in the following ways:
a) Product application
b) Upgrading the application on a regular basis
c) Making further adjustments upon client request

AI helps in this phase in the following ways:
d) It assists with software maintenance and updates.
e) Pattern recognition is used to extract coherent code.
f) Software modernization is also aided by AI.
g) AI detects and simulates attack patterns to pinpoint security flaws, vulnerabilities, and errors.

9 Transformation in software development using AI

Regular software can easily handle all the basic key interfaces, security issues, and lookups. Intelligent decisions are made by collecting, categorizing, and analyzing data using ML (a subset of AI). AI methods can help the SDLC in the following ways:


Intelligent assistants: Developers can save time by using intelligent programming assistants that provide in-the-moment guidance and recommendations such as best practices, code examples, and relevant documents.

Quick prototyping: A lot of work and planning is required to turn business requirements into technology solutions. AI reduces this extra effort and helps with the planning that converts business requirements into technology solutions.

Error handling and automatic analytics: ML aids in faster analysis of system logs, enabling proactive flagging of issues and error-free answers.

Real-time customer feedback: ML algorithms enable variable content and a dynamic software experience for real-time user interaction, for example by changing font and button sizes automatically. This improved functionality, combined with continuous acknowledgment from users, reduces abandoned-cart rates by smoothing out friction points [32, 33].

10 Conclusion

The present era is the digital era; hence, software is one of the world's fastest growing sectors. It relies on problem-solving skills based on human knowledge and experience. Human intelligence can be incorporated into software development through AI techniques; as a result, AI, expert systems, and knowledge engineering play a key role in automating a variety of software development processes. The development community is adopting AI techniques to develop scalable, secure, and distinctive applications that provide simple and efficient solutions. AI-based software solutions can be developed quickly and tend to be more innovative, and the interaction between AI and software engineering adds complementary abilities to the produced software. The results of the survey presented in this study reveal some trends in the application of AI to the software development process. Furthermore, there is ample room for experimenting with and assessing the usage of various AI techniques in software automation.

References
[1] Barr, A., Feigenbaum, E. A., 1981, The Handbook of Artificial Intelligence, CA, USA, HeurisTech Press.
[2] Aldekhail, M., Chikh, A., Ziani, D., 2016, Software requirements conflict identification: Review and recommendations, International Journal of Advanced Computer Science and Applications (IJACSA), 7(10), 326.
[3] Aggarwal, A., Shaikh, S., Hans, S., Haldar, S., Ananthanarayanan, R., Saha, D., 2021, Testing framework for black-box AI models (IBM Research AI), arXiv:2102.06166v1 [cs.LG], 11 Feb 2021.
[4] Arinze, B., Partovi, F. Y., A knowledge-based decision support system for project management.
[5] Binkhonain, M., Zhao, L., 2019, A review of machine learning algorithms for identification and classification of non-functional requirements, Expert Systems with Applications: X, 1, 100001.
[6] Caldas, C. H., Soibelman, L., Han, J., 2002, Automated classification of construction project documents, Journal of Computing in Civil Engineering, 16, 234–243.
[7] Clifton, D. A., Gibbons, J., Davies, J., Tarassenko, L., 2012, Machine learning and software engineering in health informatics, In: 1st International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE '12), IEEE Press, NJ, USA, 37–41, https://doi.org/10.1109/RAISE.2012.6227968.
[8] Dalpiaz, F., Niu, N., July–Aug 2020, Requirements engineering in the days of artificial intelligence, IEEE Software, 37(4), 7–10, doi: 10.1109/MS.2020.2986047.
[9] Meziane, F., Vadera, S., Artificial Intelligence Applications for Improved Software Engineering Development: New Prospects, Information Science Reference, IGI Global, ISBN 978-1-60566-759-1.
[10] Feldt, R., De Oliveira Neto, F. G., Torkar, R., May 2018, Ways of applying artificial intelligence in software engineering, In: 2018 IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), IEEE, 35–41.
[11] Grosan, C., Abraham, A., 2011, Rule-based expert systems, In: Intelligent Systems, Berlin, Heidelberg, Springer, 149–185.
[12] https://bigdata-madesimple.com/how-artificial-intelligence-can-improve-software-development-process/
[13] https://www.the-next-tech.com/artificial-intelligence/the-transformation-stage-of-software-development-applying-ai-in-sdlc-process/
[14] https://www.devteam.space/blog/ai-development-life-cycle-explained/
[15] https://learn.g2.com/ai-in-software-development
[16] https://medium.com/@xtine415_74841/how-to-manage-ai-projects-the-activities-during-sdlc-phases-7dace2147fcb
[17] https://www.tiempodev.com/blog/7-things-software-leaders-should-know-about-artificial-intelligence/
[18] https://bigdata-madesimple.com/how-artificial-intelligence-can-improve-software-development-process/
[19] https://learn.g2.com/ai-in-software-development/
[20] https://www.tiempodev.com/blog/7-things-software-leaders-should-know-about-artificial-intelligence/
[21] https://bigdata-madesimple.com/how-artificial-intelligence-can-improve-software-development-process/
[22] https://learn.g2.com/ai-in-software-development
[23] Ammar, H. H., Abdelmoez, W., Salah Hamdi, M., 2012, Software engineering using artificial intelligence techniques: Current state and open problems, ICCIT, 24–29.
[24] Heier, J., July 2021, Design intelligence – taking further steps towards new methods and tools for designing in the age of AI, In: International Conference on Human-Computer Interaction, Springer, Cham, 202–215.
[25] Arbon, J., 2016, AI for Software Testing, Appdiff, published March 2.
[26] Kaur, K., Singh, P., Kaur, P., 2021, A review of artificial intelligence techniques for requirement engineering, In: Singh, V., Asari, V. K., Kumar, S., Patel, R. B., eds., Computational Methods and Data Engineering, Advances in Intelligent Systems and Computing, Vol. 1257, Singapore, Springer, doi: https://doi.org/10.1007/978-981-15-7907-3_20.
[27] Kukkuru, M. G. (Associate Vice President and Principal Research Analyst, Infosys Center for Emerging Technology Solutions), Testing AI systems – how to overcome key challenges, Infosys.
[28] Anderson, M., December 30, 2020, Predictive maintenance software and AI: How it works, Iflexion.
[29] Martínez, D. M., Fernández-Rodríguez, J. C., 2015, Artificial intelligence applied to project success: A literature review, IJIMAI, 3, 77–84.
[30] Davahli, M. R., 2020, The last state of artificial intelligence in project management, https://arxiv.org/ftp/arxiv/papers/2012/2012.12262.pdf
[31] Prasanth, T. V. (Sr Test Architect at Happiest Minds), Evolution of test automation through artificial intelligence – cognitive QA.
[32] Feldt, R., De Oliveira Neto, F. G., Torkar, R., May 2018, Ways of applying artificial intelligence in software engineering, RAISE'18, Gothenburg, Sweden.
[33] Schmelzer, R., Understanding explainable AI, available online: https://www.forbes.com/sites/cognitiveworld/2019/07/23/understanding-explainable-i/#406c97957c9e (accessed on Apr 10, 2020).
[34] Sarmah, M. S., Sumer, J. M., Bey, M., Sharma, B., 2021, A study on the state of artificial intelligence in the design phase of software engineering, In: Smart Computing Techniques and Applications, Singapore, Springer, 467–473.
[35] King, Tariq, Feb 2019, Demystifying AI-driven testing bots, Automation Guild, Test Machina.
[36] Yaqoob, I., Khan, L. U., Kazmi, S. A., Imran, M., Guizani, N., Hong, C. S., 2019, Autonomous driving cars in smart cities: Recent advances, requirements, and challenges, IEEE Network, 34(1), 174–181.

Sujit Kumar, M. Balamurugan, Vikramaditya Dave, Bhivraj Suthar

Useful techniques and applications of computational intelligence

Abstract: The key components of computational intelligence are computational models and techniques that incorporate learning, adaptability, and/or heuristic optimization. Complex problems that conventional computing techniques cannot solve are handled with artificial intelligence. Neural networks, evolutionary computing, and fuzzy systems are the three main principles for creating artificial intelligence, while artificial immune systems, support vector machines, rough sets, and chaotic systems are among the latest topics in machine learning and natural language processing. Computational intelligence techniques, including neural networks, fuzzy systems, and swarm intelligence, are reviewed in this chapter.

Keywords: neural networks, evolutionary computing, fuzzy systems, machine learning, swarm intelligence

1 Introduction

The environment and the earth are dynamic. Changes occur constantly as a result of physical, organic, and biochemical processes. For instance, rocks and soils undergo enduring breakdown into their elemental components. Carbon dioxide, which makes rainfall somewhat acidic, is absorbed and dissolved by water droplets when it rains; the discharged silt and chemicals then go on to chemically alter the earth's surface, and sedimentation may create new rocks and soils. Biological processes include death, excretion, and degradation, among others, and they recycle their constituents in the environment. Therefore, many organic, physical, and natural processes interact with each other on the Earth and are challenging to describe and evaluate. Complex spatial and temporal dynamics are common to earth and environmental systems, and the connections between these systems are often unclear and nonlinear. For many environmental issues there is no solid theoretical knowledge; therefore, no complete numerical models exist. Earth and environmental systems have thus created the demand for effective and fast computational tools that can handle extremely nonlinear functions and can be trained on fresh data to generalize; computational intelligence methods are being developed to meet this demand, offering fresh insights and underlying concepts that numerical models and traditional statistical methods cannot provide. Computational intelligence is a set of computational theories and technologies that apply computational methods to unsolved issues [1, 2]. It utilizes heuristic methods such as neural networks and fuzzy systems and is characterized by the capability to learn and adapt to an ever-changing environment. Its major pillars include artificial neural networks (ANNs) and fuzzy systems, along with additional approaches such as artificial immune systems (AIS), support vector machines, rough sets, chaotic systems, and probabilistic methods [3–6]; applications include the forecasting of air pollution [7]. This chapter provides a review of computational intelligence methods and their uses, especially the following four: neural networks, swarm intelligence, AIS, and fuzzy systems.

Sujit Kumar, Department of Electrical and Electronics Engineering, Jain (Deemed-to-be University), Bangalore, Karnataka 560041, India, e-mail: [email protected]
M. Balamurugan, Department of Electrical and Electronics Engineering, Jain (Deemed-to-be University), Bangalore, Karnataka 560041, India
Vikramaditya Dave, Department of Electrical Engineering, College of Technology and Engineering, Udaipur, Rajasthan 313001, India
Bhivraj Suthar, Department of Mechatronics Engineering, Chungnam National University, Chungnam 305764, South Korea
https://doi.org/10.1515/9783110709247-002

2 Artificial neural network

2.1 Basic principles

ANNs were created to imitate biological neural networks. Neurons are the cells of the biological nervous system; a neuron is made up of a cell body, an axon, and dendrites [1]. The axon of one neuron connects to the dendrites of other neurons, and the contact between neurons is called a synapse. Environmental cues are sent to neurons; when a neuron is triggered, the cell evaluates all the incoming signals and generates an output signal that is then sent to all other linked neurons. Computational models of actual neurons perform simple computation rather than advanced calculation [1]. An artificial neuron has an activation or transfer function as its cell body. Artificial neurons are connected to external sources, the inputs are processed, and their weighted sum is used to compute an activation value, which is then applied to the activation function. The activation function is a nonlinear function, and the output represents an axon, which is linked to another neuron through a synapse. An exponential, hyperbolic tangent, or linear activation function (e.g., a straight line) can be used in an ANN. An ANN is made up of multiple layers of artificial neurons conveying information (Figure 1). Artificial neurons are called nodes, and a layer of artificial neurons is connected to the one above it through weights. The model may be seen as an extension of a nonlinear model: to facilitate accurate computation of highly nonlinear functions, an ANN superposes several basic nonlinear activation functions, and the complexity of nonlinear network activity arises from this [8]. If the network is trained with new data, it will be able to deal with it; network adaptability is exploited in the training of the interconnecting weights, and after the training process the system is ready to be deployed. An ANN is an adaptive, nonlinear system that learns to carry out a task from input data. ANN variants including multilayer feed-forward, recurrent, temporal, probabilistic, fuzzy, and radial basis function networks exist [9]. Figure 1 shows a multilayer feed-forward neural network; the one-hidden-layer multilayer perceptron network is also called a layered perceptron network.

Figure 1: Single hidden layer of an ANN [9].

An ANN does not simply pass node outputs along with connection weights; it scales them and then sends them forward as inputs to the next layer of nodes. The data is distributed at the input layer before it reaches the hidden layers, where it is manipulated; the input layer does not process information but rather supplies it to the network. The connection weights are adjusted during training. An activation function integrates all the node outputs of the preceding layer, combining data that has passed through every layer before. The output layer is thus a complex function of internal network activities.
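To make the forward pass just described concrete, here is a minimal sketch of a single-hidden-layer feed-forward network in Python with NumPy. It is illustrative only: the layer sizes and random weights are assumptions, with a hyperbolic tangent hidden layer and a linear output consistent with the activation functions mentioned above.

# Minimal forward pass of a single-hidden-layer feed-forward ANN (illustrative).
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 5, 20, 1            # assumed layer sizes
W1 = rng.normal(size=(n_in, n_hidden))      # input-to-hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_out))     # hidden-to-output weights
b2 = np.zeros(n_out)

def forward(x):
    # Each hidden node sums its weighted inputs and applies a nonlinear
    # (hyperbolic tangent) activation function.
    h = np.tanh(x @ W1 + b1)
    # The output layer combines the hidden activations; a linear activation
    # is used here, one of the options mentioned in the text.
    return h @ W2 + b2

x = rng.normal(size=n_in)                   # one input pattern
print(forward(x))                           # the network's output

Training would then adjust W1, W2, b1, and b2 (e.g., by back-propagation) so that the network's outputs match target data.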


2.2 Utilizations

ANNs have been used to provide weather and air quality predictions. Using multilayer perceptron networks, researchers in London calculated the hourly concentrations of nitrogen oxides (NOx) and nitrogen dioxide (NO2) in metropolitan air. These neural networks had hidden layers of 20 to 30 nodes, and the pollution concentration was estimated from the concentrations of five primary pollutants; the activation function was the hyperbolic tangent [10]. To train their conjugate-gradient network model, the five input variables from daily meteorological data were fed into the model each day. ANNs are better at capturing complex patterns of source emissions than regression models. Principal component analysis (PCA) and ANN techniques are merged in [11] for forecasting hourly ozone concentrations with feed-forward ANNs. This reduces the nonlinearity of the data sets and thus yields a simpler ANN architecture; it also reduces the mean squared error (MSE) in both training and validation, avoiding overfitting. Levels of ozone generation and precursor concentrations were found to be inextricably linked to climate conditions. The PCA-based ANNs were compared with multiple linear regression, principal component regression, and plain feed-forward ANNs. Krasnopolsky [12] discovered that, when utilizing principal components as inputs, the ANN's predictions were better. ANNs have also often been used in geology for forecasting sediment yield, permeability, and the concentration of compounds in minerals. A global inference system and an MLP were used to map the distribution of sand, silt, and clay in the Port of Hamburg, Germany; the network was trained using hidden layers with 139 sample stations [13]. Well logs supplied to different geostatistical models, together with the reservoir structural model, were utilized to determine the permeability of a gas reservoir. Using an ANN, a permeability forecast for four of the wells was achieved: two of the wells were used to develop the ANN model, a third was used to evaluate the dependability of the model, and the model was then utilized to evaluate permeability for the uncored wells. An ANN was also used to predict nonlinear radon gas concentrations in soil [14]; in this study's network, five input nodes were interconnected to one output node (the concentration of radon). With high accuracy, the ANN system successfully predicted 10 earthquakes over the following 2 years. Landslide susceptibility mapping in the Hoa Binh Province of Vietnam was investigated using ANNs [15]. The authors constructed two multilayer feed-forward ANNs: one utilizing a back-propagation training technique and the other a policy-gradient training method. Training data were gathered from satellite pictures, field research, and previously collected literature, and the trained weights were used to calculate landslide susceptibility indices across the whole study area. The prediction accuracies of the landslide susceptibility maps were 90.3% and 86.1% using the Bayesian regularization neural network and the Levenberg–Marquardt neural network, respectively. Predictive capacity is a major benefit of ANNs, and Bayesian regularization networks exhibit better resiliency and scalability [15]. Conventional and traditional modeling methods are often outperformed by ANNs in ecological modeling [16], where they have been applied in numerous fields since the early 1990s. An ANN model for forecasting dissolved oxygen in the Gruža reservoir, Serbia, was created in this way [17]; according to the findings, the neural network model's output was accurate, as shown by the correlation coefficient, mean absolute error, and mean squared error. Many case studies use ANNs, for example, for analyzing fish community diversity [18], invasive species dispersion [19], and water quality indicators [20–22]. An in-depth and comprehensive study of ANNs in ecology was given by Zhang [23]. Pattern recognition with ANNs is also frequently utilized in remote sensing: since an ANN is nearly nonparametric and nonlinear, it does not need assumptions about the underlying distribution of values for the descriptive variables and training data, and remote sensing data are classified correctly [24–27].

3 Swarm optimization

3.1 Simple ideologies
Swarm optimization, also known as colony behavior, mimics the social behavior of animals that live in swarms or colonies. Agent-based modeling aims to represent the cooperative behavior of intelligent agents in distributed, self-organized systems. To illustrate swarm intelligence, scientists have turned to biological populations such as ants, bees, birds, mammals, fish, and microbes. In a swarm intelligence system, the agents are individuals interacting locally with one another and with their surroundings. No centralized rules dictate how the agents must behave; intelligent behavior emerges and spreads globally from one-to-one and one-to-environment interactions between agents. Swarm theory has given rise to many routing and optimization techniques [28]. Ant colony optimization (ACO) and particle swarm optimization (PSO) are among the many types of swarm intelligence used today [28].

Using the ACO algorithm, one may determine optimum paths across networks. The inspiration is the behavior of real ants. In the real world, ants initially move wherever they choose; once an ant finds food and returns to its colony, it leaves behind a pheromone trail for other ants to follow, and it is probable that other ants will pick up on this trail. Over time, however, the pheromone along a path loses its potency: the longer the trail, the more the ant's pheromone strength fades, whereas on a short route the pheromone intensity quickly builds up. Pheromone concentrations thus drive the ant trails. When one ant discovers a shorter route, additional ants will most likely follow, which ultimately encourages all ants to choose the shortest path. An ACO acts as though ants were marching over a graph that represents the problem to be solved. The graph has edges connecting nodes; every edge is a route between two nodes, indicating part of a solution, and pheromone values are assigned to all edges. Figure 2 illustrates ACO's general approach. First, the graph is initialized by setting each edge to the same pheromone value and placing a synthetic ant at a randomly picked node. The ant must then decide which edge to follow from that node [29].

Figure 2: A generalized ant colony optimization (ACO) strategy.
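A toy sketch of this scheme follows; the fully connected graph, the evaporation rate rho, and the deposit constant q are illustrative assumptions rather than values from the cited work, and for simplicity every ant starts at node 0 rather than at a random node:

import random

def aco_shortest_path(dist, n_ants=20, n_iters=50, rho=0.5, q=1.0):
    # dist[i][j] is the length of the edge between nodes i and j;
    # tau[i][j] is the pheromone value assigned to that edge.
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # equal pheromone at first
    best_path, best_len = None, float("inf")
    for _ in range(n_iters):
        paths = []
        for _ in range(n_ants):
            path, unvisited = [0], set(range(1, n))
            while unvisited:
                i = path[-1]
                cand = list(unvisited)
                # edge choice is proportional to pheromone / distance
                w = [tau[i][j] / dist[i][j] for j in cand]
                j = random.choices(cand, weights=w)[0]
                path.append(j)
                unvisited.remove(j)
            length = sum(dist[path[k]][path[k + 1]] for k in range(n - 1))
            paths.append((path, length))
            if length < best_len:
                best_path, best_len = path, length
        # evaporation first, then deposit: shorter paths gain more pheromone
        tau = [[(1 - rho) * t for t in row] for row in tau]
        for path, length in paths:
            for k in range(n - 1):
                tau[path[k]][path[k + 1]] += q / length
    return best_path, best_len

dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(aco_shortest_path(dist))  # likely ([0, 1, 3, 2], 9)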

Bird flocking and fish schooling are the models behind PSO, an optimization method similar to evolutionary computation. The process starts with a random population of candidate solutions and searches for the best answers by updating generations. Although it does not use evolutionary operators such as crossover and mutation, its population-based search is reminiscent of natural selection. In PSO, particles (entities) are organized into a swarm, and each particle in the swarm represents a possible solution to the optimization problem. The particles are "flown" through a multidimensional search space, guided by the current "best" particles.
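A minimal PSO sketch follows; the inertia and attraction coefficients (w, c1, c2) and the search bounds are common illustrative defaults, not values from the studies discussed below:

import random

def pso(f, dim, n_particles=30, n_iters=100, w=0.7, c1=1.5, c2=1.5):
    # Each particle has a position, a velocity, and a personal best;
    # the swarm shares a global best. Velocities are pulled toward
    # both bests, so particles are "flown" through the search space.
    pos = [[random.uniform(-5, 5) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:               # update personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:              # update global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Example: minimize the sphere function x^2 + y^2
print(pso(lambda x: sum(v * v for v in x), dim=2))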

3.2 Utilizations
Truck routing and military robots are gaining from new research in swarm intelligence, and swarm methods suit discrete optimization problems such as the traveling salesman problem. ACO was used to optimize the architecture of a sewage network [30]: the optimization problem used nodal elevations as its decision variables, and a Gaussian probability density function indicated the pheromone concentration for each decision variable. The ACO implementation was run in two ways, with and without constraints. The unconstrained approach ignored the minimum-slope requirement and other constraints; under the constrained approach, the lower limits established at the upstream node allow a pipe's downstream node to impose new upper limits on the upstream node's elevation, which represents the pipe's constraints. The two sets of findings were compared to see how the results of the constrained method contrasted with those of the unconstrained method. The ACO found the best solution rapidly and within a short time frame, although it was sensitive to the size of the colony, particularly in the unconstrained approach.

PSO makes no assumptions regarding the problem to be optimized. PSO was used for land-use optimization [31]: the researchers represented land parcels as particles via their centroids, and the particles continuously shifted their positions and velocities to optimize their personal and global bests, which account for landscape change, biophysical suitability, and overall compactness. PSO was also utilized in simulating rainfall–runoff interactions. One study investigated two ways of determining the structure and features of the rainfall–runoff relationship, each using its own independent methodology [32]. To simulate several hydrological models, a linear model combines multiple hydrological models into a structure identification problem, making the structural identification problem combinatorial [33]. The PSO was used to best fit the subsystem model; it estimates the time of peak arrival to a finer degree, and it also identifies the underlying system structure and the parameters of the rainfall–runoff relationship [34]. The method has also been used to estimate river stage, simulate turbidity intrusion processes during the flooding season, improve greenhouse climate model parameters, and plan electricity output at a wind farm [35].

4 Artificial immune systems

4.1 Simple ideologies
The human immune system is resilient, distributed, error-tolerant, and adaptive. Its molecules, cells, and organs are found throughout the body. The immune system's primary job is to seek out and kill the body's own defective cells and invading pathogens (such as bacteria and viruses). The targets it recognizes are known as antigens: the body's own cells are called self (self-antigens), whereas foreign cells are referred to as non-self (non-self antigens). The immune system is able to tell whether something is self or non-self. The algorithms covered by artificial immune systems (AIS) are wide-ranging.

Several algorithms mimic parts of the immune system (specifically B-cells, T-cells, and dendritic cells). Negative selection, clonal selection, and immunological networks are most often used to create an AIS [36]. Negative selection occurs during thymic T-cell development: T-cells originate in the bone marrow and mature in the thymus before circulating in the body, and negative selection removes self-reactive T-cells in the thymus, so that the T-cells seen throughout the body are tolerant of self. In the negative selection algorithm, a set of samples of the self (S) is given, and all elements outside it are regarded as non-self. The algorithm first generates a list of random candidate detectors and then compares the candidates with the self elements [37]: a candidate that matches self is discarded, otherwise it is added to the detector set F. To maximize coverage of the non-self space, one should choose a method that minimizes the number of detector elements produced. Once F is generated, it is utilized to check whether new samples S* contain any non-self elements: S* is checked against F, and a match identifies a non-self element, upon which an action follows. Non-self can thus be detected according to the issue at hand. For a negative selection technique, the problem space (continuous, discrete, or mixed), the problem representation scheme, and the matching criteria are the important determinants. The majority of studies have utilized binary matching criteria such as the r-contiguous rule [38].
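The negative selection scheme just described can be sketched in a few lines; the bit strings, the self set, and the parameter values below are all invented for illustration, with the r-contiguous rule as the matching criterion:

import random

def r_contiguous_match(a, b, r):
    # Two equal-length bit strings match if they agree on at least
    # r contiguous positions.
    return any(a[i:i + r] == b[i:i + r] for i in range(len(a) - r + 1))

def negative_selection(self_set, n_detectors, length, r):
    # Generate random candidates and keep only those that match no
    # self string: the surviving detectors F are tolerant of self.
    detectors = []
    while len(detectors) < n_detectors:
        cand = "".join(random.choice("01") for _ in range(length))
        if not any(r_contiguous_match(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def monitor(sample, detectors, r):
    # A sample matching any detector is flagged as non-self.
    return any(r_contiguous_match(sample, d, r) for d in detectors)

self_set = {"11110000", "11100001"}
F = negative_selection(self_set, n_detectors=10, length=8, r=4)
print(monitor("00001111", F, r=4))  # likely True: unlike any self string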

4.2 Utilizations
AIS have proven effective in optimization, classification, and pattern recognition. An AIS-based approach has been shown to be effective for allocating water resources in river basins [39]: to perform a global optimal search, the authors utilized a macroevolution technique, clonal selection, and an entropy-based density assessment scheme (EDAS). The evolving diversity of the population was exploited through solution exploitation via clonal selection [40], and EDAS was used to distribute nondominated individuals evenly over the identified Pareto frontier when creating part of the solution pool [40]. Much of the remaining research concerns remote sensing image categorization and pattern recognition [41–44]. A newer method for land cover categorization based on remote sensing pictures was developed by Gong et al. [45]. The general idea is that, to identify a land cover class without prior knowledge of specific antigens, a large number of antibodies are first produced; the antibodies are then cloned and mutated to obtain the best antibody for recognizing the antigens, and generations of antibodies are produced until a stopping condition is reached. In that study, antibodies for each land cover class were maintained in every generation, and an adaptive mutation rate was used to account for the change in classification accuracy between the current and previous generations. Euclidean and spectral angle mapping distances served as the affinity measures, and a genetic method was used to arrive at the weights assigned to the different affinity measures [45]. Using QuickBird pictures and LiDAR data, as well as HyMap hyperspectral images, the artificial immune network technique was utilized to detect land cover in Denver, CO, and in Monticello, UT. The method surpassed the others, with greater accuracy and less salt-and-pepper effect [45].

5 Fuzzy system

5.1 Simple ideologies
Fuzzy systems utilize fuzzy sets or fuzzy logic. In classical set theory, an object either is a member of a set or is not. Rice paddy fields and woodland stands are two examples of distinct sets: when land is divided between the two classes, it cannot lie in between. Such sets are known as crisp, and crisp definitions contain no confusion regarding objects' membership. But in the earth and ecological sciences, our descriptions and understanding almost always involve graded membership. Field studies frequently use terms such as "poorly drained," "somewhat sensitive to soil erosion," and "marginally appropriate for maize" to characterize the data gathered. These are fuzzy concepts. Fuzzy sets give objects a degree of membership in a set. In a crisp set, the characteristic function assigns each object either full membership or none (Figure 3a); a fuzzy set, by contrast, is defined by a membership function that may take any value between 0 and 1 (Figure 3b). Elements that belong to a fuzzy set have nonzero membership values.

Figure 3: Crisp set versus fuzzy set: (a) characteristic function of a crisp set and (b) fuzzy membership function [45].


An element's degree of membership in the set is given by the value of the membership function. Fuzzy logic uses fuzzy sets for approximate reasoning and often utilizes IF–THEN rules. An example of a fuzzy rule is: IF the soil is shallow AND the cumulative temperature is moderate, THEN the suitability for maize is limited. Fuzzy logic also provides the AND, OR, and NOT operators of traditional Boolean logic: the intersection of fuzzy sets is determined by the minimum of the membership functions, the union by their maximum, and the complement of a fuzzy set by one minus the membership function. Systems with fuzzy inputs and outputs may be represented as IF–THEN rules with fuzzy predicates, or as differential equations whose parameters are fuzzy numbers linked to human perception. Only a fraction of earth and environmental phenomena can be captured by precise mathematical descriptions; most resist such formalization. Heuristic knowledge and expert rules of thumb, by contrast, are often readily available, and rule-based systems can represent and use such heuristic information. Expert systems and knowledge-based systems built on fuzzy rules may be seen as types of fuzzy systems [46]. Furthermore, fuzzy systems offer an alternative way of handling uncertainty, since stochastic modeling cannot manage all kinds of uncertainty: fuzzy systems address uncertainty arising from ambiguity, imprecision, and/or vagueness. Such intrinsic uncertainty is inherent to a system and cannot be removed by observations, whereas statistical techniques address uncertainty governed by the rules of probability, which can be quantified from observations. In addition, fuzzy systems may be used to describe nonlinear systems and to approximate functions or measurement data with a specified precision. They depict a system in a clearer manner because the linguistic model interpretation is closer to how people think about the actual world.
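As a small illustration of these definitions, the sketch below encodes a trapezoidal membership function, the min/max/complement operators, and a rule in the spirit of the maize example; all breakpoints are invented for illustration:

def trapezoid(x, a, b, c, d):
    # Membership rises on [a, b], is 1 on [b, c], and falls on [c, d].
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

f_and = min                      # intersection: minimum of memberships
f_or = max                       # union: maximum of memberships
f_not = lambda m: 1.0 - m        # complement: one minus membership

# IF soil is shallow AND cumulative temperature is moderate
# THEN suitability for maize is limited.
soil_depth_cm, cum_temp = 25.0, 1400.0
mu_shallow = trapezoid(soil_depth_cm, 0, 0, 20, 40)
mu_moderate = trapezoid(cum_temp, 1000, 1200, 1600, 1800)
print(f_and(mu_shallow, mu_moderate))  # degree to which the rule fires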

5.2 Utilizations
When nonstatistical uncertainties are present, fuzzy systems are typically preferable. Fuzzy logic was used to evaluate groundwater vulnerability to contamination [47]. In that study, vulnerability was evaluated using linguistic variables and terms describing hydrogeological features. Owing to the coarseness of Boolean logic, areas that should have distinct susceptibility indices may be assigned the same index, whereas fuzzy systems can self-adjust to changing input indices, and fuzzy logic yields more logical and accurate evaluations for values situated close to class boundaries. Fuzzy methods may also be used to collect qualitative data on the physical properties of groundwater in order to obtain information on groundwater hydrology and chemistry [48, 49].


The local terrain, roughness, and barriers affect wind velocity. A regional wind climate computed from meteorological station data does not reflect the variability of the local climate: the actual wind is produced by local circumstances. Ambiguity in the topography description (e.g., down slope, up slope, and plain) at the local scale makes an accurate recreation of the landscape difficult. A method for estimating the local wind distribution that takes topographic effects into consideration was suggested in [50]. Topographical features were associated with the local wind conditions using the fuzzy method, and terrain analysis was used to predict the changes in wind direction that may occur. An evolutionary method was used to optimize the membership functions of the fuzzy system, and the system used a regional wind climate model to mimic the local wind conditions [50].

6 Conclusions
Environmental and earth science problems have led to an increasing use of computational intelligence methods. This chapter went through four computational intelligence techniques and their uses. Each methodology has its own benefits and drawbacks: a fuzzy system, for instance, is very good at approximating, but it lacks the ability to learn and optimize, whereas ANNs can learn and optimize. A combination of computational intelligence techniques often yields better computational models. One way of doing so is to combine ANNs and fuzzy systems, in which the ANN learns while the fuzzy system reasons with approximate data. For decades, attempts have also been made to combine evolutionary computing with ANNs; for example, evolutionary computing methods can determine the network weights by minimizing the total mean squared error of the neurons [51]. Fuzzy set parameters may likewise be optimized by evolutionary computing [52]; membership functions fine-tuned through evolutionary computation may then be used to train an ANN, and evolutionary computation may aid in choosing the optimum training data for an ANN fuzzy system. This chapter provided a brief introduction to computational intelligence techniques, with examples drawn from the earth and environmental sciences. Computational intelligence builds on computational models and tools that are continuously refined and improved, and it already has numerous tools at its disposal: it is not hard to find alternative techniques for a particular earth and environmental issue. Physical, chemical, or biological system-specific information should be incorporated in the problem formulation, suitable computational intelligence methods should be used, and model outcomes should be assessed. Additionally, in order to be generally accepted, computational intelligence methods need to be integrated into current modeling frameworks. Computational intelligence enables scientists to build models of uncertainty utilizing historical, current, and prospective data, as well as ambiguous, inconsistent, and inaccurate data. Future uses of computational intelligence techniques will be paired with creativity in earth and environmental applications.

References
[1] Konar, A., 2005, Computational Intelligence: Principles, Techniques and Applications, Berlin, Springer.
[2] Madani, K., 2011, Computational Intelligence, Berlin, Springer.
[3] Engelbrecht, A. P., 2007, Computational Intelligence: An Introduction, Chichester, England, Wiley.
[4] Watts, M. J., Li, Y., Russell, B. D., Mellin, C., Connell, S. D., Fordham, D. A., 2011, A novel method for mapping reefs and subtidal rocky habitats using artificial neural networks, Ecological Modelling, 222(15), 2606–2614.
[5] Aertsen, W., Kint, V., Van Orshoven, J., Özkan, K., Muys, B., 2010, Comparison and ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests, Ecological Modelling, 221(8), 1119–1130.
[6] Areerachakul, S., Sophatsathit, P., Lursinsap, C., 2013, Integration of unsupervised and supervised neural networks to predict dissolved oxygen concentration in canals, Ecological Modelling, 261–262, 1–7.
[7] Antanasijević, D. Z., Pocajt, V. V., Povrenović, D. S., Ristić, M. A., Perić-Grujić, A. A., 2013, PM10 emission forecasting using artificial neural networks and genetic algorithm input variable optimization, The Science of the Total Environment, 443, 511–519.
[8] Principe, J. C., Euliano, N. R., Lefebvre, W. C., 2000, Neural and Adaptive Systems: Fundamentals through Simulations, New York, Wiley.
[9] Haykin, S., 1999, Neural Networks: A Comprehensive Foundation, 2nd edn, Upper Saddle River, NJ, Prentice-Hall.
[10] Engelbrecht, A. P., 2007, Computational Intelligence: An Introduction, Chichester, England, Wiley.
[11] Sousa, S. I. V., Martins, F. G., Alvim-Ferraz, M. C. M., Pereira, M. C., 2007, Multiple linear regression and artificial neural networks based on principal components to predict ozone concentrations, Environmental Modelling and Software, 22, 97–103.
[12] Krasnopolsky, V. M., 2013, The Application of Neural Networks in the Earth System Sciences, the Netherlands, Springer.
[13] Yang, Y., Rosenbaum, M. S., 2003, Artificial neural networks linked to GIS, In: Nikravesh, M., Aminzadeh, F., Zadeh, L. A., eds, Developments in Petroleum Science, Vol 51, Soft Computing and Intelligent Data Analysis in Oil Exploration, the Netherlands, Elsevier, 633–650.
[14] Torkar, D., Zmazek, B., Vaupotič, J., Kobal, I., 2010, Application of artificial neural networks in simulating radon levels in soil gas, Chemical Geology, 270, 1–8.
[15] Bui, D. T., Pradhan, B., Lofman, O., Revhaug, I., Dick, O. B., 2012, Landslide susceptibility mapping at Hoa Binh Province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS, Computers & Geosciences, 45, 199–211.
[16] Lek, S., Guegan, J. F., 1999, Artificial neural networks as a tool in ecological modelling, an introduction, Ecological Modelling, 120, 65–73.
[17] Ranković, V., Radulović, J., Radojević, I., Ostojić, A., Čomić, L., 2010, Neural network modeling of dissolved oxygen in the Gruža reservoir, Serbia, Ecological Modelling, 221(8), 1239–1244.


[18] Chang, F., Tsai, W., Chen, H., Yam, R. S., Herricks, E. E., 2013, A self-organizing radial basis network for estimating riverine fish diversity, Journal of Hydrology, 476, 280–289.
[19] Pontin, D. R., Schliebs, S., Worner, S. P., Watts, M. J., 2011, Determining factors that influence the dispersal of a pelagic species: A comparison between artificial neural networks and evolutionary algorithms, Ecological Modelling, 222(10), 1657–1665.
[20] Kuo, J., Hsieh, M., Lung, W., She, N., 2007, Using artificial neural network for reservoir eutrophication prediction, Ecological Modelling, 200(1), 171–177.
[21] Huo, S., He, Z., Su, J., Xi, B., Zhu, C., 2013, Using artificial neural network models for eutrophication prediction, Procedia Environmental Sciences, 18, 310–316.
[22] Song, K., Park, Y., Zheng, F., Kang, H., 2013, The application of Artificial Neural Network (ANN) model to the simulation of denitrification rates in mesocosm-scale wetlands, Ecological Informatics, 16, 10–16.
[23] Zhang, W., 2010, Computational Ecology: Artificial Neural Networks and Their Applications, Singapore, World Scientific.
[24] Ma, S., He, J., Liu, F., Yu, Y., 2011, Land-use spatial optimization based on PSO algorithm, Geo-spatial Information Science, 14, 54–61.
[25] Bao, Y. H., Ren, J. B., 2011, Wetland landscape classification based on the BP neural network in DaLinor Lake Area, Procedia Environmental Sciences, 10, 2360–2366.
[26] Dobreva, I. D., Klein, A. G., 2011, Fractional snow cover mapping through artificial neural network analysis of MODIS surface reflectance, Remote Sensing of Environment, 115(12), 3355–3366.
[27] Cruz-Ramírez, M., Hervás-Martínez, C., Jurado-Expósito, M., López-Granados, F., 2012, A multiobjective neural network based method for cover crop identification from remote sensed data, Expert Systems with Applications, 39(11), 10038–10048.
[28] Blum, C., Merkle, D., 2008, Swarm Intelligence: Introduction and Applications, Berlin, Springer.
[29] Dorigo, M., Maniezzo, V., Colorni, A., 1996, Ant system: Optimization by a colony of cooperating agents, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 26(1), 29–41.
[30] Afshar, M. H., 2010, A parameter free Continuous Ant Colony Optimization Algorithm for the optimal design of storm sewer networks: Constrained and unconstrained approach, Advances in Engineering Software, 41(2), 188–195.
[31] Ma, S., He, J., Liu, F., Yu, Y., 2011, Land-use spatial optimization based on PSO algorithm, Geo-spatial Information Science, 14, 54–61.
[32] Chau, K. W., 2007, A split-step particle swarm optimization algorithm in river stage forecasting, Journal of Hydrology, 346(3), 131–135.
[33] Wang, S., Qian, X., Wang, Q. H., Xiong, W., 2012, Modeling turbidity intrusion processes in flooding season of a canyon-shaped reservoir, South China, Procedia Environmental Sciences, 13, 1327–1337.
[34] Hasni, A., Taibi, R., Draoui, B., Boulard, T., 2011, Optimization of greenhouse climate model parameters using particle swarm optimization and genetic algorithms, Energy Procedia, 6, 371–380.
[35] Yagiz, S., Karahan, H., 2011, Prediction of hard rock TBM penetration rate using particle swarm optimization, International Journal of Rock Mechanics and Mining Sciences, 48(3), 427–433.
[36] Dasgupta, D., Forrest, S., 1999, An anomaly detection algorithm inspired by the immune system, In: Dasgupta, D., ed, Artificial Immune Systems and Their Applications, Berlin, Springer, 262–277.
[37] De Castro, L. N., Timmis, J., 2002, Artificial immune systems: A novel paradigm to pattern recognition, In: Corchado, J. M., Alonso, L., Fyfe, C., eds, Artificial Neural Networks in Pattern Recognition, SOCO-2002, Paisley, England, University of Paisley, 67–84.


[38] Forrest, S., Perelson, A. S., Allen, L., Cherukuri, R., 1994, Self-nonself discrimination in a computer, In: Proceedings of the IEEE Symposium on Research in Security and Privacy, Los Alamitos, CA, IEEE Computer Society Press, 202–212.
[39] Liu, D., Guo, S., Chen, X., Shao, Q., Ran, Q., Song, X., Wang, Z., 2012, A macro-evolutionary multi-objective immune algorithm with application to optimal allocation of water resources in Dongjiang River basins, South China, Stochastic Environmental Research and Risk Assessment, 26(4), 491–507.
[40] Marin, J., Sole, R. V., 1999, Macroevolutionary algorithms: A new optimization method on fitness landscapes, IEEE Transactions on Evolutionary Computation, 3(4), 272–286.
[41] Xu, S., Wu, Y., 2008, An algorithm for remote sensing image classification based on artificial immune B-cell network, In: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part B6b, 107–112.
[42] Zhang, X., Shan, T., Jiao, L., 2004, SAR image classification based on immune clonal feature selection, In: Mohamed, S. K., Aurélio, C. C., eds, Proceedings of Image Analysis and Recognition, Vol 3212, Lecture Notes in Computer Science, Berlin, Springer, 504–511.
[43] Zheng, H., Li, L., 2007, An artificial immune approach for vehicle detection from high resolution space imagery, International Journal of Computer Science and Network Security, 7, 67–72.
[44] Zhong, Y., Zhang, L., Huang, B., Li, P., 2007, A resource limited artificial immune system algorithm for supervised classification of multi/hyper-spectral remote sensing imagery, International Journal of Remote Sensing, 28, 1665–1686.
[45] Gong, B., Im, J., Mountrakis, G., 2011, An artificial immune network approach to multi-sensor land use/land cover classification, Remote Sensing of Environment, 115, 600–614.
[46] Patterson, D. W., 1990, Introduction to Artificial Intelligence and Expert Systems, Englewood Cliffs, NJ, Prentice-Hall.
[47] Rezaei, F., Safavi, H., Ahmadi, A., 2013, Groundwater vulnerability assessment using fuzzy logic: A case study in the Zayandehrood aquifers, Iran, Environmental Management, 51, 267–277.
[48] Güler, C., Kurt, M. A., Alpaslan, M., Akbulut, C., 2012, Assessment of the impact of anthropogenic activities on the groundwater hydrology and chemistry in Tarsus coastal plain (Mersin, SE Turkey) using fuzzy clustering, multivariate statistics and GIS techniques, Journal of Hydrology, 414–415, 435–451.
[49] Iliadis, L. S., Vangeloudh, M., Spartalis, S., 2010, An intelligent system employing an enhanced fuzzy c-means clustering model: Application in the case of forest fires, Computers and Electronics in Agriculture, 70(2), 276–284.
[50] De La Rosa, J. J. G., Pérez, A. A., Salas, J. C. P., Leo, J. G. R., Muñoz, A. M., 2011, A novel inference method for local wind conditions using genetic fuzzy systems, Renewable Energy, 36, 1747–1753.
[51] Piotrowski, A. P., Napiorkowski, J. J., 2011, Optimizing neural networks for river flow forecasting – evolutionary computation methods versus the Levenberg–Marquardt approach, Journal of Hydrology, 407, 12–27.
[52] Sanchez, E., Shibata, T., Zadeh, L. A., 1997, Genetic Algorithms and Fuzzy Logic Systems: Soft Computing Perspectives, River Edge, NJ, World Scientific.

Dimple Nagpal, S. N. Panda, Priyadarshini A. Pattanaik

Machine learning-based attribute value search technique software component retrieval

Abstract: Component reuse is a significant aspect of component-based software development, but reusing a component requires retrieving it from the repository. Diverse classification and retrieval schemes exist for retrieving components from a repository. In this chapter, component retrieval is done using attribute value search. Machine learning is an important concept in which processing and learning of a component are done from a composite dataset, solving various kinds of tasks well. To retrieve a component with the tool SRCAVS (storing and retrieving components using attribute value search), the user should know the specification of the component. By inputting the correct component name and operating system, the user can search for the component efficiently. Even if the user knows only one of the parameters, say the component name or the operating system, the component can still be searched for successfully.

Keywords: software component, retrieving, storing component, repository, retrieving techniques, component reuse

1 Introduction
Component-based software engineering (CBSE) focuses on reusing existing components rather than redeveloping them from scratch. Reusability can only be achieved, however, if the software component is present in the database and can easily be retrieved from it. Various classification techniques exist for storing software components and make retrieval easier: free text classification, enumerated classification, attribute value classification, and formal specification. The retrieval techniques include keyword-based retrieval, query-based retrieval, signature matching, behavior-based retrieval, and so on.

Dimple Nagpal, Chitkara Institute of Engineering and Technology, Chitkara University, Punjab, e-mail: [email protected] S. N. Panda, Chitkara Institute of Engineering and Technology, Chitkara University, Punjab Priyadarshini A. Pattanaik, Image and Information Processing Department IMT Atlantique, LaTIM Inserm U1101, Brest 29238, France https://doi.org/10.1515/9783110709247-003


Before discussing the whole retrieval process, one should understand CBSE and how it relates to components. CBSE emerged as a field in the early 1990s, when the idea of composing applications from components arose [6]. Owing to this composing ability, commercial off-the-shelf components became a part of business applications. CBSE is the process of developing or designing a component in a sufficiently general way that the component can be reused in applications in other domains. It makes a component a plug-and-play device: the component can be plugged into an application and played according to the user's specification. CBSE is similar to object-oriented programming (OOP) in many ways: just as code is reused in OOP in the form of objects, components are reused in CBSE. Software reuse is the motivation of CBSE, which is based on adapting and reusing components rather than building them from scratch. If a large software system is decomposed into small components, it is beneficial to store those components in a repository so that they can be reused in other applications. By doing so, not only is the component reused, but the cost of rebuilding it is saved. In this chapter, storing and retrieving of components are described. The literature survey of the classification and retrieval techniques is given in Section 2. The evaluation parameters are described in Section 3. The system description of the tool SRCAVS is given in Section 4. The proposed work is explained in Section 5. A comparison of various research works is given in the result analysis in Section 6. We conclude in Section 7 and outline the future scope in Section 8.

2 Related work
As mentioned earlier, there are many classification and retrieval techniques for effectively classifying and retrieving component information. Once the repository or database is populated, the component can be retrieved effectively by the various retrieval schemes described below.

2.1 Keyword-based retrieval
This technique requires the user to enter a keyword that is used to search for information. The input may be a precise term that describes the exact component, or it may be information about the component in the form of a sentence, through which the component can be retrieved.


2.2 Behavior-based retrieval
In behavior-based retrieval, the user or engineer inputs the query in the form of a function with various parameters, thus specifying the behavior of that function or subprocedure call. If this behavior matches a behavior in the repository, the corresponding component is presented to the user or engineer [1].

2.3 Query-based retrieval
In this technique, the user searches for a component using different parameters. The user selects the features of interest, a query is built from them and executed, and the retrieval results are displayed on the basis of that query [2].

2.4 Signature matching
In this method, components are stored and indexed by their signatures. A signature captures features of a component such as the types and number of its inputs. This is beneficial in programming: a developer can specify the desired signature and then easily retrieve a matching component.
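A hedged sketch of the idea in Python follows (real signature-matching systems use a richer type algebra; the component and the index below are invented for illustration):

from typing import get_type_hints

def signature_of(func):
    # A component's signature: the types of its inputs plus its
    # return type, read from the function's annotations.
    hints = get_type_hints(func)
    out = hints.pop("return", None)
    return tuple(hints.values()), out

def sort_ints(xs: list) -> list:
    return sorted(xs)

# Store components indexed by signature, then retrieve by the
# signature the developer specifies.
index = {signature_of(sort_ints): sort_ints}
wanted = ((list,), list)
print(index[wanted].__name__)  # 'sort_ints'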

2.5 Hypertext technique
This technique interconnects textual information through links between nodes and anchors. A small unit of information is a node, and a connection between nodes is a link. The hypertext technique has a source and a destination through which information can be retrieved [7]: one usually starts at the anchor (starting) node and follows links to the destination node. A structure is thereby maintained over the collection of data, recording the nodes and links.

2.6 Browsing technique
Browsing is a retrieval technique that necessitates a well-defined information repository. It is beneficial for users who do not know exactly what they want to search for: no keyword query is input. Instead, the user decides, while moving through the interconnected nodes, whether the node being accessed is interesting or not. The information in the repository must be interconnected, and hypertext is one way through which interconnected information can be stored.

2.7 SOM and GSOM technique
Self-organizing maps (SOM) and the growing hierarchical SOM are both clustering approaches. Ronaldo C. Veras and Silvio R. L. Meira compared the two by clustering the classes of the Java API. These clustering approaches [7] provide a visualization of software components on mobile devices, and the search can be refined by clustering similar components.

2.8 Multiattribute search
The multiattribute search mechanism is a retrieval technique in which a component is retrieved by specifying a query. Unlike keyword-based retrieval, which retrieves the component only by matching, multiattribute search takes multiple attributes as a query. The attributes can be characteristics of the software component or information about it. The user inputs the parameters, and the output appears accordingly.

2.9 Natural language interface
Documents are retrieved first. To understand documents in various languages, a natural language interface is needed; it retrieves documents expressed in natural language so that information can be obtained easily.

2.10 Case-based retrieval
Case-based retrieval is a combination of two techniques, namely an information retrieval technique based on statistical methods and a knowledge base technique [8]. Both are helpful for retrieving components: the first is useful for posing the query in natural language, while the second is beneficial for retrieval by accessing the knowledge base.


2.11 External classification
External classification classifies a description of the component, either automatically or manually. It covers not only faceted classification and controlled vocabularies but also classification of queries in natural language, using linguistic approaches to answer the engineer's query by automating the description of the component. One disadvantage of this approach is that, although faceted classification gives good results, manual indexing is expensive. Another is that, although classification with natural language provides lexical, semantic, and syntactic information from the natural language interface, it is restricted to a particular domain [8].

2.12 Encoding of component description
In this method, the semantic description of a component is encoded in order to ease the problem of classifying and retrieving the component. The software resource model (SRM) and software resource diagram (SRD) are used for the encoding: the SRM removes ambiguities and inconsistencies in the component description, and the SRD represents the SRM information graphically.

2.13 Conversational case-based reasoning
In this approach, knowledge-intensive case-based reasoning (CBR) is used to solve the classification and retrieval problem for components. The main idea is that, when facing a new problem, we usually try to fit a previous solution to it, and by doing so we sometimes reach a solution very easily. CBR consists of four phases, namely retrieve, reuse, revise, and retain. Conversational CBR (CCBR) is an extended form of CBR that has been explored in various application domains.

Table 1: Various classification/retrieval techniques.

Keyword based (retrieval technique)
Description: Retrieval of the component is done by inputting a keyword.
Pros: Easy implementation and conceptual simplicity for the user.
Cons: The number and choice of words are crucial to success.

Behavior based (classification technique)
Description: Component behavior is stored in the repository by testing various components with different arguments, yielding various new responses.
Pros: Only the user who inputs the specific query accesses the behavior.
Cons: The user should have the knowledge of writing the query in that language.

Hypertext (retrieval technique)
Description: A set of interconnected units of information about the component.
Pros: Provides a convenient way to access the gathered data.
Cons: The relationships between the various nodes must be predefined.

Browsing (retrieval technique)
Description: Components are represented as interconnected nodes, but the user does not have to formulate a query first.
Pros: The user can move quickly and efficiently through the database.
Cons: There are no explicit rules for this type of matching.

Multiattribute (retrieval technique)
Description: Searches for the component by specifying its attributes.
Pros: The component description contains more than just keyword information.
Cons: Classification and storing of the component can be done by an administrator only.

Natural language (N.A.)
Description: Suited to retrieving components from natural language queries.
Pros: Ease of natural language query formulation.
Cons: Difficult to implement.

Case based (retrieval technique)
Description: Interrelates components through a rich set of relationships expressing system architecture and design decisions.
Pros: N.A.
Cons: Scalability and extensibility are compromised by the need for a manual, high-cost knowledge acquisition process.

External classification (classification technique)
Description: Includes all techniques that describe the component by an external description.
Pros: The faceted classification technique gives good results.
Cons: The technique is expensive due to manual indexing.

Encoding component description (classification and retrieval technique)
Description: Semantic information of the reusable component is encoded for classifying and retrieving the component.
Pros: Both the software resource model and the software resource diagram are consistent.
Cons: N.A.

Optimization (retrieval technique)
Description: Applied to the planning of software reusable components.
Pros: An application can be built by accessing the software components.
Cons: Costs are high if no software component in the repository is used for the application.

Conversational case based (retrieval technique)
Description: Components are represented as cases; an interactive form of case-based reasoning that enables reasoning about a concept, allowing approximate retrieval of components.
Pros: Probed in several application domains.
Cons: Dependency on knowledge engineering.

Genetic (retrieval technique)
Description: Applies operators such as selection, crossover, and mutation to extract the component from a repository.
Pros: Enhances the chance of retrieving the appropriate component from the repository.
Cons: It is sometimes very difficult to calculate the fitness function value.

Specification based (retrieval technique)
Description: Based on specification matching of the component in the enterprise information system.
Pros: A multilayer matching mode that can enrich the semantic information of the business component repository.
Cons: N.A.

Free text or faceted (classification technique)
Description: Defines attribute classes that can be instantiated with different terms.
Pros: Facets represent the component easily by combining terms in the repository.
Cons: It becomes hard for the user to find the right combination of words that describe the information need.

Attribute value (classification technique)
Description: Uses a set of attributes to classify the component.
Pros: Components are classified in the repository according to their attributes.
Cons: Only the developer can classify the component.

Knowledge based (retrieval technique)
Description: Based on a knowledge base that contains semantic information about a domain as well as natural language.
Pros: N.A.
Cons: A knowledge base is created for every domain and is populated manually, which requires enormous human resources.

Deep learning using software engineering (learning technique)
Description: Based on training the output and automating the retrieval of future components using the various algorithms within it.
Pros: Improves the efficiency and accuracy of component retrieval over time.
Cons: Retrieval of results can take time in terms of resources or dataset automation.

2.14 Genetic algorithm
The genetic algorithm is a robust optimization algorithm: it genetically breeds software components to produce a new software component. It produces the new component following the Darwinian principle of reproduction, executing the following operators (a minimal sketch follows the list):
– Selection
– Crossover
– Mutation
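The sketch below shows the three operators on bit strings; the fitness function, string length, and rates are illustrative defaults, and this is the generic scheme rather than the chapter's component retrieval:

import random

def genetic_algorithm(fitness, length=16, pop_size=30, n_gen=50, p_mut=0.02):
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = [fitness(ind) + 1e-9 for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = random.choices(pop, weights=scores, k=2)  # selection
            cut = random.randrange(1, length)                  # crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (random.random() < p_mut) for b in child]  # mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Example: evolve toward the all-ones string ("onemax")
best = genetic_algorithm(fitness=sum)
print(best, sum(best))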

2.15 Free text or faceted
Faceted and free text classifications of various features are incorporated in repositories to improve the classification and retrieval of components [9]. Classification in a faceted scheme is straightforward: several terms are composed in a single facet, and the scheme contains many facets. Free text analysis analyzes the word frequency in sentences. The advantage of faceted classification is that components can easily be stored in and retrieved from facets.

2.16 Attribute value
Attribute value classification uses a set of predefined attributes. For example, a component can have attributes such as component name, brand, version number, cost, component id, operating system, and component URL. Depending on the user's request, the particular component is retrieved via the attributes that are present in the repository [10].

2.17 Knowledge-based approach
The knowledge-based approach is a retrieval system in which components are retrieved by storing them in a knowledge base. The components are stored in the knowledge base through lexical, syntactic, and semantic analysis of the software components, without understanding the whole document.

2.18 Machine learning algorithm
Machine learning algorithms mainly focus on retrieving the relevant components and rejecting the irrelevant ones. Machine learning offers various methods through which retrieval of the relevant component can be done with ease [11]: supervised, semi-supervised, unsupervised, and reinforcement learning algorithms. A comparison of the various classification and retrieval techniques is shown in Table 1.

3 Evaluation criteria
Different techniques used for component retrieval can yield different results. Results are usually measured in the form of precision, recall, and F-measure. Before explaining and comparing different results, one should know the exact meaning of precision, recall, and F-measure.

3.1 Precision
The term itself describes how precise the output is in the context of retrieving information from the repository. Precision is a number that ranges from 0 to 1. It is the ratio of the number of relevant components retrieved to the total number of components retrieved. The formula for precision is as follows:

Precision = (no. of relevant components retrieved) / (total no. of components retrieved)


3.2 Recall
The proportion of the relevant components that are successfully retrieved is referred to as recall. In simple words, recall measures whether all necessary components can be retrieved. It is calculated as the ratio of the number of relevant components retrieved to the total number of relevant components in the repository. Recall also ranges between 0 and 1. The formula for recall is as follows:

Recall = (no. of relevant components retrieved) / (total no. of relevant components in the repository)

3.3 F-measure
Precision and recall cannot both be maximized at the same time: for maximum precision, only relevant components should be retrieved from the repository, while for maximum recall, all the components in the repository would have to be retrieved [2]. As a result, a measure that combines precision and recall, such as the F-measure, is necessary. The F-measure is the harmonic mean of precision and recall, a kind of average of the two. The formula for the F-measure is as follows:

F-measure = (2 × Precision × Recall) / (Precision + Recall)
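A short worked example with invented component ids: if a search returns five components of which four are relevant, and the repository holds eight relevant components in total, then precision = 4/5 = 0.8, recall = 4/8 = 0.5, and F-measure = 2 × 0.8 × 0.5 / (0.8 + 0.5) ≈ 0.62. In Python:

def precision_recall_f(retrieved, relevant):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

retrieved = {"c1", "c2", "c3", "c4", "c5"}
relevant = {"c1", "c2", "c3", "c4", "c6", "c7", "c8", "c9"}
print(precision_recall_f(retrieved, relevant))  # (0.8, 0.5, 0.615...)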

4 System description
The proposed system has an interactive user interface with two modules: a developer module and a user module. Through the developer module, components can be inserted into and deleted from the repository on the basis of parameters. For inserting a component, the developer should know its name, brand, version number, cost, and id; for deleting a particular component, only the component id is required. The developer module thus manages the insertion and deletion of components in the repository, as shown in Figure 1. In the user module, the user can only search for components that are present in the repository. Components are searched in a query-based manner, based on various parameters, namely the component name and component id. The component may be readily obtained from the repository if the correct component name and id are entered into the text box and the component is present in the repository.


Figure 1: Flow of insertion and deletion of component.

5 Proposed work

5.1 Tool introduction
Component retrieval in machine learning involves reusing relevant components from the repository. It begins with representing the components in an n-dimensional space. Component information can be stored in either indexed or unindexed attributes. Queries in the graphical user interface (GUI) are represented as constraints, and a particular retrieval method is used to retrieve a relevant component. The retrieved component can be structured or derivational depending on the output [12]. Keyword-based retrieval of components does not give precise output, as only terms related to the query are extracted; the component is not matched semantically. In query-based searching, by contrast, the search can be made semantically precise by supplying various parameters in the query. A technique is therefore introduced to circumvent the difficulty of keyword-based retrieval and is used for effective component search and retrieval.

5.2 Proposed algorithm
The proposed algorithm for the tool SRCAVS (storing and retrieving components using attribute value search) is given below.


The SRCAVS algorithm:

Start
Insert/delete component (repeat these steps for inserting and deleting a component):
if (user_choice == developer) {
    the user can insert and delete a component if the developer is valid
}
Search component:
if (user_choice == engineer) {
    search the component by inputting the exact terms against the attributes
}
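A hedged, minimal Python sketch of this flow is given below; the function names and the in-memory dictionary are illustrative, not the tool's actual implementation:

repository = {}  # component_id -> attribute dict

def insert_component(component_id, **attributes):
    # Developer module: add a component record to the repository.
    repository[component_id] = attributes

def delete_component(component_id):
    # Developer module: remove a component by its unique id.
    repository.pop(component_id, None)

def search_components(**query):
    # User module: return components whose attributes satisfy every
    # supplied parameter (an AND over the attribute values).
    return [(cid, attrs) for cid, attrs in repository.items()
            if all(attrs.get(k) == v for k, v in query.items())]

insert_component("C101", component_name="Commons Chain",
                 os="Linux", domain="Apache component")
print(search_components(component_name="Commons Chain", os="Linux"))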

5.3 Tool description
The tool, named SRCAVS, is used for storing and retrieving components from the repository. The repository schema for the tool is: Repository(component_name, type_of_component, OS, cost, component_id, domain, component). A sample record of this schema is shown in Table 2.

Table 2: Schema of the component repository.

Component name | Type | OS | Cost | C_id | Domain | Comp
Cc             | Comp | L  |      |      | Ac     | Li

(Abbreviations as in the legend of Table 4.)

The proposed tool works as follows.

5.3.1 Home page
The home page offers the choice of whether the one using the GUI is a developer or an engineer, as shown in Figure 2.

5.3.2 Developer login screen
A login screen is provided to determine whether the user is authenticated, as indicated in Figure 3. If the user enters the correct login id and password in the text fields, the user is authenticated; otherwise, the user is not authenticated.


Figure 2: Screenshot of home page.

Figure 3: Screenshot of developer login.

5.3.3 Developer interface
Here the developer can choose whether to insert a component into the repository or to delete a component from it, according to need. The developer interface is shown in Figure 4.


Figure 4: Screenshot of developer interface.

5.3.4 Developer insertion screen
Here the developer can add a component to the repository. Insertion can only be done if the developer has the component name, type of component, operating system, cost, domain, component, and component id, as shown in Figure 5.

Figure 5: Screenshot of inserting the component in the repository.


5.3.5 Developer deletion screen
A component is deleted via its component id, which is unique. Since the developer can access the repository, an irrelevant component can be deleted simply by entering its id in the text field, as shown in Figure 6.

Figure 6: Deleting the component from the repository.

5.3.6 Engineer login
Only an authenticated user can access this interface. The engineer can access it by inputting the correct id and password in the text fields, as shown in Figure 7.

5.4 Tool input
Here the user enters the appropriate query for retrieving a component. The component can be fetched from the repository if the parameters in the query match a component in the repository, as shown in Figure 8.

Figure 7: Screenshot of the user login page.

Figure 8: Screenshot for searching the component.

5.5 Tool output
The screenshot of the tool output, or more precisely of the search, is shown in Figure 9. When the user searches by inputting a query, the tool extracts the component from the repository, and the results are shown in the figure. The tool takes the input parameters and, using an AND operation, retrieves the matching component from the repository.


Figure 9: Output screen showing the component.

6 Result analysis
Keyword-based retrieval was studied in [3], where the retrieval procedure is implemented with the help of a technique called automated repository exploration; component ranking and retrieval are also accomplished there by assigning weights to each component. While keyword-based retrieval has a number of advantages, it also has drawbacks: keyword searches are ambiguous, resulting in a plethora of ambiguities, such as structural and keyword ambiguity, and the evaluation of results is harder to obtain. The study in [6] performs retrieval by first classifying the component, where the classification is a combination of attribute value and faceted classification. The component can be retrieved using both keyword-based and query-based methods; query-based retrieval produces more specific results than keyword-based retrieval. However, the query-based retrieval approach has the drawback of producing useful results only for a small number of components in the repository: it does not provide precise output if the component repository is large. A mechanism known as ontology is necessary to extract the component semantically and syntactically. Component retrieval based on ontologies is far more effective than any other technique: essentially, the ontology establishes relationships between concepts, classes, and individuals so that the component retrieved from the repository matches the query not only syntactically but also semantically.


Gupta and Kumar [5] described a component retrieval system that retrieves components using a faceted and metadata repository. Compared with Bhatia [4], attribute value search gives more precise output than query-based retrieval: query-based retrieval gives a link to the site where the components are stored on the web and does not give a precise demonstration of the component, while attribute value retrieval gives precise output together with the link for retrieving the component. Also, the component features chosen by Bhatia [4] are not as accurate as those used in this experiment with attribute value search. The retrieved output is shown in Table 3.

Table 3: Output schema of component repository.

CN | Br | Vn | Ta   | Tc   | OS  | C | L  | DL
Ba | S  | .  | Exef | COTS | Mac |   | VB | CL
Ba | Mh | .  | Sc   | COMP | SL  |   | VB | CL

CN, component name; Br, brand; Vn, version no.; Ta, type of application; Tc, type of component; OS, operating system; C, cost; L, language; DL, download link; Ba, battery analyzer; S, Sify; Mh, microhard; Exef, EXE file; Sc, source code file; COMP, component; SL, Solaris; CL, http://www.componentsource.com.

In Table 3, components are retrieved but not precisely: if the repository is large, it becomes difficult to look through all the retrieved components. To avoid this problem, the generated tool takes two attributes as input, namely, component name and operating system, and accordingly gives precise output as shown in Table 4.

Table 4: Output schema of attribute value.

| Component name | Type        | OS | Cost | C_id | DL |
|----------------|-------------|----|------|------|----|
| Cc             | Domain Comp | L  |      | Ac   | Li |

Cc, Commons Chain; Comp, component; L, Linux; C_id, component id; Ac, Apache component; Li, https://repository.apache.org/content/repositories/releases/commons-chain/commons-chain/.

This experiment gives more precise output than query-based retrieval.

7 Conclusion

This chapter focuses on reusing a component by retrieving it from the repository. Nowadays, reusability is an important aspect, as the developer or user simply wants a component that meets their needs and uses it rather than rebuilding the same component. In this chapter, the working of the tool is shown, in which the


attribute value search technique used is a combination of the query-based and keyword-based techniques for effective retrieval of the component. This chapter also briefly describes the machine learning concept, through which a relevant component can be extracted from the repository under constraints given in the form of attributes.

8 Future work

Future work involves retrieval of the component using ontology, as this not only retrieves the component effectively but also retrieves it in a semantic as well as syntactic manner. Various ways of exploiting ontologies must also be explored in order to retrieve the component efficiently and cost-effectively. In the near future, we will try to implement component retrieval using ontology with machine learning techniques. Ontology plays a vital role in retrieval as it retrieves the component not only syntactically but also semantically, by providing the interrelationships between components. Ontology with machine learning can be applied in various domains; in a question-answer scenario, the ontology is linked with answers for simple questions, or it can be connected with a knowledge base in which preprocessing and learning of components take place. After preprocessing and learning, retrieval of the component is done through machine learning and deep learning techniques.

References

[1] Kaur, N., 2007. Retrieving Best Component From Reusable Repository (Doctoral dissertation).
[2] Dalipi, F., Ninka, I., 2013. Semantic information retrieval from heterogeneous environments, International Journal of Scientific & Engineering Research, 2229–5518.
[3] Chatterjee, R., Rathi, H., 2015. To ameliorate component searching by automating keyword search technique. In International Conference on Computing for Sustainable Global Development, 560–565.
[4] Bhatia, V. K., 2011. Implementing Improved Classification and Multiple Search Criteria in Software Repository (Doctoral dissertation).
[5] Gupta, S., Kumar, A., 2013. Reusable software component retrieval system, International Journal of Application or Innovation in Engineering and Management, 187–194.
[6] Mahmood, S., Lai, R., Kim, Y. S., 2007. Survey of component-based software development, IET Software, 57–66.
[7] Bakshi, A., Bawa, S., 2013. A survey for effective search and retrieval of components from software repositories, International Journal of Engineering Research and Technology, 1935–1939.
[8] Jatain, N. A. Component retrieval techniques: a systematic review, International Journal of Scientific & Engineering Research, 2229–5518.
[9] Andreou, A. S., Vogiatzis, D. G., Papadopoulos, G. A., 2006. Intelligent classification and retrieval of software components. In Annual International Computer Software and Applications Conference, 37–40.
[10] Pandove, D., Goel, S. G., 2010. Designing an Interface for Effective Retrieval of Components (Doctoral dissertation).
[11] Ap, M. S. K., Ap, B. P. D., 2018. A novel approach for deep learning techniques using information retrieval from bigdata, International Journal of Pure and Applied Mathematics, 601–606.
[12] Zhang, D., June 2000. Applying machine learning algorithms in software development. In Proceedings of the 2000 Monterey Workshop on Modeling Software System Structures in a Fastly Moving Scenario, 275–291.

Ashish Agrawal, Anju Khandelwal, Shruti Agarwal

Use of fuzzy logic approach for software quality evaluation in Agile software development environment

Abstract: Quality is required not only by customers but also by the serving organizations, no matter whether they provide services for factory-made products, utility items, or software products. For software products, the quality estimation process is different and becomes somewhat harder, as the developer needs to work according to customer requirements that, in the modern scenario, are constantly being updated. Some decades ago, when traditional software development life cycles like the waterfall model and the prototype model were mostly in use, software quality evaluation was performed using function points and lines of code; but since 2001, when Agile software development practices came into existence, software quality evaluation parameters have also been updated and various techniques are being implemented for the purpose. This chapter focuses on the approach of using soft computing techniques for software quality evaluation and also gives information about major software quality evaluation parameters in Agile methods.

Keywords: software quality, quality attributes, Agile software development (ASD), scrum, software quality evaluation

1 Traditional software quality attributes

Software quality is defined as an area of learning and training that describes the required features of software products. The quality of software is assessed against quality models such as ISO/IEC 25010:2011. The basic objective of software engineering is to advance approaches and measures for software development. These can scale to large structures and can be used to yield high-quality software at a consistently low cost and with a short cycle time. As the software quality maxim predicts: do it once and do it right, and less rework, less variation in productivity, and better overall performance will follow.

Ashish Agrawal, Department of Computer Science, SRMS CET, Bareilly, India, e-mail: [email protected], ORCID: 0000-0001-9483-3393
Anju Khandelwal, Department of Mathematics, SRMS CET, Bareilly, India, e-mail: [email protected], ORCID: 0000-0002-9730-9084
Shruti Agarwal, Department of Computer Science, SRMS CET, Bareilly, India, e-mail: [email protected], ORCID: 0000-0001-6242-4990
https://doi.org/10.1515/9783110709247-004


Products are delivered on time and manufactured more efficiently; poor quality is more problematic to remedy. The development, operation support, and improvement of software are carried out within IT projects in the public sector; despite large investments in such projects, a majority of them do not fit within scope, budget, or schedule [3]. Software quality features are features that allow software testing professionals to measure software product performance. The McCall model established product quality through several features, grouped into three perspectives: product revision (maintenance, flexibility, and testing), product operation (correctness, reliability, efficiency, integrity, and usability), and product transition (portability, reusability, and interoperability) [5].

1.1 Availability

A software application is good in terms of availability if the system is able to provide its features to users most of the time, and if faults and issues arising under any condition are sorted out within a stipulated time.

1.2 Interoperability

Since the evolution of Agile methods, almost all software is developed in a modular approach. In such cases, the system must be able to exchange information between modules to increase cohesion. Highly interoperable systems are also capable of communicating well with other systems.

1.3 Performance

Performance is concerned not only with producing the correct output as per the requirements but also with time and space. This is one of the most important quality attributes, as it directly affects the customer.

1.4 Testability

Software testability focuses on how easily a software tester can apply different test cases and test suites. Testability is concerned with software quality assurance (QA) through verification and validation.


1.5 Security

Many multiuser systems face numerous security issues, whether spam mails, unauthorized access, service unavailability, or information breaches, so it becomes very important for any software developer to check all the security measures of the software application.

1.6 Usability

Every software-driven system is designed for convenience in achieving certain tasks. Usability denotes the simplicity with which any user can execute a task on the system; it additionally reflects the kind of user support provided by the system. The most notable principle for this property is "Keep It Simple, Stupid." In addition, software QA engineers should test whether the software supports the different kinds of accessibility controls needed by people with disabilities.

1.7 Functionality

This attribute defines the conformance of a software-driven system to its actual specifications. Most software testing experts see this attribute as pivotal and a foremost requirement of a modern application, and would therefore advocate completing tests that evaluate the desired functionality of a system in the initial phases of software testing activities.

1.8 Reliability

Reliability measures whether the product gives consistently correct results and can sustain operation under any condition. Product reliability is measured in terms of the project working under different environments and different conditions.

1.9 Maintainability

Maintenance should be cost-effective and easy: it should be easy to maintain the system, repair faults, or make changes to the software. Different versions of the product should be easy to maintain, and it should be easy to add code to the existing system to develop and upgrade it from time to time with new features and technologies.


Other than these parameters, inspection rate and error density are the most significant examination metrics for improving software quality. Using a fuzzy logic approach, we quantify the quality of software modules on the basis of these two metrics. We have used triangular fuzzy values to examine the inspection rate and error density of the product, and quality grades are assigned module-wise. Fuzzy logic offers an important advantage over other methodologies because of its capacity to naturally model the qualitative aspect of inspection data; the inspection data are applied to fuzzy inference rules.

2 Software quality evaluation in Agile

A common pitfall that QA teams face when adopting Agile methods is putting too much focus on testing speed while ignoring some of the other essential factors of the technique. Agile, for example, encourages a closer link between developers and testers, urging teams to start working together at the very beginning of a project. This ensures that when the stakeholders sit down and lay out criteria and an overall growth plan, QA teams are brought in to provide feedback and insight. By doing so, companies ensure that everybody is on the same page in determining project progress and software quality, resulting in a smoother creation and testing phase, as well as a stronger initial build. The use of scrum teams is another component of agility that is key to enhancing software quality. Again, it is crucial that QA leaders do not focus too much on the speed factor. While scrum teams are often distinguished by their use of sprints to zero in on a specific component or function of the software, there are other keys to their success. For example, the test management approach can go a long way toward deciding how effective the scrum team is. As explained by Software Testing Help, having able team members, open communication lines, and a welcoming environment can enhance everyone's efficiency. When properly performed, scrum techniques decrease the amount of time required to run comprehensive tests. Some of the major quality estimation factors that can be used in an Agile environment are unit tests per user story, functional (FitNesse) tests per user story, defects carried over to the next iteration [2], running tested features (RTF) [4], enhancement rate, defect removal rate, velocity, enhancement time ratio, and defect removal effectiveness [1].


3 Fuzzy approach for software quality evaluation

In order to create a software quality assessment model, one should initially distinguish the factors that strongly influence software quality and the number of residual errors. Unfortunately, it is too difficult to accurately identify all relevant quality factors. Besides, the degree of influence is not fixed in nature; that is, even though definite and discrete metric information is utilized, the inference rules used may be fuzzy in nature. Assume, for example, that an inspection team announced an inspection rate of over 380 LOC/h, whereas the ordinary inspection rate ranges from 150 to 200 LOC/h. One can convincingly argue that such an inspection rate significantly exceeds the reported norm from industrial applications, and specialists will undoubtedly concur with that conclusion. Even so, the assessment is fuzzy, because the term "significantly" cannot be precisely quantified. Additionally, if a team reports an inspection rate of 275 LOC/h, specialists are likely to differ in their views concerning whether the inspection rate exceeds the industrial norm and by how much. In other words, the decision boundary is not well characterized. So we use inspection rate and error density for the software quality assessment.
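The fuzziness of such a judgment can be made concrete with triangular membership functions. Below is a small sketch, assuming the scikit-fuzzy package; the triangle breakpoints are illustrative values loosely taken from the LOC/h figures above, not calibrated ranges.

```python
# Illustrative sketch only: triangular membership grades for inspection rate.
import numpy as np
import skfuzzy as fuzz

rate = np.arange(0, 401, 1)                  # universe of discourse: 0-400 LOC/h
normal = fuzz.trimf(rate, [150, 175, 200])   # "normal" inspection rate (assumed)
high = fuzz.trimf(rate, [200, 380, 400])     # "high" inspection rate (assumed)

# Degree to which 275 LOC/h counts as "high": a fuzzy, not crisp, judgment
print(fuzz.interp_membership(rate, high, 275))   # roughly 0.4
```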

1 Inspection rate

Monitoring the inspection process is fundamental and essential to get an early estimate of the software's quality, evaluate staff conformance to inspection procedures, and identify the status of the inspection process. Metrics generally provide a subjective evaluation of the total number of errors or defects left in the code after review. Inspection quality status is essentially a long-term metric of a company's or division's accumulated inspection experience: it portrays attributes of assessment in the long term, while the raw inspection data clarify and characterize the status of a specific examination. Data gathered routinely during the measurement process are all that is needed to compute the set of metrics. These metrics support evaluation and improvement of the process, along with the quality of planning and tracking. Metrics computed during such a process should be clarified and characterized by the needs of the organization, as stated in its quality manual; collecting data and computing metrics for no reason is simply a waste of time. Inspection quality status is for the most part made up of two different kinds of metrics.

Average quality metrics: These can be characterized in terms of the time used, the number of inspectors participating, and the number of defects detected.


Maturity metrics: These describe the extent of inspection adoption; this is also a new metric. During the inspection process, several metrics can be calculated. Some of them are given below:
1. Total number of major and minor defects found: usually found by a reviewer.
2. Number of major defects found relative to the total found: if the proportion of minor defects to major defects is much greater, the moderator might request the reviewer to repeat the review and focus on major defects before commencing the logging meeting.
3. Size of the artifact: measured in pages, LOC, or other size measures.
4. Rate of review: the size of the reviewed artifact divided by time, expressed in hours, for example, 14 pages/h.
5. Defect detection rate: the total number of major defects found per review hour.

2 Error density

a. Number of defects: The total number of defects found is the sum of the defects found by each analyst, where defects found by more than one reviewer are counted once. For example, for two reviewers, the metric is computed as follows:

Total defects found = A + B − C

where A and B are the numbers of defects found by reviewers A and B, and C is the number found by both A and B.

b. Defect density: This is the ratio of the total number of defects found to the size of the artifact. In simple words, it is the total number of confirmed defects recognized in the software during development, divided by the size of the software. It helps us decide whether or not a piece of software is ready to be released, and it allows us to compare the relative number of defects in various software components, which further helps in finding candidates for additional testing or inspection. It is expressed as follows:

Defect density = total defects found / size

where size is measured in the number of pages, LOC, or other size measures.
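A minimal sketch of these two measures as plain functions follows; the function names and example numbers are illustrative only.

```python
# Illustrative sketch of the error-density measures defined above.
def total_defects(found_by_a: int, found_by_b: int, found_by_both: int) -> int:
    """Total defects for two reviewers: A + B - C (common defects counted once)."""
    return found_by_a + found_by_b - found_by_both

def defect_density(total_found: int, size: float) -> float:
    """Defects per unit size (pages, LOC, or any other size measure)."""
    return total_found / size

# Example: reviewers found 12 and 9 defects, 4 of them in common,
# in an artifact of 50 pages.
defects = total_defects(12, 9, 4)             # 17
print(defects, defect_density(defects, 50))   # 17 defects, 0.34 defects/page
```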


3 Test project, test cases, and quality characteristics

a. RTF: RTF shows how many features are delivered in each iteration. It is the capacity to keep shipping quality software to the client. The intent of the metric is to ensure some level of quality, but an RTF measurement is only as good as the final iteration it reflects.
b. Functional tests: These tests check that a feature, or a group of features, meets the client's prerequisites and goals such as performance. Normally, functional tests are the responsibility of the customer, but they can be written by the developers based on the client's requirements.
c. Builds per iteration: This metric gives the number of builds completed per iteration. When a build succeeds, it guarantees that no defects remain within the build and that all modules work together correctly as a single unit, thereby assuring the quality of the features implemented within the build.
d. Maturity metrics: These describe the extent of inspection adoption, together with the five inspection metrics listed in the previous section (major and minor defects found, proportion of major defects, artifact size, rate of review, and defect detection rate).

Figure 1 represents the fuzzy inference system (FIS) of the first metric, inspection rate. We take four major inputs to calculate the output value of this first metric, through which we calculate software quality. To calculate the output value of the inspection rate, some fuzzy if–then rules have to be created; the researcher can create any number of fuzzy rules. For example, if the major defect value is very high, the artifact size is very high, the review rate is very high, and the defect rate is very high, then the inspection rate is very high. Based on the values of these input variables, the researcher finds the value of the output variable.


Figure 1: FIS of inspection rate.
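As a rough illustration of the kind of FIS shown in Figure 1, the following sketch, assuming the scikit-fuzzy package, wires two of the four inputs through Gaussian membership functions and a single if–then rule; all universes, means, and sigmas are assumed values, not the chapter's actual ranges.

```python
# Illustrative sketch only: a miniature inspection-rate FIS.
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Two of the four inputs from Figure 1, with assumed universes of discourse
major_defects = ctrl.Antecedent(np.arange(0, 51, 1), 'major_defects')
review_rate = ctrl.Antecedent(np.arange(0, 401, 1), 'review_rate')
inspection = ctrl.Consequent(np.arange(0, 11, 1), 'inspection_rate')

# Gaussian membership functions; the (mean, sigma) pairs are assumptions
major_defects['VH'] = fuzz.gaussmf(major_defects.universe, 45, 5)
review_rate['VH'] = fuzz.gaussmf(review_rate.universe, 350, 40)
inspection['VH'] = fuzz.gaussmf(inspection.universe, 9, 1)

# One if-then rule of the kind described in the text
rule = ctrl.Rule(major_defects['VH'] & review_rate['VH'], inspection['VH'])

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem([rule]))
sim.input['major_defects'] = 42
sim.input['review_rate'] = 360
sim.compute()
print(sim.output['inspection_rate'])   # crisp (defuzzified) inspection rate
```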

Figure 2: Membership editor of defect detection rate.

Figure 2 shows the membership function editor for the first metric of software quality assessment. Here the researcher uses the Gaussian membership function to evaluate the maximum number of defects per hour. The figure shows all the details of the defect rate metric; if the researcher wants to change the range, they can easily update the value either by clicking on the curve or by writing the value in the range column. The researcher can also define the range of defects, that is, the maximum number of defects that can be accepted per hour. Five attributes are used for this metric: VH (very high), H (high), A (average), L (low), and VL (very low). The membership editor allows the values or range of any metric to be edited and also displays the maximum value of any metric of the software quality assessment. Clicking on a membership function line makes it possible to change its attributes, such as name, type, and numerical parameters; dragging the curve moves it or changes its shape. This metric is evaluated by the inspection team, because after the inspection anyone can calculate the defect rate per hour.

Figure 3: FIS editor for evaluating error density.

Figure 3 shows the second metric of software quality assessment, which evaluates error density. This metric is composed of two submetrics: the number of defects and the defect density. The FIS editor allows the researcher to rename a metric by clicking it and editing the name column; the figure also shows that this metric has two input variables and one output variable.

Figure 4: Membership function editor for error density.

Figure 4 shows the membership function editor for the second metric, error density. Clicking a curve lets the researcher rename that attribute or adjust its range as needed. Five membership function plots are used (VH, H, A, L, and VL), although the researcher may increase or decrease the number of plots as required. If–then fuzzy rules are again applied to evaluate error density; for example, if the number of defects is very high and the defect density is very high, then the error density is very high.

Figure 5: FIS editor for evaluating quality.

Figure 5 shows the FIS of the third metric, quality. Three major inputs, as seen in Figure 5, are used to compute the output value of this metric, through which software quality is calculated. Fuzzy if–then rules must again be created, and the researcher can create any number of rules; for example, if the RTF value is very high, the functional test value is very high, and builds per iteration is very high, then quality is very high. Based on the values of these input variables, the researcher finds the value of the output variable. The FIS editor simply shows how many input and output variables are taken, and a variable can be renamed by clicking it and changing the name.

Figure 6: Membership function editor for running tested feature (RTF).

Figure 6 shows the membership function editor for the third metric, RTF. As before, clicking a curve lets the researcher rename an attribute or change its range, and the five membership function plots (VH, H, A, L, and VL) can be increased or decreased as needed. If–then fuzzy rules are applied to evaluate quality; for example, if the RTF value is very low, the functional test value is very low, and builds per iteration is very low, then quality is very low.


Based on the output values of the above three metrics, the researcher evaluates overall software quality. The output values of these three metrics act as inputs to the final result, through which the researcher evaluates software quality. After finding all the output values of all the metrics, the researcher again applies fuzzy if–then rules, using the Gaussian membership function, to obtain the final output values for the software quality assessment.

4 Conclusion

Software quality is a crucial factor that has to be analyzed properly, whether an organization follows a traditional software development life cycle or a modern one like Agile. Each life cycle has its own environment parameters, and different quality estimation factors can be used with different approaches. In this chapter, we discussed the fuzzy approach for quality estimation. In the future, DevOps will be used for software development; although the working scenario of DevOps is different, these parameters and approaches can still help in estimating software quality.

References

[1] Oberscheven, F. M., 2014. Software Quality Assessment in an Agile Environment (PhD thesis). Nijmegen: Radboud University.
[2] Mishra, D., Balcioglu, E., Mishra, A., 2012. Measuring project and quality aspects in agile software development, Technics Technologies Education Management, 1–10. Retrieved from https://www.researchgate.net/publication/261613720
[3] Kārkliņa, K., Pirta, R., 2018. Quality metrics in agile software development projects, Information Technology and Management Science, 21, 54–59, doi: 10.7250/itms-2018-0008. Retrieved from https://itms-journals.rtu.lv
[4] Ron, J., 2004. A Metric Leading to Agility. Available at http://xprogramming.com/xpmag/jatRtsMetric
[5] Miguel, P. J., Mauricio, D., Rodríguez, G., November 2014. A review of software quality models for the evaluation of software products, International Journal of Software Engineering & Applications (IJSEA), 5(6).

Somya Goyal

Metaheuristics for empirical software measurements

Abstract: Empirical software measurement is a dynamic research domain within software engineering. Measuring software attributes during development is crucial to ensure the successful delivery of the desired software product, and it involves the measurement of multiple dynamic metrics based on the software process and product. Traditional approaches are no longer appropriate for measuring current software practices, owing to changing development processes and increasing complexity; the changing environment requires measurement techniques that adapt to these changes. Machine learning (ML)-based techniques are prominently deployed for measuring software empirically, and the past three decades have witnessed their successful application. ML is powerful enough to accurately measure software attributes, but the synergy of ML algorithms and metaheuristic algorithms is apparently even more promising in the domain of software measurement. Since 2001, metaheuristics have been fused with ML-based techniques to optimize the accuracy of ML-based predictors. The estimation of software cost (or effort), the estimation of development time (or schedule), the prediction of software quality, and the early detection of software anomalies are key tasks in the most popular research sector of the entire field of software measurement. Metaheuristics play three vital roles alongside ML in empirical software measurements: first, to optimize the hyperparameters of ML algorithms; second, to mitigate the curse of dimensionality of the data; and third, to select the best ML model for the candidate problem. This study proposes a full-fledged description of the application of metaheuristic algorithms to empirical software measurements. The chapter begins with an introduction to the metaheuristic approach and its application to measuring software empirically in conjunction with ML algorithms. Then, the state of the art of deploying metaheuristics in the domain of software measurements is elaborated. Later, the study discusses the three major application areas of metaheuristics (hyperparameter tuning, dimensionality reduction, and model selection) in detail; for each of these areas, case studies are quoted to give clearer insight into the techniques and their applications. The author also proposes a novel metaheuristic technique for feature selection, which brings more clarity for devising new methods. The chapter concludes with remarks on future scope and the references.

Somya Goyal, Department of Computer and Communication Engineering, Manipal University Jaipur, Jaipur; Department of Computer Science and Engineering, G J University of Sc & Tech, Hisar, e-mail: [email protected] https://doi.org/10.1515/9783110709247-005


Keywords: metaheuristics, machine learning (ML), software measurements, software defect prediction (SDP), software effort estimation (SEE), genetic algorithm (GA)

1 Introduction to the metaheuristic approach and its application to empirical software measurement

Metaheuristics are optimization methods that model the problem as a search-based problem. For problems in the field of empirical software measurement (SM), an exhaustive search of the combinatorial solution space is next to impossible; hence, metaheuristic algorithms play a vital role [9, 11, 26]. Most of these algorithms are inspired by nature, that is, by the biological behavior of birds or animals, which is why nature-inspired or bioinspired metaheuristics is the more appropriate term. The objective of applying metaheuristics is to improve the overall accuracy of a prediction technique and to gain a better understanding of the application domains of machine learning (ML). Traditionally, optimal solutions were computed with an exhaustive search; otherwise, near-optimal solutions were the only alternative. Heuristic algorithms are inspired by the biological imprints [10] of natural beings and have proven to be a strong and successful approach to global optimization (shown in Figure 1). The literature suggests [29] that genetic algorithms (GA) were the foremost algorithms based on the natural evolution process for solving complex problems that exhibit nonlinearity. A wide range of bioinspired metaheuristics has been successfully deployed in SMs, with promising future prospects.

Figure 1: Metaheuristic algorithms.


1.1 Genetic algorithms (GA)

The GA is based on the Darwinian theory [23] of evolution and the principle of "survival of the fittest." Its working revolves around three operations: crossover, mutation, and selection. The solution population is initialized randomly; a new generation is then created using crossover and mutation, and survival of the fittest drives the population toward the optimality criterion.
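To make the three operations concrete, below is a minimal, self-contained sketch of a binary GA on a toy fitness function; the population size, rates, and one-point crossover are illustrative textbook choices, not parameters from any study cited here.

```python
# Illustrative sketch only: a binary GA with selection, crossover, mutation.
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=100,
                      p_crossover=0.9, p_mutation=0.02):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]          # survival of the fittest
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            if random.random() < p_crossover:     # one-point crossover
                cut = random.randrange(1, n_bits)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # bit-flip mutation with small probability per gene
            child = [1 - g if random.random() < p_mutation else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Example: maximize the number of 1-bits (the "one-max" toy problem)
best = genetic_algorithm(fitness=sum)
print(best, sum(best))
```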

1.2 Ant colony optimization (ACO)

Ant colony optimization (ACO) is a swarm-based algorithm that imitates the behavior of a group of ants. Once a few ants find the shortest path to food, the rest of the colony follows that path. ACO utilizes the local information available at neighboring ants, and the global best solution is attained [18].

1.3 Particle swarm optimization

Particle swarm optimization (PSO) is based on the swarming behavior of birds and fish and is an example of swarm intelligence (SI) [32]. PSO finds various applications in SMs [1]. The key to PSO is synchronizing the flocking speed while avoiding collisions [21].
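A minimal PSO sketch showing the velocity and position updates toward personal and global bests follows; the inertia and acceleration coefficients are typical textbook values, not taken from any study cited here.

```python
# Illustrative sketch only: PSO minimizing an objective function.
import random

def pso(objective, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # personal bests
    gbest = min(pbest, key=objective)                 # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # velocity update: inertia + cognitive + social terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if objective(pos[i]) < objective(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=objective)
    return gbest

# Example: minimize the sphere function
print(pso(lambda x: sum(v * v for v in x)))
```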

1.4 Bat algorithm

The bat algorithm mimics how bats hunt prey [30]. It is based on the frequency of emitted sound and the echo return time, which bats use to detect the distance to prey; the optimization criterion is met in an analogous way.

1.5 Firefly algorithm

The firefly algorithm (FA) is motivated by the light-flashing behavior exhibited by fireflies [2]. FA formulates nonlinear problems using the combination of light absorption decay and light variation with distance.


1.6 Cuckoo search

Cuckoo search (CS) mimics the brood parasitism of cuckoos [28]. CS uses a combination of local and global search capabilities, controlled by a switching probability. SM can be formulated as a learning problem [16]: quality-wise, SM is a classification problem in the form of software defect prediction (SDP), while quantity-wise, software is measured as a regression problem in the field of software effort estimation (SEE) [18]. The next section highlights the application of metaheuristic algorithms in the field of SMs.

2 The state of the art of deployment of metaheuristics in the domain of software measurements

For the last 30 years, ML techniques have been prominently utilized in the field of SMs. Traditional techniques give suboptimal results on nonlinear and multimodal problems, especially where gradients are required. Metaheuristic algorithms are promising solutions for measuring software empirically: they are gradient-free methods and deal well with highly nonlinear SM problems. SM problems are formulated as search-based optimization problems. Table 1 highlights the literature in which SDP and SEE are formulated as search-based optimization problems and solved with metaheuristic algorithms.

Table 1: State of the art of the practice.

| S. no. | Year | Literature | Technique | Dataset | Performance metrics | Software measurement done |
|---|---|---|---|---|---|---|
| | | [] | Rule based + kernel K-means clustering + DSMO PSO | PROMISE, KC, PC, CM, JM | AUC, accuracy, precision | Software fault prediction |
| | | [] | Genetic algorithm | Promise | ROC, recall, precision | Defect prediction |
| | | [] | ANNs + PSO/PCA/GA | COCOMO, NASA, Albrecht | MMRE, PRED(l) | Software development effort estimation |
| | | [] | SVM based on CBA-centroid/BAT algo | JM, CM, PC | Recall, precision, F-measure | Software defect prediction |
| | | [] | Genetic evolution | Promise dataset | Accuracy | Software defect prediction |
| | | [] | Genetic evolution | Promise dataset | AUC, ROC | Software defect prediction |
| | | [] | Genetic evolution | UCI repository | AUC, accuracy | Software defect prediction |
| | | [] | PSO + GA | CM, JM | AUC, accuracy | Fault-prone module detection |
| | | [] | Genetic algorithm | Promise data | F-measure, G-mean | Cross-project defect prediction |
| | | [] | Cuckoo algorithm | NASA corpus | MRE, MMRE | Software cost estimation |
| | | [] | Under sampling + cuckoo search + SVM | PROMISE | pd, pf, G-mean | Software defect prediction |
| | | [] | GA, PSO | ISBSG | MRE, MMRE | Software effort estimation |

3 Case study

This section presents real-life applications of metaheuristics to SM problems. Metaheuristics are applied to SMs in association with ML models in three ways: (1) to tune the hyperparameters of prediction models, (2) to reduce the dimensionality of high-dimensional problems through feature selection (FS), and (3) to select the appropriate ML model from a wide range of ML models. The SM problem is formulated as a search-based optimization problem, and metaheuristics complement the ML methods in obtaining the global best solution. This section discusses all three aspects of the application of metaheuristics as three case studies.

3.1 Hyperparameter tuning

The SM problem has prominently been formulated as a learning problem, as the past three decades attest [14, 16]. Prediction models developed using ML methods have a large set of hyperparameters. SM problems, including SDP and SEE, are


solved with ML prediction models whose hyperparameters must be set once; the same parameter values are then used for training and testing the models. The values of the hyperparameters have to be selected from a large range of values, and the selection of values that achieve the best accuracy of the prediction model is formulated as a search problem, also known as hyperparameter tuning. It must be done effectively to attain the best prediction power with minimal error. Initially it was done manually, but with the increasing size and complexity of software, manual methods are no longer suitable, and automation using metaheuristic algorithms is incorporated. The study [27] deployed GA and PSO algorithms customized to tune the hyperparameters of prediction models for SEE, using the International Software Benchmarking Standards Group (ISBSG) dataset for the experimental work. Multiple predictors using multilayer perceptrons and support vector regression were developed to estimate the development effort, with GA and PSO customized to select optimal values for the hyperparameters of the estimation models. The prediction error was computed in terms of MRE, MMRE, and PRED(l) at l = 25, given as follows:

MRE = |True value − Estimated value| / True value  (1)

MMRE = (1/n) Σ_{i=1..n} MRE_i  (2)

where n is the number of samples, and

PRED(l) = (1/n) Σ_{i=1..n} {1 if MRE_i ≤ 25; else 0}  (3)
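Equations (1)–(3) translate directly into code; the following sketch is illustrative, with made-up effort values, and treats MRE as a percentage when comparing against l, matching equation (3).

```python
# Illustrative sketch of the error measures in equations (1)-(3).
def mre(true_value: float, estimated_value: float) -> float:
    return abs(true_value - estimated_value) / true_value

def mmre(true_values, estimated_values) -> float:
    errors = [mre(t, e) for t, e in zip(true_values, estimated_values)]
    return sum(errors) / len(errors)

def pred(true_values, estimated_values, l: float = 25) -> float:
    """Fraction of estimates whose MRE (as a percentage) is within l%."""
    hits = sum(1 for t, e in zip(true_values, estimated_values)
               if 100 * mre(t, e) <= l)
    return hits / len(true_values)

actual = [120, 340, 95]        # person-hours (made-up example)
estimate = [110, 400, 100]
print(mmre(actual, estimate), pred(actual, estimate))
```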

The comparative analysis is given in Figure 2. It is clear that tuning the hyperparameters improves the performance of estimation model and reduces the estimation error. Figure 2 shows the estimation error along the Y-axis and three variants of MLP predictor are shown along the x-axis. It is evident that without tuning the error is 0.008 and with tuning it is 0.002 using PSO and 0.003 using GA. This case study concludes that GA and PSO are simple and effective evolutionary algorithms to tune the hyperparameters of ML-based estimation models.
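In the spirit of [27], the following sketch evolves a single MLP hyperparameter (alpha) against a negative-MMRE fitness; the synthetic data, search space, and GA settings are all assumptions, not the study's actual setup.

```python
# Illustrative sketch only: GA-style tuning of one MLP hyperparameter.
import random
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = X @ rng.random(5) + 0.1 * rng.random(200)    # synthetic "effort" data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fitness(alpha: float) -> float:
    """Negative MMRE of an MLP trained with the candidate alpha."""
    model = MLPRegressor(alpha=alpha, max_iter=500, random_state=0).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return -np.mean(np.abs(y_te - pred) / np.abs(y_te))

pop = [10 ** random.uniform(-5, 0) for _ in range(8)]    # candidate alphas
for _ in range(5):                                       # a few generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]                                    # keep the fittest half
    pop = parents + [random.choice(parents) * 10 ** random.gauss(0, 0.3)
                     for _ in range(4)]                  # mutate survivors
print(max(pop, key=fitness))
```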

3.2 Dimensionality reduction

SM is achieved empirically using ML prediction and estimation models. Redundant and insignificant features in high-dimensional datasets negatively impact the performance of prediction models [20, 25], so it is desirable to reduce the dimensionality of the dataset by selecting only the most significant features. The literature advocates that SI-based algorithms are very popular for reducing the dimensionality of datasets [15, Goyal et al. 2019].

Figure 2: Comparative analysis of performance of MLP (estimation error vs. multilayer perceptron hyperparameter values, with PSO tuning, GA tuning, and without tuning).

Anbu et al. (2017) formulated FS as a search-based optimization problem. They proposed SDP models using a support vector machine (SVM), naïve Bayes (NB), and k-nearest neighbor (KNN), with the NASA corpus used for the experimental study. They deployed the FA to select features and evaluated the performance of the models with and without FS. The evaluation criteria are accuracy, recall, and precision, given as follows:

Accuracy = (x + w) / (x + y + z + w)  (4)

Recall = w / (y + w)  (5)

Precision = w / (z + w)  (6)

where x is the number of true negatives, y the false negatives, z the false positives, and w the true positives. Figure 3 shows the comparative analysis of the performance of the SDP classifiers over accuracy. The performance of all three SDP classifiers (SVM, KNN, and NB) is compared with their variants SVM-FF-FS, KNN-FF-FS, and NB-FF-FS (selecting features using the FA) over accuracy. It is evident that the accuracy of the SDP model is raised by applying FS using the FA for all three classification algorithms. It is inferred that SI algorithms are effective at reducing the dimensionality of an SM problem and improving the performance of a predictor or estimator efficiently.
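In the chapter's x, y, z, w notation, equations (4)–(6) compute directly as follows; the example counts are made up.

```python
# Illustrative sketch of equations (4)-(6): x = true negatives,
# y = false negatives, z = false positives, w = true positives.
def accuracy(x: int, y: int, z: int, w: int) -> float:
    return (x + w) / (x + y + z + w)

def recall(y: int, w: int) -> float:
    return w / (y + w)

def precision(z: int, w: int) -> float:
    return w / (z + w)

# Example confusion counts from a hypothetical defect predictor
x, y, z, w = 70, 10, 5, 15
print(accuracy(x, y, z, w), recall(y, w), precision(z, w))
```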


Figure 3: Comparative analysis of performance of SDP.

3.3 ML model selection

Empirical SM has successfully been modeled as a learning problem and solved using ML and deep learning (DL). The range of ML and DL algorithms for classification and regression is very wide, and applying different techniques to the same problem yields different results; hence it becomes very important to decide the best suitable model for a specific problem. In this scenario, the selection of the most appropriate model for a particular SM problem is modeled as a search problem, which is solved efficiently using metaheuristics. The automated selection and combination of techniques using metaheuristics improves the overall accuracy of predictors. Murillo-Morera et al. [24] proposed a GA-based framework to select the best prediction model for SEE. They proposed 600 SEE predictors using combinations of 8 preprocessing techniques, 5 FS techniques, and 15 learning schemes, with a GA-based selector choosing the best-performing predictor among the 600 proposed models. The evaluation criteria are MRE, MMRE, and PRED(l), given as (1), (2), and (3). They computed the experimental results for ISBSG datasets and statistically proved that GA-based search is better than exhaustive search techniques. In this section, all three aspects in which metaheuristics are applied for empirical SMs have been demonstrated with the help of suitable real-life case studies.

4 A novel proposed metaheuristic technique

In this section, the author proposes FAFS, a customized firefly algorithm for feature selection. The defect dataset is preprocessed and then fed to an artificial neural network (ANN)-based classifier. The choice of classifier is based on its popularity and wide acceptance for SDP


in the literature. The data for conducting the experiments are taken from the NASA PROMISE corpus [13, 14]. Initially, the defect dataset is divided into 80/20 partitions; the training set is preprocessed and then fed to the ANN-based classifier. The well-known evaluation metrics ROC, AUC, and accuracy [13] are used for performance evaluation. Figure 4 shows the detailed proposed model of the work.

Figure 4: Working of FAFS algorithm.
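The following is a rough sketch of firefly-based feature selection in the spirit of FAFS; the move rule, the 0.5 binarization threshold, the synthetic data, and the cross-validated ANN fitness are simplified assumptions, not the exact algorithm of Figure 4.

```python
# Illustrative sketch only: firefly-style feature selection with ANN fitness.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((150, 12))                        # stand-in for a PROMISE dataset
y = (X[:, 0] + X[:, 3] > 1).astype(int)          # synthetic defect labels

def brightness(mask: np.ndarray) -> float:
    """Fitness of a feature subset = cross-validated ANN accuracy."""
    if not mask.any():
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=300, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_fireflies, n_features, beta, gamma = 6, X.shape[1], 1.0, 0.5
pos = rng.random((n_fireflies, n_features))      # continuous firefly positions
for _ in range(10):
    masks = pos > 0.5                             # binarize: keep feature if > 0.5
    light = np.array([brightness(m) for m in masks])
    for i in range(n_fireflies):
        for j in range(n_fireflies):
            if light[j] > light[i]:               # move i toward brighter j
                r2 = np.sum((pos[i] - pos[j]) ** 2)
                attract = beta * np.exp(-gamma * r2)
                pos[i] += attract * (pos[j] - pos[i]) \
                          + 0.05 * rng.standard_normal(n_features)
                pos[i] = pos[i].clip(0, 1)
best = pos[np.argmax([brightness(m) for m in (pos > 0.5)])] > 0.5
print("selected features:", np.where(best)[0])
```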

4.1 Comparative analysis of performance

A comparative analysis is made with GA, PSO, and BA. The AUC values for all state-of-the-art methods and the proposed FAFS model are recorded in Table 2, and the accuracy records are given in Table 3. As per the records, the proposed FAFS-ANN method shows the best results among the competing metaheuristic algorithms.


Table 2: Comparison over AUC.

| Dataset | GA [] | PSO [] | BA [] | Proposed FAFS |
|---------|-------|--------|-------|---------------|
| CM      | .     | .      | .     | .             |
| JM      | .     | .      | .     | .             |
| KC      | .     | .      | .     | .             |
| KC      | .     | .      | .     | .             |
| PC      | .     | .      | .     | .             |

GA []

PSO []

BA []

Proposed FAFS

CM

.

.

.

.

JM

.

.

.

.

KC

.

.

.

.

KC

.

.

.

.

PC

.

.

.

.

Table 3: Comparison over accuracy. Dataset

It can be inferred from the reported results that the novel FAFS has better potential for FS in software fault prediction than the existing metaheuristic methods. Box plots for visual comparison over the AUC and accuracy measures are shown in Figures 5 and 6, respectively.

4.2 Statistical analysis of the experimental study

It is essential to statistically validate the comparison made between the performance of the proposed model and the traditional models. Friedman's test is used for statistical proof [19]. The null hypothesis is that "the performance reported by the proposed FAFS-ANN and the performance reported by the other traditional classifiers are not different." Figure 7 shows the result of the test at the 95% confidence level, which is very small (