Collaborative Knowledge Management Through Product Lifecycle: A Computational Perspective 9811996253, 9789811996252

This book not only presents the state-of-the-art research on knowledge modelling, knowledge retrieval and knowledge reuse…


English, 294 [295] pages, 2023




Table of contents:
1 Knowledge Management Through Product Lifecycle
1.1 The Role of Knowledge Management Through Product Lifecycle
1.1.1 Product Design Knowledge
1.1.2 Product Manufacturing Knowledge
1.1.3 Production Scheduling Knowledge
1.1.4 Assembly Knowledge
1.1.5 Diagnosis and Maintenance Knowledge
1.2 The Essence of Knowledge and Its Relationships with Data and Information
1.3 The Forms and Characteristics of Different Kinds of Knowledge
1.3.1 Knowledge from Mechanism, Experience and Data
1.3.2 Tacit and Explicit Knowledge
1.3.3 Product and Process Knowledge
1.4 The Variety of Knowledge Needs in the Modern Context
1.5 Knowledge Management Methodologies
1.5.1 Knowledge Capture
1.5.2 Knowledge Representation
1.5.3 Knowledge Retrieval
1.5.4 Knowledge Reuse
1.6 Discussion
2 The Collaborative Knowledge Management Paradigm
2.1 The Need for Collaborative Knowledge Management
2.1.1 Collaboration Between Multidisciplinary Engineers
2.1.2 Collaboration of Lifecycle Activities
2.1.3 Collaboration of Multi-source Knowledge Resources
2.2 Collaborative Knowledge Management Architecture and the Key Enabling Technologies
2.2.1 Integrated and Collaborative Knowledge Management Architecture
2.2.2 Enabling Technology for IKCM—Platform Interoperability
2.2.3 Enabling Technology for IKCM—Data Interoperability
2.2.4 Enabling Technology for IKCM—Integrated Knowledge Representation Model
2.2.5 Enabling Technology for IKCM—Intelligent Knowledge Service
2.3 Collaborative Knowledge Representation Model
2.3.1 Construction of the Designer Network
2.3.2 Construction of the Product Network
2.3.3 Construction of the Issues Network
2.3.4 Construction of the Knowledge Resource Network
2.3.5 Application
2.4 Collaborative Knowledge Management in Intelligent Manufacturing
2.5 Discussion
3 Representation and Modeling of Knowledge for Collaborative Knowledge Management
3.1 Background
3.2 Knowledge Capture in the Collaborative Working Environment
3.2.1 Knowledge Mining from Text Using Deep Learning Based Methods
3.2.2 Knowledge Capture from CAD Models Based on the Formal Description
3.2.3 Capture of Process Knowledge Based on the RFBSE Model
3.3 Collaborative Design Knowledge Modeling
3.3.1 Multi-level Knowledge Representation Based on the BOM Language
3.3.2 Project-Process-Activity (P2A) Knowledge Model
3.3.3 RFBSE Knowledge Representation Model
3.3.4 The C-RFBS Model
3.4 Discussion
4 Collaborative Design Knowledge Retrieval
4.1 Overview
4.2 Knowledge Retrieval Based on Keyword
4.2.1 Introduction to Information Retrieval
4.2.2 Extracting Terms from Knowledge Models
4.2.3 Quantifying Similarities and Ranking Retrieved Results
4.2.4 Implementation of a Keyword-Based Retrieval System
4.3 Retrieval of Structured Design Knowledge
4.3.1 Using the Feature Information of Nodes
4.3.2 Returning Nodes Group as the Results
4.3.3 Using Complex Queries
4.3.4 Implementation and Evaluation of the Retrieval System with the Structured Information
4.4 Towards Semantic Retrieval of Knowledge Model
4.4.1 Extracting Concepts Categories Information
4.4.2 Extracting Context Information
4.4.3 Utilisation of Semantic Information
4.4.4 Implementation and Evaluation
4.5 Discussion
5 Collaborative Design Knowledge Reuse
5.1 Overview
5.2 Collaborative Design Knowledge Retrieval
5.2.1 Product Design Knowledge Retrieval
5.2.2 Event Knowledge Retrieval
5.3 Knowledge Recommendation
5.4 Collaborative Knowledge Reasoning
5.4.1 Case-Based Reasoning
5.4.2 Ontology-Based Reasoning
5.4.3 Collaborative Reasoning of Design Knowledge Based on the Bayesian Approach
5.5 Knowledge-Assisted Decision Making
5.5.1 Knowledge Reasoning Based on the Context
5.5.2 The Traditional Decision Support System and Its Limitation
5.5.3 Knowledge-Based Decision Support System
5.5.4 Knowledge Reuse in the Collaborative Design Process
5.5.5 Case Study of Knowledge Reuse in the Assembly Process
6 The Merging of Knowledge Management and New Information Technologies
6.1 Big Data Technology
6.1.1 Overview of Big Data
6.1.2 Knowledge Management in Big Data
6.1.3 Case Study
6.2 Internet of Things (IoT) Technology
6.2.1 Overview of IoT
6.2.2 Knowledge Management in IoT
6.2.3 Case Study
6.3 Digital Twins
6.3.1 Overview of Digital Twins
6.3.2 Knowledge Management in Digital Twins
6.3.3 Applications
6.4 Cyber Physical Systems (CPS)
6.4.1 Overview of Cyber-Physical Systems
6.4.2 Knowledge Management in Cyber-Physical Systems
6.4.3 Applications
6.5 Digital Factory
6.5.1 Industrial Robots
6.5.2 Intelligent Assembly of Industrial Robots


Hongwei Wang Gongzhuang Peng

Collaborative Knowledge Management Through Product Lifecycle: A Computational Perspective


Hongwei Wang ZJU-UIUC Joint Institute Zhejiang University Haining, Zhejiang, China

Gongzhuang Peng National Engineering Research Center for Advanced Rolling and Intelligent Manufacturing University of Science and Technology Beijing Beijing, China

ISBN 978-981-19-9625-2 ISBN 978-981-19-9626-9 (eBook) © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore


Over the past three decades, the manufacturing industry has developed rapidly. On the one hand, the complexity of modern products has been increasing in terms of their structure, function, and technological requirements. These products, from small ones such as smartphones and integrated circuits to large ones like electric cars and high-performance aero-engines, have brought dramatic changes to people’s lives. On the other hand, severe competition across the global supply chain has placed demanding requirements on manufacturing enterprises not only to reduce cost and lead time but also to improve their capabilities of reacting to challenges through the whole product lifecycle. The diversified application of information technology (IT) is, without doubt, a major element in the innovation of the manufacturing industry. Since the 1980s, there have been great efforts, from both industry and academia, to develop effective and efficient IT systems under the umbrella of computer-integrated manufacturing to improve the management of data, technologies, and processes. Recently, with the development of artificial intelligence (AI), there has been great interest in intelligent manufacturing, and the global intelligent manufacturing network has become a new paradigm. Addressing issues related to the integrated and collaborative nature of modern product development, as well as its sustainability and intelligence requirements, has become a new trend of innovation in manufacturing. Specifically, the integrated nature emphasizes the integration of multidisciplinary resources in a distributed environment. The collaborative perspective concerns the collaborative work of team members in multiple roles as well as the adaptation of models and the interoperability between systems and platforms.
The sustainability perspective involves achieving a balance between the efficiency of manufacturing processes and the constraints imposed on these processes by sustainability concerns. Last but not least, the intelligent perspective focuses on the development of intelligent methods to effectively utilize information and knowledge in the product design and manufacturing process so as to achieve a swift and resilient response to market needs. In this context, we need to reconsider the development of effective IT and intelligent methods to meet the new challenges mentioned above. In particular, there is a great need for effective support for the management of product design and development knowledge through the lifecycle. The previous methods for this support have some drawbacks: (1) focusing on data without enough attention to models; (2) valuing information but missing context; (3) valuing process but overlooking users; and (4) focusing on queries but overlooking interaction. This book aims precisely to provide some insights into these drawbacks. Different from previous books on enterprise knowledge management (KM), which are more focused on operations and innovation management, this book has a specific focus on the collaborative nature of KM from a particular computational perspective, i.e., addressing the interface between IT and KM. The main reason for this unique perspective is that, with the development of state-of-the-art technologies such as big data, the industrial Internet of Things (IoT) and cloud computing, a huge amount of data has been generated. In this sense, the effective support of KM throughout the product design and manufacturing process should rely heavily upon this data, and thus there is an urgent need for research on the computing power and algorithms useful for this purpose. Additionally, the discovery of knowledge from both product data and the complex process whereby this data is generated is also of high importance for next-generation KM systems. As such, this book covers case studies from the design and manufacturing of complex products and focuses on computational issues such as knowledge modeling, knowledge storage and retrieval, and knowledge reasoning and reuse. The authors believe that this book is relevant to a wide range of readers such as students, researchers, and practitioners of intelligent manufacturing, advanced KM systems, enterprise intelligent computing, and industrial applications of AI.
Moreover, the topic of this book is highly interdisciplinary, and as such, it is suitable for faculty and students from diverse disciplines such as computer and electronic engineering, mechanical and manufacturing engineering, automatic control, and management science and engineering. This book covers topics like knowledge modeling and reasoning; since knowledge is a key concept in education, undergraduates with an interest in these topics can also refer to the methods detailed in this book. The authors would like to thank their previous and current students for their help: Dr. Hao Qin, Mr. Yufei Zhang, Mr. Zixuan Wang, Mr. Haibo Wang, Ms. Qi Li, and Ms. Mengxuan Li. The first author appreciates the help and support from his former colleagues at both the University of Cambridge and the University of Portsmouth. The authors are grateful to the editors and officers from Springer Nature, who have been both professional in handling the publication and patient while the publication process was delayed for various reasons. The authors also would like to thank their families for their generous love, understanding, and support, without which it would not have been possible to complete this work. This is the first time that the authors have made an effort to put their research work into a book, and as such, there must be shortcomings and limitations. We hope that readers will understand and help us improve, and we will try our best to improve the work in the future. Every exciting thing has a start; we hope that this publication will be a great start for more interesting and useful work on collaborative knowledge management.

Haining, China
Beijing, China

Hongwei Wang Gongzhuang Peng

Acknowledgements This research is supported by the National Key R&D Program of China under the grant number 2020YFB1707803, and the National Natural Science Foundation of China under the grant number 61903031.





Abbreviations

2D  2 Dimension
3D  3 Dimension
WWW  World Wide Web
AGV  Automated Guided Vehicle
AI  Artificial Intelligence
API  Application Programming Interface
AR  Augmented Reality
BDA-KM  Big Data Analytics-Knowledge Management
BERT  Bidirectional Encoder Representations from Transformers
BI  Business Intelligence
BOM  Bill of Material
CAD  Computer-Aided Design
CAE  Computer-Aided Engineering
CAM  Computer-Aided Manufacturing
CATIA  Computer-Aided Three-dimensional Interactive Application
CAX  Computer-Aided X
CBR  Case-Based Reasoning
CE  Concurrent Engineering
CPPS  Cyber-Physical Production System
CPS  Cyber-Physical System
C-RFBS  The Requirement-Function-Behavior-Structure Model Based on the Cognitive Process Theory
CRM  Customer Relationship Management
CST  Computer-Based Supporting Tools
CTS  Collaboration Technologies and Systems
D2RQ  Accessing Relational Databases as Virtual RDF Graphs
DAQ  Data Acquisition
DBMS  Database Management System
DFM  Diagnosis Fault Management
DFX  Design for X
DR  Design Rationale




DRed  Design Rationale editor
DS  Decision Support
DSN  Distributed Semantic Network
DSS  Decision Support System
DT  Digital Twin
DTMC  Digital Twin Manufacturing Cell
DTS  Digital Twin System
DVD  Digital Video Disc
EDC  Engineering Design Center
EIS  Executive Information System
ELMo  Embedding from Language Models
ERP  Enterprise Resource Planning
ETL  Extracted, Transformed, Loaded
FBS  The Function-Behavior-Structure
FN  False Negative
FP  False Positive
FR  The Functional Requirement
GIGO  Garbage In, Garbage Out
GIS  Geographic Information System
GML  Generalized Markup Language
GPS  Global Positioning System
GUI  Graphical User Interface
HLA  High-Level Architecture
HTML  HyperText Markup Language
HTTP  Hypertext Transfer Protocol
I2I  Infrastructure-to-Infrastructure
IBIS  Issue-Based Information System
IKCM  Integrated and Collaborative Knowledge Management
ICT  Information and Communications Technology
IRD  Infrared Device
IGES  The Initial Graphics Exchange Specification
IoT  Internet of Things
IR  Information Retrieval
IT  Information Technology
KBDSS  Knowledge-Based Decision Support System
KBE  Knowledge-based Engineering
KEM  Knowledge and Experience Management
KG  Knowledge Graph
KM  Knowledge Management
KMS  Knowledge Management System
KQA  Knowledge Question & Answer
KRL  Knowledge Representation Language
KSAC  Knowledge, Skills, Abilities, and Competencies
LAN  Local Area Network
MAS  Multiple Agents System




MES  Manufacturing Execution System
MOS  Media Object Server
MOSA  Multi-Order Semantic Analysis
MR  Mixed Reality
NASA  National Aeronautics and Space Administration
NER  Named Entity Recognition
NLP  Natural Language Processing
OIL  Ontology Inference Layer
OWL DL  Ontology Web Language Description Logic
OWL  Ontology Web Language
OWL-S  Ontology Web Language for Services
P2A  Project-Process-Activity
PDM  Product Data Management
P2D2P  Physical-to-Digital-to-Physical
PDPS  Process Designer and Process Simulate
PDS  Product Design Specification
PLC  Programmable Logic Controller
PLM  Product Lifecycle Management
POS  Part-of-Speech
Pro/E  Pro/ENGINEER
QFD  Quality Function Deployment
R&D  Research and Development
RDF  Resource Description Framework
REST  Representational State Transfer
RFBSE  The Requirement-Function-Behavior-Structure-Evolution
RFID  Radio Frequency Identification
RUL  Remaining Useful Life
SaaS  Software as a Service
SBF  The Structure-Behavior-Function
SECI  Socialization, Externalization, Combination, Internalization
Solr  Search on Lucene Replication
SPARQL  SPARQL Protocol and RDF Query Language
SQL  Structured Query Language
STEP  Standard for the Exchange of Product Model Data
SWRL  Semantic Web Rule Language
TF-IDF  Term Frequency–Inverse Document Frequency
TN  True Negative
TP  True Positive
UG  Unigraphics
URI  Uniform Resource Identifier
URL  Uniform Resource Locator
V2I  Vehicle-to-Infrastructure
VR  Virtual Reality
VRML  Virtual Reality Modeling Language
WIP  Work in Process


Word2Vec  Word To Vector
XML  Extensible Markup Language

Chapter 1

Knowledge Management Through Product Lifecycle

In the modern context, the product lifecycle spans a wide range of stages, from design, manufacture and shipping through to maintenance, recycling and disposal. Amongst these, engineering design is a systematic process whereby market needs are transformed into detailed information that allows the physical realization of a product. This process broadly includes the stages of specification of design, generation and evaluation of concepts, embodiment of the chosen concepts, and production of detailed information. As well as a physical product, the outcome of this process can also be a system that fulfils some logical function (e.g. network video recording) or a service that is provided to customers through value-added expertise and solutions (e.g. maintenance and overhaul service). Apart from the elements of process and product in modern product design, there exists another important element, namely people. Here, people can specifically refer to the users, the designers and other stakeholders of this systematic process. In this context, the interactions between these three elements need to be taken into account to better project-manage design, whilst designers need to deal with a vast amount of “cradle-to-grave” data to make informed decisions on material selection and production methods as well as on the assessment of environmental impacts. As such, there is a need to develop a scheme for knowledge management (KM) through the lifecycle that can supply in-context knowledge to support designers’ decision making. This scheme aims to go beyond traditional KM research by focusing on people (instead of only on process), models (instead of only on data) and computation (instead of only on organization). The demands from industry, along with the development of computer capabilities, have contributed to significant advances in Computer Support Tools (CSTs).
Tools for CAD/CAE/CAM/PDM have been developed over the past few decades and have been widely applied in modern product development. The trends of research in CSTs include the integrated use of various tools for effective management of the whole product lifecycle, and the development of tools to meet the needs of designers that are not well addressed by currently available tools. Some researchers have studied the requirements for future design support tools, and their ideas have much in common.




The functions identified for these tools include underpinning the evolution of design development, setting up a creative design environment, KM, collaborative design, and supporting design synthesis. Amongst these topics, KM is particularly helpful in giving enterprises a competitive advantage, and has attracted much attention [1]. KM is a popular and broad subject with a research spectrum encompassing a variety of disciplines, e.g. management science, social science, cognitive science, informatics and the engineering sciences. A number of questions need to be answered before knowledge can be well managed. First, what can be viewed as knowledge, and what distinguishes knowledge from data and information? Second, can different categories of knowledge be identified based on their different characteristics and perspectives? Third, is there a generic process of managing knowledge? Last but not least, starting from the point that knowledge is viewed as intellectual property, can we, and how should we, apply AI to addressing KM issues? This section of the book is aimed precisely at finding answers to these questions, starting with understanding the differences between knowledge, information and data. After that, different views on knowledge will be discussed to understand the context in which KM activities are undertaken. Theories and practices on KM will then be introduced, and the role of AI in KM will be discussed last. Engineering design is a complex process in which designers iteratively develop and optimise solutions to technical problems by applying their scientific knowledge. The industrial scene continues to change due to economic factors, new legislation and technological breakthroughs, resulting in ever-increasing pressures on designers and organizations. The effectiveness and efficiency of carrying out design tasks depend upon designers’ capabilities of processing information about the function, behavior, and form of technical products.
Therefore, there is a great need for a program of fundamental design research to understand, and ultimately improve, the design process. This can be achieved by developing methods and tools that can support engineering designers in tasks such as processing information and acquiring knowledge. The idea of expert systems is to search through a knowledge base that includes exhaustive rules, perform reasoning on the rules and give advice to users, similar to how a doctor works to treat diseases [2]. Although the development of computer-integrated manufacturing has resulted in the rapid application of automatic systems in industrial production, these systems still rely upon human knowledge for complex analysis, precise judgement, and innovative decision-making [3]. Due to the increasingly fierce market competition in a global economic environment, as well as the individualization and diversification of user requirements for products, the space for product innovation has been greatly expanded, and the complexity of products has been increasing. Products now involve more and more technical and scientific fields, while the development cycle has been compressed to an unprecedented extent. The pressure of competition has stimulated enterprises to pursue new ideas and new technologies ever more strongly, and a need arising from this pursuit is access to more abundant intellectual resources. This has initiated a development trend in which manufacturing companies must follow a route towards becoming knowledge-based enterprises.

1.1 The Role of Knowledge Management Through Product Lifecycle


Knowledge-based work plays a central role in the operation of modern industrial enterprises. Designing, planning, scheduling, management, and operation in industrial production, for example, are all such knowledge-based work. To complete these tasks, various factors across multiple levels of production and operation must be considered in a holistic manner. Take operation optimization in the process industry as an example: since it is very hard to establish accurate mathematical models, the selection and setting of operating parameters for process optimization and control mainly rely on experienced engineers issuing in-situ control commands. This requires engineers to perform the following knowledge-intensive work, namely, analysis of process mechanisms, judgement of working conditions, comprehensive calculation of energy efficiency, and execution of operational decisions. At the planning and scheduling level, it is necessary to consider the production factors of humans, machines, materials, and energy, as well as their temporal and spatial distribution and correlation. At the management level, the decision-making process requires taking account of the internal production capacity, the external market environment, and relevant laws, regulations, policies and standards. As such, knowledge-based work holds the key to making successful decisions through the whole product lifecycle, as highlighted in Fig. 1.1. To achieve competitive advantages, enterprises also need to invest in the development of advanced tools for knowledge capture, storage, retrieval and reuse. These tools not only help attain and protect intellectual property but also facilitate knowledge work. With the outbreak of AI technologies such as natural language processing (NLP) and knowledge discovery, the knowledge records accumulated can also speed up the development of KM.
For example, the idea of knowledge work automation has been proposed as one of the promising directions for AI applications. This indicates that AI applications in industry can go beyond business intelligence and empower downstream activities such as operation and service. The knowledge through the product lifecycle can be divided into the following categories.

1.1.1 Product Design Knowledge

Engineering design is a knowledge-intensive process and designers need a lot of informational support throughout this process. Previous studies have shown that engineers spend nearly 60% of their working time engaged in all types of information-related activities. These activities include using software packages to process information and facilitate knowledge-based engineering analysis, as well as sharing knowledge with colleagues to improve decision making. In this sense, design knowledge takes the form of both tangible objects that can be edited, copied, transferred and programmed



Fig. 1.1 Knowledge management through product lifecycle

and precious experience that can only be learnt through a community of expertise, i.e. the codification view and the personalisation view of design knowledge [4]. A particular type of system which provides support for designers during the design process is called the design support system. Mulet and Vidal examined the functions that a knowledge-based design support system might undertake: (1) supporting and visualizing the evolution of the design idea; (2) setting up a creativity-supporting environment; (3) knowledge management; (4) supporting collaborative work. The design support system can help with the specific operations in design processes, as well as the transition between these operations. A simplified flow chart is adapted from the classic design process model by Pahl et al. and presented in Fig. 1.2, to show the different categories of knowledge support that a design support system should provide throughout the design process.

Fig. 1.2 Flow chart of a product development process
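The entities and transitions of Fig. 1.2 can be captured as a small set of typed relations. The sketch below is purely illustrative (the relation and entity names follow the figure; the data structure and helper function are our own):

```python
# Minimal typed graph of the design-process entities in Fig. 1.2.
# The relation names mirror the figure; the structure is illustrative only.
design_flow = {
    ("requirements", "fulfilled_by", "function"),
    ("function", "decomposed_as", "sub-functions"),
    ("sub-functions", "realised_by", "3D shape models"),
    ("3D shape models", "used_by", "2D drawing pictures"),
    ("3D shape models", "used_by", "analysis methods"),
    ("3D shape models", "used_by", "manufacturing"),
    ("3D shape models", "evaluated_by", "use and recycling"),
    ("use and recycling", "giving_feedback", "requirements"),
}

def downstream(entity, relation=None):
    """Entities directly reachable from `entity`, optionally via one relation type."""
    return sorted(tgt for src, rel, tgt in design_flow
                  if src == entity and (relation is None or rel == relation))

print(downstream("3D shape models", "used_by"))
# → ['2D drawing pictures', 'analysis methods', 'manufacturing']
```

A representation of this kind is what allows a support system to answer questions such as "which downstream entities consume the 3D model?" or "where does feedback flow?".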



There are in total seven entities in the flow chart, namely requirements, functions, sub-functions, 3D models, 2D drawing pictures, analysis methods and manufacturing considerations. Four types of transitions are identified to describe the relationships between these entities. For instance, requirements are fulfilled by functions while functions are decomposed as sub-functions. Sub-functions are then realized by 3D shape models, which can be used by downstream entities such as 2D drawing pictures, analysis methods, manufacturing considerations, and even the use and recycling of a manufactured product. Moreover, a particular relationship between entities is defined as 'giving feedback', as found in many places in the figure. The different categories of knowledge support are summarized as follows.

1. Requirements

Requirements specification is the starting point of a design process. In some cases, requirements are treated as values for parameters of the product model rather than as a separate knowledge source. Nevertheless, the importance of requirements can be recognized in terms of a set of constraints influencing the product structure throughout the design process [5]. In fact, requirements can be both concrete and straightforward, e.g. "increase thrust of an existing aero-engine by 20%". The research work reviewed here, however, mainly deals with more abstract and complex requirements. Three 'structures' have been identified, namely requirements, functions and systems, to find the link between requirements and downstream design activities. Requirements can be divided into primary requirements (defined by the customer) and derived requirements. The former is customer-oriented while the latter represents technical specifications intended to lead to the fulfilment of the primary requirements. Two approaches need to be mentioned in terms of establishing the relationship between customer needs and design solutions, namely Quality Function Deployment (QFD) and axiomatic design. QFD helps transform customer needs into design quality and deploy the functions forming quality and the methods achieving quality into subsystems of a product. Axiomatic design is a method devoted to the application of fundamental principles that make designs good. These approaches also serve the purpose of improving design quality.

2. Functions and sub-functions

Literally, function represents the purposes of a design solution, e.g. 'keeping the water temperature in a swimming pool at 30 °C'. According to the classic design theory of Pahl and Beitz [6], a design prototype can be characterised by its form, function and behaviour. Function links requirements and form and is thus very important for the identification and evaluation of design solutions. A definition of function was cited by Gzara et al. [7] as "an action of the product or one of its parts expressed in terms of finality". They further pointed out that a function could be either an external function or an internal function. The former, describing what the product does to satisfy a user need, expresses an action delivered by the product to the environment. The latter describes the behaviour of product components in terms of how they contribute to the realisation of external functions. Support with knowledge about functions can be found in three areas: (1) the effective mapping from requirements to functions; (2) the division of a function into sub-functions and the query of design cases by function; (3) the generation of design solutions. The first area focuses on the relationship between requirements, functions and solution elements, e.g. the QFD-related research reviewed in the previous section.
The second area tries to identify the key factors of function decomposition. Functions and sub-functions have also been indexed to support the query of design cases by functions (and sub-functions). The third area is addressed most by researchers as it is specifically related to design knowledge. The fundamental principle of functional modelling is to support computer-aided embodiment design where the design space is bounded by specific constraints, e.g. design constraints and kinematic constraints. In this context, design can be viewed as a constraint-driven process where constraints concerning a design problem need to be specified and satisfied.

3. 3D shapes and 2D drawing pictures

3D shape models and 2D drawing pictures are essential parts of a development process. Any good concept will ultimately be represented and detailed in this form. The idea of supporting knowledge for a design process with 3D shapes and 2D drawing pictures comes from the assumption that designers can benefit from reusable parts retrieved from archived digital documents [8]. The research in this area mainly deals with how to represent the geometrical and topological information of CAD models, and how to measure the similarity between models based on this information. In one line of work, the generation of a "signature" was accomplished via feature recognition from CAD models, and the recognised feature data were translated into attributes in graphs. These graphs were then hashed and inserted into the knowledge base, and were ultimately retrieved based on graph matching. Some work used a graph representation to perform graph-based matching. CAD models were exported as data files in the Virtual Reality Modelling Language (VRML) format and these files were imported by the system to attach annotations. Moreover, research work on the bidirectional transformation between 3D shapes and 2D drawings can also be found, aiming to relieve designers from manual work. The transformation process (from 2D to 3D, or from 3D to 2D), to some extent, involves some knowledge about a design solution. With the development of machine learning, it has become possible to represent and recognize intricate features and thus support 2D/3D transformation. However, the research work in this area mainly focuses on the transformation of geometric information. Little attention has been paid to the role of knowledge in this process, or to how the knowledge involved in such a process can be captured and reused.

4. Analysis methods

Design analysis is often used to investigate the performance of a design under specific operating conditions. Experienced designers may have a set of methods for performing design analysis. However, manual calculation is very inefficient in cases requiring a large number of iterations and frequent changes in testing conditions. Modelling and simulation tools can provide feedback on how design proposals would behave if built, potentially avoiding costly design errors. However, a major concern is the lack of confidence in simulation results, which depend upon the assumptions made during the simulation process. It is arguable that analysis routines can be viewed as the embodiment of specific knowledge.
For instance, a stress calculation routine developed by an experienced designer can be utilized by a novice with little knowledge of how to perform the calculation. In this context, the routine actually enables reusing the knowledge underlying the code. The proposal of performing composable simulation, or model management, is to select and interface one or several simulation models based on specific simulation requirements. Analysis methods can thus be generated and integrated with other product information.

5. Manufacturing considerations

The practice of taking manufacturing into account at an early stage in design is advocated by the Concurrent Engineering (CE) paradigm. CE allows an integrated development team to use various inputs, knowledge and technologies to speed up development by integrating downstream concerns as early as possible in the design process, and by performing simultaneously many activities that used to be performed sequentially. Specifically, manufacturing considerations in the early design stages involve questions such as whether particular design concepts can ultimately be manufactured, and if they can, whether the cost involved is acceptable. The literature found in this area is mostly about Design for Manufacturing (DFM), e.g. manufacturability analysis and the generation of design solutions taking manufacturing factors into account.

6. Use and recycling



A good design process should look further than the release of working products to the customer; activities after manufacturing can provide important inputs to the design stage. Although these inputs have little influence on products currently in use, they are valuable for a redesign process or the development of a new product. Research on gathering and analyzing feedback from the use, maintenance, and even recycling of products has also been undertaken. For example, Customer Relationship Management (CRM) is a scheme that utilizes customer feedback to maintain a sustainable relationship. Jagtap and Johnson [9] studied how to incorporate knowledge about the performance of existing products throughout their service life into the design phase of new products. Their approach is to understand the flows of in-service information to designers, to identify the in-service information requirements of designers, and to develop methods to support designers with appropriate in-service information.
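Several of the knowledge supports discussed in this section, such as querying design cases by function or retrieving reusable parts, reduce to indexing archived artefacts by the entities they realise. A minimal sketch with an invented case library (all case and function names are hypothetical):

```python
from collections import defaultdict

# Hypothetical case library: each archived design case lists the
# (sub-)functions it realises.
cases = {
    "pump-v1": ["move fluid", "regulate pressure"],
    "valve-a": ["regulate pressure"],
    "heater-x": ["raise temperature", "regulate pressure"],
}

# Build an inverted index: function -> set of cases realising it.
index = defaultdict(set)
for case, functions in cases.items():
    for f in functions:
        index[f].add(case)

def query(*functions):
    """Cases realising all of the requested functions."""
    sets = [index[f] for f in functions]
    return sorted(set.intersection(*sets)) if sets else []

print(query("regulate pressure"))
# → ['heater-x', 'pump-v1', 'valve-a']
print(query("regulate pressure", "move fluid"))
# → ['pump-v1']
```

Real systems replace the string keys with controlled function taxonomies and the exact match with similarity measures, but the retrieval pattern is the same.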

1.1.2 Product Manufacturing Knowledge

Product manufacturing refers to the process in which raw materials go through multiple processes and become finished products under different technological processes and production parameters. After years of development, modern enterprises have already built a good foundation of information technology, and have generated a large amount of production and processing data such as models, plans, reports, etc. The records and results produced by these production processes contain a wealth of practical experience, processing rules, and implementation methods. However, due to the multi-faceted nature, restricted timing and complex semantic relations of this data, existing workshop management models cannot effectively process and organize production process data, which in turn leaves a large amount of manufacturing experience and knowledge unable to deliver its due value. Therefore, transforming the knowledge that exists in individuals and is scattered across different stages of production into public, organized knowledge that can be disseminated, shared and reused is an important means of improving production efficiency [10]. In the process of product manufacturing, parameter optimization is an indispensable part. For example, the carbon content must be strictly controlled in steel production, as it determines the function and value of steel products. However, the carbon content is determined by various parameters, such as oxygen content, water, furnace dumping angle, and the temperature of each process stage. Technicians are required to strictly control the entire production process, otherwise the yield strength of the product will be seriously affected. Traditionally, parameter control is determined by the operator based on experience. Using manufacturing knowledge to monitor the parameters in real time and help managers make optimal parameter decisions is therefore at the core of production process optimization.
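The real-time parameter monitoring described above can be approximated, in its simplest form, by checking readings against mined or expert-given ranges. The sketch below is illustrative only; every parameter name and limit is invented:

```python
# Illustrative rule-based parameter monitor for a steelmaking stage.
# All variable names and limits are invented for this sketch; real
# rules would be mined from production records or given by experts.
RULES = [
    # (parameter, lower bound, upper bound, advice when out of range)
    ("oxygen_flow", 480.0, 520.0, "adjust oxygen lance flow"),
    ("bath_temp", 1580.0, 1650.0, "re-check furnace temperature"),
    ("carbon_pct", 0.04, 0.06, "extend/shorten blowing time"),
]

def check(reading):
    """Return advice triggered by out-of-range parameters."""
    advice = []
    for name, lo, hi, action in RULES:
        value = reading.get(name)
        if value is not None and not (lo <= value <= hi):
            advice.append((name, action))
    return advice

snapshot = {"oxygen_flow": 505.0, "bath_temp": 1672.0, "carbon_pct": 0.05}
print(check(snapshot))
# → [('bath_temp', 're-check furnace temperature')]
```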
On the other hand, knowledge-based quality analysis is also an important link in the product manufacturing process. It comprehensively considers the data recorded in the production process to achieve product quality prediction; at the same time, when quality problems occur, each link in the process can be traced to find out the reasons, so as to improve process quality. Quality analysis can be conducted based on knowledge graphs. Through knowledge graphs, the various process elements and process dependencies in the product production process are modeled, and the variables and quality data of each process are recorded. Product quality prediction can then be achieved through knowledge matching and reasoning, enabling better forecasting, analysis and traceability.
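Tracing a quality problem back through process dependencies, as described above, amounts to walking a dependency graph upstream. A minimal sketch with invented stage names:

```python
# Illustrative process-dependency graph for quality traceability.
# Stage names are hypothetical; each stage maps to the stages
# immediately upstream of it.
upstream = {
    "cold_rolling": ["hot_rolling"],
    "hot_rolling": ["continuous_casting"],
    "continuous_casting": ["steelmaking"],
    "steelmaking": [],
}

def trace(stage):
    """All upstream stages that could explain a quality problem at `stage`."""
    seen, stack = [], list(upstream.get(stage, []))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.append(s)
            stack.extend(upstream.get(s, []))
    return seen

print(trace("cold_rolling"))
# → ['hot_rolling', 'continuous_casting', 'steelmaking']
```

In a real knowledge graph each edge would also carry the recorded process variables, so that the trace returns candidate causes rather than just stage names.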

1.1.3 Production Scheduling Knowledge

Production scheduling is the buffer connecting production planning and manufacturing execution, and holds the key to manufacturing efficiency. Scheduling refers to the process of arranging the production sequence of workpieces and allocating corresponding production resources to optimize the objective functions. Typical objectives are manufacturing cost and makespan under particular constraints, which include process sequences, product delivery dates, equipment processing capacity, etc. The massive production data contains a wealth of scheduling knowledge. After mining, valuable rules can be drawn, which are helpful to management and decision-making optimization in production scheduling. The feasibility and efficiency of scheduling solutions can thus be improved [11]. As shown in Fig. 1.3, the planning and scheduling process of a steel plant includes the following steps. First, the order requirements of different customers are collected from the ERP system, and the optimal order combination is then selected according to various constraints such as the type of order and production capacity. Then the contract is allocated according to the degree of matching between the order and the production line. Each sub-process plant schedules production according to specific contracts, such as steelmaking plans, rolling plans, and auxiliary process plans. In the process of plan execution, materials need to be scheduled according to production requirements, such as the scheduling of ladles in the steelmaking workshop, the scheduling of slabs in the hot rolling workshop, and the scheduling of steel coils in the cold rolling workshop. In these planning and scheduling tasks, most current steel mills rely on experienced operators to operate manually, and only part of the production plan has a corresponding model to generate the schedule.
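A scheduling model in its simplest form can be illustrated with a classic dispatching rule. The sketch below sequences invented jobs on a single machine by earliest due date (EDD) and reports completion times and lateness; it is a textbook heuristic, not the steel plant's actual model:

```python
# Minimal single-machine dispatching sketch using the earliest-due-date
# (EDD) rule. All job data is invented for illustration.
jobs = [  # (name, processing_time, due_date)
    ("coil-A", 4, 10),
    ("coil-B", 2, 6),
    ("coil-C", 3, 14),
]

def edd_schedule(jobs):
    """Order jobs by due date; report (job, finish time, lateness)."""
    t, plan = 0, []
    for name, p, due in sorted(jobs, key=lambda j: j[2]):
        t += p
        plan.append((name, t, max(0, t - due)))
    return plan

print(edd_schedule(jobs))
# → [('coil-B', 2, 0), ('coil-A', 6, 0), ('coil-C', 9, 0)]
```

Mined scheduling knowledge typically takes exactly this shape: a rule mapping job attributes to a sequencing decision, which can then be compared against operators' manual schedules.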
As the main logistics equipment in slab yards, cranes are responsible for performing slabs' inbound, outbound and relocation operations. Relocation refers to the operation of moving a slab from one stack to another, which is caused by inappropriate slab storage location assignments. In conventional steel enterprises, crane scheduling and storage location assignment decisions are made mainly by production managers based on their experience. This has obvious drawbacks. On the one hand, due to the unreasonable assignment of cranes, a large number of no-load trips are generated. On the other hand, extensive relocation operations caused by inappropriate allocations of slab piles not only increase the workload of logistics equipment, but also bring about the accumulation of raw materials, resulting in high inventory cost. Statistical data show that the internal logistics costs of steel companies account for



Fig. 1.3 Planning and scheduling in the iron and steel factory

more than 30% of the total product cost. This has been a common problem faced by iron and steel (I&S) enterprises, and many research efforts have been devoted to logistics optimization in slab yards. Expert knowledge systems are commonly used to deal with these scheduling problems; they capture the knowledge of scheduling experts to form knowledge bases. Compared with traditional scheduling methods, expert knowledge systems have the advantages of combining qualitative and quantitative knowledge, processing complex information relationships, and being fast and efficient. However, current expert knowledge systems are mostly based on experts' experience, and often have defects such as strong subjectivity, reliance on partial attributes, conflicts between multi-source knowledge, and delays in finding knowledge. Therefore, finding an efficient and agile job shop scheduling knowledge acquisition method has important theoretical significance and practical value.
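The relocation burden described above can be quantified in a toy model: retrieving a slab that is not on top forces every slab above it to be moved first. A minimal sketch (the naive restacking policy is an assumption for illustration, not an industry rule):

```python
# Illustrative measure of relocation effort in a single slab stack.
# Slabs are stored bottom-to-top; retrieving a slab that is not on
# top forces every slab above it to be relocated first.
def relocations(stack, retrieval_order):
    """Count forced relocations; relocated slabs are restacked on top."""
    stack = list(stack)  # bottom ... top
    moves = 0
    for slab in retrieval_order:
        i = stack.index(slab)
        above = stack[i + 1:]
        moves += len(above)        # each slab above must be moved aside
        stack = stack[:i] + above  # naive policy: restack them on top
    return moves

# Retrieving bottom-first is the worst case for this stack.
print(relocations(["s1", "s2", "s3"], ["s1", "s2", "s3"]))  # → 3
print(relocations(["s1", "s2", "s3"], ["s3", "s2", "s1"]))  # → 0
```

Storage location assignment knowledge aims precisely at choosing stacking positions that keep this count low for the expected retrieval order.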

1.1.4 Assembly Knowledge

Product assembly is one of the most important steps in the mechanical product lifecycle, accounting for 30–50% of the total production time and more than 30% of the total production cost. As shown in Fig. 1.4, an assembly process for complex products involves the steps of sequence planning, assembly resource allocation and path planning. Each of these steps requires the collaboration and cooperation of distributed heterogeneous manufacturing resources such as manipulators, Automated Guided Vehicles (AGVs), and various tooling and fixtures. Most of these activities are


Fig. 1.4 The complex product assembly process

knowledge-intensive and highly dependent upon personal experience accumulated from previous projects over a long period of time. Thus, how to plan the assembly process intelligently according to dynamic product and environment information has become an urgent problem to be addressed [12]. Discovering and reusing engineering knowledge from the large amount of data accumulated in complex industrial processes can be of great help in supporting and accelerating assembly-related decision-making. With the iteration and upgrading of manufactured products, complex product structures and processes have made product assembly more difficult, while a large amount of historical assembly data is accumulated and stored at the same time. However, this assembly data, due to its multi-faceted nature and huge volume, lacks an effective organizational form, and it is difficult to provide easy-to-use knowledge acquisition services for assembly engineers. This makes the production process inefficient and affects the competitiveness of enterprises. On the other hand, the design of modern industrial assembly is mostly conducted with the help of computer-aided technology, and the design process depends on the professional knowledge and experience of designers. The reuse of assembly knowledge can provide auxiliary recommendations and decision-making support for the current assembly design, all of which helps achieve improved design efficiency.
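Assembly sequence planning under precedence constraints can be sketched as a topological ordering problem. The example below uses Python's standard `graphlib`; all part names and constraints are invented for illustration:

```python
from graphlib import TopologicalSorter

# Illustrative precedence constraints for an assembly: each part maps
# to the set of parts that must be mounted before it (names invented).
precedence = {
    "base": set(),
    "shaft": {"base"},
    "bearing": {"base"},
    "rotor": {"shaft", "bearing"},
    "cover": {"rotor"},
}

# Any static order respecting the constraints is a feasible sequence.
order = list(TopologicalSorter(precedence).static_order())
print(order)  # e.g. ['base', 'shaft', 'bearing', 'rotor', 'cover']
```

Real sequence planners additionally score the feasible orders against tooling, fixture and path constraints, which is where the captured assembly knowledge comes in.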



1.1.5 Diagnosis and Maintenance Knowledge

Maintenance and overhaul are critical to the operation of complex equipment such as high-precision machining tools. Amongst the maintenance and overhaul tasks, fault diagnosis is a typical and challenging one involving multiple disciplines and multiple departments. Many techniques have been developed to support fault diagnosis according to the operation status of equipment, such as condition monitoring, knowledge modeling, diagnostic reasoning, and auxiliary decision-making. The ultimate goal of fault diagnosis is to formulate a reasonable and effective equipment maintenance strategy. The diagnosis conclusion can guide the effective implementation of the maintenance strategy in the production process, so as to reduce the time and cost of equipment maintenance. The diagnosis and maintenance tasks mentioned above are highly knowledge-intensive, and require the support or participation of diagnosis experts throughout the entire process from data collection to maintenance decision-making. The complexity of mechanical faults makes it difficult to automate fault identification and processing, which often requires human intervention by fault diagnosis experts and maintenance technicians [13]. The main reason is that there is a large amount of unstructured fault data and diagnosis experience, and it is difficult to standardize and streamline the fault diagnosis reasoning process. Equipment maintenance decisions are made based on a large amount of information such as equipment structural characteristics, production scheduling arrangements, spare parts inventory status, and related maintenance experience from the past [14]. If the timeliness of obtaining and querying this information cannot be ensured, it may lead to uninformed decision-making. As shown in Fig. 1.5, operating mode recognition links the fault characteristics with the operating status of the equipment so as to determine the current operating status.
Fig. 1.5 Knowledge-driven diagnosis and maintenance process

Failure mode recognition is the task of determining the failure mode and severity based on the characteristics of the failure, and identifying the root cause on this basis. After that, the fault diagnosis and probabilistic reasoning module acquires the empirical knowledge of experts from the knowledge base. Online diagnosis, reasoning and evaluation of the current state of the system are then conducted based on data collected from multiple groups of sensors. Maintenance decision-making integrates information such as external constraints, task requirements and cost factors to support decision-making in the preparation and execution of maintenance tasks. The fault prediction and decision support module adopts data-driven diagnosis and prediction technology, exploits fault knowledge and historic cases to predict the Remaining Useful Life (RUL) of components and systems, and integrates expert evaluation opinions and fault reasoning conclusions to suggest effective maintenance methods. With the rapid development of industrial automation and Internet of Things (IoT) technologies, the dependence on manual operation has gradually decreased, and fault diagnosis has raised more demanding requirements on the accuracy, adaptability and response time of online analysis. The diagnosis and maintenance of complex industrial production equipment involves many constraints. Knowledge and experience in the diagnosis and maintenance process are heterogeneous and multi-source, and take numerous forms of expression and composition. Equipment maintenance and fault diagnosis involve the entire process from equipment manufacturing and commissioning to scrapping, and key maintenance elements are hidden in the massive information flow of the production environment. In addition, maintenance and diagnosis knowledge comes from both condition monitoring data and the empirical thinking of maintenance technicians. Its heterogeneous and multi-source knowledge structure has great uncertainty, and a systematic and comprehensive knowledge management method is needed to realize the expression and reasoning of diagnosis and maintenance knowledge.
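Case-based diagnosis, one simple way to reuse the empirical knowledge in a case library, can be sketched by ranking known faults by symptom overlap. All cases below are invented, and the similarity measure (Jaccard) is our choice for the sketch:

```python
# Minimal case-based diagnosis sketch: match observed symptoms against
# a small case library using Jaccard similarity. All cases are invented.
case_library = {
    "bearing wear": {"vibration", "noise", "temperature rise"},
    "misalignment": {"vibration", "axial load"},
    "lubrication failure": {"temperature rise", "noise"},
}

def jaccard(a, b):
    """Overlap of two symptom sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def diagnose(symptoms):
    """Candidate faults ranked by symptom overlap (best first)."""
    ranked = sorted(case_library.items(),
                    key=lambda kv: jaccard(symptoms, kv[1]),
                    reverse=True)
    return [(fault, round(jaccard(symptoms, s), 2)) for fault, s in ranked]

print(diagnose({"vibration", "noise"}))
# → [('bearing wear', 0.67), ('misalignment', 0.33), ('lubrication failure', 0.33)]
```

Production systems replace the flat set comparison with probabilistic or rule-based reasoning, but the core idea of retrieving the nearest past case is the same.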

1.2 The Essence of Knowledge and Its Relationships with Data and Information

A characteristic that distinguishes engineering designers from other professionals is their capability of applying technical knowledge, making decisions, and adopting courses of action to solve engineering design problems. This capability can also be termed 'design knowledge'. The collective knowledge of individual designers determines the creativity, productivity, and competitiveness of an organization. Thus, managing knowledge is very important for organizations to compete in the marketplace. Other motivations for KM are to provide a trail for product liability legislation and to retain knowledge and experience as engineering designers retire. KM research in engineering design is a relatively new topic. Some researchers do not distinguish between data, information and knowledge, and claim their work to be knowledge-based only if useful information is recorded for further usage either by human users or computers. However, others do make a distinction between these terms to identify the real requirements for computer systems on different occasions. The identified distinction between these terms is


Fig. 1.6 Relationship between data, information, knowledge

helpful in the development of CSTs, especially in the current industrial scene in which a huge amount of information is generated during the course of the design process. The distinguishing method adopted in this book tends to have a computer engineering focus, as it aims to provide insight for the development of CSTs. As shown in Fig. 1.6, the term 'data' is used to represent the measurable storage of information in a certain format and on a certain medium, e.g. binary files of two gigabytes in size stored on a computer. Information is embodied as facts whose contents are described using a set of data with context. Knowledge, as a much higher level of intelligence generally possessed only by human beings, is the capability to understand given information, to correlate it with related information, and ultimately to generate new knowledge to explain a fact or solve a problem. Knowledge engineering is a critical topic in both academic and industrial communities. In the context of engineering design, knowledge management specifically aims to reuse useful knowledge in new design tasks, and this reuse is realized through transferring knowledge in the form of information. As such, the terms 'knowledge' and 'information' are often used interchangeably, while 'knowledge' is particularly used to emphasize reuse. A differentiation of the terminologies can help researchers identify the particular focus of research and achieve an appropriate scope of definition. Specifically, data refers to raw data in the form of numbers, words, symbols, etc. describing basic facts, which can be created, copied, edited and deleted. Information usually takes the form of structured data, which is more tangible than knowledge [2]. The terms 'information' and 'information management' are often used in the context of knowledge and knowledge management, as information is a necessary medium or material for eliciting and constructing knowledge.
Information in itself does not necessarily embody knowledge which is more about beliefs and commitment and is usually associated with actions and particular business processes.

1.2 The Essence of Knowledge and Its Relationships with Data and Information


An example can be used to illustrate the relationship between data, information, and knowledge. Nowadays, it is useful to analyse the huge amount of in-service data of aero-engines to make suggestions for the design process. This data (e.g. the temperature profile of a key component during a flight) is generally stored on hard disks in a certain format, and can be combined with context to form a piece of information (e.g. the component’s temperature kept rising and exceeded a set limit). An engineer with experience of designing aero-engines then starts, using his/her knowledge, to interpret this information (e.g. over-heating took place on the component during the flight), to correlate it with related information (e.g. a similar problem happened to the same component of another engine last year), and to generate new knowledge (e.g. the reason for last year’s over-heating problem was the use of an inappropriate material, and the present problem can very possibly be attributed to the same reason).

If an engineering design process can be modeled as an information processing activity, then it is important to provide relevant and useful information to designers effectively and efficiently. Data is usually considered to be textual, either numeric or alphabetical. From the engineering design point of view, data is considered to be structured and to represent a measure such as quantity. Data is easily differentiated from information and knowledge, as information and knowledge can be viewed as structured sets of data with rich contents. There is little research highlighting data in engineering design, with product data management (PDM) an exception. PDM specifically deals with electronic data, such as artefact models and documents, that is generated during a design process. The distinction between data and information can be made in terms of context.
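The data-to-information step in the aero-engine example can be sketched in code. This is a minimal illustration only: the component name, flight identifier, sample values and temperature limit are all invented for the sketch and do not come from any real engine dataset.

```python
# Hypothetical illustration: raw samples (data) become a contextualized fact
# (information) once component, flight and limit context is attached.

TEMP_LIMIT_C = 950  # assumed limit for the component (invented value)

def to_information(component, flight_id, samples):
    """Turn a raw temperature profile (data) into a fact with context."""
    peak = max(samples)
    return {
        "component": component,
        "flight": flight_id,
        "peak_temp_c": peak,
        "over_limit": peak > TEMP_LIMIT_C,
    }

samples = [880, 905, 930, 962, 948]          # data: bare numbers
info = to_information("HP turbine blade", "FL-1042", samples)
print(info)
```

The knowledge step stays with the engineer: interpreting *why* the over-limit event occurred (e.g. an inappropriate material, as in a similar past case) requires correlating this information with other information and with experience.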
After decades of development, the steel industry has generally formed a mature five-level information system: the equipment control system (L1), the process control system (L2), the workshop-level manufacturing execution system (MES) (L3), the enterprise resource planning (ERP) system (L4) and the inter-enterprise management decision support system (L5). In production and operation, these five levels of systems continuously generate data, which differ greatly in data protocols, data types, and real-time performance [15]. Many types of data are collected from the production line, including numerical data such as common process parameters, textual data such as work logs, CAD data such as drawings, pictorial data such as surveillance videos and surface-defect images, simulation data such as finite element simulations, and equation data such as mechanism models and empirical formulas.

This heterogeneous data is processed and transformed into useful information. Typical processing methods include the traceability and matching of data between different sub-processes, the temporal and spatial transformation of data in different dimensions, the synchronization of data at different sampling frequencies, and the fusion of data with different characteristics. Experts systematically refine, research and analyze the accumulated information, and combine it with the existing human knowledge system; this valuable information is eventually transformed into knowledge. In the steel production process, as shown in Fig. 1.7, knowledge exists in the form of different themes, such as knowledge related to quality control, fault diagnosis, and production scheduling.
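One of the processing steps named above, the synchronization of data at different sampling frequencies, can be sketched with a simple nearest-neighbour alignment. The sampling rates, timestamps and rolling-force values below are invented for illustration; real plants use far more careful synchronization methods.

```python
from bisect import bisect_left

def align_nearest(t_ref, t_src, v_src):
    """For each reference timestamp, pick the source sample closest in time.
    A minimal stand-in for synchronizing streams sampled at different rates."""
    out = []
    for t in t_ref:
        i = bisect_left(t_src, t)
        # candidates: the neighbour on each side of the insertion point
        best = min(
            (j for j in (i - 1, i) if 0 <= j < len(t_src)),
            key=lambda j: abs(t_src[j] - t),
        )
        out.append(v_src[best])
    return out

# Hypothetical streams: an L2 system sampling every 1 s, an L1 device every 0.4 s
t_l2 = [0.0, 1.0, 2.0]
t_l1 = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]
force = [10.1, 10.3, 10.2, 10.6, 10.4, 10.5]
aligned = align_nearest(t_l2, t_l1, force)
print(aligned)  # one force value per L2 timestamp
```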


1 Knowledge Management Through Product Lifecycle

Fig. 1.7 Data, information, knowledge in I&S production (the figure maps data sources along the production line of heating furnace, roughing mill, finishing mill group, laminar cooling and coiling machine, such as setting models, process parameters, formula data, quality reports, surface detectors, order specifications and decision rules, through information-domain processing such as protocol conversion, space and time transformation, data caching, data matching, data synchronization, surface-image adaption, data fusion and data tracking, up to knowledge themes of planning and scheduling, fault diagnosis, and quality control)

The difference between knowledge and information can be identified on the basis of the definitions and comparisons discussed above: the former is a competence notion while the latter refers to tangible objects. We can therefore conclude that knowledge about a problem can be generated after relevant information has been well understood by people. The generated knowledge strongly depends upon what information is provided to people and what knowledge they already have. Therefore, the provision of correct and straightforward information with explicit context is important for effective KM.

1.3 The Forms and Characteristics of Different Kinds of Knowledge

The above definition of knowledge is still too abstract to guide the development of KM tools unless knowledge can be classified and given more concrete functions. Generally, there are two well-accepted schemes for classifying knowledge. The first is based on the contents of the knowledge, whereas the second focuses on the properties of knowledge. In the first scheme, knowledge about artefacts (product knowledge)



and knowledge about a problem-solving process (process knowledge) are distinguished [16]. In the second scheme, knowledge is divided into explicit knowledge and implicit knowledge (and also tacit knowledge), in terms of the extent to which a piece of knowledge can be articulated. The distinction between explicit knowledge and implicit knowledge also reflects the codification and personalisation views on KM. These two views pertain to the two prevalent approaches: Knowledge-Based Engineering (KBE) and Knowledge and Experience Management (KEM). KBE and KEM, though having different focuses, can be applied to manage both product and process knowledge.

1.3.1 Knowledge from Mechanism, Experience and Data

From the perspective of manufacturing systems, there are three types of knowledge sources, as shown in Fig. 1.8: mechanism knowledge, which reflects the essential laws of the production process; empirical knowledge, which reflects people’s cognition and understanding of the production process; and data-driven knowledge, which is implicit in massive data and reflects individualized, real-time production conditions.

Fig. 1.8 Knowledge from mechanism, experience and data in the rolling process (mechanism knowledge covers, for example, segregation, constitutive and recrystallization equations together with macroscopic, mesoscale and microscopic simulation, and is applied to quality prediction, parameter optimization and process control)

Mechanism knowledge reflects the essential laws of the industrial production process. Taking steel production as an example, it is typically a large-scale material-energy conversion process involving a series of complex physical and chemical reactions. Process-industry products and raw materials come in a wide variety of types, and the production process is complex and diverse. The manufacturing processes, devices, and steps differ between products and involve different kinds of process mechanisms. The mechanism knowledge obtained from field experiments, laboratory experiments, theoretical derivation and other research is explicit knowledge. Part of it has been formalized into mathematical forms such as formulas and equations, and is continuously updated as mechanisms are studied in more depth. At the same time, some mechanism knowledge remains unstructured and difficult to formalize.

Empirical knowledge reflects people’s cognition of the inherent relationships between operations and processes, and is characterized by concealment, non-quantification, and inconsistency. Industrial production and management usually require process experts or managers to analyze and make decisions based on empirical knowledge, which in most cases is personalized. The experience, skills, know-how, and intuition of knowledge workers are all tacit and need to be transformed into explicit knowledge and verified; this is a process of knowledge innovation. Explicit empirical knowledge can be formalized in a variety of representation models, such as expert rules and semantic networks.

Data-driven knowledge is the knowledge implicit in design and production data. Industrial production data is dynamic, highly correlated, and multi-scale. The dynamics of process data reflect changes in working conditions, operations, and raw materials during industrial production, and data-driven knowledge is implicit in these process production data. Compared with data-driven knowledge in the general sense, that of the process industry is often tied to the process mechanism and production process, and therefore carries richer process semantics.
With the wide application of big data and machine learning technology, mining knowledge from industrial data has attracted increasing attention.
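The transformation of tacit operator experience into explicit expert rules, mentioned above, can be sketched in a few lines. The rule contents below (coiling-temperature and thickness heuristics, threshold values, action texts) are invented for illustration and are not real rolling practice.

```python
# Sketch: an operator's tacit heuristics made explicit as if-then expert rules.

def make_rule(name, condition, action):
    return {"name": name, "when": condition, "then": action}

rules = [
    make_rule(
        "coiling-temp-high",
        lambda s: s["coiling_temp_c"] > 620,
        "increase laminar cooling water flow",
    ),
    make_rule(
        "strip-too-thick",
        lambda s: s["exit_thickness_mm"] > s["target_thickness_mm"] + 0.05,
        "reduce last-stand roll gap",
    ),
]

def advise(state):
    """Fire every rule whose condition holds for the current process state."""
    return [r["then"] for r in rules if r["when"](state)]

state = {"coiling_temp_c": 650, "exit_thickness_mm": 2.10,
         "target_thickness_mm": 2.00}
print(advise(state))
```

Once codified in this form, the rules are shareable and verifiable, which is exactly what distinguishes explicit empirical knowledge from the tacit experience it came from.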

1.3.2 Tacit and Explicit Knowledge

Knowledge can be classified as tacit knowledge and explicit knowledge according to its accessibility. Tacit knowledge corresponds to a designer’s intuition about a process, while explicit knowledge can help to explain specific activities. Tacit and explicit knowledge can also be differentiated in that they represent personal knowledge and organizational knowledge, which correspond to the personalisation and codification perspectives of KM for engineering design, respectively. It is advocated that tacit knowledge needs to be transformed into more tangible, shareable information resources, and the successful elicitation of tacit knowledge needs to take into account the skills of the individuals involved. Knowledge is difficult to assimilate and has a personal aspect, which demonstrates the key difference between knowledge (as a ‘competence notion’) and information (as tangible objects that can be managed). In this sense, the terms ‘tacit knowledge’ and ‘explicit knowledge’ have been proposed as a way of differentiating between personal



knowledge and that which has been codified as a company information resource. Tacit knowledge resides in a community’s know-how, which can be market-based (in products), infrastructure-based (in systems), personal (concerning staff and the competence of suppliers) or administrative (concerning workflow and processes). The term ‘knowledge model’ has been used to refer to an information representation scheme for facilitating codification. In this book, formal knowledge is used to describe engineering know-what and know-who embodied in codified information sources such as a 3D geometric model, a simulation model, a data-accessing source (e.g. material and manufacturing data) or a computer routine (e.g. parameter optimization); tacit knowledge refers to engineering know-how and know-why in relation to the personal knowledge and experience (within a community) of understanding an issue, developing a problem-solving strategy, considering necessary constraints and options, and reasoning about possible decisions.

1.3.3 Product and Process Knowledge

Based on the different features possessed by knowledge, knowledge can be classified such that the specific meanings of different pieces of knowledge can be referenced and used in different application domains. There are two main categories of design knowledge models, namely design artefact and design process. The former describes different aspects of an artefact throughout its lifecycle, such as functional, behavioral and structural models, or causal relationships between these aspects. The latter represents knowledge models of the design process itself, including descriptive, prescriptive, and/or computational models. Most existing knowledge models classify design knowledge into two main categories, product knowledge and process knowledge [17]. The former specifically focuses on knowledge elements about the product itself and generally describes know-what information using pictorial and symbolic means, while the latter emphasizes know-how and know-why information arising during the problem-solving process.

The Function-Behavior-Structure (FBS) model is one of the most popular product knowledge representation models; it essentially maps functional requirements onto an artifact’s structure parameters via the transformation of system behavior. FBS derivatives are able to capture design cases and facilitate the creation of new products by representing specific design solutions. However, they are less adequate at representing how solutions are created and why they work as they do. Process knowledge models have thus been developed to record decision-making activities and the relevant context during the design process. Typical examples of process knowledge representation include Issue-Based Information Systems (IBIS) and the Design Rationale editor (DRed) [18]. Both product and process knowledge are essential in the development of new products due to the variety of designers’ knowledge requirements, which has raised the need for an integrated representation of design knowledge.
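The function-to-behavior-to-structure chain of the FBS model can be sketched as linked records. The reducer-flavoured content below (names, ratio values) is invented for illustration and is not a real design record.

```python
from dataclasses import dataclass

# Minimal FBS sketch: a function is achieved through a behavior, which is
# realized by structure parameters. Traversing the chain answers know-what
# (structure) and know-how (behavior), mapped back to the requirement.

@dataclass
class Structure:
    name: str
    parameters: dict

@dataclass
class Behavior:
    description: str
    realized_by: Structure

@dataclass
class Function:
    requirement: str
    achieved_through: Behavior

gears = Structure("two-stage gear train",
                  {"ratio_stage1": 4.2, "ratio_stage2": 3.1})
speed_reduction = Behavior("reduce shaft speed via meshing gear pairs", gears)
drive_belt = Function("drive conveyor belt at 100 r/min", speed_reduction)

total_ratio = (gears.parameters["ratio_stage1"]
               * gears.parameters["ratio_stage2"])
print(drive_belt.requirement, "->",
      drive_belt.achieved_through.realized_by.name, total_ratio)
```

What such a sketch cannot represent, as the text notes, is *why* this structure was chosen over alternatives; that is the gap process knowledge models fill.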



To clarify the notion of context, Demian discussed two commonly-used classification schemes. The first classifies knowledge as declarative knowledge, procedural knowledge and causal knowledge, based on the aim of reusing knowledge. Declarative knowledge refers to knowing the information relating to a certain artefact. Procedural knowledge represents knowing how to undertake some activity. Causal knowledge means knowing why decisions are made. The second classification aims to identify different types of knowledge about an artefact, namely knowledge about form, function, and behaviour: the first is knowledge about its physical composition; the second is knowledge about what it should do; and the third is knowledge about what it actually does, or how well it performs.

In summary, two main methods of classifying knowledge can be found in current research, based on where knowledge resides (organisational or personal knowledge) and on its aim (artefact knowledge for knowing what, process knowledge for knowing how, rationale for knowing why). It is noteworthy that the classification is not rigorous and there is some intersection between these classes. The classification of knowledge gives knowledge specific meanings for different application domains and paves the way for effective KM.

• Product knowledge

Product knowledge covers all kinds of product-related information and knowledge involved in the whole life cycle. Figure 1.9 shows representative types of product knowledge, including requirements, functions, costs, structure, behavior, and personnel information. Specifically, product requirements include customer information, market information, performance requirements, and locations.
Product requirements can be obtained through user interviews, questionnaire surveys, and competitive analysis of products of the same type in a similar industry, and the design can be adjusted from the perspective of product market positioning, core functions, and user feedback. The function information of the product includes product features, useful life, reliability, safety, etc. Product cost refers to the cost of each stage from design, production and assembly to maintenance. The structure of the product mainly includes geometric information, such as shape and topology, and assembly relationships, such as parts and assemblies. It is worth mentioning that, in addition to the low-level information represented by CAD drawings, the product structure also includes high-level information related to processes, manufacturing, assembly, etc. Behavior establishes a communication bridge between function and structure: it represents the method used by designers to infer structure, and is composed of behavior types, behavior variables, and qualitative behavior causality. Behavioral relationships may include qualitative logical, organizational, temporal, and spatial relationships, as well as quantitative temporal and spatial relationships.

Fig. 1.9 Framework of the product-related knowledge (covering requirements such as market, technical index and location; functions such as features, states, constraints and mapping relationships; costs such as material and manufacturing cost; structure such as assemblies and CAD models; and planning and personnel information)

The following example, shown in Fig. 1.10, uses a two-stage reducer to illustrate the specific composition of product knowledge. The product demand is to design a reducer for an automatic feeding belt conveyor transmission device, which is an
independent component composed of a gear transmission enclosed in a rigid shell and used between the power machine and the working machine. Before the design calculation, it is necessary to clarify the technical parameters of the reducer design and the design calculation task, which are an important part of the demand. In terms of function, the reducer parameters include output power, conveyor belt speed and useful life. In terms of behavior, the design includes four parts: motor design, V-belt design, gear design and shaft design. According to the work requirements and conditions, the type and power of the motor are selected. Comprehensively considering the size and weight of the motor and the transmission device, together with the transmission ratios of the belt drive and the reducer, the overall transmission plan is determined and the total transmission ratio and its distribution are calculated. The model, diameter and number of V-belts are selected according to the transmission ratio and power. Then the gear material and accuracy grade are selected, and the number of teeth and the width coefficient of the gears are calculated according to the stress analysis. Finally, the diameter of each shaft segment is determined according to the torque strength. Functions and behaviors are analyzed comprehensively to determine the final structure and assembly relationships of the reducer.

Fig. 1.10 Example of the product-related knowledge (designing a reducer for the transmission of an automatic feeding belt conveyor; functions: output power 3.3 kW, conveyor speed 100 r/min, useful life 5 years; behaviors: calculating the kinetic parameters of the motor, design of the V-belt, design and check of the shafts, selection of the gears)

• Process knowledge

The Design Rationale editor (DRed) is one of the most typical process-related knowledge models, as shown in Fig. 1.11. It aims at modelling the problem-solving process by identifying and devising the contents of rationale, thus offering a structure in which the rationale can be captured. DRed is a derivative of the venerable Issue-Based Information System (IBIS) concept: the rationale in DRed is modeled as the issues to be solved, the potential solutions to those issues, and the arguments for or against these solutions. There are four main parts of a project where DRed can be utilized to organize design knowledge from various sources, namely project management, problem formulation and understanding, solution generation and evaluation, and prototype manufacturing and development. Specifically, the rationale about project management is mainly concerned with how the design project should be organised and carried out (network diagram, work breakdown, stakeholder analysis, etc.). The rationale about problem understanding and formulation involves the analysis of previous designs and the specific requirements of the current design. Solution generation concerns the design tasks and covers both the conceptual design stage and the detailed design stage. In prototype manufacturing and development, design rationale about the issues addressed in manufacturing, assembly, and even service can be captured. As shown in Fig. 1.11, various files are created for different purposes (e.g. pictures of sketches, CAD models, and files for project management) and linked to DRed files via bi-directional links.
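The IBIS structure underlying DRed, issues with candidate answers and arguments for or against them, can be sketched as a small set of linked records. The node texts below are invented for illustration; only the issue/answer/argument structure comes from the text.

```python
from dataclasses import dataclass, field
from typing import List

# Minimal IBIS-style rationale sketch: an issue, candidate answers marked
# accepted or rejected, and pro/con arguments attached to each answer.

@dataclass
class Argument:
    text: str
    pro: bool  # True = argument for the answer, False = against

@dataclass
class Answer:
    text: str
    status: str = "open"          # becomes "accepted" or "rejected"
    arguments: List[Argument] = field(default_factory=list)

@dataclass
class Issue:
    text: str
    answers: List[Answer] = field(default_factory=list)

issue = Issue("How to stop the component over-heating?")
a1 = Answer("Change to a higher-temperature alloy")
a1.arguments.append(Argument("Proven on a similar engine", pro=True))
a2 = Answer("Add extra cooling holes")
a2.arguments.append(Argument("Weakens the structure locally", pro=False))
issue.answers += [a1, a2]
a1.status, a2.status = "accepted", "rejected"

accepted = [a.text for a in issue.answers if a.status == "accepted"]
print(accepted)
```

Because the rejected answers and their arguments are kept, such a record answers know-why questions (why this solution, and why not the alternatives) that a CAD model alone cannot.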

Fig. 1.11 Example of the process-related knowledge based on DRed (the graph notation distinguishes solved issues, accepted and rejected answers, pro arguments, and tunnel links)



Studying the nature of design is concerned with how design should be understood as a subject and how design research should be approached. It helps define the scope of design studies and gain insights which cannot easily be gained by other methods. Methods developed for descriptive models pave the way for further research on understanding designers’ behaviours and needs. Prescriptive models mostly depend upon researchers’ experience in the domain, and therefore make intuitive sense to many designers only if researchers understand the problem well. However, further work needs to be done to validate the methods for prescriptive models, since an implicit assumption of this research is that, if designers follow the prescribed process, better designs will result.

1.4 The Variety of Knowledge Needs in the Modern Context

In engineering activities, knowledge is the basis for problem solving and innovation. In the new-generation intelligent manufacturing framework proposed by Zhou et al. [19], the cyber system not only has strong perception, calculation, analysis and control capabilities, but also the ability to learn, improve and generate knowledge. Self-learning and cognition are at the core of distinguishing the intelligence level of manufacturing, and include creating, accumulating, utilizing, imparting, and inheriting manufacturing knowledge. Before conducting research on the acquisition, representation and reuse of knowledge in knowledge management, we need to investigate in detail engineers' needs for knowledge and information throughout the product life cycle.

Relevant scholars have summarized knowledge needs through questionnaires and online surveys of the engineering personnel involved in product development and management, including managers, software engineers, production engineers and service engineers [20]. Heisig et al. investigated the knowledge needs through the following two questions [21]: "Q1. Describe the information and knowledge you would like to retrieve from previous products/services, as specifically as possible." "Q2. Describe the information and knowledge you think should be captured to assist future engineering tasks, as specifically as possible." Question Q1 mainly considers the demand for historical knowledge in current engineering activities, while question Q2 focuses on the support that knowledge formed by current activities gives to future decision-making. Based on Ahmed and Heisig's research, we summarized 23 categories of knowledge needs in the modern context, as shown in Table 1.1. The most common knowledge needs in engineering activities are know-what, know-who, know-why, know-how and know-when information.

Although a variety of information and models have been codified in the product life cycle with the development of digitalization technology, engineers still have to spend much time searching for the files of interest. In particular, needs about trade-offs, what (and when) issues should be considered, and



why a design is carried out fit well with the concept of capturing rationale. A few other needs, such as terminology and how a part of a product works, can also be fulfilled, as captured knowledge contains a lot of contextual and semantic information. The remaining needs, such as where to obtain information and how to carry out a specific calculation, can also potentially be fulfilled by the scheme. Even when they cannot be fulfilled by the knowledge model on its own, other files linked to a piece of the model record can assist in the fulfilment of information needs, provided that the query matches the record.

An empirical study was conducted to account for the information-seeking behaviour of designers on the basis of an analysis of their information requests to different sources of answers, e.g. colleagues, databases, and drawings. The study shows that there are mainly two kinds of requests, namely requests to acquire information and requests to process information; designers tend to use emails or phone calls for the former whilst preferring face-to-face discussion for the latter. An interpretation of this finding is that requests to process information generally involve further justification and explanation.

1.5 Knowledge Management Methodologies

The manufacturing of complex products emphasizes collaboration between enterprises and the sharing of resources within society, including a series of collaborative processes and complex decision-making activities. It is necessary to make effective use of the existing knowledge resources of the various enterprises and to integrate them with the manufacturing process, so as to guide the operation of the collaborative manufacturing process and improve the rapid decision-making ability for complex product manufacturing.

As shown in Fig. 1.12, knowledge management frameworks share some knowledge processing activities, although they have different focuses. The commonly-used activities are capturing (also called elicitation or acquisition), storing (also called formalizing or consolidation), mining (also called discovering), and reusing (also called dissemination or combination) [22]. Knowledge retrieval is surely a necessary part of the KM process even though it is not included in those frameworks; one possible reason for the exclusion is that researchers viewed knowledge retrieval and reuse as a whole. Although knowledge discovery through data mining is relevant to knowledge retrieval, its emphasis is on automatically identifying correlations between different pieces of knowledge to generate new knowledge, rather than on searching for knowledge. In summary, the main tasks involved in a KM process include the capture, storage, retrieval, and reuse of knowledge. The ultimate goal of any knowledge management research is to reuse knowledge effectively and efficiently, without imposing too much burden on designers.
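The capture, storage, retrieval and reuse chain can be sketched as a toy keyword-based knowledge store. This is a deliberately minimal stand-in: the record texts are invented, and a real system would use ontologies or semantic search rather than raw keyword overlap.

```python
# Toy KM pipeline: capture records with keywords, store them, retrieve by
# keyword overlap with a query. All record contents are invented examples.

store = []

def capture(text, keywords):
    """Capture step: store a knowledge record with indexing keywords."""
    store.append({"text": text, "keywords": set(keywords)})

def retrieve(query_terms):
    """Retrieval step: rank stored records by keyword overlap with the query."""
    q = set(query_terms)
    hits = [(len(r["keywords"] & q), r["text"]) for r in store]
    return [text for score, text in sorted(hits, reverse=True) if score > 0]

capture("Over-heating traced to inappropriate blade material",
        ["over-heating", "material", "turbine"])
capture("Roll-gap rule reduces thickness deviation",
        ["rolling", "thickness", "rule"])

print(retrieve(["turbine", "over-heating"]))
```

The reuse step is the designer applying a retrieved record to the new task; what the sketch omits, keeping the capture burden low for designers, is precisely the hard part the chapter returns to.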



Table 1.1 Different kinds of knowledge needs in the modern context

1. Rationale/why/reason for: why the decisions or choices were made. Example: why dynamic finite element analysis is used to determine the structural parameters in the structural design stage.
2. Products/components/what: the level of detail of the engineering design. Example: what the components of a gear reducer are.
3. Codified information: specific information in the form of drawings, documents and numerical data. Example: the CAD files of the products.
4. Models: various forms of models, such as finite element models, stress models and simulation models. Example: a dynamic simulation model of an assembly structure built in ADAMS.
5. Problem description: description and definition of the manufacturing problem or object at different levels. Example: a detailed description of standard hardware components and designs, including part numbers and location/arrangement in cabinets.
6. Engineering solutions: solutions implemented to solve the relevant problems. Example: local strengthening can improve the fatigue strength of the material at a stress concentration.
7. Engineering process: details about the task in each step, the information provided during the manufacturing process, what is expected to be produced, etc. Example: the design of casing products includes structural design, mold design, material design, simulation, and experimental verification.
8. People/who: finding people, or seeking solutions by looking for people. Example: who is best at parameter optimization algorithm calculations.
9. Options and choices: optional solutions, software, structures, personnel, etc. Example: Python, MATLAB or Java can be chosen to simulate machine learning algorithms.
10. Constraints and assumptions: the constraints that need to be considered in the specific problem-solving process, and the assumptions that can be made. Example: when optimizing the hot rolling plan, the constraints of the rolling schedule and rolling mill equipment must be considered.
11. Difficulties, problems and issues: common difficulties or problems in the manufacturing process. Example: it is difficult to determine and calculate the boundary conditions in finite element simulation.
12. Changes and modifications: all changes or modifications to the original/previous designs, including change requests. Example: changes in the transmission mode between different versions of gearboxes in an automobile company.
13. Failures and faults: failures or faults in engineering activities, including design, manufacturing, assembly, maintenance and service. Example: due to assembly tolerance constraints, the design scheme failed and the cylindrical part needed to be redesigned.
14. Successful solutions: the success of solutions in solving the relevant problems. Example: by changing the B-pillar material, the overall weight of the body-in-white was reduced by 5%.
15. In-service experience: experiences from the use of the product and its components, including the ‘service history’ and comparison with ‘predicted behaviour’. Example: service history such as part lives achieved, conditions as parts age, reasons for rejection, and mission profiles.
16. Feedback and suggestions: feedback or specific suggestions from customers or downstream companies on the product. Example: how to adjust the processing parameters if the yield strength needs to be further increased.
17. Typical values: requests for typical values, as well as maximum and minimum values. Example: the typical carbon content of the Q235 steel grade, and its maximum and minimum interval.
18. Terminology: queries regarding what a particular term means. Example: text nodes are sometimes used in DRed graphs to explain terms.
19. Trade-offs: effects of one issue on another. Example: the trade-off between strength and weight in body-in-white design.
20. How does it work: how a particular part of the product functions. Example: a functional analysis diagram can record the functions of components or systems.
21. When to consider issues: when issues should be considered. Example: IBIS-based design rationale captures issues, answers, and arguments.
22. How to calculate: the methods used by a designer to achieve a task. Example: the design rationale about how the dimension of a component is calculated.
23. Company process: the distribution of design work between departments, the relevant company procedures, information on relevant people, and other aspects of company procedure. Example: diagrams for “project management” mainly concern the design project; company procedures are seldom covered.

Fig. 1.12 Framework of knowledge management (the knowledge management process comprises knowledge capture, representation, storage, reasoning, retrieval and reuse, drawing on sources such as rules, customer demands, design cases, ontology models, simulation experiments and knowledge models, and supporting knowledge services for manufacturing)

1.5.1 Knowledge Capture

Complex product design teams face different task requirements in stages such as customer demand analysis, scheme design, overall design, structural design and process design. These task requirements generate corresponding knowledge requirements: the overall design, for example, needs the main functional modules and technical principles, the system design, and the mechanical, electrical and hydraulic working principles of the corresponding modules, while the structural design phase requires knowledge of structural materials, mechanical properties and reliability. However, in the design of complex products only part of the required knowledge can be used directly; some knowledge must be created or improved by designers from existing knowledge, which involves the process of knowledge capture. Capture can be done in many ways, e.g. manual capture as the design project proceeds, automatic capture during the design process, and extraction of knowledge from design components.

In the modern manufacturing environment, a data warehouse contains a large amount of data, covering stages from product and process design, material planning, quality control and scheduling to maintenance and diagnosis. Knowledge acquisition builds on these massive data, and data mining methods have therefore become an important tool for knowledge capture. Data mining methods commonly used for knowledge capture include concept extraction, association relationship analysis, classification, prediction, clustering and evolution analysis.

Figure 1.13 shows the main steps of knowledge capture in a manufacturing system. In the process of knowledge acquisition, the application domain and manufacturing tasks are analyzed through the study of relevant prior knowledge. Engineers collect raw data and focus on the set of variables that affect the manufacturing problem. Data preprocessing is then conducted, such as eliminating noise, replacing missing values and data cleaning, which puts the data into a form suitable for mining. Based on the type of knowledge required, appropriate functions such as clustering or rule extraction are selected, and related data mining algorithms are used to find specific patterns. The discovered knowledge is combined with the manufacturing problem to assist decision-making, and the knowledge content is further modified based on feedback. Manufacturing knowledge is finally stored and updated in the knowledge base.

Fig. 1.13 Process of knowledge capture: in the manufacturing domain, data collecting is followed by data cleaning, model building and validation, model application, decision making and feedback, and knowledge storage and updating.

The following paragraphs describe three commonly used methods of knowledge capture.

• Concept extraction

In most manufacturing problems, it is necessary to extract the concept of the problem object, for example, in fault diagnosis, the summary of the fault mode and the description of fault influencing factors. The fault log often records irregular data about failure mechanisms, causes and patterns, such as fuzzy descriptions of faults, incomplete data and name abbreviations, which cannot be directly used as
engineering knowledge for fault analysis. In the process of product design, text data such as design documents, design cases and design questions often need to be transformed into information that can be processed by a computer. Therefore, natural language processing (NLP) technology is increasingly combined with knowledge engineering to process unstructured data, through tasks such as word segmentation, named entity recognition and relationship extraction.

• Association relationship analysis

Association rule mining analyzes the relationships between groups of entities in a database. In the manufacturing process, hidden patterns or related factors can be found by mining a large number of cases. For example, in complex product design, mining the designer's activities during CAD modeling can reveal meaningful sequences of design activities, which can then be reused in the design of new products to improve design efficiency. In the product manufacturing process, the probability and expectation of time risk in industrial production systems can be predicted by analyzing the correlation between failure modes and production time delays, guiding maintenance decision-making. In the assembly process, rapid assembly process planning is carried out by analyzing the relationship between the three-dimensional shape of the product and the assembly path.

• Clustering and classification

Classification and clustering methods are applied in many areas of manufacturing. For example, in the semiconductor industry, classifying defects reveals specific failure modes whose knowledge can be extracted for quality improvement. Given the type of manufacturing system, the scheduling goals and the current production status, classification algorithms can assign different scheduling rules to different scenarios to achieve the most efficient use of available manufacturing resources. In the design of new material products, steel grades with similar properties can be merged through clustering, accelerating product development through the rapid design of components and processes.

Figure 1.14 shows the process of material product development. For different steel grades and specifications, the relationships between mechanical properties, process parameters and chemical composition differ greatly, which drives a need to establish separate prediction models for different steel grades. However, a large steel mill usually produces hundreds of steel grades, and the samples of each grade are extremely unbalanced: some grades have only dozens of samples. To solve this problem, we use clustering algorithms to analyze the chemical compositions of different steel grades, find the similarity and specificity of the raw material compositions, and automatically merge grades with the same or similar raw material composition into sub-clusters. Establishing steel grade clusters according to composition range helps build more accurate prediction models and improves the efficiency of new product development. At the same time, the number of steel-making grades can be reduced while still meeting product performance requirements, making full use of surplus slabs and helping enterprises realize flexible rolling of multiple varieties in small batches.

Fig. 1.14 The development process of material products: a steel clustering model, a performance prediction model and a parameter optimization model link chemical components, process parameters and mechanical performance.
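The grade-merging idea described above can be sketched in a few lines. This is an illustrative toy, not the authors' clustering pipeline: grades are grouped whenever their compositions fall within a distance threshold of a cluster representative, and the grade names and composition values (nominal C, Mn and S fractions) are made up for the example.

```python
import math

# Hypothetical steel grades with illustrative composition vectors [C, Mn, S].
grades = {
    "Q235A": [0.20, 0.35, 0.045],
    "Q235B": [0.21, 0.36, 0.044],
    "Q345":  [0.18, 1.40, 0.040],
}

def distance(a, b):
    """Euclidean distance between two composition vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def merge_grades(grades, threshold=0.1):
    """Greedily merge grades whose composition is close to a cluster representative."""
    clusters = []  # each cluster: (representative composition, [grade names])
    for name, comp in grades.items():
        for rep, members in clusters:
            if distance(comp, rep) <= threshold:
                members.append(name)
                break
        else:
            clusters.append((comp, [name]))
    return [members for _, members in clusters]

print(merge_grades(grades))  # [['Q235A', 'Q235B'], ['Q345']]
```

A production system would use a proper clustering algorithm (e.g. k-means or hierarchical clustering) over normalized composition ranges, but the greedy threshold merge shows the essential grouping step.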

1.5.2 Knowledge Representation

Engineering activities are closely related to information, and engineers need information and knowledge from various sources to support their decision-making. The establishment of an effective knowledge representation model to enable sharing and integration between enterprises has therefore always been a focus of knowledge management in engineering manufacturing. As mentioned above, knowledge can be divided into explicit knowledge and tacit knowledge. Engineers make mature use of explicit knowledge such as drawings, CAD models, calculation results and simulation results. However, good decision-making often relies on the problem-solving strategies accumulated by experienced engineers, often referred to as "internal knowledge" or "tacit knowledge". The integrated and unified representation of explicit and tacit knowledge is one of the problems to be solved urgently in knowledge management, and is of great help in improving manufacturing quality and efficiency.

Knowledge representation refers to the codification and formalization of knowledge, transforming domain knowledge into information that can be processed by a computer. The appropriateness of a knowledge representation method not only affects knowledge storage and management, but also directly affects the efficiency of knowledge acquisition and use. Knowledge representation methods combine multidisciplinary fields such as logic,
ontology, and computing technology. The current knowledge representation models are mainly divided into the following categories:

(1) Function-based model

Research on knowledge representation models started in the 1990s; a typical example is the function-behavior-structure (FBS) model proposed by Gero. The FBS model describes the main elements of the engineering design object as well as the activities concerning these elements. It takes function as the main carrier and integrates knowledge such as expert design experience into the product model.

(2) Process-based model

Process-based knowledge representation takes the product lifecycle as the main thread and describes the data and information associated with the product objects at each stage, realizing knowledge representation and reuse in the manufacturing process. For example, the design resources required in different design stages can be defined according to different manufactured products and goals, and associated with the corresponding processes in the manufacturing stage.

(3) Object-based model

The object-based model represents engineering knowledge by exploiting the power of object-oriented languages. A typical example is Modelica, which defines the data and behavior of objects through classes, where classes use domains to represent solution results. In Modelica, equations are used to express the solution process, the rules and the principle formulas. Documents in the database are transformed into multi-domain knowledge, and the knowledge in the product development process is effectively described.

(4) Ontology-based model

The ontology model is composed of conceptual entities, entity attributes and the relationships between entities. It expresses the associations between entities through conceptual implication, attribute associations, constraints and axiom definitions. Based on the ontology model, design knowledge such as product concepts, functional relationships and technical principles is expressed as a semantic network, which promotes the sharing and reuse of knowledge.

In the authors' project on material product development, D2RQ semantic mapping technology was used to transform structured data into RDF data for ontology retrieval and reasoning services. OWL-based ontology modeling was then utilized to represent the product design knowledge. In addition to building on RDF, OWL adopts the Ontology Inference Layer (OIL) to facilitate rule-based reasoning. Constituent elements such as classes, attributes and individuals in OWL are defined as RDF resources and identified by URIs. We first formulate the D2RQ mapping rules and formalize the corresponding rule templates according to the established ontology model. We then call the D2RQ mapping engine on the development platform and load the ontology model and mapping templates. The connection between the ontology model
and the data source is thus established, so that the actual production data can finally be transformed into the design ontology.
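The essence of this relational-to-semantic mapping step can be sketched without the D2RQ toolchain: rows from a (hypothetical) steel-grade table are turned into subject-predicate-object triples according to a column-to-property mapping, the analogue of a D2RQ mapping template. The `ex:` names and column names are illustrative, not the project's actual vocabulary.

```python
# Hypothetical relational rows, e.g. from a steel_grade table.
rows = [
    {"grade": "Q235", "carbon_max": 0.22, "yield_strength": 235},
    {"grade": "Q345", "carbon_max": 0.20, "yield_strength": 345},
]

# Column -> RDF property mapping (the analogue of a D2RQ mapping template).
mapping = {"carbon_max": "ex:carbonMax", "yield_strength": "ex:yieldStrength"}

def rows_to_triples(rows, mapping):
    """Translate each row into RDF-style (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"ex:{row['grade']}"
        triples.append((subject, "rdf:type", "ex:SteelGrade"))
        for column, prop in mapping.items():
            triples.append((subject, prop, row[column]))
    return triples

triples = rows_to_triples(rows, mapping)

# An ontology-style query: all grades with yield strength >= 300 MPa.
strong = [s for s, p, o in triples if p == "ex:yieldStrength" and o >= 300]
print(strong)  # ['ex:Q345']
```

In a real deployment the triples would be serialized as RDF and queried with SPARQL (e.g. via an RDF library), but the row-to-triple translation above is the core of the mapping.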

1.5.3 Knowledge Retrieval

The purpose of knowledge retrieval is to find relevant items among a large number of knowledge records, so as to facilitate decision-making in problem-solving activities across the whole lifecycle. The development of a product is a process of both using and creating knowledge, and in this process knowledge retrieval and knowledge use are intertwined. Similar manufacturing tasks have similar knowledge requirements, so the same knowledge object may be used in similar problem-solving activities. Because of the explosion of knowledge in the network environment, methods and algorithms are necessary to improve retrieval performance in terms of both efficiency and accuracy; database technology and distributed computing are therefore increasingly used in massive knowledge retrieval.

The existing knowledge retrieval methods mainly include concept-based retrieval, vector space-based retrieval and ontology-based retrieval. Concept-based retrieval matches the keywords of text information from the perspective of functional requirements, constraints and product attributes, and expresses knowledge topics through combinations of keywords; feature sets can represent product requirements and be matched against existing cases in the knowledge base by calculating the similarity of semantics and values. Vector space-based retrieval generally uses the cosine distance to measure the similarity between vectors composed of feature sets. These two methods rely mainly on keywords formed from features or concepts, and may not accurately reflect the contextual semantic information of a problem. Ontology-based retrieval uses the semantic information contained in the ontology language to retrieve the knowledge model. It can not only use contextual background knowledge to obtain more accurate association relationships, but also perform inference over related concepts. In the design of mechanical products, ontology semantics are retrieved through the steps of ontology design, semantic similarity calculation and ranking by similarity score.
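The vector-space approach can be sketched in a few lines: knowledge records and the query are turned into term-frequency vectors and ranked by cosine similarity. The records below are illustrative placeholders, and a real system would use weighted features (e.g. TF-IDF) rather than raw counts.

```python
import math
from collections import Counter

# Hypothetical knowledge records (e.g. design case summaries).
records = {
    "case1": "gear shaft fatigue failure analysis",
    "case2": "gear tooth surface wear repair",
    "case3": "hydraulic pump pressure control",
}

def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, records, top_k=2):
    """Rank records by similarity to the query and return the top-k names."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), name) for name, text in records.items()]
    return [name for score, name in sorted(scored, reverse=True)[:top_k]]

print(retrieve("gear failure", records))  # ['case1', 'case2']
```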

1.5.4 Knowledge Reuse

The reuse of knowledge in engineering design typically aims to make the experience of individuals and organizations from previous design activities available to better inform and enable future design activities. We categorize knowledge reuse into manual, semi-automatic and fully automatic modes. Across these three modes, the support provided to engineers gradually increases, and at the same time the requirements placed on knowledge reuse algorithms become increasingly demanding.

In manual mode, the system provides engineers with a body of knowledge stored in the form of trees or graphs. On the basis of problem and demand analysis, designers make design decisions independently by manually retrieving relevant engineering knowledge. In automatic mode, engineers can directly call encapsulated engineering knowledge that can be run to generate corresponding results; for example, the system automatically generates material compositions and processing parameters based on product performance requirements and historical design cases. The semi-automatic mode lies between the two: it requires manual decomposition and analysis of the problem to be solved, while the system actively recommends the knowledge required at each step, or recommends the tools or experts needed to solve the relevant problems, and further help is sought to solve the problem.

(1) Manual knowledge-reuse mode

In the engineering problem-solving process, engineers often need to refer to a large amount of knowledge, such as schemes, constraints, manuals and standards, to make the final decision. This knowledge includes possible problems, feasible solutions and related experts. In the manual mode shown in Fig. 1.15, engineers first decompose the problem into specific subtasks based on personal experience, and then retrieve and browse the knowledge records related to each subtask. Users need to quickly find knowledge related to the current task and, on this basis, expand the knowledge to find further supporting tools. This kind of knowledge reuse generally appears in the requirements analysis, functional design and conceptual design stages of the product design process.
Manual mode requires retrieval algorithms to identify knowledge similar to the current task.

Fig. 1.15 Process of manual knowledge-reuse mode: an engineering problem is decomposed into issues; knowledge units in the knowledge database are classified, retrieved and matched to each issue to support the final decision making.

(2) Automatic knowledge-reuse mode

A large part of product design is variant design, in which new products are designed by reusing past design models and algorithms. Knowledge records in the form of geometric models, algorithms and codified rules are an important source of knowledge in the design process; managing and reusing this knowledge can greatly reduce repetitive work. As shown in Fig. 1.16, the automatic knowledge-reuse mode refers to designers completing current design tasks using encapsulated geometric models, algorithms and rules. Several technologies are needed to support this mode. First, geometric models, algorithms and rules must be formalized and encapsulated into services. Second, the encapsulated components need to be exposed through Web Services technology so that the knowledge can be accessed over the network. Finally, the packaged knowledge services are automatically recommended according to the description of the task.

Fig. 1.16 Process of automatic knowledge-reuse mode: knowledge resources are encapsulated into a knowledge base; through service matching, knowledge composition and recommendation, the encapsulated knowledge is applied across product design, production, assembly, supply chain and maintenance to support decision making.

(3) Semi-automatic knowledge-reuse mode

In actual manufacturing activities, due to the multidisciplinary nature and complexity of complex product design, it is difficult for the system to automatically judge the required design knowledge from the context of the design problem and push the relevant content to the designer. In this mode, reuse relies more on manually reorganizing and improving existing knowledge to generate a corresponding plan based on the retrieved knowledge. In the semi-automatic knowledge-reuse mode, on the one hand, the system can provide a process for problem solving or problem
decomposition, and designers can decompose or solve design problems according to the guidance of the design process. On the other hand, when encountering a design problem, the designer can directly find experts in the field related to the problem to help solve it.
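The recommendation step in the semi-automatic mode can be sketched as a simple tag-overlap match: the engineer supplies the keywords of a decomposed sub-task, and the system ranks stored knowledge units (and their associated experts) by how many tags they share with it. The knowledge units, tags and expert names below are hypothetical.

```python
# Hypothetical knowledge base: units tagged with topics and linked to experts.
knowledge_units = [
    {"id": "KU1", "tags": {"gear", "strength", "calculation"}, "expert": "Alice"},
    {"id": "KU2", "tags": {"bearing", "lubrication"}, "expert": "Bob"},
    {"id": "KU3", "tags": {"gear", "material", "selection"}, "expert": "Carol"},
]

def recommend(subtask_keywords, units):
    """Rank knowledge units by tag overlap with the sub-task keywords."""
    scored = [(len(subtask_keywords & u["tags"]), u) for u in units]
    scored = [(s, u) for s, u in scored if s > 0]   # drop irrelevant units
    scored.sort(key=lambda x: -x[0])                # best match first
    return [(u["id"], u["expert"]) for _, u in scored]

print(recommend({"gear", "strength"}, knowledge_units))
# [('KU1', 'Alice'), ('KU3', 'Carol')]
```

A real system would replace the tag overlap with the semantic similarity measures of Sect. 1.5.3, but the recommend-then-decide loop is the same.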

1.6 Discussion

The role and framework of knowledge management have been introduced in this chapter. Despite the rapid development of knowledge management and knowledge engineering, existing manufacturing knowledge management methods face the following challenges:

(1) Enterprises have accumulated much experience and knowledge in the production process, but due to the lack of an effective knowledge sharing mechanism, this knowledge has not been well organized and utilized in the networked collaborative manufacturing process. Knowledge cannot be fully shared during engineering operations, there are barriers to semantic understanding in knowledge dissemination and communication, and it is difficult to provide intelligent support for the networked manufacturing of complex products or to improve operational efficiency.

(2) The networked manufacturing of complex products requires the participation of multiple departments and roles. The dispersion and independence of knowledge and information hinder the decision-making efficiency of the networked manufacturing process. Knowledge-intensive manufacturing activities require the integration of multi-departmental and multi-disciplinary knowledge to realize its shared use. Only by fully integrating the knowledge resources of enterprises, departments and individuals, and by making tacit knowledge explicit, can the rapid response and decision-making capabilities of the networked manufacturing process be improved. Current manufacturing knowledge management is generally independent of the manufacturing process and lacks capture and exploitation of the knowledge-related context, which tends to leave the acquired knowledge out of context.

(3) Knowledge evolves dynamically through reuse and interaction. As product design and manufacturing modes are updated, the structure and content of knowledge change. Traditional knowledge management platforms and integration methods have fixed knowledge input and integration styles, and cannot adapt to these changes during the dynamic evolution of knowledge.

These deficiencies raise the need to study knowledge acquisition, representation and reuse from a new perspective. In the next chapter, we introduce the role and content of collaborative knowledge management in light of them.



References

1. Chandrasegaran, S. K., Ramani, K., Sriram, R. D., Horváth, I., Bernard, A., Harik, R. F., & Gao, W. (2013). The evolution, challenges, and future of knowledge representation in product design systems. Computer-Aided Design, 45(2), 204–228.
2. Peng, G., Wang, H., Zhang, H., Zhao, Y., & Johnson, A. L. (2017). A collaborative system for capturing and reusing in-context design knowledge with an integrated representation model. Advanced Engineering Informatics, 33, 314–329.
3. Yue, W., Gui, W., Chen, X., Zeng, Z., & Xie, Y. (2019). A data and knowledge collaboration strategy for decision-making on the amount of aluminum fluoride addition based on augmented fuzzy cognitive maps. Engineering, 5(6), 1060–1076.
4. Wang, R., Nellippallil, A. B., Wang, G., Yan, Y., Allen, J. K., & Mistree, F. (2021). A process knowledge representation approach for decision support in design of complex engineered systems. Advanced Engineering Informatics, 48, 101257.
5. Wang, Y., Yu, S., & Xu, T. (2017). A user requirement driven framework for collaborative design knowledge management. Advanced Engineering Informatics, 33, 16–28.
6. Pahl, G., Beitz, W., Feldhusen, J., & Grote, K. (2007). Engineering design: A systematic approach.
7. Gzara, L., Rieu, D., & Tollenaere, M. (2003). Product information systems engineering: An approach for building product models by reuse of patterns. Robotics and Computer-Integrated Manufacturing, 19(3), 239–261.
8. Li, M., Zhang, Y. F., & Fuh, J. Y. H. (2010). Retrieving reusable 3D CAD models using knowledge-driven dependency graph partitioning. Computer-Aided Design and Applications, 7(3), 417–430.
9. Jagtap, S., & Johnson, A. (2011). In-service information required by engineering designers. Research in Engineering Design, 22(4), 207–221.
10. Camarillo, A., Ríos, J., & Althoff, K. D. (2018). Knowledge-based multi-agent system for manufacturing problem solving process in production plants. Journal of Manufacturing Systems, 47, 115–127.
11. Leo Kumar, S. P. (2019). Knowledge-based expert system in manufacturing planning: State-of-the-art review. International Journal of Production Research, 57(15–16), 4766–4790.
12. Kretschmer, R., Pfouga, A., Rulhoff, S., & Stjepandić, J. (2017). Knowledge-based design for assembly in agile manufacturing by using data mining methods. Advanced Engineering Informatics, 33, 285–299.
13. Gao, Z., Cecati, C., & Ding, S. X. (2015). A survey of fault diagnosis and fault-tolerant techniques—Part II: Fault diagnosis with knowledge-based and hybrid/active approaches. IEEE Transactions on Industrial Electronics, 62(6), 3768–3774.
14. Xiao, S., Hu, Y., Han, J., Zhou, R., & Wen, J. (2016). Bayesian networks-based association rules and knowledge reuse in maintenance decision-making of industrial product-service systems. Procedia CIRP, 47, 198–203.
15. Peng, G., Li, T., Zhai, X., Liu, W., & Zhang, H. (2021). Knowledge-driven material design platform based on the whole-process simulation and modeling. International Journal of Modeling, Simulation, and Scientific Computing, 2241001.
16. Poorkiany, M., Johansson, J., & Elgh, F. (2016). Capturing, structuring and accessing design rationale in integrated product design and manufacturing processes. Advanced Engineering Informatics, 30(3), 522–536.
17. Kim, K. Y., & Kim, Y. S. (2011). Causal design knowledge: Alternative representation method for product development knowledge management. Computer-Aided Design, 43(9), 1137–1153.
18. Bracewell, R., Wallace, K., Moss, M., & Knott, D. (2009). Capturing design rationale. Computer-Aided Design, 41(3), 173–186.
19. Zhou, J., Zhou, Y., Wang, B., & Zang, J. (2019). Human–cyber–physical systems (HCPSs) in the context of new-generation intelligent manufacturing. Engineering, 5(4), 624–636.
20. Ahmed, S., & Wallace, K. M. (2004). Understanding the knowledge needs of novice designers in the aerospace industry. Design Studies, 25(2), 155–173.
21. Heisig, P., Caldwell, N. H., Grebici, K., & Clarkson, P. J. (2010). Exploring knowledge and information needs in engineering from the past and for the future: Results from a survey. Design Studies, 31(5), 499–532.
22. Gao, J., & Bernard, A. (2018). An overview of knowledge sharing in new product development. The International Journal of Advanced Manufacturing Technology, 94(5), 1545–1550.

Chapter 2

The Collaborative Knowledge Management Paradigm

As the functions and structures of modern products become more and more complex, product development involves multiple disciplines, diverse teams and organizations, distributed platforms, and heterogeneous tools. In this context, collaborative product development has become a popular paradigm in today's manufacturing industries. Under this paradigm, the traditional knowledge management framework encounters challenges in efficiently capturing, representing, retrieving, and reusing the engineering knowledge involved in the lifecycle phases of a product. Thus, it is of great significance to analyze the main features and requirements of collaborative knowledge management, and to propose the corresponding collaborative knowledge capturing, retrieving, and reusing technologies. In this chapter, we first analyze the collaborative activities involved in the entire product lifecycle, including the types and elements of collaboration. On this basis, the requirements of collaborative knowledge management are introduced, and the architecture and key enabling technologies of collaborative knowledge management are proposed. Next, we introduce the key content of this architecture in detail: collaborative knowledge representation and collaborative knowledge reuse. Finally, to demonstrate the support of collaborative knowledge management for the product development and assembly design process, some practical cases are illustrated.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. H. Wang and G. Peng, Collaborative Knowledge Management Through Product Lifecycle.

2.1 The Need for Collaborative Knowledge Management

Collaboration refers to cooperative relationships that build the shared vision and understanding needed for conceptualizing cross-functional linkages in the context of engineering activities. Since collaboration facilitates the acquisition and sharing of resources through integration and cooperation with other teams, the collaborative mode has become popular in the modern product development process [1]. Kim et al. conducted in-depth interviews in six consumer product manufacturers and identified four types of typical collaborative product design processes based on the
interactions between industrial designers and engineering designers [2]. Mcharek et al. [3] proposed a knowledge management framework for mechatronic concurrent design to promote the collaboration between system engineers and disciplinary engineers. Based on the analysis of the modern product development process, we have summarized three types of collaboration as follows.

2.1.1 Collaboration Between Multidisciplinary Engineers

Collaborative product design generally involves an integrated development team of professionals from varied disciplines. The collaborative development process can be divided into multiple knowledge-intensive activities such as conceptual design, engineering analysis, embodiment design, process design, performance evaluation and detail design, each requiring a great amount of design knowledge and experience from different disciplines. When the whole product lifecycle is concerned, design activities are linked to downstream tasks including process planning, manufacture, assembly and maintenance, and the interaction and collaboration between members of the integrated team become more diverse. Figure 2.1 illustrates the interaction between professionals across multiple disciplines. People are the main actors in engineering activities such as product design and manufacturing. Experts with different disciplinary knowledge and from multiple design teams form a design community, quite often in a distributed environment, on the basis of consensus, trust, coordination and cooperation around the overall product design goals. The design community exchanges more than traditional materials: it requires the integration of distributed design tasks through the sharing and reuse of design knowledge.



Fig. 2.1 Collaboration between multidisciplinary engineers: a project manager, CAE analysts, manufacturing engineers, production personnel, financial staff, product quality inspectors and after-sales personnel exchange design specifications, cost accounting, processing instructions, product information and quality feedback.



As shown in Fig. 2.1, the roles involved in product development mainly include customers, designers, technologists, manufacturing engineers, etc., and their commitments and responsibilities in the whole process are different yet interrelated. In the process of product development, customers first put forward requirements related to product function, cost, time and so on. In the preliminary design stage, a number of overall system designers carry out the scheme design and convert customer requirements into design specifications related to structure or function. Engineers in different fields then carry out detailed parametric design and simulation analysis of the design specifications according to their functions and disciplines. During the detailed design process, CAE analysts analyze products and schemes in parallel according to the requirements and feed the analysis results back to the scheme design department for correction. After the scheme is determined, it is submitted to the structural design department, where several designers carry out the structural design. During structural design, the technologists conduct a process analysis based on the designed product model and feed modification suggestions back to the structural designers. At the same time, the manufacturing group conducts a machinability analysis of the designed product and feeds any problems back to the structural designers for modification. Parts that pass the machinability analysis are then processed, and finally the overall product is assembled by the manufacturing engineers. After-sales personnel follow up the use and maintenance of the product and relay quality problems reported by customers to the corresponding R&D personnel at each stage for follow-up product improvement.
At the same time, the financial personnel conduct cost analysis according to the process path and material consumption involved in the design, production and assembly process, and feed the indicators that do not meet the customer's needs back to the overall system designers. In this complex process, since each step is interdependent and many people participate in parallel, handling these influencing factors together holds the key to efficiently completing a high-quality product design.

2.1.2 Collaboration of Lifecycle Activities

In product lifecycle management (PLM), a product's evolution is divided into market analysis, product design, process development, product manufacturing, product distribution, product use, post-sale service, and recycling. From the perspective of knowledge collaboration, the lifecycle engineering activities fall into three main stages, i.e., the product design stage, the product manufacturing stage, and the product service stage. Each stage is further divided into sub-stages, and the collaboration between different stages is listed in Table 2.1. In the preparation stage, the task road-map (e.g. demand analysis) is produced, the planning feasibility scheme is formulated, and the design task document is finally formed. In the preliminary design stage, the working principle of the product is first studied, and some principle tests are carried out to select several feasible schemes; after comparison and evaluation, the final choice is made to determine


2 The Collaborative Knowledge Management Paradigm

Table 2.1 Collaboration between lifecycle activities (a matrix of collaboration strength between the lifecycle activities requirements modeling, functional modeling, concept design, embodiment design, detailed design, production planning, manufacturing, logistics, utilization, after-sale and recycle, across the product design, product manufacturing and product service stages)



the best implementation scheme. In the detailed design stage, engineers analyze the mechanism of the design scheme, analyze the dynamics of the product, and draw the final assembly drawing and part drawings. In the trial production and production stages, product samples are manufactured, experiments are carried out, problems are found and improvements are made, and the quality of the design is gradually improved. After the product enters the service stage, a feedback mechanism is established to collect customer evaluations and suggestions for continued improvement. Table 2.1 shows the strength of collaboration between different engineering activities. For example, conceptual design needs to analyze the market demand and design the basic solution structure, which makes it the bridge between the customer and the designer. The detailed design stage involves designing the structure, shape, material, process route, assembly method and other related parameters of the product. At this stage, not only the processing capacity of the production equipment and the supply of raw materials, but also the usage scenarios and remanufacturing needs of the product have to be considered to a certain extent. The manufacturing process needs to balance the resources of each link, such as equipment, raw materials, process personnel, assembly and logistics, so as to optimize the overall production scheme.



2.1.3 Collaboration of Multi-source Knowledge Resources

When designers share product information in collaborative design, they may use different tools and work in different contexts. There is a diverse range of knowledge resources from which knowledge can be acquired and shared, such as design models, simulation and analysis programs, and process planning files. These resources also require close collaboration, achieved by defining intermediate formats (e.g. sharing information across multiple CAD platforms) and contextualizing information (e.g. extracting requirement information to set parameter values). In the context of lifecycle knowledge management, the diverse knowledge resources can be classified from different perspectives: (1) from the perspective of knowledge attributes, into dynamic knowledge describing the product design process and static knowledge describing product design objects; (2) from the perspective of logical abstraction, into relationship knowledge between product objects and object attributes, knowledge of the development laws of product objects and the control process of product design, expert experience and skills, design common sense, and so on; (3) from the whole cycle of product design, into market information, case knowledge, design simulation knowledge, physical model testing knowledge, manufacturing prototype knowledge, feedback from product application, and other knowledge; (4) from the way of acquisition, into engineering case knowledge, engineering specification knowledge, design experience knowledge, and so on. We divide product knowledge into five categories using the classification of knowledge representations by Owen and Horváth [4]: pictorial, symbolic, linguistic, virtual, and algorithmic. In the process of collaborative product development, these different kinds of knowledge resources are processed, shared, and integrated within the integrated development team.
Resource collaboration refers to the way engineers share resources between different application systems when performing different tasks, converting data to adapt to different formats and storage methods. Information resources with different formats and contents are systematically analyzed, adapted, and merged with similar information, thereby generating valuable knowledge in the form of unified data. Information collaboration can improve the efficiency of information exchange, processing and feedback, and thus improve the ability of heterogeneous systems to solve complex problems through cooperation [5, 6]. Figure 2.2 shows the two most common types of knowledge resource collaboration: the collaboration of geometric CAD models and of textual information. The sharing of geometric models mainly refers to the collaboration and information exchange between CAX and DFX systems, which has long been a hot and difficult research topic. Taking the collaboration of graphics files as an example: since different designers are responsible for different product modules, including transmission modules, hydraulic modules, etc., CAD files need to be shared among different designers, while engineers in different sub-processes pay attention to different views of the graphical files. For example, assemblers need assembly-related



Fig. 2.2 Collaboration of multi-source knowledge resources

information from CAD files, the supply chain needs the bill of materials (BOM) information from CAD for material supply, and simulation engineers need to perform simulation analysis on different dynamic and static properties. Taking the collaboration of textual files as an example, designers need to convert customers' functional requirements for products into requirements documents and provide them to engineers in different departments for subsequent design. At the same time, in a new design, previous design cases are retrieved for reference and reuse. The collaboration of graphical information currently includes three main approaches: (1) standardized intermediate file formats; (2) the interface conversion method; and (3) the unified service standard method. Specifically, the first method lets heterogeneous CAX systems independently use their own graphical file formats; when information exchange occurs, the respective graphics files are converted into standardized intermediate formats such as IGES, STEP, etc. The second method establishes a one-to-one information conversion mechanism by analyzing the data structure correspondence between different CAX/DFX systems. For example, the NX system of UGS and the CATIA system of Dassault provide file format conversion to each other. The third method focuses on encapsulating the application program interfaces of CAX/DFX systems as standard services, on the basis of which application programs can access the internal functions of the CAX/DFX systems through these services. The three categories of collaboration discussed above emphasize the linkage between different elements in the product lifecycle. There also exist other collaborations if we take a closer look at the interaction between digital contents produced through the whole product design and development process [7, 8]. For example, the



collaboration between simulation models running in parallel takes the form of runtime interaction, i.e. the exchange of simulation results and information about simulation parameters. The collaboration between the model-editing activities of multiple designers takes the form of concurrent operation on the same model, i.e. synchronization of operations across all the model views seen by these designers. These two examples further show that the diverse and frequent exchange of information between people, activities, resources, models and code is a very important feature of modern product development. This is also the focus of this book, i.e. understanding knowledge management from a computational perspective. The investigation into the collaborative development of large complex products reveals that a large-scale design process is characterized by the following features:

• Distributed and grouped: The collaborative design process often involves different design teams and departments distributed across different regions. At the same time, this process involves multiple expert groups with different domain knowledge collaborating to accomplish a common task. In product design, experts achieve “collaborative innovation” of products through collaboration, communication, and inspiration. The final design result reflects the collective wisdom of the whole development team.

• Heterogeneity and integration: The collaborative development system is cross-department, cross-enterprise, and cross-industry. Its work platform is heterogeneous, which is mainly reflected in the shared data sources, the expression of problem-solving knowledge, and the software and hardware used. On the other hand, each design team must integrate heterogeneous resources to effectively share knowledge and design experience, and develop new products under the effective coordination of the design process.
• Globality and synergy: For common design goals, product design needs to be carried out with the concept of the whole lifecycle. From the early stage of product design, one should consider possible problems in the series of subsequent work links, establish a unified product model to avoid the loss of information between designers, and take overall optimization as the purpose of product design to reduce design iterations and rework. At the same time, it is necessary to deal with multi-objective, cross-time, cross-space, and complex interlaced problems in a “co-decision-making” way to coordinate and resolve the various conflicts encountered by personnel. Collaboration can take place not only between members within a team/domain, but also between teams/domains.

• Openness and sharing: Design resources are continuously enabled and released during design, which requires the collaborative design system to be “plug and play”, i.e. open to these resources, so as to realize mutually transparent interoperability of the system, which is beneficial to the replacement of design components.

From the features identified and analyzed above, certain requirements for a collaborative knowledge management platform can be identified:



• The collaborative knowledge management platform should support distributed designers in flexibly integrating heterogeneous software and hardware resources, provide efficient access to engineering data and knowledge, and facilitate the exchange of information between engineers.

• The collaborative knowledge management platform needs to efficiently acquire and collect multi-source and heterogeneous knowledge resources, including graphic resources, text resources and data resources.

• To make better use of multi-source heterogeneous knowledge, it is necessary to build an integrated knowledge representation model for knowledge sharing, reuse and integration in a collaborative environment.

As mentioned above, with the increasing demands of the collaborative product development process and massive resource sharing, it is necessary to make full use of computer and information technology to carry out collaborative product design, and to further study knowledge management methods for the acquisition, representation, reuse and update of knowledge.

2.2 Collaborative Knowledge Management Architecture and the Key Enabling Technologies

In a networked work environment, collaborative knowledge management technology can provide an alliance and collaboration environment for enterprises scattered across different places, so that they can quickly obtain the required knowledge resources and adapt to market changes. By providing users with personalized product demand services in a short time, it achieves “win–win” cooperation while also improving the overall competitiveness of the group in the region. To realize the effective service of knowledge in the collaborative product design of multiple teams, the key difficulty lies in breaking the “knowledge barriers” caused by regions and disciplines, so as to realize the collaboration and accumulation of knowledge among teams. In the process of collaborative product design, the design tasks are diverse and concurrent, requiring a large amount of knowledge to be supplied according to the specific working context. These characteristics make the collaborative work process dynamic and uncertain. Moreover, the diversity of designers and design resources, the complexity of design activities and tasks, and the multidirectional, transferable and variable nature of information and knowledge flows among designers all increase. Various conflicts continue to appear in the design process, making the coordination of the design process difficult. In addition, the heterogeneity of systems among different enterprises leads to the phenomenon of “knowledge islands”: the information and reasoning knowledge implicit in each system cannot be effectively explored, preventing knowledge sharing and interoperability between systems. Therefore, in order to solve the above problems, it is necessary to study the related technologies of collaborative knowledge management.



2.2.1 Integrated and Collaborative Knowledge Management Architecture

To give an overview of this Integrated and Collaborative Knowledge Management (ICKM) scheme and describe its key components, a system framework is developed and shown in Fig. 2.3. In the center of the figure are the lifecycle stages of collaborative knowledge management, all of which are linked to specific design activities (e.g. selecting the diameter of a turbocharger). For each of these activities, formal knowledge and tacit knowledge are linked through an integrated knowledge model. Different users with different roles can work in this virtual collaborative environment on a design project to create knowledge records and receive suggested knowledge records. Four main underpinning technologies, listed at the bottom of the figure, need to be developed to implement an ICKM system. An advanced distributed computing environment is exploited in this framework to facilitate collaborative work and system integration, providing a virtual working space for different users to undertake design tasks as well as knowledge management tasks [9, 10]. This means a range of computational methods need to be developed to integrate design objects and associated problem-solving and decision-making knowledge, and to enable supplying design knowledge according to the specific working context. Another feature of the framework is that design activities are placed at the very center, meaning that this ICKM scheme aims to make knowledge capture and reuse an integral part of the design process. In other words, design knowledge is captured

Fig. 2.3 Integrated and collaborative knowledge management architecture



as a design project proceeds (i.e. as design issues are being addressed) whilst knowledge about previous design issues is supplied to designers to drive the design process. The whole knowledge lifecycle is also an important part of the framework, which applies to any knowledge records to be created within ICKM. A knowledge record can be created in response to a request for any level of information granularity, i.e. knowledge capture can be done for both large scale projects and small scale issues. In ICKM, both formal knowledge (e.g. use of a set of formulas, structured material data and geometric models) and informal knowledge (e.g. experience of considering issues, strategies of problem-solving, and justification and evaluation of solutions) are supported across the lifecycle. Supporting collaborative design is another important feature of the framework and this collaboration can take place between participants with different roles across the whole product lifecycle such as design engineers, project managers, system analysts, service engineers and manufacturing engineers. This not only extends applications of knowledge-based engineering to the whole product lifecycle but also enables capture of complete, contextual and trustworthy knowledge through combining the considerations and options of different participants of the same project. A considerable amount of work is required towards a full implementation of the framework for the development of next-generation ICKM systems. First, a knowledge model is required to integrate formal knowledge and tacit knowledge into an integrated knowledge space. Second, many computational methods are needed to enable representation and exploitation of design context. Third, a distributed computing environment needs to be developed to support acquisition and dissemination of design information as well as to facilitate completion of design tasks. 
Last but not least, advanced knowledge retrieval methods need to be developed particularly for high-level and in-context tacit knowledge. These features highlight the computational nature of ICKM and this section aims to introduce the key enabling methods for the design and development of a prototype collaborative system.

2.2.2 Enabling Technology for IKCM—Platform Interoperability

To realize resource sharing and collaborative work between multiple enterprises in a dynamic, distributed and heterogeneous virtual community, a platform that supports interoperability is the underpinning technology of collaborative knowledge management. A distributed and interactive platform facilitates the development of multidisciplinary activities, which are performed by various users focusing on different tasks. Moreover, it enables the collaborative work of users by providing consistent interfaces and using unified knowledge resources [11]. Platform interoperability technologies for collaborative product development include web-based technology, semantic technology, cloud platform technology, and multi-agent technology.



The web-based approach is the most commonly used platform collaboration technique. Qiu et al. [12] proposed a collaborative design architecture for a dynamic integration environment based on service-orientation, agent and semantic web technologies, which covers the whole product development lifecycle and integrates multidisciplinary knowledge services efficiently. An agent is a computing entity or program that can perceive its environment and run autonomously to achieve the goals of its designers or users. A system composed of multiple agents is called a Multi-Agent System (MAS). The MAS-based collaboration model is autonomous, reactive, distributed and collaborative. Autonomy means that an agent can run continuously without external interference and can control its own actions and internal state. Reactivity means that it can perceive the environment at any time and act upon it in an appropriate way. Distribution means that there is a low degree of coupling between agents and that they can be spread over any effectively connected network. Collaboration means that multiple agents can negotiate with each other, resolve conflicts, and finally complete complex tasks that benefit each other and cannot be solved by any of them independently. A cooperation mechanism based on MAS reduces the processing burden on any single agent, improves the adaptability of individual agents in the collaborative design environment, and improves the robustness of the whole system.
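The four MAS properties above can be illustrated with a minimal sketch. All class, agent, and task names here are hypothetical, not taken from any specific MAS framework: each agent perceives its environment, autonomously claims the tasks matching its capability, and a simple negotiation step combines the individual contributions into a joint assignment no single agent could produce alone.

```python
from dataclasses import dataclass, field

@dataclass
class DesignAgent:
    """A minimal autonomous agent with internal state (inbox),
    a perceive step (reactivity), and an act step (autonomy)."""
    name: str
    capability: str
    inbox: list = field(default_factory=list)

    def perceive(self, environment: dict) -> None:
        # Reactivity: pick up only the open tasks matching this agent's capability.
        for task in environment.get("open_tasks", []):
            if task["needs"] == self.capability:
                self.inbox.append(task)

    def act(self) -> list:
        # Autonomy: the agent decides on its own which tasks to commit to.
        commitments = [task["id"] for task in self.inbox]
        self.inbox.clear()
        return commitments

def negotiate(agents, environment):
    """Collaboration: each agent claims the tasks it can handle; together
    the agents cover a job that none of them could finish alone."""
    assignment = {}
    for agent in agents:
        agent.perceive(environment)
        for task_id in agent.act():
            assignment[task_id] = agent.name
    return assignment

env = {"open_tasks": [
    {"id": "T1", "needs": "structural-design"},
    {"id": "T2", "needs": "process-planning"},
]}
agents = [DesignAgent("Alice", "structural-design"),
          DesignAgent("Bob", "process-planning")]
result = negotiate(agents, env)
```

Distribution is not shown here; in a deployed system each agent would run as a separate process communicating over a network protocol rather than sharing one interpreter.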

2.2.3 Enabling Technology for IKCM—Data Interoperability

In the process of collaborative design, multi-source heterogeneous data is what designers work on, and the creative work of designers is reflected in the form of digital information as part of the final product. There are many types and formats of data related to collaborative knowledge management, such as textual information, vocal information, video information, geometric model information, data reports, etc. Therefore, in order to integrate the knowledge of different designers, the interoperability of data is of great significance [13]. Collaboration and sharing methods for CAD data can be divided into four categories: keyword-based retrieval, content-based retrieval, sketch-based retrieval, and semantic-based retrieval. In the collaborative management of massive text data and variable data, Natural Language Processing (NLP) and machine learning technologies that have emerged in recent years can be applied to extract insights and dependency patterns.
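As a small illustration of the keyword-based retrieval category, the sketch below ranks a toy corpus of design documents by TF-IDF cosine similarity; the document contents are invented for the example, and a production system would add tokenization, stemming and an inverted index.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight dicts) for a corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log((1 + n) / (1 + df[term]))
                        for term, count in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = [
    "gear shaft fatigue analysis report",       # query document
    "gear shaft heat treatment process card",   # related: shares key terms
    "injection mould cooling channel layout",   # unrelated
]
vecs = tfidf_vectors(corpus)
scores = [cosine(vecs[0], v) for v in vecs]
```

The related document scores above zero through the shared terms “gear” and “shaft”, while the unrelated one scores exactly zero, which is the behavior a keyword-based retriever exploits.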



2.2.4 Enabling Technology for IKCM—Integrated Knowledge Representation Model

Design knowledge is diverse and highly heterogeneous. It includes not only explicit knowledge found in engineering manuals, 3D geometric models, simulation models and material databases, but also implicit knowledge related to engineers' individual experience and wisdom in understanding and approaching problems. Traditional data models, such as relational, hierarchical, frame-based, and mesh models, have difficulty expressing the entire structure of complex knowledge. To better express product knowledge and effectively support product design and development, it is necessary to build an integrated knowledge representation model for knowledge sharing, reuse and integration in a network environment.

2.2.5 Enabling Technology for IKCM—Intelligent Knowledge Service

In the process of intelligent product design, it is necessary to match designers' knowledge needs against a huge base of design resources. By selecting the most appropriate design knowledge and supplying it to the designers, the efficiency and quality of product design are improved. At present, the utilization of design knowledge is mainly studied from the perspective of matching and reusing design cases. There is a lack of utilization of knowledge about problem-solving methods, solution generation strategies, design intentions (principles) and design history. The various types of design knowledge are also not well correlated with the dynamic design process, and neither requirement-driven knowledge matching nor automatic push has been achieved. In the process of collaborative knowledge management, the characteristics of dispersion, dynamism, large capacity, and heterogeneity make the knowledge service difficult. Therefore, it is necessary to study intelligent knowledge service methods in combination with computer technology.
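In its simplest form, a requirement-driven knowledge push could match the tags of stored knowledge records against the designer's current working context. The sketch below uses Jaccard overlap for scoring; the record titles and tags are entirely hypothetical, and a real service would use the semantic methods discussed above rather than exact tag matching.

```python
def recommend(context_terms, records, top_k=2):
    """Rank knowledge records by overlap with the current design context
    and push the best matches to the designer."""
    ctx = set(context_terms)
    scored = []
    for rec in records:
        tags = set(rec["tags"])
        overlap = len(ctx & tags) / len(ctx | tags)  # Jaccard similarity
        scored.append((overlap, rec["title"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for score, title in scored[:top_k] if score > 0]

# Hypothetical knowledge base entries.
records = [
    {"title": "Turbocharger sizing guideline",
     "tags": ["turbocharger", "diameter", "sizing"]},
    {"title": "Weld seam inspection checklist",
     "tags": ["welding", "quality"]},
    {"title": "Compressor wheel material data",
     "tags": ["turbocharger", "material"]},
]
# Context: the designer is currently sizing a turbocharger.
pushed = recommend(["turbocharger", "diameter"], records)
```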

2.3 Collaborative Knowledge Representation Model

The aim of design knowledge modeling is to capture and represent the decision-making path that leads to artifact design generation during the problem-solving process. The problem-solving process for design generally consists of four basic elements, namely the designer, the product, the design issue, and the knowledge resources. Depending on the scope and scale of design tasks, the roles of designers vary greatly in terms of project teams and disciplines. A company has its own branded product series, which are similar in function and structure but differ in performance according to market preferences and resource limitations. It has been estimated that



Fig. 2.4 The design knowledge hypernetwork (designer network, product network, issue network, and resource network)

a majority of the design work is adaptive or variant design based on existing products, which means that the development of new products is highly dependent on past design cases. Design issues are the core of the network, since knowledge is generated when a variety of issues are solved by different designers through processing a wide range of information. The knowledge resource here refers to the codification view of design knowledge, which embodies knowledge in codified information sources such as CAD models, simulation models, design manuals, technical reports and so on. Above all, as shown in Fig. 2.4, the design knowledge hypernetwork $H_{dk} = (V_{dk}, E_{dk})$ consists of four sub-networks, i.e., the designer network $N_d = (V_d, E_d)$, the product network $N_p = (V_p, E_p)$, the issue network $N_i = (V_i, E_i)$ and the knowledge resource network $N_k = (V_k, E_k)$.
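One plausible way to hold such a hypernetwork in memory is four node/edge stores plus a list of cross-layer hyperedges, each hyperedge tying together a designer, a product, an issue and a resource. The sketch below is an illustrative data structure under that assumption, not the authors' implementation, and the node names are invented.

```python
from dataclasses import dataclass, field

@dataclass
class SubNetwork:
    """One layer of the hypernetwork: nodes plus weighted edges."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # (u, v) -> weight

    def add_edge(self, u, v, weight=1.0):
        self.nodes.update((u, v))
        self.edges[(u, v)] = weight

@dataclass
class DesignHypernetwork:
    """H_dk as four coupled sub-networks plus cross-layer hyperedges."""
    designers: SubNetwork = field(default_factory=SubNetwork)
    products: SubNetwork = field(default_factory=SubNetwork)
    issues: SubNetwork = field(default_factory=SubNetwork)
    resources: SubNetwork = field(default_factory=SubNetwork)
    hyperedges: list = field(default_factory=list)

    def record_activity(self, designer, product, issue, resource):
        # A hyperedge records who solved which issue, on which product,
        # using which knowledge resource.
        self.designers.nodes.add(designer)
        self.products.nodes.add(product)
        self.issues.nodes.add(issue)
        self.resources.nodes.add(resource)
        self.hyperedges.append((designer, product, issue, resource))

h = DesignHypernetwork()
h.record_activity("alice", "input-shaft", "fatigue-crack", "CAE-report-012")
h.record_activity("bob", "input-shaft", "tolerance-stackup", "CAD-model-034")
```

Queries such as “which resources were used on the input shaft” then reduce to filtering the hyperedge list, while the intra-layer edges carry the weights defined in the following subsections.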

2.3.1 Construction of the Designer Network

Since all design decisions are made by engineers based on the analysis of existing design resources and constraints, engineering designers are an important part of the product design process. With the rapid renewal of products and fierce market competition, the demand for different professional capabilities in product development is ever-increasing. This makes cooperation between different organizations across the supply chain evolve from a loosely coupled point-to-point structure to a more closely connected network structure. This is common not only in high-tech industries, but also in the traditional automobile and manufacturing industries. It is particularly the case in the context of the recent modular design paradigm, in which all the knowledge resources from different disciplines need to be integrated to construct



Table 2.2 Network characteristics of the designer network (statistical index: calculation formula)

• Node number: $N$
• Edge number: $M$
• Average degree: $\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i$
• Average weighted degree: $\langle \dot{k} \rangle = \frac{1}{N} \sum_{i=1}^{N} \dot{k}_i$
• Density: $D = \frac{2M}{N(N-1)}$
• Modularity: $Q = \frac{1}{2M} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2M} \right) \delta(C_i, C_j)$
• Average clustering coefficient: $\langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{2 E_i}{k_i (k_i - 1)}$
• Average path length: $L = \frac{2}{N(N-1)} \sum_{1 \le i < j \le N} d_{ij}$

a complete product design. The network structure improves decision-making processes for dealing with complex innovations and expands the scale of innovation activities, thereby improving efficiency and speeding up the response to the market. The designer network is composed of the different innovation entities involved in collaborative innovation, i.e., in product design activities. As the source of design knowledge, designers are involved in all kinds of knowledge processing activities, such as knowledge generation, transfer and updating. $N_d$ is a weighted network consisting of various design entities and the relationships between them. According to the complexity of design tasks, a node $V_d$ in the designer network can denote an individual, a group or an organization. An edge $E_d$ has a weight attribute representing the number of collaborations between designers. In Table 2.2, $k_i$ is the degree of node $i$, and $\dot{k}_i$ is the weighted degree of node $i$. $A$ is the adjacency matrix, and $A_{ij} = 1$ means that there is a link between node $i$ and node $j$; $C_i$ is the label of the community that node $i$ belongs to, and $\delta(C_i, C_j) = 1$ only if $C_i = C_j$. $E_i$ is the number of edges that exist between the neighbor nodes of node $i$. $d_{ij}$ is the shortest path length between node $i$ and node $j$.
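The indices in Table 2.2 can be computed directly from an edge list. The sketch below evaluates average degree, average weighted degree, density and average clustering coefficient on a toy four-designer network; the collaboration weights are invented for the example.

```python
from itertools import combinations

# Toy designer network: edge (u, v) -> number of collaborations (weight).
edges = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2, ("C", "D"): 1}
nodes = sorted({u for e in edges for u in e})
neighbors = {v: set() for v in nodes}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

N, M = len(nodes), len(edges)
degree = {v: len(neighbors[v]) for v in nodes}
weighted_degree = {v: sum(w for e, w in edges.items() if v in e) for v in nodes}

avg_degree = sum(degree.values()) / N                    # (1/N) * sum of k_i
avg_weighted_degree = sum(weighted_degree.values()) / N  # (1/N) * sum of weighted k_i
density = 2 * M / (N * (N - 1))                          # 2M / (N(N-1))

def clustering(v):
    """Local clustering: fraction of neighbor pairs that are linked."""
    k = degree[v]
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors[v], 2)
                if (a, b) in edges or (b, a) in edges)
    return 2 * links / (k * (k - 1))

avg_clustering = sum(clustering(v) for v in nodes) / N
```

Average path length and modularity would additionally need a breadth-first search and a community labeling, which established network libraries provide off the shelf.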

2.3.2 Construction of the Product Network

$N_p$ is the product network composed of a variety of design objects and product relationships. Each node $V_p$ represents a design task, ranging from a small component, such as the input shaft, to a complex product. According to the authors' previous study, the relationships between product nodes can be reflected by the structure similarity

2.3 Collaborative Knowledge Representation Model


Table 2.3 The definition of correlation degree in product network

Geometrical correlation $S_g$ | Functional correlation $S_f$ | Physical correlation $S_p$
Fixed connection, such as riveting and welding | Consistent functionality | Energy flow
Movable fixed connection, such as key and gear | Strong auxiliary function | Information flow
Movable active connection, such as bolts, nuts and mates | Weak auxiliary function | Material flow
No geometrical connection | No functional correlation | No physical correlation

and feature similarity in the product structure tree. Thus, an edge $E_p$ has a weight attribute representing the correlation degree between products. In Table 2.3, three kinds of correlation are defined to measure the similarity between different products or parts. The weight of an edge in the product network can be expressed as the linear weighted sum of the geometrical correlation $S_g$, functional correlation $S_f$, and physical correlation $S_p$.
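As a sketch, the edge weight is then $w = w_g S_g + w_f S_f + w_p S_p$ with the three weights summing to one. The particular weight split below is illustrative only, not taken from the book; in practice it would be tuned for the product family at hand.

```python
def product_edge_weight(s_g, s_f, s_p, w_g=0.4, w_f=0.35, w_p=0.25):
    """Linear weighted sum of the three correlation degrees, each
    assumed to be scaled to [0, 1]. The default split is illustrative."""
    assert abs(w_g + w_f + w_p - 1.0) < 1e-9  # weights must sum to one
    return w_g * s_g + w_f * s_f + w_p * s_p

# A riveted joint with consistent functionality and strong energy flow
# scores high on all three correlation scales.
w = product_edge_weight(s_g=1.0, s_f=0.9, s_p=0.8)
```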

2.3.3 Construction of the Issues Network

$N_i$ is the issue network, in which each node $V_i$ represents an issue to be solved and is linked by edges to other nodes (i.e., related design problems). Issues are problems that should be considered during particular stages of the design process. Design issues are described in the form of sentences, each of which can be divided into phrases. To measure the relationship between different issues, semantic similarity is used in this work. $V_m^i$ and $V_n^i$ are two nodes in the design issue network, which can be transformed into sets of concepts, namely $V_m^i = \{C_{m1}, C_{m2}, \ldots, C_{mJ}\}$ and $V_n^i = \{C_{n1}, C_{n2}, \ldots, C_{nK}\}$. The weight of edge $E_{mn}^i$ is then computed from the pairwise semantic similarities between the concepts in the two sets.
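The set-to-set similarity can be defined in several ways; the sketch below uses one common choice, average best-match similarity between the two concept sets, symmetrized so the weight does not depend on argument order. The token-overlap concept similarity and the example concepts are stand-ins for a thesaurus- or embedding-based measure, not the book's exact formula.

```python
def concept_similarity(c1, c2):
    """Stand-in similarity between two concept labels: token Jaccard.
    A real system would use WordNet distance or embedding cosine."""
    t1, t2 = set(c1.split()), set(c2.split())
    return len(t1 & t2) / len(t1 | t2)

def issue_edge_weight(concepts_m, concepts_n):
    """Symmetric average best-match similarity between two concept sets."""
    def directed(a, b):
        # For each concept in a, take its best match in b, then average.
        return sum(max(concept_similarity(c, d) for d in b) for c in a) / len(a)
    return 0.5 * (directed(concepts_m, concepts_n) + directed(concepts_n, concepts_m))

# Two hypothetical issue nodes, already decomposed into concept phrases.
v_m = ["intake noise", "resonance frequency"]
v_n = ["intake noise", "duct length"]
w_mn = issue_edge_weight(v_m, v_n)
```

Here only one concept pair matches exactly, so the edge carries a moderate weight, while an issue compared with itself yields the maximum weight of one.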


Fig. 3.34 Knowledge capture GUI of the web-based knowledge management system

3.3 Collaborative Design Knowledge Modeling

Fig. 3.35 Structural decomposition in the system

Fig. 3.36 Capturing the design evolution process of a component



3 Representation and Modeling of Knowledge for Collaborative …

Figure 3.35 shows how the system is used to capture and represent design knowledge related to the intake system design. Specifically, on the Left Canvas, a CAD model is displayed to give engineers a straightforward way of understanding the design object, while useful design knowledge is organized and recorded on the Right Canvas. For each component in the design object, the detailed function and relevant requirements are clearly identified. Additionally, each element created in the knowledge capture GUI can be double-clicked to open a new GUI for capturing more details related to the element. Any supplementary documents, e.g. calculations and simulation results, can be uploaded to the system through the GUI as attachments in order to achieve completeness of the design knowledge captured. As the system has a particular focus on capturing knowledge about design evolution, a GUI specific to capturing design evolution is developed in the system. The example in Fig. 3.36 shows how the system can be used to capture the knowledge generated and exchanged during the design evolution of the intake body. Within this GUI, the knowledge generated during a design evolution process can be captured and organized with the guidance of the RFBSE model. Through this systematic method of organizing design knowledge, designers can capture and record useful knowledge as a design issue is being addressed. In addition to the designers working on the issue, other people across the development chain of a company, e.g. salespeople, manufacturing engineers and service engineers, can also contribute to the creation of knowledge elements as well as benefit from those elements in other stages of the chain. For example, the improvements achieved during the design evolution in Fig. 3.36 are made in response to suggestions from manufacturing engineers.
Salespeople can also give input to the design based on customer requirements, and adjust their marketing plans according to the improvements of the design. In addition, the specific knowledge elements captured can help service engineers make informed decisions during maintenance by referring to the reasons for changes and the change propagation amongst key design parameters. In this sense, the proposed system not only provides useful support to design engineers but also benefits the whole development chain of a company.

3.3.4 The C-RFBS Model

During the design process, customer requirements are transformed into detailed technical information by designers to allow physical realization of the product. Designers consider the characteristics of the expected product and the current market trend to analyze these requirements, and communicate with customers to confirm details about the key functions. In the traditional function-behavior-structure model of a product, function decomposition and combination affect the rationality of the behavior and structure analyzed and elaborated subsequently, and behavior in turn affects the rationality of structure. Thus, as the main embodiment of the requirements and product characteristics, the structure layer affects the entire set of design activities.







Fig. 3.37 The C-RFBS knowledge representation scheme

The C-RFBS model [6] proposed in this work views design activities such as requirement mapping, function mapping, and behavior mapping as a process whereby records in an existing knowledge network contextualize and adapt to the external environment. The changing of knowledge contents and structures is accompanied by cognitive assimilation and accommodation, as shown in Fig. 3.37. In this work, the conceptual design process consists of a forward reasoning process and a backward amending process, which rely upon the articulation of designers' experience and knowledge. Cognition takes place during these processes, and the cognitive terms used in the design process are shown in Table 3.4.

Cognitive Process in Design

Cognition is a process whereby external environmental information is captured, categorized and articulated. Knowledge can be captured in the process of problem solving from requirement to structure. Furthermore, the tacit knowledge captured is externalized, and relationships between decision-making knowledge, behavior knowledge and driving knowledge are identified. Meanwhile, the relationships between internal and external knowledge units/groups are also constructed. Specifically, the C-RFBS model is focused on capturing the knowledge of product improvement, variation, and development, and the cognitive process (i.e. assimilation and accommodation) provides the foundation for building the local and global knowledge networks. An integrated knowledge network is then established to combine cases of success and cases of failure.



Table 3.4 Concepts and their interpretations during the cognitive process in design

Assimilation: After a change of the external environment or external feedback is received, the current knowledge network merges the external information into the existing body of knowledge. This is a process of quantitative change to the knowledge network.

Accommodation: After a change of the external environment or external feedback is received, the current knowledge network restructures itself to meet the external environment. This is a process of qualitative change to the knowledge network.

Environment: The external factors that lead to the change of content or structure of the current knowledge network, which can be regarded as the driving force of design innovation and improvement.

Balance: The knowledge network reduces the influence of the external environment through a dynamic process of assimilation and accommodation. Balance is a synergy between the external environment and the internal knowledge system, and the process of forming a consistent knowledge structure for the internal knowledge system.

The cognitive process of multiple knowledge groups is shown in Fig. 3.38, in which the knowledge groups come from different design cases or design processes. These knowledge groups must meet the following two conditions. First, the internal condition of the knowledge groups: a knowledge group can be represented by one or more complete triples of knowledge to ensure its completeness.

Fig. 3.38 The cognitive process of multiple knowledge groups

3.3 Collaborative Design Knowledge Modeling


Kg = {kt1, kt2, kt3, ..., ktn}    (3.1)


where Kg represents a knowledge group and ktn represents a knowledge triple. Second, the external condition of the knowledge groups: there are one or more nodes in the Kgi (i = 1, 2, ..., n) that are directly related.

Kg1 - [r] - Kg2    (3.2)


where Kg1 and Kg2 represent two knowledge groups, and r represents one or more relationships between the nodes of Kg1 and Kg2. Among them, behavior knowledge describes the behavioral measures (e.g. changes in structure, updating of content) taken by the existing knowledge network to adapt to the external environment; it also shows the status of a knowledge unit/group and explains how new knowledge is generated. Decision-making knowledge describes the possible physical, logical and business relationships between new knowledge and old knowledge, and provides an interpretation for both the generation of new knowledge and the reuse of old knowledge. Driving knowledge translates external environmental factors into understandable knowledge, and represents the driving force behind the generation of new knowledge in a knowledge network. Knowledge records derived from different design cases are independent of each other, and new knowledge records are embedded in the existing knowledge network. The cognitive process establishes the reuse and step relationships between knowledge units/groups in the product design knowledge network. To facilitate the understanding of the cognitive process, some mathematical expressions are used to explain assimilation and accommodation. The assimilation process:

Kcurrent + E = Knew    (3.3)


where Kcurrent represents the current knowledge unit/group; E is driving knowledge that can cause the content of the current knowledge to change; and Knew represents the new knowledge produced by changing the content of the current knowledge network. The accommodation process:

Kcurrent ⇒ K'    (3.4)


K' + E = Knew    (3.5)


The accommodation process is a two-step process: the first step, shown in Eq. (3.4), describes that new knowledge K' is generated by adjusting the structure of the current knowledge network, where K' approximates Kcurrent in content. The second step, shown in Eq. (3.5), works in the same manner as Eq. (3.3).
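The distinction between the two processes can be sketched as follows; a knowledge unit is modeled here as a dictionary with a content set and a structure label, which is an assumed data representation for illustration only.

```python
# Minimal sketch of assimilation (content change, structure kept) versus
# accommodation (restructure first, then assimilate). The dict representation
# and the example contents are illustrative assumptions, not the book's model.

def assimilate(k_current: dict, driving: set) -> dict:
    """Assimilation: merge driving knowledge E into content; structure unchanged."""
    return {"structure": k_current["structure"],
            "content": k_current["content"] | driving}

def accommodate(k_current: dict, new_structure: str, driving: set) -> dict:
    """Accommodation: first restructure to K', then assimilate E into K'."""
    k_prime = {"structure": new_structure, "content": set(k_current["content"])}
    return assimilate(k_prime, driving)

k = {"structure": "R-F-B-S", "content": {"AC motor", "planetary reducer"}}
k_new = assimilate(k, {"geared motor"})              # content grows, structure kept
k_new2 = accommodate(k, "R-F-B-S'", {"telescoping"})  # structure adjusted first
```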



The cognitive process occurs at different mapping stages in the design. To further indicate that the cognitive process is an interaction between design process functions and environmental variables, the knowledge representation process is divided into three basic processes: the forward reasoning process, the backward amending process, and correlation construction. Assuming that the design process follows the sequential process of R-F-B-S, the expressions are given as follows: (1) The forward reasoning process: from requirement to structure.

P−1 = fx(P, Cassimilation)    (3.6)


S = f(R, Cassimilation)    (3.7)


In the above expressions, fx (including fR, fF, fB, f) is the reasoning function; P represents R, F, B; P−1 represents F, B, S correspondingly; and Cassimilation represents the assimilation process in design, including knowledge reuse and content update. Therefore, Eq. (3.6) represents the reasoning process from P to P−1 and Eq. (3.7) represents the reasoning process from R to S. During the process of forward reasoning, designers only change the relation path and knowledge content of a knowledge network without adjusting the network structure. In other words, the assimilation process only integrates external environment knowledge into the current network, and constructs the step/reuse relationships between new and old knowledge. (2) The backward amending process: from structure to requirement.

P' = f'x(P−1, Cassimilation, Caccommodation)    (3.8)


R' = f'(S, Cassimilation, Caccommodation)    (3.9)


In the above expressions, f'x (including f'R, f'F, f'B, f') is the amending process function; P' represents R', F', B'; Cassimilation is the assimilation process; and Caccommodation is the accommodation process. Therefore, Eq. (3.8) represents the amending process from P−1 to P and Eq. (3.9) represents the amending process from S to R. During the process of backward amending, designers receive the external information and adjust the structure of the knowledge network to reduce the impact of driving knowledge accordingly. (3) The correlation construction process. In order to support better knowledge reuse, the correlation formula is used to describe the direct relationship between new and old knowledge.

Knew = λKold    (3.10)


where Knew is the new content or new relationship, Kold represents the existing knowledge in the knowledge network, and λ is used to express the relationship so as to ensure that



new knowledge is expressed in the form of triples. Based on the C-RFBS model, the correlation relations include reuse, decomposition, mapping, explanatory and semantic relations, etc. Based on the formulation of the forward reasoning and backward amending processes, function and requirement may be adjusted by the cognitive process in the same way as behavior and structure when the environment changes. At the same time, differences in designers' understanding further drive the dynamic evolution of the knowledge network. Although requirement may provide partial solutions for function, behavior, and structure in addition to the technical specifications and constraints, personal accumulation of knowledge and the history of knowledge reuse are critical to the decomposition of requirement, function, behavior and structure. In particular, the decomposition granularity of requirement, function and behavior decides whether the expected structures can be obtained.
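The correlation construction can be sketched as a typed triple linking new knowledge to existing knowledge. The relation vocabulary follows the text; the data structure and function below are assumptions for illustration.

```python
# Sketch of correlation construction: λ is realized here as a typed relation
# that ties new knowledge to old knowledge as a (new, relation, old) triple.
# The set of relation names follows the text; everything else is illustrative.

RELATIONS = {"reuse", "decomposition", "mapping", "explanatory", "semantic"}

def correlate(k_new: str, relation: str, k_old: str) -> tuple[str, str, str]:
    """Express new knowledge in triple form, tied to old knowledge via a relation."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown correlation relation: {relation}")
    return (k_new, relation, k_old)

# Hypothetical example from the fork design case discussed later in the chapter:
triple = correlate("rotating fixed fork", "decomposition", "fixed fork")
```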

The Multi-level Structure Model

Design is a decision-making process, and the diversity of personal knowledge affects the information granularity of knowledge records. Whether desired knowledge can be matched in the knowledge network is an important criterion for the ability to quickly meet a requirement. Therefore, a knowledge network that supports multi-granularity knowledge queries is more in line with the actual need. One of the characteristics of the C-RFBS model is global extension, which exploits a multi-level structure to provide extended representation for reducing the differences in the decomposition of requirements, functions, behaviors and structures. Knowledge correlations between multiple sources of knowledge are built during the cognitive process, in which knowledge is restructured and revised. As shown in Fig. 3.39, new knowledge is added to accelerate the completion or correction of the structure, behavior, function and requirement framework. For example, Sub_S11 and Sub_S12 are structure schemes of B1, and S1 is the integrated structure of Sub_S11 and Sub_S12. As a supplementary structure for B1, S1 provides a new selection for the reasoning solution of B1 and adds keywords to be retrieved, even though B1 still decomposes into Sub_S11 and Sub_S12. Conversely, Sub_B21 and Sub_B22 are the decomposed behaviors of behavior B2, which refine the information granularity of knowledge so that the knowledge can be more easily accepted by other designers. As such, the multi-level structure minimizes the influence of personal knowledge on a knowledge network, and a more complex knowledge network is constructed to organize and manage knowledge of different granularities.
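The multi-granularity retrieval idea can be sketched with a small tree of knowledge nodes: an integrated node and its finer-grained sub-nodes can both be matched, depending on the granularity of the query. The tree layout and matching rule below are assumptions for illustration; the node names follow the example in the text.

```python
# Sketch of multi-granularity matching over a multi-level structure: a coarse
# query hits the integrated structure S1 and its sub-structures, while a finer
# query hits only one sub-structure. The layout is an illustrative assumption.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)

def match(root: Node, keyword: str) -> list[str]:
    """Depth-first search returning every node whose name contains the keyword."""
    hits = []
    stack = [root]
    while stack:
        node = stack.pop()
        if keyword.lower() in node.name.lower():
            hits.append(node.name)
        stack.extend(node.children)
    return hits

s1 = Node("S1 integrated structure", [Node("Sub_S11"), Node("Sub_S12")])
b1 = Node("B1", [s1])

coarse = match(b1, "S1")       # hits the integrated structure and both sub-nodes
fine = match(b1, "Sub_S12")    # hits only the finer-grained node
```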


(1) The RFBS process during the fork design




























Fig. 3.39 The multi-level representation of structure, behavior, function, and requirements

The proposed methods and models are explained using a fork design case. This design is required by a company (referred to as "the customer" hereafter unless otherwise stated) to reduce the cost of adjustment and reconstruction of warehouses. An adaptive design is employed to meet the new requirements, and knowledge reuse plays an important role. The customer proposes that "the fork can handle the loading and unloading of goods within 500 kg in the warehouse at an angular velocity of π rad/min". Then, considering the market trend, and according to the terms and specifications, the requirements are initially formalized as listed in Table 3.5. Designers then seek a design solution to achieve this customized design through market research and expert knowledge. During the process of forward reasoning, they map the requirements to the function "Multi-angle cargo handling", which is divided into "horizontal uniform rotation" and "loading". As the customer gives a clear constraint on the angular velocity, the structures including AC motor, planetary reducer, flexible coupling, planetary gear mechanism and fixed fork are obtained. The obtained structures can meet the customer's expected requirement. Among these, the AC motor and planetary reducer can be flexibly matched to meet the required load moment and angular velocity. The FBS process is shown in Fig. 3.40. The contents and symbols of the knowledge components of function, behavior and structure are shown in Table 3.6.

Table 3.5 A requirement about the 500 kg rotary fork

Product design: 500 kg rotary fork
Technical specifications and values: Load mass, 500 kg; Angular velocity, π rad/min; Work level




Fig. 3.40 The FBS process during the fork design case (function decomposition; solving from function to behavior; solving from behavior to structure; supplement)

Table 3.6 Contents and symbols of the function, behavior and structure nodes

Function nodes: Multi-angle cargo handling; Horizontal uniform rotation; Horizontal uniform telescoping; Bottom fixed drive; Top-level movement

Behavior nodes: Matrix support; Rack transmission; Provide the torque; Deceleration and increased torque; Power output; Power input; Move horizontally; Convert electric energy into kinetic energy; Planetary reduction; Flexible connection; Gear meshing transmission

Structure nodes: AC motor; Planetary reducer; Flexible coupling; Planetary gear mechanism; Fixed fork; Rotating fixed fork; Geared motor; Rack and pinion mechanism; Roller guide rail mechanism; Two-stage telescopic fork; Bevel gear transmission mechanism


Further, the rotating fixed fork, as a new structure, is added to eliminate the influence of personal thinking. The rotating fixed fork involves all the behaviors of gear meshing transmission and matrix support. Besides, the fixed fork and planetary gear mechanism are the decomposition of the rotating fixed fork. The rotating fixed fork, as a supplement to the available structures, makes it convenient for other designers to retrieve multi-granularity knowledge.

(2) The cognitive process during the fork design

The structures discussed above have met the requirements. As such, the knowledge generated in the RFBS process can construct a local knowledge network to support knowledge reuse for meeting the design requirements of the 500 kg rotary fork. But when external factors act on the current knowledge network, the cognitive process is needed to update the network content and structure.

• The assimilation process

In order to improve customer satisfaction, the design team expects to improve the fork to reduce its space occupation and production cost as much as possible. The external factors then drive the current knowledge network to evolve. For designers, it is necessary to eliminate the impacts on the network by changing the knowledge content or structure. The geared motor has been widely applied as a power structure in the design field. Though the load torque and transmission ratio of a geared motor are relatively limited, its space occupation is greatly reduced compared with the combination of a planetary reducer and an AC motor. Therefore, the geared motor, as a better structure, replaces the AC motor and planetary reducer. The fuzzy factor of reducing space occupation is transformed into driving knowledge embedded into the current knowledge network. The geared motor now provides the behaviors of the AC motor and planetary reducer, while the AC motor and planetary reducer explain the new knowledge about the geared motor. As shown in Fig. 3.41, the assimilation process achieves a balance between the knowledge network and the external environment.

• The accommodation process

The rotating fixed fork also occupies a large space. During the backward amending process from structure to behavior, the knowledge network is unable to eliminate the space constraint by modifying the content and structure of the behavior layer. The structure of the function layer is therefore reconstructed by adding the function of horizontal uniform telescoping as a sub-function of multi-angle cargo handling, as well as by adjusting the function of loading as a sub-function of top-level movement. The resultant structure is shown in Fig. 3.42. After reasoning from the sub-function "Bottom fixed drive" to the corresponding behaviors, all behaviors are the same as those of the function of horizontal uniform rotation, and the same knowledge can be directly reused. As shown in Fig. 3.43, the reuse relationship between these two knowledge groups is built by the behavior and decision-making knowledge.

Fig. 3.41 An example of the assimilation process in the fork design case



Fig. 3.42 The newly generated functional structure (F: Multi-angle cargo handling, with sub-functions F1: Horizontal uniform rotation and F3: Horizontal uniform telescoping; F3 has sub-functions F31: Bottom fixed drive and F32: Top-level movement; F32 has sub-function F2: Loading)

Like the reasoning process from the function of horizontal uniform rotation to behaviors, the rack and pinion mechanism, roller guide rail mechanism and matrix base are obtained in the solution of the function of top-level movement. Similarly, backward amending is carried out to verify that the structures can meet the requirement. As a structural supplement, a two-stage telescopic fork, similar to the rotating fixed fork mentioned above, further reduces the space required. An accommodation is completed and the current knowledge network is extended. Further, the motor for horizontal uniform rotation is shared with the bottom fixed drive to save the cost of a motor and a coupling. At this point, the structures have relieved the impact of the cost constraint and the space constraint. The entire accommodation process is shown in Fig. 3.44.


















Fig. 3.43 Reuse relationship between the knowledge groups

3.4 Discussion

In this chapter, collaborative knowledge capture and modeling are introduced. Knowledge capture covers the collection, extraction and transformation of diverse knowledge, while knowledge modeling promotes knowledge comprehension and diffusion through organizing knowledge nodes, constructing relationships, and so on. Dispersed and heterogeneous data are transformed into formal knowledge after the processing of capture and modeling. More knowledge elements are considered based on the necessity of reuse. The knowledge captured and modeled relates to design, manufacturing, processing, assembly, testing, evaluation, maintenance, service, etc., which provides strong knowledge support for production activities across the whole product lifecycle. In collaborative knowledge management, knowledge modeling makes it possible for computers to understand knowledge and provides upper-level guidance for knowledge reuse. Therefore, the knowledge reuse process can obtain logical constraints,



Fig. 3.44 The accommodation process during the fork design

and act as an executor that maximizes the benefit of knowledge. In the subsequent chapters, knowledge retrieval, knowledge reasoning and knowledge-aided decision-making in collaborative knowledge management will be explained in detail.

References

1. Qin, H., Wang, H., & Johnson, A. (2017). A RFBSE model for capturing engineers' useful knowledge and experience during the design process. Robotics and Computer-Integrated Manufacturing, 44, 30–43.
2. Bracewell, R., Wallace, K., Moss, M., & Knott, D. (2009). Capturing design rationale. Computer-Aided Design, 41(3), 173–186.
3. Qin, H., Wang, H., Wiltshire, D., & Wang, Q. (2013). A knowledge model for automotive engineering design. In DS 75-6: Proceedings of the 19th International Conference on Engineering Design (ICED13), Design for Harmonies, Vol. 6: Design Information and Knowledge, Seoul, Korea.



4. Ding, L., Matthews, J., McMahon, C., & Mullineux, G. (2007). An extended product model for constraint-based redesign applications. In Proceedings of ICED 2007, the 16th International Conference on Engineering Design.
5. Fernandes, R., Grosse, I., Krishnamurty, S., Witherell, P., & Wileden, J. C. (2011). Semantic methods supporting engineering design innovation. Advanced Engineering Informatics, 25(2), 185–192.
6. Zhang, Y., Wang, H., Zhai, X., Zhao, Y., & Guo, J. (2021). A C-RFBS model for the efficient construction and reuse of interpretable design knowledge records across knowledge networks. Systems Science and Control Engineering, 9(1), 497–513.

Chapter 4

Collaborative Design Knowledge Retrieval

4.1 Overview

The demand for knowledge retrieval in the collaborative design of modern products has greatly increased, as the provision of knowledge according to a specific context of collaborative design can be very useful for making informed decisions. With the highly integrative feature of multi-disciplinary and multi-mode data, the development of effective and efficient retrieval algorithms has become a very challenging topic. It is natural for designers to regard different pieces of data as nodes and their correlations as links, and this results in the creation of knowledge elements in the form of graphs. In the design database, there exist many graphs made up of nodes and relationships, and each graph is a detailed description of an object. These objects can take on different meanings depending on the real requirement of the designer; they include the detailed design parameters of a system with all its components, as well as project management, problem formulation, solution generation, and prototype manufacturing and development. In the rest of this chapter, we use design rationale records as an example to explain the creation of graphic representations for product design and development knowledge. Design rationale records are captured as graphs with dependencies, which describe the issues considered, the options explored and the reasons behind decisions. The retrieval of design rationale records is highly important as the number of nodes can be huge. The records used to explain the retrieval algorithms are constructed using the Design Rationale editor (DRed) tool developed by researchers at the University of Cambridge. It is noteworthy that the methods developed are based on DRed graphs but are general enough for easy extension to other design rationale models. A scheme on how to retrieve design rationale is useful for the effective reuse of integrated design knowledge.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. H. Wang and G. Peng, Collaborative Knowledge Management Through Product Lifecycle.

We take DRed as the example to introduce the methods of information retrieval in detail. Since the main information contained in DRed graphs, and also in other forms such as knowledge graphs, is plain text, the first method is the keyword-based search of the text. The nodes in graphs are found and returned if they include
any of the keywords in a query, and those containing more keywords are ranked higher in the recommendation order. To provide comprehensive information for designers, the automatic recommendation of other relevant DRed nodes is developed, which makes the basic keyword method more efficient with a fine granularity of information. The second method aims to utilize the structured information within DRed graphs to improve retrieval performance. We also add some innovative ideas to improve retrieval efficiency, such as generating summaries for DRed nodes, filtering retrieval results based on type and status information, finding groups of nodes to achieve better matches, and using complex queries to enable users to better express knowledge needs. The third method is the semantic retrieval of DRed graphs, which aims to enable the system to understand the meanings and contexts of DRed nodes, the intents behind a user's queries, and the context in which a request for information is being made. Besides, some prototype systems are developed to evaluate the efficiency of these information retrieval methods, and to demonstrate the ideas of knowledge retrieval.

4.2 Knowledge Retrieval Based on Keyword

Text is the main body of most knowledge models, as is the case in DRed graphs, which are further supplemented by external files such as spreadsheets and pictures. Therefore, the keyword-based retrieval method for textual information becomes an interesting topic in the first place. In this section, the design and development of a keyword-based retrieval system for DRed graphs will be described, with theories and methods from the IR (Information Retrieval) domain applied and augmented. Fundamental topics for further studies on advanced retrieval methods for DRed graphs will also be discussed, e.g. how to process DRed files and how to measure the degree to which an element in a DRed graph matches a query.

4.2.1 Introduction to Information Retrieval

IR is very common in our daily life, e.g. when finding a book in the library or searching for information on search engines. Three types of IR systems have been developed: (1) the personal retrieval system, which deals with the retrieval of information on a personal computer; (2) the enterprise-level system, which retrieves information distributed within an enterprise to fulfil the information needs of staff; and (3) commercial search engines, which serve a wide range of customers with various information needs. Any retrieval system essentially involves exploring a collection of documents and finding useful ones to fulfil the user's information needs. An information need is the topic about which the user desires to know more. It is often differentiated from a query, which is what the user conveys to the computer in an attempt to communicate the information need [1]. Relevance is an important concept



in IR, which is used to determine whether the retrieved documents are genuinely useful. A retrieved document is relevant only if it contains information of value with respect to an information need, whilst it is possible that a document is deemed not to be relevant even though it happens to contain all the words in a query. A retrieval system has three main tasks in general: (1) reading through large collections and organising the information by developing indexes; (2) understanding the information needs that are expressed as queries; (3) matching the queries to the indexes to find the results that best fulfil the information needs. There are two retrieval models in terms of how the retrieved results are obtained, namely the Boolean retrieval model and the ranked retrieval model [1]. In the former, a query is given in the form of a Boolean expression of terms. The terms are combined with the operators AND, OR, and NOT, e.g., "Ceramics (OR) Plastics (NOT) Metals". The user thus needs to explicitly specify the words to be included or excluded in the results. The retrieval system will strictly examine all the documents and discard those that fail to satisfy the specified condition. The ranked retrieval model does not require a strict Boolean expression and allows the use of free-text queries, e.g., "design research history". A retrieval system using this model examines all the documents on the basis of a measure of their similarity to the text query given by the user, and returns a sorted list of results with different degrees of similarity. The general framework of a retrieval system for textual information is shown in Fig. 4.1. The first task of a retrieval system is to collect, extract, and index information from various forms of document collections, as shown on the left of Fig. 4.1. This task is specifically completed by three components, namely an information collector, a parser and an indexer. An information collector is the component developed to locate

[Fig. 4.1 components: free-text query parser (parsing, linguistics, spell correction); document cache; metadata in zone and field indexes; tiered inverted positional index; inexact top-K retrieval; sorting and ranking; scoring parameters learned by machine learning from the indexes and a training set]

Fig. 4.1 Framework of an information retrieval system


4 Collaborative Design Knowledge Retrieval

and access the sources of information, e.g. a crawler programme used by commercial search engines such as Google to collect information from Web pages. A parser performs linguistic analysis on the textual information collected by the information collector to extract the elements of a document, e.g. words, phrases, and sentences. An indexer then constructs the indexes between a document and the elements extracted so that the document can be efficiently located when these elements are used as queries. Frequently used documents are usually stored in a document cache so that accessing them is more efficient. It is noteworthy that different indexes can be developed according to the specific requirements of a retrieval system; see for example those listed in Fig. 4.1. A document is indexed by multiple fields (e.g., the authors, publication date, and category of a book) by the zone and field indexers. The K-gram indexer is used to deal with queries in which users only supply a sequence of K characters (e.g., the 3-grams of the word 'castle' are 'cas', 'ast', 'stl', and 'tle'). These character sequences are extended to form a range of terms that users may have meant to use.

To reduce the size of the index, the parser component should not extract every word from a document verbatim while tokenising the text. A number of issues need to be addressed to reduce the index size: (1) dropping stop words (common words like a, an, the, etc.); (2) treating equivalent terms (e.g., PhD, PHD, and Ph.D.) as one single term; (3) processing punctuation, e.g., hyphens; and (4) stemming and lemmatisation (treating words like organization, organize, and organizer as one stemmed term 'organiz'). Moreover, there is also a need to compress the indexes of large-scale retrieval systems [1].

Apart from the parsing and indexing of information, the processing of queries and the ranking of retrieved results are also important to ease the use of a retrieval system.
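The K-gram extraction and indexing idea described above can be sketched as follows; this is an illustrative fragment, not code from the retrieval system itself, and the small vocabulary is invented.

```python
def kgrams(word, k=3):
    """Return the overlapping k-grams of a word, e.g. the 3-grams of 'castle'."""
    return [word[i:i + k] for i in range(len(word) - k + 1)]

# An index from k-grams back to full terms lets a partial query such as
# 'cas' be extended to the vocabulary terms that contain it.
vocabulary = ["castle", "casting", "blade"]
kgram_index = {}
for term in vocabulary:
    for gram in kgrams(term):
        kgram_index.setdefault(gram, []).append(term)

print(kgrams("castle"))    # ['cas', 'ast', 'stl', 'tle']
print(kgram_index["cas"])  # ['castle', 'casting']
```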
The processing of queries aims to parse them correctly so that the information needs of users can be well understood. Similar to the parsing of a document, such processing also involves the tokenisation of queries. Moreover, research work is also being done to identify phrases from a set of terms, i.e. to support phrase-based queries, as well as to correct users' spelling automatically. The ranking of results is concerned with developing a method to calculate the degree to which a document matches the queries given by users. Once such a measure is available, the scoring and ranking of retrieved results can be built on top of it. Different weights can be assigned to different terms when scoring. These weights can be calculated by applying machine learning algorithms to a training dataset obtained from users, as shown on the right of Fig. 4.1.
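The contrast between the Boolean and ranked retrieval models discussed earlier in this section can be illustrated with a small sketch; the documents and the crude overlap score below are deliberately simplistic stand-ins, not the measures actually used by the system.

```python
# Three toy documents, keyed by an id (illustrative content only).
docs = {
    1: "ceramics resist heat better than plastics",
    2: "metals and plastics in turbine blades",
    3: "ceramics in turbine blade coatings",
}

def boolean_query(include, exclude):
    """Boolean model: keep documents with all `include` and no `exclude` terms."""
    hits = []
    for doc_id, text in docs.items():
        words = set(text.split())
        if set(include) <= words and not (set(exclude) & words):
            hits.append(doc_id)
    return hits

def ranked_query(terms):
    """Ranked model: sort all documents by a crude term-overlap score."""
    scores = {d: len(set(terms) & set(t.split())) for d, t in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(boolean_query(["ceramics"], ["plastics"]))  # [3]
print(ranked_query(["turbine", "ceramics"]))      # [3, 1, 2] – doc 3 scores highest
```

Note that the Boolean model discards document 1 outright for containing 'plastics', whereas the ranked model still returns it, just with a lower priority.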

4.2.2 Extracting Terms from Knowledge Models

Like the textual information of other knowledge models, the contents of DRed graphs need to be parsed before indexes for retrieval are built. As well as parsing the words in each DRed node, the dependencies between these nodes and the linking of related DRed files also need to be identified, as these can be useful for retrieval. It is tedious for designers to specify the interconnections of DRed nodes and the linking of DRed files again, as this work has already been done when the files are created. Therefore, a number of methods and algorithms for automatically identifying this information have been developed by the author; the development is discussed in detail in this section.

The DRed parser scans through a DRed folder to get all the files included, and then reads each DRed file line by line. These files are stored in the General Markup Language (GML) format, with information indicated using labels and grouped using brackets. For example, a 'node' label indicates the starting point of a DRed node, and the contents within the brackets following it are the detailed information of this node. The DRed parser can therefore infer the meaning of each line read from a DRed file by understanding the labels, and find the detailed information by recognising the brackets. The meanings of the labels used by the DRed parser when parsing DRed files are given in Table 4.1.

An algorithm for parsing DRed files is shown in Fig. 4.2, which, given a DRed file, extracts information and populates the data structure. It involves an iterative process in which each line of a DRed file is read and analysed following the rules in Table 4.1. In the figure, italic elements highlighted in red (e.g. nodevct) denote variables, and words in bold (e.g. if, for, else) represent the basic elements of the programming logic. A vector nodevct is defined to store the nodes, edges, and tunnel-links. Three Boolean variables (isNewNodeFound, isNewEdgeFound, and isNewLinkFound) are defined as flags to indicate whether a new node, edge, or tunnel-link was found while the DRed parser was parsing previous lines. Once one of these flags is set, the DRed parser creates a new object accordingly, and carries on reading lines until all the necessary information is

Table 4.1 Meanings of the labels used in DRed files

Meaning of each label:
- A new node is found; the following text, until the starting point of the next node, is the information about this node
- The id (unique within the current file) of the node found
- The text between the following double quotation marks is the content of the node found
- The text between the following double quotation marks indicates the type of the node found
- The text between the following double quotation marks indicates the status of the node found
- The text between the following double quotation marks is the name of the file to which the current DRed file is tunnel-linked
- The text following this label is the id of the paired tunnel-link in the targeted DRed file
- The text following this label is the id of the tunnel-link in the current DRed file
- An edge connecting two nodes in the DRed file is found
- The id of the source node of an edge
- The id of the target node of an edge

Fig. 4.2 An algorithm for parsing information from DRed files

ParsingADRedFile(DRedFile file)
    Vector nodevct = Φ
    isNewNodeFound = false; isNewEdgeFound = false; isNewLinkFound = false
    for each line el read from file
        infer the meaning of el
        if el represents the starting point of a node
            isNewNodeFound = true
            create a new node object node
        else if el represents an edge
            isNewEdgeFound = true
            create a new edge object edge
        else if el represents a tunnel-link
            isNewLinkFound = true
            create a new tunnel-link object tlink
        else if isNewNodeFound
            extract the data in el and store them in node
            if all the data for node are stored
                isNewNodeFound = false
                insert node into nodevct
        else if isNewEdgeFound
            extract the data in el and store them in edge
            if all the data for edge are stored
                isNewEdgeFound = false
                insert edge into nodevct
        else if isNewLinkFound
            extract the data in el and store them in tlink
            if all the data for tlink are stored
                isNewLinkFound = false
                insert tlink into nodevct
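The flag-based parsing loop of Fig. 4.2 can be sketched in Python as follows. This is a simplified illustration that assumes a GML-like syntax with one label per line; the sample data and the function name are illustrative, and the real DRed file format is richer than this.

```python
# Toy GML-like input: one `label value` pair per line, objects closed by ']'.
SAMPLE = """\
node [
id 1
label "Blade root cracks"
]
edge [
source 1
target 2
]
"""

def parse_gml_like(text):
    objects = []     # plays the role of nodevct in Fig. 4.2
    current = None   # the node/edge object currently being filled
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("node", "edge")):      # a new object starts
            current = {"kind": line.split()[0]}
        elif line == "]" and current is not None:  # object complete: store it
            objects.append(current)
            current = None
        elif current is not None and line and line != "[":
            key, _, value = line.partition(" ")    # extract the data in this line
            current[key] = value.strip('"')
    return objects

parsed = parse_gml_like(SAMPLE)
print(parsed[0]["label"])  # Blade root cracks
```

Instead of three separate Boolean flags, this sketch uses a single `current` object as the flag: parsing is "inside" an object whenever `current` is not `None`.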

extracted. When all the data for an object have been extracted, the object is inserted into nodevct and the corresponding flag is reset to false.

The textual information in a DRed node consists of a number of words and is used to describe problems, propose solutions, or make arguments. These words need to be extracted and processed to construct the index between terms and DRed nodes, which raises several issues. First, the DRed parser should be able to judge whether a word is useful for representing the meaning of a DRed node. Secondly, punctuation needs to be removed from a word, otherwise errors may occur, e.g. the computer will treat 'blade' and 'blade;' as different terms. Thirdly, words that actually represent the same thing (e.g. 'airfoil' and 'airfoils') should not be treated as separate terms; they need to be identified and linked to just one term.

Removing punctuation from DRed nodes not only reduces the size of the index but also ensures that correct terms are identified. In addition, some special symbols used by DRed also need to be discarded, e.g. '>', '"' and '&'. An algorithm has been developed to remove such punctuation and symbols, as shown in Fig. 4.3. A function 'getTokensOfAString' is developed to split a string by identifying the characters given as gaps in the string and getting the substrings on

Fig. 4.3 An algorithm for removing punctuations and symbols


RemovePunctuations(Vector symbolList, DRedNode node)
    Vector tokens = getTokensOfAString(node, ' ')
    Vector terms = new Vector()
    for each token in tokens
        trim(token)
        for each symbol in symbolList
            if symbol exists in token
                replace symbol with ' '
        trim(token)
        Vector termsInToken = getTokensOfAString(token, ' ')
        store the terms in termsInToken to terms

Symbols to remove: '.' ',' ''' '~' '=' '/' '?' '!' ';' ':' '(' ')' '&' '£' '>' '<' '"' '-'

both sides of these gaps. Such a process is termed the tokenisation of a string, with the gap characters termed splitting characters. For example, 'Cambridge University', when using a space as the splitting character, is tokenised as 'Cambridge' and 'University', and is tokenised as 'Cambri' and 'University' when 'dge' is used for the splitting. The 'trim' function in Fig. 4.3 removes the spaces at both the left and right ends of a string. The first step of the algorithm is to tokenise the text of a DRed node into a set of sub-strings by using a space as the splitting character. Each sub-string is then checked to see whether it contains any of the symbols to remove; if such symbols exist, they are replaced with a space. When this process is completed, all the symbols in a sub-string will have been replaced by spaces and a new string is constructed. The newly-constructed string is again tokenised using spaces as the splitting characters, resulting in a set of terms that is stored in a vector (i.e., terms in Fig. 4.3).
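The tokenisation and symbol-removal procedure of Fig. 4.3 can be sketched compactly in Python; the symbol list mirrors the one in the figure, and the input string is an invented example.

```python
# The symbols listed in Fig. 4.3 (including the DRed-specific ones).
SYMBOLS = ".,'~=/?!;:()&£><\"-"

def remove_punctuation(text):
    """Tokenise on spaces, replace listed symbols with spaces, re-tokenise."""
    terms = []
    for token in text.split(" "):
        for symbol in SYMBOLS:
            token = token.replace(symbol, " ")
        # re-tokenising the cleaned token also discards empty fragments,
        # which plays the role of the 'trim' calls in Fig. 4.3
        terms.extend(t for t in token.split() if t)
    return terms

print(remove_punctuation("blade; dust-cap (seal)"))
# ['blade', 'dust', 'cap', 'seal']
```

Note how 'blade;' and 'blade' now yield the same term, which is exactly the indexing problem the algorithm is designed to avoid.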

4.2.3 Quantifying Similarities and Ranking Retrieved Results

Once an index is constructed, two further questions arise: (1) how to determine the importance of a term within a node; and (2) how to quantify the content of a node and evaluate the similarity between two nodes. An answer to the former question provides a measure that can be used to rank the retrieved results. An answer to the latter can not only be used to cluster DRed nodes that have similar contents, but also to deal with cases where multiple terms are submitted as a query. Shallow Natural Language Processing (NLP) techniques are applied in this research to solve these problems. For the first question, the TF-IDF measure is used to determine a term's importance to a node, which depends upon statistical information obtained from the information extracted from DRed files. Consider a term t and a DRed node n: the Term Frequency (TF) of t in n, written tf_{t,n}, is an integer value counting how many times t occurs in n. To make the importance of t increase smoothly as TF increases, a log frequency weight of term t for node n can be defined as follows.


$$
w_{t,n} =
\begin{cases}
1 + \log_{10} tf_{t,n}, & \text{if } tf_{t,n} > 0 \\
0, & \text{otherwise}
\end{cases}
\tag{4.1}
$$


The premise of TF is that a term occurring more frequently than others in a node tends to be more important to that node. However, this premise breaks down for popular terms that appear in many DRed nodes, which reduces the accuracy of the TF measure. Therefore, a second measure of the informativeness of a term t is used, namely the Document Frequency (DF): the number of nodes in which the term occurs. A term's importance to a node clearly degrades if its DF is very high. Based on DF, the Inverse Document Frequency (IDF) is defined to lower the influence of popular terms, as follows, where N is the total number of nodes and df_t is the DF of term t.

$$
idf_t =
\begin{cases}
\log_{10}(N / df_t), & \text{if } df_t > 0 \\
0, & \text{otherwise}
\end{cases}
\tag{4.2}
$$

The importance of a term t to a DRed node n can then be determined by the TF-IDF measure as defined in Eq. (4.3). Further discussion of the TF-IDF measure can be found in [1].

$$
tfidf_{t,n} = w_{t,n} \times idf_t = (1 + \log_{10} tf_{t,n}) \times \log_{10}(N / df_t), \quad tf_{t,n}, df_t > 0
\tag{4.3}
$$

Each DRed node then has a number of TF-IDF values, one for each term occurring in it. Based on these values, a vector space model for a document can be developed, with each term representing a direction in a multi-dimensional space [1]. The similarity between two nodes can thus be calculated by measuring their 'distance' in the vector space. Here, the cosine similarity measure is employed, as shown in Eq. (4.4), where n and m are two nodes and V is the set of terms occurring in the two nodes.

$$
\cos(\vec{n}, \vec{m}) = \frac{\vec{n} \cdot \vec{m}}{|\vec{n}|\,|\vec{m}|} = \frac{\sum_{i=1}^{|V|} n_i m_i}{\sqrt{\sum_{i=1}^{|V|} n_i^2}\,\sqrt{\sum_{i=1}^{|V|} m_i^2}}
\tag{4.4}
$$

A query with more than one term can be treated as a short document and scored against all the related DRed nodes using the cosine similarity measure. Based on the values obtained, the nodes can be ranked and recommended to users with different priorities. Similarly, the similarity between two separate DRed nodes can also be quantified using this measure.
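The TF-IDF weighting and cosine similarity measure described above can be exercised with a small sketch; the three toy 'nodes' below are invented term lists standing in for the terms extracted from DRed nodes.

```python
import math

# Three toy nodes, each a list of extracted terms (illustrative only).
nodes = [
    "blade root stress crack".split(),
    "blade seal stress".split(),
    "dust cap seal".split(),
]
N = len(nodes)
vocab = sorted({t for n in nodes for t in n})

def tf_idf_vector(node):
    """Build the TF-IDF vector of a node over the whole vocabulary."""
    vec = []
    for t in vocab:
        tf = node.count(t)
        w = 1 + math.log10(tf) if tf > 0 else 0.0    # log frequency weight
        df = sum(t in n for n in nodes)              # document frequency
        idf = math.log10(N / df) if df > 0 else 0.0  # inverse document frequency
        vec.append(w * idf)                          # TF-IDF value
    return vec

def cosine(u, v):
    """Cosine similarity between two vectors in the term space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vecs = [tf_idf_vector(n) for n in nodes]
print(round(cosine(vecs[0], vecs[1]), 3))  # nodes 0 and 1 share 'blade', 'stress'
```

As expected, a node compared with itself scores 1, nodes sharing terms score between 0 and 1, and nodes with no terms in common score 0.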


Also, with the increase of computing capacity, neural networks have become widely used in NLP, which has promoted the development of pre-trained language models such as Word2Vec [2], ELMo [3], and BERT [4]. At the pre-training stage, the model is trained in an unsupervised manner on a large corpus of unlabelled text, which brings semantically similar words close together in the feature space. Because the training corpora are typically open-domain, fine-tuning the model for a specific domain is also an important step in capturing domain-specific semantic information.

Methods based on the knowledge representation of a language model can be employed to calculate the similarity between plain texts. Generally, words, which are not directly computable, are converted into low-dimensional dense vectors. These vectors carry semantic information and make it straightforward to measure the semantic distance between terms and documents. Given a term t and a DRed node n, the words of t and n are passed into the pre-trained language model, and the embedding vectors v_t and v_n are obtained from the model output. The importance of a term t to a DRed node n can then be measured by calculating the cosine distance between the vectors. For the remaining part of quantifying similarity, a vector can likewise be constructed for each document, with each element representing the semantic similarity between terms and nodes; the similarity of two nodes can again be computed with the cosine formulation given earlier.
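The embedding-based comparison just described can be sketched as below. The `toy_embedding` function is a deterministic stand-in for a real pre-trained model (Word2Vec, BERT, etc.), invented purely so the sketch runs self-contained; only the averaging and cosine comparison mirror the actual method.

```python
import hashlib
import math

DIM = 16  # toy embedding dimension

def toy_embedding(word):
    """Stand-in for a pre-trained model: a deterministic pseudo-vector per word."""
    digest = hashlib.sha256(word.encode()).digest()
    return [b / 255.0 for b in digest[:DIM]]

def text_embedding(text):
    """Average the word vectors, a common way to embed a short text."""
    vectors = [toy_embedding(w) for w in text.split()]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Score a term against a node's text by comparing their embedding vectors.
term_vec = text_embedding("ignition")
node_vec = text_embedding("improve ignition capability")
print(round(cosine(term_vec, node_vec), 3))
```

In a real system the hash-based vectors would be replaced by the output of the pre-trained (and, ideally, domain fine-tuned) language model.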

4.2.4 Implementation of a Keyword-Based Retrieval System

The long-term goal of this research is, through the development of methods and tools, to support the reuse of design knowledge by capturing, storing, retrieving, and reusing DR. In particular, the retrieval and reuse of DR are the focus. Generally, information can be provided to designers in two ways, namely a passive way and an active way. In the former, information is pulled by designers submitting explicit queries. The latter involves interpreting designers' intentions in seeking information, and actively pushing information to them. The methods discussed in this chapter are the basis of all further work, as the DR captured using DRed is mainly embodied as text.

A number of requirements are thus raised for the implementation of a prototype system. First, the system must support the retrieval of DRed graphs with acceptable recall and precision. Secondly, the system should enable easy interaction with users. Thirdly, the system must have a flexible framework by which it can easily adapt to new situations and be upgraded with new functionality. The last requirement is very important as future changes to the DRed system are inevitable. First, the DRed tool will continue to evolve to meet needs raised in industry, e.g., new diagrams were added in DRed 2.0 [5]. Secondly, the format of DRed files may change. Thirdly, the system will keep being extended by improving old, and adding new, functionality. A framework is therefore used to define the structure of this complex system. Such a structure helps to identify the sub-modules performing different functions together with their relationships, an approach that is very popular in software system design. As shown in Fig. 4.4, a three-layer framework has been developed by the author, which includes a GUI layer, a methodology layer, and a resource layer.
Specifically, the Graphical User Interface (GUI) layer provides interfaces through which users can interact with the retrieval system. Currently, three kinds of interaction




[Fig. 4.4 layers: GUI layer (selection of query terms; navigation of nodes retrieved; navigation of relevant nodes); methodology layer (coordinator; auto-suggestion of terms; shallow NLP techniques; identification of dependencies; DRed parser); resource layer (design rationale database; DRed repository)]

Fig. 4.4 The framework of the prototype system

are supported, namely specifying queries, navigating the DRed nodes retrieved, and navigating relevant DRed nodes. In the methodology layer, the methods described in Sects. 4.2.1, 4.2.2 and 4.2.3 are implemented to receive commands from, and generate data for, the GUI layer. The resource layer consists of the DRed files to be retrieved, as well as the database storing information about the index, equivalent words, etc. The core of the framework is the methodology layer, which has two important components, namely the DRed parser and the coordinator. The DRed parser implements the methodology developed for parsing the information in DRed files and identifying dependencies between DRed nodes. The component bridging the GUI layer and the methodology layer is called the coordinator, which controls the execution of all the tasks when the system is used. For instance, when a user presses a button to extract information from DRed files, the coordinator initiates the DRed parser and makes it scan through the repository. Once the scanning task is completed, the coordinator sends a message to the user via the GUI.

The performance of a retrieval system can be evaluated in terms of effectiveness (i.e. the quality of its search results) and efficiency (i.e. the time it spends on searches in general). Different retrieval systems have different focuses, e.g. efficiency is the main concern of most commercial search engines, whereas effectiveness may be more important than efficiency for enterprise-level retrieval systems. In an ideal case of retrieval, the results obtained include all the relevant documents in a collection but exclude every non-relevant document. Such an ideal case is

Table 4.2 The subsets of a document collection [1]

                 Relevant                Not relevant
Retrieved        true positive (tp)      false positive (fp)
Not retrieved    false negative (fn)     true negative (tn)

extremely difficult to achieve, and a retrieval system needs to tolerate both that some relevant documents are not found and that some non-relevant documents are mistakenly found. These two issues need to be taken into consideration by any retrieval system and result in the use of two measures of the effectiveness of a retrieval system, namely recall and precision.

Both recall and precision are set-based measures. As shown in Table 4.2, a whole collection of documents can be divided into four subsets, namely true positive (tp), false positive (fp), true negative (tn), and false negative (fn). Specifically, the tp subset contains the documents that are relevant and successfully retrieved; the fp subset comprises the non-relevant documents that are retrieved by mistake; fn denotes the relevant documents that the retrieval system fails to find; tn denotes the non-relevant documents that are (correctly) not retrieved. Recall is defined as the fraction of the relevant documents in a collection that are retrieved, and is calculated as tp/(tp + fn); precision is defined as the fraction of the retrieved documents that are relevant, and is calculated as tp/(tp + fp). Recall indicates a retrieval method's capability of exploring a large collection and locating the relevant documents, while precision indicates its capability of filtering out the non-relevant ones. These two measures need to be traded off against each other in any IR system. For example, it is simple to achieve a recall of one by returning all the documents in a collection, which will, however, greatly reduce the precision. In the Boolean retrieval model, the two measures can be obtained simply by counting the sizes of the different subsets and applying the two formulas, because all the results match the query equally.
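The recall and precision calculations over the four subsets can be sketched as follows; the document identifiers are invented for illustration.

```python
def recall_precision(retrieved, relevant):
    """Compute recall = tp/(tp+fn) and precision = tp/(tp+fp) from id sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant and retrieved
    fp = len(retrieved - relevant)   # retrieved by mistake
    fn = len(relevant - retrieved)   # relevant but missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# 3 of the 4 relevant documents found, plus 2 non-relevant ones:
print(recall_precision([1, 2, 3, 8, 9], [1, 2, 3, 4]))  # (0.75, 0.6)
```

The example also illustrates the trade-off: retrieving more documents can only keep or raise recall, but tends to lower precision.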
In a ranked retrieval context, the results are sorted on the basis of their different degrees of similarity to the query, that is, they are suggested to the user with different levels of priority. For example, the first result in a sorted list of 100 results is in general much more important than the last one. The recall and precision of a ranked retrieval system are usually evaluated, and traded off against each other, by analysing the top K retrieved results [1]. To measure the effectiveness of an IR system in the standard way, a test collection is needed, which consists of three things: (1) a document collection; (2) a test suite of information needs, expressible as queries; (3) a set of relevance judgments, usually a binary assessment of either relevant or non-relevant for each query-document pair [1]. Moreover, system quality and user utility are also important if IR systems are to be evaluated from a broader perspective. In this sense, evaluation is a way of quantifying aggregate user happiness, based on the relevance, speed, and user interface of a system [1].

A combination of ideas from the two evaluation methods discussed above is used to evaluate the prototype system, as the focus of this research is not to design and develop a perfect IR system, but to develop effective retrieval methods that can



help designers to find the DR they need. The system is evaluated in terms of both functionality and performance. Its functionality is evaluated by checking whether the methods discussed in the previous sections of this chapter can help with the retrieval of DRed graphs. Factors for evaluating system performance include the running speed, and the recall and precision of retrieval. The criteria for temporal performance are that the response time should be within one second and the start-up time within 10 s. The main criterion for good recall is that the system should return as many results (containing at least one keyword in the query) as possible (preferably 100%). The main criteria for good precision are that nodes precisely meeting the need expressed in the query should be placed at the top of the results list, and that if a node meets the need better than others, it should be placed higher in the list.

As shown in Fig. 4.5, the test of functionality involves five tasks, from specifying the DRed folders to scan through to opening DRed files using the links shown in the GUI of the retrieval system. Once a DRed folder is selected, the system can automatically extract information from the DRed files it includes. The extraction of information does not have to be done each time and can be performed only when a good number of new DRed files have been created. Users can either select keywords suggested by the system or type in a few letters from which potential keywords are generated using a filtering method. A process of filtering the keywords by typing letters into the system is shown in Fig. 4.6. When users decide which keywords to use, they can click the 'Go' button and browse the results displayed in the GUI. For each DRed node retrieved, two further links are provided, namely a file link and a dependency link. The former shows a hyperlink to the DRed file, which, once clicked, will start DRed to open the file as shown in step 5 of Fig. 4.5.
The latter only exists in the context of a specific DRed node and triggers the panels on the right of the GUI to show relevant nodes. This process shows that the system has the basic functionality to support keyword-based retrieval of DRed graphs.

The dataset used in the evaluation involves 11 projects and 35 DRed files. 62 tunnel-links are used to link the 35 files, and 892 DRed nodes are connected using 865 edges. In total, 1323 terms were extracted, with stop words discarded and synonyms grouped. As the current dataset is not very large, it is reasonable to store all the data in files and load them into memory for retrieval without using a Database Management System (DBMS). During the benchmarking, the system ran on a PC with a 1.83 GHz Duo Core CPU and 8 GB of memory, and completed the scanning in 234 ms. It took about 1300 ms to load the data into memory; this long loading time is mainly attributed to reading files from the hard drive. This indicates that it would take about 6 s to parse 1000 files with about 30,000 DRed nodes, which is reasonable, but also about 30 s to load all the data for 30,000 nodes into memory. The time taken to parse DRed files does not affect the performance of the system much, as files are parsed incrementally. The time taken to load data into memory only affects the start-up of the system and can be improved by using a DBMS and distributed servers with high-performance hardware. The time taken to search for results is the most critical to performance, as the user cannot wait too long; the average time taken for searching is about 10 ms. Therefore, the criteria for



Fig. 4.5 A process of using the retrieval system

good temporal performance discussed at the beginning of Sect. 4.2.4 have all been met in the evaluation. However, high-performance hardware and a DBMS should be used when the number of DRed files increases dramatically, which is beyond the scope of this work.

The recall of the system is evaluated by performing a number of retrieval tests. The keywords in these tests were chosen from DRed nodes found by randomly opening DRed graphs. Firstly, eight tests were performed in which only a single keyword is used. The keywords used are 'dust', 'cap', 'engine', 'blade', 'chemical', 'seal', 'weld', and 'stress', respectively. In all these cases, the prototype


[Fig. 4.6 panels: nothing inputted; 'co' inputted; 'con' inputted; 'cons' inputted]

Fig. 4.6 A process of filtering the terms suggested

system can find all the DRed nodes that contain the keyword used. Secondly, eight tests are performed in which two or three keywords are used. The queries used are 'reduce stress', 'dust cap', 'shroud fitting', 'increase flow', 'bucket groove', 'fatigue cycle', 'core material', and 'improve ignition capability', respectively. In all these cases, again, the system can find all the nodes that contain at least one of the keywords in the queries. This indicates that the methods developed can extract information from DRed graphs exhaustively and search relevant pieces of information from the whole DR space effectively. Therefore, the recall criteria discussed at the beginning of Sect. 4.2.4 are met.

It is hard to evaluate the precision by simply calculating the fraction of good results in an intentionally (for achieving high recall) large number of results. As discussed at the beginning of Sect. 4.2.4, two criteria are used for assessing precision: nodes that precisely meet the need expressed in the query should be placed at the top of the results list; and if a node meets the need better than others, it should be placed higher in the list. The precision is also evaluated by randomly opening a DRed file and finding a DRed node from which some terms are selected to form a query. The queries used for the evaluation of recall are used again in the tests for evaluating precision. In all the cases, the top result is the DRed node from which the query was formed, which indicates that the first criterion is met. The second criterion is not always met, as there are cases in which some results are ranked lower than less useful ones.

Table 4.3 (please note that some words are replaced by XYZ, ABC, AB and AAA for the sake of confidentiality) shows the results for one of the tests. As shown in the table, the query is 'improve ignition capability', which has 33 results ranked in sorted order by the retrieval system. The criteria used for assessing the utility of the results are: (1) a result includes all the keywords, with the same (or similar) meaning as the query; or (2) a result does not include all the keywords but includes the ones central to the meaning of the query. Taking 'improve ignition capability' as an example, the word 'ignition' is central to the knowledge need, and results that contain 'improve ignition', 'ignition capability' or even only 'ignition' may also be useful. Result 1 in the list is exactly the node from which this query was formed and is therefore a useful result. Result 2 is about how to improve ignition and meets criterion (2). Results 5 and 10 are related to ignition performance, which can also be useful to the user. Result 3 states that there is no evidence of an ignition problem; although it is not related to ignition capability, the nodes connected to it may be of interest to the user. The other results in the top 10 places are not useful although they contain either 'improve' or 'capability'. Some results not ranked in the top 10 places can also be useful, e.g. results 13 and 16. Results ranked between 21 and 33 are all irrelevant to the query. It is thus shown that some useful results are ranked lower than less useful ones. This is because keyword-based search does not distinguish between the keywords used and relies mainly on the calculation introduced in this chapter. Another interesting finding is that although some results (e.g. Result 3) are not useful, other nodes connected to them may be of interest to the user. These issues can be addressed by taking into account the inherent structure of, and the semantic information in, DRed graphs, which will be discussed in Sects. 4.3 and 4.4 respectively.

4.3 Retrieval of Structured Design Knowledge

The keyword-based retrieval method lays a foundation for the algorithms to be detailed in this section, resolving problems such as how to extract information from, and construct indexes for, knowledge models like DRed graphs. As demonstrated in the preliminary evaluation of the prototype retrieval system described in Sect. 4.2, such a method can support the retrieval of relevant DRed graphs well. Two further methods have also been developed to improve the performance and functionality of the system. Specifically, potential keywords are automatically suggested by the tool to help users who have no exact idea of what to search for, and the dependencies between DRed nodes are identified and used to suggest related information to such users. However, this method still has a major drawback: it treats all DRed nodes equally, without considering the inherent structures within DRed graphs. For instance, assume 'fuel combustion' is used as a query; two nodes would then be deemed equally useful if both include the two keywords and the two keywords happen to appear the same number of times in each node. However, there are two cases in


4 Collaborative Design Knowledge Retrieval

Table 4.3 Results retrieved for 'improve ignition capability'

(1) Improve ignition capability
(2) Improve basic XYZ ignition chc. by design mod. to improve fuel/air mixing, placement & atomization
(3) No evidence of ignition problem
(4) Provide ignition or low power assistance to XYZ to improve fuel/air mixing, placement and atomization
(5) Unknown XYZ ignition performance
(6) This will improve resistance to bending
(7) The high temperature capability of adhesive XYZ is insufficient
(8) Can't turn the ignition key and see whether or not there is a spark
(9) Improve adhesion at elevated temperatures
(10) WHAT IF… Ignition performance is UNACCEPTABLE for basic XYZ?
(11) Higher temp. capability disc material
(12) Improve recirculation within primary zone
(13) The ignition isn't generating a spark
(14) Provide balancing capability of 90 XYZ-units
(15) How can the stress defending effect of the set screw holes be improved?
(16) WHAT IF… Ignition performance is OK for XYZ NP?
(17) DESIGN MOD. Introduce one or two XYZ jets for ignition and starting
(18) How to improve design life of dust cap to 3 shop visits?
(19) What measures can be used to improve scavenge/avoid oil leaks?
(20) Need to prove capability of casting this geometry of print out in NP
(21) There might be a way to improve the stress defending effect of the set screw holes
(22) XYZ has insufficient resistance to ice impact and reduces fan blade containment capability
(23) This will provide additional strength, and thus improve their resistance to lifting during AAA flight
(24) The set screw holes can be positioned further outwards to improve the clamping of two flanges
(25) Introduce a dust cap with improved design life
(26) Assess temperature distribution/profile in service. Compare with seal capabilities
(27) Improved core print out geometry
(28) Provides max balancing capability of only 153.5 XYZ units
(29) How might the integrity of the washers beneath the ice impact panel attachment fixings be improved?
(30) Improved weld material applied to current core print out design using current welding technology
(31) Improved welding technique on current core print out using pre heat to ABCC and sweat welding XYZ 1 or another material XYZ2?
(32) (other words are removed as this node contains lots of words) However with these weights the balancing capability at this flange is limited to AB XYZ-Units. As the required balancing capability is ideally more than AB XYZ-Units, additional weights are required
(33) The stress in the disc flange main bolt hole and defender/set screw hole is sensitive to the depth of the chamfer. For reasons of manufacturing capability the tolerance on the depth had to be set to (other words are removed as this node contains lots of words)



which one of them could actually be more useful than the other. In the first case, when both appear in the same DRed file, the node with more associated nodes relevant to the query can be more useful. In the second case, when they appear in different DRed files, the one appearing in a file more relevant to the query is very likely to be more useful. In this section, the structured information within DRed graphs is exploited to improve retrieval performance and to support more complex queries.

A good starting point for utilizing this structured information is to understand the knowledge needs of designers. Generally, they have some common needs in solving the problems encountered in their work, as well as some specific needs raised by the specific context. As discussed earlier, Design Rationale (DR) can fulfil most of the needs identified in earlier research [6] by providing detailed descriptions of the issues to be addressed, the answers proposed, and the arguments made. In this research, the author assumes that there are two main motivations for designers to search through existing DRed graphs: firstly, they have an issue but do not know how to address it, so they wonder whether similar issues have been identified and addressed in previous projects; secondly, they have found an answer to an issue but are not sure about its effectiveness, so they would like to see whether similar issues have been met before and whether similar solutions were considered in those situations. This assumption is believed to be reasonable, as other knowledge can also be found from these problems and solutions, as well as from the arguments associated with them.

The keyword-based retrieval method cannot adequately support users with the above two motivations. Firstly, it does not provide information about the type and status of a node, so the user has to work out from its content whether it represents a question or a solution of interest. Secondly, a single DRed node does not contain sufficient information to describe both problems and solutions. Thirdly, it is very difficult for the retrieval system to identify the problems and solutions of interest from a query simply expressed as a set of words. These disadvantages can be remedied by using the structured information in DRed graphs.

Based on the drawbacks discussed above, a number of ways can be proposed to utilize the structured information in retrieval. Firstly, the DRed nodes found in response to a specific query can be re-ranked by taking into account their types, statuses, positions in a graph, and connections with other nodes. Secondly, instead of returning a set of individual DRed nodes, the retrieval system can recommend a sub-graph on the basis that the relevant keywords appear in different positions of that sub-graph. Thirdly, users can submit queries with more semantic information, which can inform the retrieval system about the correlations between different keywords. Fourthly, if the similarity between two sub-graphs can be calculated, similar pieces of DR can be automatically suggested by the retrieval system.
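The first of these ideas, re-ranking keyword-search hits by node metadata, can be sketched as follows. The field names, boost values, and scoring formula here are illustrative assumptions for the purpose of the sketch, not the actual implementation described in this book.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    node_type: str        # e.g. 'issue', 'answer', 'pro-argument'
    status: str           # e.g. 'resolved', 'open', 'rejected'
    level: int            # depth in the DRed graph (1 = top node)
    degree: int           # number of nodes connected to this one
    keyword_score: float  # score from the keyword-based retrieval

# Hypothetical boosts: resolved issues and accepted answers are assumed
# to be more useful to a searching designer, rejected answers less so.
TYPE_STATUS_BOOST = {
    ('issue', 'resolved'): 1.5,
    ('answer', 'accepted'): 1.5,
    ('issue', 'open'): 1.1,
    ('answer', 'rejected'): 0.8,
}

def rerank(hits):
    """Re-rank keyword-search hits by type/status, position and degree."""
    def score(h):
        boost = TYPE_STATUS_BOOST.get((h.node_type, h.status), 1.0)
        position = 1.0 / h.level              # nodes near the top matter more
        connectivity = 1.0 + 0.05 * h.degree  # well-connected nodes matter more
        return h.keyword_score * boost * (1.0 + 0.5 * position) * connectivity
    return sorted(hits, key=score, reverse=True)
```

With equal keyword scores, the metadata alone then decides the ordering, which is exactly the tie-breaking behaviour the keyword-based method lacks.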



4.3.1 Using the Feature Information of Nodes

To make use of the structured information implicit within DRed graphs, it is first necessary to understand what distinguishes one DRed node from another apart from its textual content. Generally, the features that distinguish DRed nodes include their type, status, position in a graph, and interconnections with other nodes. Based on this information, a retrieval system can infer the purpose of a DRed node and use the inference to determine the degree to which it matches a query submitted by the user. The inferences made for nodes of different types and statuses are shown in Table 4.4. The 'file' and 'text' nodes are not included in the table, as DR information is the main target of this research.

Apart from the type and status of a DRed node, other information (e.g. its position in a graph and its connections with other nodes) can also be utilized to infer its importance and complexity. For example, if a keyword (e.g. 'combustion') appears in the top node of a DRed graph, it is very likely that the whole rationale record is trying to resolve a problem concerning combustion. On the other hand, if it appears at a lower level of a DRed graph, combustion may be just part of the problem or a further issue raised by a proposed solution. Likewise, if many connections are created for a node, this probably means that the node concerns a complex topic requiring many arguments or raising many further issues. It is noteworthy that the number of connections to a node does not necessarily indicate its complexity, as it is also possible that some pieces of DR are captured in more detail than others. Nevertheless, when two nodes are found to be equally relevant to a query (with all other factors taken into account), this information can still serve as a further indicator of their relative importance.

4.3.2 Returning Node Groups as Results

A complex problem generally involves a large piece of DR divided into several smaller pieces, each of which forms a sub-graph stored in a separate DRed file. These smaller pieces are linked using tunnel links, which enable users to navigate between DRed files easily. Figure 4.7 shows two DRed graphs joined by a tunnel link, with the bottom one created to elicit discussion of a con-argument raised in the top one. By following these tunnel links, a DRed parser can identify the linking of DRed files (and the DRed nodes linked) and thereby construct a complete picture of a large DR space. The retrieval system is thus able to understand the hierarchy of, and locate a DRed node within, a large DRed graph.

The newer version of DRed uses a template to guide the capture of DR. A DRed file named '!Top Level' is used to structure a project; it includes a top node for the project statement and a number of followers covering different aspects of the project (project management, problem formulation, etc.). DRed nodes in a graph have



Table 4.4 Inferences carried by DRed nodes with different types and statuses

Type of node (status of node): Inferences made

Issue (resolved): (1) The issue is resolved by an effective solution. (2) It is very useful for designers who work on similar issues.
Issue (open): (1) The issue still requires further work. (2) It might be useful for designers who would like to see whether such an issue has been met before.
Issue (rejected): (1) The issue is rejected. (2) It is very useful for designers who are not sure whether it makes sense to address an issue.
Issue (insoluble): (1) No effective solution can be found. (2) It is useful for designers who want to find potential solutions to a problem.
Answer (accepted): (1) The answer is accepted as an effective solution. (2) It is very useful for designers who work on similar issues.
Answer (likely): (1) The answer can likely be used to resolve the issue. (2) It might be useful for designers.
Answer (open): (1) The answer node still requires further work. (2) It might be useful for designers.
Answer (unlikely): (1) The answer can hardly be used to resolve the issue. (2) It is very useful for designers who have a solution and want to see how it worked previously.
Answer (rejected): (1) The answer is rejected. (2) It is very useful for designers who have a solution and want to see how it worked previously.
Pro-argument (valid): (1) The pro-argument is proved to be valid. (2) It is very useful for designers.
Pro-argument (dominant): (1) The pro-argument plays a major role in confirming the validity of a statement or solution. (2) It is very useful for designers.
Pro-argument (invalid): (1) The pro-argument fails to support a statement or solution. (2) It can be used to remind users with similar ideas.
Con-argument (valid): (1) The con-argument is proved to be valid. (2) It is very useful for designers.
Con-argument (dominant): (1) The con-argument plays a major role in refuting the validity of a statement or solution. (2) It is very useful for designers.
Con-argument (invalid): (1) The con-argument fails to refute a statement or solution. (2) It can be used to remind users with similar ideas.





Fig. 4.7 Levels and interconnections of DRed nodes (two DRed graphs joined by a tunnel link, with nodes arranged in levels 1 to 7)
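The level assignment illustrated in Fig. 4.7 can be derived from the dependency arrows alone. The sketch below assumes a simple edge-list representation of (child, parent) pairs, with the arrow pointing at the parent; the function name and representation are illustrative, not taken from the DRed implementation.

```python
from collections import defaultdict

def node_levels(edges, root):
    """Assign a level to every node reachable from the root.

    edges: (child, parent) pairs, the arrow pointing at the parent node
    from which the child is derived; the root node is at level 1.
    """
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
    levels = {root: 1}
    queue = [root]
    while queue:                      # breadth-first walk down the tree
        node = queue.pop(0)
        for c in children[node]:
            levels[c] = levels[node] + 1
            queue.append(c)
    return levels
```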

multiple levels according to their relative positions; see for example levels 1 to 7 in Fig. 4.7. Some branches of the graph may involve many levels while others have fewer. DRed nodes at the same level are not necessarily of the same type. Whatever position a DRed node is in, it is always interconnected with some other nodes. An arrow indicates the connection between two DRed nodes, with the arrow pointing to the node from which the other is derived; for example, nodes 2 and 3 in Fig. 4.7 are derived from node 1, with arrows pointing to node 1. DRed places few constraints on connections, so a node can be connected to another node of any type.

The connections between two DRed nodes fall mainly into two categories: relationships of dependency and relationships of similarity. The former derives from the fact that a DRed node is necessarily correlated with other nodes in the same graph. The latter can be identified on the premise that two DRed nodes which have a larger portion of words in common are more likely to be about the same thing.

A first step in making use of the levels and interconnections of DRed nodes is to understand exactly what these two features mean in the context of representing DR as a graph of dependencies. The levels of DRed nodes essentially reflect the



stages of a problem-solving process. Higher levels correspond to earlier stages of the process and lower levels to later stages. The DRed nodes at high levels tend to describe a design problem at a high level of abstraction, while those at low levels usually describe solutions and arguments in more detail. Moreover, branches involving fewer levels of DRed nodes tend to indicate that the proposed solutions were quickly proved ineffective, for example the left two branches of the top graph in Fig. 4.7. Obviously, such an inference does not always hold; a key prerequisite is that there is at least one branch involving many more levels. If users have little information about a problem, the DRed nodes at higher levels should be more useful to them, as they can start to learn about the problem at a more abstract level. On the other hand, the DRed nodes about detailed solutions should be more useful once users have enough background. With this inference, the retrieval system can automatically suggest information for users based on their different working contexts.

The interconnections between DRed nodes not only indicate their dependencies but also reflect the complexity of a node. Specifically, a DRed node is very likely to describe a complex issue if many successors are derived from it. If a DRed node is derived from two or more predecessors, it might be inferred that the node describes something applicable to different places in the problem-solving process, and that its predecessors all pertain to a similar topic. Analysis of the datasets reveals that, out of a total of 892 DRed nodes, 52 have more than four successors and 105 have more than three. Amongst those 52 nodes, the numbers of 'answer', 'issue', and 'con-argument' nodes are 34, 15, and 3 respectively. The 105 DRed nodes with more than three successors include 68 'answer' nodes, 31 'issue' nodes, 5 'con-argument' nodes, and 1 'pro-argument' node. From these data, it can be inferred that an 'answer' node is most likely to involve many further discussions, with 'issue' nodes second; these two types of nodes are therefore given higher priority when the retrieval system sorts the results list. Nodes with more successors can also be returned with priority, as they provide more potential information for interested users to follow up. Moreover, 4 nodes (2 'answer', 1 'issue', and 1 'con-argument') have more than 3 predecessors, and 28 nodes (14 'answer', 7 'issue', 4 'con-argument', and 3 'pro-argument') have more than 2 predecessors. These nodes embody the correlation between different pieces of DR and can be used to suggest relevant information to users. It is noteworthy that these statistics do not include the nodes that denote files, e.g. pictures, spreadsheets, etc.
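Statistics of this kind can be gathered directly from the edge list. A minimal sketch, using invented sample data rather than the real datasets:

```python
from collections import Counter, defaultdict

def count_by_type(edges, node_types, threshold):
    """For each node type, count nodes with more than `threshold` successors.

    edges: (child, parent) pairs; a node's successors are the nodes
    derived from it, i.e. its children in the tree.
    """
    successors = Counter(parent for _, parent in edges)
    result = defaultdict(int)
    for node, n in successors.items():
        if n > threshold:
            result[node_types[node]] += 1
    return dict(result)
```

Running the same count with the parent and child roles swapped gives the predecessor statistics quoted above.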

Finding Sub-graphs in the Knowledge Model

Another use of the interconnections of a DRed node is to return several groups of interconnected nodes as the retrieval results, rather than listing each node separately. A single DRed node typically does not contain much information and only consists of a few sentences. When users navigate a DRed graph, they look at a node, try to understand its contents, and quickly move to another one to see its context. Therefore, a group of interrelated DRed nodes, i.e. a sub-graph, can help put any single DRed node in context and make the information easier to understand. If the retrieval system is able to understand the information contained in a DRed graph (or sub-graph) and match it with the query, it will find more precise information for users. In this way, a sub-graph can be used as the response to the user's query. For example, when 'elimination dust cap' is given as the query, the retrieval system will look for a sub-graph which contains these three words, rather than trying to find all three words in a single DRed node. In addition, if the contents of each DRed node can be (at least to some extent) 'understood' by the retrieval system using NLP techniques, then the retrieval system will be able to return more meaningful node groups.

There are two prerequisites for finding a group of DRed nodes to match a query. Firstly, the group should work together to describe a problem-solving process either wholly or partially, and most of the nodes should (to some degree) match the query. Secondly, all the nodes should be interconnected with each other and the size of the group should be as small as reasonably possible. The second prerequisite is reasonable: a group of 4–5 nodes is obviously easier to navigate and understand than one with (say) 40 nodes. Three indications can therefore be obtained: (1) a DRed node is only deemed relevant when it somehow matches the query; (2) DRed nodes should be searched either forwards or backwards along the solution path of design problems; (3) the search must stop when no further useful information can be found.
The process of finding sub-graphs in a large DR space essentially involves: (1) finding a node as the starting point; (2) searching for a relevant node connected to it; (3) moving to the new node found and repeating the process until no further relevant information can be found. These three steps lead to three specific processes that need to be executed by an effective retrieval method. Once similar sub-graphs are identified, the retrieval system can automatically recommend similar problem-solving processes to users. The computational complexity of such searching tasks is generally very high, so the computational efficiency of the proposed methods also needs to be taken into account, though that is not the focus of this research.

This section discusses and compares the different methods developed by the author for finding groups of nodes in response to a query. As discussed above, the number of DRed nodes returned as a group should not be too large, and these nodes should all be connected together. A simple method is to start the search from the top node of a project (e.g. Node A in Fig. 4.8) and move to lower levels of the tree until a DRed node is found to be useful. This node is put in the list and all its children nodes are then evaluated. The searching process for this method is shown as 'Method 1' in Fig. 4.8. It is very likely that not all the children of a DRed node are useful, and in such cases the search can continue towards deeper levels of the tree. If no useful DRed nodes are found even after the search goes down several levels, it can be concluded that only the 'useful' nodes found in previous steps correlate to the query. Otherwise, the additional nodes found during the search can be added to the list as well and returned together as a group.



Fig. 4.8 Methods starting the search from a DRed node (Method 1 searches downwards from the top node A; Method 2 searches outwards from a useful node B; C and D are further nodes on the search path)

This method is not efficient, as it always starts from the top of a tree and checks every single node found during the search. An improvement is to start the search from a useful DRed node (e.g. Node B in Fig. 4.8) rather than from the top of the tree, shown as 'Method 2' in Fig. 4.8. Generally, the best choice of starting node is the top one in the results list of the keyword-based retrieval. The next step, searching either upwards or downwards, is to evaluate the nodes connected with this one. If none of these nodes is useful, their children are checked; if any node checked is useful, a new search starts from it until enough nodes have been obtained. After a few such rounds, the useful DRed nodes propagate and sub-graphs are constructed.

Although the second method starts from a useful node, this does not guarantee that the next node found during the search is useful as well. Both methods actually involve many useless checks (as useful nodes are only a small part of a DR space) and are thus not computationally efficient. This inefficiency is mainly due to the fact that the retrieval system does not know where the next useful node is, and therefore has to try every node on the search path. Moreover, these methods fail to identify cases where two (or more) nodes match the query but are not directly connected. For example, assume both Node B and Node D match the query and the search starts from B; it is likely that Node D is never checked simply because Node C is checked first and deemed not 'useful'. This issue also needs to be taken into account by the method for constructing node groups.

The two methods discussed above in essence involve traversing a tree from either the top node or a node deemed 'useful', and finding a group of 'useful' nodes interconnected with each other.
Whether a node is 'useful' is determined by matching its content against the query using the calculation methods discussed in Sect. 4.2. Performing this calculation for every node imposes a heavy burden on the retrieval system. A way to eliminate this burden is to utilize the set of DRed nodes obtained by the keyword-based retrieval method, i.e. to determine a node's usefulness by checking whether it appears in the results list. However, a method with this improvement still has a drawback: checks on non-relevant nodes are likely to happen, as the retrieval system needs to search node by node without prior knowledge of whether the next node is relevant. For instance, assume a search starts from Node B as shown in Fig. 4.8; if none of the nodes interconnected with it are useful, then all the checks done during this search are in vain. Such cases are very likely, as each sub-graph found to match a query is just a small part of a large DR space.
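The search strategy just described, i.e. start from a useful node, examine its neighbours, and probe a little further past non-useful nodes, might be sketched as below. The function and parameter names are assumptions; `useful` stands for membership in the keyword-retrieval results list.

```python
def grow_group(seed, neighbours, useful, max_probe=1):
    """Grow a sub-graph of useful nodes outwards from a seed node.

    neighbours: dict mapping a node to the nodes connected to it;
    useful: set of nodes found by the keyword-based retrieval;
    max_probe: how many consecutive non-useful nodes a branch may
    cross before the search gives up on that branch.
    """
    group = {seed}
    frontier = [(seed, 0)]
    visited = {seed}
    while frontier:
        node, misses = frontier.pop(0)
        for nxt in neighbours.get(node, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            if nxt in useful:
                group.add(nxt)
                frontier.append((nxt, 0))   # reset the miss counter
            elif misses < max_probe:
                frontier.append((nxt, misses + 1))
    return group
```

With `max_probe=1`, the B–C–D situation above is handled: starting from B, the search crosses the non-useful node C and still reaches D; with `max_probe=0`, D would be missed.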

Finding Groups of Interconnected Nodes from the Results Set

A method can be developed to avoid these redundant checks by finding groups of interconnected nodes within the results set obtained from keyword-based retrieval. In this way, only useful nodes are analyzed, and sub-graphs can be found based on the interconnections of these nodes. The problem to be resolved thus becomes: given a set of DRed nodes, construct a group (or several groups) of interconnected nodes that supplement each other to help users understand a piece of DR. A retrieval method needs to make the story told by the nodes of a group as complete as possible. To achieve this goal, two main criteria can be used for choosing the nodes from the results set, namely relevancy and dependency. Since the nodes to be explored are obtained from the keyword-based retrieval, the first criterion is already met. The second criterion requires that the nodes found depend upon, and supplement, each other.

The first step in resolving this problem is to understand the dependencies between DRed nodes. Any two nodes created for the same design project essentially have some sort of dependency. Based on the positions of two nodes, there are two main types of relationship. The first is an explicit dependency, meaning that one node is directly derived from the other; the meaning of this dependency varies with the types of the two nodes. The second type is implicit, involving two nodes that are not directly interconnected. Illustrations of the different kinds of dependency are shown in Table 4.5. In case 1, two nodes have an explicit dependency, with one derived from the other. In case 2, two nodes are derived from the same node, which means they are very likely to concern the same topic and thus relate to each other. In case 3, a node (e.g. an argument for an answer) is indirectly derived from another (e.g. an issue), and the dependency between them is moderate. In cases 4 and 5, the dependencies are much weaker.

When two nodes are returned to the user, the ease of understanding the piece of DR depends upon the correlation between them. For instance, if there is an explicit dependency between the two nodes (e.g. Case 1 in Table 4.5), their meanings can be easily understood. For the other cases, it is not so easy to understand their meanings unless other nodes are also included. Taking Case 2 as an example, the nodes 'Con-argument B' and 'Con-argument C' would have more context if 'Answer B' were included in the group as well. Similarly, if one or more extra nodes can be included in Cases 3 to 5, the information contained by the group of nodes



Table 4.5 Dependencies between two nodes in different positions of a DRed graph

Case 1: A node is derived directly from another one. The two nodes have strong dependency.
Case 2: Two nodes are derived from the same node. The two nodes have strong dependency.
Case 3: A node is indirectly derived from another one. The dependency between these two nodes is moderate.
Case 4: A node is derived indirectly from a node (Issue A) from which another one is derived. The dependency between these two nodes is low.


will be complete and thus easy to understand. The task of providing enough information for users is easy if the extra nodes needed are already part of the results set; otherwise, some extra nodes need to be found by the retrieval system. Five principles for finding node groups can be summarized, as shown in Table 4.6.

Based on the principles set out above, the problem then becomes how to find a reasonable number of DRed nodes from the results set by analyzing their contents



Table 4.5 (continued)

Case 5: Two nodes are derived indirectly from the same node (Issue A). The dependency between these two nodes is very low.

Table 4.6 Principles for finding groups of DRed nodes as results

I. The contents of the nodes should be relevant to the query: this ensures that the contents of the nodes can fulfil users' information needs.
II. The dependencies between the nodes should be as strong as possible: the nodes in a group should work together to describe a complete piece of rationale.
III. The information contained by the nodes should be as complete as possible: the story told by the group of nodes should be complete so that it can be easily understood by users.
IV. The number of extra nodes added to the group should be reasonably small: extra nodes are useful for understanding the meanings of the nodes already in a group, but too many extra nodes may degrade the group's relevancy to the query.
V. The keywords in a query should appear in different nodes as far as possible: the nodes in a group should match the query as a whole, rather than a small number of major nodes playing a much more important role than the others.

and dependencies. A straightforward method is to first get the node which best matches the query from the results list, and then check whether other nodes in the list have dependencies with it. An algorithm describing this method is shown in Fig. 4.9 (words in italics indicate variables in the programme). The task of finding the node that best matches the query is not difficult, as nodes in the results list have already been sorted in descending order of the degree of matching. The top node of the results list is thus picked out each time and a group is constructed from it. If a node group is successfully constructed, all the nodes in that group are removed from the results list so that repeated checks on them are avoided. This process is repeated until no more groups can be constructed.

The core of such a method is the construction of a group of nodes from a single node. The retrieval tool developed employs a data structure in which each node object has two lists: the 'parents list', which stores the nodes that this node supports, and the 'children list', which consists of the nodes supported by this node. For each edge object, its source and target nodes are identified and inserted

Fig. 4.9 An algorithm for constructing node groups from the results list


ConstructNodeGroups(Vector nodeList)
1. Vector groups = new Vector();
2. for each node in nodeList
3.    if node is not in checkedNodes
4.       do construct a group from node;
5.       if a group, group, is successfully constructed
6.          do remove all the nodes in the group from nodeList
7.             put group into groups
8.    end
9. end

into each other's lists respectively. By using these lists, the dependency between any two nodes can be easily identified. For instance, we can start from Con-argument A and get Answer A from its parents list; a further check on the parents list of Answer A then finds Issue A. Through these two steps, it can be inferred that Issue A is the grandparent of Con-argument A in the tree.

However, there are circumstances under which such a method gives rise to repeated checking. In the case shown in Fig. 4.10, for example, Answer C and Issue B need to be checked consecutively before Con-argument C is deemed to be derived indirectly from Answer B. To resolve this problem, a method has been developed based on tags attached to DRed nodes, as shown in Fig. 4.11. The first part of a tag represents the unique name of a project, so that DRed nodes created for different projects can be distinguished; the second part, after the symbol '@', indicates a node's position in a tree. The tags are used to perform a fast measurement of the 'distance' between any two DRed nodes, eliminating the need to traverse a tree. Moreover, attaching tags is a pre-processing task and therefore does not impose much burden on the run-time performance of the retrieval system. Two further problems then arise:

Fig. 4.10 A case in which it is not efficient to identify the dependency between two nodes by checking the parents and children lists
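The parents/children lists described above might look like this in code; the class and field names are assumptions based on the description, and the grandparent lookup reproduces the Con-argument A example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parents = []   # nodes that this node supports
        self.children = []  # nodes supported by this node

def link(derived, supporting):
    """Record an edge: `derived` is derived from (and supports) `supporting`."""
    derived.parents.append(supporting)
    supporting.children.append(derived)

def grandparents(node):
    """Walk the parents lists two steps up the tree."""
    return [gp for p in node.parents for gp in p.parents]
```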


Fig. 4.11 Tags attached to DRed nodes. The first part of each tag is PrjName, the unique name of a project; the part after '@' encodes the node's position in the tree (e.g. PrjName@1 for the top node, Issue A)

firstly, how to evaluate the 'distance' and find the supplementary nodes; and secondly, how (and when) to perform the pre-processing. The reciprocal of the 'distance' between two DRed nodes in essence indicates the degree to which they are correlated; 'distance' can therefore be used to determine the dependencies between nodes. It is useful to first give a few definitions for the development of methods to evaluate the 'distance' between two nodes:

Definition 4.1 Direct Dependency: Two DRed nodes have direct dependency upon each other if one of them is directly derived from the other.

Definition 4.2 Common Ancestor: The common ancestor of two DRed nodes which have no direct dependency is defined as the node nearest to both of them from which they are both (directly or indirectly) derived.

Definition 4.3 Distance: The distance between two DRed nodes is defined as the number of nodes involved when going upward in the tree from them until reaching their common ancestor.

The first definition, 'direct dependency', indicates the relationship between two DRed nodes connected within a graph. The second definition, 'common ancestor', is the key to establishing the dependency between two nodes without direct dependency; if two nodes do have direct dependency, their 'common ancestor' is simply the one from which the other is derived. As stated in Definition 4.3, the 'distance' between two DRed nodes is determined by the minimal number of nodes involved in establishing the dependency between them. If two nodes have 'direct dependency', their distance is one. If the 'common ancestor' of two nodes is one of them, their distance is the total number of nodes on the path from the descendant to the ancestor minus one; for example, if one node is the 'grandfather' of the other, the 'distance' is two. Take the case in Fig. 4.10 again

4.3 Retrieval of Structured Design Knowledge


as an example, the ‘common ancestor’ of Pro-argument B and Answer C is Answer B and their ‘distance’ is three as their ‘distances’ to Answer B are one and two respectively. A process of calculating the ‘distance’ between two nodes is shown in Fig. 4.12. In step 3 of the process, the distance is obtained by adding the numbers of nodes (excluding the ‘common ancestor’) involved in the two paths which begin with the ‘common ancestor’ and end at the two nodes concerned. If the ‘distance’ between any two nodes in the results set can be calculated, the relative positions of these nodes can be determined. As the purpose of constructing nodes groups is to supplement a node’s meaning by providing its nearby peers, a nodes group only consists of nodes belonging to the same project (i.e. describing the solution process of the same problem). Therefore, the nodes in the results set should first be grouped according to the different projects they belong to. The process of constructing nodes groups by using the ‘distances’ between nodes is shown in Fig. 4.13. The inputs for such a process are the list of nodes (nodeList) belonging to the same project and the maximum distance (N) which is used to determine whether two nodes can be put in the same group. For instance, if node A is already in a nodes group and the maximum distance set for the process is three, then only those nodes

Fig. 4.12 The process of calculating the distance between two nodes by utilizing tags
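The tag-based distance computation of Fig. 4.12 can be sketched in Python. This is a hypothetical sketch, assuming each tag encodes the node's path from the root as a dot-separated string such as '1.2.1':

```python
def distance(tag_a: str, tag_b: str) -> int:
    """Distance between two DRed nodes (Definition 4.3), computed from
    tags that encode each node's path from the root, e.g. '1.2.1'."""
    a, b = tag_a.split('.'), tag_b.split('.')
    # The longest common prefix of the two tags locates the common ancestor.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Count the nodes on the two paths below the common ancestor.
    return (len(a) - common) + (len(b) - common)
```

With this encoding, a parent and child ('1.1' and '1.1.1') have distance one, a 'grandfather' pair has distance two, and siblings at different depths (e.g. the Pro-argument B and Answer C case) have distance three.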


4 Collaborative Design Knowledge Retrieval

ConstructNodeGroupsUsingDistances(Vector nodeList, int N)
1.  Vector groups = new Vector();
2.  while nodeList has more than one node
3.      do create a group group
4.         add the topmost node, tnode, of nodeList to group and delete it from nodeList
5.      end
6.      for each of the other nodes, onode, in nodeList
7.          if the distance between tnode and onode is less than or equal to N
8.          do add onode to group and delete it from nodeList end
9.      if there is more than one node in group
10.         for each of the nodes, lnode, left in nodeList
11.             if the distance between lnode and any node in group is less than or equal to N
12.             do add lnode to group and delete it from nodeList end
13.         do add group to groups end
14. end

Fig. 4.13 The process of constructing nodes groups by using the distances between nodes
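The procedure of Fig. 4.13 can be rendered as runnable Python (a sketch, not the authors' implementation; the distance measure is passed in as a callable so any definition of 'distance' can be used):

```python
def construct_node_groups(node_list, n, distance):
    """Group nodes so that each node in a group is within distance n of
    at least one other node in the group (cf. Fig. 4.13).  node_list is
    assumed to be sorted in descending order of relevance."""
    groups = []
    while len(node_list) > 1:
        tnode = node_list.pop(0)          # topmost (best-ranked) node
        group = [tnode]
        # First pass: collect nodes close to the topmost node.
        for onode in node_list[:]:
            if distance(tnode, onode) <= n:
                group.append(onode)
                node_list.remove(onode)
        # Second pass: collect nodes close to any node already grouped;
        # isolated topmost nodes (groups of one) are discarded.
        if len(group) > 1:
            for lnode in node_list[:]:
                if any(distance(lnode, g) <= n for g in group):
                    group.append(lnode)
                    node_list.remove(lnode)
            groups.append(group)
    return groups
```

For example, with nodes 1, 2, and 10 and an absolute-difference distance with N = 1, the nodes 1 and 2 form a group while the isolated node 10 is left ungrouped.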

whose distances to node A are not greater than three can be added to the nodes group. The sizes of nodes groups can be adjusted by choosing different maximum distances. As shown in Fig. 4.13, the process starts by creating a vector (groups) which is used to store the nodes groups constructed, and stops when fewer than two nodes are left in the list. As nodes in the results set are already sorted in descending order, the topmost node is taken out first and added to the first nodes group created (Step 3 through to Step 5). Then the ‘distances’ between the topmost node and the others in the nodeList are checked, and those nodes with ‘distances’ no greater than the specified maximum distance will be added to the nodes group and deleted from the list (Step 6 through to Step 8). After these steps, if the nodes group still only includes the topmost node, the topmost node is isolated and no nodes group can be constructed for it. Otherwise, the other nodes left in the nodeList will be checked again to find the nodes that are close to any of the nodes already in the nodes group, and these are added to the nodes group as well. When all these steps are completed, the nodes group constructed will be stored in the vector groups (Step 13). The process iterates until every node is assigned to a specific nodes group.

A prerequisite for effectively utilising the nodes groups is that the nodes included in a nodes group should tell a complete story. The ideal situation is that every node in a nodes group has explicit dependency with at least one other. However, there are circumstances under which nodes in a nodes group may be separated, e.g. Answer C and Pro-argument B in Fig. 4.12. In this case, it is useful to include Answer B and Issue B as well when showing the nodes to users. This idea can be implemented by using the tags and taking the following steps: (1) finding a node from the nodes group which is at the uppermost level of the tree (i.e. the length of its tag is shorter than those of the others); (2) if more than one node is found in the last step, getting their Common Ancestor; (3) finding all the paths (similar to Step 2 in Fig. 4.12) from all the nodes to the node obtained in either Step (1) or Step (2); and



(4) if the nodes in these paths do not exist in the nodes group, adding them in. In this way, a tree structure containing all the nodes in a nodes group can be shown to users. Generally, starting to read a piece of DR from an issue node can help users to better understand the problem-solving process. A further method to improve the user’s understanding of the retrieved DR can be developed by extending the tree structure (to make its uppermost node describe an issue) when necessary.

Like DRed nodes, the nodes groups found also need to be sorted based on their degrees of matching the query. A number of rules can be used for ranking nodes groups: (1) if all the nodes in a group are part of the results set obtained by using keyword-based retrieval, the group is likely to be very useful; (2) a nodes group with the keywords evenly distributed across different nodes is more useful than one with the keywords concentrated mainly in one or two nodes; and (3) groups with more nodes could be more useful.

The match between the nodes groups and the query based on semantic similarity takes both node match and structure match into consideration. In this method [7], two types of similarity are considered, namely semantic similarity and structure similarity. Let T_q = {N_1, …, N_m, R_12, …, R_ij} denote the complex structured query and T_d = {N'_1, …, N'_n, R'_12, …, R'_ij} the nodes group of DR.

(1) Semantic similarity (sim_Se)

sim_Se measures the functional similarity between a query and a nodes group. Let g = {g_1, g_2, …, g_m} represent the feature set of query k, where m is the node number (or the knowledge number) of the set, and let f = {f_1, f_2, …, f_n} represent the feature set of the retrieved nodes group k', where n is the node number. The semantic similarity sim_Se(k, k') is then calculated using Eq. (4.5):

sim_Se = Σ_i w_i(g_i) × S_i(g_i, f_i)    (4.5)

where w_i(g_i) is the weight value of the ith feature after normalization, and S_i(g_i, f_i) represents the similarity of the two values g_i and f_i, which can be calculated as:

S(g_i, f_i) = 1 − |g_i − f_i| / (max(g_i, f_i) − min(g_i, f_i))    (4.6)
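The weighted sum of Eq. (4.5) can be sketched as follows. Note that for a single pair of values max(g_i, f_i) − min(g_i, f_i) equals |g_i − f_i|, so this sketch reads the max/min in Eq. (4.6) as the observed range of feature i across the collection; that reading is an interpretation, not something stated in the text:

```python
def semantic_similarity(weights, g, f, lo, hi):
    """sim_Se = sum_i w_i(g_i) * S_i(g_i, f_i), with each feature
    difference normalized by that feature's observed range [lo_i, hi_i]
    (an assumed reading of the max/min in Eq. 4.6)."""
    total = 0.0
    for w, gi, fi, l, h in zip(weights, g, f, lo, hi):
        s = 1.0 if h == l else 1.0 - abs(gi - fi) / (h - l)
        total += w * s
    return total
```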


(2) Structure similarity (sim_St)

Structure similarity sim_St is defined based on the hierarchical structure and the relationships between multiple nodes. sim_St is estimated as the taxonomical distance of two nodes defined in the nodes group, based on an analysis of the lengths of the paths linking the nodes concerned. Let s_1 and s_2 represent the requirement node and the knowledge node, respectively. sim_St can be calculated as follows, based on the method given in WordNet [8].



sim_St = 2 × log p(s) / (log p(s_1) + log p(s_2)), if s_1 and s_2 have a common parent s; sim_St = 0 if no common parent exists between s_1 and s_2    (4.7)

In Eq. (4.7), s denotes the common parent node of s_1 and s_2, and p(s) = count(s)/total is the proportion of the sub-node count of node s in the total word count.
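Eq. (4.7) is an information-content similarity in the style of WordNet-based measures. A minimal sketch, assuming the proportions p(·) are precomputed as count(s)/total:

```python
import math

def structure_similarity(p_s, p_s1, p_s2):
    """sim_St per Eq. (4.7).  p_s is the proportion for the common
    parent s, or None when s1 and s2 have no common parent."""
    if p_s is None:
        return 0.0
    return 2 * math.log(p_s) / (math.log(p_s1) + math.log(p_s2))
```

Because p(s) is never smaller than p(s_1) or p(s_2), the result lies between 0 and 1, reaching 1 when both nodes coincide with their common parent.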

4.3.3 Using Complex Queries

Users’ queries reflect their information needs and thus need to be well interpreted by a retrieval system. Using a set of keywords to form a query is a straightforward method. It is, however, very difficult to infer users’ various intents from only the few words given. For instance, when ‘Cambridge University’ is used as a query, it is hard to determine which aspects of the University, e.g. admission process, rate of international students, history, etc., the user is really interested in. Ambiguity can be alleviated by adding more keywords to the query, e.g. ‘Cambridge University history research’. Large search engines can suggest potential keywords for the user and then return results useful for most users. This is possible because they hold a large amount of data about users’ usage and feedback, from which many rules and facts can be identified. Using a set of keywords is still an effective way of expressing users’ information needs for large search engines, as these needs are extremely varied. However, small and medium-sized retrieval systems, which are used only within organizations, do not have such a huge user group, and their users’ information needs do not vary very much. As discussed earlier in this section, designers’ motivations for retrieving and reusing DR mainly include: firstly, finding previous solutions for a similar problem they are working on; and secondly, finding out how similar solutions worked for previous problems. Their queries therefore tend to focus on issues and solutions, as well as combinations of both. Retrieval performance can be greatly improved if the retrieval system can, from the query given, identify what the issue is and what the potential solutions are. However, it is very difficult to do this using simple keyword-based retrieval, as all the keywords are treated equally.
For example, if a designer has a solution (‘change shrouding on gears’) for an issue (‘reduce heat to oil’) and would like to see whether this solution worked on previous occasions, then a possible query he or she can form is ‘reduce heat to oil shrouding on gears’. It is extremely hard for the retrieval system to infer the retrieval intent, as the combined meaning of this group of words is difficult to interpret. This problem can be easily resolved if users can inform the retrieval system which set of words (e.g. ‘reduce heat to oil’) represents the issue and which (e.g. ‘shrouding on gears’) represents the potential solution. A method can be developed to enable this by using complex queries, which not only consist of a set of words but also specify the purposes of these words. In the following sections, the development of such a method is discussed in detail.



Forming of Complex Queries

Users’ queries are utilized to model their information needs and therefore depend upon the structure of the problems they are dealing with. When knowledge acquisition is done by talking to experienced designers, queries can easily be formed, as they can be supplemented by further explanation or re-phrased during the interaction between the two designers. On the other hand, queries cannot be so easily formed when designers are interacting with a retrieval system, which is not intelligent enough to acquire further information to fully understand the queries. This problem can be partially resolved by not only giving a few words to the retrieval system but also informing it of the specific purposes of these words, i.e. using complex queries. The format of complex queries should be determined by the structure of the information the retrieval system is able to provide. A straightforward method for forming complex queries for the retrieval of DRed graphs is to use the argumentation-based DR model employed by DRed. Firstly, the DR captured using DRed is already represented in this format, so searching is straightforward. Secondly, this model can easily express the information needs of designers who, as discussed above, have two main motivations when seeking DR. Using complex queries formed in this way, designers can inform the retrieval system what problems they are working on, what solutions they have got, and for what reasons these solutions are supported or criticized. The retrieval system can then navigate the DR records and match the different components (issues, answers, arguments, etc.) of a complex query separately.
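A complex query following the argumentation-based DRed model could be represented as follows; this is a hypothetical data structure for illustration, not the one actually used in the prototype:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComplexQuery:
    """Keywords grouped by the purpose they serve in the query,
    mirroring the DRed node types (issue, answer, arguments)."""
    issue: List[str]
    answer: List[str]
    pro_arguments: List[str] = field(default_factory=list)
    con_arguments: List[str] = field(default_factory=list)

# The 'reduce heat to oil' example from the text:
q = ComplexQuery(issue=['reduce', 'heat', 'oil'],
                 answer=['shrouding', 'gears'])
```

Because each keyword set is labelled with its role, the retrieval system can match the issue keywords only against issue nodes and the answer keywords only against answer nodes, instead of treating all keywords equally.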

Matching Complex Queries

As discussed above, a complex query can be viewed as a small piece of DR which consists of a number of nodes: see, for example, the one on the left of Fig. 4.14. To find results for such a query, these nodes need to be matched against those in the DR records by comparing both their contents and types. In fact, a complex query does not necessarily start from an issue node and may consist of many nodes. In this section, we only use the example in Fig. 4.14 to illustrate the methods of finding pieces of DR in response to complex queries. This query is the most typical case, which describes an issue, a solution proposed for it, and the arguments for and against the solution. Each node in the query consists of a few keywords. In both Fig. 4.14 and Fig. 4.15, the solid lines indicate the connections in both queries and DRed graphs, while the dashed lines indicate the action of matching. Figure 4.14 illustrates a simple method which makes an exact match for each node. Firstly, a group of issue nodes are found that match the issue described in the query. ‘Issue A’ in the figure represents one of the issue nodes found and will be used as an example to illustrate the following steps. The task of matching is performed by checking whether ‘Issue A’ contains one or more of the keywords in the issue node of the query. After that, the answer described in the query will be matched against the children of ‘Issue A’. If ‘Issue A’ happens to have one child (each node will



Fig. 4.14 A method for matching complex queries

Fig. 4.15 A method that is able to find more possible results

be processed in the same way if more than one is found) that successfully matches the answer node, then the children of this node will be matched against the pro-argument and the con-argument in the query. Otherwise, ‘Issue A’ is deemed not to match the query. As well as being simple, this method is also efficient, as only the nodes associated with ‘Issue A’ are checked. The method also has a drawback, as there are many cases in which solutions to an issue are not created directly as its children. It is very likely that only a few high-level solutions are developed for an issue, with many detailed solutions created as the children of those high-level ones. Taking the DRed graph on the right of Fig. 4.14 as an example, ‘Node X’ is a sub-solution of the solutions created directly for ‘Issue A’. If it matches the answer described in the complex query, then this DRed graph is clearly of interest to the user. However, it will be deemed non-relevant if the method discussed above is used. Another method has been developed



by the author to resolve this problem, as shown in Fig. 4.15. In this method, a group of nodes that match the top node in the query are found and again suppose one of these nodes is ‘Issue A’. After that, a group of nodes are found to match the answer described in the query, and only those that have some sort of dependency (this can be done by comparing the tags of each node) with ‘Issue A’ are kept (as illustrated in Step 2). In Step 3, similar things are done for the two arguments in the query. Finally, a path (or several paths) will be returned as the result, as shown in Step 4.
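The relaxed matching of Fig. 4.15 can be sketched as follows. The dependency check here uses a tag-prefix test, assuming tags are dot-separated paths from the root (an assumption consistent with the tag scheme of Sect. 4.3 but hypothetical in its details):

```python
def node_matches(keywords, node_text):
    """A node matches if its text contains at least one query keyword."""
    text = node_text.lower()
    return any(k.lower() in text for k in keywords)

def dependent_answer_matches(issue_tag, answer_keywords, nodes):
    """Keep answer candidates that are (directly or indirectly) derived
    from the matched issue node, i.e. whose tag extends the issue's tag.
    'nodes' is a list of (tag, text) pairs."""
    prefix = issue_tag + '.'
    return [(tag, text) for tag, text in nodes
            if tag.startswith(prefix) and node_matches(answer_keywords, text)]
```

Unlike the exact child-matching method, this keeps sub-solutions such as ‘Node X’ because any descendant of the issue node, not just a direct child, passes the prefix test.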

4.3.4 Implementation and Evaluation of the Retrieval System with the Structured Information

To evaluate the methods developed for utilizing the structured information in DRed graphs, further development of the keyword-based retrieval system has been carried out. The same programming technique is used for the implementation of the retrieval methods introduced in this section. Some methods, e.g. filtering retrieved results based on their types, are developed by making changes to the existing code, whereas others, e.g. finding nodes groups and matching complex queries, are developed independently and integrated into the retrieval system. The new prototype system is still a standalone application and can be extended as a Web-based collaborative tool and integrated with the DRed tool once the methods are deemed to be valid and helpful. The framework of the prototype system is shown in Fig. 4.16, and a snapshot of its GUI is shown in Fig. 4.17. The results for finding DRed nodes with particular types are shown in Fig. 4.18. Taking ‘dust cap’ as an example, the retrieval results for this query include four issue nodes, twelve answer nodes, six pro-argument nodes, and three con-argument nodes. By informing the retrieval system of the specific type of nodes of interest, users are able to focus on only a few results without being frustrated by the total of twenty-five results. This advantage becomes more pronounced when the number of results retrieved is very large. Such a way of filtering information naturally fits the major motivations, discussed in Sect. 4.3.1, with which designers seek to find and reuse design knowledge. The status information of DRed nodes can be used similarly to filter retrieval results. In practice, both type information and status information are used together to further narrow down the result space.
Though the combination of different types and different statuses can lead to scores of retrieval options, only four useful options, namely, resolved issues, accepted answers, holding pro-arguments, and holding con-arguments, are included in the current version of the prototype system, as too many options will adversely affect the ease of use of the system. Retrieval results for DRed nodes with particular types and particular statuses are shown in Fig. 4.19. As shown in the figure, results are further filtered so that users can more precisely find the DR of interest. The retrieval of nodes groups in essence involves three steps. Firstly, several sets of DRed nodes are found from the results of a keyword-based retrieval. For each of



Fig. 4.16 Framework of the prototype system which utilizes structures within DRed graphs

the nodes in a set, another node which is also in the same set and whose distance to this one is less than a pre-defined value should exist. This ensures that all the nodes in the list are at least to some extent associated. Secondly, more nodes will be added to a set to construct a complete tree structure so that every node in the set is connected to another. In this way, a complete tree can be constructed and returned as the result to a query. Thirdly, the trees constructed in the last two steps will be ranked, with the top tree in the ranked list displayed on the GUI. The GUI for showing retrieval results as nodes groups is shown in Fig. 4.20. As shown in the figure, two nodes groups are retrieved for the query ‘coverplate flange move’ and the first one is shown on the GUI as a tree with its eleven nodes shown as rectangular blocks. The type and status of each DRed node is shown on top of the block, and the contents of DRed nodes are shown in the block with keywords highlighted. When the cursor is moved over a block, the complete contents are shown. Right-clicking on a block triggers a popup menu which enables users to open the entire DRed file, as shown in Fig. 4.21. The ranking of nodes groups can take into account a number of principles. Firstly, the information contained by a nodes group should match the query as closely as possible, i.e. it should include as many of the keywords as possible. Moreover, the keywords included in a nodes group should appear in different DRed nodes as far as possible. Secondly, the number of nodes in a nodes group should be reasonably small. If the number of nodes is too large, then there is no point in finding nodes groups, and the retrieval system can simply refer the user to a DRed file. On the other



Fig. 4.17 Retrieving DRed nodes using the prototype system

hand, the information contained by a nodes group may not be complete if too few nodes are enclosed. A reasonable range for the number of nodes is between four and twelve. Actually, the number of nodes can be adjusted by changing the parameter N (the maximum distance for which two nodes are deemed to be associated) in the algorithm shown in Fig. 4.13. Thirdly, the types of nodes in a nodes group should be as diverse as possible. Generally, a nodes group containing issue, answer, and argument nodes is deemed to be more informative than one containing only arguments. The queries used for three testing retrievals, together with the most useful nodes groups returned by the system, are shown in Table 4.7. The keywords used are extracted from existing DRed graphs and the parameter N is set to be three. In all the retrievals, the original DRed graphs from which keywords are extracted come at the top of the results list and some others which partially cover the keywords are also included in the list. The parameter N is critical as it determines the number of possible results that the retrieval system can find. A comparison of retrievals using the same query (‘seal bearing lubricant contact point’) but different values for N (one, two, and three) is shown in Fig. 4.22. When N is set as one, the retrieval is very



Fig. 4.18 Retrieval results with particular types (example nodes shown include ‘Dust cap’, ‘Stress defence’, ‘anti score plates’, ‘washer integrity’, and ‘ice impact’)

Fig. 4.19 Retrieval results with particular types and statuses (example nodes shown include ‘Dust cap’, ‘Stress defence’, and ‘anti score plates’)

strict and only returns one nodes group in which any single relevant DRed node is directly connected to another one in the group. Two groups are found when N is set as two, whereas one more result is found when the constraint is further relaxed by setting N as three. As demonstrated in the test illustrated in Fig. 4.22, the algorithm developed can successfully find different sets of results when different constraints on the retrieval



Fig. 4.20 The GUI for showing nodes groups as retrieval results

Fig. 4.21 Click on the popup menu to open a DRed file where the nodes groups exist

are specified. The value of N can be chosen based on how closely the user would like the nodes in a group to be associated. This also opens an opportunity for developing an intelligent algorithm which is able to adapt to specific contexts of retrieval by setting N dynamically, such that a reasonable number of results can be returned to users.



Table 4.7 Nodes groups retrieved in response to some queries extracted from DRed files (N = 3)

Queries used (the nodes groups retrieved were displayed on the GUI):
‘dust cap loss maintain temperature’
‘seal bearing lubricant contact point’
‘noise liquid fuel’






Fig. 4.22 Retrieval results with different settings for the maximum distance N

It is noteworthy that not all the nodes in a group are really useful; see, for example, the second result in Table 4.7. There are some nodes with contents such as “How to solve?” or just “How?”. Removing such nodes from a nodes group does not affect its meaning, but actually makes the rationale easier to follow, in the sense that the useful nodes are associated more closely. A straightforward method is to ignore these nodes when drawing them on the GUI. Figure 4.23 shows a comparison of the results obtained from retrievals with and without such a method. The number of nodes is reduced from ten to seven, and as such the whole tree can be shown in the window. A better method is to do the filtering at the pre-processing stage so that run-time performance is not affected at all, i.e. ignoring these nodes when the DRed parser scans through DRed files and establishing direct connections between their parents and their children. As evidenced in the development and evaluation, the proposed methods are feasible and the algorithms developed for them are effective. Utilizing the structured information in DRed graphs, which is discarded in other keyword-based retrieval systems operating on DRed graphs (e.g. the prototype system introduced in Sect. 4.2 and that developed by [9]), can greatly improve retrieval performance. The use of complex queries is very helpful for the retrieval of DRed graphs and moves a big step beyond simple keyword-based searching. Though the proposed methods are based on DRed graphs, they can all be easily extended and generalised for the retrieval of other types of structured design knowledge. The work presented in this section opens a new field for retrieving design knowledge




Fig. 4.23 Removing the DRed nodes without much information from a nodes group

where an extensive search of related publications revealed that no similar work has been done.
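The pre-processing filter for uninformative nodes described in this section might look like the following sketch; the stop-phrase list and the tree representation are hypothetical:

```python
STOP_PHRASES = {'how?', 'how to solve?'}

def prune_uninformative(children, contents):
    """Drop nodes whose contents carry no information (e.g. 'How?') and
    reconnect their children directly to their parents.  'children' maps
    node id -> list of child ids; 'contents' maps node id -> text."""
    def useful(node):
        kept = []
        for c in children.get(node, []):
            if contents[c].strip().lower() in STOP_PHRASES:
                kept.extend(useful(c))   # skip the node, lift its children
            else:
                kept.append(c)
        return kept
    return {n: useful(n) for n in children
            if contents[n].strip().lower() not in STOP_PHRASES}
```

Running this when the DRed parser scans the files, rather than at query time, keeps run-time performance unaffected, as the section suggests.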

4.4 Towards Semantic Retrieval of Knowledge Model

The keyword-based retrieval method discussed in Sect. 4.2 provides solutions for constructing indexes and matching DRed nodes to given queries using the indexes



constructed. An improved method was therefore developed in Sect. 4.3 to augment keyword-based retrieval by taking into account the implicit structures in DRed graphs. This enabled the retrieval system to find groups of nodes as results, to use structured queries to express complex needs, and to filter DRed nodes based on their types and statuses. As demonstrated in its implementation and evaluation, this improved method helps designers find more precise results and better understand the meanings of those results. Nonetheless, the semantic information (both within a node and implied by a structured group of nodes) is overlooked in the two methods developed so far, and this information has the potential to enable a retrieval system to recommend more interesting results and provide better assistance for designers to quickly decide whether the results found are genuinely useful.

Semantic search is a hot topic in the IR domain and aims to improve the precision of retrieval by understanding users’ intents and the contextual meaning of terms as they appear in searchable information spaces. This approach is also very promising for improving the searching of DRed graphs, as the information they contain is mainly described as plain text. Contexts play an important role in the semantic retrieval method for DRed graphs. On the one hand, the context of a particular term can help the retrieval system to understand its position and usage and thus to infer its precise meaning. On the other hand, a designer’s working context can be utilized by the retrieval system to infer their intent in retrieving information. Clear benefits can be achieved if these contexts are captured and properly utilized. In this section, the semantic information in DRed graphs and users’ working context are analyzed, and the development of a semantic retrieval method is described.

4.4.1 Extracting Concept Categories Information

The purpose of developing concept categories is to enable the retrieval tool to recognise the category of a word or a phrase, and thus gain a better understanding of the query given to enable optimised searching. For instance, if the retrieval tool receives the query “clevis material” and recognises that ‘material’ is a concept category, then it will try to find DRed nodes relevant to ‘clevis’ that contain words under the ‘material’ category. The retrieval tool can thus return a node containing ‘clevis’ and ‘steel’ as a result, even though the word ‘material’ does not appear in this node at all. Moreover, the DRed parser can better understand the purpose of a node if many of the words or phrases in the node are found to be under pre-defined categories. For instance, if a person’s name (under the ‘people’ category) and the word ‘requirement’ (under the ‘requirement’ category) are found in the same node, then this node is very likely to be about the requirement specified by that person. Based on the above discussion, thirteen categories have been developed by the author and are shown as a concept map in Fig. 4.24. These concepts are typical of engineering design and cover most of the elements (e.g. people, component, function, requirement, and material) of the design process. The relationships between these concepts are represented as links between the nodes, each of which represents a


Fig. 4.24 Concept map for the product design process for which design rationale is captured (nodes include Organisation, Project process, Component, Material, Manufacturing, and Physical property, linked by relationships such as ‘belong to’, ‘take part in’, ‘consist of’, ‘proposed for’, ‘used for’, ‘require’, ‘fulfilled by’, ‘implemented by’, ‘determine’, ‘evaluate the feasibility of’, and ‘measure and calculate’)

concept. Illustrations of the concepts developed are listed in Table 4.8, together with some simple examples. A semantic retrieval system can utilize these relationships to understand the content of a node and infer the user’s purposes, so as to automatically recommend useful and relevant information.

The work of assigning concept categories to different words and phrases is complicated due to the complexity of the English language. The criteria for concept-to-word assignment mainly include accuracy and efficiency. Manual assignment is tedious, though good accuracy can be achieved. Another method is to use computation, e.g. machine learning, to perform automatic identification, which does not impose much burden on designers but requires well-defined rules. This work is focused on using the concept categories to improve retrieval performance, and the study of concept-to-word assignment is preliminary. A method is developed by the author which combines manual assignment and assignment using computation. Firstly, items for some categories can be added manually. For example, data about materials, manufacturing methods, and names of products/projects can be obtained from the manuals and reports produced at the collaborating company. Secondly, some other categories can be identified automatically from DRed graphs, and this method will be discussed in the next section. Currently, the computing method is still very simple and the assignment is mostly done manually.

For some concept categories, such as function and analysis, it is hard to develop a standard data bank similar to those for materials and manufacturing methods. It is therefore useful to develop methods for identifying and extracting information from DRed files. A summary of the methods for different concept categories is shown in Table 4.9. Specifically, names of people and organisations can be found in “stakeholder mapping” and “WBS” diagrams. Names of tasks can be extracted from the ‘task’ nodes in “WBS” and “network” diagrams. Component names can be found in “functional analysis” diagrams, from which functions can also be identified.
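The ‘clevis material’ behaviour described in Sect. 4.4.1 could be sketched as follows; the category data here is purely illustrative, and in practice would be built manually or extracted from DRed files:

```python
# Hypothetical category data; real entries would come from company
# manuals and reports or be extracted from DRed diagrams.
CATEGORY_TERMS = {
    'material': {'steel', 'titanium', 'ceramic'},
}

def interpret_query(query):
    """Split a query into plain keywords and recognised concept categories."""
    keywords, categories = [], []
    for word in query.lower().split():
        (categories if word in CATEGORY_TERMS else keywords).append(word)
    return keywords, categories

def node_satisfies(node_text, keywords, categories):
    """A node matches if it contains the plain keywords and, for each
    requested category, at least one term belonging to that category."""
    words = set(node_text.lower().split())
    return (all(k in words for k in keywords) and
            all(words & CATEGORY_TERMS[c] for c in categories))
```

With this, a node containing ‘clevis’ and ‘steel’ satisfies the query ‘clevis material’ even though the word ‘material’ never appears in the node.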



Table 4.8 Concept categories developed for design rationale retrieval

Organization: organizations involved in the development, e.g. collaborators, suppliers, development teams, etc. Examples: training department, fuel labs, RRC fluid systems
People: people’s names. Examples: not listed deliberately
Product: names of products. Examples: not listed deliberately
Project process: names of projects. Examples: not listed deliberately
Task: names of the tasks involved in the design process. Examples: fluid research, experiment evaluation, rig concepts
Component: components involved in the design. Examples: shaft, pin, dog, ring, bar
Concept: concepts developed. Examples: rig, plate, vanes
Material: material used for the design. Examples: steel, titanium, ceramic
Manufacturing: manufacturing and assembly methods. Examples: screw, fixture, mount
Physical property: physical properties involved in the analysis of design solutions. Examples: area, torque, stress, load, capacitance, displacement
Analysis: methods used for root-cause analysis or the analysis of design solutions. Examples: calculation, calibration, cause, measure, evaluation
Requirement: requirements for the project. Examples: requirement, mandatory, shall, optional
Function: functions of components. Examples: transmit, supply, apply, force, torque, operational

Words for requirements and analysis can be extracted from "requirement capture" diagrams and "RCA" diagrams respectively. In the context of DR capture and retrieval, a user's context of working is mainly about problem solving, which involves two aspects as discussed in Sect. 4.3. A user's context of working can be inferred in a number of ways; this research focuses on a method that utilizes the rich textual information in DRed graphs. If the retrieval tool is developed as an add-on for DRed, then the user's context of working can be inferred from the node he or she is creating. Moreover, the nodes recently created by the user can also indicate what problem they are trying to resolve and what solutions they have developed. If the retrieval system runs separately, a similar method can still be developed to analyse the node currently or recently viewed by the user on the system.

4.4.2 Extracting Context Information

The DR space is captured as a large graph with dependencies. The hierarchy of this graph indicates the order in which nodes are created during a problem-solving process, in which context is propagated from high-level nodes to lower-level ones. Generally, the top node of a graph carries the context of the whole project, e.g. "electric power car".


4 Collaborative Design Knowledge Retrieval

Table 4.9 The extraction and identification method for different concept categories

Concept category | Extraction and identification method
Organization | Words and phrases for the 'organization' category can be found in stakeholder mapping diagrams. The children nodes of the node "External Stakeholders" (or the node "Internal Stakeholders") generally have contents about organisations, as shown in Fig. 4.25
People | People's names can be found from stakeholder mapping diagrams as shown in Fig. 4.25, together with their roles in the project and their expertise. Some 'task' nodes in "WBS" diagrams also include people's names
Task | Names of tasks can be identified from the 'task' nodes in WBS diagrams and network diagrams
Component | Names of components are mainly used in the 'block' nodes in functional analysis diagrams and the 'answer' nodes in concept generation diagrams. Moreover, they can also be found in other diagrams such as requirement capture, so it would be helpful to develop a data bank to which names of components can be added incrementally
Concept | Concepts can be found from the 'answer' nodes in concept generation diagrams
Requirement | Requirements can be extracted from the 'answer' nodes in requirement capture diagrams, e.g. "maximise the repeatability of the calibration process" and "minimise calibration inaccuracies". Moreover, information about the requirement category (e.g. mandatory, optional) and the person who raised it can also be found from requirement capture diagrams
Analysis | Words describing a problem can be found from 'answer' nodes and words about its causes can be found from 'argument' nodes
Function | Functions can be found from the 'relationship' nodes in functional analysis diagrams, e.g. "transmit force for"

[Figure: part of a "stakeholder mapping" diagram, with callouts indicating organisation names, people's names and contact information, and links to more information.]

Fig. 4.25 Part of a "stakeholder mapping" diagram with names of organisations and people



This project context is then inherited by the files derived from the top file in which the top node exists. Thus the context propagates through the graph to the bottom nodes, which generally contain very detailed information. All the nodes in a DRed file take the context of this file whilst providing contextual information for each other. A node's context provided by other nodes in the same file depends upon its dependencies with those nodes. Therefore, two kinds of contexts for DRed nodes can be identified, namely hierarchy context and relationship context.

The picture on the left of Fig. 4.26 shows the two kinds of contexts. Each circle represents a DRed file, with the outer circles meaning higher-level files. Contexts of DRed files propagate from the outer circles to the inner circles level by level, and are then taken by all the nodes in the files. The innermost circle means the lowest-level file with three nodes, each of which not only takes the context of the file but also provides relationship context for the other nodes. As shown on the right of Fig. 4.26, the connection between two DRed files is made by tunnel linking, which explicitly indicates the propagation of context from DRed file X to file Y. Moreover, the tunnel linking actually bridges two DRed nodes, e.g. Nodes D and E in the figure, both of which are also an important part of the context propagation process.

In this research, a context object is embodied as a number of words and the context for a specific file is represented as a chain of different context objects. Taking the right part of Fig. 4.26 as an example, assume that DRed file X is derived from the top file for requirement management and that DRed file Y is derived from file X with the purpose of capturing the requirements raised by Engineer W; the context chain for this example is shown in Fig. 4.27.

[Figure: on the left, concentric circles represent DRed files; the outermost circle is the top file carrying the project context, a level-2 file carries context about the purpose of the nodes created in it and the files derived from it, and context propagates inwards to the lowest-level file, whose three nodes (A, B, C) provide relationship context for each other. On the right, DRed file Y is derived from DRed file X via a tunnel link between Node D and Node E, indicating the propagation of context from X to Y.]

Fig. 4.26 Illustration of hierarchy context and relationship context



[Figure: the context chain for the example. Context A (project context): electric power car engine design project. Context B: requirement capture management. Context C: what are the requirements for the project (the content of Node D of DRed file X, from which DRed file Y is derived). Context D: what are the requirements from Engineer W (the content of Node E of DRed file Y, to which DRed file X is linked). Context of file X: Context A + Context B. Context of file Y (context of the whole branch): Context A + Context B + Context C + Context D.]

Fig. 4.27 An illustration of the context chain

In Fig. 4.27, Context A is the project context object and Context B is the context object for all the files about requirement management. Context C is the context object containing all the words in Node D, and Context D contains all the words from Node E. The context of DRed file X is the chain from Context A to Context B and can be defined by merging the words in Context A and Context B, i.e. "electric power car engine design requirement capture management". The context of file Y can be derived by merging the context of file X and the words from the two nodes involved in the tunnel linking between file X and file Y, i.e. the chain from Context A through to Context D. Each file only needs to store the end context object (e.g. Context B for file X) to enable the parser to trace back all the other objects involved.

Unlike a hierarchy context, which mainly concerns the propagation of context at the DRed file level, a relationship context is aimed at identifying the context for a DRed node at the micro level, i.e. the relationship between any two nodes. The various types of dependency between two nodes have been discussed in Sect. 4.3. In theory, any two nodes in a large DR can be associated in some way. In this research, only cases in which two nodes have an explicit dependency upon each other are utilized for relationship context.

In summary, in the context of DRed graph retrieval, the contextual information of a DRed node can be obtained by a synergy of the context propagated from higher-level nodes and the context provided by other nearby nodes. Currently, the prototype retrieval tool is used separately, and as such the focus is to analyse a user's use of it and infer their knowledge needs based upon the evidence obtained from the analysis. It is noteworthy that the analysis of the nodes viewed via the retrieval system is the same as that performed on the nodes created in DRed.
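The storage scheme described above, where each file keeps only its end context object and the parser traces back through the chain, can be sketched as follows. The class and field names are illustrative, not the book's actual Java classes.

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of the context-chain idea from Fig. 4.27: each context object
// holds a few words and a link to the object it was derived from, so a file
// stores only its end context object and the full context is recovered by
// tracing back through the chain.
public class ContextChain {
    public static class ContextObject {
        public final String words;          // e.g. "requirement capture management"
        public final ContextObject parent;  // null for the project (top) context
        public ContextObject(String words, ContextObject parent) {
            this.words = words;
            this.parent = parent;
        }
        // Merge the words of the whole chain, from the project context down.
        public String mergedContext() {
            List<String> parts = new ArrayList<>();
            for (ContextObject c = this; c != null; c = c.parent)
                parts.add(0, c.words);
            return String.join(" ", parts);
        }
    }

    // Rebuild the example from the text: file X stores only Context B, yet
    // its full context merges the words of Context A and Context B.
    public static String demo() {
        ContextObject contextA = new ContextObject("electric power car engine design", null);
        ContextObject contextB = new ContextObject("requirement capture management", contextA);
        return contextB.mergedContext();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```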
Therefore, the methods developed can be easily adapted and utilized by DRed when the retrieval tool is integrated with it. Apart from the nodes currently or recently viewed by the user, the queries currently and recently used can also indicate the user's intent. For instance, suppose a query "shaft design" is used and the user is not happy with the results found, so submits another one, "shaft design steel". If the user is still not happy with the results from the second query, it is very likely that they would like to find something about "shaft design" but not relevant to "steel". On this basis, the retrieval system can automatically recommend some other relevant nodes. Moreover, a user's role in the design project could also be utilized as useful context to infer his or her knowledge needs. Such a role can be identified by analyzing the account the user used to log on to the retrieval tool, the type of design they perform, as



well as the problems they have looked at. Communication between users (e.g. text messages exchanged and emails sent and received) in such a collaborative working environment also encompasses potentially rich contextual information for inferring the needs of users, together with the sources for finding information to fulfil those needs. These are all very interesting topics which open up the opportunity of studying the integration of KM and collaborative design. They are, however, beyond the scope of this research and are left for further study.

This section summarizes the contexts identified in this research and discusses their utilization in retrieval. The context of a DRed node not only supplements its textual content but also offers relevant information for users to determine whether the node is genuinely useful. Hierarchy context is captured as a chain of context objects, each of which contains a few words. In this way, each DRed file gets a context embodied as a set of words which are then inherited by all the nodes in the file. These words are then used together with the words within each node to match the queries given by users. As such, more DRed nodes can be found as results, even though some terms used in the query are not directly contained in them. For instance, if a node has the content "ensure safety" and the context "electric power car requirement capture", then it can be returned as a result for the query "electric car requirement" even though none of the keywords appears in the node itself.

The utilisation of relationship context is mainly aimed at providing information useful and relevant for users in addition to the results found through keyword matching. The semantic relationship between two connected nodes is explicitly captured in DRed files and can be used to make meaningful recommendations. For instance, if a user is interested in an 'answer' node, then the retrieval tool can show the user the other answers developed for the same issue.
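The answer-recommendation case just mentioned can be sketched with a hypothetical node model (the `Node` class below is invented for illustration; the real DRed node classes differ):

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of relationship-context recommendation for one case in Table 4.10:
// when the node of interest is an answer, recommend the other answers derived
// from the same issue. The node model here is hypothetical.
public class Recommender {
    public static class Node {
        public final String type;     // "issue", "answer", "argument", ...
        public final String content;
        public final Node parent;     // e.g. the issue an answer was derived from
        public final List<Node> children = new ArrayList<>();
        public Node(String type, String content, Node parent) {
            this.type = type;
            this.content = content;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }
    }

    // Other answers developed for the same issue as the answer of interest.
    public static List<Node> siblingAnswers(Node answer) {
        List<Node> result = new ArrayList<>();
        if (answer.parent == null) return result;
        for (Node sibling : answer.parent.children)
            if (sibling != answer && sibling.type.equals("answer"))
                result.add(sibling);
        return result;
    }

    // Build a tiny example graph and recommend alternatives for one answer.
    public static List<Node> demo() {
        Node issue = new Node("issue", "seal leaks at high load", null);
        Node a1 = new Node("answer", "use a ceramic seal", issue);
        new Node("answer", "redesign the housing", issue);
        return siblingAnswers(a1);
    }

    public static void main(String[] args) {
        for (Node n : demo()) System.out.println(n.content);
    }
}
```

The other cases in Table 4.10 (arguments of the same type, accepted answers for a resolved issue, and so on) would be handled analogously by filtering on node type and status.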
Different cases for using the relationship context are listed in Table 4.10. These cases are based upon designers' problem-solving logic and apply in particular to the retrieval and reuse of DR. These recommendations not only help design engineers determine whether a piece of DR is genuinely useful, but also facilitate navigation of the DR space by not exposing users to useless information.

A user's context of working is mainly used to infer their knowledge needs by analyzing the node currently viewed (or recently viewed, i.e. the viewing history) and the query currently submitted (or queries recently submitted, i.e. the searching history). The knowledge needs can specifically mean terms of the different categories. In essence, the analysis methods for DRed nodes and users' queries are the same. Firstly, words in the DRed nodes or queries are checked to see whether they belong to some specific categories. Based on all the categories identified, the retrieval tool can do some simple reasoning to infer what problem-solving purposes (e.g. material selection for a component) lie behind these words. If the purpose can be inferred, the retrieval tool can search for any existing cases that match the purpose; otherwise it can still search for DRed nodes that have similar contents and contexts to the queries. In terms of the viewing (or searching) history, the retrieval tool can make comparisons between the different nodes viewed (or queries used) to get the common part as well as the differences. Based on the common part and the differences, some reasoning work can be done to infer what is missing in the nodes recently viewed (or queries



Table 4.10 Utilisation of the relationship context for different situations

The node of interest | Recommendation made by the retrieval tool
An issue | Other issues derived from the same parent can be recommended. If this issue's status is 'resolved', then the accepted answers can be used to inform the user how the issue was resolved. Otherwise, the set of answers that failed to resolve this issue can be recommended
An answer derived from an issue | Other answers developed for the same issue can be recommended. If the answer of interest is accepted, then the arguments supporting the answer can also be recommended. Otherwise, the accepted answer can be particularly recommended
An argument | Other arguments derived from the same parent can be recommended. The arguments of the same type (either for or against an issue or answer) can be particularly recommended
A requirement item | Information about who raised this item and what category (functional or operational, mandatory or optional) the item belongs to can be recommended
A component | Information about the function of this component
A node about the fact that somebody does something | Information about the role (e.g. combustion expert), the team (e.g. training department), and the contact information of the person concerned can be used to help the user understand the fact described in the node
A node about a reason or a problem | If the node is about a reason, then show the problem to the user. Otherwise, show all the reasons to the user
A node about a design task | Recommendations can be made to inform the user about the higher-level (or lower-level) tasks directly associated with this one, as well as to show other tasks that need to be completed before or after this one

recently used). With the missing information, the retrieval tool can recommend more relevant nodes, or do informed searching.
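The comparison of consecutive queries described above (getting the common part and the differences) can be sketched with simple set operations. This is an illustrative helper, not the book's implementation; the "shaft design" example from earlier in the section is reused.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// A sketch of inferring intent from the searching history: if the user
// extends "shaft design" to "shaft design steel" and is still unsatisfied,
// the common part ("shaft design") is likely the stable need, while the
// rejected refinement ("steel") can be de-emphasised in recommendations.
public class QueryHistory {
    public static Set<String> terms(String query) {
        return new LinkedHashSet<>(Arrays.asList(query.toLowerCase().split("\\s+")));
    }

    // Terms shared by both queries: the stable part of the user's need.
    public static Set<String> commonPart(String q1, String q2) {
        Set<String> common = terms(q1);
        common.retainAll(terms(q2));
        return common;
    }

    // Terms tried in the later query but implicitly rejected by the user.
    public static Set<String> rejectedTerms(String earlier, String later) {
        Set<String> diff = terms(later);
        diff.removeAll(terms(earlier));
        return diff;
    }

    public static void main(String[] args) {
        System.out.println(commonPart("shaft design", "shaft design steel"));
        System.out.println(rejectedTerms("shaft design", "shaft design steel"));
    }
}
```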



4.4.3 Utilisation of Semantic Information

A Framework for the Semantic Retrieval of DRed Graphs

A semantic retrieval method not only searches the words in DRed nodes but also utilizes their contexts and the user's context of working. It involves two key issues, namely the identification of contexts and the assignment of different concept categories to words and phrases. The development of a semantic retrieval system is very complicated and, as suggested by the title of this section, the focus of this research is to make a proposal that identifies the issues to be resolved and to take the first step towards the semantic retrieval of DRed graphs by developing a methodology and in turn implementing a prototypical system.

It is useful first to develop a framework for the semantic retrieval system so that the various issues and resources can be identified and related, as shown in Fig. 4.28. Apart from resources such as the data banks for concept categories and the two kinds of contexts, there are two core methods in the framework, namely information processing and the searching strategy. The former is an add-on to the functionality of the DRed parser, and identifies the relationship and hierarchy contexts for DRed nodes. It also updates the data banks for different concepts (e.g. 'material' and 'component') by extracting useful information from DRed files using the methods discussed in Sect. 4.4.1. In addition, it can do some statistical learning from existing classified data so as to automatically categorise unclassified data found in nodes that are parsed subsequently. This automatic classification is especially useful for a large dataset, which will be the next step for the semantic retrieval tool as currently the dataset used is still small.

Fig. 4.28 A framework for the semantic retrieval tool
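The statistical-learning idea mentioned above can be illustrated with a toy voting classifier: count how often each word appears in phrases of each known category, then let the words of a new phrase vote for the category they are most associated with. This is a hypothetical sketch with invented training data, not the method actually used by the tool.

```java
import java.util.HashMap;
import java.util.Map;

// A toy sketch of learning concept categories from classified data and
// applying them to unclassified phrases found in subsequently parsed nodes.
public class PhraseClassifier {
    private final Map<String, Map<String, Integer>> wordCategoryCounts = new HashMap<>();

    // Record how often each word occurs in phrases of a known category.
    public void train(String phrase, String category) {
        for (String w : phrase.toLowerCase().split("\\s+"))
            wordCategoryCounts
                .computeIfAbsent(w, k -> new HashMap<>())
                .merge(category, 1, Integer::sum);
    }

    // Classify a new phrase by majority vote of its known words;
    // returns null when no word has been seen before.
    public String classify(String phrase) {
        Map<String, Integer> votes = new HashMap<>();
        for (String w : phrase.toLowerCase().split("\\s+")) {
            Map<String, Integer> counts = wordCategoryCounts.get(w);
            if (counts != null)
                counts.forEach((cat, n) -> votes.merge(cat, n, Integer::sum));
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        PhraseClassifier c = new PhraseClassifier();
        c.train("steel shaft stress calculation", "analysis");
        c.train("titanium ring torque evaluation", "analysis");
        c.train("transmit force for", "function");
        c.train("supply torque to", "function");
        System.out.println(c.classify("stress evaluation"));
    }
}
```

A real system would need far more training data and a proper model, but even this simple scheme shows how manually classified entries can bootstrap automatic assignment.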

Searching Strategy for Semantic Retrieval

The searching strategy is the key to finding genuinely useful information based on an understanding of a user's needs. It mainly encompasses three modules, namely the automatic recommendation of relevant information, the mapping of contexts, and the understanding of users' contexts. Specifically, the understanding of users' contexts aims to infer their intents by analyzing the queries used and the nodes recently viewed. Based on this understanding, extra information can be obtained and used to inform the search. The mapping of contexts is utilized to achieve precise retrieval by mapping users' contexts of working to the contexts of DRed nodes. The automatic recommendation of relevant information is done by inviting users to look at other nodes relevant to, and helpful for the understanding of, the one currently viewed. Meanwhile, the use of visual information such as pictures of sketches and CAD models is an important feature of engineering design. Therefore, the pictures relevant to a piece of DR (shown as other information in the figure) can also be recommended to help users better understand a design problem or solution.

The searching strategy for semantic DRed retrieval in essence involves the steps to be taken in the searching process and the strategy for information processing and utilization, so that useful DRed nodes can be found as per the user's query and relevant information can be recommended for each result. The query is still embodied as a few keywords used to express the user's knowledge needs, and as such the methods introduced in Sect. 4.2 can all be applied here. In addition to matching the keywords used, the searching task is well informed by using the semantic information (i.e. the contexts and the concept categories constructed via pre-processing) in DRed graphs. Figure 4.29 shows a flowchart for the semantic searching strategy, which indicates the process of finding, ranking, and recommending DRed nodes.
The searching process starts by searching the context objects for the keywords in a query and then continues to find useful DRed nodes from the DRed files related to the context objects. The second step mainly consists of three branches which deal with three different situations, namely Situations A through C in the figure. Specifically, Situation A means that context objects containing all the keywords in the query can be found. The method of finding such context objects is Boolean searching, in which a set is found for each keyword and the intersection of all the sets found is used as the result set. The next step in Situation A is to analyse all the files related to the context objects and check all the nodes within them. In theory these nodes are all relevant to the query, but they need to be ranked as they may be numerous. There are three criteria for determining how much priority should be given to the nodes, as shown in the figure. Firstly, the nodes containing some keywords in the query will be given high priority, as their contents are related to the query as well as their contexts. Secondly, the nodes with words that belong to the same categories as



[Figure 4.29, reconstructed as text. The flow starts with one common step, branches into three situations, and converges on shared final steps.

Step 1: Search the context space using the method introduced in Chapter 4 and rank the results based on the degrees to which the context objects match the query given.

Situation A (context objects containing all the terms in the query are found): find the files related to the context objects in the way discussed in Sections 6.3.1 and 6.3.3; process the DRed nodes in the files found and generate a summary for each of them. Priorities for ranking the nodes are given in the following order: (1) nodes containing the terms used in the query; (2) nodes containing terms that belong to the same categories as some of the terms used in the query; (3) nodes with useful contents (e.g. design solutions). Then infer the user's context of working and re-rank the results as per the context.

Situation B (context objects containing most of the terms but missing one or two): find the files related to the context objects in the same way; process the DRed nodes in the files found and generate a summary for each of them. Priorities for ranking the nodes are given in the following order: (1) nodes with the names of the projects they belong to appearing in the query; (2) nodes containing the terms used in the query; (3) nodes containing terms that belong to the same categories as some of the terms used in the query; (4) nodes with useful contents (e.g. design solutions). Then filter and rank the results based on the inferred context of working.

Situation C (no or very few context objects can be found from the query): infer the user's context of working; perform keyword-based retrieval based on the methods discussed in Chapter 4; filter the results based on the inferred context of working; generate a summary for each node based on the understanding of the semantic information.

Final steps (all situations): for each result concerned, generate the related-nodes list and the list of nodes that offer detailed information; get the pictures associated with some nodes; present the results on the user interface.]

Fig. 4.29 A flowchart showing the searching strategy

some of the keywords in the query will be given lower priority. Thirdly, the nodes with contents useful for their context (e.g. the context is about 'solution generation' and the nodes are about answers) will be given still lower priority. In Situation B, no context objects can be found that include all the terms, and as such the ones that include most of the terms, with only one or two missing, will be collected and returned. The method used for Situation B is similar to that for A, that is, finding all the files related to the context objects collected and analyzing all the nodes involved in those files. The criteria for giving different levels of priority to the



nodes are similar, though in Situation B the focus is to find nodes as relevant to the query as possible. Moreover, the number of keywords included in the context of a node is used as a very important criterion for determining the usefulness of the nodes found. In Situation C, very few or none of the terms can be found in the context objects, and thus it is difficult to find useful nodes from just a few files related to context objects that match the query well. As such, the keyword-based retrieval method is utilized for this situation: the user's context is inferred based on the terms in the query and utilized for filtering the results obtained from keyword-based retrieval. The searching of the context space is based upon the methods proposed in Sect. 4.4.2 for identifying and utilising context. The inferring of the user's context is an important step in the searching process, especially in Situation C where none of the context objects matches the query well, and can be done by analyzing the use of the terms in the query based upon the concept categories developed.

In addition, semantic information can also be used for generating summaries for DRed nodes as well as for making automatic recommendations. The idea of generating a summary for each node is discussed in Sect. 4.4.1 and is mainly about showing the type and status of a node. With semantic information, the summary of a node can contain more information (e.g. the purpose of the node and the project for which it was created). This work can be done while checking and ranking the nodes found in DRed files whose contexts match the query. Likewise, the recommendation of related nodes can also be more meaningful if the retrieval system understands the purposes of the nodes concerned.
For instance, if an argument node is found as a result item, other arguments for the same issue (or answer) can be recommended as well, even though they are not directly associated with it in the graph.
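The Boolean searching of context objects described earlier (a set found for each keyword, with the intersection of all the sets used as the result set) can be sketched over an inverted index. The index contents below are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A sketch of the Boolean search used for Situation A: an inverted index
// maps each keyword to the set of context-object ids containing it, and
// the result is the intersection of the sets found for every query keyword.
public class BooleanContextSearch {
    public static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("\\s+")) {
            Set<Integer> postings = index.getOrDefault(term, new HashSet<>());
            if (result == null) result = new HashSet<>(postings);
            else result.retainAll(postings);   // Boolean AND
        }
        return result == null ? new HashSet<>() : result;
    }

    // A made-up index: context objects 1-4, two indexed terms.
    public static Map<String, Set<Integer>> sampleIndex() {
        Map<String, Set<Integer>> index = new HashMap<>();
        index.put("shaft", new HashSet<>(Arrays.asList(1, 2, 3)));
        index.put("design", new HashSet<>(Arrays.asList(2, 3, 4)));
        return index;
    }

    public static void main(String[] args) {
        System.out.println(search(sampleIndex(), "shaft design"));
    }
}
```

When the intersection is empty or nearly so, the search would fall back to the Situation B and C strategies described above.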

4.4.4 Implementation and Evaluation

The implementation of the semantic retrieval tool is based upon the development done for the last two sections. As mentioned above, the code for index construction and keyword-based searching can be reused. In addition, the GUI can be used again, and only some panels for showing semantic retrieval results and their relevant information need to be developed. The DRed parser introduced in Sect. 4.2 needs to be modified, as the format of the DRed files analyzed in this section differs from those created using earlier versions of DRed. The modification needed is not large, as the DRed parser program has a modular structure and changes are only necessary in some of the modules, e.g. the module for reading DRed files and the module for extracting information. The code is again written in Java on the Java 6 update 23 platform. Java is an object-oriented language in which all data are defined and operated on in Classes. The Classes for DRed project, DRed file, and DRed node are all modified to deal with the new information storage and processing functionalities. Moreover, some new



Classes such as the "Context" Class and the "Concept Category" Class are developed to process data and perform operations for contexts and concept categories. A number of algorithms have been developed to implement the semantic retrieval, some of which are shown in Fig. 4.30. The pseudo code for the semantic retrieval algorithm is shown at the top of the figure; it invokes three other functions shown below it. The "split" function used at the top of the figure breaks a long string into several strings at the spaces, and the Boolean retrieval method is the one introduced in Sect. 4.2. It should be noted that many details of the algorithms are left out, and only the key operations are shown in the figure.

The retrieval tool was run and tested on an HP EliteBook 8440p laptop with an Intel Core i5 CPU (2.40 GHz) and 8 GB of memory. The average time for parsing the DRed files in the dataset introduced in Sect. 4.3.1 is about 800 ms. The average time for pre-processing the information, including analyzing the contents of nodes, identifying the relationships between nodes, and identifying contexts, is about 4100 ms. The time taken for complex situations (finding about 200 results and obtaining the relevant nodes and detailed information for these results) in semantic retrieval is about 200 ms, while much less time (e.g. 20 ms) is required for simpler situations. The pre-processing time is relatively long as much work is done at this stage, whereas the run-time performance is good across various situations. Further optimisation of the programs developed is very possible, as this is just the first step towards semantic retrieval. The current focus is to see whether semantic retrieval can improve retrieval performance in terms of both recall and precision. As such, the time for pre-processing and running is not a key issue at this stage, though it is quite important for a retrieval system, especially when the scale of the dataset is very large.
The process of using the semantic retrieval tool is explained in Fig. 4.31 (the geometric information shown is not real). The operations are quite simple, as indicated by the icons numbered 1 through to 5 in the figure. The user only needs to input the query and press the "Go!" button, and the results will be shown in the GUI. Each result item represents a DRed node and starts with a red "D" icon. The summary of the node is shown just after the icon and the content is shown in the line(s) below. In addition to the summary and the content, a few links are listed below the content to offer further information about the node. By clicking these links, the user can open the DRed file the node belongs to, see its relevant nodes, see other nodes offering details about this node, and see the visual information involved. In Fig. 4.32, some summaries are highlighted which indicate what each node is about (e.g. a detailed design concept, a task, or a problem). In Fig. 4.33, some summaries and terms are highlighted for the query "ABC concept material" ("ABC" is the name of a project, which is deliberately not shown). The summaries are very meaningful and are thus able to inform the user about facts such as "the node concerned is relevant but comes from another project". Moreover, the tool infers the user's intent as finding something about material, so it highlights all the terms related to material.

To evaluate the performance of the semantic retrieval, a comparison is made between the results obtained using this tool and those obtained from the keyword-based retrieval discussed in Sect. 4.2. The criteria discussed in Sect. 4.2.4 are again used for assessing recall and precision. As evidenced by the comparison, the semantic



Vector performSemanticRetrieval(String query, Vector contexts) 1. Define all the data involved in this function; 2. Vector terms = split(query); Vector retrievalResults = new Vector(); 3. if(the size of terms is one) 4. do find the equivalent term of terms[0]; 5. find the set of results for a single term terms[0]; 4. else 5. do find the equivalent terms of every item in terms and update terms using the equivalent terms found; 6. do perform Boolean retrieval on nodes using query; 7. if results are found from Boolean retrieval and marked as brResults 8. retrievalResults = findDRedNodesFromBRResults(brResults); 9. else for each context in contexts 1 10. do perform the computational method on context for keyword-based searching introduced in Chapter 4 and find results contextsGotFromKS; 11. if there exists some results in contextsGotFromKS, which miss just one or terms in terms 12. do mark the vector for results that missing one term as m1Contexts, and the vector for that missing two terms as m2Contexts; 2 13. retrievalResults = findDRedNodesFromKSResults(m1Contexts, m2Contexts); 14. else infer user’s context and perform keyword-based retrieval for query and get results as retrievalResults; 15. do findRelevantNodesForResults(retrievalResults); 16. return retrievalResults; 3

Vector findDredNodesFromBRResults(Vector contexts) 1 1. Define all the data involved in this function; 2. Vector brResults = new Vector(); 3. for each context in contexts 4. do find the DRed files related to context; 5. do get all the DRed nodes in those files and analysed each node; 6. do process each node based on the its context; 7. rank the nodes using the method shown in Fig.14; 8. get detailed information (based on methods in 6.3.3) and generate summaries for the nodes; 9. put the nodes in brResults; 10. return brResults; Vector findDRedNodesFromKSResults(Vector m1Contexts, Vector m2Contexts) 1. Define all the data involved in this function; 2 2. Vector brResults = new Vector(); 3. for each context in m1Contexts 4. do find the DRed files related to context; 5. do get all the DRed nodes in those files and analysed each node; 6. do process each node based on the its context; 7. rank the nodes using the method shown in Fig.14 and give scores as per their priority; 8. get detailed information (based on methods in 6.3.3) and generate summaries for the nodes; 9. put the nodes in brResults; 10. for each context in m2Contexts 11. do the same operations in step 3 through to step 7; 12. return brResults;

Vector findRelevantNodesForResults(Vector nodes)
1.  Define all the data involved in this function;
2.  for each node in nodes
3.      do analyse the context of node;
4.      analyse the type and status of node;
5.      find relevant nodes for node using the method discussed in Section 6.3;


Italic red words: variables. Bold words: keywords in the programming language or names of functions. Italic bold words: data types in the programming language.

Fig. 4.30 A diagram showing the pseudo codes of the key algorithms developed
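The dispatch logic of performSemanticRetrieval in Fig. 4.30 can be sketched in Python. This is a minimal illustration only: the five injected callables (expand_terms, boolean_retrieval, keyword_search, nodes_from_br, nodes_from_ks) are hypothetical stand-ins for the operations named in the pseudocode, not the prototype system's actual API.

```python
def perform_semantic_retrieval(query, contexts,
                               expand_terms, boolean_retrieval,
                               keyword_search, nodes_from_br, nodes_from_ks):
    """Sketch of the control flow in Fig. 4.30; the five callables are
    hypothetical stand-ins for the operations in the pseudocode."""
    # Expand every query term with its known equivalent terms.
    terms = [t for term in query.split() for t in expand_terms(term)]
    # Try Boolean retrieval over the context objects first.
    br_results = boolean_retrieval(terms)
    if br_results:
        return nodes_from_br(br_results)
    # Fall back to keyword search over contexts, tolerating one or
    # two missing query terms.
    m1, m2 = [], []
    for context in contexts:
        hits, missing = keyword_search(terms, context)
        if missing == 1:
            m1.append(context)
        elif missing == 2:
            m2.append(context)
    if m1 or m2:
        return nodes_from_ks(m1, m2)
    # Last resort: plain keyword retrieval on the query itself.
    return keyword_search(terms, None)[0]
```

The design choice mirrored here is the cascade of strategies ordered by expected precision: exact Boolean matches over contexts first, then near-misses, then plain keyword search.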

4.4 Towards Semantic Retrieval of Knowledge Model


Fig. 4.31 The process of using the semantic retrieval tool

retrieval tool can achieve better performance in terms of both recall and precision. In addition to the words within a node, the semantic retrieval tool also examines the words in the contexts of DRed nodes, so that some nodes can be found as results even if they contain none of the terms in the query; recall is thereby improved. The retrieval may filter the results, especially when there is enough evidence to suggest that some of the results match the query much better than others. For instance, a Boolean retrieval is first used to search context objects, and if it finds some context objects, only the nodes related to these context objects are returned as results, so precision is much improved. Even if no context objects can be found by the Boolean retrieval, the algorithm still ensures that the nodes matching both the contexts and the contents are put at the top of the results list. A comparison of the results returned from three typical retrieval tests is shown in Fig. 4.34 (project names in the graphs are replaced by "ABC" and "DEF", and black blocks are used to obscure confidential information). Only some top results from the list are shown in the figure, as the list is very long in some cases. The query used for the first test is "ABC concept material", where "ABC" represents the name of the project concerned. The intent of this test is to find the issues about material in the concept design of the project. The query used for test 2 is "DEF requirement", where "DEF" is again the name of the project concerned. This query is intended to


4 Collaborative Design Knowledge Retrieval

Fig. 4.32 Retrieval results shown on the GUI with some summaries highlighted

get some results about the requirement analysis of the project. The query used for test 3 is "ABC function component", which embodies the intent to find results about the components implementing the functions of the product in the "ABC" project. In the first test, the top results returned by the keyword-based retrieval are some nodes with the content "concept", "concepts", or "ABC". This is expected, as the keyword-based method calculates a term's importance to a node by evaluating the number of times it appears in the node, as well as the number of times it appears in all the nodes. Generally, a term is very important to a node if it appears many times in the node and the node does not contain many terms. Nevertheless, these results are obviously not good ones for the query "ABC concept material". The top results returned by the semantic retrieval are much better, being part of the DRed file for a rig concept design, which is part of the "ABC" project. Moreover, the two nodes whose content concerns 'material' are returned as the top two results, which makes the test results match the query even better. The results returned in the second test are shown in row 2 of Fig. 4.34. The keyword-based retrieval successfully finds some DRed nodes containing the term "requirement", but the project name "DEF" is not included. This is understandable, as keyword-based retrieval treats "DEF" as a normal word rather than the name of a project, which clearly embodies an intention of the user. Thus the results containing "DEF" are put much lower in the list. For the semantic retrieval, some



Fig. 4.33 Retrieval results shown on the GUI with some summaries and terms highlighted

context objects are found from the Boolean retrieval, and as such only the nodes related to those objects are included in the results list, which is very short and consists of only seven nodes. Nevertheless, all seven nodes are from the DRed file capturing the requirements for the "DEF" project and precisely fulfil the query intent. Moreover, the generated summary even informs the user of the types (e.g. mandatory, non-functional) of the requirement items, which makes the user's work a lot easier. In the third test, the keyword-based retrieval once again does its job and finds some short results containing either "ABC" or "function". Further potentially useful results are placed lower down the list as they receive lower scores from the calculation. On the other hand, the top four results obtained by the semantic retrieval tool are much better. In particular, the fourth result in the list concerns the component "hole", as part of the functional analysis diagram for the "ABC" project. The summary for this node shows the function of this component, which makes it easy for the user to determine whether the node is genuinely useful. The other three results are also useful and belong to the same DRed file, which was created specifically for the functional analysis of the rig design in the "ABC" project. They are ranked at the top of the list as they contain terms about components, namely 'wall', 'motor', and 'plenum chamber'.
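The term-importance weighting described above, where a term scores highly if it occurs often in a node but rarely across all nodes, is essentially TF-IDF. A minimal sketch under that assumption (the function name and node representation are illustrative, not the prototype's code):

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, nodes):
    """Score each node (a list of words) against the query using a
    classic TF-IDF weighting, as sketched in the text."""
    n = len(nodes)
    # Document frequency: in how many nodes each term appears.
    df = Counter()
    for node in nodes:
        for term in set(node):
            df[term] += 1
    scores = []
    for node in nodes:
        tf = Counter(node)
        # Term frequency normalised by node length, weighted by
        # the inverse document frequency of the term.
        score = sum((tf[t] / len(node)) * math.log(n / df[t])
                    for t in query_terms if df[t])
        scores.append(score)
    return scores

nodes = [["concept", "material", "steel"],
         ["concept", "concept", "concept"],
         ["requirement", "analysis"]]
print(tf_idf_scores(["material"], nodes))
```

Under this weighting, a node repeating a common word like "concept" gains little for the query term "material", which is exactly the failure mode the first test exposes.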



Fig. 4.34 A comparison of the results obtained from keyword-based retrieval and semantic retrieval (queries used: "ABC concept material", "DEF requirement", and "ABC function component")

4.5 Discussion


The subsequent retrieval and reuse of DR is very important, especially as DR records accumulate and are used to replace design reports; yet little work has been done in this area. The main information contained in DRed graphs is plain text, so the first method is a simple keyword-based search of that text. The data structure developed is able to organise various pieces of information for the retrieval of DRed graphs, and the DRed parser is effective in extracting information from DRed graphs. DRed nodes are found and returned if they include any of the keywords in a query, and those including more are ranked higher in the list. Further improvements to this method also ease the use of the system. Specifically, the automatic suggestion of keywords offers good assistance to users who do not know exactly what to search for, and the automatic recommendation of other relevant DRed nodes provides information useful for understanding the meaning of the node currently being viewed.

Since the knowledge needs of designers are very predictable, it is possible to develop advanced retrieval methods which outperform the keyword-based retrieval schemes used by most knowledge retrieval systems. Two further retrieval methods have been developed and implemented in a prototype system on the basis of DRed graphs; both return DRed nodes as results to achieve fine granularity of information. The objectives in the development are three-fold. Firstly, the system should find as many results as possible, as the current dataset is not large. Secondly, the DRed nodes at the top of the results list should fulfil the user's knowledge needs as far as possible. Thirdly, the system should help users quickly determine whether a result is genuinely useful. The second method aims to utilise the structured information within DRed graphs to improve retrieval performance.
The innovative ideas include generating summaries for DRed nodes, filtering retrieval results based on type and status information, finding groups of nodes to achieve a better match, and using complex queries to enable users to better express their knowledge needs. As demonstrated in the prototype development, the algorithms developed are feasible and effective. The summary of a DRed node supplements its content and enables the user to quickly determine whether it is useful. The filtering of DRed nodes based on type and status (or a combination of both) enables the user to focus on particular nodes of interest. Finding groups of nodes not only offers useful context for each node within a group, but can also achieve a better match with the query. Complex queries not only help users to better express their knowledge needs but also enable the system to better understand those needs, resulting in very high retrieval precision.

The third method is the semantic retrieval of DRed graphs, which aims to enable the system to understand the meanings and contexts of DRed nodes, the intents behind a user's queries, and the context in which a request for information is being made. This work is complicated, and two ideas have been developed as a first step towards the long-term goal. The first idea is to develop some concept categories (e.g. material, function) typical of engineering design and assign them to the words used by designers. The second is to construct contexts for each DRed node based on



its relationships with others in the same graph. In this way, the contexts of DRed nodes are checked before their contents are searched for keywords in a query. The number of results is thereby increased, as a DRed node can be deemed relevant even if some keywords appear in its contexts but none of them appears explicitly in its content. In addition, if a DRed node's content contains words belonging to the same categories as some keywords, it can also be returned as a result. As well as increasing the number of results, this method can achieve high precision for the results at the top of the list, as the keywords match both their contexts and their contents.

In summary, the prototype system is successful and can be used as part of the information and knowledge management tool by engineering designers throughout the design process. As demonstrated in the evaluations of the prototype system, the retrieval methods developed are effective and achieve good performance, in addition to the intelligent assistance offered by the tool. For example, the second retrieval method achieves high precision when node groups are returned as results, and 100% precision can sometimes be achieved when complex queries are used. The semantic retrieval method improves the recall of keyword-based searching, and moreover achieves much higher precision (more than 70% in the tests) for the first few results returned. The last two methods go beyond those used by most current knowledge retrieval systems and open up a new field where the literature suggests that no similar work has been done, especially in the knowledge management domain. Although this work is based on DRed and DRed graphs, the methods are by no means constrained to DRed: the ideas of using the structure of DR and the context of a designer's queries are general purpose, and can be adapted and extended to other knowledge retrieval applications.
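The matching policy described above, where a node counts as relevant if query terms appear in its context, or if its content shares a concept category with a query term, can be sketched as follows. The function, the match levels, and the category map are illustrative assumptions, not the system's actual interfaces.

```python
def match_node(query_terms, node_content, node_context, categories):
    """Return a coarse match level for a DRed node, as sketched in the
    text. `categories` maps a word to its concept category
    (e.g. 'steel' -> 'material'); names and levels are illustrative."""
    content = set(node_content)
    context = set(node_context)
    terms = set(query_terms)
    if terms & content and terms & context:
        return "content+context"   # these go to the top of the list
    if terms & context:
        return "context-only"      # found despite no content overlap
    # Category match: content words in the same category as a query term.
    query_cats = {categories.get(t) for t in terms} - {None}
    if any(categories.get(w) in query_cats for w in content):
        return "category"
    return "no-match"

cats = {"material": "material", "steel": "material"}
print(match_node(["material"], ["steel"], [], cats))
```

Ranking "content+context" hits above the rest is what lets this scheme raise recall without sacrificing precision at the top of the list.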
The work reported here has some limitations, which open up opportunities for further research. Firstly, the focus has been on the effectiveness of the retrieval methods, while efficiency is less well addressed. Secondly, the data for the different concept categories are not yet large enough and mainly concern the DR records analysed in this research. Thirdly, the queries used in the evaluation were mainly formed by analysing existing DRed nodes, and further work is required to invite designers to use a wider variety of queries.

References

1. Manning, C., Raghavan, P., & Schütze, H. (2010). Introduction to information retrieval. Natural Language Engineering, 16(1), 100–103.
2. Church, K. W. (2017). Word2Vec. Natural Language Engineering, 23(1), 155–162.
3. Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., & Lee, K. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 2227–2237).
4. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186).



5. Bracewell, R., Wallace, K., Moss, M., & Knott, D. (2009). Capturing design rationale. Computer-Aided Design, 41(3), 173–186.
6. Bracewell, R. H., Ahmed, S., & Wallace, K. M. (2004, January). DRed and design folders: A way of capturing, storing and passing on knowledge generated during design projects. In International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (Vol. 46946, pp. 235–246).
7. Peng, G., Wang, H., Zhang, H., Zhao, Y., & Johnson, A. L. (2017). A collaborative system for capturing and reusing in-context design knowledge with an integrated representation model. Advanced Engineering Informatics, 33, 314–329.
8. Kim, S., Bracewell, R. H., & Wallace, K. M. (2005). A framework for design rationale retrieval. In DS 35: Proceedings ICED 05, the 15th International Conference on Engineering Design, Melbourne, Australia, August 15–18, 2005 (pp. 252–253).
9. Pantel, P., & Lin, D. (2002, July). Discovering word senses from text. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 613–619).

Chapter 5

Collaborative Design Knowledge Reuse

5.1 Overview

Modern products are an integration of multidisciplinary knowledge. According to statistics, about 70% of design knowledge can be found in historical design cases, but only 20% of that knowledge is effectively reused across the entire product lifecycle. With the advent of the knowledge economy, and in the context of shortening product development cycles and accelerating market iterations, product design places higher demands on the deep reuse of collaborative knowledge. In the collaborative design process, the main knowledge reuse requirements are the recommendation of related design knowledge, the reasoning of expected knowledge, the fusion of multi-source knowledge, and design-assisted decision-making.

Knowledge recommendation is a common knowledge service mode built on retrieval: it acquires users' knowledge preferences, provides them with more targeted knowledge services, and thereby mitigates the problem of knowledge overload. Knowledge recommendation not only recommends knowledge to users but also realises knowledge association. In addition, it can analyse users' needs in specific contexts and then make context-based recommendations. Knowledge recommendation methods can be divided into ontology-based recommendation, related data-based recommendation, and knowledge graph-based recommendation; Section 5.3 introduces these three methods in detail.

Knowledge reasoning is a knowledge computation method. It mainly includes case-based reasoning, ontology-based reasoning, collaborative reasoning of design knowledge based on the Bayesian approach, and knowledge graph-based reasoning; Section 5.4 introduces these methods in detail.

Decision support (DS) is a computer-based information technology used to support the decision-making activities of enterprises or users.
Traditional decision support has certain limitations, such as poor flexibility and adaptability and weak knowledge collaboration and relevance, while knowledge-based decision support systems

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Wang and G. Peng, Collaborative Knowledge Management Through Product Lifecycle,



5 Collaborative Design Knowledge Reuse

Fig. 5.1 The framework of this chapter (collaborative design and knowledge reuse: knowledge retrieval, comprising product design knowledge retrieval and event knowledge retrieval; knowledge recommendation, comprising ontology-based, related data-based and knowledge graph-based recommendation; knowledge reasoning, comprising case-based reasoning, ontology-based reasoning, collaborative reasoning of design knowledge based on the Bayesian approach, context-based reasoning, and knowledge reasoning in an assembly case; and knowledge-assisted decision making, contrasting traditional and knowledge-based decision support systems)

overcome the shortcomings of traditional DS. A knowledge graph will be used to replace the decision model of the traditional decision support system. In Sect. 5.5, we introduce two intelligent decision models, namely rule-based decision support and a decision model based on association analysis. Fig. 5.1 shows the framework of this chapter.
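The rule-based decision support mode mentioned here can be illustrated with a minimal forward-chaining sketch. The rules, facts and names below are invented examples, not rules from the book's system.

```python
def rule_based_decide(facts, rules):
    """Forward-chain over simple if-then rules until no rule fires.
    Each rule is a (set_of_premises, conclusion) pair; the fact and
    rule names are purely illustrative."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all its premises are known facts.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [({"high_load", "thin_wall"}, "risk_of_deformation"),
         ({"risk_of_deformation"}, "suggest_rib_reinforcement")]
print(rule_based_decide({"high_load", "thin_wall"}, rules))
```

A knowledge graph-based decision model replaces such hand-written rules with relations queried from the graph, which is the flexibility gain the text attributes to knowledge-based decision support.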

5.2 Collaborative Design Knowledge Retrieval

The retrieval of collaborative design knowledge is a functional requirement directly facing designers, and also a basic function underpinning knowledge reuse in other scenarios. In the development of new products, designers modify, adjust and generate new design schemes on the basis of previous design knowledge. Product components are mostly precision components with specific agreed parameters. For product designers, in addition to the parameters and constraints of a component itself, external knowledge resources such as related documents, pictures, videos, formulas and three-dimensional models of the component need to be quickly



obtained and consulted. How to accurately acquire the required knowledge in the process of product design has therefore become an urgent problem. As a relatively new research field, knowledge retrieval has attracted increasing attention in recent years. The main knowledge retrieval technologies and methods currently include the Semantic Web [1], context retrieval [2], knowledge visualisation [3], and ontology retrieval [4]. Among these, ontology provides a common understanding of a specific domain, which not only meets users' requirements in terms of completeness and accuracy, but also has prominent advantages in intelligence, scope positioning, and result synthesis. Therefore, ontology-based knowledge retrieval has become a focus of researchers in various fields.

5.2.1 Product Design Knowledge Retrieval

In the field of product design, many researchers have carried out studies to enable designers to accurately acquire design knowledge and previous design examples during the design process. Cao et al. [5] adopted an ontology-based product design knowledge representation method to realise green product design knowledge retrieval through semantic retrieval and text similarity matching. Tu et al. [6] adopted a three-layer mapping structure (relational index, retrieval ontology, and design knowledge) to express design knowledge, and proposed a knowledge retrieval method based on the extension of common reference relationships. Wang and Huang [7] realised knowledge retrieval and reuse in complex product design by classifying and processing design knowledge based on extension theory and the K-means clustering method, respectively. Yu et al. [8] introduced semantic relation weights into the retrieval terms, and used the similarity algorithm of the vector space model to retrieve design knowledge.

Analysis of the above methods reveals the following deficiencies: (1) the retrieval target is mostly derived from the ontology itself, which is too narrow and does not provide a complete design knowledge expression system; (2) most retrieval entries take the form of a text box, dominated by manually entered natural-language queries, and lack navigation-based retrieval; (3) the retrieval results simply appear in the form of text or pictures, which neither reflects the semantic relationships given to design knowledge in the ontology nor supports secondary retrieval of the results. To address these deficiencies, Ma [9] adopted a product design knowledge representation system with a three-layer structure, which comprises the product design ontology, an index catalogue, and external resources of product design.
On this basis, the retrieval problem is processed with natural language techniques. Using an edit-distance similarity algorithm, the canonical ontology concepts in the index database are extended to match the processed retrieval problem, and the corresponding SPARQL [10] retrieval statement is generated. Then, based on the mutual mapping between the product design ontology and the external resources of product design, product design knowledge can be acquired accurately and comprehensively.
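The edit-distance matching step can be sketched as follows. The Levenshtein implementation is the standard dynamic-programming one; the wrapper function, its threshold value, and the example concepts are illustrative assumptions, not Ma's actual system.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def best_concept(term, ontology_concepts, threshold=0.6):
    """Map a query keyword to the closest ontology concept, or None if
    no concept passes the similarity threshold (value illustrative)."""
    def sim(a, b):
        # Normalised similarity: 1 means identical strings.
        return 1 - levenshtein(a, b) / max(len(a), len(b))
    best = max(ontology_concepts, key=lambda c: sim(term, c))
    return best if sim(term, best) >= threshold else None

print(best_concept("gear", ["gears", "shaft", "bearing"]))
```

The matched concepts would then be substituted into a SPARQL query template against the product design ontology; the threshold filters out low-similarity matches, which is the hit-rate step described in the model.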



In order to acquire design knowledge accurately and comprehensively, visible navigation retrieval and multi-source knowledge retrieval should be provided. The ontology-based design knowledge retrieval model proposed by Ma [9] is shown in Fig. 5.2. Its basic ideas are as follows:

(1) Make a reasonable plan for the design tasks and define the design knowledge used in the design process, such as design principles, domain knowledge and expert experience. With the cooperation and participation of multidisciplinary designers, the product design ontology is formed.
(2) Obtain external resources related to the design, including 3D models, videos, documents, formulas, pictures, etc. The external resources of product design are stored in a database, and the corresponding concepts in the ontology are semantically annotated to realise the mapping between the ontology and the resources.
(3) Collect all the concepts about the product design, including classes and instances, to obtain the class library and instance library, forming the index directory.
(4) Perform natural language processing, such as word segmentation and stop-word filtering, to obtain keywords, which are then extended through the index directory.
(5) Through similarity calculation, a set of index matching results is obtained. A threshold is used to filter out results with low similarity and improve the hit rate of the retrieval results.
(6) Use the selected keywords to construct the corresponding SPARQL retrieval statements. Based on the mutual mapping between the product design ontology and the external resources, the knowledge retrieval results are pushed. For the knowledge in the product design ontology, a visualisation component displays it in the form of a two-dimensional mesh graph. This not only expresses the internal semantic connections of the product design ontology itself, but also makes it easy for designers to search again within the network diagram.

As can be seen from Fig. 5.2, the key technologies of the ontology-based design knowledge retrieval model are as follows: (1) ontology-based product design knowledge organisation and expression, in which the domain knowledge and related external

Fig. 5.2 Design knowledge retrieval model based on ontology (a retrieval problem is processed by natural language processing, matched against the index catalogue of class and case libraries via similarity calculation and a threshold, translated into SPARQL statements against the product design ontology, and linked to the external resources of product design to return visualised knowledge retrieval results)


resources are conceptualised and formalised according to the design task requirements to develop the ontology-based product design knowledge system, which provides a clear and comprehensive knowledge retrieval source for designers; and (2) determination, based on the edit-distance similarity algorithm, of the concept association between the design problem and the ontology, so that design knowledge is retrieved according to the intention behind the design problem.

5.2.2 Event Knowledge Retrieval

The traditional concept lattice proposed by Ganter and Wille [10] on the basis of philosophical concepts is a simplified model of knowledge representation in the real world. This simplification gives the traditional concept lattice good mathematical properties, and there have accordingly been many research and application achievements in data processing, information retrieval, knowledge engineering, software engineering and machine learning. In this simplified model, concepts are used as knowledge units, and a single concept lattice is formed by the partial order relation between concepts, which is more suitable for expressing static knowledge at the conceptual level. Events, which reflect movement and change in the real world, are composed of conceptual units of multiple roles. The single concept lattice is therefore not suitable for expressing dynamic knowledge at the event level, which limits its ability to express knowledge of the objective world.

From the perspective of the resource concept lattice, objects with common attributes constitute concepts, and events with common attributes constitute event classes. From the perspective of the event ontology, knowledge in the objective world consists of event units and concept units. Event units are represented by event individuals and event classes, while concept units are represented by entity individuals and entity classes. Therefore, the event network model, composed of event class nodes and concept nodes, is proposed. The event network takes the event class node as its core, together with concept nodes such as subject, action, time, space, proposition and result; these concept nodes form the event network with the event class nodes through semantic relations. Each concept node (such as time) can generate a conceptual hierarchy (such as a time hierarchy), and each event class node (such as a motion event) can generate an event hierarchy (such as a motion event hierarchy).
It is therefore necessary to extend the single resource concept lattice to a multi-resource concept lattice to formalise the event network. The event network has concept units and event class units, so it can express not only classification and non-classification relationships, but also the static and dynamic characteristics of things. With the event network as the core, the event ontology obtained through mapping rules is suitable for semantic reasoning and retrieval in the manner of human thinking. The event network is based on event class units and simultaneously provides knowledge expression for event units and concept units. An event is something that happens at a characteristic time and space, is composed of multiple entity roles, and has



the characteristics of action, behaviour and change. Event granularity is divided differently in different domains; for example, in the field of linguistics, there are window granularities such as sentences, paragraphs and chapters. According to granularity, events, event classes and event networks are defined as follows.

Definition 1 (Event individuals). An event individual is a structure formed by the connection of six object nodes and one event node according to various semantic relations, which can be formally expressed as: event = (action, agent, time, space, proposition, result). Action represents the action roles and attributes associated with the event behaviour. Agent expresses the subject or object roles and attributes related to the event participants. Time expresses the time role and attributes of the event. Space expresses the spatial role and attributes of the event's occurrence. Proposition expresses the propositional roles and attributes related to the event description. Result represents the result roles and attributes associated with changes in the event state.

Definition 2 (Event classes). For events consisting of six objects and their attributes, the collection of events with common attributes is an event class. Given a set E = {event_i, 1 ≤ i ≤ n} of n events whose six roles share the attributes A1, A2, A3, A4, A5 and A6, the event class can be formalised as: Event = (E, A1, A2, A3, A4, A5, A6). From the perspective of formal concepts, the event set is the extension of the event class, and the attributes of the six roles are its connotation.

Definition 3 (Event network). An event network is the graph composed of an event class node and the various concept nodes related to that event class, connected by semantic relations; it can be formalised as EN = (Event, Cro, Sr). Event represents the event class node.
Cro represents the concept nodes of the behaviour role, subject or object role, time role, spatial role, propositional role, result role, etc. Sr is the semantic relationship between the event class node and the concept nodes.

The characteristics of the event network are as follows. Event class nodes can generate an event class hierarchical network through activation, and concept nodes can generate a concept hierarchical network through activation; the event network can therefore express both static and dynamic knowledge. The semantic relationship between an event class and a concept is non-categorical. The event class hierarchy network has both categorical and non-categorical relations, while the concept hierarchy network mainly has categorical, equivalence and non-inclusion relations, so the event network can express both categorical and non-categorical relations. Since the event network has the two activation nodes of event class and concept, it has a linkage mechanism similar to that of the human brain: it can not only simulate the brain's learning mechanism, but also describe its characteristics such as memory, representation, retrieval and reasoning.

An event class is associated with the top concept semantics of multiple roles. There is an inclusion relationship between two concepts of a role starting from the top concept, and two concepts with a direct inclusion relationship are parent–child concepts. The inclusion relationship between concepts forms the conceptual



Fig. 5.3 An event network represented by six resource concept lattices and event classes






hierarchy, such as ancestors and descendants, through transitivity. Thus, a single resource concept lattice with top, middle and bottom concepts is generated. Similarly, a resource concept lattice can be generated for each role. Each resource concept lattice is a conceptual hierarchy network composed of concept units, so the event network formed by multiple resource concept lattices and event classes through semantic relations is as shown in Fig. 5.3. The event network in Fig. 5.3 is not only a conceptual hierarchy network with event classes at its core, but also the core of the event ontology formed by event classes connected through semantic relations.

The structure of the event knowledge retrieval system is shown in Fig. 5.4. The ontology of the system is the event ontology in ESHOIQ(D) [11] logic mapped from the event network; it is an extended ontology obtained by adding event classes to a concept ontology in SHOIQ [12] logic. The event knowledge retrieval system is essentially an event knowledge reasoning system, whose core functions include requirement expression, resource description, and matching reasoning between requirements and resources. The main modules realising these functions are the user interface, the learner, the reasoner and result visualisation. The user interface accepts the expression of requirements in natural language, SPARQL, MOS, and so on. When a SPARQL or MOS statement is input, event knowledge is automatically output to the user through the reasoner. When natural language is input, the event network of the statement window is generated, mapped to the event ontology, and the event knowledge is output to the user through the reasoner. The learner uses the event network to continuously learn from the natural language and the local or network data resources input by the user, and automatically generates the multi-resource concept lattice. The


5 Collaborative Design Knowledge Reuse


Fig. 5.4 Event knowledge retrieval system based on event ontology

event level and concept level of the event ontology are continuously improved through retrieval feedback. Data resources take unstructured, semi-structured and structured forms. Unstructured data resources are mainly unstructured electronic documents on the network; semi-structured data resources are mainly HTML and XML documents on the network; structured data resources are mainly local or network databases. The reasoner completes the matching reasoning between user demands and the event ontology, the core of which is the reasoning algorithm of the selected logic. Inference can be designed directly on the event network, but this usually requires computing semantic similarity, inclusion degree and so on, which is costly. This shortcoming can be overcome by constructing a Tableaux [12] algorithm for the ESHOIQ(D) logic mapped from the event network; the algorithm is an extension of the Tableaux algorithm for SHOIQ. In addition, existing concept-based reasoners (such as Pellet [13] and FaCT++ [12]) can be extended with events to support event inference. The result visualization module outputs event-related knowledge in the form of event classes, entity classes, individuals, attributes, and so on.
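As a minimal illustration of the demand-and-resource matching performed by the reasoner, the event ontology can be modelled as a set of triples and a user demand as a single triple pattern, in the spirit of a one-pattern SPARQL query. The event and role names below are invented for illustration and are not part of the ESHOIQ(D) formalism itself.

```python
# Event ontology as (subject, predicate, object) triples; a demand is a
# pattern whose variables start with "?". Names are illustrative only.
EVENT_ONTOLOGY = {
    ("Welding", "hasRole", "Robot"),
    ("Welding", "hasResult", "Joint"),
    ("Painting", "hasRole", "Robot"),
    ("Welding", "subClassOf", "ManufacturingEvent"),
}

def match(pattern, triples):
    """Return variable bindings for every triple matching the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                binding[p] = t
            elif p != t:
                break
        else:
            results.append(binding)
    return results

# "Which events use a Robot?" -- analogous to a one-pattern SPARQL query.
hits = match(("?event", "hasRole", "Robot"), EVENT_ONTOLOGY)
print(sorted(b["?event"] for b in hits))  # ['Painting', 'Welding']
```

A full system would of course delegate such matching to a description-logic reasoner rather than a linear scan.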

5.3 Knowledge Recommendation


Personalized recommendation is an important method for solving the problem of information overload. It was originally applied in e-commerce to recommend products to users. With the development of the Internet, personalized recommendation has gradually spread to many other fields, including services, academia, music and entertainment. Common approaches include related recommendation, hot (popularity-based) recommendation and personalized recommendation. Recommendation methods extract user preferences from user behaviour and recommend services or commodities likely to interest related users. A recommendation system is an information system, originally developed for e-commerce websites, that provides users or consumers with product information and references to assist decision-making.

Knowledge recommendation builds on personalized recommendation. The dilemma users currently face is an excessive flood of information combined with a lack of effective knowledge services that meet their needs. Knowledge service therefore needs to evolve into a service model that better satisfies users' knowledge needs. As a new knowledge service model, knowledge recommendation can acquire user knowledge preferences more effectively, provide users with more targeted knowledge services, and alleviate the problem of knowledge overload. Scholars have discussed knowledge recommendation in depth. Knowledge recommendation not only recommends knowledge to users but also realizes knowledge association: through the interrelation of domain knowledge, knowledge discovery can be carried out to help users understand knowledge and obtain content related to the domain knowledge.
In addition, knowledge recommendation can make recommendations based on context, analysing users' knowledge needs in specific situations and providing targeted knowledge services. Associating knowledge resources and associating users helps a virtual knowledge community find similar resources and users and improves the recommendation effect. Knowledge recommendation is based on knowledge similarity calculation or user data feature extraction and, as shown in Fig. 5.5, mainly includes recommendation based on the ontology, on related (linked) data, and on the knowledge graph.

(1) Recommendation based on the ontology

Ontology-based recommendation requires a semantic description of the recommended resources: similarity formulas are used to calculate the similarity between resources, and ontology reasoning rules are used to infer similar resources and recommend them to users. The method describes knowledge resources in an ontology language such as OWL or OWL-S. Because users and resources carry semantic information, combining it with collaborative filtering algorithms to calculate the similarity of resources or users improves the accuracy of recommendation.
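A minimal sketch of the similarity step in ontology-based recommendation might look as follows, assuming each resource has been annotated with weighted ontology concepts. The documents and weights are invented, and cosine similarity stands in for whatever similarity formula a particular system defines.

```python
import math

# Resources described by semantic feature vectors: weights of the
# ontology concepts each resource is annotated with (illustrative data).
resources = {
    "doc_bearing_design": {"bearing": 1.0, "fatigue": 0.8},
    "doc_gear_design":    {"gear": 1.0, "fatigue": 0.5},
    "doc_bearing_test":   {"bearing": 0.9, "test": 1.0},
}

def cosine(a, b):
    """Cosine similarity between two sparse concept-weight vectors."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(target, k=1):
    """Recommend the k resources most similar to the target resource."""
    scores = {name: cosine(resources[target], vec)
              for name, vec in resources.items() if name != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("doc_bearing_design"))  # ['doc_bearing_test']
```

In practice the concept weights would come from the ontology annotation itself, and the score could be blended with collaborative-filtering signals.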




Fig. 5.5 Knowledge recommendation methods



(2) Recommendation based on related data

Linked data is a concept proposed by Berners-Lee [14] in 2006 for publishing data in the Semantic Web. The URI specification is defined in linked data so that developers can obtain resources over the HTTP/URI protocol. Four principles are usually followed when publishing linked data: things are identified by URIs; HTTP URIs are used as globally unique names for dereferencing; the RDF and SPARQL standards are used to query information; and a URI should contain as much information as possible to help users find related information. Linked data can resolve the problem of data heterogeneity on the Internet, and researchers have begun to develop new web applications based on it. Linked data can also serve as a method of personalized knowledge recommendation, helping virtual knowledge communities realize knowledge recommendation. The data used by traditional recommendation methods come from the local system and rarely involve other kinds of data, so cross-platform and cross-domain recommendation cannot be realized. Linked data solves this problem: recommendation based on linked data adopts a unified structured standard for all data sources, and independent data sources also carry association relationships. In this way the recommendation system can be applied to data sources in different fields, even when those sources are heterogeneous or distributed; linking them together through the associations in the data enables cross-platform and cross-domain recommendation.

(3) Recommendation based on the knowledge graph

Knowledge graph-based recommendation relies on the structured entities, rich semantic relationships and intelligent reasoning capability of the knowledge graph.
Based on entity representation and feature association, factual information is used to build a knowledge graph, and multi-path traversal of entities along associated attributes is applied to mine relationships between entities. Such a system can dynamically collect user information and behaviour and perform intelligent inference to extract user interest features and relationships. It can also combine scene information to construct and dynamically evolve semantic user interest models that fully and accurately reveal individual user needs, and it can dynamically collect multi-source heterogeneous



project information and mine the potential semantic relationships between projects in depth. Accurate representation, standardized storage and intelligent updating facilitate semantic-level project information management. User and item information is represented through ontology-based formal description and entity vectorization. The knowledge graph supports semantic-level knowledge reasoning, similarity calculation and the mixing of recommendation algorithms, achieving formal, intelligent and personalized recommendation.
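The multi-path traversal idea can be sketched on a toy knowledge graph: items are linked to entities (attributes), and candidate items are ranked by the number of item-entity-item paths connecting them to the items a user already liked. The graph content below is invented for illustration.

```python
from collections import Counter

# Items linked to the entities they share in a toy knowledge graph.
item_entities = {
    "pump_A":  {"stainless_steel", "centrifugal"},
    "pump_B":  {"stainless_steel", "gear_drive"},
    "valve_C": {"brass", "gear_drive"},
}

def recommend(liked):
    """Rank unseen items by the number of item-entity-item paths."""
    paths = Counter()
    for item in liked:
        for entity in item_entities[item]:
            for other, ents in item_entities.items():
                if other not in liked and entity in ents:
                    paths[other] += 1   # one connecting path found
    return [item for item, _ in paths.most_common()]

print(recommend({"pump_A"}))  # ['pump_B'] -- linked via stainless_steel
```

Real knowledge-graph recommenders replace this path count with learned entity embeddings or meta-path scoring, but the structural intuition is the same.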

5.4 Collaborative Knowledge Reasoning

Knowledge reasoning is the process of inferring unknown knowledge from an existing knowledge base. It either starts from acquired knowledge to obtain the new facts it contains, or draws new conclusions from a large body of existing knowledge. Accordingly, knowledge reasoning can be divided into two types: the first derives facts already implicit in the existing knowledge, and the second concludes genuinely new knowledge from it. Knowledge exists in many forms, such as one or more paragraphs of description, or the traditional form of a syllogism. Taking the syllogism as an example, its basic structure comprises three parts: the major premise, the minor premise and the conclusion. The major premise and minor premise are known knowledge, while the conclusion is new knowledge inferred from them.

5.4.1 Case-Based Reasoning

Case-based reasoning (CBR) was first proposed by Professor Schank [15] of Yale University in his book. It is an important reasoning method in the field of artificial intelligence. Since the late 1980s, the theory and methods of CBR have been studied systematically, and practical achievements have been made in general problem solving, legal case analysis, equipment fault diagnosis, engineering design support, planning support and other fields. In recent years there has been a growing trend of applying CBR in decision-making processes such as manufacturing operations. CBR is essentially a kind of memory-based reasoning that accords with the human cognitive process. When people encounter a new problem or situation, they do not treat it only as a specific problem; they also classify it, look for similar problems solved in the past, and solve the current problem using the experience and lessons learned from those similar problems. CBR is a reasoning mode that retrieves previously solved problems and their solutions from the case base. Engineers then compare the background differences



between the new and the old problems to adjust and modify the old case solutions. Compared with traditional rule-based and model-based reasoning, the data type in CBR is not fixed, so it differs from the traditional relational data model, which emphasizes data domain, length and type. CBR does not need an explicit domain knowledge model and thus avoids the knowledge-acquisition bottleneck. Its system is open, easy to maintain and fast. At the same time, incremental learning means the coverage of the case base grows gradually as the information system is used, so the judgement effect keeps improving. CBR can therefore solve many problems of traditional reasoning methods.

CBR calls the new problem to be solved the target case, and previously solved problems source cases. The CBR process can be regarded as a 4R (Retrieve, Reuse, Revise, Retain) cycle, as shown in Fig. 5.6. When a new problem is encountered, it is input into the CBR system through the case description. The system retrieves cases that match the target case. If a source case is consistent with the target case, its solution is submitted directly to the user; otherwise, the solution of a similar case is adjusted and modified according to the situation of the target case. The system evaluates and learns the solutions that satisfy the users, and saves them to the case base.

Fig. 5.6 The process of case-based reasoning



CBR is a method of reasoning that draws on knowledge and techniques from many other areas. Some related techniques are as follows.

(1) Knowledge representation of the case

The knowledge representation of cases is the basis of case-based reasoning: past problem-solving instances are represented as cases and stored in the case base. As described in Chap. 3, knowledge representation uses specific symbolic languages to encode instances into data structures acceptable to the computer. A case generally comprises three parts: the description of the problem, the corresponding solution, and the implementation effect. The problem description and the corresponding solution must be included in every case; whether the implementation effect is recorded depends on the requirements of the case base. A general case representation should therefore include at least a problem description and the corresponding solution, that is, the two-tuple ⟨Problem; Solution⟩. If the implementation effect of the solution is to be described, the triple ⟨Problem; Solution; Implementation effect⟩ can be used. During case description, the large number of collected source cases should first be analysed, their main common features extracted, and the cases then described according to these common features. The knowledge representation of a case must reflect the structure of the knowledge and link the context of the case. Common case representation methods include the characteristic-attribute, frame, XML-based and object-oriented representations. Case feature-attribute representation is a relatively simple method.
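The two-tuple and triple forms described above might be encoded as a simple data structure; the attribute names and values below are hypothetical.

```python
from dataclasses import dataclass, asdict, field

# Sketch of the triple <Problem; Solution; Implementation effect>,
# with the problem part held as characteristic attributes.
@dataclass
class Case:
    problem: dict                 # attribute name -> attribute value
    solution: str
    effect: str = "unknown"       # optional implementation effect

c = Case(problem={"material": "steel", "thickness_mm": 2.0},
         solution="two-step deep drawing", effect="qualified")
print(asdict(c)["problem"]["material"])  # steel
```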
The characteristic attributes of the case are composed into a set that represents the case, that is, Case = {attribute 1, attribute 2, …, attribute n}. This representation is simple and clear, but it is not well suited to case retrieval. Object-oriented case representation is realized through instantiation of a case class. A class is an abstraction of things in the real world, a collection of objects with the same properties and services, and objects are concrete instances. In object-based cases, only the properties of the case occupy individual storage space, while the solutions occupy common storage space. A case described in XML consists of two parts: a case structure that describes the research field and the case description itself. The case structure file defines the structure of the case, that is, it maps the structure of the case library and specifies how case descriptions are organized. A frame is a data structure that describes the properties of an object. In frame representation, the basic unit of knowledge representation is the frame, and relations between frames are established through relations between attributes. A frame usually consists of a frame name and a number of slots, which in turn consist of a number of facets. Slots usually represent an attribute of a case, and facets represent an aspect of an attribute. The attribute values of slots and facets are called slot values and facet values, respectively. Facet



values can take multiple values. This representation method has a clear structure and describes cases well, so it is widely applied; it is especially suitable for case-based reasoning in large and complex problems.

(2) Case retrieval

The retrieval of similar cases is a key link of CBR: it searches the case base for the cases most similar to, and most helpful for, the target case. Case retrieval is a process of searching and matching. It should achieve two goals: retrieve as few cases as possible, and retrieve cases as similar as possible to the target case. The commonly used case retrieval algorithms include the knowledge-guided method, the neural network method, the inductive indexing method and the nearest-neighbour method. The knowledge-guided method uses existing knowledge to determine the importance of the characteristic attributes of cases in retrieval, assigns each attribute a weight, and retrieves cases according to these weighted attribute features. As knowledge accumulates, this retrieval method is dynamic to a certain extent. The neural network method divides the case base into several sub-case bases according to the characteristic attributes of the cases, and establishes a neural network in each sub-case base to retrieve its cases. A key aspect of this approach is training the network on the data, with the problem descriptions of cases as input and their solutions as output, so that the network learns the relationship between them. Inductive indexing classifies cases according to the characteristic attributes that best distinguish them from other cases, and reorganizes the case base using these attributes. It is divided into the group indexing method and the case structure indexing method.
The group indexing method clusters the cases according to their characteristic attributes, divides them into several case groups, then uses the nearest-neighbour method to calculate the similarity of the cases in the groups whose attributes resemble the target case, and finds the cases that best match the target case. The structural indexing method classifies cases according to their contents and characteristics and organizes the case base into tree, chain, network and other structures; case retrieval is then the process of dividing and searching this organizational structure. The nearest-neighbour method is a commonly used retrieval method based on distance similarity. A distance between cases is defined first; the target case is regarded as a point in space, and the points nearest to it are taken as similar cases. This method not only calculates distances between case attributes but also weights the attributes: the similarity between each feature of the target case and the corresponding feature index of a candidate case is calculated, and the overall similarity between the two cases is then computed from the feature weights, yielding the case most similar to the target case.
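The weighted nearest-neighbour computation can be sketched as follows: per-attribute similarities are combined using attribute weights. The attributes, weights and value ranges are illustrative choices, not prescribed values.

```python
def similarity(case_a, case_b, weights, ranges):
    """Weighted similarity in [0, 1] over shared numeric attributes."""
    total = 0.0
    for attr, w in weights.items():
        # Normalise the attribute distance by its value range,
        # then convert distance to a per-attribute similarity.
        d = abs(case_a[attr] - case_b[attr]) / ranges[attr]
        total += w * (1.0 - d)
    return total / sum(weights.values())

target = {"thickness": 2.0, "diameter": 50.0}
source = {"thickness": 2.5, "diameter": 40.0}
weights = {"thickness": 0.7, "diameter": 0.3}   # attribute importance
ranges = {"thickness": 5.0, "diameter": 100.0}  # attribute value spans
print(round(similarity(target, source, weights, ranges), 3))  # 0.9
```

Retrieval then reduces to computing this score against every (or every indexed) source case and keeping the highest-scoring ones.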



(3) Case adjustment and modification

The process of adjusting and modifying the retrieved similar cases according to the situation of the new problem, in order to solve it better, is called case adjustment and modification. It can be carried out on a single similar case, or it can reorganize and modify several similar cases. Case adjustment and modification is a difficult problem for CBR: systems in many fields are generally still at the retrieval stage, case adjustment is carried out using specific domain knowledge, and no universally applicable method exists. In practice, adjustments and modifications can be classified into system modification and user modification. System modification means that the CBR system adjusts the solution of a similar case according to some predefined modification strategy and delivers the adjusted solution to the user. User modification means that users adjust similar cases according to the situation of the problem and their own requirements in order to obtain a solution to the new problem. In general, the two methods are combined: the system modifies the case first and submits the result to the user, who then adjusts it according to the demand and the new situation, finally producing a solution suited to the new problem.

(4) Case learning

Case learning is an important means of ensuring the quality of the case base. It includes not only the maintenance of the case base but also the evaluation of cases, which is a prerequisite for case learning. Case evaluation assesses the application effect of a new case: if the solution of the new case performs well, the case should be stored.
If the solution of a new case does not perform well, it is not added to the case base, and a new solution is sought instead. Case maintenance mainly involves adding new cases to the case base and deleting rarely used ones. If the case base contains no solution for the target case, the newly solved problem and its solution can be added, making the case base more complete and giving the system the ability to solve new problems. Adding new solutions increases the number of successful cases available, thereby improving the likelihood of case reuse and the accuracy of case-based reasoning. If some cases in the case base are hardly ever matched with target cases, they are unnecessary and can be deleted to improve the efficiency of case retrieval. Case maintenance also covers adjusting and modifying unsuccessful cases or related parameters, and storing the information about these adjustments to support solving similar problems in the future. Case learning is thus both a means of updating and expanding the case base and an important condition for its long-term effectiveness.



Product design depends on design knowledge, including all kinds of explicit and implicit design knowledge. CBR is an effective method of representing design experience, but its knowledge representation lacks scalability and flexibility. In view of the great demand for representing and reusing implicit design experience, Zhang and Liu [16] proposed a stamping die design knowledge representation method that combines CBR with a knowledge graph. It improves the efficiency and extensibility of case storage and retrieval, and has important theoretical significance and research value for better preserving and reusing stamping die design experience. In stamping die design, the designer analyses the properties of the stamped workpiece and the design requirements to choose the design scheme and the structure of each sub-part. Designers' tacit knowledge is reflected in their design results, which constitute product design facts and embody the designers' personal experience. The traditional CBR method needs to summarize these design results into design cases and index them for convenient subsequent retrieval. A knowledge reuse system must make it easy for designers to find design knowledge, so convenient and practical knowledge management tools are needed. Zhang and Liu [16] used a knowledge graph to formalize the recording and description of product design results, while preserving the traditional CBR four-step cycle of retrieval, reuse, revision and retention. Based on these requirements, a knowledge representation framework for stamping die design combining CBR and knowledge graph was proposed, as shown in Fig. 5.7. The framework comprises a knowledge representation layer and a knowledge manipulation layer.

5.4.2 Ontology-Based Reasoning

Most information on the Web is not easy for computers to process automatically; it is suitable only for people to read. With the growth of network information, retrieving large-scale data has become a difficult problem in information retrieval, and the Semantic Web was born in this environment. Within the hierarchy of the Semantic Web, the inference layer makes a major contribution to acquiring the knowledge users need: it uses the concepts and relations provided by the ontology layer to complete inference. The ontology reasoning mechanism is the foundation that enables ontology applications. At present there are two main kinds of ontology inference mechanism: rule-based and description-logic-based. These two reasoning methods are analysed in detail in the following sections.


Fig. 5.7 Knowledge representation framework for stamping die design based on CBR and knowledge graph

Rule-Based Reasoning Mechanism

Rule-based reasoning is a process that relies on an inference engine to complete ontology reasoning according to a rule-based reasoning algorithm, deriving implicit conclusions from the facts already in the ontology. First, the system builder constructs the ontology knowledge base using ontology construction techniques and application domain knowledge. Then ontology technology and inference rule technology are combined to construct a rule base suited to the application domain. Finally, driven by the inference algorithm, the inference engine loads and analyses the ontology knowledge base and the rule base to complete the reasoning over the ontology knowledge base. The structure of a rule-based reasoning system is shown in Fig. 5.8. The whole reasoning process has two central links: rule base design and inference algorithm design. The design of the rule base determines whether the implicit knowledge can be obtained to the maximum extent, while the inference algorithm determines the operating efficiency and knowledge acquisition ability of the system.

(1) Design of the rule base

The rules in the ontology knowledge base come from two sources. One is the rules contained in the ontology itself, that is, the axioms of the ontology. The other is the rules applicable to the application domain, established by personnel



Fig. 5.8 Architecture of reasoning system based on rules


according to the knowledge of the application domain; these are called domain rules. Domain rules describe inference that the ontology itself cannot express. They are designed for specific applications in the application domain, and their design is essentially an extension of the rules in the ontology. When building domain rules, a few principles should be followed: (1) make the relation between condition and action explicit; (2) the rules must conform to the application domain, and there must be no conflict between rules; (3) use a common, valid rule representation that computers can process; (4) adopt a problem-driven approach to designing domain rules. After the domain rules have been designed comprehensively, they must be formalized in a rule language so that computers can handle them conveniently. The most commonly used rule language at present is SWRL, a rule description language based on OWL DL and OWL Lite as well as Unary/Binary Datalog RuleML. Its purpose is to better combine Horn-like rules with OWL knowledge bases; SWRL adds rules to OWL because rules provide greater logical expressiveness.

(2) Rule-based inference algorithm

As an XML document, an ontology document has a tree structure. Each node in the tree is an XML element, and the relationships between nodes are parent-child or sibling relationships. Ontology documents can therefore be represented in a tree-structured state space, so some traditional node-based inference algorithms can



be applied to ontology inference. An example is an inference algorithm based on RDF and pD* semantics. The main steps of the algorithm are as follows: (1) initialize all rules as triggered; (2) read the RDF into graph G together with all custom triples; (3) start the loop: for each rule, determine whether it was triggered in the last iteration; if its premise matches triples produced in the last iteration, apply the rule to graph G and record the triples resulting from it; (4) when no new triples are created, end the loop. This algorithm has some shortcomings: it is iterative, and the time complexity of the iteration is relatively high. The algorithm requires every node to be judged against every relation, matching each relation of each node against the inference rules, which wastes time and resources and leaves no room for optimization. To address these shortcomings, Gong [17] proposed the ORBO algorithm, whose principle is as follows: for a node, judge whether the first inference relation in a rule is satisfied; if so, continue to judge the other inference relations of that rule; otherwise abandon the rule. This eliminates the need to match every node against every inference relation of all the rules. The ORBO algorithm does not need to re-judge the whole node set, which saves reasoning time, improves reasoning efficiency and reduces time complexity. The rule-based reasoning method is suitable for ontologies with strong regularity, largely rule-expressible content, and relatively simple, small structure. If applied to a larger ontology, a rule-based inference algorithm cannot guarantee high inference efficiency. In addition, it combines well with case-based reasoning applications.
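The iterative loop described above can be sketched with a single rule, transitivity of subClassOf, firing until no new triples appear; the class names are invented, and a real engine would of course carry the full pD* rule set.

```python
# Graph G as a set of (subject, predicate, object) triples.
triples = {("Gear", "subClassOf", "Part"),
           ("Part", "subClassOf", "Artifact")}

def transitivity_rule(g):
    """Derive a subClassOf c from a subClassOf b and b subClassOf c."""
    return {(a, "subClassOf", c)
            for (a, p1, b) in g if p1 == "subClassOf"
            for (b2, p2, c) in g if p2 == "subClassOf" and b2 == b}

changed = True
while changed:                    # step (4): stop when nothing new appears
    new = transitivity_rule(triples) - triples
    changed = bool(new)
    triples |= new                # step (3): apply the rule to graph G

print(("Gear", "subClassOf", "Artifact") in triples)  # True
```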

Reasoning Mechanisms Based on Descriptive Logic

Description logic is an object-based knowledge representation formalism. It first defines the relevant concepts of the application domain, then uses these concepts to represent the relationships and attributes within the domain, thereby representing its individuals and objects. The two basic elements of a description logic language are concepts and relations: a concept denotes a set of individuals, and a relation denotes an interrelation among individuals in the domain. Description languages usually include a number of constructors, which build complex concepts and relations from simple atomic ones; the constructors determine the expressiveness of the language. Description logic can serve as the theoretical basis of ontologies because it meets the requirements of ontology languages in semantics, expressive power and complexity, and because its basic elements (concept, relationship and individual) all have corresponding elements in an ontology. The reasoning mechanism based on description logic is mainly a process of completing reasoning and obtaining implicit knowledge from the ontology knowledge base



according to certain algorithms. The structure of a reasoning system based on description logic is shown in Fig. 5.9. A description logic knowledge base consists of two parts: the Tbox and the Abox. The Tbox contains terminological axioms about concepts and conceptual relations, which represent the general properties of the concepts and relations of the application domain (its intensional knowledge). The Abox contains instantiation assertions about individuals, which represent the relationships between individuals in the application domain (its extensional knowledge). The Tbox includes two main forms of terminological axioms: the implication axiom, which defines inclusion relations and realizes concept classification, and the equivalence axiom, which is used to define complex concepts. The Abox includes two main forms of assertions: the conceptual assertion, which states that an individual belongs to a concept, and the relational assertion, which states that a relationship exists between two individuals. Description logic provides an inference mechanism for the ontology knowledge base to mine implicit knowledge, and this reasoning is carried out separately for the two components of the knowledge base. The inference problems in the Tbox can be summarized as satisfiability testing, inclusion testing, equivalence testing and mutual-exclusion detection. The instance assertions in the Abox must satisfy the axioms in the Tbox, or they will lead to inconsistent conclusions. The reasoning method based on description logic completes reasoning through the strong decidability of description logic. Its characteristic strengths are satisfiability judgment and consistency detection, and it corresponds well with semantic web description languages. Therefore, reasoning methods based on description logic have become the basis of reasoning in semantic web applications.
Fig. 5.9 Architecture of a reasoning system based on description logics (components: knowledge base; knowledge base loading, parsing and editing; Tbox reasoning, Abox reasoning and basic reasoning within the reasoning engine; and an API for querying and returning results)

The description logic can be applied to the design stage and the maintenance stage of an ontology, respectively. In the design stage, it can be used to detect whether there are contradictions and conflicts in the concept definitions of the ontology, and to mine the hidden hierarchies and relations among concepts. During the maintenance stage, instance detection in description logic can be used to verify whether new knowledge is satisfiable, thus ensuring that the newly added knowledge is correct. However, reasoning methods based on description logic still have some shortcomings: in practical applications, their expressive ability and reasoning efficiency are often unsatisfactory.
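As a toy illustration of the Tbox/Abox split described above (a highly simplified sketch, not a real description logic reasoner; all concept and individual names are invented), implication axioms in the Tbox can be combined with Abox assertions to answer instance checks:

```python
# Tbox: implication axioms "sub is subsumed by sup" (assumed acyclic here)
tbox = {("Gear", "MechanicalPart"), ("MechanicalPart", "Artifact")}

# Abox: conceptual assertions "individual belongs to concept"
abox = {("gear_01", "Gear")}

def subsumed_by(c, d, axioms):
    """Inclusion test: is concept c subsumed by concept d (transitively)?"""
    if c == d:
        return True
    return any(sub == c and subsumed_by(sup, d, axioms) for sub, sup in axioms)

def instance_of(individual, concept, tbox, abox):
    """Instance check: do the Abox and Tbox entail individual : concept?"""
    return any(ind == individual and subsumed_by(c, concept, tbox)
               for ind, c in abox)

# gear_01 is implicitly an Artifact via Gear ⊑ MechanicalPart ⊑ Artifact
```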

Case Study

In traditional process design, the selection of process equipment and the determination of cutting parameters rely on the accumulated experience and knowledge of process personnel. The process plans developed by different technicians will differ, which not only leads to low efficiency in process design but also means the developed process plan may not be well applicable to new process tasks. The emergence of process-aided decision systems largely solves the low efficiency of traditional process planning, and process designers can use such systems to design processes more efficiently. However, these expert systems basically represent process knowledge on top of relational database systems. The establishment of process rules is based on a comprehensive database, and the relationships between rules are not obvious, which results in disorganized process knowledge overall, poor reasoning flexibility and low processing efficiency. At the same time, if the rule base is very large, a combinatorial explosion between rules is likely to occur. Research on ontology technology and semantic reasoning technology provides an effective solution to the problems of expert systems in knowledge acquisition, retrieval and intelligent reasoning. It also provides a complete method of knowledge definition and logical hierarchy construction. At the same time, semantics-based intelligent reasoning strengthens the system's problem-solving ability and meets users' demands for improving the operating efficiency and broadening the application field of expert systems. Cui et al. [18] introduced ontology technology into process reasoning and used SWRL rules as process reasoning rules.
By combining this with the Jess reasoning engine, they proposed an ontology-based process knowledge representation and semantic reasoning method, which provides a reference for expert systems to establish intelligent process knowledge bases and to reuse process reasoning knowledge. They first extracted knowledge and concepts related to the process: by consulting process manuals, accumulating experience from the processing site and refining the information in process cards, they summarized the important knowledge and concepts related to the process ontology. Then they constructed the process domain ontology on the Protégé software platform. After ontology construction, they used the rule description language SWRL to build a rule base according to the built ontology knowledge base.

Fig. 5.10 Process information reasoning process based on ontology (flow: XML file of feature information → feature information preprocessing → does it match the ontology library? If no, renew the SWRL rules and ontology library; if yes, OWL and SWRL format conversion → loading the SWRL rules and ontology library in Jess → reasoning result → forming normalized readable information → output)

By calling the established ontology and customized rules, SWRL can further improve the ontology and then define the logical relationships between ontology terms, until a reasonable rule base is formed. Finally, the Jess inference engine is used for process knowledge reasoning; the specific process is shown in Fig. 5.10.
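As a rough analogue only (the actual rules in this work are written in SWRL and executed by the Jess engine; the feature fields, thresholds and conclusions below are invented for illustration), an if-then process rule can be mimicked as:

```python
# Hypothetical if-then process rules over a machining feature description,
# mimicking the antecedent/consequent structure of SWRL rules.

def process_rules(feature):
    """Apply if-then process rules to a feature and return inferred facts."""
    inferred = {}
    # analogue of: Hole(?f) ^ hasDiameter(?f, ?d) ^ lessThan(?d, 10)
    #              -> usesTool(?f, "TwistDrill")
    if feature.get("type") == "Hole" and feature.get("diameter", 0) < 10:
        inferred["tool"] = "TwistDrill"
    # analogue of: Hole(?f) ^ hasTolerance(?f, "IT7") -> needsReaming(?f, true)
    if feature.get("type") == "Hole" and feature.get("tolerance") == "IT7":
        inferred["needs_reaming"] = True
    return inferred

feature = {"type": "Hole", "diameter": 8, "tolerance": "IT7"}
result = process_rules(feature)
# result == {"tool": "TwistDrill", "needs_reaming": True}
```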

5.4.3 Collaborative Reasoning of Design Knowledge Based on the Bayesian Approach

Product design is highly knowledge-intensive: considerable design knowledge from a range of disciplines is generated, exchanged and reused. At the same time, product design technologies have been developing quickly, which makes it necessary for designers to employ the latest technologies. With the in-depth application of knowledge management, designers face the problem of information overload. Providing designers with domain knowledge through automated reasoning and reuse is an effective way to solve this problem, by identifying content that designers might be interested in. Peng et al. [19] defined collaborative reasoning as the process of recommending effective knowledge units for different design entities based on the current design tasks, issues and the rich information contained in the knowledge hypernetwork. With the increasing complexity of modern products and the design process, the collaboration of knowledge creation and sharing is characterized by diversity as well as by fuzzy and dynamic evolution processes. In this work, the Bayesian approach is employed to facilitate collaborative reasoning in the design knowledge hypernetwork proposed in Sect. 2.3. Bayesian inference calculates the posterior probability from a prior probability and a likelihood function derived from a statistical model of the observed data, as expressed in Eq. (5.1), where P(Bi) is the prior probability and P(Bi|A) is the posterior probability (i.e., the probability of Bi given A):

P(Bi|A) = P(Bi) × P(A|Bi) / Σ_{j=1}^{n} [P(Bj) × P(A|Bj)]    (5.1)
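Eq. (5.1) can be illustrated numerically as follows; the priors and likelihoods are made-up values, not data from the text.

```python
# A numeric sketch of Eq. (5.1): posterior P(Bi|A) from priors and likelihoods.

def bayes_posterior(priors, likelihoods):
    """Return the posterior P(Bi|A) for each hypothesis Bi."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(Bi) * P(A|Bi)
    evidence = sum(joint)                                  # Σj P(Bj) * P(A|Bj)
    return [j / evidence for j in joint]

# three candidate knowledge units: equal priors, different likelihoods
posteriors = bayes_posterior([1/3, 1/3, 1/3], [0.9, 0.3, 0.3])
# posteriors ≈ [0.6, 0.2, 0.2]; the first unit would be recommended
```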


In the product design process, collaborative reasoning is used to identify the associations between different objects in order to pass the most relevant knowledge unit to the appropriate entity, so that the designer can make better-informed design decisions. The application of this method under several different circumstances is discussed as follows.

(1) When a new design issue arises on an existing product, retrieve the most relevant knowledge unit in the knowledge network. P(Vm^k | Vn^i, H_dk) is denoted as the recommendation probability of Vm^k, where Vn^i is the newly arisen design issue. Based on Bayesian theory, the probability can be calculated using Eq. (5.2):

P(Vm^k | Vn^i, H_dk) = P(Vn^i, H_dk | Vm^k) · P(Vm^k) / P(Vn^i, H_dk)
 = [Σ_{Vl^i ∈ V^i} ω(Vl^i, Vn^i) · d_IK(Vl^i, Vm^k) / d_H(Vm^k)] × [d_H(Vm^k) / Σ_{Vg^k ∈ V^k} d_H(Vg^k)]
   ÷ [Σ_{Vl^i ∈ V^i} ω(Vl^i, Vn^i) · d_H(Vl^i) / Σ_{Vf^i ∈ V^i} d_H(Vf^i)]
 = [Σ_{Vl^i ∈ V^i} ω(Vl^i, Vn^i) · d_IK(Vl^i, Vm^k) × Σ_{Vf^i ∈ V^i} d_H(Vf^i)]
   / [Σ_{Vl^i ∈ V^i} ω(Vl^i, Vn^i) · d_H(Vl^i) × Σ_{Vg^k ∈ V^k} d_H(Vg^k)]    (5.2)

In Eq. (5.2), ω(Vl^i, Vn^i) is the similarity weight between Vn^i and its adjacent node Vl^i in the issue network; d_IK(Vl^i, Vm^k) is the correlation degree between issue node Vl^i and knowledge node Vm^k; Σ_{Vf^i ∈ V^i} d_H(Vf^i) is equal to the total number of edges within the issue network. Similarly, P(Vm^k | Vn^p, H_dk) can be calculated as the recommendation probability of knowledge unit Vm^k when a new product Vn^p arises, and P(Vm^k | Vn^d, H_dk) is denoted as the recommendation probability of knowledge unit Vm^k when a new designer
Vn^d joins the design group. Step by step, the collaborative reasoning problem involving more uncertain elements is studied.

(2) When a new product and a related design issue arise, retrieve the most relevant designer in the designer network. P(Vm^d | Vn^p, Vk^i, H_dk) denotes the recommendation probability of Vm^d, which can be calculated using Eq. (5.3):

P(Vm^d | Vn^p, Vk^i, H_dk) = P(Vn^p, Vk^i, H_dk | Vm^d) · P(Vm^d) / P(Vn^p, Vk^i, H_dk)
 = [Σ_{Vg^p ∈ Ωn^p, Vf^i ∈ Ωk^i} ω(Vg^p, Vn^p) · ω(Vf^i, Vk^i) · d_DPI(Vm^d, Vn^p, Vk^i) / d_H(Vm^d)]
   × [d_H(Vm^d) / Σ_{Vs^d ∈ V^d} d_H(Vs^d)]
   ÷ [Σ_{Vg^p ∈ Ωn^p, Vf^i ∈ Ωk^i} ω(Vg^p, Vn^p) · ω(Vf^i, Vk^i) · d_PI(Vn^p, Vk^i)
      / (Σ_{Vg^p ∈ Ωn^p} d_H(Vg^p) + Σ_{Vf^i ∈ Ωk^i} d_H(Vf^i) − Σ_{Vg^p ∈ Ωn^p, Vf^i ∈ Ωk^i} d_PI(Vg^p, Vf^i))]
 = [Σ_{Vg^p ∈ Ωn^p, Vf^i ∈ Ωk^i} ω(Vg^p, Vn^p) · ω(Vf^i, Vk^i) · d_DPI(Vm^d, Vn^p, Vk^i)]
   / [Σ_{Vg^p ∈ Ωn^p, Vf^i ∈ Ωk^i} ω(Vg^p, Vn^p) · ω(Vf^i, Vk^i) · d_PI(Vn^p, Vk^i)]
   × [Σ_{Vg^p ∈ Ωn^p} d_H(Vg^p) + Σ_{Vf^i ∈ Ωk^i} d_H(Vf^i) − Σ_{Vg^p ∈ Ωn^p, Vf^i ∈ Ωk^i} d_PI(Vg^p, Vf^i)]
   / [Σ_{Vs^d ∈ V^d} d_H(Vs^d)]    (5.3)
In Eq. (5.3), Ωn^p is the set of nodes adjacent to node Vn^p in the product network, and Ωk^i is the set of nodes adjacent to node Vk^i in the issue network. The above two cases represent two common types in the product design process: the former involves only one type of uncertain information, while the latter involves multiple kinds of uncertain information. Table 5.1 lists the different scenarios for these two types; the calculation and reasoning in the other cases are similar to the above two. The following shows an example of knowledge reasoning in product assembly. Figure 5.11 shows the product structure and mating features of three different products, where S and P represent a sub-assembly and a part, respectively, and C represents various connection types, including Bolt-Nut, Screw, Pin, Key and so on. The products can be represented as A = (C1(S1, C2(P1, P2), P3), C3(P4, P5)), B = (C4(C2(P1, P2), S2), C3(P3, P4)) and C = (P1, C1(S1, C2(P2, P3))). Since complex products are usually composed of sub-assemblies at different levels, the retrieval of similar instances starts from the different levels. The similar case matching algorithm is shown in Table 5.2.



Table 5.1 The most common reasoning situations in the design process

Reasoning type      Expression                      Description
1-level reasoning   P(Vm^k | Vn^i, H_dk)            When a new issue arises, retrieve the most relevant knowledge unit
                    P(Vm^k | Vn^p, H_dk)            When designing a new product, retrieve the most relevant knowledge unit
                    P(Vm^i | Vn^p, H_dk)            When designing a new product, retrieve the most common issue
2-level reasoning   P(Vm^d | Vn^p, Vk^i, H_dk)      When a new product and design issue arise, retrieve the most relevant designer
                    P(Vm^k | Vn^p, Vk^i, H_dk)      When a new product and design issue arise, retrieve the most relevant knowledge unit
                    P(Vm^k | Vn^d, Vk, H_dk)        When a designer designs a new product, retrieve the most relevant knowledge unit

Fig. 5.11 Similar case matching of product A, B, C

Usually, the product model in the case library is not exactly the same as the new task; only a partial structure can be reused to perform the sequence planning, and the remaining parts need priority rules to guide the generation of a complete assembly sequence solution. Table 5.3 lists three types of common assembly priority knowledge.



Table 5.2 The similar case matching algorithm

Algorithm 1. Main body of the similar case matching algorithm
Input: The product assembly model P and the knowledge database K
Output: A set of usable assembly models from the knowledge database
1.  Begin
2.    Calculate the number of nodes and the maximum number of layers
3.    for each layer m do
4.      p_m ← find the connection type set of P in layer m
5.      q_m ← find the connection type set of case P_i in layer m
6.      if (p_m ∩ q_m ≠ ∅)
7.        count(p_m ∩ q_m)
8.        sim_St(N_i, s_m) ← calculate the structure similarity
9.      else
10.       remove s_m from g_i
11.     end if
12.    end for
13.    Rank ← rank the cases in g_i based on similarity
14.    Return g_i
15. End
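The layer-wise matching idea in Algorithm 1 can be sketched as follows. The nested-tuple product representation and the Jaccard-style similarity below are illustrative assumptions, not the book's exact data structures or measure.

```python
# Products as (connection_type, children) trees; plain strings are parts.
# Similarity compares the connection type sets of two products layer by layer.

def layers(model, depth=0, acc=None):
    """Collect the connection types of a nested assembly model per layer."""
    if acc is None:
        acc = {}
    kind, children = model
    acc.setdefault(depth, set()).add(kind)
    for child in children:
        if isinstance(child, tuple):      # sub-assemblies recurse; parts stop
            layers(child, depth + 1, acc)
    return acc

def structure_similarity(p, q):
    """Shared connection types per layer over all connection types (Jaccard)."""
    lp, lq = layers(p), layers(q)
    depths = set(lp) | set(lq)
    shared = sum(len(lp.get(d, set()) & lq.get(d, set())) for d in depths)
    total = sum(len(lp.get(d, set()) | lq.get(d, set())) for d in depths)
    return shared / total if total else 0.0

A = ("C1", (("C2", ("P1", "P2")), "P3"))
B = ("C4", (("C2", ("P1", "P2")), "S2"))
# A and B share connection type C2 at layer 1 but differ at the root
```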

Table 5.3 Assembly priority rules

Rule types               Priority rules
Basic parts rules        Box-type part is assembled separately as a basic part and assembled first
                         The biggest and most massive part is usually used as the basic part
Functional parts rules   Parts of low precision are assembled first
                         Parts in symmetrical structure are assembled first
Connection parts rules   Priority: Riveting, welding > Key, Gear > Screw > Bolts & nuts, Mate …

5.5 Knowledge-Assisted Decision Making

5.5.1 Knowledge Reasoning Based on the Context

According to the application domain, contextual reasoning generally includes three categories: model consistency verification, hidden context mining and context recognition. Among them, context is the semantic abstraction of certain phenomena in a dynamic environment. The abstract high-level context describes the more valuable entity states and their changes, which can avoid the impact of meaningless or minor low-level context changes on the system. The inference and recognition of entity
behavior (a simple situation) or of a context situation can assist the system in making intelligent decisions and in proactively providing users with required information and appropriate services. The recognition of context situations has always been a focus and hotspot of research in this field. Lim and Dey [20] reviewed the inference models used in the literature of three major conferences in the field of pervasive computing (the Human-Computer Interaction Conference, the Pervasive Computing Conference and the Pervasive Conference) from 2003 to 2009. Of the 114 context-aware systems and applications they reviewed, 109 used the dominant reasoning methods. Figure 5.12 shows the distribution of these dominant methods in the 109 systems and applications. These reasoning techniques mainly include rule-based methods, logic-based methods, ontology-based methods, case-based methods, supervised learning methods and unsupervised learning methods. The rule-based reasoning approach is the simplest and most straightforward of all, and it is also the most popular and common approach. Rules usually adopt the if-then form, and the high-level context is generated by fusing and interpreting the low-level context. In addition, rules can be used to detect events or situations in context-aware computing. However, this method cannot deal with uncertain context. Most inference rules are compromise schemes formulated by domain experts for the majority of users, and do not support personalized situational requirements. The logic-based reasoning method has become a common and effective reasoning tool in the field of context-aware computing because of its powerful expression and reasoning ability. At present, the logic theories used in context reasoning mainly include first-order logic, fuzzy logic and probabilistic logic.
On the premise of a deterministic context, the method based on first-order logic has higher reasoning accuracy and the support of mature inference engines, but it cannot deal with uncertain context and cannot identify undefined situations.

Fig. 5.12 Contextual reasoning model statistics (models counted: rule-based model, Bayesian model, support vector machine, decision tree model, hidden Markov model, K-nearest neighbor classification)

The method based on the fuzzy logic has
poor performance in dealing with inaccurate data. Its systematic reasoning process is complex, its computational cost is high, and a large amount of labeled experimental data is needed to train the rules. Moreover, the concepts used in the reasoning process do not support semantic description, so they are arbitrary and easily cause ambiguity. The reasoning method based on probabilistic logic mainly makes the final judgment based on the possibility of occurrence of all the facts related to the problem; it mainly calculates the probability of the occurrence of a context situation by combining different pieces of evidence. With the application of artificial intelligence, pattern recognition, data mining and other related technologies, reasoning models trained by learning methods can better adapt to the environment and users' personalized needs. Because of their high reasoning accuracy and coverage, learning-based methods have been highly praised by researchers, and a large number of research results have emerged; these are mainly divided into two categories: supervised learning and unsupervised learning. The Bayesian network is a typical supervised learning method, which is based on probabilistic logic and is widely used in statistical reasoning. It uses a directed acyclic graph to represent the causal relationships between the low-level context and the high-level context, and uses a learning algorithm to train the conditional probability of each node to complete the construction of the context reasoning structure. The probability of occurrence of the user's current context can be obtained by applying the corresponding Bayesian inference algorithm to the network structure. Ontology has been widely used in many context-aware systems due to its excellent semantic description and reasoning ability. The realization of ontology reasoning is mainly based on description logic, which is a logic-based form of knowledge representation.
Ontology-based reasoning has been widely used in situation recognition and event detection applications. The ontology-based reasoning method integrates well with ontology modeling technology and is a relatively general and popular contextual reasoning method. With the help of the ontology reasoning mechanism, potential context can be mined, and the context can be identified quickly and reliably. At the same time, the learned rules are more applicable to the real environment, and better reasoning results can be obtained. However, ontology has difficulty dealing with fuzzy and incomplete context data. Just as context is naturally used in everyday interactions, it is often used in computing to assist and optimize decisions. How to use context effectively is an important problem in context-based computing research. With the help of intelligent decision making, a system can proactively provide users with required information and appropriate personalized services, reduce user input, avoid shifting users' attention, and give them an enhanced user experience. Effectively utilizing context to assist or optimize intelligent decision making is the fundamental goal of context-aware computing, and also the embodiment of its superiority. Context-based intelligent decision making mainly includes perception triggering, adaptive strategies, self-configuration and self-organization technologies. The organic combination of context-aware computing and Web services technology mainly focuses on service discovery, choice offering and self-adaptation, and mainly
involves two important mechanisms: context-aware service description and a matching mechanism. In order to integrate context into knowledge, knowledge is divided into two parts: domain knowledge and situational knowledge. Domain knowledge includes knowledge content and the knowledge carrier. Knowledge content refers to the description of information that can guide application practice, including the index information of the knowledge carrier; the knowledge carrier guides the knowledge user to conveniently find the specific location of knowledge. Situational knowledge is knowledge described by its knowledge context, i.e., the specific background and environment in which the knowledge is generated and applied. Domain knowledge and situational knowledge describe the content and the environment of knowledge, respectively; they form a unity, and neither is dispensable. The situational similarity calculation algorithm realizes the similarity calculation between the user context ontology and the knowledge context ontology, which is the most important step in realizing the knowledge push service engine; the quality of the algorithm determines the quality of the knowledge push service. At present, the existing similarity calculation strategies include the model based on feature matching, the model based on semantic distance, the model based on information content, and the hybrid model. Because the hierarchical structures and probability models of different ontologies differ, the first three strategies are relatively suitable for calculating the conceptual similarity of ontologies within a domain. The hybrid model is used to calculate the conceptual similarity between different ontologies; it makes use of set theory, the ontology hierarchy and the attribute structure, and is suitable for calculating the semantic similarity between entity types.
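The feature-matching strategy, for instance, can be sketched as a Jaccard ratio between the feature sets of a user context and a knowledge context; the context feature labels below are invented for illustration.

```python
# Feature-matching similarity: |shared features| / |all features|, in [0, 1].

def feature_match_similarity(user_ctx, knowledge_ctx):
    """Jaccard similarity between two context feature sets."""
    union = user_ctx | knowledge_ctx
    return len(user_ctx & knowledge_ctx) / len(union) if union else 0.0

user_ctx = {"task:assembly", "role:designer", "stage:detail_design"}
knowledge_ctx = {"task:assembly", "role:designer", "stage:concept_design"}
sim = feature_match_similarity(user_ctx, knowledge_ctx)  # 2 shared / 4 total = 0.5
```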

5.5.2 The Traditional Decision Support System and Its Limitations

A DSS (Decision Support System) [21] is a computer-based information system used to support business or organizational decision-making activities. A DSS uses computers, databases, multimedia, networks and human-like intelligence to provide auxiliary decision-making means and tools, turning the human thinking activities of decision making into a combination of decision computation and thinking. DSSs have been studied deeply in various fields. Some universities and research institutes have developed DSSs in various fields, such as the "Distributed Multimedia Intelligent Decision Support System Platform", the "Customer/Server-based Decision Support System", the "Intelligent Decision System Development Platform IDSDP", and the "Agent Based Intelligent Collaborative Decision Support System". However, the traditional decision support system has some limitations, including: (1) It is difficult to acquire the knowledge used for decisions. Traditional rule-based decision support, such as an expert system, relies on experts' domain knowledge to form rules and then
make decisions. There are many uncertainties in the formation of rules, such as experts' deviations in understanding the industry and their ability to generalize and summarize. (2) The flexibility and adaptability of decision support systems are low: if some of the factors affecting the decision change, the existing decision system cannot make accurate judgments. (3) Poor knowledge synergy and correlation: correct decision-making requires associating various kinds of knowledge and solving the coordination problem between decision-making groups.

5.5.3 Knowledge-Based Decision Support System

KB-DSS (Knowledge-Based Decision Support System) was first proposed by Professor Schneider and Hans-Jochen [22] from the Information Department of the Technical University Berlin. As a new decision support system, a KB-DSS needs to consider the introduction of expert systems, natural language understanding and, especially, knowledge base systems. On this basis, the rapid response and automation of decision-making are improved to realize knowledge sharing, association and inheritance. At present, knowledge graphs have been applied in many fields, such as intelligent search, intelligent recommendation, human-computer interaction and question answering systems. In addition, more and more enterprises provide knowledge graphs as a basic data service of a cloud platform or data middle platform for upper-level applications. Considering the advantages of knowledge graphs and the limitations of traditional decision support systems, an intelligent decision support technology based on knowledge graphs is proposed. By constructing a domain-oriented knowledge graph and decision models, decision analysis is carried out for specific problems. This chapter introduces two kinds of intelligent decision models: rule-based decision support and the association analysis-based decision model.

(1) Rule-based decision support model

There are two processes in rule-based decision support: (1) establishment of the rule policy: the rule strategy is established by domain experts based on the domain knowledge base, and the accuracy and validity of the rules are verified with sample data; (2) the decision-making process: the decision-maker analyzes and judges the problem according to the established rules and finally obtains the result of decision reasoning. Rules are typically described using the expression "if(…), then" depending on the rule engine, where the "if" part can use arithmetic operators, logical operators and others, and the operators can be used in combination.
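A minimal sketch of such combined if-then rules follows; the field names, thresholds and conclusions are invented for illustration, and a real system would evaluate such rules with a rule engine over the domain knowledge base.

```python
# Decision rules as (condition, conclusion) pairs; conditions combine
# arithmetic comparisons with logical operators.

rules = [
    # if (temperature > 80 and vibration > 5), then "stop machine"
    (lambda f: f["temperature"] > 80 and f["vibration"] > 5, "stop machine"),
    # if (temperature > 80), then "reduce load"
    (lambda f: f["temperature"] > 80, "reduce load"),
]

def decide(facts, rules):
    """Return the conclusion of the first rule whose condition holds."""
    for condition, conclusion in rules:
        if condition(facts):
            return conclusion
    return "no action"

decision = decide({"temperature": 85, "vibration": 7}, rules)  # "stop machine"
```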
(2) Association analysis-based decision model

Rule-based decision making can make inferential decisions for specific single events, but in real behavior, the synergy between different individuals and group characteristics often affects the accuracy of decision making. Therefore, the introduction of association analysis is particularly important. Clustering entity group decision making is shown in Fig. 5.13.

Fig. 5.13 Clustering entity group decision making (entities such as E2, E3, Ei, Ej and Ek, each with attribute names and values, are connected by relations such as R23, Rj3 and Rjk and clustered into entity groups)

The decision algorithm based on association analysis is as follows.

Step 1: Entity clustering based on k-means. According to the k-means method, the related entities in the knowledge graph are clustered. The algorithm is as follows:

Repeat {
  for i = 1 to m
    ① c(i) = arg min_k ‖E(i) − μ_k‖²
  for k = 1 to K
    ② μ_k = Σ_{i=1}^{m} 1{c(i) = k} · E(i) / Σ_{i=1}^{m} 1{c(i) = k}
}

① Calculate the distance between each entity E(i) and each central node μ_k, and assign the entity to the entity group corresponding to the nearest central node.
② Recalculate the central node C(k) of each entity group.
Repeating ① and ② continuously finally yields the clustering of the entity groups.

Step 2: Obtain the associated entity relationships based on the entity groups. When there is a tuple (E_j, R_jk, E_k), and E_i and E_k belong to the same entity group, the decision of E_i on E_j can be obtained by Eq. (5.4).


R_ik = [ √((E_i − E_j)²) / arg min Σ_{m=i, n=i+1}^{j−1} √((E_m − E_n)²) ] × R_jk    (5.4)

Equation (5.4) expresses the decision relation between E_i and E_j, where √((E_i − E_j)²) denotes the Euclidean distance between E_i and E_j, and arg min Σ_{m=i, n=i+1}^{j−1} √((E_m − E_n)²) represents the minimum, over the associated paths between E_i and E_j, of the sum of the Euclidean distances between adjacent nodes on the path. That is to say, the higher the correlation between E_i and E_j within the clustered entity group and the closer their direct connection, the more similar the decisions based on E_i and E_j will be.
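Step 1 above can be sketched with a plain k-means implementation; the entities are assumed to be embedded as numeric feature vectors, and the data below is illustrative.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance ||p - q||^2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = []
    for _ in range(iters):
        # (1) assign each entity to the nearest central node
        assign = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        # (2) recompute each central node as the mean of its entity group
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            new_centers.append([sum(x) / len(members) for x in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:   # converged: (1)-(2) no longer change
            break
        centers = new_centers
    return assign, centers

entities = [[0, 0], [0, 1], [10, 10], [10, 11]]
groups, _ = kmeans(entities, 2)
# entities 0 and 1 end up in one group, entities 2 and 3 in the other
```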

5.5.4 Knowledge Reuse in the Collaborative Design Process

In the enterprise, new designers are likely to learn from the successful design practices and design processes of the past, so as to apply this experience to solving future design problems. Most of the time, the solution to a new design is obtained by modifying the solution to an existing design problem. Studies show that 75% of design activities involve instance-based design; in new product development, about 40% of designs reuse past component designs, about 40% slightly modify existing component designs, and only about 20% are completely new designs. It can therefore be seen that design knowledge reuse plays an important role in product design. The purpose of design knowledge management is to provide knowledge support for product design activities, so that designers can quickly and conveniently acquire the knowledge they need, helping them to carry out rapid design and innovation of new products. There are many systematic theoretical models of product design based on knowledge reuse. Research on product design has shifted from extensions of product design activities, such as concurrent design, collaborative design, virtual design and rapid prototyping technology, to product design combined with previous knowledge. The knowledge-based design process truly reflects the connotation of design activity and the essence of design methods. In addition, design knowledge management is combined with the product design process to maximize the benefits of design knowledge management. Modern product design is based on knowledge and centered on knowledge acquisition. Design knowledge is closely related to the design process, and the product design process is a process of gradually refining design knowledge from the abstract to the concrete. Usually, the initial design of a new product searches for possible solutions in the existing case base, which is a reuse of design knowledge.
If the existing product is not a completely satisfactory solution, it is necessary to find a new solution for the part of the structure that cannot be solved by existing design knowledge. This is the product innovation process. In this stage, the knowledge associated with the product is constantly generated, and the amount of extracted knowledge constantly increases; this is a process of acquiring new knowledge. In knowledge-based product design, it can be considered that at a certain stage k of product design, the design sub-process at stage k applies specific design knowledge to produce the stage product (the carrier of design knowledge) that needs to be achieved in this stage.

Fig. 5.14 The iterative process model of three-element product design in stage k (the design sub-process at stage k applies the design knowledge at stage k to the product model at stage k−1 to generate the product at stage k)

As shown in Fig. 5.14, a three-element product design iterative process model is based on the stage product model (the design knowledge carrier), the design process and the design knowledge. As can be seen from the figure, the design sub-process of a certain stage is combined with the design knowledge of that stage to produce the stage products required, and such continuous iteration finally completes the design of the complete product. The model provides a theoretical basis for the establishment of a rapid product design model based on knowledge reuse. Rapid product design based on knowledge reuse combines product design knowledge with the design process. Figure 5.15 illustrates a rapid product design model based on knowledge reuse. In this model, model transformation and knowledge inheritance between different design stages are the essence of the evolution of design knowledge and the design process. Each stage of product design is an iteration of design knowledge based on the design results of the previous stage. In fact, product design is the gradual evolution of the design concept and product model from requirements analysis to the completion of a product design. Therefore, through the replacement and iteration of product design knowledge, the development of the design concept and design process is reflected, and the reuse and evolution of product design knowledge are organically combined with the history of the design process. The product design process and product model evolution are based on the integration and evolution of historical design knowledge. This is embodied in the following aspects: (1) As the product model evolves, the design knowledge and design process contained in the current product model reflect the design history of the model.
The design history of a model indicates its origin and generation method, which is an important basis for model modification and evolution. (2) Using the historical knowledge of product design to reason the source of product model, mining the knowledge correlation and engineering semantics of product model, and strengthening the intellectualization of product knowledge modeling. (3) New product development mostly


5 Collaborative Design Knowledge Reuse
Fig. 5.15 A rapid product design model based on knowledge reuse

is the process of inheriting and developing the function, principle and structure of existing products, and the reuse and evolution of product design knowledge reflect the use of this design-history characteristic. To sum up, the rapid product design model based on knowledge reuse can not only control the product model through its geometric characteristic parameters, but also express, as explicit knowledge, the design ideas and principles adopted by the designers during the design process. Compared with traditional product modeling technology, it better reflects non-geometric information such as the function, behavior and design intention of the product. It makes clear that product design is a creative, knowledge-driven process and an integration process from discrete knowledge to a knowledge collection, which better expresses the connotation and essence of rapid product design based on knowledge reuse.
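As a rough illustration of the iterative three-element loop (stage product model, design sub-process, design knowledge), the sketch below applies each stage's knowledge to the model inherited from the previous stage; every function and field name here is hypothetical, not taken from the book's prototype.

```python
# Illustrative sketch of the three-element iterative design model.
# The stage product model is the carrier of accumulated knowledge.

def design_stage(prev_model: dict, knowledge: dict) -> dict:
    """One design sub-process: combine the stage k-1 product model
    with the design knowledge of stage k."""
    model = dict(prev_model)                     # inherit stage k-1 model
    model.update(knowledge)                      # apply stage-k knowledge
    model["stage"] = prev_model.get("stage", 0) + 1
    return model

def rapid_design(requirements: dict, stage_knowledge: list) -> dict:
    """Iterate over the design stages; each stage reuses the previous
    stage product as the carrier of design knowledge."""
    model = {"stage": 0, **requirements}
    for knowledge in stage_knowledge:
        model = design_stage(model, knowledge)
    return model

final = rapid_design(
    {"function": "speed reduction"},
    [{"principle": "two-stage gear train"},
     {"structure": "gearbox with input/output shafts"}],
)
```

After the loop, `final` carries the requirements plus every stage's contributed knowledge, mirroring how each stage product inherits and extends its predecessor.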

5.5.5 Case Study of Knowledge Reuse in the Assembly Process

The assembly of customized and complex products entails the collaboration of heterogeneous devices and thus raises the need for effectively planning the assembly process according to dynamic product and environment information. For this purpose, a product assembly model is introduced to describe the hierarchical relationships and mating features between sub-assemblies. An integrated assembly knowledge model covering product structure, spatial position, mating features, and assembly process is presented. Sensory information collected from the real world

5.5 Knowledge-Assisted Decision Making


and product model together form the assembly context. A two-step assembly knowledge reasoning process is developed: similar case matching finds the same or a similar product structure in the existing assembly instance library, and priority rule guiding completes the final assembly sequence.

Product Assembly Modeling

The product assembly model accurately describes the hierarchical relationships and interactions between different design entities. The hierarchical relationships are extracted from the geometric model, which is imported into the system through the interface between the system and the CAD modeling software. The assembly information is then added to the structure tree, associating each component of the product with its assembly features, such as key, insert, screw and so on. The generated product assembly model is stored in the cloud in the form of a semantic language. Figure 5.16 shows the product assembly model of a gearbox. First, the three-dimensional model of the gearbox reducer is built in the CAD software and the assembly tree structure is obtained. The gearbox assembly has four sub-assemblies: the cover part, the input shaft part, the output shaft part and the gear part, each containing several components. The assembly connection types between the main parts are added to the system through the maintenance of the assembly information.

Fig. 5.16 Product model of a gearbox
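The structure tree and mating features just described might be represented as in the following minimal sketch. The class and dictionary layout are illustrative assumptions; the book's system stores the model as a semantic language in the cloud, not as Python objects.

```python
# Illustrative product assembly model: a structure tree extracted
# from CAD plus connection types (key, insert, screw, ...) attached
# to pairs of components. All names follow the gearbox case study.

class AssemblyNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

# Gearbox structure tree with its four sub-assemblies
gearbox = AssemblyNode("gearbox", [
    AssemblyNode("cover part"),
    AssemblyNode("input shaft part"),
    AssemblyNode("output shaft part"),
    AssemblyNode("gear part"),
])

# Mating features: (component A, component B) -> connection type
mating = {
    ("output shaft", "gear"): "key",
    ("cover", "box base"): "screw",
    ("output shaft", "output bearing I"): "insert",
}

def subassemblies(node):
    """Flatten the structure tree into a list of node names."""
    return [node.name] + [n for c in node.children
                          for n in subassemblies(c)]
```

The `subassemblies` traversal recovers the hierarchy that assembly information maintenance later annotates with connection types.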



Assembly Knowledge Modeling

The reasonable representation of assembly knowledge is the basis of context reasoning. As products grow more complex, the huge amount of data in complex product design models seriously affects the speed and efficiency of assembly planning. A rational construction of assembly knowledge semantics therefore improves the efficiency of the perception mechanism. As shown in Fig. 5.17, the integrated assembly knowledge includes product structure, spatial position, mating features, and assembly process. The product structure reflects the composition relationships of the sub-assemblies and components; the spatial position describes the final positional relationships of the assembly; the mating feature describes the connection relationship, the tolerance constraints, and so on; and the assembly process identifies the resources and the corresponding assembly sequence related to the product.

The product structure reflects which sub-assemblies the assembly object consists of and where it is assembled (its parent assembly). It is directly related to the topological relationships of the object being assembled. The assembly process includes the type of assembly task, the resources required for that type of task, and the corresponding assembly sequence. The assembly process database is continuously updated and improved as new assembly tasks are completed. The spatial position refers to the final positional state between the assemblies, established by setting the characteristic relationships of the assembly. The spatial positional relationships not only define the spatial position of the assembly, but also imply the designer's design intent and constrain the degrees of freedom between the mating objects. The designer assembles components into assemblies by setting spatial positional relationships.

Fig. 5.17 Assembly knowledge model
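The four facets of the integrated assembly knowledge model could be grouped, for example, as in the sketch below. The field layout and sample contents are assumptions for illustration, not the book's actual semantic model.

```python
from dataclasses import dataclass

# Illustrative container for the four facets of integrated assembly
# knowledge: product structure, spatial position, mating features,
# and assembly process. Sample values echo the gearbox case study.

@dataclass
class AssemblyKnowledge:
    structure: dict         # sub-assembly -> parent assembly
    spatial_position: dict  # component -> positional constraints / DoF
    mating_features: dict   # (a, b) -> (connection type, tolerance)
    process: dict           # task type -> (resources, sequence)

k = AssemblyKnowledge(
    structure={"output shaft part": "gearbox"},
    spatial_position={"gear": {"axis": "coincident", "dof": 1}},
    mating_features={("output shaft", "gear"): ("key", "H7/k6")},
    process={"shaft assembly": (["robot"],
                                ["shaft", "key", "gear", "sleeve"])},
)
```

Keeping the four facets separate mirrors the model in Fig. 5.17: structure drives case matching, while mating features and spatial position feed the geometric reasoning step.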



The mating feature reflects the requirements that the assembly process places on the positioning device. It covers the positioning reference and the positioning method, and at the same time matches the assembly accuracy requirements. Mating features play an important role in representing assembly and connection relationships.

Assembly Knowledge Reasoning

The assembly knowledge reasoning is divided into two steps here: similar case matching and priority rule guiding. Similar structures in different products are expected to use the same assembly method, so as to reduce the complexity of the assembly operation. Similar case matching finds the same or a similar structure in the existing assembly instance library of the enterprise or industry, so as to quickly acquire the assembly process of the product and reduce the computational complexity of assembly planning. If no typical instance can be matched, the system looks for the corresponding assembly sequence for each join relationship of the assembly semantic tree. For structures whose assembly sequence cannot be obtained by matching against the instance library, the system uses the knowledge of the mating features to perform geometric reasoning on the assembly sequence, guided by priority rules. A detailed case is given in Sect. 5.4.3.
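The two-step reasoning can be sketched as follows, assuming a toy instance library and illustrative priority values (base parts first, bearing couplings and fitting joints last, as the priority rules in the case study suggest); none of these names or values come from the actual system.

```python
# Sketch of two-step assembly reasoning:
#   step 1: similar case matching against an instance library;
#   step 2: fall back to priority-rule-guided ordering.

# Toy instance library: known structure -> known assembly sequence
instance_library = {
    ("output shaft", "key", "gear", "sleeve"):
        ["output shaft", "key", "gear", "sleeve"],
}

# Illustrative priorities: base parts first, bearings/fittings last
PRIORITY = {"base": 0, "fastener": 1, "bearing": 2, "fitting": 2}

def plan_sequence(parts, kinds):
    # Step 1: similar case matching on the product structure
    seq = instance_library.get(tuple(parts))
    if seq is not None:
        return seq
    # Step 2: priority-rule guiding (stable sort keeps the original
    # order within the same priority class)
    return sorted(parts, key=lambda p: PRIORITY[kinds[p]])

seq = plan_sequence(
    ["output bearing II", "output shaft", "shaft cap"],
    {"output shaft": "base", "output bearing II": "bearing",
     "shaft cap": "fitting"},
)
```

On a library hit the stored sequence is reused directly; otherwise the rules push the base part to the front and the bearing and fitting joints to the end, as in the output shaft example discussed later.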

Prototype System

A Web-based prototype system has been developed using the SSH development framework (Struts, Spring, Hibernate). The software and hardware environments for the system implementation are summarized as follows: (1) Application server: an IBM desktop with an Intel Core i7 CPU (3.40 GHz), 16 GB memory, a 512 GB SCSI HD and a Windows 10 operating system; (2) Programming platform: JDK 1.6, MyEclipse IDE, and Jena 2.4; (3) Database: MySQL 6.0.

The intelligent assembly cloud service platform is divided into five sub-function modules: assembly resource management, assembly task management, cloud platform management, intelligent assembly planning, and assembly simulation verification. Resource management uses the manufacturing service modeling method to model the assembly resources. Taking the robot as an example, its resource service description model is MSR = {BaseInfo, Capacity, Interface, QoS}, where BaseInfo = {Ku4200, assembly execution area, idle}, Capacity = {(function, grab/handle), (degrees of freedom, 6), (maximum grab weight, 20 kg), (speed, 2 m/s), (control type, servo type), (horizontal extension distance, 2100 mm)}, Interface = {(assembly execution system, conveyor line system), (tightening, conveyor line, locator)}, QoS = {500 W, 96%, good}.
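The MSR tuple above could be encoded, for instance, as a nested dictionary; the key names and the `can_handle` helper are illustrative assumptions layered on the book's {BaseInfo, Capacity, Interface, QoS} structure.

```python
# Illustrative encoding of the MSR resource service description for
# the Ku4200 robot from the text; the dictionary layout is assumed.

msr_robot = {
    "BaseInfo": {"id": "Ku4200", "area": "assembly execution area",
                 "status": "idle"},
    "Capacity": {"function": "grab/handle", "degrees_of_freedom": 6,
                 "max_grab_weight_kg": 20, "speed_m_s": 2,
                 "control_type": "servo",
                 "horizontal_reach_mm": 2100},
    "Interface": {"systems": ["assembly execution system",
                              "conveyor line system"],
                  "tools": ["tightening", "conveyor line", "locator"]},
    "QoS": {"power_w": 500, "reliability": 0.96, "rating": "good"},
}

def can_handle(msr, weight_kg):
    """Hypothetical capability check: the resource must be idle and
    strong enough for the requested payload."""
    return (msr["BaseInfo"]["status"] == "idle"
            and msr["Capacity"]["max_grab_weight_kg"] >= weight_kg)
```

A service platform could run such checks over all registered MSR entries to select a suitable resource for an assembly task.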



Fig. 5.18 GUI of assembly task management

The functional modules included in assembly task management are: product model import, product data management, and assembly information maintenance. The product model import loads the geometric model data of the product through the interface between the system and the CAD modeling software, as shown in Fig. 5.18. Assembly information maintenance associates the component structure tree with the assembly features, and product data management configures the assembly process data for the product in the form of a semantic model.

Figure 5.19 shows the GUI of intelligent assembly planning. In this case, the overall structure of the gear reduction gearbox is matched first, then the sub-assemblies of the input shaft component and the gear component, and finally the component combinations of the box cover-box seat connection and the bearing connection. Each sub-assembly of the gearbox can retrieve similar existing instances from the knowledge base. The output shaft assembly retrieved there is identical in structure to the present case, and its assembly sequence can be used directly. The assemblies associated with the cover member include a viewport cover assembly and a ventilator assembly. The assembly associated with the output shaft assembly contains only one output bearing and does not include a shaft cover, so a new sequence needs to be inferred based on the relevant priority rules. According to the basic part rule, the cover is the heaviest and largest part and should be installed first. According to the connection part rules, the bearing coupling and the fitting joint have the lowest priority, so output bearing II and the output shaft cap group are placed at the end of the output shaft component assembly sequence, as listed in Table 5.4.



Fig. 5.19 GUI of intelligent assembly planning

Table 5.4 Connection cases of matched subassemblies

Matched subassembly | Connection case
Cover part | Cover—view hole cover—view hole cover bolt
Cover part | Box base—screw plug—ventilator—ventilator screw
Cover part | Box base—input part—output part—cover—cover screw
Output shaft assembly | Output shaft—key—gear—sleeve—output bearing I—output bearing II
Input shaft assembly | Input shaft—oil slinger—output bearing—oil slinger

According to the above matching cases and priority rules, the final gear reduction gearbox assembly sequence can be obtained as follows: P2-plug-P4-ventilator screw set-P5-oil retaining ring-P7-oil retaining ring-P6-P8-P9-key-P12-P13-P10-P11-P14-P1-P3-view hole cover screw set-cover seat screw set. The resulting assembly is shown in Fig. 5.20.



Fig. 5.20 GUI of assembly simulation verification

References

1. Berners-Lee, T. (1998). Semantic web roadmap. [WWW document]. URL DesignIssues/Semantic.html
2. Brown, P., & Jones, G. (2001). Context-aware retrieval: Exploring a new environment for information retrieval and information filtering. Personal & Ubiquitous Computing, 5(4), 253–263.
3. Chen, C. (2003). Mapping scientific frontiers: The quest for knowledge visualization. Journal of Documentation, 59(3), 364–369.
4. Cheng-Gang, W., Jiao, W., Tian, Q., & Shi, Z. (2001). An information retrieval server based on ontology and multi-agent. Journal of Computer Research & Development, 38, 641–647.
5. Cao, L., Chen, Y., & Zhang, L. (2013). Study of knowledge retrieval during product green design based on ontology. Journal of Hefei University of Technology (Natural Science), 36(5), 513–518.
6. Tu, J., Li, Y., & Li, W. (2013). Knowledge retrieval model and implementation for product innovative design. Computer Integrated Manufacturing Systems, 19(2), 300–308.
7. Wang, T., & Huang, X. (2012). Complex mechanism extension scheme design model based on knowledge reuse. Journal of Nanjing University of Aeronautics & Astronautics, 44(4), 548–552.
8. Yu, X., Liu, J., & He, M. (2011). Design knowledge retrieval technology based on domain ontology for complex products. Computer Integrated Manufacturing Systems, 17(2), 225–231.
9. Ma, X. (2014). Research on the knowledge retrieval of product design based on ontology. Information Studies: Theory & Application, 6, 112–116.
10. Ganter, B., & Wille, R. (1999). Formal concept analysis: Mathematical foundations. Springer.
11. Shi, Z. G., Liu, Z. T., Wang, J. H., & Feng, D. (2010). Research on event network model and its application. Journal of Nantong University (Natural Science Edition), 9(3), 55–65.
12. Horrocks, I., & Sattler, U. (2007). A tableau decision procedure for SHOIQ. Journal of Automated Reasoning, 39(3), 249–276.
13. Sirin, E., Parsia, B., & Grau, B. C. (2007). Pellet: A practical OWL-DL reasoner. Journal of Web Semantics, 5(2), 51–53.
14. Berners-Lee, T., Chen, Y., Chilton, L., Dan, C., & Sheets, D. (2006). Tabulator: Exploring and analyzing linked data on the Semantic Web. In Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06).
15. Schank, R. C. (1982). Dynamic memory: A theory of reminding and learning in computers and people. Cambridge University Press.
16. Zhang, Y., Liu, X., & Jia, J. (2019). Knowledge representation framework combining case-based reasoning with knowledge graphs for product design. Computer-Aided Design and Applications, 17(4), 763–782.
17. Gong, Z. (2007). Research on ontology reasoning based on OWL. Doctoral Dissertation, Jilin University (in Chinese).
18. Cui, X., Tang, D., Zhu, H., & Yin, L. (2017). Knowledge representation and semantic inference of process based on ontology and SWRL. Machine Building & Automation, 46(3), 6–10.
19. Peng, G., Wang, H., & Zhang, H. (2019). Knowledge-based intelligent assembly of complex products in a cloud CPS-based system. In 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design (CSCWD), IEEE, pp. 135–139.
20. Lim, B., & Dey, A. (2010). Toolkit to support intelligibility in context-aware applications. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, ACM, New York, NY, USA, pp. 13–22.
21. Keen, P. (1980). Decision support systems: A research perspective. Decision Support Systems: Issues and Challenges, pp. 23–44.
22. Schneider, H.-J. (1979). Formal models and practical tools for information systems design. North-Holland.

Chapter 6

The Merging of Knowledge Management and New Information Technologies

At the beginning of the twenty-first century, the world is experiencing the fourth industrial revolution and the digital transformation of the business world, commonly referred to as Industry 4.0. In the Industry 4.0 environment, interconnected computers, smart materials and smart machines communicate with each other, interact with the environment and ultimately make decisions with minimal human involvement. The real force of Industry 4.0 lies mainly in digital interconnection and in the development and sharing of information. Deploying smarter machines and equipment and digitizing manufacturing and business processes may result in a number of advantages, such as higher manufacturing productivity and lower waste [1].

In this context, the current manufacturing landscape requires some key factors, such as efficiency, flexibility, prompt response to market changes, and a greater focus on product quality and customization. Beyond these critical factors, manufacturing enterprises require a higher level of digitalization and automation, which results in extensive connectivity between manufacturing processes and other business areas, namely a high degree of integration of operational systems with the overall organizational structure. In addition to internal integration, a higher degree of integration with the external environment is also necessary, especially with suppliers and customers [2].

After decades of development, knowledge management has entered a new stage, and its importance for the sustainability and competitiveness of organizations has been emphasized by economic changes [3]. Knowledge is a key resource for achieving sustainable competitive advantage, because it can be translated into more effective business processes and quality improvements, as well as an enhanced ability of enterprises to identify new solutions and develop products that meet customer needs [4].
On the one hand, this emerging manufacturing model brings challenges to knowledge management: the actual situation of resource allocation, manufacturing capacity, production scale, etc. varies greatly among enterprises, which exacerbates the complexity of knowledge management. On the other hand, new information technology provides a driving force for the promotion of a new paradigm of knowledge management.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. H. Wang and G. Peng, Collaborative Knowledge Management Through Product Lifecycle




In this sense, the integration of knowledge management with Industry 4.0 technologies will play a key role in the new stage. In Industry 4.0, the main goal of implementing new technologies is the effective and efficient customer-oriented adaptation of products and services, in order to increase the added value of an enterprise and improve its competitive position, along with customer satisfaction and loyalty. To achieve this, it is necessary to manage the new knowledge that is critical to the decision-making process [2]. The new Industry 4.0 technologies enable the integration of manufacturing operations systems with new information and communication technologies, such as big data, the Internet of Things, digital twins and cyber-physical systems. In this chapter, the basic concepts of these technologies are briefly introduced, and the integration of knowledge management with each of them is discussed in turn.

6.1 Big Data Technology

6.1.1 Overview of Big Data

Big data refers to datasets that cannot be captured, managed, or processed by conventional software tools within a given time range. It is a massive, high-growth, and diversified information asset that requires new processing modes to deliver stronger decision-making power, insight, and process optimization ability. The four main characteristics of big data are massive data scale, fast data flow, diverse data types, and low value density.

Nowadays, big data and its analysis have become core technologies, widely applied across industries, including technological, digital, and online businesses [5]. These data are generated from online transactions, emails, videos, audios, images, click streams, logs, posts, search queries, health records, social networking interactions, science data, sensors, and mobile phones and their applications. The strategic significance of big data technology lies not in mastering huge volumes of data but in the professional processing of meaningful data. In other words, if big data is compared to an industry, the key to its profitability lies in improving the processing capability of data and realizing the value added through that processing. Similarly, what matters about big data is not its scale but the valuable information it contains: the value content and mining cost are more important than quantity. Using this massive data is the key to winning the competition in many industries. Against the background of cloud computing and other innovations, information that used to be difficult to collect and process is beginning to be utilized efficiently.

To understand big data systematically, it is necessary to decompose it comprehensively and meticulously. Firstly, theory is essential for cognition and is the baseline for being widely recognized and disseminated. With the feature definition of big data,



Fig. 6.1 The five Vs of big data

the overall description and characterization of big data in the industry can be understood; through the discussion of the value of big data, its unique features can be deeply analyzed and its development trend foreseen. Secondly, technology is the means by which the value of big data is realized and the cornerstone of progress. The development of cloud computing, distributed processing, storage and perception technologies describes and illustrates the whole process of big data, from collection, processing and storage to the formation of results. Thirdly, practice is where the ultimate value of big data lies; the promising scenarios shown by big data can be supported by the Internet, government, enterprises and individuals.

Big data requires a revolutionary step forward from traditional data analysis, characterized by its five main components: Volume, Variety, Value, Velocity and Veracity, as shown in Fig. 6.1. Each component is briefly described below [6, 7].

• Volume: The large amount of data, covering collection, storage and computation. The unit for measuring big data starts at least at the petabyte (about 1,000 TB), exabyte (about 1 million TB) or zettabyte (about 1 billion TB) scale.
• Variety: The diversity of types and sources, including structured, semi-structured and unstructured data, embodied in web logs, audio, video, pictures, geographic location information, etc. Multiple data types demand a higher data processing ability.
• Value: The value density of the data is relatively low; without business value, every other V is a waste of time. With the wide application of the Internet and the Internet of Things (IoT), information perception is everywhere and the amount of information is huge, but its value density is low. The most important problem to be solved is to mine the value of the data by combining business logic with powerful machine algorithms.
• Velocity: The high speed at which data is generated, produced, refreshed and streamed, and the high timeliness. For instance, search engines require that news from a few minutes ago be queryable by users, and personalized recommendation algorithms require recommendations in as close to real time as possible. This is a significant feature distinguishing big data from traditional data mining.
• Veracity: The accuracy and reliability of the data, namely data quality. It is the classic GIGO: garbage in, garbage out.



Besides the five Vs, additional Vs are covered below in order to expand the knowledge management model [8]:

• Variability: The speed of loading data from the database is uneven; it relates to data types and sources.
• Validity: Similar to Veracity; reliable policies are required for quality and consistent data in order to reduce the cost of data cleaning.
• Vulnerability: Security is one of the most important issues for data services; a hacker attack may result in unimaginable damage.
• Volatility: How long fresh data remains relevant and useful is a big issue for big data. The data must be relevant to the business, and managers must consider volatility because of the velocity and volume of big data.
• Visualization: Traditional graphical methods are not suitable for big data because of memory constraints, poor scalability and long response times. Therefore, data clustering, sunbursts, parallel coordinates, circular network diagrams, cone trees and other graphical methods need to be considered.

6.1.2 Knowledge Management in Big Data

With the improvement of application software technology, hardware technology, and user participation, the growth of big data is gradually accelerating. Among the five Vs of big data, Value, the most important one, cannot be attained without a good strategy [7]. Knowledge management builds a quantitative and qualitative knowledge system in the organization, so that information and knowledge ultimately feed back into the knowledge system through the processes of acquisition, creation, sharing, integration, recording, access, updating, and innovation. This forms a continuous accumulation of individual and organizational knowledge, becoming a cycle of knowledge and wisdom. The knowledge and understanding then become the intellectual capital of management and application in the enterprise, which can help the enterprise make the right decisions to adapt to changes in the market. Therefore, comparing the requirement of extracting the value of big data with the process by which knowledge management creates knowledge, a link between big data and knowledge management emerges that can improve organizational performance.

A lack of constant knowledge creation leads to poor performance; therefore, proper strategies need to be devised to create new knowledge. Moreover, knowledge-fostering enablers such as people, processes, systems and organizations need to work coherently for effective knowledge creation. Finally, how knowledge creation and learning occur using these resources is explained by the SECI model [9]. The traditional SECI model, shown in Fig. 6.2, divides enterprise knowledge into two categories: tacit knowledge and explicit knowledge. Tacit knowledge includes beliefs, metaphors, intuitions, thought patterns, and so-called "tricks." Explicit knowledge, on the other hand, can be transmitted in a standardized and systematic language, i.e., textualized knowledge. The SECI model has a basic premise: both human learning and knowledge innovation are realized and completed in groups and situations of



Fig. 6.2 Traditional knowledge management: SECI model

social interactions. It is the existence of society that enables cultural inheritance activities. Every person's growth and every ideological innovation needs collective wisdom. Therefore, the mutual transformation of tacit knowledge and explicit knowledge goes through the processes of socialization, externalization, combination and internalization.

In the era of big data, the enterprise needs "dynamic capability" to deal with new trends and create new knowledge. This dynamic capability refers to the "ability to integrate, build and reconfigure internal and external competencies to address changing environments". From this dynamic capability view, big data is also a new trend that has created many opportunities to discover and understand hidden information in order to improve different business processes and operations, especially prediction and decision-making [9]. Therefore, in big data based knowledge management, more factors that may influence the data itself need to be considered. As shown in Fig. 6.3, market trends, policy, competitor data, economic data, customer information and other force majeure should be considered, as they can change the data and thus influence decision-making.

Besides the important factors mentioned above, one basic premise of the combination of big data and knowledge management is that knowledge is central in any big data situation. It is well known that the development of big data and its analysis is based on human knowledge, which also determines the algorithms for collecting and processing data. Therefore, even though a large amount of data can be accessed,



Fig. 6.3 Big data based knowledge management

knowledge is still the central part of big data analytics. Above all, it is human knowledge that decides how to apply the information generated from big data analytics [8]. In Fig. 6.4, the Big Data Analytics-Knowledge Management (BDA-KM) model [8] illustrates the role knowledge plays in the use of big data as well as in the analytic process. The model starts from contextual knowledge, i.e., the knowledge situated in organizational contexts, including the tacit knowledge of employees and the implicit knowledge embedded in systemic processes, activities, outputs and stakeholders. Contextual knowledge is significant since it can be utilized by managers and professionals to establish frames and systems. Moreover, contextual knowledge is also generated from new technology and can be applied to choose suitable analytic tools for extracting information and identifying new knowledge. More specifically, key search words obtained via text mining may be characterized by data attributes such as frequency, region and gender. If the analysis is for a marketing survey, it is the analysts' decision which key words are analyzed to identify consumer behavior. Thus, new knowledge may be created through this process, and such knowledge becomes the basis for answering previously defined problems or for informing and initiating subsequent organizational actions. A well-known example of creating knowledge from big data and analytics to initiate new actions is the way Amazon uses customer data and dedicated analytics (recommender systems) to suggest products to customers [8]. Finally, decisions or answers are generated after the whole process has finished, and human knowledge is still required for further decisions and better utilization of the results from the model.

The BDA-KM model emphasizes the central role of knowledge in big data analytics. On the one hand, contextual knowledge can guide cross-platform big data storage at the operational level.
On the other hand, knowledge turns raw big data into extracted information and new knowledge with the help of analytic techniques and tools [8]. From the strategic perspective, big data can assist in knowledge co-creation, which, in turn, leads to evidence-based knowledge products for better business competition. Big data provides a justification for bringing proven KM strategies and tools to bear on data sources, while at the same time KM can opportunely support the process of value creation from big data. Strategies of KM could be adopted as a


Fig. 6.4 Big data analytics-knowledge management model

lens for analyzing the huge amount of data and transforming it into valuable assets for business competition. The central outcome of the union of big data and KM is knowledge products that align with the business strategies of the organization [10].
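As a toy illustration of the BDA-KM flow (raw data, extracted information, new knowledge, decision), the sketch below mines keyword frequencies from a small text corpus; the corpus, the top-N cutoff and all function names are invented for illustration, standing in for the contextual-knowledge-guided choices an analyst would make.

```python
from collections import Counter

# Minimal BDA-KM-style pipeline: big data datasets -> extracted
# information (keyword frequencies) -> new knowledge -> decision.

def extract_information(posts):
    """Analytics step: term frequencies over a text corpus."""
    words = [w for p in posts for w in p.lower().split()]
    return Counter(words)

def create_knowledge(freqs, top_n=2):
    """Contextual knowledge guides which terms the analyst keeps;
    here a simple top-N cutoff stands in for that judgment."""
    return [w for w, _ in freqs.most_common(top_n)]

posts = ["gearbox noise issue", "gearbox assembly delay",
         "assembly line gearbox"]
info = extract_information(posts)        # extracted information
knowledge = create_knowledge(info)       # new knowledge
decision = f"investigate: {', '.join(knowledge)}"  # decision/answer
```

Even in this toy form, the decision at the end depends on human choices made earlier (which attributes to count, how many terms to keep), which is exactly the point the BDA-KM model makes about knowledge remaining central.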

6.1.3 Case Study

The advent of the era of big data provides strong technical support for knowledge management. The knowledge system is built into a database, which can classify, integrate, record, read and update scattered information and data. Furthermore, it has the capabilities of data query, data sharing and data transmission, thus helping decision-makers make correct decisions and reducing decision-making risks.

Small and Medium-Sized Enterprises Knowledge Management

Big data has new characteristics, such as large scale, high speed, diversity, and omnipresence. Specifically, it requires putting forward targeted schemes for the operation modes of enterprises based on the rapid acquisition, processing, analysis, and extraction of valuable, massive, and diversified transaction and interaction data. Meanwhile, knowledge management plays a crucial role in big data analytics, as human knowledge guides how the information generated by big data can be utilized in operational, tactical, and strategic areas. Therefore, the impact of knowledge and the impact of big data analytics will always go hand in hand. The synergistic relationship between big data and knowledge management


6 The Merging of Knowledge Management and New Information …

shows that only with knowledge management can big data produce big knowledge [11]. Organizations must understand the reason for incorporating big data into knowledge management, identify big data types that enhance knowledge management practices, enable stakeholders to leverage big data to create value, and then develop big data processes for knowledge management [12]. With the consideration of it, an organization must establish a strategic plan for the computing infrastructure, organizational procedures, policies and rules related to big data [13]. However, for small and medium-sized enterprises, harvesting big data is an organizational venture. In summary, general motivations of small and medium-sized enterprises to use of data for the competition includes [10]: • • • •

Reduce costs and increase profit through cost and profit analyses. Improve customer services through customer’s purchase analyses. Enhance marketing strategies through marketing campaign analyses. Achieve sustainability through long-term risk analyses.
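The first motivation, cost and profit analysis, can be sketched in a few lines. The transaction records, field names and figures below are illustrative assumptions, not data from the source.

```python
# Illustrative sketch (not from the source) of a cost-and-profit analysis
# over transaction data. Records and field names are assumed.

transactions = [
    {"item": "A", "revenue": 1200.0, "cost": 900.0},
    {"item": "B", "revenue": 800.0,  "cost": 850.0},
    {"item": "A", "revenue": 600.0,  "cost": 400.0},
]

def profit_by_item(rows):
    """Aggregate profit (revenue minus cost) per item."""
    profit = {}
    for row in rows:
        profit[row["item"]] = profit.get(row["item"], 0.0) + row["revenue"] - row["cost"]
    return profit

def loss_makers(rows):
    """Items whose total profit is negative -- candidates for cost reduction."""
    return sorted(item for item, p in profit_by_item(rows).items() if p < 0)
```

Even this toy analysis surfaces the kind of knowledge product the motivation targets: item B loses money and is a candidate for cost reduction.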

The analysis of big data involves multiple distinct phases as shown in Fig. 6.5, each of which introduces challenges [14]. For small and medium-sized enterprises, the challenges are more specific and urgent, as shown in Fig. 6.6, and include at least five main parts [10]:

• Strategic Use of Big Data: Because small and medium-sized enterprises lack the financial resources to invest in technology, deploying, managing and supporting end-user computing for big data can be a complex task.
• IT Literacy and Ethics: Ensuring security, IT ethics and intellectual property related to big data is an important issue, including defining end-user computing policies, IT ethics and intellectual property ownership, and proposing methods for protecting the information of companies and customers.
• IT Infrastructure: The physical structures of the end-user computing infrastructure must be established to manage big data, including technical facilities and


Fig. 6.5 Big data procedure and challenges: the major steps in the analysis of big data are acquisition/recording, extraction/cleaning/annotation, integration/aggregation/representation, analysis/modeling, and interpretation, with human collaboration throughout

Fig. 6.6 Big data challenges for small and medium-sized enterprises: beyond general big data processing and challenges, these comprise the strategic use of big data, IT literacy and ethics, IT infrastructure, computing architecture, and big data tools

organizational programs. The lack of competent IT staff is a significant barrier to achieving IT infrastructure flexibility.
• Computing Architecture: An appropriate computing architecture to handle data-intensive applications is required.
• Big Data Tools: Convenient and easy-to-use tools are a prerequisite for small and medium-sized enterprises to process big data.

Based on these challenges, a knowledge management model of big data for small and medium-sized enterprises is introduced as shown in Fig. 6.7 [10]. The model has four parts, which address the challenges respectively.

• Strategic Method of Big Data: The enterprises must clarify their big data strategies and address potential organizational challenges, which will lead to lower costs, more profits, better customer services and less risk.
• Knowledge-Guided Planning: With an identified strategic method of using data to support the business, the enterprise must apply prior knowledge of the business context and identify the data requirements, including the data content, data sources and data resource management.
• IT Solutions: IT solutions are the practical means to address big data challenges. Feasible technologies include end-user-oriented application software, social media, cloud computing, etc.

Fig. 6.7 Knowledge management model of big data for small and medium-sized enterprises: the strategic method of big data (lower costs and more profits, better customer services, less risk), knowledge-guided planning (prior knowledge guidance, data contents and sources, data resource management), IT solutions (social media, end-user-oriented application software, cloud computing), and the resulting knowledge products (new business plan, new customer pool, new decision-making rules, new marketing materials)



• Knowledge Products: Generating new knowledge is the ultimate goal of dealing with big data, which can help enterprises improve their performance. New knowledge includes new business plans, new customer pools, new decision-making rules and new marketing materials.

Since this conceptual model was developed from a limited number of published big data business cases of small and medium-sized enterprises, it has limitations, and it needs further verification and validation. Even so, the model can guide small and medium-sized enterprises in leveraging big data for their business. Focusing on knowledge management rather than complex IT technologies is more efficient. Successful small and medium-sized enterprises should not ignore the opportunities of big data and should exploit it to produce valuable knowledge products for competition.
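The analysis phases of Fig. 6.5, which this model builds on, can be sketched as a chain of plain functions. The log format, cleaning rule and decision limit below are assumptions for demonstration; a real pipeline would substitute its own acquisition and analytics steps.

```python
# A minimal sketch of the major big data analysis phases (acquisition,
# extraction/cleaning, integration/aggregation, analysis/interpretation)
# chained as plain functions. Data format and rules are assumed.

def acquire():
    """Acquisition/recording: raw, messy event lines as they might arrive."""
    return ["temp=21.5", "temp=bad", "temp=23.0", "", "temp=19.5"]

def clean(lines):
    """Extraction/cleaning: keep only lines with a parsable numeric value."""
    values = []
    for line in lines:
        _, _, raw = line.partition("=")
        try:
            values.append(float(raw))
        except ValueError:
            continue  # drop malformed records
    return values

def aggregate(values):
    """Integration/aggregation: reduce the values to a summary representation."""
    return {"count": len(values), "mean": sum(values) / len(values)}

def interpret(summary, limit=22.0):
    """Analysis/interpretation: a human-readable answer for the decision maker."""
    return "above limit" if summary["mean"] > limit else "within limit"

summary = aggregate(clean(acquire()))
verdict = interpret(summary)
```

Each function hands a progressively more refined artifact to the next, which is why a failure in an early phase (e.g., poor cleaning) propagates into every later one.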

Maritime Business

Big data and business analytics bring new capabilities, but they require clear and insightful direction. Future research may address knowledge management, particularly the interaction of competitive intelligence, business intelligence and intellectual capital elements. To improve maritime innovation capability, a big data driven knowledge management model has been introduced in maritime organizations [15]. As shown in Fig. 6.8, the four main areas which can be addressed with this model are: strategy and decision making in shipping companies, competitive intelligence, human capital development, and data-driven culture.

Fig. 6.8 Four areas in maritime business: data-driven culture, competitive intelligence, strategy and decision making in shipping companies, and human capital development

Strategy and decision making are critical parts of the operation of shipping companies. Through them, investment evaluation and portfolio management are produced, including vessel valuation, risk assessment of financing options, and market scanning, positioning or exit decisions. With the real-time collection of various large data resources, the possibility of new information services can be immediately foreseen. Nowadays, from the view of big data design thinking, explaining and predicting the evolution of the market and the

strategic adjustment preferred by the company are considered the closest approach to the evolution of knowledge management systems. From the perspective of shipping, the greatest research and development focus is mainly on IoT-based parts, including ship health monitoring, energy consumption monitoring, environmental impact monitoring, and security platform monitoring. Commercial software and related government policies are under development.

Competitive intelligence aims to collect and analyze available data to determine the business behavior patterns of partners, customers, and competitors. It can also forecast competitors' behaviors and interpret the strategies behind observable actions, thus predicting future strategic and tactical moves. Similarly, the future behavior of customers, such as shippers and charterers, can be analyzed and supported with greater confidence. These functions are based on several related technologies, including sentiment analysis of social media information to infer users' opinions and integrated programs for market trend prediction.

Human capital development includes two parts: on board and ashore. As ships gradually become autonomous, crews are required to be familiar with high-tech equipment on board, which means seafarers need to be technologically literate and equipped with problem-solving, decision-making, and communication skills. On the other hand, maritime professionals are also challenged by new knowledge and certification requirements. With the development of big data technology, knowledge management is expected to be integrated into marine education and training, including developing skills assessment and course recommendation products related to marine market trends.

A data-driven culture is a transformation from craftsmanship to knowledge and science orientation. With the development of Industry 4.0, the maritime business enters a new stage.
As big data technology becomes an integral part of ships and shipping companies, the number of seafarers and professionals carrying out their work based on data and related technologies is continuously increasing. In the range of decision support and business support, an innovation-oriented mindset is also gaining momentum in the shipping environment. Knowledge management is now considered essential support for improving the systemic innovation capability of shipping companies. Because data is a vital business resource, knowledge management theories and tools are in a new formative stage. The knowledge creation, management, and sharing process will be redefined based on innovation-oriented enterprise thinking.



6.2 Internet of Things (IoT) Technology

6.2.1 Overview of IoT

The Internet of Things means physical entities are connected to the network through sensing devices according to agreed protocols. Objects realize intelligent identification, positioning, tracking, supervision, and other functions through information exchange and communication. IoT has three essential characteristics: standard object devices, autonomous terminal interconnection, and universal service intelligence.

According to other definitions, IoT refers to ubiquitous devices and facilities. On the one hand, it includes intrinsically intelligent devices and facilities, such as sensors, mobile terminals, industrial systems, building control systems, smart home facilities, video surveillance systems, etc. On the other hand, external devices and facilities are also included, such as assets affixed with RFID tags, people, and vehicles carrying wireless terminals. Smart objects and intelligent motes achieve machine-to-machine communication, grand integration, and cloud-based Software as a Service (SaaS) through various wireless and wired, long- and short-distance communication networks. Over the Intranet, Extranet, and Internet, an appropriate information security guarantee mechanism is adopted to provide safe, controlled, and personalized real-time online monitoring, positioning and traceability, alarm interaction, scheduling and command, plan management, remote control, security prevention, remote maintenance, online update, statistical reports, decision support, and other management services, achieving highly efficient, energy-saving, safe, and environmentally sound integration.

As shown in Fig. 6.9, in simple terms, IoT is a network of physical elements, with clear element identification, embedded software intelligence, sensors and ubiquitous connectivity to the Internet [16]. The corresponding elements are shown below:

• Sensors: for data and information collection.
• Identifiers: for identifying data sources.
• Software: for data analysis.
• Internet connectivity: for communication and notification.
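The four elements can be combined into a toy device model. Everything below is a hypothetical sketch: the class, its identifier and the threshold rule stand in for a real sensor, real analysis software and real network connectivity.

```python
# A toy sketch of the four IoT elements -- sensor, identifier, software,
# connectivity -- as one class. Device id and threshold are hypothetical;
# the outbox list stands in for real Internet connectivity.

class IoTNode:
    def __init__(self, node_id, threshold):
        self.node_id = node_id        # identifier: names the data source
        self.threshold = threshold
        self.outbox = []              # stands in for Internet connectivity

    def sense(self, reading):
        """Sensor: collect one measurement and run it through the software."""
        alert = self.analyze(reading)
        if alert:
            self.notify(alert)
        return alert

    def analyze(self, reading):
        """Software: data analysis -- flag readings over the threshold."""
        if reading > self.threshold:
            return f"{self.node_id}: reading {reading} over {self.threshold}"
        return None

    def notify(self, message):
        """Connectivity: queue the message for transmission/notification."""
        self.outbox.append(message)

node = IoTNode("sensor-42", threshold=30.0)
node.sense(25.0)   # below threshold: no notification
node.sense(31.5)   # over threshold: one message queued
```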



Fig. 6.9 Simple definition of IoT: sensors, identifiers, software, and Internet connectivity


Fig. 6.10 More complete definition of IoT: people, data, process, and things connected over the Internet, supported by processes and standards

In a more complete sense, IoT should also include the “Standards” and “Processes” which allow “Things” to be connected over the “Internet” to exchange “Data” [16]. As shown in Fig. 6.10, IoT refers to an internet of everything including people, process, data, and things:

• People: The network connects people in more relational ways.
• Data: Better decisions are made by converting data into intelligence.
• Process: Delivering the right information to the right person or machine at the right time.
• Things: Physical devices and objects connected to the Internet and each other for intelligent decision-making.

An IoT model is required to provide a unique identifier per thing, to communicate between things, and to sense information from things. Therefore, an IoT model needs the ability to monitor anything from anywhere in the world. Figure 6.11 presents a basic process of IoT, which is typically based on telecommunication technology. Since IoT is the transfer and control of information between things and people, the key technologies include:

• Sensor technology: Most computers process digital signals, so sensors need to convert analog signals into digital signals so that computers can process them. The RFID tag is a sensor technique, a comprehensive technology fusing radio

Fig. 6.11 Basic process of IoT: things with a unique address and sensing and actuating capability communicate over the IoT network, providing notification and control

frequency technology and embedded technology. RFID has broad application prospects in automatic identification and goods logistics management.
• Embedded system technology: A complex technology integrating computer hardware and software, sensor technology, integrated circuit technology and electronic application technology. Embedded systems are changing people's lives and promoting the development of industrial production and the national defense industry.
• Intelligent technology: This covers various methods of utilizing knowledge to achieve a desired purpose. By implanting intelligent systems into objects, the objects can actively or passively communicate with users.
• Nanotechnology: The study of the properties and applications of materials with structural sizes in the range of 0.1–100 nm.

Sensor technology can detect the physical state of objects, while embedded technology and intelligent technology can embed intelligence in objects to enhance the power of the network by transferring information processing power across the boundaries of the network. With the advantages of nanotechnology, ever smaller objects in the IoT can interact and connect. The trend in electronics is toward devices and systems that are smaller, faster, more responsive and more energy efficient, which means nanotechnology has a huge impact and becomes the final frontier for builders.

Based on the key technologies stated above, IoT is formed by applying network technology to all things, such as embedding sensors in the oil network, power grid, road network, water network, buildings, dams, and other objects. A simple IoT pyramid is shown in Fig. 6.12, introducing a four-level framework.
It includes IoT devices (things), the IoT network (the infrastructure transporting the data), the IoT services platform (software connecting the things with applications and providing overall management), and IoT applications (specialized business-based applications such as customer relationship management (CRM), accounting and billing, and business intelligence (BI) applications). Control is passed down from one level to the one below, starting at the application level and proceeding to the IoT devices level, and back up the hierarchy [16].

• IoT Applications: All applications operating in the IoT network.
• IoT Management Services Platform: The key management software functions that enable the overall management of IoT devices and the network; the main functions connecting the device and network levels with the application layer.
• IoT Network: All IoT network components, including IoT gateways, routers, switches, etc.
• IoT Devices: All IoT sensors and actuators.

Advantages of the proposed IoT four-level model include [16]:

• Reduced Complexity: It breaks IoT elements and communication processes into smaller and simpler components, thereby helping IoT component development, design, and troubleshooting.

Fig. 6.12 IoT pyramid: IoT applications, IoT management services platform, IoT network, and IoT devices

• Standardized Components and Interfaces: The model standardizes the specific components within each level (e.g., the key components of a general IoT services platform) as well as the interfaces between the various levels. This allows different vendors to develop joint solutions and common support models.
• Modular Engineering: It allows various types of IoT hardware and software systems to communicate with each other.
• Vendor Interoperability: It ensures the various technology building blocks can interwork and interoperate.
• Accelerated Innovation: It allows developers to focus on solving the main problem at hand without worrying about basic functions that can be implemented once across different business verticals.
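The pass-down of control through the four levels can be sketched as a chain of functions, one per level. The layer behaviors below are illustrative assumptions, not a reference implementation of any particular IoT stack.

```python
# Sketch of data flow through the four levels of the IoT pyramid:
# devices -> network -> services platform -> applications. The per-layer
# behavior (tagging, rounding, formatting) is assumed for illustration.

def device_layer(raw):
    """IoT Devices: a sensor produces a raw reading."""
    return {"source": "device-1", "value": raw}

def network_layer(packet):
    """IoT Network: transport the packet (here, just record the hop)."""
    packet["via"] = "gateway-1"
    return packet

def platform_layer(packet):
    """Services Platform: manage/normalize data for the applications."""
    packet["value"] = round(packet["value"], 1)
    return packet

def application_layer(packet):
    """IoT Applications: business logic (e.g., a BI-style summary line)."""
    return f"{packet['source']} via {packet['via']}: {packet['value']}"

def pipeline(raw):
    """Run one reading up the whole hierarchy."""
    return application_layer(platform_layer(network_layer(device_layer(raw))))
```

Because each level exposes only a small interface (a packet in, a packet out), any layer can be swapped independently, which is exactly the modularity and interoperability argument above.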

6.2.2 Knowledge Management in IoT

IoT utilizes a new generation of IT technology, puts it together with the existing Internet, and achieves the integration of human society and the physical system. From the role of IoT, we see that its purpose is the same as that of a knowledge management system: to improve the utilization of resources, the productivity level, and the relationship between humans and nature. IoT focuses on the network construction process in the early stage, while the knowledge management system emphasizes practical applications in the later stage. As IoT covers a wide range and involves many fields, information sharing and unified cooperation are essential to achieve a win–win effect and give full play to the role of IoT. Therefore, the extensive application of knowledge management systems in IoT will promote IoT's faster development.

With this in mind, effective knowledge management should focus on the following aspects: reducing the response time to environmental stimuli with more efficient knowledge acquisition and application; creating and keeping potential for



the growth and development of the company; increasing the value and/or profitability of the company; and improving the products and services of the company, stimulating knowledge production, reducing operating costs, and ensuring continuous improvement in quality and efficiency [17]. However, traditional knowledge management systems have limitations, and a new generation of knowledge management system should offer the following capabilities in the era of IoT [18]:

• creating personalized experiences and tracking multiple different sources;
• moving and repositioning data from different sources;
• optimizing data for different uses;
• providing real-time on-demand data for decision-making;
• collecting instructional data and machine data;
• enabling the knowledge management system to be seamless and interoperable, with realistic connectivity that reduces costs, improves quality and delivers greater productivity.
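Two of these capabilities, collecting data from multiple sources and serving it on demand, can be illustrated with a minimal in-memory store. The source names and payloads are assumptions; a production system would use a database and real connectivity.

```python
# Minimal sketch of a knowledge store that ingests events from multiple
# (hypothetical) sources and answers on-demand queries. A real system
# would persist data and connect to live devices.

class KnowledgeStore:
    def __init__(self):
        self.events = []

    def ingest(self, source, payload):
        """Collect and track data from different sources."""
        self.events.append({"source": source, "payload": payload})

    def latest(self, source):
        """On-demand retrieval: most recent payload from a given source."""
        for event in reversed(self.events):
            if event["source"] == source:
                return event["payload"]
        return None

    def sources(self):
        """The distinct sources tracked so far."""
        return sorted({event["source"] for event in self.events})

store = KnowledgeStore()
store.ingest("machine-a", {"rpm": 1500})
store.ingest("forum", {"post": "bearing noise reported"})
store.ingest("machine-a", {"rpm": 1480})
```

Mixing machine data with human-generated content in one queryable store is the kind of interoperability the capability list calls for.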

In the IoT context, knowledge management now has the opportunity and the capability to collect data from various resources, and more knowledge can then be created from this wealth of data. Based on the requirements stated above, a new knowledge management system, shown in Fig. 6.13, should include at least the following four components [19]:

• Information technology infrastructures: These physical technologies help collect and manage knowledge effectively, including hardware, software, extranets, intranets, local area networks (LANs), etc.
• Collaborative technologies: These technologies play a role in the process of data collection, transmission and storage, including forums, shared databases, document repositories, workflows, etc.

Fig. 6.13 Four components of a new knowledge management system: IT infrastructures (hardware, software, extranets, intranets, LANs), collaborative technologies (forums, shared databases, document repositories, workflows), ICT adoption, and integration technologies


• Information and communications technology (ICT) adoption: ICT can integrate different collaborative technologies and is directed towards three main implementation objectives:
  – The ICT informative orientation aims to deliver business information to a number of stakeholders across organizational and functional boundaries;
  – The ICT communicative orientation aims to reduce costs and interact with a number of commercial agents inside and outside the organization;
  – The ICT workflow orientation aims to establish electronic processes in corporate technologies.
• Integration technologies: These technologies work in an open and collaborative IoT-driven environment, including:
  – The integration of the website with back-end systems and databases;
  – The integration of internal and external stakeholder databases.

Combining IoT and knowledge management will bring a number of benefits in some fields, but it is also accompanied by potential risks. Several areas or objectives of knowledge management are listed below, together with the potential and risk of IoT implementation [17].

• Strategy/development of knowledge and competence resources:
  – Potential: Access to external expertise and the latest technology; providing tools that enable access to organizational and technological knowledge; providing tools that support the transfer and sharing of knowledge.
  – Risk: Need for better control over the IoT environment; dependence on the suppliers of IoT.
• Human resources management/development of intellectual capital:
  – Potential: Developing new skills and competences in the field of IT; improved cooperation and transfer of knowledge between organizations.
  – Risk: Possible loss of intellectual capital in case of the dismissal of IT staff as a result of the adoption of IoT solutions; the need to improve knowledge and develop new skills to use the new technology (IoT).
• Process management:
  – Potential: Supporting innovation; reducing the time of process implementation.
• Marketing/business intelligence/innovation:
  – Potential: Possibly unlimited access to internal and external data; ensuring the possibility of direct integration of external entities with the company; supporting market-driven innovation (for products/services).
  – Risk: The possibility of losing competitive advantage through the acquisition of strategic information or sensitive data by competitors.



• Information technology/information security management:
  – Potential: Improved use of IT resources; the possibility of collecting data from diverse products, company assets or the operating environment; the possibility of transmitting real-time data over wireless networks.
  – Risk: Loss of control over the IT environment; system failures; services unfit for the actual needs of the organization; a possible decrease in safety; the possibility of loss and/or unauthorized use of sensitive data.

6.2.3 Case Study

IoT is the third wave of the world information industry after the computer and the Internet. As an essential support and communication bridge for enterprises to keep pace with the times and improve their core competitiveness, the knowledge management system will play an increasingly important role in this wave. It will also become the best partner for enterprises in the IoT era and contribute to building a harmonious society.

Enterprise Knowledge Management

Knowledge management is an integral part of enterprise management. It establishes a humanistic and technical knowledge system in the organization so that information and knowledge go through acquisition, creation, sharing, integration, recording, access, update, and feedback to achieve the ultimate goal of constant knowledge innovation. The knowledge of individuals and organizations can thus be accumulated continuously. The ability to think from the system's perspective becomes the organization's intellectual capital and helps enterprises make correct decisions in response to market changes.

Enterprise knowledge management mainly includes five stages: cognition, planning, pilot, promotion and support, and institutionalization. A general model is shown in Fig. 6.14.

• Cognition: The process of unifying the understanding of knowledge management in enterprises, sorting out the significance of knowledge management to enterprise management, and evaluating the current situation of knowledge management in enterprises.




Fig. 6.14 General model of enterprise knowledge management


• Planning: The process of carrying out knowledge management planning with a detailed analysis of the status quo and types of knowledge management, combined with business processes and other perspectives.
• Pilot: The process of selecting appropriate departments and processes to carry out knowledge management practice in accordance with the plan. The knowledge management plan is evaluated on its short-term effect and corrected in light of the problems found in the pilot.
• Promotion and support: The process of promoting knowledge management in enterprises on a large scale to fully realize its value, on the basis of the knowledge management plan revised in the pilot stage.
• Institutionalization: The process of making knowledge management part of the operating mechanism of the integrated enterprise, so that it is woven into the management system of enterprise strategy, process, organization and performance.

With enterprise knowledge management, managers can build learning and training plans for all departments and posts based on the knowledge base, so that members of the organization can recharge themselves at any time and grow as a team, thus improving their skills. Knowledge management also classifies various knowledge contents (e.g., programs, plans, systems) and formats (e.g., photos, Word, Excel), thus improving the application efficiency of organizational knowledge. An enterprise knowledge base makes knowledge query and invocation simpler, allowing full use of knowledge results, improving work efficiency, and reducing repetitive work.

As a typical application of IoT technology, the Community Security Intelligent Access Control project manages the entry and exit of the floating population and personnel by installing intelligent micro-card ports.
The project covers more than half of the property communities and senior communities in the main urban areas of the district, involving 300,000 residents. This project has dramatically improved residents' property and life safety and assisted public security in completing the investigation and statistics of live population data in the covered areas. Knowledge management aims to summarize the successful implementation experience of the existing projects and provide an empirical reference for promoting the projects in other regions. In this project, the purpose of knowledge management is to sort out and classify the implementation process of intelligent projects, establish a knowledge base, and integrate knowledge management into the project. According to the actual situation of the company and the project, new implementation steps are formulated as shown in Fig. 6.15.

• Cognition: Before the establishment of the enterprise knowledge management system, a mobilization meeting is required to train the leaders of the main departments on knowledge management, so that they understand the significance of knowledge management and their roles in the process. A knowledge management


Fig. 6.15 Model of knowledge management system (the pilot stage comprises create, verify and update)

system construction team is also necessary, responsible for building the knowledge management system of the entire enterprise.
• Planning: The main task in this stage is to organize the structure of the knowledge base. The implementation process of the intelligent access control project is the core work content of this project, including survey, contract signing, installation, collection, and operation and maintenance. Based on the current status of the project and the experience of the previous project, the knowledge base of the intelligent access control project is divided into two parts, as shown in Fig. 6.16. At the enterprise level, it requires the cooperation of various departments; at the department level, it requires the standardization of the work and processes within each department.
• Pilot: This step is divided into three parts: create, verify and update. After the completion of the knowledge management plan, a detailed knowledge examination plan is developed. Relevant staff prepare the plan according to the well-executed knowledge management, and complete the corresponding knowledge


Fig. 6.16 Planning procedure: at the enterprise level, the knowledge base covers contract signing, the promotion and implementation system, the new employee training plan, the ground promotion scheme and process, and the operation and maintenance service system; at the department level, it covers the Marketing Department, the Operating Department and the Customer Service Center


sorting and editing. After the knowledge has been edited and revised, it is practiced in actual work to confirm its practical value. The knowledge editor is responsible for tracking the practical effect, revising the knowledge in the knowledge base and deleting worthless knowledge according to the pilot results.
• Promotion and support: Once the practical value of the knowledge base is confirmed in the pilot stage, knowledge management is promoted on a larger scale, gradually shared with all staff, and finally integrated into the whole business.
• Institutionalization: With the completion of the implementation of the intelligent access control project, the next task is operating and maintaining the system. A management system, combined with performance appraisal standards, is necessary to institutionalize the knowledge management system and make knowledge management part of the operating mechanism of the whole enterprise.

The enterprise knowledge management practice has a positive impact on the operation of the whole company. The project realizes process control, pays attention to execution details, grasps process information, monitors the execution direction, ensures execution efficiency, and corrects directional errors to ensure the smooth achievement of the target, thus realizing work process monitoring and process correction. Also, it promotes organizational flattening and information access by eliminating information isolation, helping employees and leaders to accurately grasp the overall and detailed execution information, avoiding information distortion, and improving the accuracy of work and management. Furthermore, it accumulates knowledge and experience through procedural documentation, and helps improve employees' quality by promoting internal cultural exchanges and experience sharing.
From the operation of the whole enterprise and the project, intelligent access control knowledge management has achieved satisfactory results. After the launch of the intelligent access control project, the entire intelligent community service works well and has been recognized by customers and residents. The project eventually became a pilot project of national community security construction, and many partners and competitors came to visit and learn.

Automotive Domain

Currently, with the development of the IoT, most IoT activities are related to manufacturing, transportation, smart cities and consumer applications. In the automotive domain, the IoT has important implications [18]:
• It allows real-time tracking of the location of automobiles using a cloud-based intelligent monitoring and control system.


6 The Merging of Knowledge Management and New Information …

• It provides inter-equipment connection using devices attached to vehicles. For example, data transmission equipment A on one vehicle communicates with data transmission equipment B on another. This communication may allow drivers to take appropriate precautions to avoid delays or accidents.
• Fuel management: Sensor data provide drivers with better visibility into fuel consumption and efficiency, potentially saving millions in fuel costs.
• Improved passenger comfort and convenience: Travelers can be alerted about delays via their mobile devices.
• Predictive maintenance: Vehicles can transmit defect data directly to engineers. Predictive maintenance can identify components in need of repair or replacement, eliminating the need to take equipment out of service for routine inspections and preventive maintenance.
The basic ingredient of knowledge management is data, without which no knowledge can be discovered. In the automotive domain, important data can be captured from various IoT connections and devices [18]:
• Sensors: Sensors installed in vehicles offer the ability to track maintenance needs, driver safety, fuel usage and other related metrics in real time.
• Roads: The IoT equips roadways with embedded road sensors to capture data related to temperature, humidity and traffic volumes. The data gathered by these sensors are transmitted over the wireless network for further processing and analysis; they also provide real-time information about the condition of the road.
• Parking: Sensors can be embedded in the pavement to collect data and make the data available to drivers and parking facility operators. Drivers can use a smart app to locate available parking spaces and determine the cost, which simplifies the problem of finding an appropriate parking space.
• Vehicles: Vehicles can be connected with IoT-enabled devices.
A special device can be installed in the vehicle that allows drivers to monitor and control it remotely. This device is GPS-enabled, so drivers can see maps of where and how far they have driven; it also enables drivers to monitor the security of their vehicles, such as the locking and unlocking of the doors. Data do not equal information, and information is not equivalent to knowledge. However, by analyzing raw data one can extract information, and the more information one gathers and validates, the more capability is created. The IoT collects data from various sensors, so the data need to be classified, mined, organized and used to make automated decisions, which requires knowledge management. With knowledge management, the data collected from the IoT can be turned into helpful knowledge quickly and effectively. This is crucial in the automotive domain, since road conditions change quickly and require fast responses and decisions [18]. Take intelligent parking service as an example. Finding an available parking space can be a challenging problem in many situations. An IoT-based intelligent parking cloud service is required to help users find available parking spaces, which

Fig. 6.17 Software architecture for intelligent parking cloud service

collects and analyzes geographic location information, parking availability information, parking space reservation and order information, traffic information and vehicle information through sensor detection and through the cloud. In this process, the use of vehicle-to-infrastructure (V2I) and infrastructure-to-infrastructure (I2I) communications is vitally important. Figure 6.17 shows a software architecture for an intelligent parking cloud service using a modular approach [19]. The integration of a massive amount of real-time heterogeneous data from different sources, including traffic information, bus timetables, waiting times at events, event calendars, environmental sensors for pollution or weather warnings, GIS databases, parking availability information, and parking space reservation and order information, is a big challenge that cannot be solved without effective KM. A knowledge-based decision system is further recommended to integrate and analyze these data to generate better information and analysis, including real-time parking lot statistics and reports, which can enhance decision-making significantly and make parking administrators' jobs easier. The knowledge-based decision system can also conduct historical or longitudinal data analysis to generate new knowledge. This can help parking administrators develop a long-term parking solution that guides drivers to find parking spaces more efficiently and conveniently when similar events happen in the future [18].
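As a minimal illustration of turning raw availability data into an actionable decision, the recommendation step of such a parking service might be sketched as follows (a simplified sketch in Python; the class and function names are hypothetical, not part of the architecture in [19]):

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class SpaceReport:
    """A single parking-space report, as might arrive over a V2I link."""
    space_id: str
    x: float          # planar coordinates of the space
    y: float
    occupied: bool
    hourly_rate: float

def recommend_space(reports, driver_x, driver_y):
    """Turn raw availability data into a recommendation:
    the nearest free space, or None if the lot is full."""
    free = [r for r in reports if not r.occupied]
    if not free:
        return None
    return min(free, key=lambda r: hypot(r.x - driver_x, r.y - driver_y))

reports = [
    SpaceReport("A1", 0.0, 0.0, True, 2.5),
    SpaceReport("A2", 5.0, 0.0, False, 2.5),
    SpaceReport("B7", 1.0, 1.0, False, 3.0),
]
best = recommend_space(reports, 0.0, 0.0)
print(best.space_id)  # the free space closest to the driver: "B7"
```

A real service would of course add reservation handling, pricing, and the historical analysis described above; the point here is only that the knowledge-based decision layer sits on top of the aggregated sensor reports.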

6.3 Digital Twins

6.3.1 Overview of Digital Twins

Around 2003, the idea of the digital twin appeared for the first time in Professor Grieves' product lifecycle management course at the University of Michigan. However, the term 'digital twin' had not yet been formally proposed; Grieves called the idea the 'Conceptual Ideal for PLM (Product Lifecycle Management)'. Nevertheless, the basic idea of the digital twin was already reflected in this vision: the digital model constructed in the virtual space and the physical entity are interactively mapped to



faithfully describe the trajectory of the physical entity's entire life cycle. Around 2010, the term "digital twin" was formally proposed in a NASA technical report and was defined as "a simulation process of a system or aircraft that integrates multiple physical quantities, multiple scales, and multiple probabilities." In 2011, the U.S. Air Force explored the application of digital twins in aircraft health management and discussed in detail the technical challenges of implementing them. In 2012, NASA and the U.S. Air Force jointly published a paper on digital twins, pointing out that the digital twin is one of the key technologies driving the development of future aircraft. In the years that followed, more and more research applied digital twins to the aerospace field, including airframe design and maintenance, aircraft capability assessment, and aircraft failure prediction. The digital twin has been defined as the virtual model of a process, product, or service. As a virtual model, a digital twin functions as a bridge between the physical and digital worlds. As shown in Fig. 6.18, the digital twin is composed of three components: physical entities in the physical world, virtual models in the virtual world, and the data connections between these two worlds. A widespread conception is that a digital twin is a multi-scale, multi-physics, integrated, probabilistic simulation of an as-built system, enabled by a digital thread, which employs the best available models (physical, behavioral, etc.) together with updated information to emulate the life cycle, actions, and operation of its real twin. The digital twin thus consists of three main components, namely: (1) physical objects in the physical world, (2) digital objects in the digital world, and (3) a set of connections that link the digital and physical elements with each other.
The physical world is a complex, changeable, and dynamic manufacturing environment that includes the following elements: users and operators, assets, machinery,

Fig. 6.18 Definition of digital twin



goods, specific regulations and the environment. Assets encompass all the necessary elements of manufacturing, including elements related to production, data capture, and the necessary software devices. All these elements have their place, but they need to be connected through an IoT system [20]. The physical world mainly consists of two elements:
• Devices: The physical twins from which the digital twins are intended to be created.
• Sensors: Elements that are physically connected to the devices and through which data and information are obtained; once a sensor obtains data, it sends them on for processing. The most widely used sensors are programmable logic controllers (PLC), radio frequency identification (RFID), quick response (QR) codes, etc.
The digital world contains two parts:
• The virtual environment platform, which constructs an integrated 3D digital model to execute applications and at the same time allows actions to be executed to test the functioning of diverse algorithms. There are numerous connections between the virtual environment platform and the digital twins; the platform offers the various models necessary for the development and performance of the digital twins.
• Digital twins, which mirror the life course of their physical entities and allow multiple operations (control, prediction, etc.).
Connections between the real and virtual spaces differ depending on the development methodology that each author uses.
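The three-component structure described above (physical object, digital object, connection) can be made concrete with a small sketch. The following Python fragment is a deliberately minimal illustration, not an implementation of any particular platform; all class and variable names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalDevice:
    """Physical twin: a device instrumented with sensors (PLC, RFID, QR reader...)."""
    name: str
    sensor_readings: dict = field(default_factory=dict)

@dataclass
class VirtualModel:
    """Digital twin: mirrors the state of its physical counterpart."""
    name: str
    state: dict = field(default_factory=dict)

def connect(device: PhysicalDevice, model: VirtualModel) -> None:
    """The third component: the connection that carries sensor data
    from the physical world into the virtual model."""
    model.state.update(device.sensor_readings)

press = PhysicalDevice("hydraulic_press", {"temperature_C": 64.0, "cycles": 1208})
press_twin = VirtualModel("hydraulic_press_twin")
connect(press, press_twin)
print(press_twin.state["cycles"])  # 1208
```

In a real system the connection would be a continuous, bidirectional data channel rather than a one-shot update, but the division of responsibilities is the same.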

6.3.2 Knowledge Management in Digital Twins

According to Panetta in Gartner's report: "A digital twin is a digital representation that mirrors a real-life object, process or system. Digital twins can also be linked to create twins of larger systems, such as a power plant or city." The idea of a digital twin is not new, but today's digital twins are different in four ways: (1) the robustness of the models, with a focus on how they support specific business outcomes; (2) the link to the real world, potentially in real time, for monitoring and control; (3) the application of advanced big data analytics and AI to drive new business opportunities; and (4) the ability to interact with them and evaluate "what if" scenarios. The digital twin approach is based on complex cyclical flows. A complex flow occurs through an iterative series of three steps, collectively known as the physical-to-digital-to-physical (PDP) loop (shown in Fig. 6.19). These are:


Fig. 6.19 The physical-to-digital-to-physical (PDP) loop and digital twins approach

• Step 1: Physical to digital: Capture information from the physical world and create a digital record from physical data; • Step 2: Digital to digital: Share information and uncover meaningful insights using advanced analytics, scenario analysis, and artificial intelligence; • Step 3: Digital to physical: Apply algorithms to translate digital-world decisions to effective data, to spur action and change in the physical world [21].
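The three steps of the PDP loop can be sketched as a simple pipeline. The fragment below is a minimal, illustrative sketch in Python (the function names and the overheating scenario are hypothetical, chosen only to make each step of the loop concrete):

```python
def physical_to_digital(sensor_samples):
    """Step 1: create a digital record from raw physical data."""
    return {"mean": sum(sensor_samples) / len(sensor_samples),
            "peak": max(sensor_samples)}

def digital_to_digital(record, alarm_threshold):
    """Step 2: analyze the record to uncover a meaningful insight."""
    return {"overheating": record["peak"] > alarm_threshold}

def digital_to_physical(insight):
    """Step 3: translate the digital decision into a physical-world action."""
    return "reduce_load" if insight["overheating"] else "continue"

samples = [61.2, 63.8, 71.5, 62.0]          # e.g. spindle temperatures in degrees C
action = digital_to_physical(
    digital_to_digital(physical_to_digital(samples), alarm_threshold=70.0))
print(action)  # "reduce_load"
```

Each pass through the loop refines the digital record with fresh physical data, so the twin and its physical counterpart stay synchronized over time.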

6.3.3 Applications

Knowledge Extraction and Inference Domain

A digital twin-based system that can extract and infer knowledge from large-scale production line data is shown in Fig. 6.20. The system can also enhance manufacturing process management with reasoning capabilities by introducing a semantic query mechanism [22]. The system architecture has three distinct stages:
Stage 1: Data processing and feature extraction.
Stage 2: Generating a knowledge graph from the extracted features.
Stage 3: Semantic relation extraction from the knowledge graph.
In the first stage, the system processes the data and extracts features that help explain the underlying variance in the data. Analysis of the data revealed that the categorical and date features could be normalized to mimic a numeric feature set, and valuable insights, such as feature breakdown and delayed turnaround time, can be inferred from the data. An ontology, and a knowledge graph generated from that ontology, is created to maximize the probability of inferring such insights.

Fig. 6.20 A digital twin-based system for knowledge extraction and inference

The Path Ranking Algorithm is utilized for extracting semantic relations from the knowledge graph. In essence, the algorithm considers a generalized version of the knowledge graph and tries to infer relations based on the paths it can traverse.
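The core move of such path-based inference, following a sequence of relation labels through the graph to reach new entities, can be illustrated on a toy knowledge graph. The sketch below is a simplification (the triples and the composite relation are invented for illustration; a full Path Ranking Algorithm additionally learns weights over many such paths):

```python
from collections import defaultdict

# A toy knowledge graph: (head, relation, tail) triples, as might be
# generated from production-line features in stage 2.
triples = [
    ("batch_42", "produced_on", "line_3"),
    ("line_3", "located_in", "plant_A"),
    ("batch_17", "produced_on", "line_3"),
]

graph = defaultdict(list)
for h, r, t in triples:
    graph[h].append((r, t))

def follow_path(start, relation_path):
    """Traverse the graph along a sequence of relation labels and collect
    the entities reached, i.e. the basic step of path-based inference."""
    frontier = {start}
    for rel in relation_path:
        frontier = {t for e in frontier for (r, t) in graph[e] if r == rel}
    return frontier

# Infer a composite relation "produced in plant" from the 2-hop path.
print(follow_path("batch_42", ["produced_on", "located_in"]))  # {'plant_A'}
```

Ranking then amounts to scoring how reliably each relation path predicts the target relation across the whole graph.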

Smart City Digital Twins

Smart city digital twins encompass Milgram's virtuality continuum (shown in Fig. 6.21), which extends from completely real to completely virtual environments [i.e., Virtual Reality (VR)]. Any combination of the two is located in between, where real and virtual objects are combined in either real [i.e., Augmented Reality (AR)] or virtual surrounding environments. Similarly, Mixed Reality (MR) refers to the merging of both real and virtual environments, in which both real and virtual objects are accessible [23]. Smart city digital twins are designed to give knowledge discovery from city data the capacity for collective data exploitation, by integrating a more holistic analytics and visualization approach into the real-time knowledge discovery process over heterogeneous city data.

Fig. 6.21 Virtuality continuum (from the real environment through Augmented Reality and Mixed Reality to Virtual Reality and the virtual environment)



Knowledge Navigation Domain

A digital twin as a service provider for 4.0 knowledge fruition has been designed on the basis of an ad hoc, comprehensive service-oriented architecture (depicted in Fig. 6.22) and developed as a multi-sided cross platform. The proposed architecture consists of six component systems connected via a network, namely the Cyber-Physical Production System (CPPS), Knowledge Box, Digital Twin, Operator Field Application System, Remote Operator Application System and Middleware to Extant Legacy System. In order to achieve interoperability, each component system is both a service producer and a service consumer, which means that it can provide or request a set of heterogeneous and remotely accessible Web services implemented using Representational State Transfer (REST) technology. Since there are multiple service producers/consumers, the core element of the proposed architecture is the enterprise service bus. Its responsibility is to receive service requests and send the output back to consumers, based on network resources and connected services (who produces/requests what?). The first component system is the CPPS, which is intended as a set of industrial assets (such as field devices, machines, equipment, and products) equipped with local intelligence (i.e., sensors, microprocessors, PLCs, or other embedded systems). The CPPS produces services because it needs to send data related to machines, equipment, and products to the knowledge module through the enterprise service bus, but it also requests services, such as performance optimization services. The digital twin is the real-time synchronized digital reflection of the CPPS: (1) it is self-evolving, because its data is updated in real time; (2) its behavior, i.e., the behavior of the physical system, is accurately reproduced thanks to high-fidelity functional models, system models, physical models, manufacturing models, usage models, etc.;

Fig. 6.22 A service oriented digital twin-based architecture



(3) it is reactive, because a drift between the digital and physical systems is detected and either the operator is alerted or the system automatically provides countermeasures without stopping production. In the proposed architecture, the digital twin provides the above-mentioned services (diagnostic and condition monitoring services, prediction and decision support services, flexible control services, performance optimization services, bird's-eye view services, what-if scenario configuration services, notification services) to the other component systems [24]. However, small changes in the CPPS (such as the addition of new sensors) may disrupt the functionality of the knowledge box itself and its services, and programming expertise is required to add such a new data stream to the knowledge structure. The knowledge structure should register all the sensing data from the CPPS according to the availability and number of sensors. In order to map all the original and processed data generated by the CPPS or the digital twins, a flexible and configurable knowledge structure called a knowledge box was designed and implemented. It consists of a SQL-based database and a Web server equipped with Apache Solr/Lucene, which enables the Knowledge Box to provide intelligent "4.0" knowledge navigation and voice interaction services, while the Laravel framework provides APIs for the Web applications. System administrators can use it to establish the knowledge structure.
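The flexibility requirement above, that adding a sensor should extend the knowledge structure rather than break it, can be sketched with a minimal registry. This is an illustrative Python sketch only (the class and sensor names are hypothetical; the actual Knowledge Box is backed by a SQL database and Solr, not an in-memory dictionary):

```python
class KnowledgeBox:
    """Minimal sketch: register sensing streams dynamically so that adding
    a new sensor extends the knowledge structure instead of breaking it."""
    def __init__(self):
        self._streams = {}

    def register_stream(self, sensor_id, description):
        self._streams.setdefault(sensor_id, {"description": description,
                                             "readings": []})

    def ingest(self, sensor_id, value):
        if sensor_id not in self._streams:          # auto-register unknown sensors
            self.register_stream(sensor_id, "unregistered sensor")
        self._streams[sensor_id]["readings"].append(value)

    def query(self, sensor_id):
        return list(self._streams[sensor_id]["readings"])

kb = KnowledgeBox()
kb.register_stream("vibration_01", "spindle vibration, mm/s")
kb.ingest("vibration_01", 0.8)
kb.ingest("temp_05", 63.2)     # a sensor added later to the CPPS
print(kb.query("temp_05"))     # [63.2]
```

The auto-registration step is the design point: a new data stream appears in the knowledge structure without any code change, which is exactly the fragility the paragraph above identifies in a hand-maintained schema.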

Manufacturing Domain

The manufacturing cell in intelligent manufacturing is considered to be a perfect substitute for the traditional human-operated manufacturing system. It should therefore have strong learning and cognitive abilities and be able to support the autonomous operation of the manufacturing process, that is, operation independent of human control. Accordingly, the Digital Twin Manufacturing Cell (DTMC) is defined from a data- and knowledge-driven perspective: the DTMC is the smallest implementation unit through which industrial enterprises put intelligent manufacturing into practice, and it is composed of a five-dimensional smart manufacturing space. The five dimensions are physical space, digital space, data space, knowledge space and social space. The integration of these five spaces gives the DTMC strong cognitive and learning abilities, such as self-thinking, self-decision-making, self-execution and self-improvement, thereby supporting the DTMC's autonomous operation. The following describes the specific role of each space.
Physical space: It is a container that gathers the manufacturing resources related to the natural flow of products through the processing sequence. Here, manufacturing resources include work in process (WIP), smart manufacturing equipment, wired or wireless sensors, and smart gateways, where the sensors sense the status of the WIP and the smart manufacturing equipment.



Digital space: It is the container of the virtual digital twin models, such as virtual work-in-progress, virtual manufacturing equipment, and virtual processing. Based on the real-time manufacturing data published by the data space and the historical knowledge in the knowledge space, the manufacturing process can be simulated, understood and predicted, and finally the performance of the DTMC can be optimized.
Data space: It is a container of massive real-time manufacturing data related to WIP status, manufacturing equipment status, and the manufacturing environment. It is responsible for preprocessing and publishing the real-time data perceived in the physical space, and it subscribes to control orders from the digital space or social space through the smart gateway.
Knowledge space: It acts as the brain of the DTMC by integrating dynamic knowledge bases and knowledge-based intelligent skills. Here, the dynamic knowledge base provides the DTMC with the capacity for self-thinking and self-improvement through automatic knowledge navigation and accumulation. Knowledge-based intelligent skills give the DTMC the ability to make independent decisions and to handle various manufacturing problems in the physical, digital, or social spaces.
Social space: It integrates various service systems such as customer relationship management (CRM) and enterprise resource planning (ERP), and bridges the gap between the DTMC's supply and customer needs in service-oriented manufacturing. That is, the social space can recognize and analyze customer needs based on social data and the historical knowledge in the dynamic knowledge base, and then generate production orders through the service systems to provide manufacturing services and guide the production process of the DTMC [25].
Based on this definition, a data- and knowledge-driven framework for the DTMC is proposed (as shown in Fig. 6.23), where data is responsible for the perception of manufacturing problems and knowledge provides reliable solutions for these problems.
In addition, the proposed DTMC is able to support autonomous manufacturing through three key enabling technologies, namely the digital twin model, dynamic knowledge bases and knowledge-based intelligent skills. Here, the digital twin model contributes to the convergence of physical space and digital space. The deep fusion of physical space and digital space facilitates the autonomous operation of the DTMC through the capacity for self-execution. In other words, this model can perceive and simulate the manufacturing process based on the real-time data published by the data space, while understanding, predicting and optimizing manufacturing performance with knowledge-based intelligent skills. The dynamic knowledge bases act as the brain of the DTMC, connecting the physical, digital, data and social spaces through a uniform and interoperable knowledge model. They equip the DTMC with the capacities of self-thinking, namely automatic knowledge navigation according to manufacturing problems, and self-improvement, namely continuous knowledge accumulation by extracting knowledge from the data space. Knowledge-based intelligent skills are learned from the historical knowledge in the dynamic knowledge bases through machine learning algorithms, and equip the DTMC with the capacity to deal with various manufacturing problems and produce reliable decisions.
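The two capacities attributed to the dynamic knowledge bases, self-improvement (accumulating solutions extracted from the data space) and self-thinking (navigating to a known solution when a problem is perceived), can be sketched in a few lines. The sketch below is purely illustrative; the problem signatures, countermeasures and fallback behavior are hypothetical:

```python
class DynamicKnowledgeBase:
    """Sketch of two DTMC capacities: self-improvement (accumulate
    solutions extracted from the data space) and self-thinking
    (navigate to a solution when a manufacturing problem is perceived)."""
    def __init__(self):
        self._solutions = {}   # problem signature -> known countermeasure

    def accumulate(self, problem, solution):
        """Self-improvement: store a solution mined from historical data."""
        self._solutions[problem] = solution

    def navigate(self, problem):
        """Self-thinking: retrieve a countermeasure, or fall back to a human."""
        return self._solutions.get(problem, "escalate_to_operator")

dkb = DynamicKnowledgeBase()
dkb.accumulate("tool_wear_high", "schedule_tool_change")
print(dkb.navigate("tool_wear_high"))      # schedule_tool_change
print(dkb.navigate("coolant_leak"))        # escalate_to_operator
```

In the full framework the mapping from problem to solution is learned by machine learning algorithms rather than stored as an explicit lookup table, but the division between accumulation and navigation is the same.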



Fig. 6.23 Data and knowledge-driven framework for DTMC

Digital twin-based product design: Digital twins are integrated multi-physics, multi-scale, probabilistic simulations of complex products that use the best available physical models, sensor updates, etc. to mirror the life of their corresponding twins. A digital twin can correctly map the various physical data of the product into the virtual space, and the virtual product can reflect the entire lifecycle of the corresponding physical product. Based on the digital twin, the product design process can be divided into conceptual design, detailed design and virtual verification, as shown in Fig. 6.24.

Fig. 6.24 Digital twin-based product design

Conceptual design: Conceptual design is the first and most important step in the product design process. The designer needs to determine the future design direction of the entire product. At this stage, the designer defines the concept, aesthetics and main functions of the new product. At the same time, designers need to process various data, such as customer satisfaction, product sales, product competitiveness and much other information. These data are voluminous and scattered, making them difficult for designers to collect. Through the use of digital twins, the various data in the physical space of the product can be integrated, and all the information can be brought together easily. Designers can quickly identify the areas that need improvement by virtue of this single source of information. More importantly, the digital twin is a faithful mapping of the physical product, which can make communication between customers and designers more transparent and faster through real-time data transmission. It can make full use of customer feedback, and of the problems customers encountered when using the previous generation of the product, to guide the improvement of the new product.
Detailed design: After completing the conceptual design, the next step is the detailed design. At this stage, designers complete the design and construction of product prototypes, as well as the development of tools and equipment for commercial production. Designers need to further refine the product design plan established in the previous stage, including product function and appearance, product configuration, design parameters, test data, etc. The detailed design phase requires repeated simulation tests to ensure that the product prototype can achieve the expected performance. However, due to the lack of real-time data and environmental impact data, the effectiveness of such simulation tests is limited. Fortunately, digital twin technology can solve this problem well, because the twin exists throughout the entire life cycle of the physical object and can always evolve with it. It can record all the data of the product and its environmental impacts.
Virtual verification: The last stage is virtual verification. In the traditional model, the effectiveness and feasibility of the design plan need to be evaluated in small production batches after the product design is completed. This not only prolongs the production cycle but also greatly increases the cost in time and money. If the designer instead uses the digital twin model, the quality of any component can be predicted by debugging and prediction directly in the digital twin model before actual production. Digital twin-driven virtual verification can make full use of data such as previous-generation equipment, environment, materials, customer physical characteristics and historical data. This method can test whether there is a design defect and find its cause, after which redesign is quick and convenient. In addition, it can avoid tedious verification and testing, thereby greatly improving design efficiency. More importantly, digital twins can not only describe behavior but also propose solutions related to the real system. In other words, they can provide operations and services to optimize auxiliary systems and make predictions about physical objects based on the virtual models. Therefore, by using digital twin technology, designers can create vivid simulation scenarios, effectively perform simulation tests on prototypes, and predict the actual performance of physical products as accurately as possible [26].

Fig. 6.25 The digital twin framework for factory automation

The digital twin framework for factory automation: The intelligent factory is the core of intelligent manufacturing at the factory level. At present, the information systems in a factory are developing in a collaborative and intelligent direction, and they are already able to realize the overall management of enterprise resources and product data. Figure 6.25 shows the digital twin framework for factory automation. Using digital twin technology, the data collected and recorded by the information systems is displayed in the virtual factory to detect, analyze, control and optimize specific scenarios, so as to improve the performance of the physical factory. Digital twin technology is a method and tool for modelling and simulating a physical entity's status and behavior. It can realize interconnection and intelligent operation between the physical manufacturing space and the virtual space. By utilizing digital twin technology, all the factors in the physical factory are defined accurately, and the information related to production activities in the physical factory is reflected and verified digitally in the virtual factory. Thus, dynamic changes can be communicated between physical units (i.e., product, plant or factory) and their virtual models, with real-time feedback generated. It should be noted that the virtual factory is not independent, as it connects to the physical factory in real time. The factory digital twin is a higher-level virtual model based on virtual-physical convergence. As mentioned above, the digital twin, as a virtual model in the virtual space, is used to simulate the behavior and characteristics of the corresponding physical object in real time [27].

Workshop Digital Twin System

The framework of the data construction method for the workshop Digital Twin System (DTS) is designed with three modules, as shown in Fig. 6.26. The modules are not independent but related to each other. The functions realized in each module are introduced as follows.
(1) Data representation module


Fig. 6.26 Framework of data construction method for workshop DTS

A workshop model that consists of the hierarchical representation of manufacturing data and the characteristic representation of application scenes is designed in this module. The hierarchical representation part plays a guiding role in database design, while the characteristic representation part guides the selection of customized-processing methods. Ontology is currently applied for modeling in manufacturing fields due to its explicit, sharable and reusable features. Thus, the Web Ontology Language (OWL, used to semantically describe an ontology), recommended by the World Wide Web Consortium (W3C), is selected to build the workshop model.
(2) Data organization module
The pre-processing strategy aims at solving common problems (missing, abnormal or duplicated values, etc.) in manufacturing data, while customized-processing, consisting of data transformation and data reduction, aims at providing suitable data for applications to improve implementation efficiency.
(3) Data management module
The database and the corresponding storage and retrieval strategy constitute this module. Considering the large amount of ever-updating data in a manufacturing workshop, a column-oriented database called HBase is selected for its distributed storage scheme and its ability to add columns dynamically. This means that storage space and processing efficiency are not limited by the performance of a single computer, and the database can adapt to workshop changes without stopping [28]. In a complete data construction process, raw data is collected from the physical world and then stored in the database after pre-processing. When top-level applications require corresponding data, the pre-processed data is retrieved from the database according to the workshop model and then transformed into algorithm-friendly



data through customized-processing. The algorithm-friendly data is finally provided to applications of DTS for internal algorithm usage.
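The pre-processing and customized-processing stages described above can be sketched as two small functions. The following Python fragment is an illustrative simplification (the field names and rows are invented; the real pipeline stores the cleaned data in HBase rather than a Python list):

```python
def pre_process(raw_rows):
    """Pre-processing: drop duplicate rows and rows with missing values."""
    seen, cleaned = set(), []
    for row in raw_rows:
        key = tuple(sorted(row.items()))
        if None in row.values() or key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

def customized_process(rows, keep_fields):
    """Customized-processing: data reduction, keeping only the fields the
    top-level application's algorithm actually needs."""
    return [{k: row[k] for k in keep_fields} for row in rows]

raw = [
    {"machine": "M1", "temp": 61.0, "operator": "Liu"},
    {"machine": "M1", "temp": 61.0, "operator": "Liu"},   # duplicate
    {"machine": "M2", "temp": None, "operator": "Zhao"},  # missing value
    {"machine": "M3", "temp": 58.4, "operator": "Chen"},
]
store = pre_process(raw)                       # what would go into the database
ready = customized_process(store, ["machine", "temp"])
print(ready)  # [{'machine': 'M1', 'temp': 61.0}, {'machine': 'M3', 'temp': 58.4}]
```

The separation matters: pre-processing is generic and runs once at storage time, while customized-processing is application-specific and runs at retrieval time, exactly as in the framework of Fig. 6.26.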

Digital Twin-Enabled Collaborative Design Platform for New Material Product

The development of a new material product requires determining the chemical composition and the process parameters of each production step so as to yield the final mechanical and physical properties, such as tensile strength, yield strength, elongation, hardness and corrosion behavior. In the field of materials science, the conventional product design process includes several stages, namely requirements collection, discovery, small-batch production, property optimization, manufacturing, deployment and so on. This trial-and-error development mode is not only very time-consuming and costly but also highly dependent on personal experience accumulated from previous projects. Due to the multi-disciplinary and highly contextual nature of engineering design, the development, testing, and manufacturing engineering or scientific teams from different institutions lack mechanisms for knowledge sharing and effective collaboration. Thus, how to quickly design or adjust the chemical composition and key processes of materials according to customer needs remains an urgent problem in both industry and academia. The core of the DT is to use data to build a multi-scale, multi-disciplinary digital mapping of product entities in the cyber domain, and to realize the simulation and optimization of production through virtual-real interaction, data fusion analysis, and decision-making. A material product development platform based on a DT architecture is introduced in this study, which includes design knowledge management, product quality tracking and design optimization, and full-process product design visualization (Fig. 6.27). As a knowledge-intensive and complex process, new product design involves a wide variety of knowledge over the entire development cycle. However, most of the domain knowledge exists in the minds of experts and cannot easily be codified or managed, causing difficulty in knowledge reuse.
Due to the lack of a collaboration mechanism, effective knowledge sharing cannot be achieved among the members of a development team, whether within one organization or across several organizations. Furthermore, massive, multi-source design knowledge is heterogeneous in format and semantics, which makes it harder for designers to obtain knowledge and further reduces the efficiency of knowledge reuse. With the development of information technologies, big data technology can, on the one hand, be used to crawl and collect metallurgical specifications and general standards disclosed on the web, in patents, documents, etc.; on the other hand, natural language processing and semantic representation techniques can be used to extract knowledge from historical design cases and generate an enterprise-specific knowledge base. In this study, a material design knowledge management platform is established, which covers knowledge of customer demand analysis, material chemical composition, process design, design experiments, quality inspection, and product improvement analysis. Figure 6.28 shows the framework of material design knowledge management. Through the extraction


6 The Merging of Knowledge Management and New Information …

Fig. 6.27 3D visible model of an actual rolling factory: (a) the rolling production line; (b) equipment model, the finishing mill; (c) product model, the digital coil

and integration of massive irregular data and text, structured design information is generated, and the design knowledge base is then formed through the linking and representation of knowledge units. The reuse of design knowledge can assist designers in decision-making and reasoning in the new material development process.

The data collection layer gathers structured and unstructured data from the manufacturing execution system, the quality control system, and the inspection and testing system. The data sources cover market analysis, user requirements, quality design, production process parameters, inspection and testing processes, process changes, product delivery, and user tracking. At the same time, crawlers are used to grab metallurgical specifications, general standards and other data disclosed on the Internet, in patents, and in the literature.

At the data extraction layer, the word segmentation service improves the accuracy of full-text retrieval by optimizing Solr keyword retrieval; the knowledge graph establishes a data basis for search association; log feedback provides users with annotations on search accuracy; the Spark data processing engine completes basic data cleaning and association; and the knowledge graph component extracts entities, relationships and attributes from the data and completes the establishment of the graph database.

The data storage layer mainly includes relational data storage, Solr index storage, a professional word segmentation thesaurus and graph database storage. Because material design data is so varied, the requirements for automatic data extraction and integration are relatively high. In this study, D2RQ semantic mapping technology is used to transform structured data into RDF data for ontology retrieval and reasoning services. OWL-based ontology modeling is then utilized to represent the product design knowledge. In addition to inheriting from RDF, OWL also adopts the Ontology Inference Layer (OIL) to facilitate



Fig. 6.28 Framework of material design knowledge management

rule-based reasoning. The constituent elements such as classes, attributes, and individuals in OWL are defined as RDF resources and identified by URIs. We first formulate the D2RQ mapping rules and formalize the corresponding rule templates according to the established ontology model, then call the D2RQ mapping engine on the development platform, load the ontology model and mapping templates, and establish the connection between the ontology model and the data source, so as to finally transform the actual production data into the design ontology.
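The mapping step above can be illustrated with a minimal, self-contained sketch. It mimics the D2RQ idea of turning relational rows into RDF triples; the table name, column names and the example.org vocabulary are hypothetical stand-ins, and a real deployment would use the D2RQ mapping engine and the enterprise ontology instead:

```python
# Minimal sketch of a D2RQ-style relational-to-RDF mapping.
# Table, columns and the example.org vocabulary are hypothetical;
# a real deployment would run the D2RQ mapping engine against the
# production database and the established OWL ontology.

def row_to_triples(table, pk, row, base="http://example.org/"):
    """Map one relational row to RDF triples in N-Triples syntax."""
    subject = f"<{base}{table}/{row[pk]}>"
    triples = []
    for column, value in row.items():
        if column == pk:
            continue  # the primary key becomes the resource URI
        predicate = f"<{base}vocab/{table}#{column}>"
        triples.append(f'{subject} {predicate} "{value}" .')
    return triples

# A hypothetical steel-grade design record.
row = {"grade_id": "Q355B", "tensile_strength_mpa": 470, "carbon_pct": 0.20}
for t in row_to_triples("steel_grade", "grade_id", row):
    print(t)
```

Each non-key column becomes a predicate in a table-specific vocabulary, mirroring D2RQ's default mapping of columns to properties.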

6.4 Cyber Physical Systems (CPS)

6.4.1 Overview of Cyber-Physical Systems

A cyber-physical system (CPS), considered as a global network infrastructure, can provide the foundation for integrating physical manufacturing facilities and machines with the cyber world of Internet and computer applications into a single system, to be exploited and explored, that relies on sensory, communication, networking, and information processing technologies. A foundational technology for CPS is the wireless sensor network (WSN), which mainly uses interconnected intelligent



Fig. 6.29 Applications of CPS: electric vehicle, smart grid, smart building, health and medicine

sensors to sense and monitor. Its applications include environmental monitoring, healthcare monitoring, industrial monitoring, traffic monitoring, and so on. The advances in WSN significantly contribute to the development of CPS. In addition, many other technologies and devices such as smartphones, social networks, and cloud computing are being used to support CPS [29]. A cyber-physical system (CPS) is a complex and heterogeneous system with seamlessly integrated cyber components (e.g., sensors, computers, control centers, and actuators) and physical processes involving mechanical components, human activities, and the surrounding environment. CPS is capable of closely interacting with the surrounding physical environment through perception, communication, computation, and control. As shown in Fig. 6.29, there are various CPS application domains, including electric vehicle (EV), smart grid, health and medicine, smart home, and advanced industries, which promise substantial economic and social benefits [30]. Typical CPS applications:

• Smart Grid
• Smart Building
• Electric Vehicle
• Health and Medicine.

Three representative design methodologies in CPS:

• Design Automation
• Data Analytics
• Hardware Support.

A potential framework of CPS is shown in Fig. 6.30.



Fig. 6.30 A potential framework of CPS (integration and interoperability of embedded systems):

• Application level: smart manufacturing, smart transportation, smart cybersecurity
• Cyber level: computer, Internet, WLAN, wireless sensors
• Cognition level: data initiation, collection, storage, analysis, exploration
• Physical level: physical facilities, devices, information collection terminals

6.4.2 Knowledge Management in Cyber-Physical Systems

Knowledge management (KM) is a series of productive, iterative, lifecycle-oriented, dynamic and systematic development and exploration activities and processes aimed at making information operable and reusable. In the era of rapid technological innovation and change, knowledge management is a key driving factor for value creation. Despite efforts to reflect the contribution of knowledge management to organizational learning, KM in the Industry 4.0 era has not been extensively studied. The proliferation of digital technology, the emergence of human-centered cyber-physical production systems (CPPS), autonomous, learnable and collaborative systems in smart factories, and the revival of artificial intelligence have raised questions about the theoretical basis of KM 4.0. In the context of Industry 4.0, most research on knowledge management explores four typical areas, namely:

(1) knowledge discovery methods, that is, how to acquire knowledge,
(2) supervised, semi-supervised and unsupervised data mining and machine learning methods, that is, which procedures are suitable for obtaining knowledge accurately,



(3) the sources of knowledge, that is, what types of data can be collected through wireless and sensor systems, and
(4) the data management platform, that is, how scalable data in heterogeneous structures should be stored [31].

All in all, knowledge management is a management function responsible for regularly selecting, implementing and evaluating goal-oriented knowledge strategies, aiming to improve how an organization handles internal and external knowledge in order to improve organizational performance. With this in mind, the strategic and operational tasks of knowledge management have become a research focus in the digital transformation era. The concept of Knowledge 4.0 and the model of the Knowledge Ladder 4.0 have been launched recently. These theories enhance value creation in the digital knowledge economy by using digital technology for knowledge creation and sharing. The digital society and digital knowledge economy are thus characterized by the digitization and intelligence of daily life and value creation. Smart interconnected products, cognitive and networked systems, and artificial intelligence are changing competition, careers and education. The premise of the Knowledge Ladder 4.0 model is that digitization and intelligence expand the scope of knowledge from a set of discrete facts internalized by the receiver to capabilities, abilities and competitive skills, that is, Knowledge 4.0. In particular, work knowledge includes the knowledge, skills, abilities, and competences (KSAC) that Industry 4.0 workers should be able to demonstrate. Recent research has proposed different taxonomies to classify KSAC, taking into account the various roles of humans in the manufacturing environment. One taxonomy identifies four necessary categories of competence, namely technical, methodological, social and personal competence. Another identifies the skills required of Industry 4.0 employees, namely technical skills and soft skills.
From a strategic point of view, KM 4.0 can be regarded as a "Dynamizer" to (1) identify the key knowledge required, for example, to build new business models and obtain future-oriented intellectual capital and knowledge assets, (2) create meaning and common needs as the basis for action, namely decision-making or problem solving, (3) encourage innovation, active learning and reflection, and (4) build a platform to attract internal and external stakeholders. From an operational point of view, KM 4.0 is a "Stabilizer" to (1) ensure a ubiquitous and organized flow of information and knowledge, (2) achieve cross-departmental cooperation, and (3) coordinate human learning, machine learning and reciprocal learning between humans and machines, that is, the co-creation of collective wisdom.

From an ontological point of view, human and machine labor are complementary, especially considering that the abilities of the one may be superior or inferior to those of the other. However, they are epistemologically different. In view of the division of labor between humans and machines, knowledge management 4.0 (KM 4.0) must deal with different sets of knowledge participants and related instances, namely the k-holder (interpreting and storing knowledge), the k-producer (completing current knowledge and creating new knowledge), the k-user (transforming knowledge into skills and testing knowledge in practice, for example through



on-the-job training), the k-receiver (selecting and accepting knowledge before the k-holder stores it), and finally the k-eraser (forgetting knowledge). Each of the above roles is part of learning. Therefore, "learner" is a superordinate term that covers learning, relearning, and unlearning [31]. Considering the participation of human and machine labor in performing manual or cognitive tasks, especially in shared tasks, three basic questions should be considered: (1) How can the concept of knowledge participants in the human-machine environment be theorized? (2) What are the possible relationships between humans and machines in a hybrid environment? (3) How do humans and machines acquire knowledge and develop the collective wisdom of manufacturing enterprises? Figure 6.31 depicts the boundary system of KM 4.0. The mixture of knowledge participants combines human and machine elements in the acquisition and utilization of knowledge, and combines the determinants of job performance (that is, the factors that affect the participating tasks) into a new boundary system. The new boundary system is represented by a delineated but flexible boundary, that is, boundary dynamics. Therefore, it allows the two groups of labor to participate in shared tasks, thereby defining a new relationship and exchange model, namely reciprocity.

Fig. 6.31 Boundary system for KM 4.0: knowledge acquisition and utilization (human, shared and machine acquisition) and participation (human, shared and machine participation) determine participation performance; technology-oriented and human-oriented knowledge management are linked through explorative and exploitative learning processes



6.4.3 Applications

Cyber-Physical Production System

Knowledge in the cyber-physical production system (CPPS) development network can be formal (reports, catalogs or patents) or tacit, held, researched and developed by experts. We need an integrated KMS-to-CPPS method that recognizes the importance of expert knowledge, because such knowledge is essential in developing information systems. The KMS-to-CPPS method should also cover the social infrastructure, namely the acquisition, storage, classification and eventual transfer of knowledge within the so-called knowledge network. In order to ensure a common understanding in the network and to establish synergy between knowledge management and project management, it is best to use the same model. The KMS-to-CPPS approach is shown in Fig. 6.32 and includes the following five parts: (1) identify the processes within the network that should be supported by the knowledge management system and the CPPS system integrator, (2) identify the experts within the network as the sources of knowledge, (3) acquire tacit knowledge from the experts in the network, (4) classify the tacit knowledge in the network, and (5) build the knowledge base of the network [32].

Fig. 6.32 Approach to KMS to CPPS (network of knowledge): (1) processes within the network that should be supported by the knowledge management system and by the CPPS system integrator; (2) defining the sets of workers within the network according to the processes they realize, using k-means clustering; (3) determining the experts within the network using the personnel usefulness function; (4) acquiring tacit knowledge from the experts within the network; (5) formally representing the tacit knowledge and transferring it into a computer-implementable format; (6) the tacit knowledge base for the network
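The worker-clustering step in Fig. 6.32 can be sketched with a plain k-means implementation; the two-dimensional feature vectors (share of time spent on each process) and the fixed initial centroids are illustrative assumptions, and a real system would derive the features from production records:

```python
# Toy sketch of the k-means step in Fig. 6.32: grouping workers by the
# processes they take part in. The feature vectors (fraction of time
# spent on each process) and initial centroids are hypothetical.

def kmeans(points, centroids, iters=10):
    """Plain k-means with fixed initial centroids (deterministic)."""
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Recompute centroids as cluster means (keep old one if empty).
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Workers described by (share of assembly work, share of inspection work).
workers = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
centroids, clusters = kmeans(workers, centroids=[(0.9, 0.1), (0.1, 0.9)])
print(clusters)  # two groups: assembly-heavy vs inspection-heavy workers
```

The resulting worker sets would then be ranked with the personnel usefulness function to pick out the experts, as the figure indicates.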



Data Representation Methods in Cyber-Physical Systems

Traditional data representation methods are not suitable for information extraction in distributed parallel computing and cannot retain sufficient semantic information, such as the order of terms in the text and the hypernym-hyponym relationships between terms. A hierarchical, graph-based distributed semantic network (DSN) for knowledge management of cyber-physical systems can solve this problem [33]. This method can represent semantic information extracted from heterogeneous data. Distributional semantics comprises theories and methods for quantifying and classifying the semantic similarity between linguistic items based on their distributional characteristics in large language samples. A DSN can be used to describe the hierarchical and distributed semantics implied by text data. It not only expresses the subject-predicate-object relationships between text terms through the phrase network, but also shows the generalization-specialization relationships in the hierarchical structure. Therefore, the DSN can absorb more distributed semantic features and is suitable for information extraction in a distributed environment. The DSN is composed of several conceptual layers connected by the generalization and specialization of conceptual nodes in different layers. The number of layers can be determined automatically according to the meaning of each concept. The meaning of a concept can be obtained from the hierarchical relationships of WordNet, a large English lexical database in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. The distributed characteristics of the model are revealed through the hierarchical structure, hierarchical relationships, sequence, and subject-predicate-object relationships between nodes. In the DSN, all triples are distributed in different layers according to their semantic information.
Within each layer, all nodes are in different positions according to the roles they play in the sentence [33]. CPS generates large amounts of data from heterogeneous sources. When capturing data with different structures, the DSN can be constructed and extended in three steps: Multi-Order Semantic Parsing (MOSP), DSN construction, and scalable DSN extension. The construction process of the DSN for CPS knowledge management is shown in Fig. 6.33.
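A toy sketch of the layered construction described above, with a tiny hand-written hypernym map standing in for WordNet; the triples and terms are illustrative only:

```python
# Sketch of the layered DSN construction: triples extracted from text
# are placed on conceptual layers, with hypernym ("is-a") links
# connecting the layers. The tiny hypernym map stands in for WordNet;
# the triples and terms are illustrative only.

HYPERNYMS = {"robot": "machine", "machine": "artifact",
             "sensor": "device", "device": "artifact"}

def depth(term):
    """Layer index = distance from the most general concept."""
    d = 0
    while term in HYPERNYMS:
        term = HYPERNYMS[term]
        d += 1
    return d

def build_dsn(triples):
    """Assign each (subject, predicate, object) triple to a layer keyed
    by the subject's conceptual depth; deeper layers are more specific."""
    layers = {}
    for s, p, o in triples:
        layers.setdefault(depth(s), []).append((s, p, o))
    return layers

dsn = build_dsn([("robot", "has_part", "sensor"),
                 ("machine", "located_in", "factory")])
print(dsn)  # "robot" triples sit on a deeper layer than "machine" triples
```

A full implementation would derive the hypernym chains from WordNet and add the generalization-specialization edges between the layers.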

Healthcare Cyber-Physical System

For the healthcare industry, cloud and big data are not only important technologies but are also gradually becoming the trend of medical innovation. Today, medicine relies more on specific data collection and analysis, and medical knowledge is exploding. Therefore, medical knowledge published and shared through the cloud is prevalent in practice. Patients may even know more than their doctors; doctors, in turn, can enrich and share information and knowledge bases through the cloud, and patients can actively participate in medical activities assisted by big data. Through smartphones, cloud computing, 3D printing, gene sequencing, and wireless sensors, medical rights


Fig. 6.33 The construction process of the DSN for knowledge management in CPS: textual data undergoes multi-order semantic parsing (lemmatization, hypernym extraction), nodes and edges form an initial distributed semantic network, and semantic expansion yields a scalable distributed semantic network

return to patients. The role of doctors is to provide decision-support consultation for patients. The revolution of cloud and big data may substantially impact the medical industry, and the medical industry may even be transformed into a new, complex ecosystem. The expanded healthcare ecosystem, including traditional roles and newcomers, is the reason a more appropriate healthcare system needs to be designed to meet the following challenges in this new revolution.

(1) Unified-standard multi-source heterogeneous data management. The heterogeneity of the various data types, and even of homogeneous data, makes it difficult to use healthcare data. On the one hand, the system must support various kinds of healthcare equipment to ensure scalability. On the other hand, data formats should be converted in accordance with a unified standard to improve the efficiency of data storage, query, retrieval, processing and analysis.

(2) Diversified data analysis modules with a unified programming interface. Diverse healthcare data include structured, semi-structured and unstructured data. For these different data structures, appropriate methods need to be deployed for efficient online or offline analysis, such as stream processing, batch processing, iterative processing, and interactive query. To reduce system complexity and improve development and access efficiency, a unified programming interface will be a basic component.

(3) An application service platform with a unified northbound interface. The system is expected to provide various applications and services for different roles. In order to provide usable and reliable healthcare services, the application service platform of the system is essential for resource optimization, technical support and data sharing [34].

Figure 6.34 shows the architecture of Health-CPS, which consists of three layers, namely the data collection layer, the data management layer and the application service layer.



Fig. 6.34 Health-CPS architecture: a data collection layer (hospital nodes, Internet data, user-generated content, data access, data collection and processing), a data management layer (distributed file storage, distributed parallel computing), and an application service layer (user interface, API)

(1) Data collection layer: This layer is composed of data nodes and adapters, and provides a unified system access interface for multi-source heterogeneous data from hospitals, the Internet or user-generated content. Through the adapters, raw data of various structures and formats can be preprocessed to ensure the availability and security of the data transmitted to the data management layer.

(2) Data management layer: This layer is composed of distributed file storage (DFS) modules and distributed parallel computing (DPC) modules. With the help of big-data-related technologies, DFS improves the performance of medical systems by providing efficient data storage and I/O for heterogeneous medical data. According to the timeliness of the data and the priority of the analysis tasks, DPC provides the corresponding processing and analysis methods.

(3) Application service layer: This layer provides users with basic visual data analysis results. It also provides developers with an open and unified API and user-centric applications, and provides rich, professional, and personalized medical services [34].
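The adapter role of the data collection layer can be sketched as follows; the source names, field names and unified schema are hypothetical examples, not the actual Health-CPS interfaces:

```python
# Sketch of a data-collection-layer adapter (Fig. 6.34): heterogeneous
# source records are normalized into one unified schema before being
# handed to the data management layer. The source names, fields and
# schema below are hypothetical examples.

def adapt(record, source):
    """Normalize a raw record from a given source into a unified schema."""
    if source == "hospital_hl7":        # e.g., parsed HL7-style fields
        return {"patient_id": record["PID"],
                "metric": record["OBX"]["name"],
                "value": float(record["OBX"]["value"])}
    if source == "wearable_json":       # e.g., user-generated content
        return {"patient_id": record["user"],
                "metric": record["type"],
                "value": float(record["reading"])}
    raise ValueError(f"unknown source: {source}")

unified = adapt({"user": "u42", "type": "heart_rate", "reading": "71"},
                "wearable_json")
print(unified)
```

Whatever the source, downstream storage and analysis modules then only ever see the single unified schema.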

6.5 Digital Factory

6.5.1 Industrial Robots

Industrial robots are multi-joint manipulators or multi-degree-of-freedom machine devices widely used in the industrial field. They have a certain degree of



automaticity and can achieve a variety of industrial processing and manufacturing functions with their own power sources and control abilities. Nowadays, robotics, information technology, communication technology and artificial intelligence are being further integrated as the era of artificial intelligence emerges. After a long electronic age and digital age, industrial robots have ushered in a new era of intelligence. Against this backdrop, the cooperative robot, able to work alongside humans, is a typical representative of industrial robots becoming intelligent [35]. Besides a mechanical structure and a drive system, industrial robots are required to have the ability to sense the environment, execute tasks, and interact with people and the environment. Knowledge arises throughout these processes, and researchers in this field have come up with a number of approaches to utilize it.

For instance, RoboBrain is a large-scale knowledge engine for robots, stored in the cloud. The knowledge in this engine includes physical interactions that robots have while performing tasks, knowledge bases from the Internet, and learned representations from several robotics research groups [36], which seems to cover every conceivable source of knowledge. In addition, RoboEarth is a world wide web for robots, stored in an open-source cloud [37]. As a system for robots to share knowledge, RoboEarth supports storing different types of data. A map is saved as a compressed file containing the map image and other contextual information, such as the coordinate system; robotic task descriptions are stored as human-readable action instructions in a high-level language to allow sharing and reuse across different hardware platforms; operation formulations consist of semantic representations of skills that describe the specific functions required to execute them. Additionally, the database services provide basic learning and reasoning functions, including helping robots map high-level descriptions of motion recipes to their skills, determining what data can be safely reused on which types of robots, and so on [37]. Moreover, RoboPlanner is presented as a pragmatic task planning framework for autonomous robots [38]. Accurate planning requires knowledge to select the right behavior from a large search space, in other words, pragmatic search and selection. With a structured planning and replanning framework, RoboPlanner provides detailed robotics planning, executables, and knowledge base integration. Such an adaptive structure also allows run-time fault resolution, knowledge inclusion, and replanning from the current state [38].

The three "Robo-" systems stated above may provide the basic abilities to support industrial robots, because they offer a large-scale knowledge base, a platform to share knowledge between robots, and a pragmatic task planning framework. In theory, these three parts possess sufficient capabilities for industrial robots to execute tasks, but the development of the relevant technologies is still required. In essence, a knowledge graph is a type of knowledge representation based on the graph model. A knowledge graph abstracts entities as vertices and the relationships between entities as edges, and describes knowledge in a structured form. In other words, a graph database plays the role of a storage engine for a knowledge graph, processing massive information intelligently and forming a large-scale knowledge base to support business applications.



Traditional relational databases are widely used but suffer from low flexibility and slow queries. Graph databases make up for these flaws and can store various kinds of data, making them applicable in many fields. Yachen's group proposes a method to store and query a knowledge graph for power equipment management based on a graph database. The method also compares the storage and query performance of the general RDF storage format with that of the graph database on the constructed knowledge graph of power equipment management [39]. Smidt et al. utilize a graph database to model smart application development for IoT asset management with high-availability web services. They use graph theory in complex system modeling and implement a graph database for managing and maintaining connected components, emphasizing the virtual and physical connections of each component [40]. Besides, Nguyen's team uses a graph database for the spatio-semantic comparison of large 3D city models in CityGML (the 3D CityDB database). They use the graph database as the main structure to store and process the CityGML dataset, and then utilize it to extend conceptual models and definitions of different types of editing operations between city models [41, 42]. Furthermore, Eva and her partners build a semantic graph database for the interoperability of 3D geographic information system data, collecting all the data, converting it to a specific format, and populating the graph database in the semantic web environment while maintaining the relationships [43]. For industrial robots, it may be helpful to perform deep learning in a simulated world to reduce the costs of low sampling efficiency and the safety problems of the real world. Generally, 3D simulation models are large and complex, which recommends the use of database technology for efficient data management.
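The storage pattern these works share, persisting domain entities and their relations in a graph database, can be sketched by generating Cypher MERGE statements; the node labels and properties below are hypothetical, and a real system would execute such statements through a Neo4j driver rather than printing them:

```python
# Sketch of persisting a small domain graph in Neo4j by generating
# Cypher MERGE statements. The node labels and properties are
# hypothetical; a real system would send these statements through a
# Neo4j driver against a running database.

def to_cypher(nodes, edges):
    """Turn (name, label) nodes and (src, rel, dst) edges into Cypher."""
    stmts = [f"MERGE (n:{label} {{name: '{name}'}})"
             for name, label in nodes]
    stmts += [f"MATCH (a {{name: '{a}'}}), (b {{name: '{b}'}}) "
              f"MERGE (a)-[:{rel}]->(b)"
              for a, rel, b in edges]
    return stmts

statements = to_cypher(
    nodes=[("robot_1", "Robot"), ("joint_1", "Joint")],
    edges=[("robot_1", "HAS_JOINT", "joint_1")],
)
for s in statements:
    print(s)
```

MERGE is idempotent, so re-importing the same model does not duplicate nodes or relationships, which matters when a large simulation model is synchronized repeatedly.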
With this in mind, Martin's group proposes an approach to persistently store 3D simulation models in a graph database (Neo4j), since it can process data efficiently, supports traversal and other graph operations, handles large 3D models, and has a powerful query language, Cypher. In their approach, the 3D simulation model stored in VEROSIM (a 3D simulation system) and VSD (structure, property, behavior) can be mapped into the Neo4j data model [40]. With this method, multiple 3D simulation models of industrial robots can be stored in the graph database, providing a chance to explore knowledge based on the graph database and gain a deeper understanding of their relationships.

Knowledge graphs can also be used to enhance the learning ability of industrial robots. Mohak's team introduces a model for semantic memory that allows machines to collect information and experiences so as to become more proficient over time [44]. Figure 6.35 shows the process of utilizing a dynamic knowledge graph as a semantic memory model for industrial robots. The industrial robot collects surrounding information with a vision system (camera), then uses adaptive computer vision technologies to detect objects and their corresponding attributes and locations. With this semantic data, a dynamic knowledge graph can be built and updated. On the other hand, natural language commands are input through a command interface (keyboard), and related NLP technologies are used to divide the commands into parts identifying the actuator, destination and behavior. Based on an adequate understanding of the instruction, the dynamic knowledge graph prepared before can be queried to find specific data and

Fig. 6.35 The process of using a dynamic knowledge graph to enhance the learning ability of industrial robots: an input image or video passes through object/attribute/location detection to build and update the dynamic knowledge graph, while an input command (e.g., "Robot, pick the ball.") is divided and resolved against a two-layer database (a graph database for queries and another database for data storage)

Table 6.1 Summary of related work

Platform of robot knowledge: RoboBrain (2015); RoboEarth (2011); RoboPlanner (2019)
Graph database application: power equipment management (2020); IoT asset management (2018); large 3D city models (2017/2020); 3D GIS data (2020); 3D simulation models (2020)
Knowledge graph utilization: dynamic knowledge graph for industrial robots (2021)

drive the actuator. A two-layer database can be used, which combines a queryable graph database with another database (e.g., MongoDB) that stores the specific data of the corresponding objects in the knowledge graph [39]. This dynamic knowledge graph can be stored in the cloud so that every robot with permission can access it, enabling knowledge sharing and updating. As shown in Table 6.1, the utilization of knowledge in the robotics field has come into focus, and graph databases have played an important role in many research fields in recent years. More research and attention are needed to improve the ability of industrial robots to process and utilize knowledge.
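The two-layer lookup in Fig. 6.35 can be sketched as follows; the command parsing is a trivial stub, and the in-memory dictionaries merely stand in for the graph database and the document store (e.g., MongoDB):

```python
# Sketch of the two-layer lookup in Fig. 6.35: a graph layer holds
# entity relations ("ball is_on table"), a document layer (standing in
# for MongoDB) holds detailed per-object data. Command parsing is a
# trivial stub; the real system would use NLP and a vision pipeline.

graph_layer = {("ball", "is_on"): "table", ("cup", "is_on"): "shelf"}
doc_layer = {"ball": {"color": "red", "pose": (0.42, 0.17, 0.05)}}

def parse_command(command):
    """Trivial stub: 'Robot, pick the ball.' -> ('pick', 'ball')."""
    words = command.lower().strip(".").replace(",", "").split()
    return words[1], words[-1]          # (behavior, target object)

def resolve(command):
    action, obj = parse_command(command)
    location = graph_layer[(obj, "is_on")]   # graph-layer query
    details = doc_layer.get(obj, {})         # document-layer lookup
    return {"action": action, "object": obj,
            "location": location, **details}

print(resolve("Robot, pick the ball."))
```

The graph layer answers relational questions quickly, while the bulkier per-object payloads (poses, images, meshes) stay in the second layer, which is the point of the two-layer design.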

6.5.2 Intelligent Assembly of Industrial Robots

The deepening of globalization and collaboration are remarkable features of today's world. Customized products and services with low cost, high quality and a short



production cycle are the core competence of the modern manufacturing enterprise. Assembly is one of the most important processes in product development and manufacturing; it accounts for more than 40% of the production time and strongly influences the ultimate quality of products. The assembly process of complex products involves the steps of sequence planning, assembly resource allocation and path planning, each of which requires the collaboration and cooperation of distributed heterogeneous manufacturing resources such as manipulators, AGVs, and various tooling and fixtures. Most of these activities are knowledge-intensive and highly dependent upon personal experience accumulated from previous projects over a long period. On the one hand, the increase in product complexity and customization drives the need for a more flexible and efficient assembly approach. On the other hand, the operating status of different devices changes over time. Thus, how to plan the assembly process intelligently according to dynamic product and environment information has become an urgent problem. With the boom of the information and communication technology (ICT) revolution, manufacturing is developing towards digitalization, virtualization and intelligence. As a typical representative of modern high tech, the cyber-physical system (CPS) combines sensing, communication and control capabilities, and provides a solution for intelligent assembly. By applying Internet of Things (IoT) technology, real-time information about the manufacturing system can be automatically monitored, and supplementary decision information is offered based on the fusion of wireless sensor data.

Intelligent Simulation System of Industrial Robot Based on CPS

The software and hardware resources in a CPS are distributed and real-time, so it is possible to build an intelligent simulation architecture for industrial robots based on CPS. Such an architecture can integrate physical systems, such as industrial robots and their operating environments, through industrial links, and can simulate these physical systems through virtual modeling. It also integrates intelligent learning algorithms such as vision-based target detection, path planning, and virtual-real mapping motion control. The system shown in Fig. 6.36 mainly includes the physical equipment layer, the information transmission layer, the intelligent simulation layer, and the information platform layer.

The physical equipment layer is mainly composed of industrial robots, various sensors for perception, and PLC hardware links. This layer is the source of data transmission between systems. The real-time dynamic data streams generated at the industrial control site, such as PLC (programmable logic controller) data, electronic control data, and industrial robot data, serve as the data source and need to be imported into the control system in real time; at the same time, the data is transmitted to the simulation system for algorithm analysis and process management. After the simulation environment receives the data, it must be preprocessed in the simulation system before it can be provided to the intelligent algorithm.
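The preprocessing step between the physical equipment layer and the intelligent algorithm can be sketched as validating and converting a raw reading into a clean record. The field names, axis count and units below are illustrative assumptions, not a real PLC schema.

```python
import math

def preprocess_reading(raw):
    """Validate a raw robot reading and convert it into the form the
    intelligent algorithm expects (degrees -> radians, typed fields).
    Field names and the 6-axis assumption are illustrative."""
    joints = [float(v) for v in raw["joint_angles_deg"]]
    if len(joints) != 6:
        raise ValueError("expected a 6-axis reading")
    return {
        "robot_id": str(raw["robot_id"]),
        "joint_angles_rad": [math.radians(a) for a in joints],
        "timestamp_ms": int(raw["timestamp_ms"]),
    }

# A hypothetical raw reading as it might arrive from the control site.
raw = {"robot_id": "R1",
       "joint_angles_deg": [0, 90, 0, 0, 45, 0],
       "timestamp_ms": 1700000000000}
clean = preprocess_reading(raw)
print(round(clean["joint_angles_rad"][1], 4))  # 1.5708
```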


[Fig. 6.36 shows the management system and the industrial control system exchanging simulation data, the industrial control system interacting with the industrial field (industrial robot) through field data, and the industrial robot simulation system communicating with the artificial intelligence system via socket communication.]

Fig. 6.36 Framework of industrial robot intelligent simulation system
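The socket communication between the simulation system and the artificial intelligence system shown in Fig. 6.36 can be sketched as a newline-delimited JSON exchange. The message fields are illustrative, and `socketpair()` stands in for the TCP connection a real deployment would use.

```python
import json
import socket

# socketpair() gives two connected sockets, standing in for the
# simulation side and the AI side of a real TCP connection.
sim_side, ai_side = socket.socketpair()

# Simulation side sends a trajectory request as newline-delimited JSON.
request = {"robot_id": "R1", "waypoints": [[0, 0, 0], [0.1, 0.2, 0.3]]}
sim_side.sendall((json.dumps(request) + "\n").encode())

# AI side reads one complete message, plans, and replies.
buf = b""
while not buf.endswith(b"\n"):
    buf += ai_side.recv(4096)
msg = json.loads(buf)
reply = {"robot_id": msg["robot_id"],
         "n_waypoints": len(msg["waypoints"]),
         "status": "planned"}
ai_side.sendall((json.dumps(reply) + "\n").encode())

# Simulation side receives the planning result.
buf = b""
while not buf.endswith(b"\n"):
    buf += sim_side.recv(4096)
result = json.loads(buf)
print(result["status"], result["n_waypoints"])  # planned 2
sim_side.close()
ai_side.close()
```

Newline framing is one simple way to delimit JSON messages on a stream socket; length-prefixed framing is a common alternative when payloads may contain newlines.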

The information transmission layer supports real-time data transmission. For example, the motion trajectory data of industrial robots is transmitted to the industrial control system through PLC communication and the upper computer's communication port. The data of the industrial control system can interact with the simulation system through the network, and the simulation system communicates with the artificial intelligence system through a socket service. The simulation results produced by the artificial intelligence and simulation systems are used as control information and are transmitted in JSON format from the serial port of the host computer to the industrial robot and the PLC control module.

Intelligent simulation layer: In terms of artificial intelligence algorithm requirements, the usual data preprocessing steps include feature preprocessing and dimensionality reduction. After the collected data has been extracted, transformed and loaded (ETL), structured data is mainly used to meet the requirements of the management system, while most artificial intelligence data samples are unstructured. Such samples can be batch processed, interactively analyzed, or handled as data streams for data mining. Preparing the data is an important prerequisite for all artificial intelligence algorithms to achieve the desired results. The fusion of algorithms and information platforms requires that data that is available, effective and covers the required characteristics can be introduced into the platform; this is the crucial factor in whether artificial intelligence can be realized.

Intelligent algorithm: Artificial intelligence algorithms can optimize the simulation model and the motion trajectory of an industrial robot.
For example, reinforcement learning can be used to train robot actions in a simulation environment, find the best motion path, and then generate control data for the robot's motion trajectory, which is used to control the motion of the industrial robot, thereby realizing intelligent control.

Information platform layer: The data format of the control system differs from that of the simulation system. Take the interchange of industrial robot control data and simulation data as an example: the data collected from the robot field consists of the coordinate and posture information of the robot's motion trajectory, and these data use inconsistent metrics in the simulation system and the actual robot controller. Therefore, it is necessary to normalize the data after input, so that the



data can be used on other platforms. At the same time, it is also necessary to remove the coordinate and posture data of positions the robot cannot reach, which arise from the multiple solutions of the robot's motion matrix equation.

Information management: This function manages the processing and import of data. The user database mainly contains relational data, so a relational database such as MySQL can be used to extract, transform and load (ETL) the collected data and provide it for data mining.

After the requirements analysis of the above information chain, the system's requirements for the overall framework can be obtained. This framework realizes the overall linkage of industrial robots, industrial control sites, and simulation operations, achieving the unified linkage and data consistency of the digital twin. The optimized motion trajectory of the industrial robot is the result of the artificial intelligence algorithm's computation. In this way, the intelligent simulation of industrial robots can be realized.

The CPS framework has the following three functions in the simulation of industrial robots: (1) It realizes the perception capability of the system, corresponding to the perception part of the physical model in the CPS. The information needed by the system is collected by sensors and various other channels through the CPS bus structure and provided to the control and decision system. (2) Different functional modules can be dispersed into a distributed structure; loose coupling allows software and hardware resources to be allocated flexibly as needed, which relieves local resource restrictions and achieves the best allocation of resources and the expansion of overall performance. (3) The standardization of virtual and real simulation functions is realized. The power of standardization is not reflected in a single system; if the goal were simply to realize a simulation system, it would not need to be built on CPS.
Following the CPS architecture breaks the isolated-island mode of the industrial robot simulation system, allowing it to connect smoothly with other systems. When the company's production tasks change, the simulation system can be quickly customized to adapt to new business needs. A simulation system based on CPS is better suited to integration into large-scale unified platforms, laying the foundation for future standardized expansion.
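The normalization and unreachable-pose filtering described for the information platform layer can be sketched as follows. The workspace size, the reach-sphere check standing in for solving the motion matrix equation, and the field names are all assumptions for illustration.

```python
def normalize(poses, workspace_mm=1000.0):
    """Map x, y, z from millimetres into fractions of an assumed
    workspace, so both platforms consume the same scale."""
    return [{**p, "xyz": [c / workspace_mm for c in p["xyz"]]}
            for p in poses]

def is_reachable(pose, max_radius=1.0):
    """Stand-in for the kinematics check: keep only poses whose
    normalized position lies inside the reach sphere."""
    return sum(c * c for c in pose["xyz"]) ** 0.5 <= max_radius

# Two hypothetical trajectory poses; the second falls outside reach.
poses = [{"xyz": [300.0, 400.0, 0.0]},
         {"xyz": [900.0, 900.0, 900.0]}]
usable = [p for p in normalize(poses) if is_reachable(p)]
print(len(usable), usable[0]["xyz"])  # 1 [0.3, 0.4, 0.0]
```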

Cloud CPS-Based Intelligent Assembly System

When the number of assembled parts is large and the assembly operations are complicated, the computational burden makes assembly sequence planning algorithms very inefficient, which greatly limits the practical application of assembly planning systems. The representation and reuse of manufacturing knowledge can alleviate this problem to a certain extent. We developed a cloud CPS system in which multidisciplinary assembly knowledge is represented and reused to select appropriate planning strategies for different assembly tasks and dynamic environments [45].
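Selecting a planning strategy from stored knowledge can be sketched as simple case retrieval: the current assembly context is compared with stored cases and the best match's strategy is reused. The features, cases and strategy names below are hypothetical, and [45] describes a far richer matching process.

```python
def similarity(ctx_a, ctx_b):
    """Fraction of context features with identical values."""
    keys = set(ctx_a) | set(ctx_b)
    same = sum(1 for k in keys if ctx_a.get(k) == ctx_b.get(k))
    return same / len(keys)

def retrieve_strategy(context, case_base):
    """Reuse the planning strategy of the most similar stored case."""
    best = max(case_base, key=lambda c: similarity(context, c["context"]))
    return best["strategy"]

# Hypothetical case base of past assembly projects.
case_base = [
    {"context": {"parts": "few", "tolerance": "loose", "robot": "single"},
     "strategy": "greedy_sequence"},
    {"context": {"parts": "many", "tolerance": "tight", "robot": "dual"},
     "strategy": "ga_sequence_with_simulation"},
]
context = {"parts": "many", "tolerance": "tight", "robot": "single"}
print(retrieve_strategy(context, case_base))  # ga_sequence_with_simulation
```

Because retrieval replaces planning from scratch, it sidesteps the combinatorial cost noted above whenever a sufficiently similar case exists.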


[Fig. 6.37 shows the cyber domain, comprising the manufacturing service cloud platform with a model layer (product model, assembly simulation), a solution layer (sequence planning, trajectory planning, resource scheduling, command generation) and a knowledge layer (assembly knowledge, case base), and the physical domain, comprising an execution layer (assembly robots and NC equipment receiving assembly commands) and a perception layer (laser, image, location and force sensors providing the environment context).]

Fig. 6.37 Framework of the cloud CPS-based intelligent assembly system

As shown in Fig. 6.37, the cloud CPS-based intelligent assembly system consists of a physical domain and a cyber domain. The physical domain includes a perception layer and an execution layer, while the cyber domain includes a model layer, a knowledge layer, and a solution layer.

Physical domain: The perception and execution layers consist of several measuring devices, feedback devices, and assembly devices with communication, sensing, and instruction execution capabilities. The sensing and measuring devices monitor state attribute information and send the measurement data to the cyber domain in time. The assembly devices complete the corresponding assembly actions, such as handling, grasping, and screwing, according to the instructions sent by the system. The feedback devices perform real-time sensing and information feedback during the assembly process.

Cyber domain: The cyber domain includes a model library, a knowledge base and an algorithm library. The model library maps various manufacturing resources to the cloud through service encapsulation and service modeling. The knowledge base is a case library consisting of product models, assembly processes, and assembly rules. The algorithm library is responsible for the mapping from assembly tasks to device instructions, which involves steps such as sequence planning, trajectory planning and resource scheduling.

The physical domain collects sensory information from the real world and uploads it to the cyber domain. The perceived data and the assembly tasks together form the assembly context. The corresponding assembly process plan is formed by matching the assembly context with knowledge reasoning, and the scheduled manufacturing resources are obtained. The algorithm library then generates the corresponding instructions and scheduling schemes to drive the operation of the assembly



device in the physical world. During the real-time operation of the assembly equipment, feedback data is obtained from the monitoring devices and compared with the simulation results. This combination of offline pre-assembly and online CPS simulation control further ensures smooth assembly.
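The online comparison of feedback data against simulation results can be sketched as a tolerance check that triggers replanning when the deviation is too large. The tolerance value and field layout are assumptions for illustration.

```python
def check_deviation(simulated_xyz, measured_xyz, tol_mm=2.0):
    """Compare a simulated pose with the measured feedback pose.
    Returns ('ok', deviation) within tolerance, else ('replan', deviation)."""
    dev = max(abs(s - m) for s, m in zip(simulated_xyz, measured_xyz))
    return ("ok", dev) if dev <= tol_mm else ("replan", dev)

# Hypothetical simulated vs. measured positions in millimetres.
status, dev = check_deviation([100.0, 50.0, 10.0], [101.5, 50.0, 10.0])
print(status, dev)  # ok 1.5

status, dev = check_deviation([100.0, 50.0, 10.0], [105.0, 50.0, 10.0])
print(status)  # replan
```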

References

1. Ghobakhloo, M. (2020). Industry 4.0, digitization, and opportunities for sustainability. Journal of Cleaner Production, 252, 119869.
2. Capestro, M., & Kinkel, S. (2020). Industry 4.0 and knowledge management: A review of empirical studies. Knowledge Management and Industry 4.0, 19–52.
3. Cárdenas, L. A., Ramírez, W., & Rodríguez Molano, J. I. (2018, June). Model for the incorporation of big data in knowledge management oriented to industry 4.0. In International Conference on Data Mining and Big Data (pp. 683–693).
4. Manesh, M. F., Pellegrini, M. M., Marzi, G., & Dabic, M. (2020). Knowledge management in the fourth industrial revolution: Mapping the literature and scoping future avenues. IEEE Transactions on Engineering Management, 68(1), 289–300.
5. Ferraris, A., Mazzoleni, A., Devalle, A., & Couturier, J. (2018). Big data analytics capabilities and knowledge management: Impact on firm performance. Management Decision.
6. Sagiroglu, S., & Sinanc, D. (2013, May). Big data: A review. In 2013 International Conference on Collaboration Technologies and Systems (CTS) (pp. 42–47).
7. Hijazi, S. (2017). Big data and knowledge management: A possible course to combine them together. Association Supporting Computer Users in Education.
8. Pauleen, D. J., & Wang, W. Y. (2017). Does big data mean big knowledge? KM perspectives on big data and analytics. Journal of Knowledge Management.
9. Sumbal, M. S., Tsui, E., & See-to, E. W. (2017). Interrelationship between big data and knowledge management: An exploratory study in the oil and gas sector. Journal of Knowledge Management.
10. Wang, S., & Wang, H. (2020). Big data for small and medium-sized enterprises (SME): A knowledge management model. Journal of Knowledge Management.
11. Pauleen, D. J. (2017). Davenport and Prusak on KM and big data/analytics: Interview with David J. Pauleen. Journal of Knowledge Management.
12. Secundo, G., Del Vecchio, P., Dumay, J., & Passiante, G. (2017). Intellectual capital in the age of big data: Establishing a research agenda. Journal of Intellectual Capital.
13. LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S., & Kruschwitz, N. (2011). Big data, analytics and the path from insights to value. MIT Sloan Management Review, 52(2), 21–32.
14. Labrinidis, A., & Jagadish, H. V. (2012). Challenges and opportunities with big data. Proceedings of the VLDB Endowment, 5(12), 2032–2033.
15. Lambrou, M. A. (2016). Innovation capability, knowledge management and big data technology: A maritime business case. Technology.
16. Rayes, A., & Salam, S. (2019). Internet of things from hype to reality (pp. 1–35). Springer International Publishing.
17. Rot, A., & Sobinska, M. (2018). The potential of the Internet of Things in knowledge management system. In FedCSIS (Position Papers) (pp. 63–68).
18. Uden, L., & He, W. (2017). How the Internet of Things can help knowledge management: A case study from the automotive domain. Journal of Knowledge Management.
19. Santoro, G., Vrontis, D., Thrassou, A., & Dezi, L. (2018). The Internet of Things: Building a knowledge management system for open innovation and knowledge management capacity. Technological Forecasting and Social Change, 136, 347–354.
20. Juarez, M., Botti, V., & Giret, A. (2021). Digital twins: Review and challenges. Journal of Computing and Information Science in Engineering, 21(3).



21. Kaivo-oja, J., Knudsen, M., Lauraeus, T., & Kuusi, O. (2020). Future knowledge management challenges: Digital twins approach and synergy measurements. Management, 8(2), 99–109.
22. Banerjee, A., Dalal, R., Mittal, S., & Joshi, K. (2017). Generating digital twin models using knowledge graphs for industrial production lines. UMBC Information Systems Department.
23. Mohammadi, N., & Taylor, J. (2020). Knowledge discovery in smart city digital twins. In Proceedings of the 53rd Hawaii International Conference on System Sciences.
24. Padovano, A., Longo, F., Nicoletti, L., & Mirabelli, G. (2018). A digital twin based service oriented application for a 4.0 knowledge navigation in the smart factory. IFAC-PapersOnLine, 51(11), 631–636.
25. Zhang, C., Zhou, G., He, J., Li, Z., & Cheng, W. (2019). A data- and knowledge-driven framework for digital twin manufacturing cell. Procedia CIRP, 83, 345–350.
26. Tao, F., Cheng, J., Qi, Q., Zhang, M., Zhang, H., & Sui, F. (2018). Digital twin-driven product design, manufacturing and service with big data. The International Journal of Advanced Manufacturing Technology, 94(9), 3563–3576.
27. Bao, J., Guo, D., Li, J., & Zhang, J. (2019). The modelling and operations for the digital twin in the context of manufacturing. Enterprise Information Systems, 13(4), 534–556.
28. Kong, T., Hu, T., Zhou, T., & Ye, Y. (2021). Data construction method for the applications of workshop digital twin system. Journal of Manufacturing Systems, 58, 323–328.
29. Lu, Y. (2017). Cyber physical system (CPS)-based industry 4.0: A survey. Journal of Industrial Integration and Management, 2(03), 1750014.
30. Someswara Rao, C., Shiva Shankar, R., & Murthy, K. (2020). Cyber-physical system: An overview. Smart Intelligent Computing and Applications, 489–497.
31. Ansari, F. (2019). Knowledge management 4.0: Theoretical and practical considerations in cyber physical production systems. IFAC-PapersOnLine, 52(13), 1597–1602.
32. Patalas-Maliszewska, J., & Schlueter, N. (2019). Model of a knowledge management for system integrator(s) of cyber-physical production systems (CPPS). In International Scientific-Technical Conference Manufacturing (pp. 92–103). Springer.
33. Song, S., Lin, Y., Guo, B., Di, Q., & Lv, R. (2018). Scalable distributed semantic network for knowledge management in cyber physical system. Journal of Parallel and Distributed Computing, 118, 22–33.
34. Zhang, Y., Qiu, M., Tsai, C., Hassan, M. M., & Alamri, A. (2015). Health-CPS: Healthcare cyber-physical system assisted by cloud and big data. IEEE Systems Journal, 11(1), 88–95.
35. Wang, T. M., Tao, Y., & Liu, H. (2018). Current researches and future development trend of intelligent robot: A review. International Journal of Automation and Computing, 15(5), 525–546.
36. Saxena, A., Jain, A., Sener, O., Jami, A., Misra, D., & Koppula, H. (2014). RoboBrain: Large-scale knowledge engine for robots. arXiv preprint arXiv:1412.0691.
37. Waibel, M., Beetz, M., Civera, J., Andrea, R., Elfring, J., Galvez-Lopez, D., & Van De Molengraft, R. (2011). RoboEarth. IEEE Robotics & Automation Magazine, 18(2), 69–82.
38. Kattepur, A. (2019). RoboPlanner: Autonomous robotic action planning via knowledge graph queries. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (pp. 953–956).
39. Fourie, D., Claassens, S., Pillai, S., Mata, R., & Leonard, J. (2017, May). SLAMinDB: Centralized graph databases for mobile robotics. In 2017 IEEE International Conference on Robotics and Automation (ICRA) (pp. 6331–6337).
40. Hoppen, M., Rossmann, J., & Hiester, S. (2016). Managing 3D simulation models with the graph database Neo4j. DBKDA, 2016, 88.
41. Nguyen, S. H., Yao, Z., & Kolbe, T. (2017). Spatio-semantic comparison of large 3D city models in CityGML using a graph database. In Proceedings of the 12th International 3D GeoInfo Conference 2017 (pp. 99–106).
42. Nguyen, S., & Kolbe, T. (2020). A multi-perspective approach to interpreting spatio-semantic changes of large 3D city models in CityGML using a graph database. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 6, 143–150.



43. Malinverni, E., Naticchia, B., Lerma Garcia, J., Gorreja, A., Lopez Uriarte, J., & Di Stefano, F. (2020). A semantic graph database for the interoperability of 3D GIS data. Applied Geomatics, 1–14.
44. Sukhwani, M., Duggal, V., & Zahrai, S. (2021). Dynamic knowledge graphs as semantic memory model for industrial robots. arXiv preprint arXiv:2101.01099.
45. Peng, G., Wang, H., & Zhang, H. (2019). Knowledge-based intelligent assembly of complex products in a cloud CPS-based system. In 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design (CSCWD) (pp. 135–139).