Analytics for Business Decisions [1 ed.] 9789813299498


English Pages [212] Year 2022


Table of contents :
Cover
Guest editorial: analytics for business decisions
A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems
How do mid-level managers experience data science disruptions? An in-depth inquiry through interpretative phenomenological analysis (IPA)
The dual drivetrain model of digital transformation: role of industrial big-data-based affordance
Antecedents to firm performance and competitiveness using the lens of big data analytics: a cross-cultural study
Analysing the voice of customers by a hybrid fuzzy decision-making approach in a developing country's automotive market
Does service failure criticality affect global travellers' service evaluations? An empirical analysis of online reviews
Impact of wholesale price discrimination by the manufacturer on the profit of supply chain members
Analytics of machine replacement decisions: economic life vs real options
Green innovation as a mediator in the impact of business analytics and environmental orientation on green competitive advantage

Citation preview

ISSN 0025-1747 Volume 60 Number 2 2022

Management Decision Analytics for Business Decisions Guest Editors: Manish Gupta, Weiguo Fan and Aviral Kumar Tiwari

Analytics for business decisions

Introduction

Analytics is increasingly gaining popularity among practitioners and academics (Amankwah-Amoah and Adomako, 2019; Law and Chung, 2020), primarily because of its role in considerably enhancing the efficiency and effectiveness of businesses (Singh and Del Giudice, 2019). The use of analytics makes it easier to carry out the four basic functions of management: planning, controlling, organizing and directing (Fosso Wamba and Akter, 2019). Organizations typically collect data on several parameters and store them in huge databases for political, economic, social, technological, legal and environmental purposes. Business analytics helps organizations analyze these data and derive meaning from them. Such data-driven and evidence-based results have positive consequences for organizations. However, experts suggest that there are several challenges in using business analytics, including human resource issues such as the adaptability of employees, marketing management issues such as the reliability and validity of market segmentation, financial management issues such as the high initial investment required for a long-term return on investment, operational issues such as quantifying all activities, and information systems issues such as understanding the technical know-how (Hamilton and Sodeman, 2020). Moreover, the ever-changing macro-level factors external to the organization also influence the extent to which analytics is used in businesses. Thus, it is important to know the ways in which analytics affects and is affected by several micro- and macro-level factors.


Objective of the special issue

Analytics in businesses does not work in isolation and significantly affects business outcomes (Aydiner et al., 2019). As research on the antecedents and consequences of using analytics in business is very limited, especially in the management functions, the main objective of this special issue is to examine the analytical factors that influence decision-making in businesses. The sub-objectives are:

(1) To understand the factors affecting the usage of analytics in businesses.
(2) To explore the decisional implications of using analytics in businesses.
(3) To critically examine the process of using analytics in a particular company or an industry.
(4) To capture the dynamics of variables in the fields of marketing, finance, human resources, organizational behavior, operations and information systems on introducing analytics in an organization.

Articles in this special issue

This special issue comprises nine articles from different fields of Business Management such as Operations Management, Organizational Behavior, Information Systems Management and Marketing Management, to name a few. The articles provide rich insights into how business analytics affects a firm’s performance-related variables, how analytics may be used to assess the impact on firm performance by its antecedents, and so forth. The articles have

Management Decision Vol. 60 No. 2, 2022 pp. 297-299 © Emerald Publishing Limited 0025-1747 DOI 10.1108/MD-02-2022-174


been arranged to first give a comprehensive view of the data science-related research thus far, followed by the experiences of managers with analytics, and then context-specific studies as examples of using data analytics for better managerial decision-making. The subsequent paragraphs summarize this special issue.

The first article titled “A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems” is authored by Roberto Salazar-Reyna, Fernando Gonzalez-Aleu, Edgar M.A. Granda-Gutierrez, Jenny Diaz-Ramirez, Jose Arturo Garza-Reyes and Anil Kumar. The authors used a systematic procedure to explore and assess available articles in order to characterize the analytics research area and its role in business decision-making.

The second article titled “How do mid-level managers experience data science disruptions? An in-depth inquiry through interpretative phenomenological analysis (IPA)” is authored by Atri Sengupta, Shashank Mittal and Kuchi Sanchita. The authors conducted a qualitative analysis to capture data science disruptions as experienced by mid-level managers of large-scale Indian organizations. The study revealed several interesting insights, such as two emergent person–job (mis)fit process models.

The third article titled “The dual drivetrain model of digital transformation: role of industrial big-data-based affordance” is authored by Yi Liu, Wei Wang and Zuopeng (Justin) Zhang. Using a case study approach in China, the authors investigated the role of industrial big data in promoting digital transformation. As an outcome of their study, the authors proposed a drivetrain model of digital transformation by industrial big data.

The fourth article titled “Antecedents to firm performance and competitiveness using the lens of big data analytics: a cross-cultural study” is authored by Abhishek Behl. Using a quantitative approach and collecting data from Indian and Chinese start-ups, the authors examined the ways in which the big data analytics capabilities of tech start-ups impact their competitive advantage and performance.

The fifth article titled “Analysing the voice of customers by a hybrid fuzzy decision-making approach in a developing country’s automotive market” is authored by Hannan Amoozad Mahdiraji, Khalid Hafeez, Hamidreza Kord and AliAsghar Abbasi Kamardi. The authors proposed a new method, a hybrid clustering multicriteria decision-making (MCDM) approach, for decision problems that involve a large set of data. The authors demonstrated it in the context of customer complaints in the Iranian automotive sector.

The sixth article titled “Does service failure criticality affect global travellers’ service evaluations? An empirical analysis of online reviews” is authored by Rishi Dwesar and Debajani Sahoo. The authors applied a mixed-method research design to investigate the breadth and depth of the impact of airline type, failure criticality and the traveler’s culture on travelers’ airline evaluations of service failure for 20 major airlines globally.

The seventh article titled “Impact of wholesale price discrimination by the manufacturer on the profit of supply chain members” is authored by Rofin T.M. and Biswajit Mahanty. The authors developed game-theoretic models to determine the influence of wholesale price discrimination by a manufacturer in a retailer–e-tailer dual-channel supply chain for diverse product categories.

The eighth article titled “Analytics of machine replacement decisions: economic life vs real options” is authored by Yuri Yatsenko and Natali Hritonenko. The authors used several data analytic tools to overcome the shortcomings in making rational machine replacement decisions.

The ninth article titled “Green innovation as a mediator in the impact of business analytics and environmental orientation on green competitive advantage” is authored by Hashim Zameer, Ying Wang, Humaira Yasmeen and Shujaat Mubarak. The authors, in this

study, used structural equation modeling to examine the role of business analytics and environmental orientation in affecting green innovation and green competitive advantage.

Conclusion

The articles in this special issue, as one may observe, either applied rigorous analytics to reduce biases or offered new ways of using business analytics to strengthen decision-making in businesses. The findings of the articles are expected to augment academic research by providing evidence for the possibility of using business analytics as a tool to examine managerial issues and advance existing models. Some of the articles used case studies to demonstrate how management can make informed decisions if the suggested models and factors are taken into consideration.

Manish Gupta
School of Management, Mahindra University, Hyderabad, India

Weiguo Fan
Department of Business Analytics, The University of Iowa Tippie College of Business, Iowa City, Iowa, USA, and

Aviral Kumar Tiwari
Finance and Economics, Rajagiri Business School, Kochi, India

References

Amankwah-Amoah, J. and Adomako, S. (2019), “Big data analytics and business failures in data-rich environments: an organizing framework”, Computers in Industry, Vol. 105, pp. 204-212.

Aydiner, A.S., Tatoglu, E., Bayraktar, E., Zaim, S. and Delen, D. (2019), “Business analytics and firm performance: the mediating role of business process performance”, Journal of Business Research, Vol. 96, pp. 228-237.

Fosso Wamba, S. and Akter, S. (2019), “Understanding supply chain analytics capabilities and agility for data-rich environments”, International Journal of Operations & Production Management, Vol. 39 Nos 6/7/8, pp. 887-912.

Hamilton, R.H. and Sodeman, W.A. (2020), “The questions we ask: opportunities and challenges for using big data analytics to strategically manage human capital resources”, Business Horizons, Vol. 63 No. 1, pp. 85-95.

Law, K.S. and Chung, F.L. (2020), “Knowledge-driven decision analytics for commercial banking”, Journal of Management Analytics, Vol. 7 No. 2, pp. 209-230.

Singh, S.K. and Del Giudice, M. (2019), “Big data analytics, dynamic capabilities and firm performance”, Management Decision, Vol. 57 No. 8, pp. 1729-1733.


The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/0025-1747.htm

Received 10 January 2020; revised 8 April 2020 and 2 May 2020; accepted 14 May 2020

A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems

Roberto Salazar-Reyna and Fernando Gonzalez-Aleu Department of Engineering, Universidad de Monterrey, San Pedro Garza Garcia, Mexico

Edgar M.A. Granda-Gutierrez Graduate School of Engineering and Technology, Universidad de Monterrey, San Pedro Garza Garcia, Mexico

Jenny Diaz-Ramirez Department of Engineering, Universidad de Monterrey, San Pedro Garza Garcia, Mexico

Jose Arturo Garza-Reyes Centre for Supply Chain Improvement, University of Derby, Derby, UK, and

Anil Kumar Guildhall School of Business and Law, London Metropolitan University, London, UK

Abstract

Purpose – The objective of this paper is to assess and synthesize the published literature related to the application of data analytics, big data, data mining and machine learning to healthcare engineering systems.

Design/methodology/approach – A systematic literature review (SLR) was conducted to obtain the most relevant papers related to the research study from three different platforms: EBSCOhost, ProQuest and Scopus. The literature was assessed and synthesized by conducting analyses associated with the publications, authors and content.

Findings – From the SLR, 576 publications were identified and analyzed. The research area seems to show the characteristics of a growing field, with new research areas evolving and applications being explored. In addition, the main authors and collaboration groups publishing in this research area were identified through a social network analysis. This could help new and current authors identify researchers with common interests in the field.

Research limitations/implications – The use of the SLR methodology does not guarantee that all relevant publications related to the research are covered and analyzed. However, the authors’ previous knowledge and the nature of the publications were used to select different platforms.

Originality/value – To the best of the authors’ knowledge, this paper represents the most comprehensive literature-based study on the fields of data analytics, big data, data mining and machine learning applied to healthcare engineering systems.

Keywords Data analytics, Big data, Machine learning, Healthcare systems, Systematic literature review

Paper type Literature review

Management Decision Vol. 60 No. 2, 2022 pp. 300-319 © Emerald Publishing Limited 0025-1747 DOI 10.1108/MD-01-2020-0035

1. Introduction

Data science is a “set of fundamental principles that support and guide the principled extraction of information and knowledge from data” (Provost and Fawcett, 2013). It involves the use and development of algorithms, processes, methodologies and techniques for understanding past, present and future phenomena through the analysis of data to improve decision-making. Data scientists and data analysts must be able to view business problems from a data perspective in order to leverage the benefits of applying data science in the organization. The healthcare industry is one of the world’s largest, most critical and fastest-growing industries, and it has been evolving through significant challenges in recent times (Nambiar et al., 2013). It is considered a data-driven industry and has historically generated a large amount of data, driven by record keeping, compliance and regulatory requirements and patient care (Raghupathi and Raghupathi, 2014). However, according to a report from the Institute of Medicine, the healthcare industry is considered highly inefficient: one-third of its expenditures are wasted and do not contribute to better quality outcomes. While the healthcare system continues to apply industrial and systems engineering tools to achieve an effective coordinated system, data analytics has the potential to improve care, save lives and lower costs by identifying associations and understanding trends and patterns within the data.

Data science has several areas and disciplines within itself; thus, there is no universal agreement in the literature regarding its components and interactions. Winters (2015) developed a Venn diagram to visualize the three main fields of data science (i.e. data analytics, big data and algorithms) and their intersections (i.e. data mining, machine learning and software tools) based on a two-axis diagram (i.e. on the x-axis: experimental versus theoretical; on the y-axis: descriptive versus prescriptive). On the other hand, Emmert-Streib et al. (2016) developed a schematic visualization (i.e. Efron-triangle) of the main fields constituting data science (i.e. domain knowledge, statistics/mathematics and computer science) and their intersections (i.e. machine learning, biostatistics and data engineering) based on the original data science Venn diagram created by Conway (2013). Taking into consideration the significant role data science can play in achieving better outcomes in healthcare systems, it is relevant to understand to what extent each field/area has been applied in healthcare systems and its state of maturity, along with the authors researching that field/area.

Therefore, the purpose of this study is to assess and synthesize the published literature related to the impact, benefits, implications, challenges, opportunities and trends of data science exclusively in healthcare systems. To achieve this aim, the authors used an SLR as the research methodology. SLRs focus on the published literature of a specific research field by identifying, evaluating and integrating the findings of all relevant studies that address a set of research questions, while being objective, systematic, transparent and replicable. However, for highly relevant publications to be identifiable, they must be indexed in targeted platforms/databases (Lefebvre et al., 2011). To ensure this, the authors strategically selected platforms that contained medical databases to provide adequate coverage of the research area and designed a search strategy that allowed the capture of as many significant publications as possible.

After the final set of publications was obtained for this study, three different dimensions were assessed and evaluated to synthesize information, i.e. publication characteristics, authors’ characteristics and content characteristics. These were identified based on preliminary work that defined relevant criteria to assess the maturity of a research area (Keathley et al., 2013). The publication characteristics analyses included an examination of the publication trends over time as well as the characteristics of the publication sources associated with the final paper set, which in this case were primarily academic journals. The authors’ characteristics examination included an investigation of the number of authors and of the collaborations among them, using social network analyses to identify predominant authors and research groups. The investigation of content characteristics, for the purposes of this work, refers to analyzing the extent to which the areas/fields within data science (e.g. data analytics, machine learning and data mining) have been addressed in healthcare systems, in which medical areas/departments and to treat which diseases/disorders. To address this, a social network analysis was conducted. Thus, the research questions addressed in this study are:

(1) Publications characteristics:

RQ1. Which trend exists in publication pattern over time for this research area?


RQ2. What type of sources is publishing the works?
RQ3. Which are the sources with the highest frequency of published works in the field?
RQ4. Which are the main study fields from the sources publishing the works?

(2) Authors’ characteristics:

RQ5. How many authors are contributing to this area?
RQ6. To what extent are new authors contributing?
RQ7. To what extent are authors collaborating between them in this research area?
RQ8. What is the distribution of the number of authors per publication?

(3) Content characteristics:

RQ9. Which are the most frequently mentioned data science fields applied to healthcare systems?
RQ10. Which are the top medical areas/departments where data science has been studied and applied?
RQ11. Which are the top diseases/disorders being addressed through data science approaches?
RQ12. Which are the main study approaches on the theoretical publications set?
RQ13. Which are the main application objectives on the case study publications set?
RQ14. Which are the newly emerging research lines related to this research area?

The rest of the paper is divided into three main sections: the research methodology (i.e. SLR conduction) is presented in Section 2; the results of the study (publication characteristics, authors’ characteristics and content characteristics) are included in Section 3; and Section 4 presents the conclusions and future research directions.

2. Research methodology

PRISMA (preferred reporting items for systematic reviews and meta-analyses) is a well-recognized research methodology in the medical field; it uses four steps (Moher et al., 2010), namely: identification, screening, eligibility and included. This research method is often used in meta-analyses. On the other hand, the systematic literature review approach proposed by Keathley et al. (2016), based on Tranfield et al. (2003) and the Cochrane Handbook (Higgins and Green, 2011; Lefebvre et al., 2011), has been used in bibliometric and/or scientometric analyses. Keathley et al. (2016) follow seven steps:

(1) Problem definition: the research area is identified and the research objectives defined.
(2) Scoping study: the desired scope of the study is established and the research team conducts a “traditional” literature review to identify relevant publications related to the research area.
(3) Search strategy: the scoping set of papers is evaluated by identifying potential search terms. Then, the strategy is formulated by defining the databases/platforms to be searched, Boolean phrases, search tools, limiters, filters and exclusion criteria.
(4) Exclusion criteria: publications not directly related to analytics, data mining, big data and machine learning applied in healthcare engineering systems are excluded.
(5) Data collection: bibliometric data are collected and the criteria identified based on the aim of the research study.
(6) Data analysis: the bibliometric analysis is conducted based on the aim of the research study.
(7) Reporting: findings and results are presented.

In this study, the research team decided to use the research methodology of Keathley et al. (2016) based on two considerations. First, the purpose of this study was focused on conducting quantitative analyses of published documents, also known as bibliometric analyses (Broadus, 1987). Second, Keathley et al. (2016) included three critical steps in their research methodology (problem definition, scoping study and search strategy) that are not included in PRISMA. These three steps offer the possibility of easily updating a systematic literature review.

2.1 Problem definition

Throughout the literature, there are multiple publications regarding the use of data science, data analytics and machine learning algorithms applied to healthcare systems. However, it is not clear to what extent authors contributing to this research area are collaborating to diffuse new knowledge and significant findings. For this reason, an SLR aiming to synthesize the current published literature would provide a guide for the future development and evolution of this research area.

2.2 Scoping study

The scoping study was conducted through the identification of six main publications related to the research area using three platforms (EBSCOhost, ProQuest and Scopus): Malik et al. (2018), Islam et al. (2018), Hansen et al. (2014), Luo et al. (2016), Alonso et al. (2017) and Mehta and Pandit (2018). To determine to what extent the literature related to data science applied to healthcare systems had been analyzed, a comparison study of previous literature reviews was conducted (see Table 1). The literature review conducted in 2014 aimed to discuss the perspectives of the evolving use of big data in science and healthcare and to examine some of the opportunities and challenges. The literature review conducted in 2015 discussed big data applications in four major biomedical subdisciplines: bioinformatics, clinical informatics, imaging informatics and public health informatics. The literature review carried out in 2017 reviewed big data sources and techniques in the health sector and identified which of these techniques were the most used in the prediction of chronic diseases. The first literature review conducted in 2018 reviewed big data analytics applications and the challenges in their adoption in healthcare, and identified strategies to overcome them. The second literature review conducted in 2018, the most extensive one, provided a systematic review of the development of the fields of multiple healthcare sub-areas, data mining techniques, types of analytics, data and data sources, as well as possible directions. Finally, the last literature review conducted in 2018 assessed and synthesized how the big data phenomenon has contributed to better outcomes for the delivery of healthcare services.

One interesting finding from these systematic literature reviews is that none of them conducted social network analyses of the authors publishing in this research field, which represents a gap to be covered. The present study, in addition to being the most up to date, analyzed a significantly higher number of publications than these other studies. It also includes a theoretical approach study as well as a social network analysis of the authors publishing in the research field, aiming to help new and current researchers identify researchers who have similar interests and research lines within this field and who are collaborating in study groups for the diffusion of knowledge.
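To make the idea of an authors' social network concrete, the following is a minimal sketch of how co-authorship links might be derived from publication author lists. It is an illustration only, not the procedure used in this study: the function names `coauthor_edges` and `degree` and the author lists are hypothetical, and no data from the review are used.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(publications):
    """Count how often each pair of authors appears on the same publication."""
    edges = Counter()
    for authors in publications:
        # Sort the de-duplicated names so (A, B) and (B, A) map to one undirected edge.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


def degree(edges):
    """Return the number of distinct co-authors for each author that has any."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {author: len(others) for author, others in neighbours.items()}


# Hypothetical author lists, one per publication (not data from the review).
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author D"],
]

edges = coauthor_edges(papers)
print(edges.most_common(1))  # the most frequent collaboration pair
print(degree(edges))         # authors with no co-authors are absent
```

In such a sketch, frequently repeated pairs hint at collaboration groups, while authors missing from the degree map have published alone.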


Table 1. Literature reviews (SLR) comparison table

Reviews compared (columns):
(A) Big data in science and healthcare: A review of recent literature and perspectives
(B) Big data application in biomedical research and health care: A literature review
(C) A systematic review of techniques and sources of big data in the healthcare sector
(D) Concurrence of big data analytics and healthcare: A systematic review
(E) A systematic review on healthcare analytics: Application and theoretical perspective of data mining
(F) Data mining and predictive analytics applications for the delivery of healthcare services: a systematic literature review
(G) This paper

| Category | (A) | (B) | (C) | (D) | (E) | (F) | (G) |
|---|---|---|---|---|---|---|---|
| Year | 2014 | 2015 | 2017 | 2018 | 2018 | 2018 | 2019* |
| Papers analyzed | 0 | 68 | 32 | 58 | 117 | 22 | 576 |
| Sources of big data | No | No | Yes | Yes | No | No | No |
| Sources of healthcare data | No | No | No | Yes | No | No | No |
| Big data analytical techniques | No | No | Yes | Yes | Yes | No | Yes |
| Application areas of big data/data mining | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Platforms of big data | No | No | Yes | No | No | No | No |
| Big data definitions | No | No | No | Yes | No | No | No |
| Keywords network | No | No | No | No | Yes | No | Yes |
| Distribution of publications | No | No | No | No | Yes | Yes | Yes |
| Distribution of journals | No | No | No | No | Yes | Yes | Yes |
| Types of analytics | No | No | No | No | Yes | No | Yes |
| Classification by disease | No | No | No | No | Yes | No | Yes |
| Data mining algorithm tool/software | No | No | No | No | No | Yes | No |
| Authors’ social network analysis | No | No | No | No | No | No | Yes |
| Theoretical approach | No | No | No | No | No | No | Yes |

Note(s): *Includes publications until June 2019

2.3 Search strategy

The initial search strategy protocol consisted of five single search terms (data analytics, big data, data mining, machine learning and healthcare), three platforms (EBSCOhost, ProQuest and Scopus), the use of Boolean operators (AND/OR), an all-fields search and two main exclusion criteria: published in academic journals and written in the English language. This search strategy was tested and modified multiple times to identify a final set of relevant publications for this research area. First, to increase the sensitivity of the search, synonyms (e.g. data analysis, analysis of data, mass data and massive data), techniques (e.g. data processing, text mining and deep learning), more specific concepts (e.g. artificial intelligence, business intelligence and Internet of things) and the term “health care” (due to the lack of standardization between healthcare and health care in publications and academic texts) were added to the original search terms using the OR Boolean operator. Second, also to increase sensitivity, the Boolean phrase was applied to abstracts instead of all fields or all text, which helped control the scope. Lastly, conference materials were considered in the publications’ search. Table 2 shows the final search strategy protocol used in this work. The search strategy was executed to identify all relevant papers up through July 2019.
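The Boolean logic of the protocol (OR within a concept, AND across concepts, as summarized in Table 2) can be sketched as a small string builder. This is an illustrative sketch only: the function name `build_query` is hypothetical, only a subset of the Table 2 search terms is shown, and the platform-specific syntax for restricting the search to abstracts is deliberately not modelled because it differs between EBSCOhost, ProQuest and Scopus.

```python
def build_query(concepts):
    """Join terms with OR inside each concept and join concepts with AND."""
    groups = []
    for terms in concepts.values():
        # Quote each term so multi-word phrases are kept together.
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


# A subset of the search terms listed in Table 2, grouped by concept.
concepts = {
    "data_science": [
        "analytics", "data analysis", "big data",
        "data mining", "machine learning", "deep learning",
    ],
    "healthcare": ["healthcare", "health care"],
}

query = build_query(concepts)
print(query)
```

Adding a synonym to a concept widens that OR group and can only increase the number of matches, which is the sensitivity effect described in the paragraph above.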


2.4 Exclusion criteria

A total of 8,529 publications were identified and screened based on the exclusion criteria listed in Table 2, removing the following publications from this study: duplicates (16.4% of the raw results), publications not related to data science fields (29.4%), publications not exclusively focused on healthcare systems (47%) and publications without an electronic file (0.4%). From the initial set, a total of 576 publications (6.8%) were accepted as the final publication set for this research. For the purposes of this research, these 576 publications were classified into two separate sets based on their research approach: theoretical and application publications. The theoretical publication set included 105 publications that mainly focused on studying and analyzing the strengths, weaknesses, opportunities, threats, challenges, capabilities, trends, benefits and

Table 2. Systematic literature review search protocol

| Components of search | Explanation |
|---|---|
| Data science concept (search terms) | Data analytics (8 search terms): analytics, data analytics, data analysis, analysis of data, informatics, informatics, health information technology, health information technologies; Big data (6 search terms): big data, massive data, mass data, large data, macro data, metadata; Data mining (3 search terms): data mining, data processing, text mining; Machine learning (8 search terms): machine learning, artificial intelligence, robotics, deep learning, neural networks, Internet of things, IoT, business intelligence |
| Healthcare concept (search terms) | Healthcare (2 search terms): healthcare, health care |
| Platforms | EBSCOhost, ProQuest and Scopus |
| Search strategy | Boolean operators: OR within search terms for each concept (i.e. analytics OR analysis of data); AND across concepts (i.e. analytics AND healthcare). Search field: Abstract (EBSCOhost, ProQuest and Scopus) |
| Exclusion criteria | Publications present in academic journals or conference materials; publications written in a language other than English. Exclude: duplicate publications; publications not related to the topic or that did not address data science fields exclusively on healthcare engineering systems; publications for which an electronic file is not available |


promises of data science, data analytics and machine learning algorithms applied to healthcare systems as a whole. On the other hand, the application publication set included 471 publications related to case studies of data science, data analytics and machine learning algorithms applied to healthcare systems that addressed a specific problem, disease, medical condition or medical disorder. To investigate the extent to which this research area was expanding, synthesizing and assessing the literature in the three dimensions outlined earlier (publications characteristics, authors’ characteristics and content characteristics) became a significant task. Each of these included the analysis of one or more criteria, as reported in the following section.

3. Results

To obtain a comprehensive perspective of the published literature on data science, data analytics and machine learning applied to healthcare engineering systems, this section presents the results of the analyses conducted to address the research questions posed earlier in the Introduction section.

3.1 Publications characteristics

To answer the research questions on publication characteristics, the following data were collected and synthesized from the 576 publications: publication year, publication name, publication field and publication impact (quartile).

3.1.1 RQ1. Which trend exists in publication pattern over time for this research area?

Trend analyses are useful for visualizing the frequency of publications over time and determining the extent to which it is changing. When conducting an SLR, the publication rate is one of the analyses often used to evaluate publication trends. Figure 1 shows the frequency of publications per year; the following findings can be observed from it. First, the first paper focusing on data science, data analytics and machine learning applied to healthcare engineering systems was published in 2004; thus, this particular research area spans only 15 years and appears to be relatively young. Second, from 2004 to 2010, the number of publications fluctuated between zero and three and does not seem to demonstrate

Figure 1. Frequency of publications per year, 2004 to 2019 (columns: frequency of papers per year, axis "Number of publications"; line: cumulative frequency of papers, axis "Cumulative number of publications")

an increasing trend. Third, as suggested by the cumulative frequency line, the publication trend started to increase after 2011, with 2016 being the year with the highest number of publications to date (195 papers). For purposes of this analysis, and considering that the publication set included papers published until the end of June 2019, the last column, corresponding to the frequency of published papers in 2019, was doubled to keep the data consistent.

3.1.2 RQ2. What type of sources is publishing the works? These publications have been published mainly in academic journals (410 publications; 71.2%) and conference proceedings (95 publications; 16.5%). This fact suggests that practitioners and academics are conducting theoretical and applied research on this topic.

3.1.3 RQ3. Which are the sources with the highest frequency of published works in the field? A total of 346 publication outlets were identified from the set of 576 publications. The most frequently used were academic journals such as the Journal of Medical Systems (33), PLoS One (32), BMC Medical Informatics and Decision Making (13), International Journal of Advanced Research in Computer Science (12), BMC Bioinformatics (11), Journal of Big Data (11), Computers in Biology and Medicine (10) and Journal of Medical Internet Research (10). On the other hand, the conference proceedings in which authors published most frequently were the 18th IEEE International Conference on e-Health Networking, Application and Services, IEEE 1st International Conference on Connected Health: Applications, Systems and Engineering Technologies, 2016 IEEE International Conference on Healthcare Informatics, 2016 IEEE International Conference on Mobile Services and 2016 6th International Conference–Cloud System and Big Data Engineering, each with two publications.
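The adjustment described for the 2019 column in Figure 1 (a half-year count doubled to approximate a full year) can be sketched as follows. The yearly counts below are illustrative placeholders, not the paper's data, and the names `annualize` and `observed_fraction` are hypothetical. Computing the cumulative series from the observed, unadjusted counts is also an assumption, since the text does not say which series the cumulative line uses.

```python
from itertools import accumulate

# Illustrative counts only (NOT the paper's data). The last year is assumed
# to be observed only from January through the end of June.
papers_per_year = {2016: 40, 2017: 30, 2018: 24, 2019: 9}
partial_year = 2019
observed_fraction = 0.5  # half of the year was covered by the search


def annualize(counts, year, fraction):
    """Return a copy of counts with one partially observed year scaled up."""
    adjusted = dict(counts)
    adjusted[year] = round(counts[year] / fraction)
    return adjusted


adjusted = annualize(papers_per_year, partial_year, observed_fraction)

# Cumulative frequency over the observed (unadjusted) counts.
years = sorted(papers_per_year)
cumulative_observed = dict(
    zip(years, accumulate(papers_per_year[y] for y in years))
)

# Year with the highest (adjusted) number of publications.
peak_year = max(adjusted, key=adjusted.get)
print(adjusted, cumulative_observed, peak_year)
```

Dividing by the observed fraction generalizes the doubling rule: a cut-off at the end of September, for example, would use a fraction of 0.75.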
Although this research topic is limited only to healthcare engineering systems, the descriptive analysis in RQ2 shows evidence that this research topic has been addressed from different fields.

3.1.4 RQ4. Which are the main study fields from the sources publishing the works? An analysis was conducted to identify the publication outlets’ main study fields, according to the Scimago Journal and Country Rank (SJR), to determine which research field this topic would fit best. According to the results of the analysis, the publication outlets’ main study fields were medicine (138), health informatics (101), information systems (82), computer science applications (67), computer networks and communications (61), biochemistry, genetics and molecular biology (55), health information management (48), electrical and electronic engineering (43), agricultural and biological sciences (37) and hardware and architecture (34). One interesting finding is that most of the publication outlets’ study fields could be grouped into three main fields: health, computer science and information systems. Finally, an analysis of the journals’ impact factor quartiles (from Q1, higher impact, to Q4, lower impact) was conducted to identify their ranks in their respective categories: Q1 (42%), Q2 (39%), Q3 (15%) and Q4 (4%). This result suggests that most of the journals in which the authors are publishing their works are highly ranked in their respective fields of study.

Overall, from the publication characteristics, it is observed that this research topic (application of data analytics, big data, data mining and machine learning to healthcare engineering systems) is in a growing stage, based on the information synthesized from the analyses conducted. In essence, the frequency of publications per year shows an increasing trend, most of the publications came from journals with high impact (Q1 and Q2) and the publications are highly centered in the medical and computer sciences fields.
3.2 Authors’ characteristics
To answer the research questions on authors’ characteristics, the following data were collected and synthesized from the 576 publications: authors’ names, authors’ first publication year, authors’ country of affiliation, authors’ publication network (authors publishing together) and the number of authors per publication.


3.2.1 RQ5. How many authors are contributing to this area? A total of 2,402 unique authors were identified from the 576 publications, for an average of 4.2 authors per publication.

3.2.2 RQ6. To what extent are new authors contributing? An analysis of the frequency of new authors publishing in this research area was conducted, as shown in Figure 2. The graph suggests an increasing trend in the number of new authors publishing in this research area, with 2016 being the year with the highest introduction of new authors; further, the cumulative frequency seems to support the ability of this research area to attract new authors. For purposes of this analysis, and considering that the publication set included papers published until the end of June 2019, the last column, corresponding to the frequency of new authors in 2019, was doubled to keep the data consistent. A criterion commonly used to analyze authors’ characteristics is author diversity, which examines the authors’ country of affiliation. This analysis allows determining to what extent authors’ interest is concentrated primarily in a geographical region or dispersed around the world. The 2,402 unique authors in both publication sets represented a total of 51 different countries. The countries with the highest number of authors were the USA (34.6%), China (15.2%), India (7.5%), United Kingdom (6.3%) and Australia (5.8%). Other countries represented included South Korea, Canada, Germany, Italy and Spain, with less than 4% each. Therefore, this research area, while attracting interest from authors around the world and representing all continents, is concentrated primarily in five countries that account for most of the authors (69.4%).

3.2.3 RQ7. To what extent are authors collaborating with one another in this research area? Collaboration among authors was analyzed using a social network created in Gephi to visualize direct and indirect interactions among authors and study groups.
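The new-author count behind RQ6 can be sketched as below: each author is assigned to the year of his or her first publication in the set, and authors are then counted per year. The publication records and the function name `new_authors_per_year` are hypothetical; the final line only re-derives the average reported for RQ5 from the two totals given in the text.

```python
from collections import Counter

# Illustrative records only (hypothetical names and years, NOT the paper's
# data): each tuple is (publication year, list of author names).
publications = [
    (2015, ["A. Author", "B. Author"]),
    (2016, ["A. Author", "C. Author", "D. Author"]),
    (2016, ["E. Author"]),
    (2017, ["C. Author", "F. Author"]),
]


def new_authors_per_year(pubs):
    """Count authors by the year of their first publication in the set."""
    first_year = {}
    for year, authors in pubs:
        for name in authors:
            if name not in first_year or year < first_year[name]:
                first_year[name] = year
    counts = Counter(first_year.values())
    return dict(sorted(counts.items()))


print(new_authors_per_year(publications))

# Average reported in the text: 2,402 unique authors over 576 publications.
print(round(2402 / 576, 1))
```

This sketch assumes that author names have already been disambiguated, which is a separate step not described in the text.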
There are several algorithms used to draw social networks, such as Fruchterman-Reingold and Wakita-Tsurumi. The decision about which social network algorithm to use is usually based on authors’ needs (Pajntar, 2006), e.g. the time consumed to process a large amount of data and the drawing characteristics. For this study, the research team decided to use the Fruchterman-Reingold algorithm because it is a force-directed layout algorithm that considers the force between two nodes (Udanor et al., 2016) and the team was interested in analyzing the relationships among authors. Figure 3 shows the authors’ names color-coded. In essence, authors with a blue font appeared exclusively on the theoretical publications set, authors with a black font appeared exclusively on the case study publications set and authors with a red

Figure 2. New authors per year, 2004 to 2019 (columns: frequency of new authors per year, axis "Number of new authors"; line: cumulative frequency of new authors, axis "Cumulative number of new authors")


Figure 3. Co-author network for both publications sets

font appeared on both sets. For this figure, the size of the nodes represented the number of publications per author and the width of the connecting line between nodes represented the total number of publications shared by two given authors. The authors with the highest number of publications were I. Dinov, Francisco Florez-Revuelta, Nuno Garcia, Ivan Pires, Nuno Pombo and S. Spisante, each with four publications. The large number of authors who have published more than a single paper suggests that this research area is a main research focus for multiple authors. In the same way, Figure 3 illustrates the formation of multiple study groups, which indicates that diffusion of knowledge is occurring through collaboration.

3.2.4 RQ8. What is the distribution of the number of authors per publication? The analysis of the number of authors per publication was performed to gain insight into how this research field is being studied (i.e. individually or in groups). Out of the 576 publications, only 53 (or 9.20% of the analyzed publications) were written by a single author. In contrast, the other 523 publications were written by groups of between 2 and 22 authors. Groups of three authors have the highest frequency, with 117 publications (or 20.31% of the analyzed publications). From this analysis, it can be inferred that authors are more likely to study this research field in groups than individually, which reinforces the finding that the diffusion of knowledge is occurring through collaboration.
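The quantities encoded in the co-author network (node size as publications per author, line width as joint publications per pair) and the authors-per-publication distribution can be derived from author lists as in the sketch below. The author lists are hypothetical single-letter placeholders, and this is not the Gephi workflow itself, only the counting that would feed such a tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author lists (NOT the paper's data); one list per publication.
publications = [
    ["A", "B", "C"],
    ["A", "B"],
    ["D"],
    ["B", "C", "E"],
]

node_size = Counter()   # number of publications per author
edge_width = Counter()  # number of joint publications per author pair

for authors in publications:
    unique = sorted(set(authors))  # sorting keeps pair keys consistent
    node_size.update(unique)
    edge_width.update(combinations(unique, 2))

# Distribution of the number of authors per publication (RQ8).
authors_per_paper = Counter(len(set(authors)) for authors in publications)
single_author_share = authors_per_paper[1] / len(publications)

print(node_size.most_common(3))
print(edge_width.most_common(3))
print(dict(authors_per_paper), single_author_share)

# Share reported in the text: 53 single-author papers out of 576.
print(round(100 * 53 / 576, 2))
```

Sorting each author list before forming pairs means that ("A", "B") and ("B", "A") are counted as the same undirected edge.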


Overall, from the authors’ characteristics, it was observed that this research topic (application of data analytics, big data, data mining and machine learning to healthcare engineering systems) is in a growing stage, based on the information synthesized from the analyses conducted. In essence, the results indicate that a large number of authors are publishing mainly in groups, the number of new authors (see Figure 2) has an increasing trend, authors’ countries of affiliation are concentrated in the US and China while spanning 51 countries, and groups of authors are working together.

3.3 Content characteristics
To answer the research questions on content characteristics, the following data were collected and synthesized from the 576 publications: publication keywords, publication approach (theoretical or case study), publication objectives and analyses included in the publications.

3.3.1 RQ9. Which are the most frequently mentioned data science fields applied to healthcare systems? A total of 1,875 keywords were collected from the 576 publications and classified into 982 unique keywords. The first 28 unique keywords (2.54%) were related to data science, such as big data (81 publications), machine learning (68 publications) and data mining (65 publications). These first 28 unique keywords represented 580 out of the 1,875 keywords (31%). On the other hand, 780 unique keywords were mentioned only once, indicating that a wide variety of topics were addressed in the 576 publications. To identify and analyze the top data science fields and machine learning algorithms applied to healthcare systems, as well as their co-occurrence relationships, a social network with the keywords from both publication sets was created using Gephi (see Figure 4). Because the research team was interested in understanding the relationship between pairs of keywords (nodes), the Fruchterman-Reingold algorithm was applied again.
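The keyword tallies reported for RQ9 (total mentions, unique keywords, keywords mentioned only once and the share held by the most frequent keywords) can be computed as in this sketch. The keyword lists are illustrative placeholders, and the lowercase and whitespace normalization in `normalize` is an assumption; the text does not describe how keyword variants were merged.

```python
from collections import Counter

# Hypothetical keyword lists (NOT the paper's data); one list per publication.
keywords_by_paper = [
    ["Big Data", "machine learning", "diabetes"],
    ["big data", "data mining"],
    ["machine learning", "big  data", "ontology"],
]


def normalize(keyword):
    """Assumed cleaning rule: lowercase and collapse repeated whitespace."""
    return " ".join(keyword.lower().split())


counts = Counter(
    normalize(k) for keywords in keywords_by_paper for k in keywords
)

total_mentions = sum(counts.values())  # all keyword mentions collected
unique_keywords = len(counts)          # distinct keywords after cleaning
singletons = [k for k, n in counts.items() if n == 1]

# Share of all mentions held by the two most frequent keywords.
top = counts.most_common(2)
top_share = sum(n for _, n in top) / total_mentions

print(total_mentions, unique_keywords, sorted(singletons), top_share)

# Share reported in the text: 580 of the 1,875 keyword mentions.
print(round(100 * 580 / 1875))
```

Counting keyword pairs that appear in the same publication, as needed for the line widths in Figure 4, would follow the same pattern using pairwise combinations of each publication's normalized keywords.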
Similarly, the size of the nodes represented the keyword’s frequency count, while the width of the connecting lines between nodes represented the total number of times the two keywords appeared together in a publication. The top five data science fields applied to healthcare systems were big data, machine learning, data mining, decision support systems and the Internet of Things. On the other hand, the top machine learning and learning algorithms applied were cloud computing, decision tree, neural networks, Naïve Bayes classifier, support vector machines and association rule. An interesting finding is that the top machine learning algorithms applied to healthcare systems were classification and clustering algorithms, which gives an indication of the purposes behind their applications.

3.3.2 RQ10. Which are the top medical areas/departments where data science has been studied and applied? Identifying the top medical areas/departments where data science and machine learning algorithms have been applied allows making inferences about the application fields’ sizes and, thus, the degree to which they have been explored. According to the frequency of keywords, 24 out of the 982 unique keywords were related to different medical areas/departments. The first keyword mentioning a medical area/department was observed in 29th place (ontology). The most frequent medical areas/departments were ontology (seven publications), mental health (five publications), health services (four publications), elderly healthcare (three publications), epidemiology (three publications), genomics (three publications), behavioral health (two publications), drug development (two publications), genomics (two publications) and intensive care units (two publications).

3.3.3 RQ11. Which are the top diseases/disorders being addressed through data science approaches? Similarly, an analysis of the top diseases being addressed through data science and machine learning algorithms was conducted.
Fifty-four unique keywords related to


Figure 4. Keywords count network – application papers

medical diseases were collected. One interesting finding is that most of the diseases addressed can be classified into three main groups: heart diseases (e.g. cardiovascular disease and strokes) with 12 publications, cancer (e.g. breast cancer) with nine publications and diabetes (e.g. diabetes type 2) with nine publications. These diseases are all among the leading causes of death and disability among Americans and leading drivers of the United States’ $3.5 trillion in annual healthcare costs, according to the National Center for Chronic Disease Prevention and Health Promotion (2019). Other diseases and medical disorders frequently studied and addressed through data science and machine learning algorithms were HIV (four publications), asthma (three publications) and depression (three publications).

3.3.4 RQ12. Which are the main study approaches on the theoretical publications set? Table 3 classifies the publications on the theoretical set based on their research area and the study/analysis performed (see Appendix 1 to identify the reference). As suggested previously in Figure 4 and displayed in Table 3, most of the research in the publications on the theoretical set focused on big data, which is highly correlated to the amount of data generated daily by the healthcare industry.

3.3.5 RQ13. Which are the main application objectives on the case study publications set? Table 4 classifies the publications on the case study application set based on their application objective (see Appendix 1 to identify the reference). As suggested in Figure 4 and displayed in Table 4, the application purposes of machine learning algorithms were

Data mining

Diverse uses and applications

Big data

74, 76, 106 75 89 89

10, 27, 39, 46, 49, 70, 94 13, 27, 31, 45, 70, 78 1, 48, 50, 51, 91 10, 33, 70 66, 70 13 108 79 11, 15, 17, 19, 75 76, 77, 89, 101, 103, 109 76, 77, 89 102, 111

2, 24, 32, 54, 55, 57, 86, 96, 112

37, 51, 52, 58, 61, 62, 63, 67, 70, 85, 94, 96, 107


2, 3, 5, 7, 9, 18, 22, 25, 26, 28, 31, 32, 33, 35, 36, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 54, 55, 56, 61, 62, 72, 85, 86, 95, 107, 108 2, 8, 9, 24, 26, 27, 32, 33, 34, 35, 36, 38, 49, 51, 52, 54, 56, 57, 58, 62, 63, 67, 85, 94, 99

Publication reference*


Implementation challenges/barriers/ limitations Strengths, weaknesses, opportunities and/or threats Implementation advantages/benefits/ promises Techniques Capabilities Systematic literature review Sources Characteristics Proposed model/framework Trends/future directions Others Diverse uses and applications Techniques Strengths, weaknesses, opportunities and/or threats Systematic literature review Algorithms Characteristics Implementation advantages/benefits/ promises

Studies/analyses performed

Table 3. Study approach of theoretical publications per paper

Research area


Diverse uses and applications Implementation challenges Perspectives Guidelines Diverse uses and applications Architecture, algorithms and applications Opportunities Systematic literature review Implementation challenges/barriers/ limitations Diverse uses and applications Advantages and disadvantages Implementation challenges/barriers/ limitations Architecture, algorithms and applications Implementation challenges/barriers/ limitations Systematic/literature review Architecture, algorithms and applications Trends/future directions Diverse uses and applications Proposed model/framework Miscellaneous

Healthcare analytics

Note(s): *See Appendix 1 for references

Others

Medical information technologies E-health

Clinical decision support systems

Machine learning

Internet of Things

Studies/analyses performed

Research area

68 83 82 45, 47, 60, 82 12, 47, 60, 82 6, 23, 71, 80, 81, 88, 92, 98, 100, 115, 116

69 93

18, 19, 21, 27, 31, 38, 78, 104 48 93

4, 11, 16, 97 4, 97 97 97 90 87 90 110 90

Publication reference*


Table 4. Application objectives on the case study publications set per paper
Application objective: Publication reference*
Prediction: 1, 2, 6, 7, 9, 12, 14, 25, 27, 35, 46, 51, 52, 78, 81, 92, 105, 111, 112, 120, 121, 126, 138, 144, 145, 160, 121, 174, 178, 189, 191, 194, 201, 205, 251, 262, 265, 266, 276, 282, 288, 291, 304, 306, 328, 330, 331, 332, 336, 341, 353, 356, 359, 375, 384, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 427, 429, 436, 438, 439, 444, 446, 449, 474, 480, 481, 485, 489, 490, 496, 497
Classification: 21, 22, 25, 36, 39, 40, 80, 93, 117, 184, 245, 250, 267, 307, 325, 326, 342, 354, 358, 362, 367, 371, 373, 380, 382, 451, 452, 478, 495
Decision-making: 28, 101, 142, 169, 179, 202, 230, 231, 232, 233, 268, 286, 339, 364, 368, 374, 383, 424, 453, 471, 498
Data mining: 15, 17, 103, 216, 218, 219, 220, 345, 346, 347, 350, 351, 370, 386, 422, 423, 487, 494
Identification: 16, 90, 129, 182, 214, 241, 252, 274, 298, 299, 300, 301, 302, 310, 333, 463, 502
Research: 68, 102, 109, 124, 125, 154, 243, 385, 308, 311, 313, 323, 349, 428, 434, 458
Diagnosis: 10, 53, 72, 88, 91, 95, 115, 130, 188, 261, 295, 296, 297, 317, 318
Detection: 19, 29, 31, 41, 55, 140, 143, 211, 226, 239, 275, 466, 479, 483
Case study: 77, 152, 165, 209, 235, 263, 271, 337, 343, 430, 435, 468
Data analysis: 106, 247, 281, 289, 419, 454, 456, 464, 469, 500
Framework: 42, 49, 175, 176, 180, 208, 237, 287, 293, 340
Monitoring: 26, 44, 56, 79, 97, 99, 100, 278, 294, 431
Discovery: 54, 85, 161, 210, 248, 433, 475, 484
Data managing: 18, 57, 64, 213, 221, 240, 437
Modeling: 4, 13, 149, 222, 255, 338
Pattern analysis: 66, 164, 167, 348, 378, 379
Association: 118, 139, 196, 322, 387
Clustering: 60, 193, 246, 303, 315
Data processing: 3, 69, 98, 357, 440
Data visualization: 91, 224, 229, 361, 388
Systematic review: 73, 74, 75, 123, 447
Extraction: 20, 32, 170, 199
Forecasting: 107, 236, 273, 491
Comparative study: 186, 200, 366
Data handling: 116, 283, 284
Investigation: 96, 127, 462
Optimization: 225, 292, 372
Simulation: 197, 204, 365
Assessment: 137, 181
Automation: 83, 141
Case management: 134, 467
Data integration: 59, 61
Exploration: 37, 499
Improvement: 482, 503
Prioritization: 146, 147
Screening: 82, 110
Stratification: 8, 376
Text mining: 450, 465
Translation: 33, 473
Miscellaneous: 5, 11, 24, 30, 34, 43, 46, 50, 62, 65, 67, 70, 71, 76, 84, 86, 89, 104, 108, 114, 119, 122, 128, 131, 132, 133, 135, 136, 148, 150, 151, 153, 155, 156, 158, 159, 162, 163, 166, 168, 172, 173, 177, 183, 185, 187, 190, 192, 195, 203, 212, 223, 234, 238, 242, 244, 249, 254, 256, 258, 259, 260, 269, 272, 277, 279, 280, 305, 309, 312, 314, 316, 319, 321, 327, 329, 335, 344, 352, 360, 363, 369, 377, 381, 385, 389, 421, 425, 426, 432, 441, 442, 445, 455, 457, 460, 461, 470, 472, 488, 493, 501
Note(s): *See Appendix 2 for references


mainly for prediction (e.g. readmission prediction, disease prediction, fraud prediction, adverse event prediction and medical outcome prediction), classification (i.e. based on the patients’ traits and characteristics) and decision-making (e.g. type of surgery, drugs and recovery process). These objectives highlight the significant role of predictive analytics in healthcare systems.

3.3.6 RQ14. Which are the newly emerging research lines related to this research area? A qualitative study was performed on the theoretical publications set to identify the newly emerging research lines. These included (1) the creation of algorithms and big data analytics technologies to address data privacy, data security and data traceability concerns, (2) improved understanding of the ethical, societal and economic implications of applying data analytics and machine learning algorithms in healthcare organizational decision-making, (3) big data and machine learning algorithms in conjunction with evidence-based medicine practices, (4) integration of multiple databases with different data structures, (5) big data applied to molecular-level data (i.e. the atomic scale), (6) applications related to social media investigation, (7) addressing information loss in data preprocessing and cleaning steps and (8) data analysis and automation for nonexperts.

Overall, from the content characteristics, it was observed that this research topic (application of data analytics, big data, data mining and machine learning to healthcare engineering systems) has been addressed from theoretical and case study approaches with a wide range of purposes. However, from the authors’ perspective, this research topic is still in a growing stage, with several medical areas/departments, as well as different diseases, still to be studied.

4. Conclusions, limitations and future research
The objective of this study was to assess and synthesize the published literature related to the application of data analytics, big data, data mining and machine learning to healthcare engineering systems. To achieve this aim, an SLR was conducted to collect relevant publications and assess the maturity of this research field in three dimensions (Keathley et al., 2016): publication characteristics, authors’ characteristics and content characteristics.

First, the frequency of publications indicates an increasing trend, suggesting that every year more authors are publishing theoretical or application papers. Comparing Figure 1 with the life cycle of a product (introduction, growth, maturity and decline), it could be assumed that this research field is in its growth stage. These publications came from journals with a high impact factor in different fields, such as medicine, informatics and computer science, indicating that data analytics, big data, data mining and machine learning in healthcare engineering systems have been addressed from a multidisciplinary perspective. Although these analyses are usually applied in literature reviews, the authors identified that this research topic has not yet been addressed from the industrial engineering and management decision perspective.

Second, the frequency of new authors per year supports the assumption made in the analysis of the publication characteristics dimension: new authors are evidently interested in this topic and are contributing to the body of knowledge with their publications. In addition, social network analysis was used to identify groups of authors working together to conduct theoretical and applied research in this field. Practitioners and academics interested in this field are now able to identify these groups and request to participate in future investigations or applications of data analytics, big data, data mining and machine learning in healthcare engineering systems. With this analysis, this paper contributes to the body of knowledge, closing a gap identified in the literature review section of this paper (see Table 1).

Lastly, the content characteristics dimension was also addressed using social network analysis to show the relationship between the keywords used in the set of publications and the classification of papers based on their study approach (theoretical research) and application objectives (applied research). The keyword social network analysis showed that this research field has been analyzed across a variety of hospital departments and illnesses. However, the authors did not find evidence of publications evaluating the impact of data analytics on patient safety and/or cost versus benefits in healthcare institutions. The other two analyses, included in Tables 3 and 4, showed a lack of theoretical publications focused on analyzing the decision-making process. However, from an applied research perspective, decision-making was the third most frequent application objective (see Table 4). On the other hand, considering the four stages of data analytics maturity (descriptive, analytic, predictive and prescriptive) and analyzing the publications collected in the application research perspective (see Table 4), it was also observed that most of the application objectives were focused on predictive analyses. This evidence suggests that data analytics, big data, data mining and machine learning in healthcare engineering systems have reached a high level of maturity. With these analyses, this paper contributes to the body of knowledge, closing a gap identified in the literature review section of this paper (see Table 1). Using the information shown in Figure 3, Tables 3 and 4 and Appendices 1 and 2 together, practitioners and academics interested in this topic should be able to easily identify new colleagues, opportunities for new research and evidence to support the need for specific research.
Examples include the application of data science and industrial engineering tools/methods to improve healthcare process efficiency, the role of data science in healthcare performance excellence models and the creation of management decision models using data science in cases of global natural disasters or pandemics. Therefore, besides aiming to stimulate scientific research, this paper also intends to provide industrialists with a general overview of data analytics, big data, data mining and machine learning in healthcare engineering systems, so that they can develop a deeper and richer knowledge of these subjects and their practices. This will help healthcare industrialists to formulate more effective strategies for the implementation of these technologies. This research will also encourage them, and hence their organizations, to implement digital technologies to support their operations.

These findings should not be generalized, taking into consideration that every literature review has different biases, such as database bias (produced by the use of a limited number of databases) and interpretation bias (produced by the interpretation of the publication content by several researchers). To reduce the impact of these biases in this research, the authors used several platforms to collect the relevant publications (ProQuest, EBSCOhost and Scopus). Each of these platforms has access to different databases. In addition, under the supervision of a leading author, a single author collected and interpreted the information from the final publication set.
Based on the current findings and the limitations of this paper, the authors consider that future research should focus on increasing theoretical and applied research along four lines in this field: assessing the costs versus benefits of applying data analytics, conducting prescriptive analytics research, analyzing the decision-making process with data analytics and updating this SLR in the short term to include new platforms.

References
Alonso, S.G., de la Torre Diez, I., Rodrigues, J., Hamrioui, S. and Lopez-Coronado, M. (2017), “A systematic review of techniques and sources of big data in the healthcare sector”, Journal of Medical Systems, Vol. 41 No. 11, pp. 1-9.


Broadus, R.N. (1987), “Toward a definition of bibliometrics”, Scientometrics, Vol. 12 No. 5, pp. 373-379. Conway, D. (2013), “The data science venn diagram”, available at: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram (accessed 7 July 2019). Emmert-Streib, F., Moutari, S. and Dehmer, M. (2016), “The process of analyzing data is the emergent feature of data science”, Frontiers in Genetics, Vol. 7 No. 12, pp. 1-4.


Hansen, M.M., Miron-Shatz, T., Lau, A.Y.S. and Paton, C. (2014), “Big data in science and healthcare: a review of recent literature and perspectives”, Yearb Med Inform, Vol. 9 No. 1, pp. 21-26. Higgins, J. and Green, S. (2011), “Chapter 4: guide to the concepts of a Cochrane protocol and review”, in Higgins, J. and Green, S. (Eds), Cochrane Handbook for Systematic Reviews of Interventions, John Wiley & Sons, England, pp. 51-79. Islam, S., Hasan, M., Wang, X., Germack, H.D. and E-Alam, N. (2018), “A systematic review on healthcare analytics: application and theoretical perspective of data mining”, Healthcare (Basel), Vol. 6 No. 2, pp. 1-43. Keathley, H., Gonzalez Aleu, F., Cardenas Orlandini, P., Van Aken, E., Deschamps, F. and Leite, L.R. (2013), “Maturity assessment of performance measurement implementation success factor failure”, American Society for Engineering Management 2013 International Annual Conference, Minneapolis, MN, 2-5 October. Keathley, H., Van Aken, E.M., Gonzalez Aleu, F., Deschamps, F., Letens, G. and Cardenas Orlandini, P. (2016), “Assessing the maturity of a research area: bibliometric review and proposed framework”, Scientometrics, Vol. 109 No. 2, pp. 927-951. Lefebvre, C., Manheimer, E. and Glanville, J. (2011), “Chapter 6: searching for studies”, available at: www.cochrane-handbook.org (accessed 22 July 2019). Luo, J., Wu, M., Gopukumar, D. and Zhao, Y. (2016), “Big data application in biomedical research and health care: a literature review”, Biomedical Informatics Insights, Vol. 8 No. 1, pp. 1-10. Malik, M.M., Abdallah, S. and Ala’raj, M. (2018), “Data mining and predictive analytics applications for the delivery of healthcare services: a systematic literature review”, Annals of Operations Research, Vol. 270 Nos 1-2, pp. 287-312. Mehta, N. and Pandit, A. (2018), “Concurrence of big data analytics and healthcare: a systematic review”, International Journal of Medical Informatics, Vol. 114 No. 1, pp. 57-65. 
Moher, D., Liberati, A., Tetzlaff, J. and Altman, D.G. (2010), “The PRISMA group preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement”, International Journal of Surgery, Vol. 8 No. 5, pp. 336-341. Nambiar, R., Sethi, A., Bhardwaj, R. and Vargheese, R. (2013), “A look at challenges and opportunities of big data analytics in healthcare”, Institute of Electrical and Electronics Engineers 2013 International Conference on Big Data. National Center for Chronic Disease Prevention and Health Promotion (2019), “Chronic diseases in America”, available at: https://www.cdc.gov/chronicdisease/resources/infographic/chronicdiseases.htm (accessed 25 October 2019). Pajntar, B. (2006), “Overview of algorithms for graph drawing”, Knowledge: Creation, Diffusion, Utilization, Vol. 3 No. 6, pp. 1-4. Provost, F. and Fawcett, T. (2013), “Data science and its relationship to big data and data-driven decision making”, Big Data, Vol. 1 No. 1, pp. 51-59. Raghupathi, W. and Raghupathi, V. (2014), “Big data analytics in healthcare: promise and potential”, Health Information Science and Systems, Vol. 2 No. 3, pp. 1-10. Tranfield, D., Denyer, D. and Smart, P. (2003), “Towards a methodology for developing evidence-informed management knowledge by means of systematic review”, British Journal of Management, Vol. 14 No. 1, pp. 207-222. Udanor, C., Aneke, S. and Ogbuokiri, B.O. (2016), “Determining social media impact on the politics of developing countries using social network analytics”, Program, Vol. 50 No. 6, pp. 481-507.

Winters, D. (2015), “What is the difference between data analytics, data analysis, data mining, data science, machine learning, and big data”, available at: https://www.quora.com/profile/DahlWinters (accessed 25 October 2019).

Appendix The appendices are available online for this article.

Corresponding author Anil Kumar can be contacted at: [email protected] and [email protected]



The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/0025-1747.htm

MD 60,2

Received 29 January 2020; Revised 22 April 2020, 10 June 2020, 25 September 2020; Accepted 16 October 2020

How do mid-level managers experience data science disruptions? An in-depth inquiry through interpretative phenomenological analysis (IPA)

Atri Sengupta OB&HR, IIM Sambalpur, Sambalpur, India

Shashank Mittal OB&HR, Rajagiri Business School, Cochin, India, and

Kuchi Sanchita Information Systems, IIM Raipur, Raipur, India

Abstract
Purpose – Rapid advancement of data science has disrupted both businesses and employees in organizations. However, extant literature primarily focuses on organizational-level phenomena and has almost ignored the employee/individual perspective. This study therefore intends to capture mid-level managers’ experiences of these disruptions vis-à-vis their corresponding actions.
Design/methodology/approach – In a small-sample qualitative research design, Interpretative Phenomenological Analysis (IPA) was adopted to capture this individual-level phenomenon. Twelve mid-level managers from large-scale Indian organizations that have extensively adopted data science tools and techniques participated in a semi-structured, in-depth interview process.
Findings – Our findings unfolded several perspectives gained from their experiences, leading to two emergent person-job (mis)fit process models: (1) managers who perceived demands-abilities misfit (D-A misfit) as a growth-alignment opportunity vis-à-vis their corresponding actions, which effectively trapped them in a vicious cycle; and (2) managers who considered D-A misfit as a psychological strain vis-à-vis their corresponding actions, which engaged them in a benevolent cycle.
Research limitations/implications – The present paper has major theoretical and managerial implications in the field of human resource management and business analytics.
Practical implications – The findings advise managers that the focus should be on developing an organizational learning ecosystem, which would enable mid-level managers to gain confidence in and control over their job and work environment in the context of data science disruptions. Importantly, organizations should facilitate integrated workplace learning (both formal and informal) with an appropriate ecosystem to help mid-level managers adapt to data science disruptions.
Originality/value – The present study offers two emergent cyclic models to the existing person–job fit literature in the context of data science disruptions. Given the scant attention earlier researchers have paid to how individual employees actually experience disruption, the IPA method used in the present study may add significant value to the extant literature. Further, it opens timely and relevant avenues for future research in the context of data science disruptions.
Keywords Data science and analytics disruptions, Demands-abilities (mis)fit and supplies-values (mis)fit, Formal and informal workplace learning, Interpretative phenomenological analysis (IPA), Organizational support and personal learning orientations, Psychological strain
Paper type Research paper

Management Decision Vol. 60 No. 2, 2022 pp. 320-343 © Emerald Publishing Limited 0025-1747 DOI 10.1108/MD-01-2020-0099

Introduction
How data-science disruptions are impacting employees has received scant attention from researchers thus far, especially when these disruptions are the “new normal” in today’s business. This emergent volatility often puts organizations at risk regarding the relevance of their knowledge. With the advent of new technologies and tools (e.g. AI, machine learning, analytics, IoT), the success and survival of organizations are now contingent on how quickly they can adapt to changing environments. McKinsey & Co. (2018) states in its global executive survey on disruptive forces in the industrial sectors that “These disruptions are unprecedented in their scale and speed, driven, among others, by massive advances in data generation, computing power, and connectivity” (p. 6), which impact all aspects of business and lives. The survey further reveals that new entrants/startups are better prepared than traditional companies to deal with the potential impact. Industry leaders are of the opinion that urgent responses to these rapid advances are the enablers and drivers of growth. The McKinsey & Co. (2018) survey indicates that more than a third of employees will be impacted by the disruptions. The experts and executives surveyed expect a scarcity of talent in terms of data scientists, AI experts and programmers. The general tendency of organizations in the face of such disruptions has been to hire new people with new skill sets rather than to invest in building in-house capability. As a result, the responsibility for managing an employee’s career has shifted from the organization to the employee himself/herself (Groysberg et al., 2019). Notably, existing employees, especially mid-level managers, cannot change themselves as rapidly as a new technology evolves (Groysberg et al., 2019). This imbalance makes the working life of these employees and their intra-organizational relationships volatile. However, not many studies have shed light in this direction.
Therefore, given the paucity of such knowledge in the extant literature, the present study looks into this unexplored issue, that is, how individual employees, specifically mid-management professionals, experience data science disruptions within their respective organizations. The obvious outcome of industry disruptions is knowledge volatility; in other words, knowledge that is valuable today loses its relevance tomorrow. Some knowledge, especially in the organizational context, loses its relevance with the advancement of technology-embedded knowledge in a similar domain. For example, with the disruptions in analytics, HR managers need to contribute to the organizational decision-making process through HR Analytics. They need to demonstrate empirically how HR strategies and activities contribute towards organizational performance, and how HR data can be used to devise future organizational strategy. Thus, employees feel continuous pressure to acquire new knowledge that is relevant to the current or upcoming demands of their jobs (recrafted jobs) or organizations, thereby making an individual’s learning curve very steep, with a high risk of failure (Groysberg et al., 2019). Moreover, the time and support required to upgrade this “knowledge” are often missing from the system, as many organizations have little preference for building their in-house capabilities. As a result, employees struggle to gain knowledge relevant to the changing job demands, a mid-manager/career crisis. This incompatibility between job demands and resources makes the relationships between team members, leaders and followers highly dynamic. It often creates negative organizational outcomes such as job dissatisfaction, work disengagement and a feeling of irrelevance. A few obvious outcomes at the individual level have been observed in this highly volatile context. Herein, individual jobs are being continuously recrafted; there are competency mismatches with the jobs.
One has to adapt quickly to new learning, attend to issues related to work–life balance, manage interpersonal conflict in teams and, most importantly, manage career volatility due to the irrelevance of one’s knowledge. Consequently, retrenchment of mid-managers is predominant in this context. The major focus of the literature on data science disruptions has been on investigating analytics or data science disruptions at the organizational or macro level (e.g. Dubey et al., 2018; Shao et al., 2018; Ghasemaghaei et al., 2017; Wang et al., 2016; Baesens et al., 2016; Xu et al., 2016). However, the impact of these disruptions on individuals has not received


adequate attention from researchers, despite the fact that volatility creates threats for leaders, managers and individual employees within organizations (Groysberg et al., 2019). Importantly, without this knowledge, it seems difficult for an organization to set the right strategies to navigate these disruptions. For example, organizations need to consider how to upskill current employees; how to manage employee stress, which emerges from unemployability or skill gaps; how to motivate employees to optimize their performance; and so on. The present study specifically aims at responding to some of these phenomena. Broadly, it aims at capturing individual-level phenomena of disruptions, which would have significant policy implications for organizations. Specifically, we intend to capture the experience of mid-level managers, who were once pivotal to organizational success and are now struggling with disruption dynamics in the organization to stay relevant. We investigate how these managers experience data science disruptions, what they feel and how they respond to the dynamics of these disruptions. A phenomenological research design has been preferred for this purpose. The next section of the study focuses on relevant literature on the changing business context due to data science disruptions and on individual experiences of such disruptions, followed by methodology, data analysis and theory development, discussion and implications for future research and practice.

Relevant literature
Evolution of data science and changing business context
For about a decade, both businesses and computer scientists have been exploring the “immense possibilities in the form of profound changes in the way they manage their business, their customers and their business models” (Raguseo, 2018, p. 189).
This has happened because the world was being “overrun by a data-driven revolution”, due to a “widespread availability of big data and the fast evolution of big data technologies” (Raguseo, 2018, p. 189). Until then, the field of Data Analytics had been dominated by more traditional tools and platforms, which McKinsey Global Institute (2016) has recently reported under the category of “other analytics”, viz., regression (logistic), search algorithms, sorting, merging, compression and so on. With computing power having grown manyfold in the present decade, to the point where complex algorithms can be run against the quantum of available data, the new era in analytics has been hailed as the emergence of Machine Learning (ML) and Artificial Intelligence (AI). In a recent report, McKinsey Global Institute (2016) revealed that Machine Learning and Deep Learning technologies are at the helm of Business Analytics today, while other traditional methods have been categorized as “other analytics” and have taken a back seat, acquiring a new role as “supporting technologies”. This is because these other analytic tools use conventional software programs that are “hard-coded by humans with specific instructions on the tasks they need to execute” (McKinsey Global Institute, 2016, p. 11). By contrast, it is possible to create ML and AI algorithms “that ‘learn’ from data without being explicitly programmed. The concept underpinning machine learning is to give the algorithm a massive number of ‘experiences’ (training data) and a generalized strategy for learning, then let it identify patterns, associations, and insights from the data” (McKinsey Global Institute, 2016, p. 11). In short, these systems are “trained rather than programmed”.
The capabilities and possibilities of both ML and AI have taken the landscape of business intelligence (BI) to the next level through advanced techniques such as clustering, dimensionality reduction, classification, neural networks, deep machine learning, deep belief networks and image processing, to name a few. The biggest business applications offered by ML and AI in day-to-day activities today are recognizing new patterns, generating natural language, understanding natural language, having enhanced

sensory perception, and optimization and planning. Therefore, today, both ML and AI provide tremendous business opportunities to solve complex business problems, which range from providing and managing customer service and managing logistics to analyzing medical records and even writing news stories. These technologies generate immense productivity gains along with an improved quality of life; however, they also bring job losses and other disruptions. An earlier report from McKinsey Global Institute (2011) shows that about 45% of work activities could potentially be automated by these technologies. ML, for instance, could be an enabling technology for automating up to 80% of those activities. In the Indian context, there has been a potential boom in the use of these technologies. A recent study jointly conducted by Analytics India Magazine and Praxis Business School (2019) reveals that analytics and data science have made a paradigm shift in their role, from supporting business to shaping business performance. This study estimates a rapidly growing market for analytics and data science in India. The annual revenue of the Indian analytics, data science and big data industry is reported to have been about $2.72 billion in 2018. In 2019, it grew to about $3.03 billion, and by 2025 it is expected to double. Therefore, the aggressive posturing and growth opportunities of this industry, both in India and in global markets at large, have attracted numerous organizations, from large caps to start-ups. However, these organizations need to navigate a number of challenges that emerge from the changing trends and sudden growth of this industry. These challenges may be grouped at three levels, viz., organizational, team and individual. For example, at the organizational level, many organizations find it difficult to extract value from data and analytics, that is, to incorporate data-driven insights into day-to-day business processes.
Similarly, at the team level, the major focus of these organizations is on the technology per se, rather than on developing the “right team” or improving the existing project team’s performance. The next critical challenge is at the individual level, whereby an organization needs to both attract and retain the right talent: not only data scientists, but also business translators, who combine data with industry and functional expertise. Added to this is the challenge of upskilling current employees in order to make them competent for upcoming analytics and data science projects. As mentioned earlier, the present study aims to understand data science disruptions at the individual level, for which there is a dearth of knowledge within the extant literature. Specifically, we focus on mid-level managers who find themselves directionless amidst this disruption. Essentially, they are clueless about how to adapt and upskill themselves in this highly volatile situation; how to make themselves adaptable and thereby employable and relevant for upcoming data science projects within their organizations; and how to survive in the face of competition from next-generation employees, who are formally well-equipped in these technologies. These managers are experienced; they have moved from Information Technology to supporting technologies, now termed “other analytics”. They were once considered highly relevant during the boom of such technologies in the last decade. However, these sudden and rapid changes in technologies have almost compelled them to become misfits to the “new” job demands. In this context of person–job misfit, the experiences and struggles of mid-level managers to regain their hold in the organization have received negligible attention from researchers. The present study contributes by capturing such experiences and struggles of mid-level managers, leading to greater policy implications for organizations.
Person–job (mis)fit
Person–job (P-J) fit highlights two types of fit: first, a person’s needs or values are compatible with the supplies of the job, that is supplies–values (S-V) fit; second, a person’s competencies in terms of knowledge, skills and abilities match the demands of the job, that is demands–abilities (D-A)


fit (Kristof-Brown et al., 2005). D-A misfit, which occurs due to technical innovation, decentralized organizational structural changes and increasing competition (Parker et al., 2006), expects managers to adapt to the changing situation by increasing their abilities (Devloo et al., 2011). D-A misfit may be perceived by an individual either when his/her abilities cannot match the high demands of the job, that is under-matched, or when his/her abilities overpower the demands of the job, that is overmatched (Cable and DeRue, 2002). Current literature mostly emphasizes antecedents, outcomes and measurement issues related to D-A (mis)fit, and ignores the direction of the misfit, which warrants future research (Cable and DeRue, 2002). In both cases, that is under-matched or overmatched, an individual needs to restore the balance by taking some action. However, the process models of these two types of D-A misfit, or to be precise, how individuals actually deal with under-matched or overmatched D-A misfit, have received scant attention from scholars so far. Cable and DeRue (2002) reported very interesting findings, which revealed that D-A fit perceptions are unable to predict occupational commitment, job performance and pay raises. One possible explanation, as reported by these authors, is that D-A (mis)fit should be measured across time and by direction of (mis)fit, both of which play a crucial role here. They also suggested that under-matched and overmatched D-A misfit could lead to different outcomes (e.g. anxiety and boredom, respectively) that need to be investigated in future. Considering the temporal and direction dimensions, D-A (mis)fit could better be studied qualitatively. Hence, the present study intends to investigate D-A misfit with respect to temporal and direction dimensions, along with its process model.
Notably, temporal and direction dimensions play significant roles in the context of data science disruptions, which may be characterized as highly dynamic, that is, continuous and massive advancement in tools and techniques happens within a very short time frame. Supplies–values (S-V) fit, which refers to the values or needs an employee seeks to fulfil through his/her job, is another significant contributor to subjective person–job fit or person–environment fit. Herein, values indicate the desires (Edwards and Cooper, 1990) or preferences of an employee, along with his/her motives, interests and goals (Edwards, 1992). Edwards (1996) stated “the core process underlying S-V fit is the cognitive comparison of the perceived and desired amount, frequency, or quality of conditions or events experienced by the person” (p. 294). He refuted the earlier findings, which held that D-A misfit affects S-V misfit through strain (e.g. French et al., 1982). However, he does not deny the possibility that this may happen in the case of distinct but instrumentally related dimensions. For example, data science disruptions create excessive job demands (D-A misfit), especially in terms of knowledge upgradation in a rapidly advancing environment, which could create strain through S-V misfit with respect to intrinsic rewards (e.g. self-esteem). Extant literature is silent on this. In this study, we observe that, in reality, individuals experience psychological strain once they are in an S-V misfit condition, for example not getting due importance in the organization due to D-A misfit. Further, we focus on assessing the relationship between D-A misfit and S-V misfit within a comprehensive structural model by capturing individual experiences. For this, we adopt a qualitative inquiry method; more specifically, we choose interpretative phenomenological analysis (IPA) (Smith, 1996), the details of which are given in the next section.
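As a purely illustrative aside, the two directions of D-A misfit discussed in this subsection (under-matched and overmatched), observed across time, can be sketched in a few lines of Python. This is not part of the study’s method, which is qualitative. The sketch assumes hypothetical numeric ratings of job demands and personal abilities on a common scale, and a hypothetical tolerance band within which the two count as matched; the names `MisfitDirection`, `classify_fit` and `fit_trajectory` are introduced here only for illustration.

```python
from enum import Enum


class MisfitDirection(Enum):
    UNDER_MATCHED = "under-matched"  # abilities fall short of job demands
    MATCHED = "matched"              # demands and abilities roughly balance
    OVERMATCHED = "overmatched"      # abilities exceed job demands


def classify_fit(demand: float, ability: float, tolerance: float = 0.5) -> MisfitDirection:
    """Classify a single observation; `tolerance` is a hypothetical threshold."""
    gap = ability - demand
    if gap < -tolerance:
        return MisfitDirection.UNDER_MATCHED
    if gap > tolerance:
        return MisfitDirection.OVERMATCHED
    return MisfitDirection.MATCHED


def fit_trajectory(observations, tolerance: float = 0.5):
    """Classify (demand, ability) pairs recorded at successive points in time."""
    return [classify_fit(d, a, tolerance) for d, a in observations]


# Hypothetical ratings for one manager at three points in time:
# job demands rise quickly while abilities grow only slowly.
ratings = [(4.0, 4.2), (5.5, 4.3), (6.5, 4.5)]
trajectory = fit_trajectory(ratings)
```

Under these assumed ratings, the manager moves from a matched state to an under-matched one as demands outpace abilities, which is the temporal and directional pattern the text argues should be examined.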
Psychological strain
The extant literature is replete with studies on the relationship between D-A misfit and strain. Strain has been well established as an outcome of D-A misfit, and most studies predict that D-A misfit triggers psychological strain more strongly than demand (environmental factors) or ability (individual factors) alone (Livingstone et al., 1997). In addition, the job demand-control model of Karasek and Theorell (1990) was primarily a model of

psychological strain, which treated strain as a consequence of demand-control misfit (Kwakman, 2001). This aspect of the model predicts that psychological strain is the result of high job demand and low control over work. However, the other aspect of the model, which focuses on learning as a positive outcome of strain, has received relatively less attention so far. The learning aspect of the psychological strain model predicts that when both job demand and ability (control over the job) are high and optimally matched, the result is learning and development prospects due to an increased level of productive psychological strain (Kwakman, 2001; Karasek and Theorell, 1990). LePine et al. (2004) found that challenge strain has a positive relationship with learning, mediated by learning motivation and exhaustion, whereas the relationship between hindrance strain and learning is essentially negative. Further, numerous studies in Psychology point to the interrelationship between psychological strain and learning, which follows two different pathways: an “adaptive pathway leading from stress to learning goals and constructive strategies, and a contrasting pathway leading from stress to self-validation goals and defensive strategies” (Rusk and Rauthbaum, 2010, p. 31). These pathways show how individuals can have constructive learning and positive experiences both during and in the aftermath of “strain”. During and after strain conditions, “learning as compared to self-validation goals are more likely to lead to cognitive openness, problem-solving, support-seeking, and adaptive emotion regulation” (Rusk and Rauthbaum, 2010, p. 31). Further, Rusk and Rauthbaum (2010) stated “goal orientation see exploration and learning goals as resulting from incremental views – the belief that, through effort and practice, one can change one’s ability” (Rusk and Rauthbaum, 2010, p.
33) and external changes in the environment could act as a challenge stressor that prompts individuals towards incremental views or engages them in activities that enable them to reduce psychological strain (Dweck and Molden, 2005). Thus, in the context of analytics and data science disruptions too, individuals may engage in upskilling themselves to restore the balance.

Workplace learning
D-A misfit is related to individual competences (Cable and DeRue, 2002). Therefore, to restore the balance, especially in the case of under-matched D-A fit, individuals need to enhance their competences, and workplace learning is instrumental to such development. Data science disruptions have brought about turmoil within the knowledge society and the industry at large. To understand such a complex change, one needs to understand workplace learning (Mittal, 2019; Manuti et al., 2015), which refers to the multiple channels and mechanisms that employees use to learn within an organization (Singh et al., 2019; Mittal, 2019; Jacobs and Parks, 2009). Individuals engage in workplace knowledge sharing (Singh et al., 2019) and learning by balancing both organizational demands and their own (Billett, 2001). Their learning occurs in two ways. Formal learning is a structured, typically classroom-based learning process organized by the organization (Marsick and Watkins, 2001). Informal learning is not as planned and structured; it primarily emerges out of a need embedded within a given context (Manuti et al., 2015). Importantly, the other salient distinction between the two (i.e. formal vs. informal learning) is that whilst the former (i.e. formal learning) has a planned framework (Mittal, 2019), which includes a dedicated teacher or trainer, the latter (i.e.
informal learning) arises within a given situation where learning is not the prime objective of the activity; it is aimed more towards resolving an anticipated or existing problem (Mittal et al., 2020; Manuti et al., 2015). Self-initiated learning, collective learning and learning through a mentor are some examples of informal learning. Eraut (2000) preferred to term informal learning “non-formal learning” by adding a dimension of “intentionality”. He advocated three types of non-formal learning: implicit, deliberative and reactive learning.


Cerasoli et al. (2018), in their meta-analysis, highlighted a list of antecedents (personal and situational) and outcomes of informal learning behavior. Personal antecedents include individual predispositions and demographic factors, whereas situational antecedents comprise job/task characteristics, support and opportunities for learning. Individual learning behavior influences attitude, knowledge/skill acquisition and performance. Although informal learning is the most popular form of workplace learning, Clardy (2018) highlighted the relevance of both formal and informal learning in the workplace. He refuted the concept of the 70% rule of informal learning in the workplace and instead advocated integrated learning. Noe et al. (2013), on the other hand, emphasized the relevance of informal learning for the continuous upgrading of employees’ knowledge and skills to adapt to dynamic situations and new technologies. They called for future research investigating the roles of more situational factors in informal learning. Data science disruptions bring rapid change into the system, which requires employees to learn fast while adapting to new technologies and situational demands. “Time is an important determinant of degree of learning” (Gettinger, 1985, p. 3). Three time frames of learning contribute significantly to achievement: time allowed or allocated for learning, time spent or engaged in learning and time actually needed for learning (Gettinger, 1985). Studies suggest that allocation of time (in terms of instruction time or duration of classes) and academic achievement have a modest relationship (Fredrick and Walberg, 1980). A similar relationship may exist in formal learning, where an organization schedules classroom instruction for its employees. However, it is believed that learning engagement, or time spent on learning, is a more powerful predictor of academic achievement than allocated time (Arlin and Roth, 1978; Rosenshine and Berliner, 1978).
Since informal learning is a learner-initiated process, it is believed that learning engagement may be greater in informal learning. Gettinger (1985) suggested “the relation between time allocated or time spent in learning and achievement appears to be intricately tied to the amount of time actually needed for learning” (p. 5). She further suggested that the degree of learning and retention is greater in the case of the (time spent/time needed) ratio than the (time allocated/time needed) ratio. Although her findings are based on school children, no literature is available to throw light on how the working population (adult learners) would react to the time and learning framework, especially when time itself is very precious due to the rapid change in the business environment that emerges from data science disruptions. Based on the aforementioned arguments, we investigated the experiences and struggles of mid-level managers.

Methodology
Research design
No disruption, including data science disruptions, can be well understood from a single perspective; multiple perspectives need to be factored in. Therefore, if one were to focus solely on investigating disruptions at the organizational level, the knowledge would not be complete and holistic, as we see from the extant literature so far. Considering the paucity of relevant knowledge about individual-level phenomena, we aim to capture the work experiences of mid-management professionals, who have been at the receiving end of data science disruptions. As the objective of our study is to identify the deeper meaning of how a mid-level manager experiences the phenomenon in terms of events, processes or relationships, we develop our arguments based on the principles of interpretative phenomenological analysis (IPA). “IPA aims to grasp the texture and qualities of an experience as it is lived by an experiencing subject” (Eatough and Smith, 2017, p. 196).
IPA is phenomenological, that is, it draws on a philosophical approach to studying human experience. It aims at capturing the lived experiences of an individual. Phenomenologists are interested “in thinking

about what the experience of being human is like, in all of its various aspects, but especially in terms of the things which matter to us, and which constitute our lived world” (Smith et al., 2009, p. 10). The second theoretical underpinning of IPA is hermeneutics, the theory of interpretation, which primarily focuses on how researchers interpret and make sense of individuals vis-à-vis their experiences. In other words, both researchers and participants are trying to make meaning out of each other’s sense-making (Smith and Osborn, 2003). The hermeneutic circle, which consists of “the part” and “the whole”, is the most significant idea used in hermeneutic theory (Smith et al., 2009). The third theoretical underpinning of IPA is idiography, which intends to analyze and interpret each case of the corpus in detail (Smith, 2011). Idiography focuses on the particular, in contrast to nomothetic approaches, which make claims at the group/population level and seek to establish general laws of human behavior (Smith et al., 2009). Herein, we intend to capture the lived experiences of mid-level managers. We look to cover their journeys through data science disruptions and gain insights into how they responded or chose to respond to such disruptions. Our main focus is to make sense of their experiences in terms of their cognitive and affective phenomena, that is, a deeper level of understanding, for which the analysis would be based on every single case. Given that both cognitive and affective experiences are extremely subjective and contextual in nature, IPA enables us to “capture the quality and texture of individual experience” (Willig, 2003, p. 53). Moreover, it allows us to build arguments based on a few participants, as the emphasis is on the richness or quality of data, as opposed to the number of participants (Gill, 2013).
IPA, a phenomenological approach developed relatively recently by Smith (1996, 2010), has become increasingly popular in the field of Psychology, especially health psychology. IPA offers two possibilities: (1) it captures the lived experiences of participants without imposing hypotheses or theoretical constraints; and (2) existing theories may be explored or extended with the lived experiences of the participants (Storey, 2007). “Practical considerations, including time, limit the number of participants that can be involved” (Murtagh et al., 2011, p. 252). Despite criticism from Giorgi (2010), who termed it more of a craft than a technique or scientific method per se, IPA has a firm foothold in organizational research (e.g. Cope, 2011; Murtagh et al., 2011; Gill, 2013). In fact, Gill (2014) has emphasized the possibility of using a phenomenological approach for organizational research. Thus, considering its suitability for our study objectives, we prefer to adopt IPA over other qualitative approaches. Specifically, we follow the double hermeneutic theoretical underpinning of IPA. In the double hermeneutic, researchers make sense of participants who are making sense of their life events. In other words, the meaning-making of a participant is a first-order analysis, and the researchers’ making sense of the participant’s meaning-making process is a second-order analysis (Smith et al., 2009). The present study primarily addresses the questions “How do mid-level managers experience the data science disruptions the industry is currently going through? How do they make sense of the impact of disruptions in their lives?” The major objective is to extract the complete meaning of their personal experiences and derive a theory embedded therein. Thus, we find IPA with its double hermeneutic theoretical underpinning most appropriate for serving the purpose of our study.
Participants and data collection
Purposive sampling fits the IPA orientation better than probability sampling (Smith et al., 2009; Gill, 2013). Considering the purpose of the present study, we chose participants who held mid-management positions in different organizations that deal with data science related projects. We also considered participants’ availability, willingness and ability to describe their experiences in detail. Thus, we found that purposive sampling was most appropriate for the study. Herein, it is important to note that IPA specifically advocates a limited number of participants; in fact, even a single participant is appropriate (Gill, 2014) for capturing personal experiences. We approached more than 20 people to participate in the interview process. However, only 12 individuals (N = 12) expressed interest and devoted a considerable amount of time to sharing their experiences with the interviewers, on the condition of anonymity. Hence, we disguised the names of the participants and their employing organizations. The participants were between 30 and 38 years old, with work experience ranging from 8 to 17 years. At the time of the study, these mid-level managers were employed by large-scale Indian organizations that had extensively adopted data science tools and techniques for projects. Mid-level managers were defined as such by their respective organizations. A fairly homogeneous sample (mid-level managers working for large-scale organizations and dealing with data science tools and techniques) was found to be appropriate for IPA. Smith et al. (2009) recommend considering “homogeneity” in light of the practical problem (i.e. the availability of participants who encountered the same situation) and the interpretative problem (i.e. the extent to which participants differ in other respects and how that variability can be contained in the analysis of the phenomenon). Of the 12 participants, five were female (two married and three single) and seven were male (four married and three single). Demographic details of the participants, along with their psycho-social contexts, are presented in Table 1. Semi-structured interviews enabled the researchers to explore the lived experiences of the participants (Rubin and Rubin, 2011); these interviews were conducted by phone at a mutually convenient time.
Each participant was interviewed 2–3 times to capture his or her complete experiences, which was not possible in a single session owing to the participants’ limited availability. We interviewed the participants until their responses adequately addressed our research questions. The participants differed in their storytelling styles. Some took a long time to share their experiences, whereas others were very much to the point. We allowed the participants to take as much time as they wished to share their experiences freely and to their satisfaction. Each interview took about an hour or more, with questions relating to their views on data science disruptions. For instance, we asked how data science disruptions had disrupted their lives within the organization, individually as well as within a team, and how they responded to the changing dynamics. We preferred telephone interviews over face-to-face ones so that participants could also share negative feelings that they might have hesitated to share face to face. With due permission, all the interviews were digitally recorded and then transcribed verbatim based on IPA principles.
Data analyses and findings
Each interview transcript was codified based on IPA principles. Codification was done immediately after each interview so that we had an idea of the emergent themes. This enabled us to triangulate the data, and as a result we went back to each participant and interviewed them 2–3 times. We argue that the psycho-social contexts (e.g. gender, education, marital status) of the participants affect the way they make sense of their life events. Thus, data were further triangulated based on gender (male or female), basic education (from a technology domain or not), personal life (married or not), and the system of the employing organization (having due support from the organization or not).
Following the triangulation guidelines of Denzin (1978), all three investigators were involved in data codification from the very beginning, and any discrepancies in codification were resolved through thorough discussion until a consensus was reached. We adopted the hermeneutic circle, as suggested by Smith et al. (2009), relating the part to the whole: the single word vis-a-vis the sentence in which the word is embedded; the single extract vis-a-vis the complete text; the particular text vis-a-vis the complete oeuvre; the interview vis-a-vis the research project; and the single episode vis-a-vis the complete life (Smith et al., 2009, p. 28). Smith et al. (2009) also suggest how to use exploratory commenting: descriptive comments, which focus on describing the content; linguistic comments, which examine the specific language used by the participants; and conceptual comments, which attempt to capture the underlying concepts embedded in the story. Accordingly, we first identified initial codes from the interview transcripts, followed by the emergent themes, using the hermeneutic circle. We also attempted to make sense of the participants’ lived experiences from their non-verbal cues. For example, long pauses or absent-minded speech indicated that a participant had not been comfortable while experiencing the event. Similarly, a low or high pitch signified their level of excitement. As researchers, we continuously made sense of these cues and noted them down. These efforts constituted the second-order analysis of the double hermeneutic underpinning. Simultaneously, we constantly compared data convergence and divergence to understand the holistic picture. To identify the emergent themes and their interconnections, we drew on the idiographic underpinning of IPA, as suggested by Smith et al. (2009). It is worth noting that Smith et al. (2009) encourage researchers to be innovative in analyzing data and meaning-making.

Table 1. Demographic details of the participants

| Participant | Age | Gender | Highest qualification | Family status | Total work experience (years) | Work experience in current organization (years) | Work experience in current team (years) |
|---|---|---|---|---|---|---|---|
| P1 | 36 | Female | Masters in Economics | Single, stays with mother and grandparents | 13 | 5 | 2.5 |
| P2 | 38 | Male | Bachelor of Commerce (B.Com) | Married with one child | 17 | 12 | 10 |
| P3 | 37 | Male | Bachelor of Commerce (B.Com) | Married with two children | 16 | 14 | 10 |
| P4 | 35 | Male | Bachelor of Technology (B.Tech) | Single, stays alone | 12 | 10 | 10 |
| P5 | 33 | Male | B.Tech and Masters in Data Science | Single, stays alone | 10 | 2 | 2 |
| P6 | 32 | Male | B.Tech | Single, stays alone | 9 | 4 | 2 |
| P7 | 30 | Female | (see note) | Single, stays with mother | 8 | 3 | 2 |
| P8 | 34 | Female | (see note) | Married | 10 | 7 | 7 |
| P9 | 36 | Female | (see note) | Married with two children | 12 | 5 | 4 |
| P10 | 32 | Female | (see note) | Single, stays with mother | 9 | 9 | 4 |
| P11 | 35 | Male | (see note) | Married with one child | 12 | 4 | 4 |
| P12 | 36 | Male | (see note) | Married with one child | 14 | 8 | 7 |

Note: The highest qualifications of P7–P12 are listed in the source as Masters in English, B.Tech, B.Com, Masters in Statistics, Masters in Sociology and B.Tech; their assignment to individual rows cannot be recovered from the source layout.

Thus, the findings revealed a variety of subordinate themes, which converged into three major themes: (1) person–job misfit and workplace strain; (2) workplace learning; and (3) work–family conflict. These emergent themes, which were distinct yet interconnected, were supported by extracts from the interviews. It is worth noting that, following the guidelines of Smith et al. (2009), we first accounted for both convergent and divergent narratives. The three themes emerged from the most convergent narratives; that is, a theme was treated as convergent only when more than 50% of the participants’ narratives pointed to a similar outcome. Please refer to Table 2 for the codification process, which includes initial codes and emergent/subordinate themes. Table 3 presents the convergence-divergence comparison of the participants’ narratives. The detailed connections between the themes are discussed in the following section.
Person-job misfit and workplace strain: D-A misfit, S-V misfit and strain
The participants explained how data science disruptions created chaos in terms of person-job (mis)fit for a period that varied across participants. For some, the period of chaos was long; for others, it was short.
By qualification, I was not trained in data analysis. I learnt it gradually. Initial days of working in my current team were mostly based on data analysis using Excel and afterwards SAS. By the time I acquired expertise in SAS, data science reached a revolution in the industry. Machine learning, deep learning, big data analytics started dominating. My the-then team received a project where machine learning algorithms were needed. All of a sudden, I found myself helpless, irrelevant for the team. I could not do what I was supposed to do due to lack of knowledge relevance (. . .P2).
I joined the current team two years back.
I was taken for a project which needed knowledge of Neural Networks, LSTM. I never worked on this before. But I could learn it very easily, probably, due to my background of data science. I felt very excited to get the opportunity to acquire new skills. It did not take much time for me to started working on the actual project. But yes, I saw my other team members, who were not from technical background per se, took a longer time period to learn it. However, during the transition period it was little stressful for me also because I lost my control on the job due to lack of desired knowledge for the time being. Yes, it was frustrating then (. . .P5). One day I was told by my manager whether I could do a task on Python and algorithms because it would be required for our next project. I said, I need some time to learn it. But there was no time as such so he was searching for new people with new skill sets. I could not understand what I would do then. I felt excluded. It was highly frustrating for me and I was afraid about my job also. I was outdated and completely direction less (. . .P1). Data science revolution put my career at stake. The way the technology changes so rapidly, I am scared about my career. Young people who are just from the college have enough opportunities to learn new technology. But at our age after finishing all organizational and family demands finding time for learning seems difficult. I am clueless about what to do. I want to learn but so many tools and techniques. . .very challenging. I am little confused also that being a mid-level manager why my contributions will not be giving managing the project from business and leadership perspectives. . .why I need to be technical expert also? But organization is currently demanding technical expertise from me (. . .P3). Yes, I felt discrimination. Young people, who have joined recently, are given due importance by the management. 
Even sometime these people also sound arrogant as if we are nobody in the team just because we do not have desired knowledge. I am currently working in a team which comprises of around 70% young people and I think rest 30% people also perceive in my way. Even an outsider can also feel the division in the team. Our team leader does not bother about this. He means only tasks to be completed on time. . .but how is not his look out. Its giving me the signal that we may throw out of the organization in coming future as it is happening across Indian service sector organizations (. . .P2).

Table 2. Codification process (a few examples)

P1
Dialogue: One day I was told by my manager whether I could do a task on Python and algorithms because it would be required for our next project. I said, I need some time to learn it. But there was no time as such so he was searching for new people with new skill sets. I could not understand what I would do then. I felt excluded. It was highly frustrating for me and I was afraid about my job also. I was outdated and completely direction less.
Initial codes: New skills required for upcoming project; manager offered new project first to P1; P1 needed time to learn the new skills; urgent requirement of the skill sets; manager searched for new people; P1 felt excluded, frustrated, job insecurity, outdated, and direction less
Emergent themes: Demand-ability misfit; supply-value misfit; psychological strain

P2
Dialogue: Yes, I felt discrimination. Young people, who have joined recently, are given due importance by the management. Even sometime these people also sound arrogant as if we are nobody in the team just because we do not have desired knowledge. I am currently working in a team which comprises of around 70% young people and I think rest 30% people also perceive in my way. Even an outsider can also feel the division in the team. Our team leader does not bother about this. He means only tasks to be completed on time. . .but how is not his look out. Its giving me the signal that we may throw out of the organization in coming future as it is happening across Indian service sector organizations.
Initial codes: P2 felt discriminated; young people received more importance from the organization; team conflict; unsupportive leader; job insecurity
Emergent themes: Supply-value misfit; psychological strain

P3
Dialogue: Data science revolution put my career at stake. The way the technology changes so rapidly, I am scared about my career. Young people who are just from the college have enough opportunities to learn new technology. But at our age after finishing all organizational and family demands finding time for learning seems difficult. I am clueless about what to do. I want to learn but so many tools and techniques. . .very challenging. I am little confused also that being a mid-level manager why my contributions will not be giving managing the project from business and leadership perspectives. . .why I need to be technical expert also? But organization is currently demanding technical expertise from me
Initial codes: Data science revolution; career volatility; insecurity in career; young people have more opportunity to learn new technology; less time available for learning due to fulfilling both organizational and family demands; P3 felt challenging so many tools and techniques, confused; job demand is technical expertise than the expertise (business and leadership) P3 had
Emergent themes: Demand-ability misfit; workplace learning; supply-value misfit; psychological strain; work-family conflict

P4
Dialogue: Initial days were really tough to manage conflicting demands of my surroundings. But after a while, I was back to my career. I got back my confidence, respect, and recognition. . .Work-life balance in current organizational environment is myth. Because, life is now another name of task, target, deadline, party etc.
Initial codes: Difficulty in managing conflicting demands; gained back confidence, respect, recognition; poor work-life balance
Emergent themes: Work-family conflict; supply-value fit

P5
Dialogue: I joined the current team two years back. I was taken for a project which needed knowledge of Neural Networks, LSTM. I never worked on this before. But I could learn it very easily, probably, due to my background of data science. I felt very excited to get the opportunity to acquire new skills. It did not take much time for me to started working on the actual project. But yes, I saw my other team members, who were not from technical background per se, took a longer time period to learn it. However, during the transition period it was little stressful for me also because I lost my control on the job due to lack of desired knowledge for the time being. Yes, it was frustrating then
Initial codes: P5 was hired for a project needed advanced data science techniques; no prior experience in such techniques; knowledge irrelevance in the initial days; felt frustrating and stressed; lost control on job; felt excited to acquire new skills; acquired new skills easily due to formal education on data science; less time needed for learning
Emergent themes: Demand-ability misfit; workplace learning; supply-value misfit; psychological strain; demand-ability fit

P7
Dialogue: Learning new skills is always challenging, specially at this age. But I enjoyed this challenge. Because I believe in achievement, I had to challenge myself. I had confidence that I could do it. Yes, problems were there in managing both professional and personal life. But I managed it well
Initial codes: Acquiring new skills is challenging; challenges enjoyed; having achievement attitude; confident; managed well professional and personal lives
Emergent themes: Workplace learning; positive attitude; work-family balance

P8
Dialogue: Initially two classroom training sessions were conducted to learn new technology, however, that was not at all sufficient. My manager expected me to learn by myself. Yes, he helped me to collaborate with those team members who had the expertise. They were helpful, but they also were busy in their projects. I got very less time from them. I focused on learning by myself, but mostly from my family time. Being a mother of a small kid, it was really really really difficult for me
Initial codes: A few classroom sessions for learning new skills; organization expected self-learning; support from manager for collaborative learning; less time available from the resource persons; self-learning using family time; difficulty in managing family obligations
Emergent themes: Workplace learning; work-family conflict

P10
Dialogue: It was difficult for me to accept that your juniors were the centre of attraction and even being a senior person, you were not discussed about the project related issues many times. Feeling important motivates me and suddenly I felt that neither the team leader nor my juniors in the team took me seriously. I felt isolated. Utterly disrespectful. . .
Initial codes: Juniors were given more importance than P10; excluded often from project discussions; losing motivation; feeling isolated and disrespectful
Emergent themes: Supply-value misfit; psychological strain

P12
Dialogue: I look for challenges from the job. But challenges are enjoyable when you know how to resolve it and get control over it. During the transition period of data science revolution, the biggest challenge before me was prior to getting control over one challenge multiple challenges emerged. So, fighting with the challenges became regular thing and feeling of losing became nightmare
Initial codes: Searching for challenging jobs; challenges are enjoyable when having control; emergence of multiple challenges simultaneously; continuous fight with challenges; feeling of failure
Emergent themes: Demand-ability misfit; supply-value misfit; psychological strain
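The emergent themes listed for the Table 2 examples can be tallied to see which themes recur across participants. The sketch below is only an illustration of that bookkeeping and is not the authors' procedure: they coded the transcripts by hand and resolved discrepancies through discussion. The dictionary is transcribed from the Table 2 examples, and the names `TABLE2_THEMES` and `theme_counts` are hypothetical.

```python
from collections import Counter

# Emergent themes per participant, transcribed from the Table 2 examples.
# Only the nine participants shown in Table 2 are included.
TABLE2_THEMES = {
    "P1": ["demand-ability misfit", "supply-value misfit", "psychological strain"],
    "P2": ["supply-value misfit", "psychological strain"],
    "P3": ["demand-ability misfit", "workplace learning", "supply-value misfit",
           "psychological strain", "work-family conflict"],
    "P4": ["work-family conflict", "supply-value fit"],
    "P5": ["demand-ability misfit", "workplace learning", "supply-value misfit",
           "psychological strain", "demand-ability fit"],
    "P7": ["workplace learning", "positive attitude", "work-family balance"],
    "P8": ["workplace learning", "work-family conflict"],
    "P10": ["supply-value misfit", "psychological strain"],
    "P12": ["demand-ability misfit", "supply-value misfit", "psychological strain"],
}


def theme_counts(themes_by_participant):
    """Count how many participants carry each emergent theme."""
    counts = Counter()
    for themes in themes_by_participant.values():
        # set() guards against a theme being listed twice for one participant.
        counts.update(set(themes))
    return counts


if __name__ == "__main__":
    for theme, n in theme_counts(TABLE2_THEMES).most_common():
        print(f"{theme}: {n} of {len(TABLE2_THEMES)} example participants")
```

Because Table 2 shows only a few examples, these counts describe the excerpted rows, not the full data set of 12 participants.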

With the advancement of data science, mid-level managers were significantly impacted. Notably, they were the ones who had once given their best to the organization but could not fit into the changing environment. Within the available time-frame, their abilities did not match the “new job demands”, leading to a D-A misfit. These participants had built their expertise on SAS or other BI tools and techniques like Excel, including various statistical analyses (e.g. regression analysis, clustering analysis, etc.). However, they became almost directionless when their projects demanded skill sets in Python and Hive for slicing and dicing the data. They were the same people who had basked in recognition and respect from their colleagues for their valuable contributions to completing projects effectively and efficiently. Now, however, they felt left out and irrelevant to the team because their knowledge had lost its relevance. They felt that they were losing control over both their jobs and their environment. Job insecurity and career volatility became the new normal for them. Notably, many of them also experienced interpersonal conflicts in the team, such as ostracism, feeling unimportant and discrimination. A few of them reported that, owing to appropriate support from their seniors, they were able to resolve these internal conflicts easily. Importantly, the interviews showed that educational background made a deep impact. For example, the participants from technical backgrounds did not feel the same way as the participants from non-technical backgrounds. In fact, they considered the disruption an opportunity to upgrade themselves for a higher level of challenge; for this set of people, it was not as stressful as it was for the other participants. This prompted us (the researchers) to ask the participants what they looked for from the job. They replied that recognition, respect, control over the job and environment, and job satisfaction were what mattered to them.
I look for challenges from the job. But challenges are enjoyable when you know how to resolve it and get control over it. During the transition period of data science revolution, the biggest challenge before me was prior to getting control over one challenge multiple challenges emerged. So, fighting with the challenges became regular thing and feeling of losing became nightmare (. . .P12).
It was difficult for me to accept that your juniors were the centre of attraction and even being a senior person, you were not discussed about the project related issues many times. Feeling important motivates me and suddenly I felt that neither the team leader nor my juniors in the team took me seriously. I felt isolated. Utterly disrespectful. . . (. . .P10).
Yes, knowledge irrelevance was there. Initially my team members, who newly joined my team directly from the college with the desired knowledge, ignored me. But my team leader stopped it proactively. He used to invite me as well as a few other team members, who lacked desired knowledge, in the meeting. I was feeling tensed to attend the meeting by thinking that I may not be able to answer if I were asked something. I was given the rudimentary jobs in the project. It was difficult to adjust. I was thinking to quit (. . .P8).

Furthermore, we observed that, apart from two participants (P5 and P7), all others experienced knowledge irrelevance, job insecurity, interpersonal conflicts and career volatility due to D-A misfit. P5 and P7 felt a little turmoil for a very short span of time during this transition period. They experienced the situation as a growth and learning opportunity. This happened because both of them had qualifications matching data science requirements. They took much less time to realign themselves with the changing job demands. For the other managers, however, the volatility emerged out of D-A misfit, viz., knowledge irrelevance and job insecurity. They underwent multiple interpersonal conflicts in terms of loss of recognition, loss of respect and ostracism; they appeared to experience a complete S-V misfit, which led to workplace strain.

Table 3. Details of data convergence and divergence comparison

| Themes | Convergent narratives (in %) | Divergent narratives (in %) |
|---|---|---|
| Person-job misfit and workplace strain | 76% | 24% |
| Workplace learning | 69% | 31% |
| Work-family conflict | 62% | 38% |

Workplace learning and work-family conflict
Irrespective of workplace strain, one thing common to all the participants was their keen focus on acquiring new knowledge and skills with the objective of reducing D-A misfit. For P5 and P7, the trigger was growth and opportunity, while for the others it was a question of survival.
I clearly understood that it would be difficult for me to survive in the competition, unless I focused on learning python and other new techniques. But when to learn? I was not given due time for learning. I had to focus on my current project, it’s deadline also. Therefore, I had to struggle to find ample time for learning. Since I was not from technical background, I had to give more time. But my organization did not give me dedicated time for brushing up my knowledge. They expected me to learn by myself and beyond office hours and frankly speaking it was difficult for me due to my family obligations. (. . .P2)
Initially two classroom training sessions were conducted to learn new technology, however, that was not at all sufficient. My manager expected me to learn by myself. Yes, he helped me to collaborate with those team members who had the expertise. They were helpful, but they also were busy in their projects. I got very less time from them. I focused on learning by myself, but mostly from my family time. Being a mother of a small kid, it was really really really difficult for me. (. . .P8)
Yes, adequate time for learning was not available during office hours. But I did not face much difficulties for this. Reason behind, I had little domain knowledge and also I normally learn quickly. .
.Yes, I managed learning time from my family time. But it was for a very short period of time. (. . .P5) How come organization expected us to devote our personal time for job related learning? But I did not have any option. One of my colleagues, who was expert in machine learning and deep learning, helped me a lot to acquire this new knowledge but after my office hours. Thereafter, I had to focus on self-study. It went for more than a year. This led to family tension as I could not complete my due responsibilities to my children as well as other family members. This created further disaster in life. I gave my best time to my organization. They should also think about me. (. . .P9) Learning new skills is always challenging, specially at this age. But I enjoyed this challenge. Because I believe in achievement, I had to challenge myself. I had confidence that I could do it. Yes, problems were there in managing both professional and personal life. But I managed it well. (. . .P7) Initial days were really tough to manage conflicting demands of my surroundings. But after a while, I was back to my career. I got back my confidence, respect, and recognition. . .Work-life balance in current organizational environment is myth. Because, life is now another name of task, target, deadline, party, etc. (. . .P4)
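The convergence rule described earlier (a theme is retained only when more than 50% of the narratives converge) can be checked against the percentages reported in Table 3. The sketch below is a minimal illustration under that reading of the rule; the names `TABLE3`, `THRESHOLD_PCT` and `is_convergent` are hypothetical, and the source does not report the underlying narrative counts, so only the published percentages are used.

```python
# Percentages of convergent and divergent narratives, as reported in Table 3.
TABLE3 = {
    "Person-job misfit and workplace strain": (76, 24),
    "Workplace learning": (69, 31),
    "Work-family conflict": (62, 38),
}

# Assumed reading of the rule: strictly more than 50% must converge.
THRESHOLD_PCT = 50


def is_convergent(convergent_pct, threshold=THRESHOLD_PCT):
    """Return True when the share of convergent narratives exceeds the threshold."""
    return convergent_pct > threshold


if __name__ == "__main__":
    for theme, (convergent, divergent) in TABLE3.items():
        status = "retained" if is_convergent(convergent) else "not retained"
        print(f"{theme}: {convergent}% convergent, {divergent}% divergent -> {status}")
```

Under this reading, all three reported themes clear the threshold, which is consistent with their being presented as the major themes.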

Interestingly, both groups (i.e. those who were a complete misfit and those who adapted more easily) adopted the same strategy to deal with the transition. Learning new skills was the preferred strategy of all the participants for meeting the changing job demands. We also found that organizations typically expected their managers to acquire new skills on their own; very few organizations sought to create a supportive environment to facilitate such learning. Therefore, the nature of learning was largely need-based, self-initiated and collaborative. We also observed that the learning orientation of the participants, i.e. their motivation to learn and the congruence between time needed and time spent, influenced their learning. For example, participants (P4, P5, P7) who were essentially motivated to learn spent the time that was needed for their learning; thus, they could manage the transition well, which in turn led them to achieve demands–abilities fit. They were therefore back on their career path with developed and enhanced expertise in data science tools and techniques; they regained their “hold” on their jobs, which in turn enabled them to realize supplies–values fit and, subsequently, growth alignment. However, participants (P2, P8, P9) found it tough because they struggled to motivate themselves to learn. True, they were involved in the learning process, but they perceived the situation as an “existential threat” in which learning was the only option left to them. Moreover, owing to their non-technical backgrounds, they needed more time to learn. As a result, they had to sacrifice their family time, leading to work–family conflict. The transition period for these participants was prolonged, which made them bear the brunt of work–family conflict, which in turn left them unable to close the supplies–values gap. Therefore, they remained in the cycle of S-V misfit instead of following the path of D-A fit, and subsequently experienced psychological strain. Please refer to Figure 1 for the emergent process model of person-job (mis)fit. In the subsequent analyses, we observed that demographic attributes of the participants influenced their experiences or perceptions of data science disruptions, as outlined above. For example, formal education (a certificate or degree course) in data science and relevant domains helped participants adapt quickly to the changed environment. They could acquire new skills faster than the participants with no such formal education. Consequently, they took less time from their personal lives for learning, leading to less work–life conflict. The findings also showed that family status (single/married) affected their experiences. Married participants felt more work–life conflict than single participants.
Moreover, married female participants faced more work–life conflict than their male counterparts. This happened due to the greater child-raising responsibilities of women than men in Indian families. We did not find any effect of the participants’ work experience on their perceptions of data science disruptions.

Figure 1. Process model of person-job (mis)fit in the context of data science disruptions. Elements shown: Data Science Disruptions; Individual Attitude; D-A (mis)fit; S-V (mis)fit; Lack of Organizational Support; Work-Family Conflict; Personal Learning Orientation (motivation, time spent and needed for learning); Informal Workplace Learning; Psychological Strain/Growth Alignment Opportunity.

Theory building and discussion
The major contribution of this paper is a process model of person–job (mis)fit in the context of data science disruptions, developed using interpretative phenomenological analysis (Figure 1). The advancement of data science has disrupted not only the industry but also the employment conditions of mid-level managers. We have tried to capture their experiences through in-depth interviews, and subsequently unfolded a process model embedded in those experiences, following the guidelines of Smith et al. (2009). D-A misfit in organizations occurs in several contexts, with technological change being the most prominent. Data science disruptions are highly rapid and complex owing to the vast range of tools and techniques involved. They have a shorter gestation time, leaving less time available for acquiring expertise. They are immensely volatile in terms of the nature of the job and the individual career, which intensifies D-A misfit, especially for mid-level managers. Extant literature on cognitive processing suggests that mental speed, along with several other aspects of memory, tends to weaken as a function of age (van Dijk et al., 2008), which helps explain the difficulties that mid-level managers face in matching the degree and speed of learning required in the context of disruptions.
Horn and Hoffer (1992) argued along similar lines, stating that out of nine broad categories of cognitive process, viz., knowledge derived from acculturation, fluency in retrieving knowledge, visualizing capabilities, auditory capabilities, quantitative capabilities, reasoning capabilities, processes of maintaining immediate awareness, processes in speed of apprehension and processes for quick decision-making, the last four start to decline from about 20 years of age. Therefore, it seems somewhat more difficult for mid-level managers to accelerate their learning than for younger people. As a result, D-A misfit creates an unannounced line of demarcation between young experts and less proficient mid-level managers in the data science domain, owing to organizations’ over-dependence on young talent for data science expertise. Generally, professionals desire command over their job and working environment; they seek respect, recognition, empowerment, learning opportunities and quick career growth. D-A misfit, which emerges from data science disruptions, inhibits the supply of these values to mid-level managers, which translates into psychological strain. The literature also supports the view that D-A misfit leads to strain through S-V misfit (e.g. French et al., 1982; Harrison, 1978, 1985). Our findings further reveal that D-A misfit may also lead to S-V fit, and subsequently to a growth alignment opportunity, for individuals with positive attitudes. These mid-level managers perceive D-A misfit as a learning and career growth opportunity; they regard such challenges as a matter of achievement and are by nature oriented towards problem-solving. In brief, D-A misfit becomes a challenge through which they can showcase their potential. We offer two pathways for person-job (mis)fit: (1) D-A misfit to psychological strain via S-V misfit, and (2) D-A misfit to growth alignment opportunity through S-V fit.
Both pathways are directed towards workplace learning, chiefly informal workplace learning, as a means of reducing D-A misfit. With the advancement of data science, organizations are witnessing rapid changes within the context of their business. Consequently, despite massive investment in formal workplace learning and formal knowledge sharing (Mittal, 2019), organizations need to depend more on informal learning and informal knowledge sharing (Singh et al., 2019) to match the speed and degree of the change. Earlier studies have also argued that employees learn most job-required knowledge and skills through informal means, regardless of the relevance of formal learning (Burns et al., 2005). However, personal learning orientations, which may be defined as the ability, personality and interest of an individual to learn, influence the informal learning process of employees (Choi and Jacobs, 2011). Motivation to learn, self-efficacy, learning orientations and other personal constructs have received the most attention from scholars with respect to informal learning; unfortunately, the role of temporal dimensions in learning has been overlooked. For example, Gettinger (1985) suggested that “the relation between time allocated or time spent in learning and achievement appears to be intricately tied to the amount of time actually needed for learning” (p. 5). She further reported that the degree of learning and retention is greater for the (time spent/time needed) ratio than for the (time allocated/time needed) ratio. Our findings suggest that individual motivation to learn, together with a match between time spent and time needed for learning, facilitates the informal learning process. Nevertheless, the temporal dimensions of learning may vary across individuals. We also observed that mid-level managers who spent adequate time acquiring new knowledge achieved D-A fit easily, unlike those who did not spend as much time as was needed. Consequently, the latter group of managers suffered from work–family conflicts, because informal learning compels them to devote their personal time to learning and knowledge-based helping (Mittal et al., 2020) in the absence of appropriate organizational support. Mid-level managers are often busy with the projects in hand, which leaves them little time for learning. They therefore use their personal time to fulfil the job’s demands, resulting in work–family conflicts, which, in turn, lead to S-V misfit; the cycle continues and ends in psychological strain. From this observation, one may assume that this vicious cycle of D-A misfit to S-V misfit to psychological strain is primarily due to the absence of organizational support. 
However, as a counterargument, one could also state that for the former group of managers, the path of informal learning enables them to achieve D-A fit, leading to S-V fit and then to growth alignment, thereby creating a benevolent cycle. Accordingly, any further disruptions in the data science domain would affect the latter group negatively while benefiting the former. Therefore, data science disruptions foster two cyclic path models for two different groups of mid-level managers. The first group, which considers D-A misfit a growth alignment opportunity by means of S-V fit, becomes highly engaged in the informal learning process. Their personal learning orientations, for example high motivation for learning and spending the time they need to acquire new knowledge and skills, facilitate their informal learning process, leading to D-A fit. The second group, on the other hand, perceives D-A misfit as a source of psychological strain via S-V misfit. The situation forces them towards the informal learning process, but comparatively low motivation for acquiring new knowledge and skills, together with contextual factors such as spending less time than needed, prevents them from achieving D-A fit through informal learning. Rather, their informal learning path directs them towards work–family conflicts and subsequently to S-V misfit.

Implications for future research and practices
The present study postulates a person–job (mis)fit path model for mid-level managers in the context of data science disruptions, which are characterized as rapidly dynamic and leave very little time to acquire new knowledge and skills. The contributions of this study are multifold. (1) It captures the individual-level phenomena of data science disruptions, in contrast to existing literature, which has so far focused on organizational-level phenomena during such disruptions. 
The current literature on data science disruptions has been silent about how individual employees, such as mid-level managers, experience these disruptions when their organizations expect them to acquire new knowledge and skills in the data science domain despite a paucity of time. (2) It adopts IPA, a distinctive method based on in-depth interviews, to capture detailed individual experiences. Few studies in management science have followed the IPA method to explore a new context despite its appealing relevance. Following the guidelines of Smith et al. (2009), our study conducts an in-depth analysis through the lens of IPA’s theoretical underpinnings. We adopted the double hermeneutic to explain the phenomenon using the hermeneutic circle and, finally, drew inferences about the emergent themes and their interconnections using IPA’s idiographic commitment. (3) Our study offers two emergent person–job (mis)fit process models in the aforementioned context: one for mid-level managers who perceive D-A misfit as a growth alignment opportunity, and one for mid-level managers who experience psychological strain due to D-A misfit by way of S-V misfit. We posit that both models are cyclic in nature, the former being a benevolent cycle and the latter a vicious cycle.

This study also contributes significantly to the field of human resource management. It reveals that not only individual differences (attitudes and personal learning orientations) but also the organizational learning ecosystem may enable mid-level managers to regain their confidence and control over their job and work environment in the context of data science disruptions. An integrated workplace learning approach (both formal and informal) with an appropriate ecosystem, that is, formal learning investment, supervisory support and collaborative learning, should be adopted to help mid-level managers adapt to these disruptions. Otherwise, many of them will become redundant for the organization, leading to job-market disequilibrium. Other contributions concern, for example, job recrafting, which plays a crucial role in making employees more effective in the context of disruptions. Continuous crafting of jobs with respect to situational demands may reduce person–job misfit, both demand–ability misfit and supply–value misfit. Petrou et al. (2016) also argued that employees’ job crafting behavior effectively enhances their capacity to adapt to and implement organizational changes. Jobs must be crafted according to the capabilities of the job holders, and not by industry demand; otherwise, job holders need to be made employable through a continuous learning and development process. This contextual job crafting would strengthen a person–environment fit culture in organizations, which in turn may allow employees to enjoy work–family balance and/or integration.

The study also has a few limitations, including limited generalizability due to its small-sample qualitative design. However, this type of research design opens a plethora of future directions for other scholars. For example, one could test the proposed theoretical model quantitatively with a large-scale sample. One could also seek to understand and extend the model for mid-level managers who are trapped in the vicious cycle. With respect to other consequences, one could extend the model to the career paths of those mid-level managers who enjoy the benevolent cycle; this could be taken up as a future research agenda.
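The temporal argument in the discussion above rests on two ratios taken from Gettinger (1985): time spent relative to time needed, and time allocated relative to time needed. The short Python sketch below only makes those two ratios concrete. The class name, the function name and all of the hour values are hypothetical illustrations and are not data from this study.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LearningTime:
    """Learning hours for one manager; all values used below are hypothetical."""
    needed: float     # time the person actually needs to acquire the skill
    allocated: float  # time formally set aside for learning
    spent: float      # time the person actually spends learning


def time_ratios(t: LearningTime) -> Tuple[float, float]:
    """Return (time spent / time needed, time allocated / time needed)."""
    if t.needed <= 0:
        raise ValueError("time needed must be positive")
    return t.spent / t.needed, t.allocated / t.needed


# Two illustrative managers with the same need and the same allocation,
# who differ only in how much time they actually spend.
manager_a = LearningTime(needed=40, allocated=10, spent=40)
manager_b = LearningTime(needed=40, allocated=10, spent=15)

print(time_ratios(manager_a))  # (1.0, 0.25)
print(time_ratios(manager_b))  # (0.375, 0.25)
```

In this illustration both managers have the same allocated-to-needed ratio, so only the spent-to-needed ratio separates the manager whose time spent matches the time needed from the one whose time spent falls short of it.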

References
Analytics India Magazine and Praxis Management School (2019), Analytics and Data Science Industry in India: Study 2019, Analytics India Magazine, New Delhi.
Arlin, M. and Roth, G. (1978), “Pupils’ use of time while reading comics and books”, American Educational Research Journal, Vol. 15, pp. 201-216.
Baesens, B., Bapna, R., Marsden, J.R., Vanthienen, J. and Zhao, J.L. (2016), “Transformational issues of big data and analytics in networked business”, MIS Quarterly, Vol. 40 No. 4, pp. 522-544.
Billett, S. (2001), “Knowing in practice: Re-conceptualising vocational expertise”, Learning and Instruction, Vol. 11 No. 6, pp. 431-452.
Burns, J.Z., Schaefer, K. and Hayden, J.M. (2005), “New trade and industrial teachers’ perceptions of formal learning versus informal learning and teaching proficiency”, Journal of Industrial Teacher Education, Vol. 42 No. 3, pp. 66-87.
Cable, D.M. and DeRue, D.S. (2002), “The convergent and discriminant validity of subjective fit perceptions”, Journal of Applied Psychology, Vol. 87 No. 5, p. 875.
Cerasoli, C.P., Alliger, G.M., Donsbach, J.S., Mathieu, J.E., Tannenbaum, S.I. and Orvis, K.A. (2018), “Antecedents and outcomes of informal learning behaviors: a meta-analysis”, Journal of Business and Psychology, Vol. 33 No. 2, pp. 203-230.
Choi, W. and Jacobs, R.L. (2011), “Influences of formal learning, personal learning orientation, and supportive learning environment on informal learning”, Human Resource Development Quarterly, Vol. 22 No. 3, pp. 239-257.
Clardy, A. (2018), “70-20-10 and the dominance of informal learning: a fact in search of evidence”, Human Resource Development Review, Vol. 17 No. 2, pp. 153-178.
Cope, J. (2011), “Entrepreneurial learning from failure: an interpretative phenomenological analysis”, Journal of Business Venturing, Vol. 26 No. 6, pp. 604-623.
Denzin, N.K. (1978), “Triangulation: a case for methodological evaluation and combination”, in Denzin, N.K. (Ed.), Sociological Methods: A Sourcebook, McGraw-Hill, New York, NY, pp. 339-357.
Devloo, T., Anseel, F. and De Beuckelaer, A. (2011), “Do managers use feedback seeking as a strategy to regulate demands–abilities misfit? The moderating role of implicit person theory”, Journal of Business and Psychology, Vol. 26 No. 4, pp. 453-465.
Dubey, R., Luo, Z., Gunasekaran, A., Akter, S., Hazen, B.T. and Douglas, M.A. (2018), “Big data and predictive analytics in humanitarian supply chains”, The International Journal of Logistics Management, Vol. 29 No. 2, pp. 485-512.
Dweck, C.S. and Molden, D.C. (2005), “Self-theories: their impact on competence motivation and acquisition”, in Elliot, A. and Dweck, C.S. (Eds), Handbook of Competence and Motivation, Guilford Press, New York, NY, pp. 122-140.
Eatough, V. and Smith, J.A. (2017), “Interpretative phenomenological analysis”, in Willig, C. and Stainton-Rogers, W. (Eds), Handbook of Qualitative Psychology, 2nd ed., Sage, London, pp. 193-211, 9781473925212.
Edwards, J.R. (1992), “A cybernetic theory of stress, coping, and well-being in organizations”, Academy of Management Review, Vol. 17 No. 2, pp. 238-274.
Edwards, J.R. (1996), “An examination of competing versions of the person-environment fit approach to stress”, Academy of Management Journal, Vol. 39 No. 2, pp. 292-339.
Edwards, J.R. and Cooper, C.L. (1990), “The person-environment fit approach to stress: recurring problems and some suggested solutions”, Journal of Organizational Behavior, Vol. 11 No. 4, pp. 293-307.
Eraut, M. (2000), “Non-formal learning and tacit knowledge in professional work”, British Journal of Educational Psychology, Vol. 70, pp. 113-136.
Fredrick, W.C. and Walberg, H.J. (1980), “Learning as a function of time”, The Journal of Educational Research, Vol. 73 No. 4, pp. 183-194.
French, J.R.P. Jr, Caplan, R.D. and Harrison, R.V. (1982), The Mechanisms of Job Stress and Strain, Wiley, Chichester.
Gettinger, M. (1985), “Time allocated and time spent relative to time needed for learning as determinants of achievement”, Journal of Educational Psychology, Vol. 77 No. 1, p. 3.
Ghasemaghaei, M., Hassanein, K. and Turel, O. (2017), “Increasing firm agility through the use of data analytics: the role of fit”, Decision Support Systems, Vol. 101, pp. 95-105.
Gill, M.J. (2013), “Book review: constructing identity in and around organizations (perspectives on process organization studies)”, Organization, Vol. 20 No. 6, pp. 949-951, doi: 10.1177/1350508413504692.
Gill, M.J. (2014), “The possibilities of phenomenology for organizational research”, Organizational Research Methods, Vol. 17 No. 2, pp. 118-137.
Giorgi, A. (2010), “Phenomenological psychology: a brief history and its challenges”, Journal of Phenomenological Psychology, Vol. 41 No. 2, pp. 145-179.
Groysberg, B., Johnson, W. and Lin, E. (2019), “What to do when industry disruption threatens your career”, MIT Sloan Management Review, Vol. 60 No. 3, pp. 57-62, 64-65.
Harrison, R.V. (1978), “Person-environment fit and job stress”, in Cooper, C.L. and Payne, R. (Eds), Stress at Work, Wiley, New York, NY, pp. 175-205.
Harrison, R.V. (1985), “The person-environment fit model and the study of job stress”, in Beehr, T.A. and Bhagat, R.S. (Eds), Human Stress and Cognition in Organizations: An Integrated Perspective, Wiley, New York, pp. 23-55.
Horn, J.L. and Hofer, S.M. (1992), “Major abilities and development in the adult period”, in Sternberg, R.J. and Berg, C.A. (Eds), Intellectual Development, Cambridge University Press, Cambridge.
Jacobs, R.L. and Park, Y. (2009), “A proposed conceptual framework of workplace learning: implications for theory development and research in human resource development”, Human Resource Development Review, Vol. 8, pp. 133-150.
Karasek, R. and Theorell, T. (1990), Healthy Work: Stress, Productivity, and the Reconstruction of Working Life, Basic Books, New York, NY.
Kristof-Brown, A.L., Zimmerman, R.D. and Johnson, E.C. (2005), “Consequences of individuals’ fit at work: a meta-analysis of person–job, person–organization, person–group, and person–supervisor fit”, Personnel Psychology, Vol. 58 No. 2, pp. 281-342.
Kwakman, K. (2001), “Work stress and work-based learning in secondary education: testing the Karasek model”, Human Resource Development International, Vol. 4 No. 4, pp. 487-501.
LePine, J.A., LePine, M.A. and Jackson, C.L. (2004), “Challenge and hindrance stress: relationships with exhaustion, motivation to learn, and learning performance”, Journal of Applied Psychology, Vol. 89 No. 5, pp. 883-891.
Livingstone, L.P., Nelson, D.L. and Barr, S.H. (1997), “Person-environment fit and creativity: an examination of supply-value and demand-ability versions of fit”, Journal of Management, Vol. 23 No. 2, pp. 119-146.
Manuti, A., Pastore, S., Scardigno, A.F., Giancaspro, M.L. and Morciano, D. (2015), “Formal and informal learning in the workplace: a research review”, International Journal of Training and Development, Vol. 19 No. 1, pp. 1-17.
McKinsey & Co (2018), Disruptive Forces in the Industrial Sectors: Global Executive Survey, available at: https://www.mckinsey.com/~/media/mckinsey/industries/automotive%20and%20assembly/our%20insights/how%20industrial%20companies%20can%20respond%20to%20disruptive%20forces/disruptive-forces-in-the-industrial-sectors.ashx (accessed 20 September 2020).
Marsick, V.J. and Watkins, K.E. (2001), “Informal and incidental learning”, New Directions for Adult and Continuing Education, Vol. 2001 No. 89, pp. 25-34.
McKinsey Global Institute (2011), Big Data: The Next Frontier for Innovation, Competition, and Productivity. Report, McKinsey Global Institute, New York, NY.
McKinsey Global Institute (2016), The Age of Analytics: Competing in a Data Driven World, McKinsey, Brussels.
Mittal, S. (2019), “How organizations implement new practices in dynamic context: role of deliberate learning and dynamic capabilities development in health care units”, Journal of Knowledge Management, Vol. 23 No. 6, pp. 1176-1195.
Mittal, S., Sengupta, A., Agrawal, N.M. and Gupta, S. (2020), “How prosocial is proactive: developing and validating a scale and process-model of knowledge-based proactive helping”, Journal of Management and Organization, Vol. 26 No. 4, pp. 625-650.
Murtagh, N., Lopes, P.N. and Lyons, E. (2011), “Decision making in voluntary career change: an other-than-rational perspective”, The Career Development Quarterly, Vol. 59 No. 3, pp. 249-263.
Noe, R.A., Tews, M.J. and Marand, A.D. (2013), “Individual differences and informal learning in the workplace”, Journal of Vocational Behavior, Vol. 83 No. 3, pp. 327-335.
Parker, S.K., Williams, H.M. and Turner, N. (2006), “Modeling the antecedents of proactive behavior at work”, Journal of Applied Psychology, Vol. 91 No. 3, p. 636.
Petrou, P., Demerouti, E. and Schaufeli, W.B. (2016), “Crafting the change”, Journal of Management, Vol. 30, pp. 503-518.
Raguseo, E. (2018), “Big data technologies: an empirical investigation on their adoption, benefits and risks for companies”, International Journal of Information Management, Vol. 38 No. 1, pp. 187-195.
Rosenshine, B.V. and Berliner, D.C. (1978), “Academic engaged time”, British Journal of Teacher Education, Vol. 4 No. 1, pp. 3-16.
Rubin, H.J. and Rubin, I.S. (2011), Qualitative Interviewing: The Art of Hearing Data, Sage, London.
Rusk, N. and Rothbaum, F. (2010), “From stress to learning: attachment theory meets goal orientation theory”, Review of General Psychology, Vol. 14 No. 1, pp. 31-43.
Shao, B.B., Shi, Z.M., Choi, T.Y. and Chae, S. (2018), “A data-analytics approach to identifying hidden critical suppliers in supply networks: development of nexus supplier index”, Decision Support Systems, Vol. 114, pp. 37-48.
Singh, S., Mittal, S., Sengupta, A. and Pradhan, R.K. (2019), “A dual-pathway model of knowledge exchange: linking human and psychosocial capital with prosocial knowledge effectiveness”, Journal of Knowledge Management, Vol. 23 No. 5, pp. 889-914.
Smith, J.A. (1996), “Beyond the divide between cognition and discourse: using interpretative phenomenological analysis in health psychology”, Psychology and Health, Vol. 11 No. 2, pp. 261-271.
Smith, J.A. (2010), “Interpretative phenomenological analysis: a reply to Amedeo Giorgi”, Existential Analysis, Vol. 21 No. 2, pp. 186-193.
Smith, J.A. (2011), “Evaluating the contribution of interpretative phenomenological analysis”, Health Psychology Review, Vol. 5 No. 1, pp. 9-27.
Smith, J.A. and Osborn, M. (2003), “Interpretative phenomenological analysis”, in Smith, J.A. (Ed.), Qualitative Psychology: A Practical Guide to Research Methods, Sage, London, pp. 51-80.
Smith, J.A., Flowers, P. and Larkin, M. (2009), Interpretative Phenomenological Analysis: Theory, Method and Research, Sage Publications, London.
Storey, L. (2007), “Doing interpretative phenomenological analysis”, in Lyons, E. and Coyle, A. (Eds), Analysing Qualitative Data in Psychology, Sage Publications, Los Angeles, pp. 51-64.
van Dijk, E.J., Prins, N.D., Vrooman, H.A., Hofman, A., Koudstaal, P.J. and Breteler, M.M. (2008), “Progression of cerebral small vessel disease in relation to risk factors and cognitive consequences: Rotterdam scan study”, Stroke, Vol. 39 No. 10, pp. 2712-2719.
Wang, G., Gunasekaran, A., Ngai, E.W. and Papadopoulos, T. (2016), “Big data analytics in logistics and supply chain management: certain investigations for research and applications”, International Journal of Production Economics, Vol. 176, pp. 98-110.
Willig, C. (2003), “Discourse analysis”, in Smith, J.A. (Ed.), Qualitative Psychology: A Practical Guide to Research Methods, Sage, London.
Xu, Z., Frankwick, G.L. and Ramirez, E. (2016), “Effects of big data analytics and traditional marketing analytics on new product success: a knowledge fusion perspective”, Journal of Business Research, Vol. 69 No. 5, pp. 1562-1566.

Further reading
Gill, M.J. (2015), “A phenomenology of feeling: examining the experience of emotion in organizations”, New Ways of Studying Emotions in Organizations (Research on Emotion in Organizations), Vol. 11, Emerald Group Publishing, pp. 29-50, doi: 10.1108/S1746-979120150000011003.
Huselid, M.A. (2018), “The science and practice of workforce analytics: introduction to the HRM special issue”, Human Resource Management, Vol. 57 No. 3, pp. 679-684.
Huselid, M. and Minbaeva, D. (2018), “Big data and human resource management”, in Wilkinson, A., Bacon, N., Snell, S. and Lepak, D. (Eds), The Sage Handbook of Human Resource Management, 2nd ed., Sage Publication, Los Angeles, pp. 494-507.
Omar, Y.M., Minoufekr, M. and Plapper, P. (2019), “Business analytics in manufacturing: current trends, challenges and pathway to market leadership”, Operations Research Perspectives, Vol. 6, pp. 100-127, doi: 10.1016/j.orp.2019.100127.
Ra, S., Shrestha, U., Khatiwada, S., Yoon, S.W. and Kwon, K. (2019), “The rise of technology and impact on skills”, International Journal of Training Research, Vol. 17 No. sup1, pp. 26-40, doi: 10.1080/14480220.2019.1629727.
Smith, J.A. and Shinebourne, P. (2012), “Interpretative phenomenological analysis”, in Cooper, H., Camic, P.M., Long, D.L., Panter, A.T., Rindskopf, D. and Sher, K.J. (Eds), APA Handbooks in Psychology®. APA Handbook of Research Methods in Psychology, Vol. 2, Research Designs: Quantitative, Qualitative, Neuropsychological, and Biological, American Psychological Association, pp. 73-82, doi: 10.1037/13620-005.

About the authors
Atri Sengupta is affiliated with Indian Institute of Management Sambalpur as Assistant Professor in the field of OB and HRM. She has over 15 years of work experience. She received her Ph.D. from Indian Institute of Technology, Kharagpur. She has authored two books and several book chapters. She has also published articles in national and international journals of repute.
Shashank Mittal is affiliated with Rajagiri Business School in the field of OB and HRM. He has over 5 years of work experience. Shashank Mittal is the corresponding author and can be contacted at: [email protected]
Kuchi Sanchita is affiliated with IIM Raipur and is a research scholar.



The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/0025-1747.htm


The dual drivetrain model of digital transformation: role of industrial big-data-based affordance


Yi Liu and Wei Wang
School of Management, Jinan University, Guangzhou, China, and
Zuopeng (Justin) Zhang
Coggin College of Business, University of North Florida, Jacksonville, Florida, USA

Received 4 December 2019
Revised 5 May 2020, 30 June 2020
Accepted 8 July 2020

Abstract
Purpose – To better understand the role of industrial big data in promoting digital transformation, the authors propose a theoretical framework of industrial big-data-based affordance in the form of an illustrative metaphor – what the authors call the “organizational drivetrain.”
Design/methodology/approach – This study investigates the effective use of industrial big data in the process of digital transformation through the theoretical lens of technology affordance–actualization. A software platform and services provider with more than 4,000 industrial enterprise clients in China was selected as the case study object for analyzing the digital affordance and actualization driven by industrial big data.
Findings – Drawing on a revelatory case study, the authors identify three affordances of industrial big data in the organization, namely developing data-driven customized projects, provisioning equipment-data-driven life cycle services and establishing data-based trust, and determine the affordance actualization actions driven by technology and market. In addition, the authors reveal the underlying drivetrain mechanisms that advance industrial big data affordance and actualization: stabilizing, enriching and pioneering.
Originality/value – This study builds a drivetrain model of digital transformation through industrial big data affordance actualization. The authors also provide practical implications that can help practitioners to implement digital transformation effectively and extract value from their investment.
Keywords Industrial big data, Digital transformation, Technology affordance actualization, Case method
Paper type Research paper

Management Decision Vol. 60 No. 2, 2022 pp. 344-367 © Emerald Publishing Limited 0025-1747 DOI 10.1108/MD-12-2019-1664

1. Introduction
Digital transformation (DT) has become increasingly important in both the academic and business communities during the past decade (Lucas et al., 2013; Li et al., 2018a, b; Babin and Grant, 2019). DT is defined as “a process that aims to improve an entity by triggering significant changes to its properties through combinations of information, computing, communication, and connectivity technologies” (Vial, 2019). With the emergence and continuous development of DT, many industries are increasingly using digital technologies, particularly the popular SMACIT – social, mobile, analytics, cloud and Internet of things (IoT) – technologies (Sebastian et al., 2017), in the operation of complex manufacturing systems (Borangiu et al., 2019). The seamless integration of physical and virtual spaces, known as Cyber–Physical Systems (CPS), is moving manufacturing enterprises into a new era of “big data.” The global manufacturing industry is facing a new industrial revolution – Industry 4.0 – in which large amounts of time-series data are generated at high speed by intelligent sensors and secure transmission networks covering the entire product life cycle (Fichman et al., 2014; Côrte-Real et al., 2019; Simetinger and Zhang, 2020). As a result, industrial big data has become the primary driver of this revolution (Perera et al., 2018; Zeng and Zaheer, 2018). It is imperative to develop a better understanding of the industrial big-data-driven organizational logic in DT. While past studies offer rich insights into how big data can change business operations (Youngjin et al., 2012; Zeng and Glaister, 2018), they offer limited insights into how industrial big data dramatically transforms the way manufacturing enterprises do business. Indeed, more recently, it has been recognized that research on the strategic implications of industrial big data is still quite limited (Oesterreich and Teuteberg, 2016; Frank et al., 2019a, b).

It should be noted that besides the traditional “5V” characteristics (volume, variety, velocity, veracity and value), industrial big data has new “3B” characteristics: below-surface, broken and bad quality (Pablo, 2017). Below-surface indicates that only a small portion of industrial big data can usefully be deployed to create value (Li et al., 2018a, b). Broken refers to the heterogeneous and multisource nature of the data, which are generated by different sensors with diverse sampling frequencies (Ji et al., 2016). Bad quality refers to data that has a low correlation with the manufacturing process, such as noise data; dealing with such data is associated with costly failures and unplanned downtime of machinery during production (Jay et al., 2015). These unique aspects of industrial big data raise multiple challenges for extracting potential value in the process of DT, and traditional big data approaches are not applicable in industrial scenarios (Braganza et al., 2017). The value of industrial big data is highly dependent on the sociotechnical context and rooted in the strategic relationship between internal factors (the firm’s technology resource base) and external factors (the firm’s market position, as well as consumers’ evaluations of the firm’s output). However, very few studies have given sufficient attention to the mechanisms of demand-driven insights; consequently, insights for DT that come from the demand side are missing. A comprehensive understanding of DT and of its implications from the market perspective is lacking. Indeed, scholars have lamented the lack of such consensus, calling for a unification of customer value and manufacturing process value (Frank et al., 2019a, b). 
In summary, the integration of digital technologies and market thinking in DT, although difficult, will be necessary if we are to better understand the specific process of value creation that is most likely to lead to ongoing successful transformation in a particular organizational (internal) and market (external) context. Against this backdrop, this paper aims to explore the following overarching question: how does industrial big data that combines demand-side and technology-side thinking contribute to the implementation of DT? To answer this question, we conduct an exploratory case study of an industrial software solutions provider that has successfully undergone DT. To open the black box of DT, we adopt the perspective of affordance actualization (Strong et al., 2014) to explore the affordances of industrial big data and their actualization process. An affordance is defined as an action potential offered by digital technology (Nambisan et al., 2017), while actualization means that organizations must take goal-oriented actions to achieve desired outcomes. The affordance–actualization perspective offers a promising lens for examining previously unrecognized roles of emerging technology (Majchrzak et al., 2016). Identifying the affordances and actualization process of industrial big data relates to the goal of transformation and thus helps to extend existing research on DT driven by the data-based interdependence between technology resources and market demand. Overall, our primary contribution is an “organizational drivetrain” model explaining the mechanism of industrial big data affordances for DT, wherein organizations respond to market changes by using industrial big data to alter their value creation processes. Further, this discovery extends current knowledge of the industrial big data value cocreation process by giving more balanced attention not only to a firm’s manufacturing practices but also to value creation by the firm’s customers.

The remainder of this paper is organized as follows. Section 2 describes the theoretical background. Section 3 outlines the methodology and describes the case. Section 4 presents the data analysis and the inductively derived research findings. The following section presents the theoretical framework emerging from our analysis. Finally, the paper concludes with implications, limitations and future research directions.
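The “broken” characteristic introduced above, namely heterogeneous data from sensors with diverse sampling frequencies, can be made concrete with a small Python sketch. It aligns two sensor streams onto a common time grid by carrying the last observation forward. This is only one simple way to handle mismatched sampling rates and is not a method described in the paper; the function name, the sensor names and all readings are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, measured value)


def align_to_grid(readings: Sequence[Reading],
                  grid: Sequence[float]) -> List[Optional[float]]:
    """For each grid time, return the latest reading taken at or before it.

    Readings are assumed to be sorted by timestamp. None marks grid times
    that precede the first reading of the stream.
    """
    times = [t for t, _ in readings]
    aligned: List[Optional[float]] = []
    for g in grid:
        i = bisect_right(times, g)  # number of readings with timestamp <= g
        aligned.append(readings[i - 1][1] if i > 0 else None)
    return aligned


# Hypothetical streams: temperature sampled every 2 s, vibration every 5 s.
temperature = [(0.0, 70.1), (2.0, 70.4), (4.0, 70.9), (6.0, 71.3)]
vibration = [(1.0, 0.02), (6.0, 0.05)]
grid = [0.0, 2.0, 4.0, 6.0]

print(align_to_grid(temperature, grid))  # [70.1, 70.4, 70.9, 71.3]
print(align_to_grid(vibration, grid))    # [None, 0.02, 0.02, 0.05]
```

The gaps and repeated values in the slower stream illustrate why such multisource data need preparation before they can be analyzed together.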


2. Theoretical background
2.1 Digital transformation and the role of industrial big data
DT is one of the most important issues in the upcoming new industrial revolution. The new industrial revolution describes a new stage of smart manufacturing in which the global industrial system, with the power of advanced computing, analytics, low-cost sensing and new levels of connectivity, is enabled by the Internet. To accelerate economic recovery and seize new opportunities in this revolution, both developed and developing countries have proposed specific manufacturing-based stimulus policies to promote the industrial revolution, accompanied by the more powerful network revolution (Tambe, 2014). For example, Germany has proposed the Industry 4.0 plan to transform the traditional manufacturing industry into a smart industry by incorporating CPS. The USA has initiated the Industrial Internet and the Advanced Manufacturing Partnership (AMP) to transform its manufacturing industry. China has developed the “Made in China (2025) Plan,” which aims to raise the level of intelligence in manufacturing. An increasing amount of research has been conducted to understand how digital technologies transform the traditional manufacturing industry. Consistent with previous literature, we foreground DT as a process where digital technologies create disruptions triggering strategic responses from organizations that seek to alter their value creation paths (Vial, 2019, p. 118). Existing research has contributed to our understanding of specific aspects of DT in terms of themes such as the nature of digital technologies and their role in DT, the resources and capabilities required for value creation (Hansen and Sia, 2015; Du et al., 2016), transformation strategy and processes (Makhlouf and Allal-Cherif, 2019) and the impacts of such transformation on an organization (Karimi and Walter, 2015; Yeow et al., 2017). 
Nevertheless, while recent research has made some contributions to identifying specific features of DT, previous transformations were primarily limited to business insights within organizational boundaries (Li et al., 2018a, b). In the era of the new industrial revolution, the transformations driven by digital technologies go far beyond changes to internal business, which are predominantly related to demand-pull insights. Since the value created from the demand side depends on customers’ willingness to pay for firms’ output, it is necessary to advance a concept that highlights the characteristics of value derived from the cooperation between firms and customers. Understanding “what it takes” for an organization to digitally transform is the objective of this research. In this paper, we focus on industrial big data as one of the key assets of smart manufacturing, which combines demand-pull and technology-push understandings into a coherent whole. The rise of digital technologies in the last decade, such as the IoT, 5G, artificial intelligence (AI), cloud computing, the digital twin model and wireless sensor networks, has served as both a trigger and an enabler for industrial big data (Nambisan et al., 2017; Wamba et al., 2017; Iftikhar and Khan, 2020). The pervasive adoption of digital technologies is radically changing traditional industrial manufacturing toward smart manufacturing, resulting in the creation of tremendous amounts of industrial data (Xu et al., 2017; Zeng and Glaister, 2018). Fueled by the convergence of social, mobile, analytics, cloud and IoT, as well as the growing need for automation and integration, more and more sensors are being incorporated into smart products, manufacturing equipment and production monitoring (Ji et al., 2016).
Multiple sensors and other data-sensing technologies can automatically collect tremendous amounts of industrial data, ranging from the material properties, temperatures and vibrations of equipment to the logics of supply chains and customer details, whereby the volume, variety, velocity, variability and veracity of data are exploding at record rates (Sun et al., 2016). The era of industrial big data has arrived (Côrte-Real et al., 2019). Industrial big data means large amounts of time-series data generated at high speed by industrial equipment (Shamim, 2018). It is composed of three main parts: (1) industrial equipment data, generated in design, commissioning, operation and maintenance and recycling; (2) product life data and operational data coming from the process of product design, production, manufacturing, use, service, recycling and scrapping; (3) external data, including data on the industry supply chain, the Internet, users and the economic and social environment (Frank et al., 2019a, b). Industrial big data is an important decision-making asset for enterprises to realize intelligent internal production, network-based collaboration between enterprises, and customized products and services. These large amounts of industrial data bring great challenges for all industries. Kusiak (2017) articulated that “big data is a long way from transforming manufacturing, even leading industries face data gaps, most companies do not know what to do with the data let alone how to interpret them to improve their process and products.” Taken together, industrial big data is enabling new forms of transformation on a scope and scale not previously witnessed in the realm of DT (Mcafee and Brynjolfsson, 2012). It brings the power of market demand to the forefront, empowering it to serve as the change agent for value cocreation. Nevertheless, this emerging phenomenon of industrial big-data-enabled DT remains underresearched. In the present study, we contribute novel insights into the phenomenon of industrial big data for DT to address this knowledge void. Specifically, we adopt the perspective of technology affordance actualization to conceptualize how industrial big data serves as the enabler for a large-scale, organization-driven DT. The following section presents a review of our theoretical lens.

2.2 Affordance actualization

The concept of “affordance,” which refers to the possibility for action, was first developed by the ecological psychologist Gibson in 1986 (Chemero, 2003). An affordance is what is offered, provided or furnished to animals by an object, either bad or good (Stoffregen, 2003; Sakreida et al., 2016).
The studies in ecology have formed the foundation for information system (IS) research (Stoffregen, 2003; Bloomfield et al., 2010; Smith, 2015). In existing IS literature, technology affordance refers to the action potentials, that is, what an individual or organization with a particular purpose can do with a technology or IS (Volkoff and Strong, 2013). Affordances have recently received much attention from IS scholars (Faraj and Azad, 2012; Anderson and Robey, 2017). An important reason is that its application promises to provide new insights in explaining the consequence of IT artifact uses in organizations (Leonardi and Vaast, 2017; Leidner et al., 2018) and the related organizational changes (Bygstad et al., 2016; Piccoli, 2016; Autio et al., 2018). In the context of information technology, affordances refer to “what an individual with a particular purpose can do with a technology” (Markus and Silver, 2008). As the potential for behaviors, an affordance is a property of the relationship between an object and an actor (Abrishami et al., 2014). In short, technology has material properties, but these material properties afford different possibilities for actions based on the contexts in which they are used (Thapa and Sein, 2017). By focusing jointly on objects’ materiality and on people’s perceptions of affordance, the concept of affordance is useful for theory in that it has the potential to help explain why, how and when new technologies – such as industrial big data – become enrolled in and affect organizational action (Tim et al., 2018). An affordance provides action possibilities that, however, are just potentialities; to have influence, the affordances need to be actualized (Vaast and Kaganer, 2013; Rice et al., 2017; Vaast et al., 2017). 
Affordance actualization is “the actions taken by actors as they take advantage of one or more affordance through their use of the technology to achieve immediate concrete outcomes in support of organizational goals” (Strong et al., 2014). Based on this definition, Du et al. (2018) further developed and refined the concept of affordance actualization in their study of an emerging financial technology, blockchain. They explicitly considered “affordance actualization as the goal-oriented actions taken by actors as they use technology to achieve an outcome.” Several extant studies embrace affordance actualization to demonstrate the importance of separating affordances from their actualization process. Overall, the focus of actualization is on the actions related to achieving the desired goal; these actualization actions generate immediate concrete outcomes, and the outcomes provide feedback for adjusting the actor’s behaviors and the technology’s material features (Beynon-Davies and Lederman, 2017; Dong-Hee, 2017). We adopt affordance actualization theory as our theoretical lens (see Figure 1) because it considers affordances and their actualization separately and describes the synergistic cooperation process of technology, actor and context (Burton-Jones and Volkoff, 2017). Indeed, this theory offers explanatory power to address the urgent problem regarding the DT of traditional industry driven by Industry 4.0. Specifically, the pervasive penetration of new digital technologies, such as IoT, cloud computing, 5G and big data, is creating new affordances for traditional industry to transform its manufacturing business (Zeng and Glaister, 2018). The affordance–actualization perspective allows us to examine how an organization that subscribes to emerging action goals related to DT interprets the material properties of industrial big data with the objective of value cocreation with consumers. This distinction is important in the context of DT because it allows the specification of how industrial big data contributes to organizational changes, which in turn constitute the DT.

3. Research methodology

3.1 Research setting and case selection

Due to the lack of prior research on industrial big-data-enabled DT, this work conducts an in-depth case study and develops a framework based on the empirical data.
Figure 1. Affordance–actualization theory (adapted from Du et al., 2018)

Single case analysis is appropriate for our study because the flexibility of the method allows researchers to develop a deep understanding of the effective use of ISs and the organizational actions related to their use (Eisenhardt, 1989; Sobh and Perry, 2006; Eisenhardt and Graebner, 2007; Eisenhardt et al., 2010). We chose a revelatory case, that is, one that is well suited for developing a theory about a phenomenon that was previously inaccessible to scientific investigation (Seidel et al., 2013), such as the industrial big data drivetrain for DT. As an appropriate case site, we sought an organization that had undergone DT through the use of industrial big data and digital technologies. We chose CPC (a pseudonym) as our research site for the following reasons. First, the selected case firm had to have mastered, over a period of years, the changes brought by industrial big data in order to alter its value creation processes. Founded in 1999, CPC is a business-to-business company whose original business was industrial software and service solutions. In order to be competitive in terms of products and business models, CPC has consciously accumulated considerable experience and expertise in data collection and analysis for a large number of industry clients covering airports, ports, tobacco, chemicals, metallurgy, manufacturing and so on. All the data and related technologies can be regarded as strategically important resources that provide the foundation for developing new sensor-data-driven services and business models based on the Industrial Internet platform. Second, the selected case firm is representative because of its leading position in the industry. In 2015, CPC set up the first Industrial Internet platform in China, CPC2025. As a frontrunner, CPC2025 already has more than 4,200 loyal industrial enterprise clients. Thanks to the connectivity platform provided by CPC, many industrial enterprises’ machines, products and processes are interconnected and integrated to generate predictive insights for both CPC and the platform’s customers. The platform built by CPC focuses on connecting physical entities to a big-data-based network with sensors, sensing and capturing operational data. The increasing accumulation of industrial big data enables CPC to analyze and respond to changes taking place in the market. Specifically, harnessing industrial big data is pivotal to its Industrial Internet platform CPC2025. Therefore, CPC serves as an adequate context in which to study the launch of digital business models driven by industrial big data insights (i.e. the affordance of industrial big data).

3.2 Data collection

In order to obtain in-depth qualitative data, exploratory interviews, which served as the primary source, were conducted with the CEO and senior managers in charge of CPC. Specifically, as the focus of our investigation lies in the DT driven by industrial big data, we selected for the interviews managers and executives responsible for DT, as well as software developers and consultants who were not directly involved in the organization’s transformation but worked in the core business of industrial big data. When possible and appropriate, we also interviewed CPC employees and marketing managers.
Table 1 summarizes the interviews conducted for this study. As is often recommended for exploratory research, we used semistructured, open-ended interviews guided by a few open questions. The interviews were guided by key issues such as the accumulation and usage of industrial big data, the organizational changes brought about by using the data and the different phases of the product and service development life cycle. The entire data collection lasted from May 2017 to January 2020. We designed data collection with interviews of selected informants from CPC in three stages. First, we conducted baseline interviews shortly after we introduced our research objectives. These interviews focused on contextual information with respect to the general roles of industrial big data and major business improvements. Second, we conducted follow-up inquiries in order to develop a deeper understanding of potential affordances of industrial big data. These interviews focused on the research questions addressing what products and services the informants provided for industrial manufacturing and their initial impression of how industrial big data would affect their offerings. Finally, we conducted interviews focused on affordance actualization. We asked interviewees about the industrial big data technologies they used and how industrial big data replaces or augments physical offerings. The objective of the initial stage was to explore CPC’s social–technical context and derive insights into industrial big data affordances broadly. In the following stages, we conducted multiple rounds of interviews, in which the framework was refined based on our initial understanding. Before the interviews, an interview guideline was developed according to the specific projects and main business improvements. The interview questions became more detailed and focused on the specific affordances and their actualization that emerged from the interview data.
We selected top management leaders, project managers, consulting managers and frontline data scientists as our informants in order to gather reliable case insights. Each interview was conducted by five members of the research team and lasted between 70 and 120 min. All interviews were recorded and later transcribed. CPC was supportive of our research and provided the research team with rich, internal, archival data.

Table 1. Overview of interviewees

Interviewee | Position | Gender | Date | Duration of interview
CX | CEO | male | 23-05-2017 | 63 min
DSS | CFO | male | 23-05-2017 | 63 min
LY | CIO | male | 23-05-2017 | 85 min
LDF | Project manager | male | 12-09-2017 | 85 min
GZC | Senior software manager | male | 12-09-2017 | 85 min
WXQ | Product manager | male | 12-09-2017 | 120 min
XD | Consulting manager | female | 11-3-2018 | 120 min
XW | Project manager | female | 11-3-2018 | 120 min
XZ | Consulting manager | female | 11-3-2018 | 110 min
YQH | Project manager | male | 10-11-2018 | 110 min
XK | Quality engineer | male | 10-11-2018 | 110 min
XJH | Marketing manager | female | 15-6-2019 | 90 min
WY | Data scientist | male | 15-6-2019 | 90 min
LQ | Consultant | female | 15-6-2019 | 100 min
SYF | Data scientist | male | 20-12-2019 | 100 min
XCH | Business manager | male | 20-12-2019 | 75 min

Stages: First: contextual information; Second: potential affordance; Third: affordance actualization

3.3 Data analysis

We began our data analyses with thick descriptions of our case. We described the DT in CPC in terms of the products and services, the accumulation and use of industrial big data and the growth after the transformation. The descriptions were reported back to CPC and the interviewees to ensure that the research team had interpreted the DT correctly. We applied the method by Gioia (2013) to analyze our interview transcripts because it explicitly empowers researchers to use existing concepts as a theory base from which to build inductive theory while being open to unexpected concepts. We went through three data analysis steps, similar to those used in prior case studies (Pan and Tan, 2011). In the first step, we reexamined the interview transcripts and attempted to code categories and subcategories that illustrated how informants understand DT in CPC through the use of industrial big data. We met regularly to cross-examine the transcripts in order to capture the informants’ meanings and to ensure the consistency of the coding (Du et al., 2018). In the second step, we clustered the emergent first-order categories into second-order theoretical themes in terms of underlying concepts. This is an iterative process between the literature review and data analysis. As Gioia (2013) noted, “this is when our research transited from being inductive to being abductive in that data and existing theory is now considered in tandem.” We adopted affordance actualization as a theoretical lens and realized that digital technologies had to be seen as permitting action possibilities, rather than determining a certain result. The affordance–actualization lens allowed us to examine the industrial big data use context in which affordances originated and in which they were realized.
The use of affordance–actualization theory in this situation is warranted because it allowed us to further categorize the second-order themes into aggregate dimensions related to the drivetrain. In the third step, we integrated the key themes derived from the second step into a theoretical model. We found that three functional affordances pulled the technology side and the market side into a whole. These functional affordances and their actualization model thus became the core contributions of our research. With this process model, we hope to create a comprehensive framework that can help researchers make sense of the reality of DT based on industrial big data assets. Figure 2 shows the data analysis process and results. The aggregate dimension captures the evolving affordances of industrial big data in the Chinese industry, from realizing the affordance of data resources to deploying rare and valuable capabilities to actualizing the digital affordances.
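The three-step coding procedure described above (first-order categories, second-order themes, aggregate dimensions) can be pictured as a small nested data structure. The following is only a minimal sketch under stated assumptions: the first-order codes are hypothetical placeholders rather than data from the case, the theme and dimension labels are borrowed from the section headings of this paper, and nothing here reproduces the actual contents of Figure 2.

```python
from collections import defaultdict

# Each tuple is (first-order code, second-order theme, aggregate dimension).
# All first-order codes are hypothetical placeholders, not interview data.
CODED_SEGMENTS = [
    ("equipment data guides software design", "data-driven project developing", "affordance"),
    ("analysis services replace after-sales support", "life cycle services", "affordance"),
    ("production data supports credit checks", "data-based trust", "affordance"),
    ("platform built on accumulated data", "technology-push actions", "actualization"),
    ("large clients ask for customized systems", "demand-pull actions", "actualization"),
    ("SMEs need standardized products", "demand-pull actions", "actualization"),
]


def build_data_structure(segments):
    """Group first-order codes under themes, and themes under dimensions."""
    tree = defaultdict(lambda: defaultdict(list))
    for code, theme, dimension in segments:
        tree[dimension][theme].append(code)
    # Convert to plain dicts so the result is easy to print or serialize.
    return {dim: dict(themes) for dim, themes in tree.items()}


structure = build_data_structure(CODED_SEGMENTS)
for dimension, themes in structure.items():
    for theme, codes in themes.items():
        print(f"{dimension} > {theme}: {len(codes)} first-order code(s)")
```

Keeping the codes as flat tuples and deriving the hierarchy from them makes it easy to re-cluster themes during the iteration between literature and data that the second step describes.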


4. Research findings

The DT in CPC was triggered by combined influences from digital technologies and the market environment, which led to major business improvements. Prior studies on DT have considered the contribution of digital technologies to internal value creation processes, while value is determined by customers’ evaluations of the benefits they expect to receive. However, there is still a lack of clarity about the integration of clients and internal operations supported by digital technologies. Value creation in the Industry 4.0 age has become value cocreation between the organization and its clients. Digital technologies can be interconnected with products to collect data and allow firms to uncover unforeseen business and market opportunities. In response to these external changes, CPC enacted a transformation process based on the action possibilities that were afforded by industrial big data. In this section, we present the industrial big data affordances and their actualization in terms of the actions required for the achievement of DT.

4.1 Affordance of industrial big data

4.1.1 Affordance 1: data-driven project developing. This affordance refers to CPC’s aim of ensuring the efficiency of software and systems through customized products and solution projects that provide greater value than competitors’ offerings; it is enabled by interpreting clients’ operational data. CPC, as a software solutions provider, uses production, equipment operation and maintenance data to improve the design of the device information management system.

Figure 2. Data analysis process and results


In China, industrial enterprises differ in their levels of mechanization, automation and informatization and also have different digitalization requirements. CPC tracks the production process of its clients in real time through the application of equipment information management systems, particularly through the comprehensive interconnection of machines, materials, control systems, ISs, products and people. This enables the assessment and measurement of manufacturing efficiency in real time. Moreover, the accumulation of equipment data, production process data, after-sales data and service data has been the primary driver for customizing products and solutions according to clients’ specific needs. One of CPC’s founders, Cao Xin, explained:

CPC was established as an industrial software company that provided equipment asset management software for industrial enterprises to improve the reliability and safety of flexible industrial equipment in all enterprises. At first, we set up the PM5.0 version of the equipment management system based on PB (Power builder) technology; in the following several years, it researched and developed a series of equipment asset management software, which covered more than 20 industries. These software applications replaced the original manual recording way of equipment operation information data in enterprises and realized data recording electronically.

The volume of equipment data accumulated in CPC affords the potential to develop different equipment management software according to market demand.

4.1.2 Affordance 2: provisioning equipment-data-driven life cycle services. This affordance reflects the organization’s goal of offering clients equipment-data-driven life cycle services that complement CPC’s main products and projects. Unlike traditional after-sales service, industrial big data affords CPC the potential to offer clients life cycle services based on comprehensive data analysis. CPC has introduced a management consulting service, which extends the value chain of the traditional equipment management system at both ends. At the front end, CPC helps clients to clarify equipment management objectives, assess equipment management status, sort out equipment management problems, select appropriate equipment management methods and customize products, accompanied by professional training. At the back end, CPC continues to provide equipment management data analysis services for enterprises during product application, so as to discover invisible degradation patterns of the equipment system and improve production efficiency. The CIO described:

We provide the service of the whole life cycle of the equipment for Guangzhou Crane Group through the big data service when our equipment management software installed in their products–crane equipment. Industrial equipment life cycle service already replaces after-sales service.

4.1.3 Affordance 3: establishing data-based trust. This affordance aims at leveraging product and sensor data to improve trust in virtual collaboration. For example, a small production enterprise (a nonlisted company) cannot raise financing in the capital market to expand its production scale, so it can rely only on traditional financial service institutions, such as banks and financial leasing companies. However, it is very difficult for such enterprises to obtain loans, mainly because the risk assessment systems of traditional financial service institutions have been based on financial indicators. Most small enterprises are not listed companies, so financial service institutions place little confidence in their financial information. CPC built a huge “industrial credit system” based on industrial big data covering the orders, production quality and equipment operation status of industrial enterprises. A senior manager explained:

Through equipment data and algorithms, we can judge the production efficiency based on the probability of inferior products in industrial enterprises and master the financial risk assessment combined with the comparison and verification of financial information data. In this way, industrial enterprises can realize capitalization development and evolution for high-quality enterprises and expand the production scale.
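The senior manager’s description combines two signals: a quality indicator derived from equipment data (the probability of inferior products) and a cross-check of reported financial information against platform data. A minimal sketch of that idea follows. It is not CPC’s algorithm: the record fields, the 5% defect ceiling and the 10% tolerance are all hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlantRecord:
    """Hypothetical inputs; none of these field names come from the case."""
    units_produced: int
    units_defective: int
    reported_revenue: float      # taken from the firm's own financial statements
    revenue_from_orders: float   # reconstructed from order data on the platform


def defect_rate(record: PlantRecord) -> float:
    """Share of inferior products, estimated from equipment and quality data."""
    if record.units_produced <= 0:
        raise ValueError("units_produced must be positive")
    return record.units_defective / record.units_produced


def financials_consistent(record: PlantRecord, tolerance: float = 0.10) -> bool:
    """Check whether reported revenue roughly matches platform order data."""
    if record.revenue_from_orders <= 0:
        return False
    gap = abs(record.reported_revenue - record.revenue_from_orders)
    return gap / record.revenue_from_orders <= tolerance


def credit_signal(record: PlantRecord, max_defect_rate: float = 0.05) -> bool:
    """Positive signal only if quality is acceptable and the financials verify."""
    return defect_rate(record) <= max_defect_rate and financials_consistent(record)


good = PlantRecord(1000, 20, 1_050_000.0, 1_000_000.0)
poor = PlantRecord(1000, 80, 1_050_000.0, 1_000_000.0)
print(credit_signal(good), credit_signal(poor))
```

A real credit model would presumably weigh many more indicators (the text mentions orders, production quality and equipment operation status); the sketch only shows how operational data can stand in where audited financial indicators are missing.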

In addition, industrial credit also establishes a bridge between supply-side and demand-side enterprises. The algorithm model, built on equipment operation data, production capacity data, workshop cooperation data and so on, allows the demand side to know the expected delivery time precisely and to calculate the minimum inventory of equipment and products. At the same time, the supplier can accurately identify and judge the inventory reduction of the demand side through industrial big data and manage in-transit products in the production process in order to reduce inventory cost as much as possible.

4.2 Actualization of affordance

Affordance actualization describes the essential organizational actions needed to realize the potential affordance.

4.2.1 Technology-push actions. With the rapid development and application of advanced digital information technologies such as AI, IoT, cloud computing and big data analytics, manufacturing systems are getting smart. Based on the connectivity afforded by the industrial IoT, cloud services, big data and analytics, firms’ products and processes are interconnected and integrated to support production intelligence. In this context, CPC innovatively transformed itself from a software service provider into an industrial Internet platform. CPC built China’s first industrial Internet platform, “CPC2025.” This platform includes new IT architectures such as IaaS, PaaS and SaaS microservices, based on the accumulated industrial big data. The platform makes full use of industrial data coming from the comprehensive interconnection of information systems, products, people, machines and materials to realize smart manufacturing. In addition, the platform is gradually establishing an enterprise credit network to provide financial services to customers and address the financing problem of SMEs.
According to the streamlined process of information acquisition, transmission, integration, analysis and optimization, application and display (see Figure 3), CPC developed an effective technology support system for clients to realize the effective management of intelligent manufacturing.

(1) Information collection: technologies are used to enable enterprises to collect equipment operational information through sensor technology, programmable logic controllers (PLC), distributed control systems (DCS), supervisory control and data acquisition (SCADA) and other industrial technologies in the perceptual acquisition layer.

(2) Information transmission: these technologies are the means of information transmission, including various communication networks, such as 3G, 4G, 5G, and wireless or wired network transmission.

(3) Information integration: in order to enable clients to integrate data from different application systems with different data formats, contents and storage methods, CPC designed an integrative technology support system along vertical and horizontal dimensions. The vertical technical support system integrates basic automation (L1 layer), process control (L2 layer) and manufacturing execution (L3 layer) in the equipment production network with the internal management system (L4 layer), intelligent decision system (L5 layer) and external integration (L6 layer), so as to ensure operational efficiency and business management effectiveness. The horizontal technical support system, based on microservice technology such as Docker, Apache Mesos and Kubernetes, establishes a set of standard enterprise information integration processes, which can connect all independent equipment asset application systems at different manufacturers and transform the collected information into a standardized format.

(4) Information application: CPC used the database formed in the information integration system to carry out a series of daily transactional tasks such as equipment account management, intelligent management of spare parts procurement and intelligent management of product inventory.

(5) Analysis and optimization: having integrated the information to satisfy the daily production needs of equipment asset management, CPC took a step further to analyze and optimize the information, in order to discover the relevant rules of equipment operation and maintenance in the existing information and summarize the useful knowledge.

(6) Information display: these are various visualization technologies, which allow the organization to observe the entire work process from “end to end.” Real-time visualization of business process management, accompanied by dashboards, can display the status of the manufacturing process. At the same time, real-time tracking sensors, including RFID (radio frequency identification), and CPC “Octopus” software afford the symbiosis of technology and the entire work process.

Figure 3. Technology support system

A senior manager described the platform CPC2025 as follows: “CPC2025 Internet platform is based on the accumulation of industrial big data and equipment management experience over the past 20 years. The platform is mainly composed of three parts, including cyber-physical control enterprises (CPCE), cyber-physical control city (CPCC), and cyber-physical control data (CPCD). CPCE is an upgraded form of traditional Comprehensiveness Control and Management (ACCM) to provide users with a core application of information management integrated management and control platform. CPCC provides users with access to a network of manufacturing products, sales, and financial services. CPCD is a data collection and comprehensive analysis system through machine learning.”
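The six functions of the technology support system listed in this subsection (collection, transmission, integration, application, analysis and optimization, display) can be illustrated as a toy pipeline. The sketch below is an assumption-laden illustration, not a description of CPC2025: the payload formats, field names, temperature limit and function names are all hypothetical. The integration step shows the key idea from item (3), converting records from systems with different formats into one standardized format.

```python
from statistics import mean

# Hypothetical raw payloads from two systems with different field names and units.
RAW_MESSAGES = [
    {"source": "plc", "dev": "pump-1", "temp_c": 71.0},
    {"source": "scada", "equipment_id": "pump-1", "temperature_f": 176.0},
    {"source": "plc", "dev": "fan-2", "temp_c": 40.0},
]


def collect():
    """(1) Collection: stand-in for reading sensors, PLC, DCS or SCADA."""
    return list(RAW_MESSAGES)


def transmit(messages):
    """(2) Transmission: stand-in for a wired or wireless network hop."""
    return messages


def integrate(messages):
    """(3) Integration: map every source format onto one standard record."""
    records = []
    for msg in messages:
        if msg["source"] == "plc":
            records.append({"equipment": msg["dev"], "temp_c": msg["temp_c"]})
        elif msg["source"] == "scada":
            celsius = (msg["temperature_f"] - 32) * 5 / 9
            records.append({"equipment": msg["equipment_id"], "temp_c": celsius})
    return records


def apply_records(records):
    """(4) Application: maintain a simple per-equipment ledger of readings."""
    ledger = {}
    for rec in records:
        ledger.setdefault(rec["equipment"], []).append(rec["temp_c"])
    return ledger


def analyze(ledger, limit_c=75.0):
    """(5) Analysis: summarize readings and flag equipment above a limit."""
    return {
        eq: {"mean_temp_c": mean(temps), "alert": max(temps) > limit_c}
        for eq, temps in ledger.items()
    }


def display(summary):
    """(6) Display: render one dashboard-style line per piece of equipment."""
    return [
        f"{eq}: mean {stats['mean_temp_c']:.1f} C, alert={stats['alert']}"
        for eq, stats in sorted(summary.items())
    ]


def run_pipeline():
    return display(analyze(apply_records(integrate(transmit(collect())))))


for line in run_pipeline():
    print(line)
```

Each stage takes the previous stage's output as its only input, which mirrors the layered description in the text and makes it straightforward to swap a stub (for example, `transmit`) for a real implementation.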

4.2.2 Demand-pull actions. The new industrial revolution has fundamentally weakened China’s comparative labor cost advantage and brought severe challenges for all traditional manufacturing. CPC’s clients face serious survival pressures from market demand and overcapacity. They have to transform the traditional business model, which usually relies on high resource and energy consumption, to use data and information technology effectively in order to optimize the manufacturing process and reduce resource consumption. The strategic goal of manufacturers in China has changed from “Made in China” to “Created in China,” which aims to improve their competitiveness in global markets. Industrial enterprises realized that an agile, practical and effective management system could improve the production and operation systems effectively and in real time. As a result, the automation, informatization and digitalization of equipment and manufacturing processes have become the key determinants of a firm’s performance. In order to help enterprises achieve higher security, reliability and operational efficiency, CPC initiated the transformation of its value creation processes.

(1) Customized project

After years of enhancement and optimization, the functions of the equipment management software have become more robust, covering procurement management, account management, technical standard management and equipment operation and maintenance management, which can collect large amounts of industrial data. Data analysis contributes to the design of customized projects and integrated services, including system design and technical support. Based on the industrial data collected in the last two decades, CPC has formed two different project service strategies to meet clients’ needs.
For large group enterprise clients, which have a variety of equipment, meticulous requirements for equipment management processes and sufficient equipment capitalization, CPC provides comprehensive services in the form of a project to design a customized equipment management system (such as EAM11g and related products). For small and medium enterprise (SME) customers, whose equipment needs are relatively simple and homogeneous, CPC provides standardized products such as the mobile work order system, in which work tasks can be sent directly to the equipment operator’s smart device, so that operation and maintenance personnel can monitor the operation of equipment anytime and anywhere. The product manager explained: “our client-PetroChina natural gas pipeline is one of the most complex pipeline network systems in the world. The effective operation of the pipeline network system requires a large number of modern and high-quality equipment. We integrate equipment assets management with the international industry standards, and the current situation of equipment management in China, to design and provide a customized equipment management platform suit for natural gas pipeline network.”

(2) Life cycle service

In the CPC industrial Internet platform, every change in manufacturing and business practice can be recorded and analyzed. The industrial data coming from clients has provided unprecedented opportunities for CPC. CPC found that many enterprises face a significant challenge in using the manufacturing equipment information system effectively. In order to improve the alignment of the industrial information system with business practice and the operational efficiency of manufacturing, CPC provides life cycle services based on its existing business assets: the equipment management software and the accumulated industrial big data. On the one hand, CPC provides evaluation and management consultation before manufacturing enterprises adopt the IS.
On the other hand, CPC offers education and training about equipment maintenance management, operational management and

evaluation, inspection and maintenance technical standards and so on. In the process of optimizing the development of products and services, CPC built a new paradigm of industrial service in which all of CPC’s manufacturing clients were provided with end-to-end participatory services. Through life cycle management consultation, evaluation and education, CPC could quickly recognize unforeseen patterns about clients, businesses and markets. A consulting manager elaborated: “we analyze industrial big data and forecast potential market demand, then design new products and services that are more in line with clients’ preferences. In this way, we can help our customers to innovate the business process, organizational structure, and operational practice.”

(3) Virtual collaboration and trust
With the development of the industrial Internet platform CPC2025, more than 200,000 clients, including manufacturing enterprises of different sizes, have registered on the platform. Based on security protocol data, manufacturing data, cloud computing and other technologies, CPC can integrate and share knowledge through virtual artifacts and provide collaboration services for the entire platform ecosystem. In the procurement phase, CPC2025 provides virtual cloud procurement services according to the actual needs of the platform’s users. The core of these services is to help enterprises complete online transactions, such as inquiry and bidding. Online communication contributes to the rapid expansion of enterprises’ cooperation networks. In the production phase, collaborative cloud services help enterprises visualize the entire work process and improve the efficiency of equipment operation and production.
The sales cloud services provide open certification and precise docking between upstream and downstream suppliers and distributors, which helps them achieve accurate matching in product transactions, production collaboration and so on. Virtual collaboration lays the foundation for a credit network through which the CPC2025 platform can provide enterprise clients with financial services and solve financing problems, especially for small and medium-sized enterprises. Financial services cover all aspects of enterprise procurement, manufacturing and sales. Based on the credit network, CPC introduces many qualified financial service providers and helps enterprise clients move from the traditional accounting period mode of materials procurement to the current settlement mode. At the same time, it can provide a certain amount of cyclical liquidity without interest or mortgage and help upstream and downstream enterprises greatly ease capital pressure. In conclusion, CPC provides effective financial support for the development of manufacturing enterprises. As the project manager explained: “The CPC2025 platform affords small enterprises the potential to secure loans from banks and financial institutions. Small and medium firms have a strong need for loans; however, banks are unwilling to lend without sufficient information to control the risks. Our platform provides industrial big data analysis including production efficiency, product quality, overall equipment efficiency, transaction scale, and so on. This analysis leads to an enterprise credit network, which contributes to financing services for SMEs. What the government wants to do with us is industrial credit.”

5. A drivetrain model on digital transformation by industrial big data affordances
In this section, we integrate the findings described earlier and propose a model of how industrial big data affords CPC the means to realize DT.
Building upon the affordance generation and specific actualization process, we identify three underlying drivetrain mechanisms used by CPC to advance industrial big data affordance and actualization: stabilizing, enriching and pioneering.

Stabilizing is to make minor incremental improvements in existing capabilities. The stabilizing process of CPC extends the repertoire of current products by adding customized projects to the current bundle. This process integrates clients’ demand with an existing capability and provides standardized and customized products. CPC has been independently researching and developing a series of equipment management software to meet the different automation needs of large, medium and small industrial enterprises. Standardized products build current R&D capability. When CPC was established in 1999, it developed the PM5.0 version of its equipment management software based on PB technology. Over the following years, it developed the B/S architecture equipment management software EAM2003 in 2003, the self-developed product platform EAM2004 in 2004 and, in 2008, a series of equipment asset management software based on EAM2008, which covers more than 20 industries. Customized products include the EAM11g equipment asset management platform for group enterprises, a series of EAM2012 software and the EAM2015 systems.

Enriching is to extend and elaborate a current capability. The enriching process of CPC improves the repertoire of current products by adding life cycle services to the current bundle. This process integrates clients’ demand by learning a new skill to improve an existing capability and provides products and specific services. Facing gaps in the effective use of ISs among the majority of its manufacturing clients, CPC transformed its main business from selling equipment asset management software to providing management consulting services. Unlike traditional after-sales service, industrial big data affords CPC the potential to offer clients life cycle services based on comprehensive data analysis. The enriching process integrates new industrial consulting services with the existing R&D capability.
CPC attaches the service capability onto its product development capability, leading to a new, high-level product commercialization capability.

Pioneering requires exploratory learning and stimulates the creation of a new capability. The pioneering process of CPC combines network service capabilities with the existing capabilities to create a new capability. This process allows CPC to uncover unforeseen client demand by learning new skills to develop novel capabilities and provide new value. With the development of the Fourth Industrial Revolution, CPC reached a consensus, after many rounds of discussion, to develop an industrial Internet platform. The company believed that the industrial Internet platform would not only satisfy the need for smart manufacturing but also raise the likelihood of creating new competitive advantage. The pioneering process includes the orchestration of new platform resources with existing industrial big data resources. As a result, CPC provides a virtual collaboration service for all clients and builds the industrial credit network with financial services for small and medium-sized enterprises.

Based on these insights, we propose a theoretical framework for DT driven by technology and market demand through stabilizing, enriching and pioneering mechanisms, in which the affordances of industrial big data play a decisive role. Figure 4 illustrates our framework, which links the drivetrain mechanisms of industrial big data with DT through the lens of technology affordance and actualization.

6. Discussion
6.1 Main findings
In this study, we set out to study the impact of industrial big data on DT. The underlying motivation was the novelty and complexity of examining how affordances related to the industrial big data phenomenon are actualized in order to alter firms’ value creation processes.
We conducted a revelatory case study at CPC that to a great extent has succeeded in bringing major business improvements by using industrial big data. Since our

data collection, CPC has established an industrial Internet platform called CPC2025. CPC2025 is a phenomenon of primary interest because it relates directly to how business can be improved through the use of industrial big data. Through our work, we identified three industrial big data affordances, namely developing a data-driven project, provisioning equipment-data-driven life cycle services and establishing data-based trust, and collected detailed information on affordance actualization actions driven by technology and market. In interpreting the results, it is helpful to realize that CPC can benefit from industrial big data in its attempt to innovate its business model through affordance actualization practices. Analyzing the practices taken, we identified three organizational capabilities used by CPC to advance affordance actualization: stabilizing, enriching and pioneering. On the basis of the insights gained from our findings, we present a drivetrain model of DT by industrial big data, which is our main contribution to current research on DT and the operationalization of affordance in IS.

6.2 Implications for theory
Our study highlights how industrial big data affordances enable DT and thus contribute to creating a new business model. In so doing, the study helps demonstrate how enterprises integrate industrial big data into the broader notion of DT and the competitive environment. By weaving industrial big data affordances into the business improvements made in moving from a software provider to an industrial Internet platform, we can contextualize our knowledge around the use and effects of industrial big data in an organization (Burton-Jones and Volkoff, 2017) in the specific context of DT (Chen et al., 2012; George et al., 2014). Thus, our findings on industrial big data affordances and actualization resonate with insights provided by Bygstad et al.
(2016), emphasizing the concrete and specific techno-organizational context in which industrial big data maximizes the potential of the industrial Internet platform and shedding light on the mechanisms that initially enable or constrain the actualization of the affordance. Moreover, instead of treating the demand-pull model of service innovation and the technology-push model of emerging technologies as separate transformations that are dramatically challenging business models, our case results indicate that manufacturing firms can benefit from servitization and digital technologies in the attempt to innovate their business models through effective use of industrial big data, highlighting the convergence of demand-pull and technology-push business improvements. In so doing, we respond to calls by Tambe (2014), Shamim et al. (2018), Frank et al. (2019a, b) and Vial (2019) for studies on the

Figure 4. A drivetrain model of digital transformation by industrial big data

connection between servitization and Industry 4.0. We contribute an industrial big-data-oriented perspective of the dual drivetrain model to the currently sparse theory on the integration of these two forms of technology and service innovation. We argue that software and platform services can bring value to clients and, at the same time, become a channel for gathering industrial big data, fostering business feedback that stimulates business improvements (Xie et al., 2016). Further, our research identifies the actualization of industrial big data affordances in terms of organizational capabilities and three general mechanisms behind how an organization actualizes DT: stabilizing, enriching and pioneering. Inspired by Du et al. (2018), we explore affordance actualization by gathering detailed information on the development of organizational capabilities. Our study offers insights into the transition process toward organizational capabilities and continuous improvement practices of DT. This idea fits with the perspective of affordance actualization described by Volkoff and Strong (2013), in which an affordance will not be actualized unless the organization has the necessary capability. Our discovery of three mechanisms of organizational capabilities extends prior research on affordance actualization by relating the mechanisms to extant ambidexterity theory (Luger et al., 2018) and by developing a dynamic model of exploration and exploitation innovation.

The study of industrial big data in DT is in its infancy. Affordance actualization becomes particularly relevant as a lens for understanding business improvements brought about by industrial big data, and further research that takes this perspective into account will help broaden our insights into DT. First, we identified three industrial big data affordances associated with DT.
Future research may identify other potential affordances of industrial big data in different contexts and investigate their impact on the value creation process. How, and to what extent, industrial big data enables major business improvements is contingent on factors such as country, area, industry, organizational structure and size, and studying these factors may offer a more granular view of the potential value of industrial big data. Second, a valuable next step is to explore how different affordances interact and alter the company’s value creation process, particularly the interrelationship of industrial big data and business model experimentation and their combined effects on the automation of processes, offerings or organizational routines. Studying connections and evolution also supports the understanding of how different technologies’ specific affordances support particular business improvements. Third, in addition to exploring the concept of affordances further, there is also a need to analyze the effects of affordances on both enablement and constraint. For instance, what is the impact of industrial big data affordances on the success or failure of DT? Like most things, affordances are often double-edged swords (Volkoff and Strong, 2017). Further research should study how the outcomes created by the actualization of industrial big data affordances contribute to or constrain new business models.

6.3 Implications for practice
Despite the growing importance of industrial big data, it is not well understood how organizations realize industrial big data affordances successfully. In this study, we identify technology- and demand-level actions leading to the actualization of three affordances at CPC. This study also makes practical contributions by guiding managers to implement digital technologies effectively within their organizations to realize DT.
First, the three affordances identified in this revelatory case can help firms understand what industrial big data can do for an organization and how industrial big data affords DT. Our findings suggest that successful DT starts with investing in digital technologies that facilitate the collection, computing and analysis of manufacturing data. The value of

industrial big data is realized through clients’ participation in improving existing products and creating new services. Second, this study offers three drivetrain mechanisms to actualize industrial big data affordances: stabilizing, enriching and pioneering. Each mechanism allows the organization to take specific actions that are intended to profit from industrial big data and create value for customers. The stabilizing process of CPC extends the repertoire of current products by adding customized projects to the current bundle. The enriching process of CPC improves the repertoire of current products by adding life cycle services to the current bundle. The pioneering process of CPC combines network service capabilities with the existing capabilities to create a new capability. Third, armed with the new lens of affordance to better understand industrial big data, firms can turn to three different processes to cultivate organizational capabilities in using digital technologies, beginning with and focusing on products and services to build core capability. Stabilizing is to make minor incremental improvements in existing capabilities. Enriching is to extend and elaborate a current capability. Pioneering requires exploratory learning and stimulates the creation of a new capability. In summary, this study can help organizations craft their DT strategies and will be particularly insightful for traditional manufacturing industries seeking to profit from industrial big data.

6.4 Limitations
Due to the interpretive nature of the revelatory case study, we cannot claim that our account of DT driven by industrial big data is exhaustive. It must be acknowledged that the interviews may be interpreted differently by different researchers. To mitigate this weakness, our analysis of the original data considered multiple viewpoints and insights, and the findings were corroborated by multiple researchers. A common criticism of a single case study is the issue of generalizability.
As we applied a single case approach to generate rich and in-depth insights, the generalizability of our findings may be limited. Further, our findings are derived from a software solution provider in an emerging economy where the financial infrastructure is underdeveloped (Du et al., 2018). For example, financial institutions mostly lend to big corporate customers, most of which are state-owned or at least state-controlled, whereas many SMEs have financing difficulties. Our case study reveals that setting up an industry credit guarantee system to offer credit support for SMEs is one effective way to alleviate their financing difficulties.

7. Conclusion
Although industrial big data has been recognized as a new form of resource in Industry 4.0, very little is known about data-driven DT. We have built a new drivetrain model of the affordances that originate from industrial big data and enable the DT of an industrial software solutions provider. In this study, we lend some evidence to the proposal that industrial big data has transformative power in the launch of digital business models. We identify three affordances of industrial big data, namely developing a data-driven customized project, provisioning equipment-data-driven life cycle services and establishing data-based trust, and the actualization of these affordances, which combines technology-push and market-pull effects. This study also indicates the key mechanisms of DT actualization: stabilizing, enriching and pioneering. Insights from our case not only contribute to the existing research on DT but also shed light on what organizational capabilities can do to drive and realize technology affordances.

By developing an affordance lens to explore industrial big data, this study provides several contributions. First, the three affordances identified in this revelatory case can help firms understand what industrial big data can do for an organization and how industrial big data affords DT. Second, this study contributes to the industrial big data literature by offering insights into how industrial big data can be implemented within an organization. Third, by relating to organizational capability theory, the three mechanisms behind how an organization actualizes industrial big data affordances become more aligned with the affordance–actualization theory. We describe the different capabilities developed from industrial big data affordances and provide a new way of thinking about the actualization of affordance. To conclude, we offer this study as an early effort to encourage IS and strategy researchers to undertake further investigations into the affordances of emerging technologies in facilitating DT. We hope that our findings will help researchers find new avenues for further research associated with the strategic implications of DT. As new digital technologies are constantly transforming business activities, processes and models, organizational changes at present have inevitably become strongly entangled with digital technologies (Yenni et al., 2020). More generally, the affordance–actualization lens provides a theoretical perspective useful in exploring the potential value and effective use of technologies in the performance of DT.

References
Abrishami, P., Boer, A. and Horstman, K. (2014), “Understanding the adoption dynamics of medical innovations: affordances of the da Vinci robot in The Netherlands”, Social Science and Medicine, Vol. 117, pp. 125-133. Anderson, C. and Robey, D. (2017), “Affordance potency: explaining the actualization of technology affordances”, Information and Organization, Vol. 27 No. 2, pp. 100-115. Autio, E., Nambisan, S., Thomas, L.D.W.
and Wright, M. (2018), “Digital affordances, spatial affordances, and the genesis of entrepreneurial ecosystems”, Strategic Entrepreneurship Journal, Vol. 12 No. 1, pp. 72-95. Babin, R. and Grant, K.A. (2019), “How do CIOs become CEOs?”, Journal of Global Information Management (JGIM), Vol. 27 No. 4, pp. 1-15. Beynon-Davies, P. and Lederman, R. (2017), “Making sense of visual management through affordance theory”, Production Planning and Control, Vol. 28 No. 2, pp. 142-157. Bloomfield, B.P., Latham, Y. and Vurdubakis, T. (2010), “Bodies, technologies and action possibilities: when is an affordance?”, Sociology, Vol. 44 No. 3, pp. 415-433. Borangiu, T., Trentesaux, D., Thomas, A., Leitão, P. and Barata, J. (2019), “Digital transformation of manufacturing through cloud services and resource virtualization”, Computers in Industry, Vol. 108, pp. 150-162. Braganza, A., Brooks, L., Nepelski, D., Ali, M. and Moro, R. (2017), “Resource management in big data initiatives: processes and dynamic capabilities”, Journal of Business Research, Vol. 70, pp. 328-337. Burton-Jones, A. and Volkoff, O. (2017), “How can we develop contextualized theories of effective use? A demonstration in the context of community-care electronic health records”, Information Systems Research, Vol. 28 No. 3, pp. 468-489. Bygstad, B., Munkvold, B.E. and Volkoff, O. (2016), “Identifying generative mechanisms through affordances: a framework for critical realist data analysis”, Journal of Information Technology, Vol. 31 No. 1, pp. 83-96. Chemero, A. (2003), “An outline of a theory of affordances”, Ecological Psychology, Vol. 15 No. 2, pp. 181-195. Chen, H., Chiang, R.H. and Storey, V.C. (2012), “Business intelligence and analytics: from big data to big impact”, MIS Quarterly, Vol. 36 No. 4, pp. 1165-1188.

Côrte-Real, N., Ruivo, P. and Oliveira, T. (2019), “Leveraging internet of things and big data analytics initiatives in European and American firms: is data quality a way to extract business value?”, Information and Management. doi: 10.1016/j.im.2019.01.003. Dong-Hee, S. (2017), “The role of affordance in the experience of virtual reality learning: technological and affective affordances in virtual reality”, Telematics and Informatics, Vol. 34 No. 8, pp. 1826-1836.

Du, W.D., Pan, S.L. and Huang, J. (2016), “How a latecomer company used IT to redeploy slack resources”, MIS Quarterly Executive, Vol. 15 No. 3, pp. 195-213. Du, W.D., Pan, S.L., Leidner, D.E. and Ying, W. (2018), “Affordances, experimentation and actualization of FinTech: a blockchain implementation study”, Journal of Strategic Information Systems, Vol. 28 No. 1, pp. 50-65. Eisenhardt, K.M. (1989), “Building theories from case study research”, Academy of Management Review, Vol. 14 No. 4, pp. 532-550. Eisenhardt, K.M. and Graebner, M.E. (2007), “Theory building from cases: opportunities and challenges”, Academy of Management Journal, Vol. 50 No. 1, pp. 25-32. Eisenhardt, K.M., Furr, N.R. and Bingham, C.B. (2010), “Crossroads–microfoundations of performance: balancing efficiency and flexibility in dynamic environments”, Organization Science, Vol. 21 No. 6, pp. 1263-1273. Faraj, S. and Azad, B. (2012), The Materiality of Technology: An Affordance Perspective, Oxford University Press, Oxford. Fichman, R.G., Dos Santos, B.L. and Zheng, Z. (2014), “Digital innovation as a fundamental and powerful concept in the information systems curriculum”, MIS Quarterly, Vol. 38 No. 2, pp. 329-353. Frank, A.G., Dalenogare, L.S. and Ayala, N.F. (2019a), “Industry 4.0 technologies: implementation patterns in manufacturing companies”, International Journal of Production Economics, Vol. 210 No. 4, pp. 15-26. Frank, A.G., Mendes, G.H., Ayala, N.F. and Ghezzi, A. (2019b), “Servitization and Industry 4.0 convergence in the digital transformation of product firms: a business model innovation perspective”, Technological Forecasting and Social Change, Vol. 141, pp. 341-351. George, G., Haas, M. and Pentland, A. (2014), “Big data and management”, Academy of Management Journal, Vol. 57 No. 2, pp. 321-326. Gioia, D.A., Corley, K.G. and Hamilton, A.L. (2013), “Seeking qualitative rigor in inductive research: notes on the Gioia methodology”, Organizational Research Methods, Vol. 16 No. 1, pp. 
15-31. Hansen, R. and Sia, S.K. (2015), “Hummel’s digital transformation toward omnichannel retailing: key lessons learned”, MIS Quarterly Executive, Vol. 14 No. 2, pp. 51-66. Iftikhar, R. and Khan, M.S. (2020), “Social media big data analytics for demand forecasting: development and case implementation of an innovative framework”, Journal of Global Information Management (JGIM), Vol. 28 No. 1, pp. 103-120, doi: 10.4018/JGIM.2020010106. Jay, L., Hossein, D.A., Shanhu, Y. and Behrad, B. (2015), “Industrial big data analytics and cyberphysical systems for future maintenance and service innovation”, The Fourth International Conference on Through-Life Engineering Services. Ji, C., Shao, Q., Sun, J., Liu, S., Pan, L., Wu, L. and Yang, C. (2016), “Device data ingestion for industrial big data platforms with a case study”, Sensors, Vol. 16 No. 3, pp. 279-294. Karimi, J. and Walter, Z. (2015), “The role of dynamic capabilities in responding to digital disruption: a factor-based study of the newspaper industry”, Journal of Management Information Systems, Vol. 32 No. 1, pp. 39-81. Kusiak, A. (2017), “Smart manufacturing must embrace big data”, Nature, Vol. 544 No. 7648, pp. 23-25. Leidner, D.E., Gonzalez, E. and Koch, H. (2018), “An affordance perspective of enterprise social media and organizational socialization”, The Journal of Strategic Information Systems, Vol. 27 No. 2, pp. 117-138.

Leonardi, P.M. and Vaast, E. (2017), “Social media and their affordances for organizing: a review and agenda for research”, Academy of Management Annals, Vol. 11 No. 1, pp. 150-188. Li, L., Su, F., Zhang, W. and Mao, J.Y. (2018a), “Digital transformation by SME entrepreneurs: a capability perspective”, Information Systems Journal, Vol. 28 No. 6, pp. 1129-1157.

Li, X., Li, D., Li, S., Wang, S. and Liu, C. (2018b), “Exploiting industrial big data strategy for load balancing in industrial wireless mobile networks”, IEEE Access, Vol. 6, pp. 6644-6653. Lucas, H.C.J., Agarwal, R., Clemons, E.K., El Sawy, O.A. and Weber, B.W. (2013), “Impactful research on transformational information technology: an opportunity to inform new audiences”, MIS Quarterly, Vol. 37 No. 2, pp. 371-382. Luger, J., Raisch, S. and Schimmer, M. (2018), “Dynamic balancing of exploration and exploitation: the contingent benefits of ambidexterity”, Organization Science, Vol. 29 No. 3, pp. 449-470. Majchrzak, A., Markus, M.L. and Wareham, J. (2016), “Designing for digital transformation: lessons for information systems research from the study of ICT and societal challenges”, MIS Quarterly, Vol. 40 No. 2, pp. 267-277. Makhlouf, M. and Allal-Cherif, O. (2019), “Strategic values of cloud computing transformation: a multi-case study of 173 adopters”, Journal of Global Information Management (JGIM), Vol. 27 No. 1, pp. 128-143, doi: 10.4018/JGIM.2019010107. Markus, M.L. and Silver, M.S. (2008), “A foundation for the study of IT effects: a new look at DeSanctis and Poole’s concepts of structural features and spirit”, Journal of the Association for Information Systems, Vol. 9 No. 10, pp. 609-632. Mcafee, A. and Brynjolfsson, E. (2012), “Big data: the management revolution”, Harvard Business Review, Vol. 90 No. 10, pp. 60-68. Nambisan, S., Lyytinen, K., Majchrzak, A. and Song, M. (2017), “Digital innovation management: reinventing innovation management research in a digital world”, MIS Quarterly, Vol. 41 No. 1, pp. 223-238. Oesterreich, T.D. and Teuteberg, F. (2016), “Understanding the implications of digitisation and automation in the context of industry 4.0: a triangulation approach and elements of a research agenda for the construction industry”, Computers in Industry, Vol. 83 No. 12, pp. 121-139. Pan, S.L. and Tan, B. 
(2011), “Demystifying case research: a structured–pragmatic–situational (SPS) approach to conducting case studies”, Information and Organization, Vol. 21 No. 3, pp. 161-176. Pablo, B. (2018), “An efficient industrial big-data engine”, IEEE Transactions on Industrial Informatics, Vol. 14 No. 4, pp. 1361-1369. Perera, C., Vasilakos, A.V., Calikli, G., Sheng, Q.Z. and Li, K.C. (2018), “Guest editorial special section on engineering industrial big data analytics platforms for internet of things”, IEEE Transactions on Industrial Informatics, Vol. 14 No. 2, pp. 744-747. Piccoli, G. (2016), “Triggered essential reviewing: the effect of technology affordances on service experience evaluations”, European Journal of Information Systems, Vol. 25 No. 6, pp. 477-492. Rice, R.E., Evans, S.K., Pearce, K.E., Sivunen, A., Vitak, J. and Treem, J.W. (2017), “Organizational media affordances: operationalization and associations with media use”, Journal of Communication, Vol. 67 No. 1, pp. 106-130. Sakreida, K., Effnert, I., Thill, S., Menz, M.M., Jirak, D. and Eickhoff, C.R. (2016), “Affordance processing in segregated parieto-frontal dorsal stream sub-pathways”, Neuroscience and Biobehavioral Reviews, Vol. 69, pp. 89-112. Sebastian, I.M., Ross, J.W., Beath, C., Mocker, M., Moloney, K.G. and Fonstad, N.O. (2017), “How big old companies navigate digital transformation”, MIS Quarterly Executive, Vol. 16 No. 3, pp. 197-213. Seidel, S., Recker, J. and Brocke, J.V. (2013), “Sensemaking and sustainable practicing: functional affordances of information systems in green transformations”, MIS Quarterly, Vol. 37 No. 4, pp. 1275-1299.

Shamim, S. (2018), “Role of big data management in enhancing big data decision-making capability and quality among Chinese firms: a dynamic capabilities view”, Information and Management, Vol. 56 No. 6, pp. 103-135, doi: 10.1016/j.im.2018.12.003. Simetinger, F. and Zhang, Z. (2020), “Deriving secondary traits of industry 4.0: a comparative analysis of significant maturity models”, Systems Research and Behavioral Science, Vol. 37 No. 4, pp. 663-678, doi: 10.1002/sres.2708.

Smith, J. (2015), “Effects of affordance perception on the initiation and actualization of action”, Ecological Psychology, Vol. 22 No. 2, pp. 119-149. Sobh, R. and Perry, C. (2006), “Research design and data analysis in realism research”, European Journal of Marketing, Vol. 40 No. 11, pp. 1194-1209. Stoffregen, T. (2003), “Affordances as properties of the animal-environment system”, Ecological Psychology, Vol. 15 No. 2, pp. 115-134. Strong, D.M., Volkoff, O., Johnson, S.A., Pelletier, L.R., Tulu, B. and Garber, L. (2014), “A theory of organization-EHR affordance actualization”, Journal of the Association for Information Systems, Vol. 15 No. 2, pp. 53-85. Sun, Y., Song, H., Jara, A.J. and Bie, R. (2016), “Internet of things and big data analytics for smart and connected communities”, IEEE Access, Vol. 4, pp. 766-773. Tambe, P. (2014), “Big data investment, skills, and firm value”, Management Science, Vol. 60 No. 5, pp. 1452-1469. Thapa, D. and Sein, M.K. (2017), “Trajectory of affordances: insights from a case of telemedicine in Nepal”, Information Systems Journal, Vol. 28 No. 5, pp. 796-817. Tim, Y., Pan, S.L., Bahri, S. and Fauzi, A. (2018), “Digitally enabled affordances for community-driven environmental movement in rural Malaysia”, Information Systems Journal, Vol. 28 No. 1, pp. 48-75. Vaast, E. and Kaganer, E. (2013), “Social media affordances and governance in the workplace: an examination of organizational policies”, Journal of Computer-Mediated Communication, Vol. 19 No. 1, pp. 78-101. Vaast, E., Safadi, H., Lapointe, L. and Negoita, B. (2017), “Social media affordances for connective action: an examination of microblogging use during the Gulf of Mexico oil spill”, MIS Quarterly, Vol. 41 No. 4, pp. 1179-1205. Vial, G. (2019), “Understanding digital transformation: a review and a research agenda”, Journal of Strategic Information Systems, Vol. 28 No. 2, pp. 118-144. Volkoff, O. and Strong, D.M.
(2013), “Critical realism and affordances: theorizing IT-associated organizational change processes”, MIS Quarterly, Vol. 37 No. 3, pp. 819-834. Volkoff, O. and Strong, D.M. (2017), “Affordance theory and how to use it in is research”, The Routledge Companion to Management Information Systems, Routledge, New York, pp. 232-245. Wamba, S.F., Gunasekaran, A., Akter, S., Ren, S.J.F., Dubey, R. and Childe, S.J. (2017), “Big data analytics and firm performance: effects of dynamic capabilities”, Journal of Business Research, Vol. 70, pp. 356-365. Xie, K., Wu, Y., Xiao, J. and Hu, Q. (2016), “Value co-creation between firms and customers: the role of big data-based cooperative assets”, Information and Management, Vol. 53 No. 8, pp. 1034-1048. Xu, Y., Sun, Y., Wan, J., Liu, X. and Song, Z. (2017), “Industrial big data for fault diagnosis: taxonomy, review, and applications”, IEEE Access, Vol. 5, pp. 17368-17380. Yenni, T., Taohua, O. and Delin, Z. (2020), “Back to the future: actualizing technology affordances to transform emperor qin’s terracotta warriors museum”, Information and Management. doi: 10. 1016/j.im.2020.103271. Yeow, A., Soh, C. and Hansen, R. (2017), “Aligning with new digital strategy: a dynamic capabilities approach”, Journal of Strategic Information Systems, Vol. 27 No. 1, pp. 43-58.

Youngjin, Y., Richard, J., Lyytinen, K. and Majchrzak, A. (2012), “Organizing for innovation in the digitized world”, Organization Science, Vol. 23 No. 5, pp. 1398-1408. .

Zeng, J. and Glaister, K.W. (2018), “Value creation from big data: looking inside the black box”, Strategic Organization, Vol. 16 No. 2, pp. 105-140.


Further reading

Evans, P.C. and Annunziata, M. (2012), Industrial Internet: Pushing the Boundaries of Minds and Machines, General Electric, Technical Report, available at: http://www.ge.com/docs/chapters/Industrial_Internet.pdf (accessed 6 November 2014).

Evans, S.K., Pearce, K.E., Vitak, J. and Treem, J.W. (2017), “Explicating affordances: a conceptual framework for understanding affordances in communication research”, Journal of Computer-Mediated Communication, Vol. 22 No. 1, pp. 35-52.

Fayard, A.L. and Weeks, J. (2014), “Affordances for Practice”, Information and Organization, Vol. 24 No. 4, pp. 236-249.

Geng, D., Zhang, C., Xia, C., Xia, X., Liu, Q. and Fu, X. (2019), “Big data-based improved data acquisition and storage system for designing industrial data platform”, IEEE Access, Vol. 7, pp. 44574-44582.

Grgecic, D., Holten, R. and Rosenkranz, C. (2015), “The impact of functional affordances and symbolic expressions on the formation of beliefs”, Journal of the Association for Information Systems, Vol. 16 No. 7, pp. 580-607.

Janssen, M., Voort, H. and Wahyudi, A. (2017), “Factors influencing big data decision-making quality”, Journal of Business Research, Vol. 70, pp. 338-345.

Kane, G.C. (2014), “How Facebook and Twitter are reimagining the future of customer service”, MIT Sloan Management Review, Vol. 55 No. 4, pp. 29-36.

Kane, G.C. (2017), “Big data and IT talent drive improved patient outcomes at Schumacher clinical partners”, MIT Sloan Management Review, Vol. 59 No. 1, pp. 39-45.

Ketokivi, M. and Choi, T. (2014), “Renaissance of case research as a scientific method”, Journal of Operations Management, Vol. 32 No. 5, pp. 2-240.

Lehrer, C., Wieneke, A., vom Brocke, J., Jung, R. and Seidel, S. (2018), “How big data analytics enables service innovation: materiality, affordance, and the individualization of service”, Journal of Management Information Systems, Vol. 35 No. 2, pp. 424-460.

Leonardi, P.M. (2011), “When flexible routines meet flexible technologies: affordance, constraint, and the imbrication of human and material agencies”, MIS Quarterly, Vol. 35 No. 1, pp. 147-168.

Leonardi, P.M. (2013), “When does technology use enable network change in organizations? A comparative study of feature use and shared affordances”, MIS Quarterly, Vol. 37 No. 3, pp. 749-775.

Maedche, A. (2016), “Interview with michael nilles on ‘what makes leaders successful in the age of the digital transformation?’”, Business and Information Systems Engineering, Vol. 58 No. 4, pp. 287-289.

Majchrzak, A. and Markus, M.L. (2012), “Technology affordances and constraints in management information systems (MIS)”, Encyclopedia of Management Theory, SAGE Publications, Thousand Oaks.

Pan, S.L., Newell, S. and Cui, L.L. (2016), “The emergence of self-organizing E-commerce ecosystems in remote villages of China: a tale of digital empowerment for rural development”, MIS Quarterly, Vol. 40 No. 2, pp. 475-484.

Strauss, A. and Corbin, J.M. (1998), Basics of Qualitative Research: Grounded Theory Procedures and Techniques, Sage Publications, Thousand Oaks, CA.

Treem, J.W. and Leonardi, P.M. (2012), “Social media use in organizations: exploring the affordances of visibility, editability, persistence, and association”, Annals of the International Communication Association, Vol. 36 No. 1, pp. 143-189.

Waller, M.A. and Fawcett, S.E. (2013), “Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management”, Journal of Business Logistics, Vol. 34 No. 2, pp. 77-84.

Zammuto, R.F., Griffith, T.L., Majchrzak, A., Dougherty, D.J. and Faraj, S. (2007), “Information technology and the changing fabric of organization”, Organization Science, Vol. 18 No. 5, pp. 749-762.

Zhang, Q., Yang, L.T., Chen, Z., Li, P. and Bu, F. (2019), “An adaptive dropout deep computation model for industrial IoT big data learning with crowdsourcing to cloud computing”, IEEE Transactions on Industrial Informatics, Vol. 15 No. 4, pp. 2330-2337.

Appendix

Q: Could you explain CPC’s industrial big data?

A: Our industrial big data is dynamic, not static. We develop a large number of software to record the corresponding data from entire life cycle of the equipment and device and replace the traditional manual recording. Industrial big data is useful to help enterprise customers realize their goal, such as security, reliability, operational efficiency or return on investment. Data analysis service is one of our major advantages. We collect and analyze data according to the customers’ equipment management objectives, then select the key points of equipment to install sensors. In this way, the key parameters and standard indicators that affect the operation of equipment are collected. We can clearly know what is the most useful data of equipment and avoid collecting the slack data of equipment blindly.

Q: What’s the relationship between industrial big data and CPC’s products and service?

A: At the beginning, we provide equipment management systems for individual companies. Later, many companies gradually became grouped. We then develop many software systems for large group companies. These group equipment management systems cover a wider range of different industries and enriched our data resource. Based on the diversify data resource, we can analyze equipment status, sort out equipment management problems and develop customized equipment management systems for every enterprise according to their actual conditions; we also provide education and training (such as equipment maintenance management concept, equipment operation management and evaluation, inspection and maintenance technical standards), maintenance and other services.

Q: Could you explain more about the standard and customized equipment management?

A: For SME customers, their equipment needs are relatively simple and tend to be homogenized, we provide standardized products (such as EAM2012, EAM2015, etc.), at the same time we often improve and upgrade the existing equipment management software EAM system (such as EAM11g, EAM2012, EAM2015, etc.) according to the market demand, we also develop a series of mobile device management software, which could send work tasks directly to the operator’s smartphone device, enabling the operation and maintenance worker to realize mobile office at any time in any place. For large-scale group enterprise users with variety of equipment, and complicated equipment management process, we provide comprehensive management and control service for them in the form of customized projects. For example, China National Petroleum Pipeline (CNPP) is one of the most complex pipeline network systems in the world (referred to as “pipe network system”). The effective operation of the pipe network system requires a large number of modern high-precision equipment. We integrate the international industry standards and also combine their equipment conditions, research and develop an advanced oil and gas pipeline network equipment management platform for CNPP.

Q: How to attract customers to join CPC’s industry Internet platform?

A: Our idea is to weave a small network and then let the company to expand the network based on their own needs. Every company could establish a special “channel.” One channel is a representative of a class industrial enterprises. After the channel is established, the channel’s upstream and downstream enterprise customers will develop into users of the platform, forming a grid-like grid gradually.

Q: CPC’s platform provides what services for enterprise users?

A: In terms of procurement, we provide users with Sunshine Cloud procurement services. The platform provides online inquiry, bidding and expands the cooperation network of enterprises through various transactions. It also provides predictive maintenance service, we can monitor the operation of the device in real time, arrange maintenance management in time and provide predictive maintenance based on the data acquisition and transmission device and sensors installed on the user’s equipment. In terms of sales, the platform can help users to achieve accurate matching in transactions, sales expansion, production coordination, after-sales/maintenance services. In terms of industrial credit, we already built an enterprise credit evaluation system including a business health assessment and early warning analysis that meets the bank’s risk control standards. This system can build financial institutions’ trust in small enterprises and is helpful in enabling parties that are excluded from existing financial services. Built on this system, the platform promotes credit transactions between banks and enterprises.

Corresponding author Zuopeng (Justin) Zhang can be contacted at: [email protected]



The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/0025-1747.htm


Received 31 January 2020. Revised 21 March 2020, 11 April 2020 and 1 May 2020. Accepted 14 May 2020.

Antecedents to firm performance and competitiveness using the lens of big data analytics: a cross-cultural study

Abhishek Behl
Shailesh J Mehta School of Management, IIT Bombay, Mumbai, India

Abstract

Purpose – The study aims to understand how the big data analytics capabilities of tech startups help them gain competitive advantage and improve their firm performance. The study is performed for two countries, India and China, and includes a comparative analysis.

Design/methodology/approach – The study collected responses from tech startups in both India and China. A total of 502 responses were collected, 269 from India and 233 from China. The results were analyzed using Warp PLS 6.0 after testing for common method bias, endogeneity and reliability of data. The study tested five primary hypotheses and the effect of two control variables: the country of origin and the age of the startup.

Findings – We found that big data analytics capabilities have a positive and significant impact on the firm performance and competitive advantage of tech startups. While organizational culture proved to have a positive impact as a moderator, innovation was found to have a non-significant effect. The age of the firm was also found to have a non-significant effect, while its country of origin plays an important role in defining its success.

Originality/value – The study offers key insights for tech startups operating in two countries that are geographic neighbors but differ in tech expertise. Moreover, the study offers key insights into how the country of origin contributes significantly to explaining the success and competitiveness of the firm.

Keywords Big data analytics, Competitive advantage, Firm performance, Startup, Innovation, Organizational culture

Paper type Research paper

Management Decision Vol. 60 No. 2, 2022 pp. 368-398 © Emerald Publishing Limited 0025-1747 DOI 10.1108/MD-01-2020-0121

The author would like to thank and acknowledge Dr Rameshwar Dubey for his constant mentoring and significant contribution to the field of Big Data Analytics and organizational theory.

1. Introduction
The performance of firms has always been of interest to management scholars. The changing dynamics of business are driven mostly by transformations in intellectual capital, advances in technology and the changing roles and relevance of stakeholders, all of which have made firms more competitive. Darwin’s theory of “survival of the fittest” aptly describes such a competitive environment. A large part of competitiveness is driven by the kind of resources a firm possesses and, more importantly, by how it builds upon and uses them to expand its business (Anwar, 2018). Management scholars have recently emphasized the importance of strategic decisions, most of which are driven by data, for gaining a competitive edge over rivals (Raffoni et al., 2018; Raguseo and Vitari, 2018). A recent report by Gartner confirms that established firms with larger market capitalization face less risk and are often more averse to change than firms with small market capitalization. The dynamic nature of markets and uncertain business conditions make it difficult for young firms to take appropriate strategic decisions at times, which leads to a further dip in their market capitalization and, in extreme cases, drives them out of business (Pusala et al., 2016; Kubina et al., 2015). Firms that have survived in such a dynamic and competitive environment are the ones that have used innovation as their key to success (Jeble et al., 2018; Charles and Gherman, 2013). The demise of brands like Nokia and Kodak illustrates how a lack of innovation and a failure to change business practices over time can lead to complete failure. Another critical success factor is data, which is the new oil for most businesses (Beyer and Laney, 2012). Firms have gained competitive advantage and built strategic frameworks using big data analytics (BDA), machine learning, artificial intelligence, blockchain technologies, cloud computing, etc. (Akter and Wamba, 2016; Behl et al., 2019). This has also led firms to operate in the world of Industry 4.0 (Bag et al., 2020b). As all these factors help in examining the success of firms, there is a need to understand the behavior of tech startups in an environment that is competitive, driven by innovation and offers huge scope for the application of data analytics. Blank (2020) defines a startup as “an organization formed to search for a repeatable and scalable business model.” While the definition points to repeatability and scalability as its two primary attributes, these are in turn attributed to intellectual capital and competitiveness (Cavoukian and Castro, 2014). Unlike established firms, where risk appetite is higher, startups behave differently in understanding and reacting to competitiveness and score differently when it comes to practicing innovation. A large part of their functioning can be explained by their sector and the immediate competitors they face and, more importantly, by their readiness to change and adapt to the growing needs of customers/clients.
Studies also claim that entrepreneurial strategy and intellectual capital act as a foundation for startups, as they significantly affect competitiveness and sustainability in business (Chen, 2019; Davenport and Bean, 2018). Compared with external factors, internal factors play a more significant role in explaining the performance-level indicators of a startup (Furtardo et al., 2017). Knowledge management also acts as a catalyst for change in the thoughts and actions of startups, as it infuses a sense of sustainability into the actions they take. With multiple theories and factors explaining how startups work and react, it is worthwhile to explore this area for geographies that have seen high growth in tech startups in recent years. India and China are two such nations: their startup growth rates are broadly similar and, moreover, their startups are believed to behave similarly when exposed to competition (Behl et al., 2019). Studies have debated why Indian markets are not ready to compete with China; most of this debate, however, concerns the technology investments these countries have made in setting up and growing their e-commerce markets. There is no standard recipe for the success of a startup but, drawing upon the lens of Whetten (1989), Sutton and Staw (1995) and Wacker (1998), it becomes important to ask three questions in the context of startups: what, why and how. Earlier studies have mostly attempted to answer one of these three questions, using different methodologies in different parts of the world (Gupta and George, 2016). A large part of these studies was conducted in geographies that are technologically advanced, rich in knowledge and offer intellectual capital by human capital outsourcing.
On the flip side, the rate of growth of startups in these countries is reported to be lower than in nations like India and China, where growth and development parameters indicate scope for new businesses. The recent launch of “Startup India” in 2015 and the ongoing “Made in China 2025” have helped the best brains and ideas to germinate and grow in their home countries. While both countries are gearing up for a startup marathon, there is a structural difference in the sectors they have chosen. Indian startups largely rely on the four pillars of education, ride-hailing, e-commerce and online payments, whereas China invests its ideas in developing artificial intelligence, robotics and digital money like Bitcoins. Other sectors also feature in the list, but their order of appearance differs between the two countries. This calls for understanding two economies that are aiming toward the same goal of nurturing startups but differ in the paths they take and in the output they want to achieve or will achieve. The study, therefore, aims to answer the following research questions:

RQ1. What role do the data analytics capabilities of a firm play in enhancing firm performance (FP) in the context of Indian and Chinese tech startups?


RQ2. How do organizational culture and innovation moderate the relationship between data analytics capabilities and FP in Indian and Chinese startups?

To answer these questions, we investigate the antecedents to the success of startups in India and China through the lens of dynamic capabilities view (DCV) theory and examine the role of competitiveness and BDA in explaining better FP across the two countries. DCV is an extension of resource-based view (RBV) theory, which proposes that firms’ internal capabilities contribute more significantly to superior performance than external capabilities (Barney, 1991). Extending the theoretical arguments of RBV into DCV, we propose to study the relationship between the big data analytics capabilities (BDAC) of startups and competitive advantage, which could then be used to explain FP. The proposed relationship is moderated by organizational culture and innovation, while the entire framework is controlled for the age and country of origin of the startup.

The rest of the paper is organized as follows. Section 2 discusses the theoretical underpinning, reviews the key literature and operationalizes the constructs along with their contextual references. Section 3 discusses the hypotheses and proposes a theoretical framework. Section 4 elaborates on the research design, the nuances of data collection and the preliminary results of data validation. Section 5 presents the results of the study for both Indian and Chinese startups. Section 6 discusses the results for each country and compares them, with justifications and reasoning. Section 7 offers a critical debate on how the study contributes to theory and practice and discusses the shortcomings of the study along with its future scope. Section 8 concludes the study.

2. Literature review
The journey of any entrepreneurial firm into the startup space is driven by multiple factors, while its success is driven by factors like vision, leadership, financial strength and technology support (Li et al., 2019). There are also factors that are not in the control of startup firms, such as the assessment of and action on competition, the degree of novelty of the idea, and the rules and regulations laid down by the industry and the government of the geography of operation (Spender et al., 2017; Tellis et al., 2009). One measure of such factors is “Ease of Doing Business,” promoted by the World Bank since 2003. Countries like New Zealand, Singapore and Denmark consistently feature in the top three, while other countries have climbed the rankings significantly. India and China are two such countries. India, which stood at rank 142 in 2015, climbed to rank 77 in 2019 and rank 63 in 2020, while China moved from rank 90 to rank 46 in 2019 and rank 31 in 2020 over the same period. The World Bank uses a five-step process to measure “Ease of Doing Business”: opening a business (starting a business, employing workers); getting a location (dealing with construction permits, getting electricity, registering property); accessing finance (getting credit, protecting minority investors); dealing with day-to-day operations (paying taxes, trading across borders) and operating in a secure business environment (enforcing contracts, resolving insolvency) (World Bank Report, 2020). Over the past five years, both countries have seen a significant surge in new businesses, which has helped their economies to grow significantly. According to the statistical data in the World Bank report, the Indian startup market has raised $50 billion across 3,700+ deals. The startup ecosystem has already yielded more than 500 acquisitions and created close to 750,000 jobs in Mumbai, Delhi and Bengaluru, and many startups have also expanded operations into multiple countries. India is the third-largest startup economy after the USA and China. This also makes China an interesting economy to study. With India and China both among the top three startup economies and both gaining significantly in the “Ease of Doing Business” rankings, it is worthwhile to understand two key aspects: competition within the country and competition among firms. India has created some notable unicorns in the recent past, including Delhivery, BigBasket, Ola Electric, Dream 11 and Rivigo, to name a few. China and the USA, meanwhile, account for over 80% of the world’s known unicorns, despite representing only half of the world’s GDP and a quarter of the world’s population. It is therefore worthwhile to examine the startup cultures of both India and China through appropriate theoretical and practical lenses. Studies have explored the success and failure of startups from different theoretical viewpoints, such as human capital theory, expectancy theory, stakeholder theory, crowdfunding theory and prospect theory (Tsai et al., 2015; Wamba et al., 2015). The literature on startups can be further classified by technology and sector. It is, therefore, important to study the success factors of startups through a systematic review of the literature (Tranfield et al., 2003) and to narrow them down to the context of India and China. The next step is to study them using an appropriate theoretical lens and to propose hypotheses. The subsequent sections follow this flow.

2.1 Underpinning theory
Dynamic capability view (DCV) theory has its roots in RBV theory and has been applied in the strategic management literature (Vogel and Güttel, 2013; Makadok, 2001).
The theory has been widely used to understand the economic competitiveness of the firm through a distinct and unique mechanism of capacity building, which is itself considered a resource (Lawson and Samson, 2001; Vogel and Güttel, 2013). Teece et al. (1997) proposed that RBV theory can help explain how a firm makes a distinguishing mark in the market with its available set of resources. The concept was further supported by the arguments of Wernerfelt (1984), who addressed the importance of a firm’s resources for gaining competitive advantage over its competitors. It is also documented that RBV supports the argument that a firm adds economic value by working strictly on its internal resources, developing capabilities for its external resources and deploying them strategically against competitors (Kim et al., 2015). RBV has been applied widely: studies confirm its application in developing and developed countries, for firms across sectors and specializations, and for resources related to information and communication technology, BDA, blockchain technology, etc. The theory explains the utilization and importance of resources for firms but offers a weaker understanding of dynamic settings, in which there are multiple players and each works with its own set of resources (Kraaijenbrink et al., 2010). Wang et al. (2016) concluded that RBV can be extended using DCV, which helps explain competitive advantage at the firm level and also instills innovation in firms. The theory holds relevance for firms across sectors in an oligopolistic market and, moreover, for firms that have entered the competition recently.
A digital startup faces similar issues, as it needs to compete both with players of similar maturity and with firms that are more mature and have more resources (Stubbs, 2014; Freeman and Engel, 2007). DCV is therefore a better theory for explaining the digital startup market, as its players are more dynamic than offline startups, for which competition is localized and often bound by geography. It is also understood that online startups have to work constantly on improving their technical support in order to understand omnipresent customers and how their competitors are using technology (Duan and Xiong, 2015; Carlson and Usher, 2016). According to Teece et al. (1997), dynamic capabilities encapsulate the firm’s capabilities to deploy, integrate and upgrade competencies at both the internal and the external level. Lawson and Samson (2001) further added that dynamic capabilities help firms to raise their profits substantially by keeping a check on the firm’s capabilities in an uncertain environment, which holds true in today’s scenario, in which online startups are increasing at a rapid pace. Thus, DCV can offer stronger theoretical support in explaining firms’ ongoing investment practices in resource building, their market analysis and their actions to remain competitive without compromising on FP. The application of DCV best suits the context of online startup firms, which face issues like risk and volatility and have to focus on tangible and intangible (Chen et al., 2012) as well as internal and external resources to remain competitive (Erevelles et al., 2016), or else they will not survive in business. The study, therefore, uses the theoretical foundations of DCV to explain how BDAC helps firms to gain competitive advantage, which then positively impacts a firm’s performance. We also examine innovation and organizational culture as moderators, which are key indicators in explaining firm performance and the competitive advantage that startups gain in the online competitive space. The next section discusses the scope and operationalizes the constructs in the context of the present study and details the importance of the moderating variables in the proposed hypothetical model.

2.2 Big data analytics capability
BDA has been in the industry and in practice for a while now.
The width and depth of BDA have increased over time, largely because of the increase in the scale and magnitude of data and the availability of better computing processors. Studies have predicted diversity and variety in the use of big data and predictive analytics (BDPA) for creating a niche in the competitive space for every firm (Wamba et al., 2015; Akter et al., 2016). BDPA also offers key insights to businesses by lending strong statistical support to interdisciplinary studies, which can help businesses grow (Dubey et al., 2018). Analytics, coupled with the 5 V’s of big data, enables raw data to be analyzed through descriptive, inferential and prescriptive lenses (Nguyen, 2018). Recent studies have also claimed that BDPA can offer insights to any firm by churning complex and unstructured data into meaningful information (Shamim et al., 2019; Sun et al., 2018). Studies also claim that BDAC is an organizational capability that firms use to gain a competitive advantage in a dynamic environment (Wamba et al., 2017; Gupta and George, 2016). Firms practice this by engaging big data resources, in the form of machines and specialists, to churn data and capture more customers and/or retain existing customers, in order to gain market share and in turn improve their profitability. Extending the learnings of DCV theory, we can therefore infer that a startup firm would also have to rely on two key factors, machine and manpower, to develop its BDAC (Winter, 2003; Schoenherr and Speier-Pero, 2015). Thus, BDAC can be operationalized as the combination of data and skillset (both technical and non-technical) for making meaningful and timely predictions about the market, which in turn help firms strategize better against their competitors. The subparts of BDAC are briefly discussed below.

2.2.1 Data and its importance. Data is the new oil that runs businesses, and its abundance rather than its scarcity has been the challenge.
Firms leverage its benefits to gain a competitive edge over their competitors. The concept of “big data” has been in the industry for a very long time, but its popularity and operational definition have changed quite recently, which makes it attractive for every business (Mishra et al., 2018a; Mikalef et al., 2018). The transition from the 3 V’s (volume, variety and velocity) to the 5 V’s has added two key elements: “value” and “veracity.” The magnitude and speed of data are now complemented by its value, which has helped firms gain an edge over others (Galina, 2013). One of the most important results firms look for is the capacity of data to reveal trends, help predict the future and explain the dynamic nature of businesses (Galina, 2013; Dubey et al., 2019a, 2019d). BDPA has also helped firms, especially startups, to understand the hidden layers of information and to place their product/service on a par with existing competitors. Studies also reflect BDAC as a key resource for strengthening core operations and thereby strategizing next steps (Ghezzi and Cavallo, 2018; Liu, 2014). The popularity of and need for BDAC have led firms to hire specialists. Reports have also claimed that 85% of tech startup firms stress understanding BDPA to gain market insights and use tools to extract information for their business. With data comes an urge to understand its nuances and, moreover, to hire resources who understand its language (McAfee et al., 2012; Batistic and van der Laken, 2019). Thus, while data is important, it alone cannot solve the issues of a company.

2.2.2 BDPA skillset and its importance. BDPA skills are hard to possess because appropriate resources are scant. Studies have discussed that firms need both managerial and technical skills to ensure that BDPA tools are harnessed to the best of their capacity (Gupta and George, 2016; Mikalef et al., 2020). It is also reported that, in business expansion and especially in new business setup, managers and technical staff should understand data from its generation phase to its analysis phase. Technical skills are something every firm strives for, and it is more difficult for a startup to build a team of highly trained technical staff because of the uncertain nature of its business growth (Nemati and Khajeheian, 2018).
Startups face challenges in hiring, compensating and retaining technical resources, as their business operations are governed by revenue while, at the same time, they have to be competitive in the market (Prescott, 2014, 2016). The available technical workforce is also underprepared: most candidates are academically strong but lack relevant practical exposure, which is a further concern for startups while hiring (Rai and Tang, 2010; Kabir and Carayannis, 2013). Technical human resources are also difficult to retain because they are in high demand, which makes them vulnerable resources. Thus, the success of any startup depends on a technical team that understands both data and business well enough to present the key insights the firm needs to be competitive. Firms also need to invest substantially in upgrading the knowledge and skills of their technical workforce to stay competitive (Hansen and Wernerfelt, 1989; Abbasi et al., 2016; Akter et al., 2016; Dubey et al., 2019b, 2019c). While technical skills offer brains to a business, managerial skills offer it smartness and personality. Managerial skills are a result of working experience with multiple clients and in multiple roles for a considerable amount of time at different or similar firms (Abbasi et al., 2016; Mikalef et al., 2020). The success of any BDPA project depends on the managerial skills of the team; a large part of it is driven by how clearly they understand the goals and how they strategize to achieve them with the given resources (Cavoukian and Castro, 2014). Managerial skills are also a function of time and of the reward that an employee looks forward to. Thus, for any startup, embedding these skills into its culture is relatively tough, as it has spent little time in the competitive space (Ngai et al., 2017). 
It is also postulated that startups face the risk of hiring resources who are either promising only on paper or are far from understanding the complications of BDPA and its importance in driving their business (Chen, 2019; Ghezzi and Cavallo, 2018). BDAC is largely dependent on how well the managerial functions are handled by the team and on how well the team can translate its skills into profits and into being treated as a competitor by others (Bag et al., 2020a). It can thus be summarized that, apart from the data generated by machines, the data managers are the ones who actually sustain the life of any business.

Big data analytics in cross-cultural study 373

MD 60,2


2.3 Firm performance
Any firm in its sector is usually judged by its performance and market capitalization. Studies have documented that FP is a key indicator of the growth as well as the financial and intellectual health of the company (Gunasekaran et al., 2017; Hansen and Wernerfelt, 1989). It is also noted that companies often invest in both tacit and explicit knowledge to raise their intellectual capital and therefore their market share (Mishra et al., 2018b). Therefore, both market share and financial health form key indicators for assessing a firm’s performance. Various proxies are used in the academic literature for FP (Mithas et al., 2011). Of these, studies have stressed the importance of operational performance and market performance in measuring a firm’s performance (Gupta and George, 2016; Rai and Tang, 2010). Both parameters are also used in the context of startups, where they help measure financial and operational performance and thereby help startups understand how they stand among their peers. Market performance is crucial for any online startup to understand its share of existing and potential customers. A firm scoring high on market performance alongside a strong and significant rise in operational performance minimizes the risk of losses and shutdown (Ngai et al., 2017). Financial performance, which is often used as a proxy for financial health, helps confirm that startups have started to gain attention and that there is a constant flow of money. As startup firms compete not only with their digital twins in terms of ideas but also with their fathers and forefathers, which operate at a wider scale of business, it becomes more important for these firms to raise their performance score as much as they can (Bag, 2017).

2.4 Competitive Advantage (CA)
There is a steep rise in entrepreneurial activities and the expansion of businesses. 
Firms, especially when they enter the market with a new business model, aim to understand the market and, in the long run, want to capitalize on it (Prescott, 2014, 2016). Firms invest in both tangible and intangible resources to stay competitive and, moreover, to be agile (Stubbs, 2014). Sustainable business operations help firms achieve excellence, while competition helps them improve constantly. Studies have also stressed the importance of ongoing competition within and outside the firm to ensure productivity and profits (Worster et al., 2014). It is also postulated that firms with an innovative idea and a first-mover advantage tend to be more competitive than their followers. Businesses operated on digital platforms face bigger issues when it comes to customer retention (Bradlow et al., 2017) because of the volatility of customer needs and the presence of existing players with a larger and more effective workforce. Kabir and Carayannis (2013) have discussed the need to standardize metrics for measuring competitive advantage in order to create and maintain benchmarks against other players. It has recently been discussed in the e-commerce literature that while innovation drives new businesses, the spirit of competitiveness has helped them grow and flourish (Palem, 2014). It is therefore worthwhile to explore how competitive advantage can help digital startups create and sustain their niche among their peers.

2.5 Organizational Culture (OC)
Organizational culture is a key driver behind the success of any organization. Studies have shown that organizational culture is the outcome of employees and their relationships among themselves and with customers/clients. It is also shown that a well-nurtured and cultured organization performs better and contributes toward sustaining the practice of excellence (Davenport and Bean, 2018; Dubey et al., 2019a). 
Recent works have also explained that tangible and intangible resources help build organizational culture, as they inculcate a habit of understanding the recent practices that the industry is using (Dubey et al., 2019b; Frisk and Bannister, 2017). It is also seen that organizational culture has been pivotal in helping firms adopt and implement BDPA

projects (Teece, 2015; Jeble et al., 2018). Studies have also highlighted the importance of key decision-makers and their motives as a prominent factor in shaping the culture of any organization (Frisk and Bannister, 2017). Unlike firms that have matured over time and stabilized their organizational culture, startups have a relatively short life in the market and are still in the phase of building their organizational culture. Their team sizes are also smaller, and the business models of startup firms differ from those of giants. Therefore, in adopting and using big data and predictive analytics tools, the internal culture of employees and owners plays a significant role (Simon, 2013; Trabucchi and Buganza, 2019). The key stakeholders in startups also face the risk of choosing between investing in their existing resources in order to upgrade them and buying new resources to replace the old ones. While they strive hard to build their culture of work, they are also bound by a dynamic environment around them. Studies on the importance of inter- and intrapersonal skills (Hofstede, 1998) and on the rational choices made by employees in evaluating the need for any technology help in understanding how organizational culture promotes FP (Shamim et al., 2019) and/or helps firms survive in the competitive space. Of the many and varied classifications for studying organizational culture, we have adopted and extended the work of Dubey et al. (2017) to understand the role of transaction-based or relation-based culture in the adoption behavior of BDPA in startup firms. We have used the cultural dimension as a moderating variable to understand the role of BDAC in explaining FP and competitive advantage. 
2.6 Innovation (IN)
Innovation in its contemporary form is defined as the introduction of a new product or a qualitative change in an existing product, a process innovation new to the industry, the opening of a new market, the development of new sources of supply of raw materials or other inputs, or changes in industrial organizations (Rogers and Rogers, 1998). Firms have used innovation as their primary tool for creating their market space and have continuously worked toward improving product or service innovation. Many companies have also used knowledge as a key resource that helps them innovate, while others have used technology to distinguish themselves from the masses. Innovation also involves the creation of entirely new knowledge as well as the diffusion of existing knowledge (Spender et al., 2017; Tellis et al., 2009). More recently, studies have also discussed how innovation helps firms achieve better FP (Weiblen and Chesbrough, 2015). Recent studies of startups have proposed innovation as a key indicator of their success (Carlson and Usher, 2016). Moreover, the rate and direction of innovation have also been described as a catalyst in explaining their growth. Most startup firms are founded on an idea that symbolizes innovation, and their success is a function of fueling that innovative idea with resources. It is also observed that tech startups combine innovation with new or improved technology to address practical gaps (Chen, 2019; Oliva and Kotabe, 2019). Both internal and external innovation help firms survive, capitalize on market share and gain a competitive advantage. BDPA is also seen as an innovative practice for tech-based startup firms, with applications including fraud detection, customer engagement, understanding behavioral intentions, analyzing social media patterns and investment decisions (Kim et al., 2015). Sun et al. 
(2018) assert that both product and service innovation would be essential for any online startup to survive irrespective of its geography and magnitude of customers. In light of the above, this study aims to understand how innovation helps explain competitive advantage and FP.

3. Theoretical framework and hypothesis development
The study applies the theoretical foundations of DCV and proposes to understand how BDAC can impact competitiveness and FP for digital startups. In order to discuss the arguments


proposed in this study, we referred to earlier studies that have used DCV theory to discuss similar arguments from a firm’s perspective across various countries and timelines. We performed a systematic review of the studies that have discussed one or more such hypotheses for different technological interventions as well. Capabilities and resources are the key ingredients of DCV: capabilities are required to improve the performance and productivity of other resources, while resources refer to the knowledge, technology and human resources of a firm. We also discuss how the application of DCV is relevant and apt for understanding the nature of digital startups, as they have features resembling existing firms but face a different degree of risk. It is also important to understand that investing in a technology, and moreover using it to understand market share, is not a regular activity for startups. In light of the existing argument, we discuss each of the primary proposed hypotheses along with the interacting and moderating effects of “innovation” and “organizational culture” for digital startups in different countries. Figure 1 presents the proposed hypothetical framework for the study.

3.1 Positive relationship between BDAC and FP
Akter et al. (2016), in their seminal review article, discuss the role of BDPA in improving FP, which was further tested by Wamba et al. (2017). The literature on BDPA and FP shows that new and improved technology interventions have helped firms perform better and increase their efficiency. Gupta et al. (2018) also discussed how cloud ERP and BDPA positively impact FP. 
The literature also supports a positive and significant impact of firms’ big data capabilities on financial performance indicators such as return on investment, return on equity, sales revenue and market capitalization, and these capabilities have also helped in the efficient management of customers/clients (Wamba et al., 2017; Gupta and George, 2016). It is also observed that technology-driven e-commerce startups are developing their big data capacity because they have seen promising results from established competitors in their respective businesses. There is limited evidence on why and under what conditions companies adopt and implement BDPA in their operations (Wang et al., 2016), but it is well established that there is a positive impact on their market and

Figure 1. Theoretical framework with proposed hypothesis

operational performance post-adoption (Vitari and Raguseo, 2019). Srinivasan and Arunasalam (2013) tested a similar argument for the healthcare sector and found that BDPA has helped reduce healthcare waste and fraud. Dubey et al. (2017) also found positive and significant results in a humanitarian context. As tech-based firms generate larger and more frequent data in their business operations and are, moreover, more prone to performance risks, there is a need to explore the relationship in their context. It is also important to assess the same for different sectors, as this would help generalize the theoretical implications of DCV for tech startups. We, therefore, hypothesize that:

H1. There is a positive significant impact of BDAC on the FP of tech startups.

3.2 Positive relationship between BDAC and competitive advantage (CA)
Big data and its scope have changed the ways in which, and the reasons why, firms adopt it. Most firms began the process of adoption because of expanding operations, and studies claim that larger corporations earlier treated it as a luxury. With the passage of time, studies have documented its transition from a luxury to a necessity that has helped firms gain CA (Kubina et al., 2015; Mikalef et al., 2020). It is also shown that BDPA capabilities are outsourced, as firms face concerns regarding hiring and training resources for getting deeper insights. It is also reported that firms have gradually drifted from hiring third-party resources to building technical expertise within the firm to ensure customized and on-demand solutions (Nemati and Khajeheian, 2018). The growth in the size and magnitude of data has also forced firms to use BDAC to remain competitive. Drawing from DCV theory, BDAC has proved to be a key resource for firms to expand their business (Vogel and Güttel, 2013; Wamba et al., 2017). 
The positive effect of using BDAC in gaining market share has been reported for firms in China, Australia and Europe. While most earlier studies have looked at firms developing BDAC capacities internally because they want more control, BDAC becomes a necessity for new ventures that lack capital and human resources (Gupta et al., 2018; Winter, 2003). Tech-based startups often produce as much data as their mature and older competitors but lack the depth and breadth of BDAC resources. Davenport and Bean (2018) also reported that e-commerce firms face the risk of losing business if they underperform consistently, which forces them to be highly competitive. We propose to test the relationship between BDAC and CA in such a context in order to understand how similar the results are to those of previous studies. We therefore hypothesize:

H2. There is a positive relationship between BDAC and CA for tech startups.

3.3 Positive impact of CA on FP
Firms strive hard to gain a CA among their peers and competitors by adopting appropriate strategic moves. Some of the early literature on competitiveness proposes the importance of understanding the consequences of competition in understanding the growth of the firm (Teece, 2015; Wamba et al., 2017). Studies have reported that FP is measured by the operational and market performance of the firm but is not controlled by the working of the firm alone (Wang et al., 2012). As firms exist in an oligopolistic market, their performance is also driven by how others in similar or closely related businesses are performing. Building on the theoretical arguments of DCV theory, it has been found that competitiveness drives FP in a positive manner (Gupta et al., 2018; Dubey et al., 2019a). Market conditions are often used as a driver for understanding competitiveness, but as firms go global, the expansion of markets and businesses has also become difficult to measure. Bradlow et al. 
(2017) performed a similar analysis in the hotel industry and found a significant positive effect of CA on FP. Drawing from the logic of DCV theory and earlier arguments, we aim to test the same logic for tech startup firms. The theoretical argument is worthwhile to


explore as the online business market behaves differently when it comes to understanding competitiveness. Shamim et al. (2019) also discussed that, as startups are new in any sector, they strive hard to achieve excellence and want to capture as much share as possible. It is also recorded that startups usually have a novel idea or technology that helps them gain market share, and thus the scale of competitiveness holds differently than for firms operating in offline markets. It is, therefore, worthwhile to test the hypothesis:

H3. There is a positive impact of CA on FP of tech startups.

3.4 Moderating effect of organizational culture
Businesses have strived hard to develop a work culture that helps them improve their market share, gain operational excellence, improve customer retention and earn consistent profits with lower churn propensity. Management and psychology scholars have worked on understanding the growth trajectory of firms that have improved and improvised their organizational culture (Frisk and Bannister, 2017). While the main drivers of organizational culture are employees and the leadership under whom they perform, it has recently been reported that, like an individual’s cultural traits, organizations also possess certain traits that help them set their goals and achieve them. As identified by many scholars (Dubey et al., 2017), organizational culture can be classified by control orientation (rational and hierarchical culture) and flexible orientation (group and development culture), and each has different effects on organizational performance (Gupta et al., 2018). The operational boundaries of firms have also been reported as a key indicator controlling the performance and culture of work. Key examples include Toyota and Honda, which differ in their place of origin and are similar in their business but have different values and work cultures. 
The growth of firms and their cross-functional roles in different geographies have affected their working style and culture (Nguyen, 2018). However, this change has not affected the way firms leverage their CA based on their working style. Some cases that have shown interesting results are comparisons between brands such as Apple and Microsoft, Ford and Ferrari, and Tata and Jaguar. Studies have shown that a flexible work culture helps firms gain a larger and more consistent market share and remain competitive in the business when compared to firms that have a controlled or closed working culture. Of late, firms have also discussed readiness to change and flexibility to adopt new knowledge and technology as a part of organizational culture. Changing competitive markets and a growing customer base have also led firms to change their pattern of investments, whether by choice or by force, so that they can book more profits by offering similar products/services to customers/clients (Seggie et al., 2017; Sun et al., 2018). Dubey et al. (2017) have also postulated that a control-oriented working culture is distinct and should not be mistaken for an antonym of flexibility, as control can help firms achieve their defined goals in a stipulated time frame by investing in key resources. Resources and their scope have also evolved with time for firms and, moreover, for tech startups. For firms whose business model relies on the Internet and that do not operate in a customer-facing setting, investment in and use of digital tools such as cloud computing, big data resources and machine learning tools facilitate the flexibility to control their operations (Waller and Fawcett, 2013; Wang et al., 2012). To achieve higher growth and financial health, tech startups are required to have a suitable work culture and, more importantly, should be open to innovation in their firm. 
Moreover, their work culture would also help them hire and retain human resources and invest in demanding technologies such as “big data”, “artificial intelligence” and “blockchain”, which will in turn help them gain CA over their rivals. While earlier studies have argued that organizational culture acts as a catalyst for business, there is a need to test the same in a tech-based startup environment. We therefore propose:

H4a. There is a positive moderating effect of organizational culture on the relationship between BDAC and FP.
H4b. There is a positive moderating effect of organizational culture on the relationship between BDAC and CA.

3.5 Moderating effect of innovation
Innovation is a key driver for the growth of any firm. Product and service innovation have been considered two integral resources that every firm uses to gain profits and capture the market. With the advent of the Internet, both products and services are monitored and controlled through the online activities of customers, and firms spend a considerable amount of money and time to understand the effect of any innovative practice on their market capitalization and their financial and social performance (Weiblen and Chesbrough, 2015). Innovation has long been considered a source of constant motivation for firms to spend on resources that would help them book more profits (Freeman and Engel, 2007). Research-intensive firms have innovation as their highest capital, followed by firms in the service sector, while product innovation has the least visible impact for firms. Studies also confirm that innovation is mostly supported by the availability of resources (Ghezzi and Cavallo, 2018), which could take the form of human resources or technologies (Kim et al., 2015; Lawson and Samson, 2001). Recent studies of tech companies in developed countries such as Japan, Australia and the USA confirm that innovation adds to the intellectual capital of firms and helps them distinguish themselves from their competitors. It also helps companies attract more customers and makes the process smoother by reducing the turnaround time. Tech startups also leverage the same principle: they offer either services or products, many of which are short-lived and mostly aimed at ease of operations or ease of use for end-users (Prescott, 2016). 
Extending the logic of innovation diffusion theory, the study aims to understand how innovation impacts the competition and performance of firms in a cross-cultural setting. We therefore hypothesize:

H5a. There is a positive moderating effect of innovation on the relationship between BDAC and FP.
H5b. There is a positive moderating effect of innovation on the relationship between BDAC and CA.

4. Research design
We follow a systematic approach for collecting data and consulted experts at frequent intervals to avoid bias and minimize errors in performing the required analysis. This section discusses the steps used to collect the required sample, sampling adequacy and the tests used to evaluate the proposed hypotheses and report the relevant results.

4.1 Sample and data
We employed a survey-based approach for collecting data from startups in India and China. The survey instrument was sent to key stakeholders in each company, the details of which are discussed in the subsequent sections. The questionnaire was sent to the founding members of startups in India and China. The study picked random samples from databases such as Angel.co, which lists startups across countries. We developed a database of startups that were registered as companies between 2015 and 2017. This gives the companies an average life of 4 years, which is suitable for a venture to establish itself. A sample of 200 companies from each country was selected, and the questionnaire was sent to their owners or key decision-makers with the help of a market research agency.


4.2 Survey instrument
The theoretical framework used in this study is an outcome of a systematic three-step process. The first step began with studying existing scales (refer to Table A7) and understanding their suitability and applicability in the context of this study. The second step involved extracting constructs and defining them operationally in the context of the study. The selection and finalization of items was done iteratively by pre-testing the draft questionnaire and having its content validated by experts. The operational definitions and the scales from which the constructs are borrowed are discussed in Appendix B. Responses were collected on a 5-point Likert scale that measured the degree of comfort and acceptance on factors for enabling startups in India and China. The instrument was then pre-tested with experts, who were either renowned academicians who have published papers in high-impact journals or practitioners who have contributed to the development of startups. The instrument was revised based on the discussion with and suggestions received from the panel of experts regarding the appropriateness of items and clarity of the operational definitions of constructs (DeVellis, 2016). The contextual clarity of the constructs was also validated by experts using the guidelines laid down by Dillman (2011), and appropriate changes were made, which were then reconfirmed by the same experts (Chen and Paulraj, 2004). The instrument was then translated into Chinese by language-certified experts. Language translators were also used to reconfirm that the translated instrument had a similar meaning and reference. The final instrument was then used in a pilot study at two international conferences held in India and China between January 2019 and March 2019. 
The questionnaire was shared with potential presenters at the international conferences to ensure that the final instrument is valid and reliable. We received 64 responses from 203 target respondents. The results of the pilot survey were useful in finalizing the structure of the instrument, which was then used for the data collection process.

4.3 Data collection
The questionnaire was sent through a market research agency to the key stakeholders of the startup firms, that is, founding members who are involved in taking strategic decisions. A total of 1,876 firms were contacted, of which 912 were from India and 964 were from China. The questionnaire was sent in multiple waves by the market research agency to the required stakeholders. We received a total of 514 responses from this exercise, of which 235 were from China and the remaining were from India. We ensured that we received only one response per firm to maintain parsimony. Where multiple responses were received, the response recorded by the employee with the higher designation was considered for analysis. In case of any confusion, all the responses were dropped. Data cleaning eventually led to a total drop of 12 responses, leaving an overall usable sample size of 502. Table 1 presents a breakdown of the sample based on the Global Industry Classification Standard (GICS). GICS is used as a basis for Standard and Poor’s and Morgan Stanley Capital International financial market indexes, in which each company is assigned to a sub-industry, and to an industry, industry group and sector, by its principal business activity. The study used two different samples, one each from India and China, for further analysis. The descriptive profile of respondents is given in Table 2.

4.4 Non-response bias
The data are empirical and were therefore checked for non-response bias. 
Following the guidelines of Armstrong and Overton (1977), we looked at the trends of responses from early and late respondents. A sub-sample of 58 responses was randomly selected from China

and India to undertake this test and we found no significant difference between early and late respondents. The results confirm that there is no significant difference between the two waves for each of the items tested for both India and China respectively (p > 0.05). We also tested the financial performance of firms in both the waves by using paired sample t-test and found that there is no significant difference between the two for both the geographies: India and China (p > 0.05). This confirms that the study is free from non-response bias.
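
As a rough illustration of the early-versus-late comparison described above, the sketch below computes a Welch two-sample t statistic and its approximate degrees of freedom using only Python's standard library. The scores are hypothetical 5-point Likert responses, not the study's data, and the choice of Welch's test and the function name welch_t are assumptions made for illustration; the study itself reports a paired-sample t-test on its own survey waves.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    se_sq_a = variance(sample_a) / n_a
    se_sq_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq_a + se_sq_b)
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical item scores for early and late respondents (illustrative only).
early_wave = [4, 5, 3, 4, 4, 5, 3, 4]
late_wave = [4, 4, 3, 5, 4, 4, 3, 5]

t_stat, df = welch_t(early_wave, late_wave)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The resulting |t| would then be compared with the two-sided critical value for the computed degrees of freedom at the 0.05 level; a smaller |t| is consistent with no detectable difference between the two waves on that item.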


5. Data analysis
The study used partial least squares (PLS) structural equation modeling (SEM) in WarpPLS 6.0 to test the theoretical model. WarpPLS was proposed by Kock (2019), and its advancements are embedded in WarpPLS 6.0, which has been used extensively by researchers to test hypotheses with empirical data (Dubey et al., 2018a; Kumar and Purani, 2018; Kock, 2019; Ifinedo, 2016). PLS-SEM and its advantages (Akter et al., 2016) have been discussed at length by Peng and Lai (2012), especially in the context of the predictive validity of exogenous variables. Unlike traditional regression methods, SEM

Table 1. Classification of Startup based on Sectors

Type of industry               Sample size in India   Sample size in China
Health tech                    33                     31
Logistics                      34                     15
Fintech                        18                     29
Travel tech                    19                     8
Ed tech                        48                     28
Enterprise tech                21                     7
Consumer services              37                     25
Deep tech (AI and Big Data)    31                     47
Agriculture tech               12                     18
Automotive tech                16                     25

Table 2. Demographic profile of respondents

Demographic variables            Categories          No. of respondents
Age                              25–30 years         51
                                 30–35 years         96
                                 35–40 years         138
                                 40–45 years         79
                                 45–50 years         95
                                 50+ years           43
Geographical distribution        India               269
                                 China               233
Year of establishment of firm    Less than 2 years   104
                                 2–4 years           79
                                 4–6 years           117
                                 6–8 years           112
                                 8–10 years          92
No. of employees in the firm     0–50                83
                                 50–100              107
                                 100–200             203
                                 200–500             68
                                 500–1,000           28
                                 More than 1,000     13


offers a simulated environment for understanding the relationship and dependency of multiple independent constructs with the dependent construct (Chin, 1998; Urbach and Ahlemann, 2010). Hair et al. (1998) have discussed the advantages of SEM in distinguishing the properties of independent and dependent variables, and especially of exogenous and endogenous latent variables. The choice between covariance-based SEM and PLS-based SEM is discussed by Preacher and Hayes (2008a), who propose guidelines for using a particular type of SEM under given circumstances. The present study satisfies most of the criteria laid down by Preacher and Hayes (2008b), as it is exploratory in nature and its theoretical framework is not borrowed from an existing framework. The theoretical framework offers first-order reflective measures, most of which are latent constructs. The items used to measure these constructs are independent of each other and indicate that there is a possibility of their existence irrespective of the construct name (Peng and Lai, 2012). Therefore, the items could be placed under any variable. This also indicates a positive inter-correlation among the constructs in the model (Diamantopoulos and Siguaw, 2006; Edwards and Bagozzi, 2000; Jarvis et al., 2003). PLS-SEM is also noted for offering simplicity in understanding results even with a reduced sample size and restricted residual distribution, along with handling issues such as factor indeterminacy (Fornell and Bookstein, 1982). Therefore, following the guidelines of Peng and Lai (2012), we first examined the reliability and validity of the measurement model and then analyzed the structural model.

5.1 Measurement model
We examined the measurement model before examining the output of PLS-SEM. We first calculated the variance inflation factor (VIF), which helps in ruling out the chances of multicollinearity in the data (Peng and Lai, 2012). 
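
To make the VIF idea concrete, the following minimal sketch computes the textbook VIF of each predictor as 1 / (1 - R²), where R² comes from regressing that predictor on all the others. It uses only the Python standard library, the construct scores are hypothetical, and the helper names are illustrative assumptions; it is not the block or full-collinearity VIF routine implemented in WarpPLS.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def r_squared(y, predictors):
    """R-squared of an OLS regression of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """VIF for each column, treating the remaining columns as predictors."""
    vifs = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        vifs.append(1.0 / (1.0 - r_squared(target, others)))
    return vifs


# Hypothetical construct scores for three predictors (illustrative only).
scores = [
    [3.0, 4.0, 2.0, 5.0, 4.0, 3.0, 5.0, 2.0],
    [2.0, 4.0, 3.0, 5.0, 3.0, 3.0, 4.0, 1.0],
    [4.0, 3.0, 5.0, 2.0, 4.0, 1.0, 3.0, 4.0],
]
for name, value in zip(["X1", "X2", "X3"], variance_inflation_factors(scores)):
    print(f"VIF({name}) = {value:.3f}")
```

A VIF of 1 means a predictor is uncorrelated with the others, and the value grows as the predictor becomes more linearly dependent on them, which is why a fixed ceiling is used as a multicollinearity screen.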
The results of the study indicate its value to be 4.583, which is below the threshold of 5. Any value greater than 5 indicates a risk of multicollinearity in the data. Although studies claim that the average block VIF should ideally be less than 3, our result is still within acceptable limits. We also calculated the average path coefficient (APC) and average R2, both of whose values are recorded in the table. The results are statistically significant, indicating that there are no concerns about the model fitting the data. We therefore conclude model fit for our data (refer to Table A5). We also calculated the Simpson's paradox ratio (SPR), R2 contribution ratio (RSCR), statistical suppression ratio (SSR) and nonlinear bivariate causality direction ratio (NLBCDR) to check for endogeneity in the data following the guidelines of Kock (2019), and found their values greater than the threshold value (>0.7) (refer to Table A5). We collected primary data for the study, which may suffer from common method bias (Podsakoff and Organ, 1986). As argued by Podsakoff et al. (2003), self-reported data may suffer from common method bias for multiple reasons, such as social desirability and consistency of responses. Of the many methods proposed by various studies, we employed the most robust method of designing the instrument using multiple scales to minimize the effect across each type of construct (dependent, independent and moderating).

5.2 Common method bias test As a second step for validation of CMB, we employed the conservative version of Harman's one-factor test (Podsakoff et al., 2003), which showed that one factor explained 43.45% of the variance. We conclude that the data are free from CMB. A third test, following the guidelines of Lindell and Whitney (2001), was performed on the data; it is often discussed in the literature as the correlational marker technique. We chose a six-item scale for measuring "Innovation", which provided the lowest correlation between the MV marker and other constructs (r = 0.07), to adjust statistical significance and construct correlation (Lindell and Whitney, 2001). The three tests confirmed that there is minimal scope for common method bias in the data. Although CMB cannot be removed completely, these tests indicate that it has been controlled in the design of the study. Next, we tested for causality, which is also a prerequisite before hypothesis testing (Guide and Ketokivi, 2015). In the conceptual framework and the hypotheses discussed, the arguments are unidirectional in nature. Thus, in order to test for causality in relationships with interchanged dependent and independent constructs, we performed the Durbin–Wu–Hausman test following the guidelines of Davidson and MacKinnon (1993). The test results confirmed that the residual was insignificant, thereby confirming that the nature and characteristics of the constructs remain as proposed in the theoretical framework and that there is no significant change in the proposed relationships. The triple confirmatory tests (CMB-Causality-Endogeneity) ensure data preparedness (Dubey et al., 2019a). Summary statistics of each of the constructs are presented in Table A4. We calculated the scale composite reliability (SCR) of each construct and found the values greater than 0.70, while the average variance extracted (AVE) has values greater than 0.50 (refer to guidelines by Hair et al., 2017). This indicates that every construct is reliable and that the latent constructs in the framework account for at least 50% of the variance in their respective items. It is also found that the loadings fall in an acceptable range and all of them are significant at the 0.01 level.
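The two statistics named above have simple closed forms, which the following minimal sketch illustrates. It assumes standardized item loadings and uncorrelated measurement errors; the function names and the loadings are hypothetical and are not taken from the study's data.

```python
def composite_reliability(loadings):
    """Scale composite reliability from standardized loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each error variance is assumed to be 1 - loading^2."""
    total = sum(loadings)
    error_variance = sum(1.0 - l * l for l in loadings)
    return total * total / (total * total + error_variance)


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


# Hypothetical loadings for one four-item construct (illustration only).
example_loadings = [0.72, 0.81, 0.78, 0.69]
scr = composite_reliability(example_loadings)
ave = average_variance_extracted(example_loadings)

# Thresholds quoted in the text: SCR > 0.70 and AVE > 0.50.
passes = scr > 0.70 and ave > 0.50
```

A construct would pass the reported criteria when both values clear their thresholds; the square root of `ave` is the quantity compared with inter-construct correlations in a Fornell and Larcker style check.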
Following the guidelines of Fornell and Larcker (1982), it can be seen that the square root of AVE is significantly greater than the inter-construct correlation. This is a strong indicator of discriminant validity. The details of discriminant validity are given in Table A1. We also calculated the value of Cronbach's alpha for each of the constructs and compared it with the threshold value of 0.7 suggested by Hair et al. (2016). The results confirm that each of the constructs individually is reliable (Tellis et al., 2009). We also checked the overall reliability, which likewise exceeded the threshold value. The results are listed in the table.

5.3 Results of hypothesis testing and discussion The results of the hypothesis testing using PLS-SEM are shown in Figure 2. Results record a value of r = 31% for explaining CA and an overall value of r = 69% for explaining FP. Each of the corresponding values of PLS path coefficients are their corresponding p-value (* for 0.

At b = 0, the function F(L) monotonically decreases at 0 < L < ∞. The function (24) has a minimum when L satisfies the standard extremum condition F′(L) = 0 or

$$ e^{(b-r)L} = \left(1 - \frac{r}{b}\right) e^{bL} + \frac{r}{b} - \frac{r}{b}\,(b - r)\,\frac{R_0}{C_0} \quad \text{at } b \neq r, \tag{25} $$

$$ e^{bL} = bL + 1 + b\,\frac{R_0}{C_0} \quad \text{at } b = r. \tag{26} $$

Solving equation (25) delivers the machine replacement time, which minimizes the EAC (8) at the lognormally distributed operating cost (22).

3.2 Theoretic comparison of economic life algorithm and stopping problem The goal of our analysis is to prove analytically and demonstrate numerically that both techniques possess the same qualitative properties and deliver similar results, at least when the uncertain operating cost increases exponentially and the volatility is small. Let us start with the case b = r.
Then, both the stopping problem of section 2.3 and the stochastic EL algorithm deliver identical results, because equation (19) is equivalent to the optimality condition (25) at $x = C_0 e^{bL}$. At b ≠ r, the optimal threshold value x in the stopping problem is determined from the nonlinear equation (17) and the expected replacement time is $L = \ln(x/C_0)/b$. In the EL algorithm, the optimal replacement time L is determined from the nonlinear equation (25), so the optimal threshold value $x = C_0 e^{bL}$ satisfies the nonlinear equation:

$$ \left(\frac{x}{C_0}\right)^{1-\frac{r}{b}} = \left(1 - \frac{r}{b}\right)\frac{x}{C_0} + \frac{r}{b} - \frac{r}{b}\,(b - r)\,\frac{R_0}{C_0}. \tag{27} $$

Equations (17) and (27) have a similar structure. The only difference is the parameter α in equation (17), which is r/b in equation (27). By equation (18), α = r/b at σ = 0 and, therefore, equations (17) and (27) coincide in the deterministic case. Next, we explore two questions: (1) how close the solutions to equations (17) and (27) are under cost uncertainty and (2) how the level of uncertainty affects those solutions. First of all, applying the Taylor series to equation (18), we obtain that the parameter α in equation (17) satisfies: r σ2 r + o(σ2) at σ 0 of cost increase, the optimal machine replacement time L in the stochastic EL algorithm (5) is smaller, and the corresponding cost threshold x is larger in the case of a larger volatility σ. The proof follows from Property 1 and the relation b = μ + σ²/2. By this property, if the deterministic cost rate μ remains the same, then increasing cost uncertainty does not delay the machine replacement decision. In the real options stopping problem, analytic results such as Properties 1 and 2 are not possible because of the complexity of equation (17). However, as the solutions of equations (17) and (27) are close, we expect their qualitative properties to be similar. Next, we confirm those properties via numeric simulation.
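The change of variable that links the replacement time L to the cost threshold x in this comparison can be written out step by step. The lines below are only a worked restatement of the substitution used in the text, assuming b > 0 and x > 0; they introduce no new result.

```latex
% Substitution x = C_0 e^{bL}, assuming b > 0 and x > 0.
\begin{align*}
e^{bL} &= \frac{x}{C_0}, &
L &= \frac{1}{b}\,\ln\frac{x}{C_0}, \\
e^{(b-r)L} &= \left(e^{bL}\right)^{\frac{b-r}{b}}
            = \left(\frac{x}{C_0}\right)^{1-\frac{r}{b}}. &&
\end{align*}
```

Replacing the exponentials in the optimality condition for L by these expressions gives a condition in x alone, which is why the two approaches can be compared through their threshold equations.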
3.3 Numeric simulation: impact of cost volatility To estimate how close the expected replacement time L and the threshold cost x are in the stochastic EL algorithm and in the stopping problem at stochastically increasing geometric cost, we solve nonlinear equations (17), (25), (26) and (27) using Excel Solver. We choose the model parameters r = 0.05–0.1, μ = 0.04–0.15, C0 = 10, R0 = 100, σ = 0–0.2 in ranges similar to those in Dobbs et al. (2004). However, we fix the deterministic rate μ of the cost increase, while Dobbs et al. (2004) assumed a fixed value b = μ + σ²/2 of the rate of the expected cost increase. The rationale behind our choice is the following: if the deterministic trend μ of the cost rate X(t) is fixed, say, for the next year, then an increasing volatility σ leads to a larger expected cost. In Ye (1990), Mauer and Ott (1995), Dobbs et al. (2004) and Richardson et al. (2013), the presence of cost uncertainty increases the cost threshold, which is expected in the optimal stopping framework (Dixit and Pindyck, 1994). Our simulation under a fixed deterministic rate μ of cost increase delivers similar results for the cost threshold x, but demonstrates that the expected replacement time L is smaller for a larger cost volatility σ (Figures 1(a) and 1(b)). Therefore, at a fixed deterministic cost rate μ, the cost uncertainty increases the cost threshold but does not delay the investment decision. The numeric simulation of Dobbs et al. (2004) at a fixed expected cost rate b = μ + σ²/2 shows that both the replacement time L and the threshold cost x increase in σ. Most other research on real options-based replacement (Ye, 1990; Mauer and Ott, 1995; Dobbs et al., 2004; Richardson et al., 2013; Zambujal-Oliveira and Duque, 2011; Adkins and Paxson, 2011) does not provide any estimate of the replacement time. Our outcome is in line with Gryglewicz et al.
(2008), who find that, in contrast with the existing theory, investments may be accelerated by increased uncertainty. Figure 1 also demonstrates that the replacement time L and threshold cost x calculated using both the EL algorithm (5) and the stopping problem are similar for small values of the volatility σ. As σ increases, the difference in replacement time is smaller and grows more slowly than the difference in the cost x. The dynamics of both the threshold cost x and the replacement time L are qualitatively the same, which confirms and illustrates the analytic properties of optimal replacement obtained in section 3.2. The obtained analytic and numeric outcomes have essential practical implications. Indeed, if the observed operating cost does not increase geometrically or linearly, then the real options algorithm cannot solve the stopping problem for the optimal cost threshold. However, the stochastic EL algorithm (5) finds the optimal cost threshold for arbitrary age-dependent profiles of operating cost, and this threshold will be close to the solution of the stopping problem. Therefore, a decision-maker can expect the optimal replacement time to be smaller for a larger volatility σ of stochastic operating cost (assuming the same deterministic cost rate).
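The EL side of this simulation can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions, not the authors' Excel Solver workbook. It assumes that the first-order condition behind equations (25) and (26) can be written as e^{bL} − e^{(b−r)L} = r[R0/C0 + (e^{(b−r)L} − 1)/(b − r)], with the last fraction replaced by L when b = r, and it finds the root by bisection. The function names are hypothetical; the default parameter values are those quoted for case (a).

```python
import math


def foc_residual(L, r, b, C0, R0):
    """Residual of the assumed first-order condition for the EAC minimum.
    It is negative at L = 0 and strictly increasing in L for b > 0."""
    if abs(b - r) < 1e-12:
        integral = L  # limit of (e^{(b-r)L} - 1) / (b - r) as b -> r
    else:
        integral = math.expm1((b - r) * L) / (b - r)
    return math.exp(b * L) - math.exp((b - r) * L) - r * (R0 / C0 + integral)


def replacement_time(r, mu, sigma, C0=10.0, R0=100.0):
    """Return (L, x): replacement time and cost threshold x = C0 * e^{bL},
    where b = mu + sigma^2 / 2 is the rate of the expected cost increase."""
    b = mu + sigma ** 2 / 2.0
    lo, hi = 0.0, 1.0
    while foc_residual(hi, r, b, C0, R0) <= 0.0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):  # plain bisection
        mid = 0.5 * (lo + hi)
        if foc_residual(mid, r, b, C0, R0) > 0.0:
            hi = mid
        else:
            lo = mid
    L = 0.5 * (lo + hi)
    return L, C0 * math.exp(b * L)


# Case (a): r = 0.05, mu = 0.04, volatility from 0 to 0.2.
results = [replacement_time(0.05, 0.04, s) for s in (0.0, 0.1, 0.2)]
for sigma, (L, x) in zip((0.0, 0.1, 0.2), results):
    print(f"sigma={sigma:.2f}  L={L:.2f}  x={x:.2f}")
```

In this sketch the computed replacement time falls and the threshold x rises as σ grows at fixed μ, which is the pattern the text reports for the EL algorithm. The real options equation (17) is not reproduced here because it is not available in this section.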


Figure 1. The optimal replacement decisions by the EL algorithm (5) and the real options algorithm of Dobbs et al. (2004). Case (a): at r = 0.05 and μ = 0.04. Case (b): at r = 0.1 and μ = 0.11
[Two panels, cases (a) and (b): threshold cost x and asset lifetime L plotted against volatility σ from 0 to 0.2; series X, X Dobbs, L and L Dobbs.]

4. Case study: replacement of medical imaging equipment Our simulation example describes a large US hospital that uses hundreds of units of medical imaging equipment: MRI, CT, C-Arm X-ray, digital X-ray and others. Rational replacement of such devices is an essential part of effective cost management in the hospital. Those imaging devices perform identical functions but differ in operating and replacement costs. The hospital stores data about the age, total annual operating cost and replacement cost for all devices in use. The total operating cost includes the costs of parts, labor, external contractors and overhead. We consider a data sample of 70 X-ray machines used during 2009–2017. Based on industry standards, the recommended useful life of such devices is five years. However, some devices in the sample have been used for longer periods, which gives us some data for six-, seven- and eight-year-old devices as well. So, the hospital decision-makers possess samples with cost data for the clusters of devices of each age j = 1, ..., 8. The operating cost for different devices of the same age (from the same cluster) varies, which naturally leads to uncertainty in the age-dependent operating cost. Histograms of the cost distributions C(j) in the samples for each age j are provided in Figure 2, while aggregated parameters of the clusters for each age are summarized in the first four columns of Table 1.
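The grouping of records into same-age clusters described above can be sketched as follows. This is a minimal illustration that assumes each record is an (age, annual operating cost) pair; the function name and the sample records are hypothetical and are not the hospital data.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_age(records):
    """Group (age_in_years, annual_cost) records into same-age clusters and
    return {age: (cluster_size, mean_cost, sample_standard_deviation)}."""
    clusters = defaultdict(list)
    for age, cost in records:
        clusters[age].append(cost)
    summary = {}
    for age in sorted(clusters):
        costs = clusters[age]
        # The sample standard deviation needs at least two observations.
        spread = stdev(costs) if len(costs) > 1 else 0.0
        summary[age] = (len(costs), mean(costs), spread)
    return summary


# Hypothetical records (illustration only).
records = [(1, 5200.0), (1, 6100.0), (1, 5800.0), (2, 4700.0), (2, 5300.0)]
summary = summarize_by_age(records)
for age, (size, avg, sd) in summary.items():
    print(f"age {age}: n={size}, mean={avg:.0f}, sd={sd:.0f}")
```

Each entry of `summary` corresponds to one row of the cluster size, mean cost and standard deviation columns in a table of this kind.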


Figure 2. Histograms of the operating cost distribution by the clusters of same-age devices
[Histograms of annual operating costs: frequency in the sample against operating cost ($), one series per device age.]

Device age   Cluster size   Mean cost    Standard deviation   Calculated EAC   Device should be replaced
1            69             US$12,089    US$3,565             –
2            63             US$9,872     US$1,425             US$67,597        No
3            55             US$10,684    US$3,166             US$38,716        No
4            53             US$12,459    US$3,082             US$29,659        No
5            42             US$10,824    US$1,546             US$24,759        No
6            42             US$9,501     US$1,085             US$21,586        No
7            37             US$12,369    US$2,061             US$19,894        Yes
8            17             US$23,499    US$2,213             US$20,045        –

Figure 3 depicts the age-dependent profile of the average annual cost for this set of imaging devices. As expected, it does not follow geometric or linear cost dynamics. In fact, the age-dependent cost possesses a unique shape with two minima. Next, we apply our stochastic algorithm (5) to calculate the optimal replacement of devices from the sample with this cost distribution. We also set q = 0.95 in the examples that follow.

Case 1 (a new device). Let us assume that the replacement cost of a newly installed X-ray device is R = US$45,000, which includes the purchase price of a new device, installation cost and the salvage value of the current device. As this device has just been installed, we have no records of its annual operating cost. Nevertheless, the hospital has recorded the operating costs of previously used identical devices for all ages j = 1, ..., 8 years. Then, following section 2.1, we calculate the means X_j of the cost distribution samples C(j) (shown in the third column of Table 1) and use them in equation (5) to calculate the EAC(j) for ages j = 1, ..., 8, which are shown in the Calculated EAC column. By Table 1, the EAC(j) is minimal at j = 7, i.e. the optimal decision is to replace that device at the age of seven years.

Case 2 (an old device). Now, let us consider another specific X-ray device that has been in use for four years. Let us assume that its known age-dependent costs are C(1) = US$5,253, C(2) = US$4,750, C(3) = US$5,804 and C(4) = US$10,989. Naturally, those individual costs differ from the corresponding sample averages recorded in Table 1, column 3. Using the individual costs for ages 1–4 and the average sample costs (US$10,824, US$9,501, US$12,369 and US$23,499) from the table for ages 5–8, we obtain that the optimal replacement age for that device is still seven years.

Table 1. Data sample parameters and the EAC of device clusters


Figure 3. Dependence of the average annual operating cost on the X-ray device age
[Plot: mean annual operating cost, from $0 to $25,000, against device age in years, from 0 to 7.]

Case 3 (even older device). Now, let us move one year ahead. Then, the same X-ray device from case 2 has already been in use for five years (rather than four). For illustration, we assume its individual cost for the fifth year, C(5) = US$1,717, to be quite different from the average sample cost of US$10,824 for the same year in Table 1. However, the calculated replacement age for that device is still seven years. Case 3 shows that the constructed stochastic EL algorithm is robust with respect to possible changes in individual operating costs. More detailed sensitivity analysis of the algorithm leads to the same conclusion. An advantage of the stochastic EL algorithm is that it does not require special optimization software.
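The mechanics of the three cases can be illustrated with a short sketch. It rests on two assumptions that are not confirmed by this section: q is treated as an annual discount factor, and the EAC is computed in a generic end-of-year annuity form, EAC(j) = (R + Σ q^i c_i) / Σ q^i with both sums over i = 1, ..., j. This is not necessarily equation (5), so the computed values need not match the Calculated EAC column of Table 1. The function names are hypothetical; the cost figures are those quoted in the text.

```python
def eac_profile(costs, replacement_cost, q):
    """Return [EAC(1), ..., EAC(n)] for expected annual costs `costs`.
    Assumed form (not necessarily the paper's equation (5)):
    EAC(j) = (R + sum_{i<=j} q**i * c_i) / sum_{i<=j} q**i."""
    profile = []
    discounted_costs = 0.0
    annuity = 0.0
    for year, cost in enumerate(costs, start=1):
        factor = q ** year
        discounted_costs += factor * cost
        annuity += factor
        profile.append((replacement_cost + discounted_costs) / annuity)
    return profile


def best_replacement_age(costs, replacement_cost, q):
    """Age (1-based) at which the assumed EAC profile is smallest."""
    profile = eac_profile(costs, replacement_cost, q)
    return min(range(1, len(profile) + 1), key=lambda j: profile[j - 1])


SAMPLE_MEANS = [12089, 9872, 10684, 12459, 10824, 9501, 12369, 23499]  # Table 1
R, Q = 45000, 0.95  # Q as a discount factor is an assumption

case1 = SAMPLE_MEANS                                        # new device
case2 = [5253, 4750, 5804, 10989] + SAMPLE_MEANS[4:]        # four known years
case3 = [5253, 4750, 5804, 10989, 1717] + SAMPLE_MEANS[5:]  # five known years
ages = [best_replacement_age(c, R, Q) for c in (case1, case2, case3)]
```

Known individual costs simply replace the sample means for the years already observed, and the remaining years keep the cluster averages. Under the assumptions above, the minimum falls at age seven in all three cases.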

5. Discussion In management practice, the rational replacement of machines should be analyzed under uncertain operating and maintenance costs. The uncertainty of machine costs is an important issue in asset replacement theory. Various lines of research to address cost uncertainty in machine replacement have been offered by Meyer (1971), Pierskalla and Voelker (1976), Brown (1993), Christer and Scarf (1994), Mauer and Ott (1995), Hartman (2001, 2004), Chang (2005), Gryglewicz et al. (2008), Mercier (2008), Pommeret and Schubert (2009), Tan and Hartman (2010) and others. Those contributions were thoroughly analyzed in the Introduction. In this paper, we focus on a common real situation that occurs when a company uses many identical assets with variations in individual costs, which naturally creates uncertainty in the future operating cost of a specific asset in use.

5.1 Theoretical contribution The constructed stochastic algorithm for rational machine replacement extends the well-known deterministic EL method (Thuesen and Fabrycky, 1993; Newman et al., 2004; Hartman, 2007). It is compared to the popular approach to stochastic asset replacement (Ye, 1990; Mauer and Ott, 1995; Dobbs et al., 2004; Reindorp and Fu, 2011; Richardson et al., 2013; Yatsenko and Hritonenko, 2017) based on the real options concept and dynamic programming (Dixit and Pindyck, 1994). We prove analytically and demonstrate numerically that both approaches deliver similar results when the cost volatility is small. A new theoretical finding is that increasing uncertainty of operating costs does not delay the replacement decision. The major advantage of the suggested stochastic replacement algorithms (5) and (8) is that they work equally well for any distribution of stochastic operating cost (measured over discrete time intervals). By contrast, the real options-based technique has been developed for linearly and exponentially increasing stochastic cost (arithmetic and geometric Brownian motions) only. Such theoretic cost distributions are convenient for analysis, but real applications rarely follow them (Al-Chalabi et al., 2015; van den Boomen et al., 2019). A practical problem with nonexponential cost dynamics is solved using the stochastic EL algorithm in section 4. The qualitative and numeric analyses of section 4 show that our stochastic replacement algorithm and the real options stopping problem lead to similar but not identical results. So, there is room for discussing which model assumptions better describe the real equipment replacement process. However, this difference becomes less essential in managerial decision-making practice, which is characterized by discrete time, information gaps and measurement errors.

5.2 Managerial implications The constructed replacement algorithm is a central point of a new analytic and computational methodology for making rational decisions about equipment replacement under improving technology, uncertain costs and an uncertain environment (Yatsenko and Hritonenko, 2016). The methodology is based on detailed analysis of available data about maintenance and operating costs, equipment deterioration and technological change. The developed technique successfully addresses the challenges of preliminary processing and organization of data. In particular, it groups all available data into separate age-specific samples for different modifications of equipment to obtain the most accurate prediction of the expected cost for a specific device. Such grouping decreases the standard deviation of cost samples but, on the other hand, the age-specific samples can become too small. One more practical advantage of the stochastic replacement algorithm (5) compared to the real options-based technique is that it naturally works with the annual cost, which is a standard unit of measurement in business practice.
By contrast, real options-based stopping problems (Ye, 1990; Mauer and Ott, 1995; Reindorp and Fu, 2011; Richardson et al., 2013, and others) minimize the total cost over the asset lifetime and do not calculate its annual portion. With the notable exception of Dobbs et al. (2004), the real options-based asset replacement literature does not even mention the time of asset replacement, so the corresponding replacement models cannot directly calculate the annual costs. Finally, managerial practice requires solving machine replacement problems in discrete time, which are analytically more difficult than continuous-time problems (Yatsenko and Hritonenko, 2010, 2015). In this paper, we developed both discrete- and continuous-time stochastic versions of the classic EL algorithm. They are based on the discrete-continuous optimization analysis of Yatsenko and Hritonenko (2010), which makes it possible to overcome some mathematical challenges of discrete-time replacement and to bring analytic justification to known phenomena in equipment replacement practice, such as asset clustering.

5.3 Limitations and directions for future research Common data-related issues may negatively affect the reliability of our (and any other) replacement technique. Thus, data samples can be inadequate (in small firms) or not representative (in large organizations), while recording errors can lead to wrong allocation of replacement-related costs to time periods and specific machines. Certain cost elements related to machine replacement, such as general overhead costs, are often overlooked. Evidently, rational machine replacement becomes a part of more general management decisions, such as maintenance and repairs planning (Mercier, 2008; Hartman and Tan, 2014; van den Boomen et al., 2019), inventory and supply chain management (Wu and Ryan, 2014), warranty and customer relationship management (Koschnick and Hartman, 2020) and, finally, strategic planning (Diniz and Sessions, 2020). This leads to a new challenge of incorporating machine replacement algorithms as integral parts of modern enterprise information systems, which requires new research.


A challenging theoretic task is to build reliable algorithms for machine replacement under stochastically evolving technology (reflected in time-dependent operating and replacement costs). The impact of anticipated technological change greatly complicates the optimal replacement even in a deterministic setting (Grinyer, 1973; Christer and Scarf, 1994; Regnier et al., 2004; Hartman, 2007; Goetz et al., 2008; Mardin and Takeshi, 2012; Yatsenko and Hritonenko, 2011, 2015, 2016, 2020). In the special case of geometric stochastic costs, the real options approach has been extended to profit-maximizing asset replacement in Adkins and Paxson (2011) and to cost-minimizing replacement in Yatsenko and Hritonenko (2017). Nevertheless, a systematic analysis of asset replacement at stochastic time- and age-dependent costs remains an open theoretic issue. In particular, the real options solution for profit-maximizing replacement at deterministic exponential costs (Adkins and Paxson, 2013) appears to differ from the discounted infinite-horizon optimization under the same underlying assumptions (Yatsenko and Hritonenko, 2020). This controversy requires deeper investigation of the links between real options replacement and classic infinite-horizon optimization (part of which is the EL method). A thorough comparative analysis would enlarge the applicability of both EL and real options-based replacement approaches to managerial decision-making in the real world. At the same time, the major practical limitation of real options-based replacement remains the assumption that stochastic costs are geometric (or arithmetic) Brownian motions, which in plain words means exponential or linear costs. Such cost distributions are convenient in theory, but rarely occur in industrial reality. So, the EL algorithms and their stochastic extensions will remain a theoretical basis for practical replacement decisions.

6. Conclusion This paper is devoted to theoretic analysis and numeric simulation of equipment replacement decisions in the modern data-driven industrial environment. It develops a theoretical framework for rational machine replacement decisions under uncertain costs and imperfect forecast. The provided methodology can be implemented as a decision-support system for rational technological renovation under uncertain economic and environmental factors. Important implementation issues include aligning the replacement models used to business practice, model clarity, applicability to common standards and requirements for input data, among other practical matters (Lanza and Rühl, 2009; Yatsenko and Hritonenko, 2016). Compared to the continuous-time real options replacement theory (Ye, 1990; Mauer and Ott, 1995; Dobbs et al., 2004; Reindorp and Fu, 2011; Richardson et al., 2013), the major advantage of the constructed stochastic algorithms is that they work for any distribution of age-dependent stochastic operating cost. Also, the discrete time used in our algorithms better matches the nature of decision-making practice. These algorithms have been specifically developed for real-life management and use only standard cost data available from previous years. They have been tested and have proved effective on real industrial data. The case study (section 4) about the rational replacement of medical imaging devices illustrates the analytic findings.

Note 1. The terms service life, EL and lifetime of assets are used interchangeably in textbooks and research literature.

References Adkins, R. and Paxson, D. (2011), “Renewing assets with uncertain revenues and operating costs”, Journal of Financial and Quantitative Analysis, Vol. 46 No. 3, pp. 785-813. Adkins, R. and Paxson, D. (2013), “Deterministic models for premature and postponed replacement”, Omega, Vol. 41, pp. 1008-1019.

Al-Chalabi, H., Lundberg, J., Ahmadi, A. and Jonsson, A. (2015), “Case study: model for economic lifetime of drilling machines in the Swedish mining industry”, Engineering Economist, Vol. 60, pp. 38-154. Babbitt, C.W., Kahhat, R., Williams, E. and Babbitt, G.A. (2009), “Evolution of product lifespan and implications for environmental assessment and management: a case study of personal computers in higher education”, Environmental Science & Technology, Vol. 43 No. 13, pp. 5106-5112. Brown, M. (1993), “A mean-variance serial replacement decision model: the correlated case”, Engineering Economist, Vol. 8 No. 3, pp. 237-247. Chang, P.T. (2005), “Fuzzy strategic replacement analysis”, European Journal of Operational Research, Vol. 160 No. 2, pp. 532-559. Christer, A.H. and Scarf, P.A. (1994), “A robust replacement model with applications to medical equipment”, Journal of the Operational Research Society, Vol. 45, pp. 261-275. Crow, E. and Shimizu, K. (Eds) (1988), Lognormal Distributions: Theory and Applications, Dekker, New York, NY. Diniz, C. and Sessions, J. (2020), “Ensuring consistency between strategic plans and equipment replacement decisions”, International Journal of Forest Engineering, published online. doi: 10.1080/14942119.2020.1768769. Dixit, A. and Pindyck, R. (1994), Investment under Uncertainty, Princeton University Press, Princeton. Dobbs, I. (2004), “Replacement investment: optimal economic life under uncertainty”, Journal of Business Finance & Accounting, Vol. 31, pp. 729-757. Goetz, R., Hritonenko, N. and Yatsenko, Y. (2008), “The optimal economic lifetime of vintage capital in the presence of operating cost, technological progress, and learning”, Journal of Economic Dynamics and Control, Vol. 32, pp. 3032-3053. Grinyer, P. (1973), “The effects of technological change on the economic life of capital equipment”, AIIE Transactions, Vol. 5, pp. 203-213. Gryglewicz, S., Huisman, K.J.M. and Kort, P.M.
(2008), “Finite project life and uncertainty effects on investment”, Journal of Economic Dynamics and Control, Vol. 32, pp. 2191-2213. Hartman, J. (2001), “An economic replacement model with probabilistic asset utilization”, IIE Transactions, Vol. 33 No. 9, pp. 717-727. Hartman, J. (2004), “Multiple asset replacement analysis under variable utilization and stochastic demand”, European Journal of Operational Research, Vol. 159 No. 1, pp. 145-165. Hartman, J. (2007), Engineering Economy and the Decision-Making Process, Pearson Prentice Hall, Upper Saddle River, New Jersey. Hartman, J. and Tan, C. (2014), “Equipment replacement analysis: a literature review and directions for future research”, Engineering Economist, Vol. 59, pp. 136-153. Keating, E.G., Blickstein, I., Boito, M., Chandler, J. and Peetz, D. (2014), “Investigating the desirability of navy aircraft service life extension programs”, Defence and Peace Economics, Vol. 25 No. 3, pp. 271-280. Koschnick, C. and Hartman, J.C. (2020), “Using performance-based warranties to influence consumer purchase decisions”, Engineering Economist, Vol. 65 No. 1, pp. 1-26. Lanza, G. and Rühl, J. (2009), “Simulation of service costs throughout the life cycle of production facilities”, CIRP Journal of Manufacturing Science and Technology, Vol. 1, pp. 247-253. Mardin, F. and Takeshi, A. (2012), “Capital equipment replacement under technological change”, Engineering Economist, Vol. 57, pp. 119-129. Mauer, D.C. and Ott, S.H. (1995), “Investment under uncertainty: the case of replacement investment decisions”, Journal of Financial and Quantitative Analysis, Vol. 30, pp. 581-605. Mercier, S. (2008), “Optimal replacement policy for obsolete components with general failure rates”, Applied Stochastic Models in Business and Industry, Vol. 24, pp. 221-235.


Meyer, R.A. (1971), “Equipment replacement under uncertainty”, Management Science, Vol. 17 No. 11, pp. 750-758. Newman, D., Eschenbach, T. and Lavelle, J. (2004), Engineering Economic Analysis, 9th ed., Oxford University Press, New York, NY. Pierskalla, W.P. and Voelker, J.A. (1976), “A survey of maintenance models: the control and surveillance of deteriorating systems”, Naval Research Logistics Quarterly, Vol. 23, pp. 353-388.


Pommeret, A. and Schubert, K. (2009), “Abatement technology adoption under uncertainty”, Macroeconomic Dynamics, Vol. 13, pp. 493-522. Regnier, E., Sharp, G. and Tovey, C. (2004), “Replacement under ongoing technological progress”, IIE Transactions, Vol. 36, pp. 497-508. Reindorp, M. and Fu, M. (2011), “Capital renewal as a real option”, European Journal of Operational Research, Vol. 214, pp. 109-117. Richardson, S., Kefford, A. and Hodkiewicz, M. (2013), “Optimised asset replacement strategy in the presence of lead time uncertainty”, International Journal of Production Economics, Vol. 141, pp. 659-667. Sebo, J., Busa, J., Demec, P. and Svetlik, J. (2013), “Optimal replacement time estimation for machines and equipment based on cost function”, Metalurgija, Vol. 52 No. 1, pp. 119-122. Tan, C. and Hartman, J. (2010), “Equipment replacement analysis with an uncertain finite horizon”, IIE Transactions, Vol. 42 No. 5, pp. 342-353. Thuesen, G. and Fabrycky, W. (1993), Engineering Economy, 8th ed., Prentice Hall, New Jersey, NJ. van den Boomen, M., van den Berg, P.L. and Wolfert, A.R.M. (2019), “A dynamic programming approach for economic optimisation of lifetime-extending maintenance, renovation, and replacement of public infrastructure assets under differential inflation”, Structure and Infrastructure Engineering, Vol. 15 No. 2, pp. 193-205. Wu, X. and Ryan, S.M. (2014), “Joint optimization of asset and inventory management in a productservice system”, Engineering Economist, Vol. 59 No. 2, pp. 91-115. Yatsenko, Y. and Hritonenko, N. (2010), “Discrete-continuous analysis of optimal equipment replacement”, International Transactions in Operational Research, Vol. 17, pp. 577-593. Yatsenko, Y. and Hritonenko, N. (2011), “Economic life replacement under improving technology”, International Journal of Production Economics, Vol. 133, pp. 596-602. Yatsenko, Y. and Hritonenko, N. 
(2015), “Algorithms for asset replacement under limited technological forecast”, International Journal of Production Economics, Vol. 160, pp. 26-33. Yatsenko, Y. and Hritonenko, N. (2016), “Asset replacement under improving operating and capital costs: a practical approach”, International Journal of Production Research, Vol. 54, pp. 2922-2933. Yatsenko, Y. and Hritonenko, N. (2017), “Machine replacement under evolving deterministic and stochastic costs”, International Journal of Production Economics, Vol. 193, pp. 491-501. Yatsenko, Y. and Hritonenko, N. (2020), “Optimal asset replacement: profit maximization under varying technology”, International Journal of Production Economics, Vol. 228, p. 107670, October 2020, doi: 10.1016/j.ijpe.2020.107670. Ye, M.H. (1990), “Optimal replacement policy with stochastic maintenance and operation costs”, European Journal of Operational Research, Vol. 44, pp. 84-94. Zambujal-Oliveira, J. and Duque, J. (2011), “Operational asset replacement strategy: a real options approach”, European Journal of Operational Research, Vol. 210 No. 2, pp. 318-325. About the authors Dr. Yuri Yatsenko has published over 200 papers and eight books. He earned MS and PhD from Kiev University and a Doctor of Science from the USSR Academy of Sciences. During his career, he has been a Professor in five different countries and taught mathematics, statistics, information systems and

computer sciences in four languages. He also held senior analytic positions at international companies in the USA and Canada. His areas of expertise include modeling and optimization of economic, industrial and environmental processes; technological change; innovations; operations research; and computational methods. Yuri Yatsenko is the corresponding author and can be contacted at: [email protected] Dr. Natali Hritonenko is an Associate Dean and Professor of Mathematics at Prairie View A&M University. Her research area is mathematical modeling and optimal control in operations research, economics and environmental economics. She has been invited and traveled the world sharing her research results through numerous presentations and collaborating on groundbreaking projects with a diverse team of leading experts. During her prolific career, Dr. Hritonenko has authored seven books and well over 130 papers. Her books are used as textbooks or translated to other languages. She is also on the editorial board of nine international interdisciplinary journals.


Received 20 January 2020; Revised 29 February 2020, 23 March 2020, 30 May 2020; Accepted 2 June 2020

Green innovation as a mediator in the impact of business analytics and environmental orientation on green competitive advantage

Hashim Zameer, Ying Wang and Humaira Yasmeen
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing, China, and
Shujaat Mubarak
Department of Management Sciences, Mohammad Ali Jinnah University, Karachi, Pakistan

Abstract
Purpose – The purpose of this paper is to investigate the role of business analytics and environmental orientation toward green innovation and green competitive advantage. In addition, the study aims to explore the mediating role of green innovation in the impact of business analytics and environmental orientation on green competitive advantage.
Design/methodology/approach – Based upon a theoretical analysis of the existing literature, several hypotheses were developed. Data were gathered through a survey conducted using an online portal, and 388 valid responses were processed using SPSS 23.0 and AMOS 23.0 for empirical analysis. The analysis proceeded in two steps: first, reliability and validity were measured; then, the authors employed structural equation modeling to test the hypothesized relationships.
Findings – The results of the authors’ empirical analysis indicate that business analytics and environmental orientation play a pivotal role in green innovation as well as green competitive advantage. Comparatively, the role of business analytics is more powerful than that of environmental orientation. Although environmental orientation is a key factor in green innovation, its direct role toward green competitive advantage is not as strong. The role of green innovation as a mediator was also explored. The empirical findings establish the mediating role of green innovation in the influence of business analytics and environmental orientation on green competitive advantage. Thus, the results confirm that green innovation is a mechanism through which business analytics and environmental orientation affect green competitive advantage.
Practical implications – The study captures the attention of decision-makers and highlights that business leaders need to emphasize business analytics while making managerial decisions related to green innovation and green competitive advantage.
Originality/value – For the first time, this study explores the role of business analytics and environmental orientation together toward green innovation and green competitive advantage. The study adds value to the existing literature and opens new avenues for scholarly research in the area of managerial decision-making.
Keywords Business analytics, Business decisions, Innovation, Green competitive advantage
Paper type Research paper

Management Decision, Vol. 60 No. 2, 2022, pp. 488-507. © Emerald Publishing Limited, 0025-1747. DOI 10.1108/MD-01-2020-0065

This research is supported by National Natural Science Foundation of China (Grant No. 71873064) and General Projects of Humanities and Social Sciences of the Ministry of Education (Planning Projects) (Grant No. 18YJA790085).

1. Introduction
The development of emerging technologies and their applications has led to explosive growth in data volume, and the era of “big data” has emerged. Big data has gradually penetrated into all walks of life and has become an attractive avenue for scholarly research. It has provided new opportunities for business analytics (Gillon et al., 2014). The business analytics concept is not new, but it has recently re-emerged as a new and important research direction for the development of capabilities to handle big data (Watson, 2014; Wang et al., 2018). Business analytics through big data helps promote economic and social development. However, rapid economic growth and the associated environmental and resource issues have become important bottlenecks for sustainable economic development. Environmental issues are receiving more and more attention from the general public, governments and customers worldwide (Yasmeen et al., 2019), and companies have begun to take responsible measures to curb them (Zameer et al., 2020). Similarly, the rapid development of knowledge economies, the continuous shortening of product life cycles and complex, changing consumer needs have intensified market competition, and the importance of product innovation has become more apparent (Coccia, 2017; Zhan et al., 2017). Therefore, enterprises need to understand how to use big data analytics to innovate and gain competitiveness, and which factors influence this process. Furthermore, new environmental regulations worldwide, conventions on environmental protection and consumers’ general awareness of the environment have affected the rules of business competition and the business models of global industries (Chen and Tsai, 2016; Dechezleprêtre and Sato, 2017). Similarly, corporate environmental management practices play a vital role in the current business world. Although this strategic and environmental integration provides opportunities for business enterprises to implement green innovations, it also brings enormous challenges. Therefore, in the current era, it is crucial for companies to use business analytics to learn and innovate and to explore new knowledge and technology.
Business analytics using big data is regarded as a strategic tool that can be integrated into the specific practice of enterprise operation and management to add value. Recent developments in the use of business analytics and environmental management concepts are forcing companies to reposition their future development directions and the ways they obtain competitive advantage. More and more people are realizing that the manufacturing of new products must exhibit greener characteristics (Zameer et al., 2020). Consequently, industrial production is paying more attention to big data business analytics for improving the environmental performance of the entire production cycle, and environmental performance is regarded as the main goal of enterprises (Liu and Yi, 2017). Corporate environmental management and business analytics can therefore play an important role in the current corporate domain. Similarly, firms need to ensure that all innovations and technologies through which new products or services are developed play a significant positive role in environmental management. In other words, a new product innovation achieved through business analytics should provide environmental benefits. In sum, many organizations advocate the use of business analytics and innovative solutions for environmental issues to utilize scarce resources effectively and reduce industrial waste. In light of this, it is becoming increasingly important for organizations to make strategic investments in business analytics and the environment while maintaining a competitive advantage. Similarly, to reduce damage to the natural environment, green innovations through new or improved processes, products and technologies that substitute for wasteful and inefficient energy practices are considered effective approaches.
In addition, related research that regards green innovation as part of the sustainability efforts pursued by companies has mostly focused on the drivers of green innovation and its influence on corporate economic and ecological performance. However, although it is very important to understand how manufacturing companies can improve environmental performance and/or gain competitiveness through green innovation under the guidance of different environmentally oriented strategies and business analytics, very few studies take this into consideration. Therefore, a new study that links business analytics and environmental orientation to green innovation and organizational competitiveness is highly significant.


Thus, this study intends to explore the link between business analytics, environmental orientation and green innovation to fill the aforementioned research gap. Moreover, for the first time, this study links business analytics and environmental orientation to green competitive advantage. In addition, the mediating role of green innovation in the impact of business analytics and environmental orientation on green competitive advantage is explored. In doing so, the study adds value to the existing research in the context of equipment manufacturing organizations. The roles of business analytics and environmental orientation indicate the extent to which organizations adopt environmental orientation and business analytics to gain competitive advantage. Finally, the study extends the theoretical basis and puts forward some managerial suggestions for equipment manufacturing organizations.

2. Related literature and hypothesis
Big data refers to collections of information characterized by high capacity, high generation rates and multiple types (Gärtner and Hiebl, 2017; Mishra et al., 2019). Business analytics is the use of new methods to process and analyze this valuable information, which can assist in making judgments and decisions and in improving insight (Ashrafi et al., 2019; Ferraris et al., 2019). Compared with traditional data and business decision-making, big data business analytics provides a strong basis for decision-making, and big data can be transformed and processed in a short time using business analytics. Business analytics is a useful resource that can help organizations make decisions and strategies. Further, according to the resource-based view (RBV), the execution of strategy requires enterprises to have appropriate resources and capabilities (Wernerfelt, 1984).
In the context of the natural RBV, environmental orientation emphasizes organizational capability, which is a strategic and active internal capability (Hart, 1995). Similarly, it is possible to promote sustainable development through the values and ability of the organization to translate the complex environment into a clear environmental orientation for key stakeholders. Previous sustainable development efforts classified internal capabilities as environmental orientation, emphasizing internal ecological practices, including ISO 14001 certification (Gonzalez-Benito and Gonzalez-Benito, 2005; Fiorini et al., 2019), ecological design and labeling (Stadelmann and Schubert, 2018), and environmental audit and reporting (Braam et al., 2016). Environmental motivation is another prominent feature related to environmental orientation, which reflects persistent involvement in the search for ecological adoption opportunities to avoid adverse consequences for the environment (Graham and Potter, 2015). Previous studies on environmental orientation focused on its determinants; for example, Gabler et al. (2015) focused on environmental orientation through developing ecological capabilities. First and Khetriwal (2010) explored the role of environmental orientation toward brand value. Yu and Huo (2019) explored the extent to which environmental orientation is pivotal for supplier green management practices. Another study explored the linkage between environmental orientation and the sustainable behavior of firms (Keszey, 2020). A few studies have investigated business analytics and environmental orientation together. Studies concluded that enterprises that deploy resources in a proactive manner will benefit more from environmental requirements and be better able to respond to them. As already discussed, business analytics provides a strong basis for firms to make strategies and decisions.
Therefore, it can be presumed that a proactive environmental strategy developed through business analytics can empower managers to use green management practices more proactively to create long-lasting competitive advantages for their enterprises. In addition, such a strategy can play a significant role in helping enterprises constantly adapt to and improve in the changing external environment. That is to say, this kind of environmental strategy provides enterprises with the ability to gain competitive advantage.

Furthermore, although relevant scholars have long believed that improving environmental orientation will improve firms’ strategic response to environmental issues, this view has not been comprehensively analyzed. For example, how environmental orientation together with business analytics can improve the competitiveness of the firm has not been studied, nor has how environmental orientation and business analytics can help strengthen green innovation. In this context, this study aims to enrich the existing literature by examining the mechanism of the relationship between environmental orientation, business analytics and green competitive advantage. To this end, the study first proposes an important environmental management practice, green innovation, as a mediator in the impact of environmental orientation on enterprise green competitive advantage. The role of green innovation as a mediator between business analytics and green competitive advantage is also tested. The findings of the study will not only highlight the way firms can gain green competitive advantages but also point out the importance of business analytics.

2.1 Research hypothesis
In recent years, there have been many related studies on the value and impact of business analytics, and a more systematic theory has been formed. First, business analytics based upon big data is recognized as an important resource in both business and academia. For example, McAfee et al. (2012) explained the importance of data to enterprises in their research and regarded data as the core assets of enterprises. Fan and Bifet (2013) highlighted the importance and status of data compared with traditional production factors. Another study mentioned that focusing on big data and its tools of analytics is equivalent to mastering resources (Toga et al., 2015).
Secondly, as a new resource, business analytics and big data will affect all aspects of economic and social development, such as production management, operation management and social public management. Business analytics through big data has the potential to provide value to firms in building and innovating products to reinforce competitive advantage (Manyika et al., 2011; Prescott, 2014). Another study also emphasized that big data can create opportunities for enterprises to reinforce competitive advantages (Erevelles et al., 2016). Duan et al. (2020) also explored the extent to which innovation can be reinforced through business analytics. They found that business analytics boosts a data-driven culture in organizations, which makes enterprises capable of environmental scanning and developing innovative products. Furthermore, several researchers have highlighted that business analytics influences enterprises at large in product optimization and innovation, production models, market insights and marketing decisions (Sindakis, 2017; Chiang et al., 2018; George and Lin, 2017; Ransbotham and Kiron, 2017; Engelman et al., 2017). Feng and Guo (2018) believed that business analytics can increase the added value of products and services, optimize business operations processes and help companies create advantages in customer segmentation and market positioning. In addition, it can enhance forecasting and decision-making and promote the continuous development of enterprises. Kamble et al. (2019) mentioned that one of the most important applications of big data analytics is the creation of new knowledge, the generation of new management rules and the new economy built on big data. A recent study emphasized the importance of business analytics for successful innovations (Niebel et al., 2019). Sharma et al. (2019) focused on data management, analytics and innovation opportunities. Caputo et al. (2019) indicated the role of digital technologies for successful innovations. Although many studies have tried to explore the role of business analytics toward innovation, none has focused on the equipment manufacturing industry of China. In addition, very few studies combine the roles of analytics, innovation and competitive advantage in one model. Therefore, we believe this is a knowledge gap, and fresh evidence is needed. Hence, based upon the theoretical shortcomings discussed earlier, the following hypotheses have been postulated for empirical testing in the context of the equipment manufacturing industry of China.
H1. Business analytics significantly influences green innovation.
H2. Business analytics is useful to reinforce green competitive advantage.


The literature on environmental management identifies environmental orientation as an important business principle that is pivotal for guiding enterprise environmental practice, and it has been regarded as the core concept of environmental management research (Gabler et al., 2015; Yu and Huo, 2019). Previous works have studied a wide range of factors affecting environmental orientation, such as institutional/regulatory forces (Charan and Murty, 2018; Kang and He, 2018), stakeholder pressure (Garces-Ayerbe et al., 2012), and organizational resources and cultural factors (De Medeiros et al., 2014), which may lead enterprises to direct attention to protecting the natural environment. It is believed that the value placed on protecting the environment, the moral standards of environmental responsibility and the commitment of senior managers to environmental protection will effectively promote the implementation of environmental management practice in the enterprise. In addition, the views of the government, customers, competitors and other external stakeholders on environmental needs will also make enterprises realize the importance of being green, and external environmental orientation is related to managers’ views on meeting the environmental needs of external stakeholders (Zameer et al., 2020). Green management is the source of competitive advantage for enterprises in the new situation. According to the RBV, valuable, rare and hard-to-imitate resources can help enterprises achieve sustainable competitive advantage. Zameer et al. (2018) similarly hold that products that are difficult to counterfeit create sustainable advantage for the firm. In addition, organizational resources help enterprises utilize key resources to reinforce core competencies (Salunke et al., 2019; Nagano, 2020). That is to say, enterprises gain strong competitive advantage by providing consumers with green products.
Furthermore, scholars in environmental management research have long believed that improving an enterprise’s level of environmental orientation will improve its strategic response to environmental problems (Paulraj, 2009; Keszey, 2020). In addition, these studies have indicated that an environmental orientation strategy has a positive influence on environmental performance. However, studies exploring the linkage between environmental orientation and competitiveness are scarce in the literature. In the practice of environmental management, enterprise environmentalism provides competitive advantages, which can greatly reduce costs in the long run or help differentiate products and services. Generally, enterprises save costs through the use of recycled raw materials or through energy saving and process improvement. In addition, enterprises’ production and customers’ consumption are required to follow ecological environmental protection and implement clean production and green consumption. Given this general trend and increasing awareness of environmental protection, a considerable number of consumers now have strong environmental awareness. Therefore, a consumer market that emphasizes environmental protection can also help enterprises gain competitive advantage. Moreover, according to the RBV, as a kind of strategic ability, environmental orientation has a strong impact on the green practice behavior of enterprises. It can promote the resources and abilities required for the implementation of green practice, bring sustained economic and environmental performance and ultimately bring sustained competitive advantage. Therefore, to sum up, the following hypothesis is proposed in this study.
H3. Environmental orientation reinforces green competitive advantage.

The development of impactful green practices requires organizations to respond actively to complex and volatile business environments, and environmental orientation can provide an organizational procedure for managers to manage a complex and changeable business environment. In previous studies, this process was considered a kind of socially complex, innovative and valuable strategic ability (Gabler et al., 2015). Supply chain orientation is a kind of strategic ability that supports the development of green supply chain management by meeting business objectives (Patel et al., 2013). At the core of environmental orientation, supply chain orientation and resource-based theory is a kind of strategic ability, which supports the development of green innovation by realizing environmental objectives. Accordingly, external environmental motivations, such as customers’ green demand and competition in environmental protection practice within the same industry, will influence the enterprise to implement green innovation passively or actively in light of environmental problems. Nguyen and Hens (2015) found that ISO 14001 certified facilities could not directly improve environmental performance. Instead, adhering to the green principle, enterprises should constantly improve and design innovative methods to improve the production process. Green innovation has been widely accepted by industry and scholars as a powerful method to reduce environmental problems (Chiou et al., 2011; Dangelico et al., 2017; Li et al., 2018; Singh et al., 2020). Therefore, products and services produced in a cost-effective way through green innovation help enterprises comply with environmental regulations and fulfill their social responsibilities. To sum up the discussion, the following hypothesis has been developed.
H4. Environmental orientation is pivotal for green innovation.
The traditional economic view states that every effort made to improve the environmental footprint will increase costs and place an additional burden upon producers. Green innovation is also a practice that leads to increased costs from using environmentally sustainable technology and strong environmental compliance, which may have adverse effects on financial performance in the long run (Ambec and Lanoie, 2008). However, the Porter hypothesis highlights that innovation is a key determinant of enterprise competitive advantage. With the introduction of more and more environmental protection laws and regulations, it is increasingly essential to consider the entire life cycle of the product when making product design decisions. Green product and process innovation reduces the negative consequences for the environment on the one hand and increases the competitive advantage of enterprises on the other (Porter and Van der Linde, 1995). In addition, recent research shows that innovation and environmental management are expected to become key performance indicators of competitive advantage. Similarly, enterprises will gain competitive advantage when implementing green innovation (Zameer et al., 2020; Wang, 2019). Likewise, through green product innovation, enterprises will obtain cost savings, improve efficiency, productivity and product quality, and ultimately enhance competitive advantage. Moreover, the practice of green innovation can bring the firm a green reputation, environmentally differentiated products and the opportunity to enter new markets with new green products. In a recent study, Zameer et al. (2020) discussed the concept of green competitive advantage as collective learning and abilities to innovate environmentally sustainable products for ecological management, which positively influence an enterprise’s ability to design green products and process innovations.
In turn, it can be useful for improving the green image, thus cultivating competitive advantage. Green innovation is regarded as an intangible strategic capability that is valuable, rare and cannot be imitated or copied. Based upon these qualities, managers become more capable of using green innovation practices to create long-lasting advantages and to constantly progress and adapt to the volatile and complex external environment. In conclusion, the following hypothesis is proposed.


H5. Green innovation has a significant positive impact on green competitive advantage.
Environmental performance is related to the improvement of energy and resource utilization efficiency and the reduction of environmental impact. Similarly, it brings the following benefits: reducing production costs, increasing productivity, building corporate reputation and attracting environmentally conscious customers (Konuk, 2019; Kularatne et al., 2019). In addition, by improving environmental performance, green strategy helps to increase product diversification and reduce costs, thus improving competitive advantage (Zameer et al., 2020). Generally speaking, implementing environmental management practices is a way to improve environmental performance. Similarly, an organization can increase the value of its core business plan by successfully implementing an important environmental management practice, green innovation, together with its environmental plan. In addition, green innovation can effectively improve the environmental performance of manufacturing enterprises by improving efficiency and synergy, so as to improve enterprise competitiveness and save costs (Rao and Holt, 2005). Moreover, enterprises can improve their green image by improving their environmental adaptation mechanism and performance, which will help them develop new business opportunities and improve their competitive advantage (Chen, 2008). Therefore, based upon the previously developed hypotheses and the discussion in this section, it can be presumed that the environmental orientation of the firm will make firms capable of green innovations, which can further influence competitive advantage. To test this assumption in the context of equipment manufacturing organizations of China, the authors have developed the following hypothesis.
H6. Green innovation plays a mediating role in the impact of environmental orientation on green competitive advantage.
Furthermore, academic research has highlighted that big data and information are considered important assets. As an important asset, information drawn from big data through business analytics helps organizations develop successful innovations and attain green competitive advantage. Existing studies have shown that green innovation works as a mechanism in the impact of various factors on competitive advantage. For example, Chang (2011) explored and confirmed that environmental management practice contributes to competitive advantage via green innovation. Zameer et al. (2020) highlighted the mediating role of green production in the relationship between customer pressure and competitive advantage. Customers can create pressure through social media, and such media are the biggest engine for big data, on which business analytics relies. Therefore, one can suppose that green innovation will work as a mechanism in the impact of business analytics on competitive advantage. Moreover, it has already been discussed that business analytics, environmental orientation and green innovation directly influence competitive advantage, and it is presumed that business analytics contributes to green innovation. For a comprehensive analysis, it is necessary to explore the indirect/mediating effects. It can be argued that when firms use business analytics, it creates opportunities for green innovation, which can improve competitive advantage. Such associations are regarded as mediating effect hypotheses (Preacher and Hayes, 2008). Thus, the aforementioned discussion can be summarized as follows: business analytics will contribute to green innovation, which will further strengthen competitive advantage. Accordingly, the following mediation hypothesis is proposed.
H7. Green innovation mediates the relationship between business analytics and green competitive advantage.
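The mediation logic behind H6 and H7 can be illustrated with a small sketch. It assumes a simple regression-based setup rather than the AMOS structural equation model used in the study: the indirect effect is taken as the product of the path from predictor to mediator (a) and the path from mediator to outcome controlling for the predictor (b), with a percentile bootstrap interval in the spirit of Preacher and Hayes (2008). The data are synthetic, and the function names, coefficients and seed are hypothetical choices made only for illustration.

```python
import random


def centered_sums(x, m, y):
    """Return the centered sums of squares and cross-products used below."""
    n = len(x)
    mean_x, mean_m, mean_y = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    smm = sum((mi - mean_m) ** 2 for mi in m)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    smy = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y))
    return sxx, smm, sxm, sxy, smy


def indirect_effect(x, m, y):
    """Estimate a*b: a is the slope of M on X, b the slope of Y on M given X."""
    sxx, smm, sxm, sxy, smy = centered_sums(x, m, y)
    a = sxm / sxx
    # Closed-form OLS slope of M in the two-predictor regression Y ~ X + M.
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample respondents
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Synthetic scores; the path coefficients 0.5, 0.3 and 0.4 are assumptions.
rng = random.Random(42)
n = 388  # sample size borrowed from the abstract, data are NOT the study's
ba = [rng.gauss(0, 1) for _ in range(n)]                  # business analytics
gi = [0.5 * v + rng.gauss(0, 1) for v in ba]              # green innovation
gca = [0.3 * v + 0.4 * g + rng.gauss(0, 1) for v, g in zip(ba, gi)]

ab = indirect_effect(ba, gi, gca)
ci_low, ci_high = bootstrap_ci(ba, gi, gca)
print(f"indirect effect = {ab:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Under this reading, a bootstrap interval that excludes zero would be taken as support for a mediating effect; the same functions could be reused with environmental orientation as the predictor for H6.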

2.2 Research framework
This paper first reviews the literature to summarize the evidence on green innovation, business analytics, environmental orientation and green competitive advantage into a new conceptual model, from the perspective of how and to what extent business analytics and environmental orientation can strengthen green competitive advantage. Although some scholars have explored ways to strengthen green competitive advantage, no study has explored the role of green innovation as a mediator, and the role of business analytics and environmental orientation has been completely ignored in models of green competitive advantage. To fill these theoretical gaps, the study developed a novel theoretical framework. The authors postulated seven hypotheses: five describe direct relations, whereas the other two describe indirect (mediating) effects. The framework is presented in Figure 1.


3. Methodology
3.1 Measurement
The selection of measurement variables is crucial both for obtaining data and for the empirical analysis, and it affects the reliability and validity of the results. The key variables in this study are environmental orientation, business analytics, green innovation and green competitive advantage. Each variable was represented by observable items obtained from previous studies. The items used to measure business analytics were borrowed from Duan et al. (2020). The study measures business analytics from three perspectives: descriptive, predictive and prescriptive analytics. Based on these three categories, firms can analyze what is currently happening in the market, what would happen and how to respond when making decisions. By doing so, this study opens new horizons for research on business analytics and its consequences. Green innovation was measured using four items adopted from Cuerva et al. (2014). Four items were retained from Chan et al. (2012) to measure environmental orientation. Finally, green competitive advantage was measured using a four-item scale adopted from Chen and Chang (2013). A five-point Likert scale was used for all items: 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree and 5 = strongly agree. The survey instrument was initially designed in English. Because the survey was conducted among Chinese respondents, the questionnaire was translated into Chinese before circulation. Two academic experts were asked to ensure that the correct academic language was used and that the questionnaire contained no discrepancies due to translation. Once the questionnaire was finalized, it was distributed among the target respondents.

Figure 1. Theoretical framework (constructs: Business Analytics, Environmental Orientation, Green Innovation, Green Competitive Advantage)

3.2 Sampling and data collection
The core purpose of this paper is to discover the role of business analytics and environmental orientation in green competitive advantage, and to explore the role of green innovation as a mediator. Therefore, manufacturing companies in China that implement green practices, such as equipment manufacturing companies, were selected as the unit of analysis. To obtain reasonable data, the target respondents were chosen carefully. We targeted managerial-level employees using a convenience sampling approach and ensured that only people with sound knowledge of the target variables filled in the questionnaire. The target audience was shortlisted through personal linkages, alumni, friends and social media, and was contacted before the questionnaire was sent. The questionnaires were distributed mainly online, with invitations sent via e-mail, WeChat and LinkedIn. Initially, the survey was piloted with 30 managerial-level employees. Once it was confirmed that the survey instrument was reliable and could be used for further data collection, we sent the electronic questionnaire to 1,500 managers at various levels, including front-line, middle-line and top managers. In total, 408 responses were received. After screening for unengaged responses, outliers and missing values, 388 valid responses remained for further analysis.

3.3 Method of analysis
In empirical research, the method used to analyze the data is crucial to the reliability and validity of the research. If an inappropriate method is adopted, the results may be spurious.
Therefore, it is important to choose an appropriate method of analysis for empirical estimation. Keeping in mind the sample size and following relevant recent studies, we used two statistical software packages, SPSS 23.0 and AMOS 23.0, and employed the two-step method of analysis introduced by Anderson and Gerbing (1988). First, convergent and discriminant validity were assessed to ensure that the variables used in the study are unidimensional and reliable. Second, following Hair et al. (2010), who state that SEM is the most robust and suitable technique for analyzing survey data, we employed covariance-based structural equation modeling (SEM) to test the proposed hypothetical relationships. Finally, the mediating paths were tested using the bootstrap method with 5,000 resamples.

4. Results and findings
The previous section discussed the methodology adopted for this study in detail. Using that methodology, the data were processed and analyzed; this section elaborates the findings. Because the survey was conducted among managerial-level employees working in the equipment manufacturing sector of China, it is necessary to establish the basic background of the respondents. The study therefore explored the demographic profile of the respondents using descriptive statistics.

4.1 Descriptive analysis
The descriptive statistics show that the respondents are mostly managers. By gender, 58.5% of respondents are male and the remaining 41.5% are female. Most of the respondents are 30–39 years old, followed by the 40–49 age group and then the 20–29 age group; only a few are 50 or older. About 55% of respondents hold an undergraduate degree and 40% have attained a postgraduate qualification. The respondents consist of front-line managers, middle-level managers and top management: the largest share work in front-line management (48%), followed by middle managers (36%) and top-level management (16%). The detailed descriptive analysis is shown in Table 1. Based on the roles of the participants, it was anticipated and ensured that they have sound knowledge of the constructs used in the survey and of the application of these concepts in their organizations.


Table 1. Descriptive analysis

| Characteristic | Category | Frequency | % |
|---|---|---|---|
| Gender | Male | 182 | 58.5% |
| Gender | Female | 138 | 41.5% |
| Age | 20–29 years | 91 | 23.5% |
| Age | 30–39 years | 140 | 36.1% |
| Age | 40–49 years | 128 | 33.0% |
| Age | 50 years and above | 29 | 7.4% |
| Education | High school and below | 20 | 5.0% |
| Education | Undergraduate | 213 | 55.0% |
| Education | Postgraduate | 155 | 40.0% |
| Job level | Front-line manager | 186 | 48.0% |
| Job level | Middle-level manager | 140 | 36.0% |
| Job level | Top-level management | 62 | 16.0% |
| Total respondents | | 388 | 100% |

4.2 Reliability and validity analysis
Following the evaluation of the demographic profile, it is necessary to check the reliability and validity of the scale. Because the measurement items were adopted from the existing literature, the reliability and validity of the scale should be checked before applying SEM. Following Zameer et al. (2019), the model fit results were evaluated, confirming that CFI is above 0.90, GFI is above 0.80, CMIN/DF is below 5.0 and RMR is less than 0.08. The detailed results are presented in Table 2. Reliability indicates the internal consistency of the scale. Since this study used a Likert scale, we first checked reliability using Cronbach’s alpha (α). The α value for every variable under investigation is above the threshold of 0.70, which supports the reliability of the scale. Once the overall reliability of the scale was confirmed, the next step was an in-depth analysis of indicator validity, internal consistency reliability, convergent validity and discriminant validity. According to Bagozzi and Yi (1991), indicator validity is achieved if the estimated factor loadings are above the minimum level of 0.50. The results in Table 3 show that the factor loading of every item included in the final measurement model is above 0.50, so the required level of indicator validity is achieved. Internal consistency can be assessed through composite reliability (CR), also shown in Table 3. Hair et al. (2010) argued that CR should be above 0.70 to claim internal consistency. In this study, all CR values exceed 0.70, which confirms internal consistency. Convergent validity can be measured using average variance extracted (AVE) (Hair et al., 2010). The AVE estimates are presented in Table 3; the AVE of every construct is above the threshold of 0.50, which supports convergent validity. After confirming indicator validity, internal consistency reliability and convergent validity, the next step is the evaluation of discriminant validity. The study used a widely accepted method, the Fornell–Larcker criterion. Fornell and Larcker (1981) argued that the correlation between any two constructs should be lower than the square root of the AVE of each of those constructs. The results are presented in Table 4. The square root of the AVE of every construct is above its correlations with the other constructs, so discriminant validity is achieved.

Table 2

| Index | Estimated value | Threshold |
|---|---|---|
| CFI | 0.96 | >0.90 |
| GFI | 0.89 | >0.80 |
| NFI | 0.94 | >0.90 |
| TLI | 0.95 | >0.75 |
| RMR | 0.03 | <0.08 |
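The Cronbach's alpha check described in Section 4.2 can be illustrated with a minimal sketch. The function name `cronbach_alpha` and the `sample` responses below are hypothetical and are not the authors' data; the study's own estimates were obtained with SPSS and AMOS. The snippet only shows how alpha is computed from item scores and compared with the 0.70 threshold.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical five-point Likert answers (rows = respondents, columns = items).
sample = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(sample)
print(f"alpha = {alpha:.3f}, reliable at 0.70: {alpha > 0.70}")
```

Population variance is used for both the items and the totals; using sample variance for both would give the same alpha because the correction factor cancels in the ratio.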

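Table 3. Summary of validity and reliability analysis

| Construct / item | Cronbach’s α | CR | AVE | Loadings |
|---|---|---|---|---|
| Business analytics | 0.852 | 0.860 | 0.673 | |
| BA1 | | | | 0.87 |
| BA2 | | | | 0.83 |
| BA4 | | | | 0.76 |
| Environmental orientation | 0.888 | 0.895 | 0.683 | |
| EO1 | | | | 0.67 |
| EO2 | | | | 0.83 |
| EO3 | | | | 0.89 |
| EO4 | | | | 0.89 |
| Green innovation | 0.955 | 0.957 | 0.847 | |
| GINN1 | | | | 0.89 |
| GINN2 | | | | 0.92 |
| GINN3 | | | | 0.95 |
| GINN4 | | | | 0.92 |
| Green competitive advantage | 0.954 | 0.955 | 0.842 | |
| GCADV1 | | | | 0.93 |
| GCADV2 | | | | 0.94 |
| GCADV3 | | | | 0.91 |
| GCADV4 | | | | 0.89 |

Note(s): CR is composite reliability; AVE is average variance extracted; loadings are factor loadings; factor loadings are significant at p < 0.001
Source(s): Authors’ estimation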
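As an illustration of how the CR and AVE thresholds and the Fornell–Larcker criterion can be checked, the sketch below recomputes both statistics from the standardized loadings reported in Table 3 and compares the square roots of AVE with the correlations reported in Table 4. It assumes the commonly used formulas for standardized loadings with uncorrelated errors, AVE = mean(λ²) and CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)); the paper does not state which formulas AMOS output was based on, and the function names are hypothetical.

```python
from math import sqrt

# Standardized factor loadings as reported in Table 3.
LOADINGS = {
    "BA": [0.87, 0.83, 0.76],
    "EO": [0.67, 0.83, 0.89, 0.89],
    "GINN": [0.89, 0.92, 0.95, 0.92],
    "GCAD": [0.93, 0.94, 0.91, 0.89],
}

# Inter-construct correlations as reported in Table 4.
CORRELATIONS = {
    ("EO", "GCAD"): 0.428,
    ("EO", "BA"): 0.711,
    ("EO", "GINN"): 0.686,
    ("GCAD", "BA"): 0.474,
    ("GCAD", "GINN"): 0.481,
    ("BA", "GINN"): 0.778,
}

def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR assuming error variance 1 - loading**2 for each item."""
    squared_sum = sum(loadings) ** 2
    error = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error)

def fornell_larcker_holds(loadings, correlations):
    """True if every correlation is below sqrt(AVE) of both constructs."""
    roots = {name: sqrt(ave(ls)) for name, ls in loadings.items()}
    return all(abs(r) < min(roots[a], roots[b])
               for (a, b), r in correlations.items())

for name, ls in LOADINGS.items():
    print(name, round(composite_reliability(ls), 3), round(ave(ls), 3))
print("Fornell-Larcker satisfied:", fornell_larcker_holds(LOADINGS, CORRELATIONS))
```

Because the loadings in Table 3 are rounded to two decimals, the recomputed CR and AVE can differ from the reported values in the third decimal.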

Another problem associated with validity and reliability is common method bias, which is a growing concern among scholars of survey research methods. To examine it, the study used two methods: Harman’s one-factor test, followed by the marker variable technique. First, Harman’s one-factor test was run by entering all the items into a principal components factor analysis. The results indicated that no single factor accounts for 50% of the variance. Second, the marker variable technique was employed. Common method variance was found to be 0.088, which is below the threshold of 10%. Based on the results of both approaches, it can be claimed that no common method bias exists. Thus, the data can be used to evaluate the causal relationships using SEM.
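Harman's one-factor test can be sketched as follows, under the assumption that it is run as an unrotated principal components analysis of the item correlation matrix, so that the share of variance carried by the first component is its largest eigenvalue divided by the number of items. The helper names and the `toy_rows` data are hypothetical, and power iteration is used here only to keep the example free of third-party libraries; the authors' analysis was run in SPSS.

```python
from math import sqrt

def correlation_matrix(rows):
    """Pearson correlations between item columns (rows = respondents)."""
    n, k = len(rows), len(rows[0])
    z_cols = []
    for j in range(k):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = sqrt(sum((v - mean) ** 2 for v in col) / n)  # assumes sd > 0
        z_cols.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / n
             for j in range(k)] for i in range(k)]

def largest_eigenvalue(matrix, iterations=1000):
    """Dominant eigenvalue of a symmetric PSD matrix by power iteration."""
    k = len(matrix)
    # Non-uniform start vector: a uniform one can be exactly orthogonal to
    # the dominant eigenvector when blocks of items correlate negatively.
    v = [float(i + 1) for i in range(k)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient, |v| = 1

def first_factor_share(rows):
    """Share of total variance explained by the first principal component."""
    corr = correlation_matrix(rows)
    return largest_eigenvalue(corr) / len(corr)

# Hypothetical responses: items 1-2 move together, items 3-4 move together.
toy_rows = [
    [5, 4, 1, 2], [4, 5, 2, 1], [1, 2, 5, 4], [2, 1, 4, 5],
    [5, 5, 5, 5], [1, 1, 1, 1], [4, 4, 2, 2], [2, 2, 4, 4],
]
share = first_factor_share(toy_rows)
print(f"first factor share = {share:.2f}; below 0.50: {share < 0.50}")
```

A share below 0.50 corresponds to the paper's statement that no single factor accounts for 50% of the variance; the toy data are not meant to reproduce that result.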


Table 4. Discriminant validity: Fornell–Larcker criterion

| | Environmental orientation | Green competitive advantage | Business analytics | Green innovation |
|---|---|---|---|---|
| Environmental orientation | 0.827* | | | |
| Green competitive advantage | 0.428 | 0.917* | | |
| Business analytics | 0.711 | 0.474 | 0.820* | |
| Green innovation | 0.686 | 0.481 | 0.778 | 0.921* |

Source(s): Authors’ estimation based on CBSEM

4.3 Hypothesis testing
In the previous sections, the causal relationships between the constructs were developed and several hypotheses were proposed. To test those hypotheses, the study employed SEM and explored both direct and indirect paths among business analytics, environmental orientation, green innovation and green competitive advantage. SEM is a mature and standardized statistical method, often used to estimate direct and indirect relationships between observed variables and unobserved variables. First, the direct relations among the variables were measured; then the indirect (mediating) relationships were explored. Before hypothesis testing, it was ensured that the structural model fulfills adequate goodness-of-fit criteria: the CFI of the structural model is above 0.90, GFI is above 0.80, RMR is below 0.08 and CMIN/DF is below 5.0. With the goodness-of-fit indexes confirming that the model meets the criteria, the results of hypothesis testing can be discussed. Figure 2 shows the path model along with the path coefficients (β), and the detailed results of the path analysis are summarized in Table 5, which reports the path coefficients (β), standard errors, critical ratios and p-values. These values provide a good basis for deciding whether to accept or reject each hypothesis.

Figure 2. Estimated structural model (Author’s estimations): paths H1–H5 among Business Analytics, Environmental Orientation, Green Innovation and Green Competitive Advantage

Table 5. Standardized direct effects

| Hypothesis | Independent construct | Dependent construct | β | S.E. | C.R. | P | Conclusion |
|---|---|---|---|---|---|---|---|
| H1 | Business analytics | Green innovation | 0.549 | 0.054 | 12.641 | 0.000 | Supported |
| H2 | Business analytics | Green competitive advantage | 0.171 | 0.086 | 2.448 | 0.014 | Supported |
| H3 | Environmental orientation | Green competitive advantage | 0.108 | 0.062 | 1.743 | 0.081 | Supported |
| H4 | Environmental orientation | Green innovation | 0.283 | 0.045 | 6.505 | 0.000 | Supported |
| H5 | Green innovation | Green competitive advantage | 0.272 | 0.068 | 3.963 | 0.000 | Supported |

Source(s): authors’ estimations

The path from business analytics to green innovation represents Hypothesis 1. The path coefficient (β) and p-value of this path are 0.549 and 0.000, respectively. The p-value is below 0.05 and the path coefficient is positive, indicating that business analytics has a significant positive impact on green innovation. Therefore, Hypothesis 1 is accepted. Hypothesis 2 concerns the impact of business analytics on green competitive advantage. Its path coefficient and p-value are 0.171 and 0.014, respectively, so the path is statistically significant at the 5% level and Hypothesis 2 is supported. The study also explores the role of the firm's environmental orientation in green competitive advantage, as proposed in Hypothesis 3. The results indicate a path coefficient (β) of 0.108 and a p-value of 0.081. Although the p-value is above 0.05, it is below 0.10 and the path coefficient is positive, so the result is statistically significant at the 10% level. Hence, Hypothesis 3 is supported. In addition, the study tests the role of environmental orientation in green innovation, as proposed in Hypothesis 4. The path coefficient (β) is 0.283 and the p-value is 0.000, indicating that environmental orientation plays a significant and positive role in green innovation. Thus, Hypothesis 4 is accepted. Finally, Hypothesis 5 was developed to check the relation between green innovation and green competitive advantage. The path coefficient (β) is 0.272 and the p-value is 0.000, indicating that green innovation plays a significant and positive role in green competitive advantage.

As discussed earlier, the study tests indirect relations along with the direct paths. Two mediation hypotheses were proposed: that green innovation mediates the impact of business analytics on green competitive advantage, and that green innovation mediates the impact of environmental orientation on green competitive advantage. The estimated results of the mediation tests are presented in Table 6. For the mediation of green innovation in the impact of business analytics on green competitive advantage, the path coefficient and p-value are 0.183 and 0.001, respectively. The p-value is below 0.05; hence, Hypothesis 6 is accepted. Because business analytics also has a direct positive effect on green competitive advantage, the mediation is partial: green innovation partially mediates the relationship between business analytics and green competitive advantage. The other mediation hypothesis states that green innovation mediates the relation between environmental orientation and green competitive advantage. The results for this hypothesis (H7) show a path coefficient of 0.078 and a p-value of 0.001. The p-value is below 0.05, so Hypothesis 7 is supported. Because environmental orientation also has a direct positive impact on green competitive advantage, the mediation is partial.

Table 6. Mediating effects

| Hypothesis | Independent | Mediator | Dependent | Estimate | S.E. | C.R. | p | Conclusion |
|---|---|---|---|---|---|---|---|---|
| H6 | BA | GINN | GCAD | 0.183 | 0.059 | 3.102 | 0.001 | Supported |
| H7 | EO | GINN | GCAD | 0.078 | 0.031 | 2.516 | 0.001 | Supported |

Note(s): GINN is green innovation, BA is business analytics, EO is environmental orientation and GCAD is green competitive advantage
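The bootstrap test of a mediating path can be sketched in simplified form. The example below is not the covariance-based SEM estimated in AMOS: as a labeled simplification, it uses ordinary least squares on observed scores, takes the indirect effect as the product a·b (a: slope of the mediator on the predictor; b: slope of the outcome on the mediator, controlling for the predictor) and builds a percentile confidence interval from resampled cases. All names and the synthetic data are hypothetical; the default of 5,000 resamples mirrors the number reported in Section 3.3.

```python
import random

def _cov(x, y):
    """Population covariance of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def indirect_effect(x, m, y):
    """a * b, with a from M ~ X and b the partial slope of M in Y ~ M + X."""
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lo_idx = round((1 - level) / 2 * resamples)
    hi_idx = round((1 + level) / 2 * resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Synthetic data with a built-in indirect path X -> M -> Y (illustration only).
gen = random.Random(42)
x = [gen.gauss(0, 1) for _ in range(150)]
m = [0.8 * xi + gen.gauss(0, 0.5) for xi in x]
y = [0.6 * mi + 0.2 * xi + gen.gauss(0, 0.5) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
low, high = bootstrap_ci(x, m, y, resamples=1000)  # fewer resamples to keep the demo fast
print(f"indirect effect = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

An interval that excludes zero is read as evidence of an indirect effect; combined with a significant direct path, this corresponds to the partial mediation reported above.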
It can therefore be concluded that green innovation partially mediates the relationship between environmental orientation and green competitive advantage.

5. Discussion and conclusion
Business analytics capability is built on big data, which has been closely watched and valued by entrepreneurs and scholars. However, it is still a relatively new concept, and the academic community has not reached a consensus on it. In this study, the authors explored the link between business analytics, green innovation and green competitive advantage. More specifically, the study drew on items from previous research to measure business analytics along three dimensions: descriptive, predictive and prescriptive analytics. These three dimensions, covering in-depth analysis and insight ability, together with the corresponding scales for each dimension, provide a basis for exploring the interaction of business analytics with green innovation and green competitive advantage through empirical investigation. The study used data from equipment manufacturing organizations that are familiar with business analytics. Before the empirical analysis of the collected data, the reliability and validity of the theoretical model and measurement scale were analyzed. After reliability, validity and model fit were confirmed, the causal model was used to check the impact of business analytics on green innovation and green competitive advantage. The empirical estimates show that business analytics makes a significant positive contribution to green innovation and green competitive advantage. A few years ago, George and Lin (2017) developed a stylized model linking business analytics to innovation; our study extends their work and provides empirical evidence. Moreover, Duan et al. (2020) found similar results, namely that green innovation is reinforced by business analytics. Because no previous study has explored the role of business analytics in green competitive advantage, this finding extends the existing literature. This study therefore adds value to existing research and contributes theoretically.


Furthermore, this study explored the influence of environmental orientation and green innovation on the acquisition of green competitive advantage and verified the complex causal relationships between these variables. The final structural equation model results show that all of the theoretical assumptions are verified. Environmental orientation has a positive impact on green innovation and is positively related to green competitive advantage. In other words, environmental orientation can be a valuable capability that managers can use to develop and implement green innovations that help companies gain a green competitive advantage. In addition, the results on the mediating effect of green innovation provide guidance on the ability to implement green innovation, giving companies the prospect of designing and exerting effective and valuable green innovation. Managers should be aware of the important role of environmental orientation in order to control and determine the strategic direction of green innovation practices, rather than simply responding passively to external environmental pressures. The mediation of green production was verified by Zameer et al. (2020), and the impact of environmental orientation on green innovation was confirmed by Lu et al. (2013). Thus, our results are consistent with existing studies. Further, Papadas et al. (2019) highlighted the role of green orientations in competitive advantage. It is therefore concluded that business analytics and environmental orientation are two pillars that are highly significant for green innovation and green competitive advantage.
However, the role of business analytics is more powerful than that of environmental orientation. Although environmental orientation is a key factor in green innovation, its direct role in green competitive advantage is not so strong. To check other mechanisms, the role of green innovation was explored. The results indicate that green innovation mediates the relationships of business analytics and environmental orientation with green competitive advantage. Thus, this study confirms green innovation as a mechanism in the impact of business analytics and environmental orientation on green competitive advantage.

6. Research implications
Big data business analytics has brought new resources, new technology and new tools, and it has changed thinking and concepts. Business analytics can be expected to make greater contributions to economic and social development. Against the background of recent technological advancements, the study empirically validates the impact of business analytics on green innovation and green competitive advantage. To effectively develop and utilize the value of big data, companies must actively cultivate an innovation-oriented culture and build a knowledge enterprise through business analytics. To innovate successfully, managers need to focus on business analytics and encourage a knowledge climate within the organization. A knowledge climate is conducive to the introduction, expansion and application of big data business analytics, bringing lasting innovation momentum and competitive advantages that are difficult for external organizations to imitate or counterfeit. Zameer et al. (2018) also indicated that counterfeiting creates severe challenges for legitimate brands. Thus, business analytics plays a pivotal role for organizations, and managers need to understand it and use it in decision-making.
Furthermore, the results of this study show that, in the context of environmental problems, companies need to begin implementing green practices, whether passively or more proactively, and to develop more environmentally friendly products. Although managers have begun to understand the importance and potential value of green innovation in practice, many companies are still struggling to develop influential green practices. The resources and capabilities required to implement green practices are imperative for promoting green transformation, and the key to green economic transformation is innovation and efficiency. In the practical operation of enterprises, managers should be aware of the important role of environmental orientation in order to control and determine the strategic direction of green innovation practices, rather than simply responding to external pressures such as environmental regulations. Moreover, environmental orientation and innovation are expected to become important performance indicators for the future competitive advantage of enterprises. Although a green management strategy may not necessarily enable a company to gain core competitive advantages directly, by introducing green products companies will gain cost savings and improve productivity and product quality, thereby increasing their competitive advantage. In addition, green product innovation brings enterprises a green reputation, differentiated environmentally friendly products and opportunities to enter new markets. One potential benefit of green innovation is higher barriers to entry for competitors. Therefore, when evaluating the potential results of green innovation practices, managers should go beyond cost efficiency and incorporate green innovation into corporate strategy to build and maintain a green competitive advantage. Finally, environmental orientation and green innovation have been found to be closely related to green competitive advantage.
Consequently, it is suggested that equipment manufacturing companies should invest a lot of energy in environmentally oriented green innovation, in order to improve and maintain their green competitive advantage. References Ambec, S. and Lanoie, P. (2008), “Does it pay to be green? A systematic overview”, The Academy of Management Perspectives, Vol. 22 No. 4, pp. 45-62. Anderson, J.C. and Gerbing, D.W. (1988), “Structural equation modeling in practice: a review and recommended two-step approach”, Psychological Bulletin, Vol. 103 No. 3, pp. 411-423. Ashrafi, A., Ravasan, A.Z., Trkman, P. and Afshari, S. (2019), “The role of business analytics capabilities in bolstering firms’ agility and performance”, International Journal of Information Management, Vol. 47, pp. 1-15. Bagozzi, R.P. and Yi, Y. (1991), “Multitrait-multimethod matrices in consumer research”, Journal of Consumer Research, Vol. 17 No. 4, pp. 426-439. Bentler, P.M. (1990), “Comparative fit indexes in structural models”, Psychological Bulletin, Vol. 107 No. 2, p. 238. Braam, G.J., de Weerd, L.U., Hauck, M. and Huijbregts, M.A. (2016), “Determinants of corporate environmental reporting: the importance of environmental performance and assurance”, Journal of Cleaner Production, Vol. 129, pp. 724-734. Byrne, B.M. (2013), Structural Equation Modeling with EQS: Basic Concepts, Applications, and Programming, Routledge, New York, NY. Caputo, F., Cillo, V., Candelo, E. and Liu, Y. (2019), “Innovating through digital revolution”, Management Decision, Vol. 57 No. 8, pp. 2032-2051. Chan, R.Y., He, H., Chan, H.K. and Wang, W.Y. (2012), “Environmental orientation and corporate performance: the mediation mechanism of green supply chain management and moderating effect of competitive intensity”, Industrial Marketing Management, Vol. 41 No. 4, pp. 621-630. Chang, C.-H. (2011), “The influence of corporate environmental ethics on competitive advantage: the mediation role of green innovation”, Journal of Business Ethics, Vol. 
104 No. 3, pp. 361-370.

Green competitive advantage

503

MD 60,2

Charan, P. and Murty, L. (2018), “Institutional pressure and the implementation of corporate environment practices: examining the mediating role of absorptive capacity”, Journal of Knowledge Management, Vol. 22 No. 7, pp. 1591-1613. Chen, Y.S. (2008), “The positive effect of green intellectual capital on competitive advantages of firms”, Journal of Business Ethics, Vol. 77 No. 3, pp. 271-286.

504

Chen, Y.S. and Chang, C.H. (2013), “The determinants of green product development performance: green dynamic capabilities, green transformational leadership, and green creativity”, Journal of Business Ethics, Vol. 116 No. 1, pp. 107-119.
Chen, C.-L. and Tsai, C.-H. (2016), “Marine environmental awareness among university students in Taiwan: a potential signal for sustainability of the oceans”, Environmental Education Research, Vol. 22 No. 7, pp. 958-977.
Chiang, R.H., Grover, V., Liang, T.-P. and Zhang, D. (2018), Strategic Value of Big Data and Business Analytics, Taylor & Francis, London.
Chiou, T.-Y., Chan, H.K., Lettice, F. and Chung, S.H. (2011), “The influence of greening the suppliers and green innovation on environmental performance and competitive advantage in Taiwan”, Transportation Research Part E: Logistics and Transportation Review, Vol. 47 No. 6, pp. 822-836.
Coccia, M. (2017), “Sources of technological innovation: radical and incremental innovation problem-driven to support competitive advantage of firms”, Technology Analysis & Strategic Management, Vol. 29 No. 9, pp. 1048-1061.
Cuerva, M.C., Triguero-Cano, Á. and Córcoles, D. (2014), “Drivers of green and non-green innovation: empirical evidence in Low-Tech SMEs”, Journal of Cleaner Production, Vol. 68, pp. 104-113.
Dangelico, R.M., Pujari, D. and Pontrandolfo, P. (2017), “Green product innovation in manufacturing firms: a sustainability-oriented dynamic capability perspective”, Business Strategy and the Environment, Vol. 26 No. 4, pp. 490-506.
De Medeiros, J.F., Ribeiro, J.L.D. and Cortimiglia, M.N. (2014), “Success factors for environmentally sustainable product innovation: a systematic literature review”, Journal of Cleaner Production, Vol. 65, pp. 76-86.
Dechezleprêtre, A. and Sato, M. (2017), “The impacts of environmental regulations on competitiveness”, Review of Environmental Economics and Policy, Vol. 11 No. 2, pp. 183-206.
Duan, Y., Cao, G. and Edwards, J.S. (2020), “Understanding the impact of business analytics on innovation”, European Journal of Operational Research, Vol. 281 No. 3, pp. 673-686.
Engelman, R.M., Fracasso, E.M., Schmidt, S. and Zen, A.C. (2017), “Intellectual capital, absorptive capacity and product innovation”, Management Decision, Vol. 55 No. 3, pp. 474-490.
Erevelles, S., Fukawa, N. and Swayne, L. (2016), “Big Data consumer analytics and the transformation of marketing”, Journal of Business Research, Vol. 69 No. 2, pp. 897-904.
Fan, W. and Bifet, A. (2013), “Mining big data: current status, and forecast to the future”, ACM SIGKDD Explorations Newsletter, Vol. 14 No. 2, pp. 1-5.
Feng, Y. and Guo, Y. (2018), “Research on business model innovation in the era of big data”, 3rd International Conference on Humanities Science, Management and Education Technology (HSMET 2018), Atlantis Press, Paris.
Ferraris, A., Mazzoleni, A., Devalle, A. and Couturier, J. (2019), “Big data analytics capabilities and knowledge management: impact on firm performance”, Management Decision, Vol. 57 No. 8, pp. 1923-1936.
Fiorini, P.D.C., Jabbour, C.J.C., de Sousa Jabbour, A.B.L., Stefanelli, N.O. and Fernando, Y. (2019), “Interplay between information systems and environmental management in ISO 14001-certified companies”, Management Decision, Vol. 57 No. 8, pp. 1883-1901.
First, I. and Khetriwal, D.S. (2010), “Exploring the relationship between environmental orientation and brand value: is there fire or only smoke?”, Business Strategy and the Environment, Vol. 19 No. 2, pp. 90-103.

Fornell, C. and Larcker, D.F. (1981), “Evaluating structural equation models with unobservable variables and measurement error”, Journal of Marketing Research, Vol. 18 No. 1, pp. 39-50.
Gabler, C.B., Richey, R.G. Jr and Rapp, A. (2015), “Developing an eco-capability through environmental orientation and organizational innovativeness”, Industrial Marketing Management, Vol. 45, pp. 151-161.
Garces-Ayerbe, C., Rivera-Torres, P. and Murillo-Luna, J.L. (2012), “Stakeholder pressure and environmental proactivity: moderating effect of competitive advantage expectations”, Management Decision, Vol. 50 No. 1, pp. 189-206.
Gärtner, B. and Hiebl, M.R. (2017), Issues with Big Data. The Routledge Companion to Accounting Information Systems, Routledge, London.
George, G. and Lin, Y. (2017), “Analytics, innovation, and organizational adaptation”, Innovation, Vol. 19 No. 1, pp. 16-22.
Gillon, K., Aral, S., Lin, C.-Y., Mithas, S. and Zozulia, M. (2014), “Business analytics: radical shift or incremental change?”, Communications of the Association for Information Systems, Vol. 34 No. 1, pp. 287-296.
Graham, S. and Potter, A. (2015), “Environmental operations management and its links with proactivity and performance: a study of the UK food industry”, International Journal of Production Economics, Vol. 170, pp. 146-159.
Gonzalez-Benito, J. and Gonzalez-Benito, O. (2005), “An analysis of the relationship between environmental motivations and ISO14001 certification”, British Journal of Management, Vol. 16 No. 2, pp. 133-148.
Hair, J.F., Anderson, R.E., Babin, B.J. and Black, W.C. (2010), Multivariate Data Analysis: A Global Perspective, Pearson, Upper Saddle River, NJ.
Hart, S.L. (1995), “A natural-resource-based view of the firm”, Academy of Management Review, Vol. 20 No. 4, pp. 986-1014.
Hu, L. and Bentler, P.M. (1995), “Evaluating model fit”, in Hoyle, R.H. (Ed.), Structural Equation Modeling: Issues, Concepts, and Applications, Sage, Newbury Park, CA, pp. 76-99.
Kamble, S.S., Gunasekaran, A., Goswami, M. and Manda, J. (2019), “A systematic perspective on the applications of big data analytics in healthcare management”, International Journal of Healthcare Management, Vol. 12 No. 3, pp. 226-240.
Kang, Y. and He, X. (2018), “Institutional forces and environmental management strategy: moderating effects of environmental orientation and innovation capability”, Management and Organization Review, Vol. 14 No. 3, pp. 577-605.
Keszey, T. (2020), “Environmental orientation, sustainable behaviour at the firm-market interface and performance”, Journal of Cleaner Production, Vol. 243 No. 10, p. 118524.
Konuk, F.A. (2019), “Consumers’ willingness to buy and willingness to pay for fair trade food: the influence of consciousness for fair consumption, environmental concern, trust and innovativeness”, Food Research International, Vol. 120, pp. 141-147.
Kularatne, T., Wilson, C., Månsson, J., Hoang, V. and Lee, B. (2019), “Do environmentally sustainable practices make hotels more efficient? A study of major hotels in Sri Lanka”, Tourism Management, Vol. 71, pp. 213-225.
Li, D., Huang, M., Ren, S., Chen, X. and Ning, L. (2018), “Environmental legitimacy, green innovation, and corporate carbon disclosure: evidence from CDP China 100”, Journal of Business Ethics, Vol. 150 No. 4, pp. 1089-1104.
Liu, P. and Yi, S.-p. (2017), “Pricing policies of green supply chain considering targeted advertising and product green degree in the big data environment”, Journal of Cleaner Production, Vol. 164, pp. 1614-1622.

Lu, M.T., Tzeng, G.H. and Tang, L.L. (2013), “Environmental strategic orientations for improving green innovation performance in fuzzy environment-Using new fuzzy hybrid MCDM model”, International Journal of Fuzzy Systems, Vol. 15 No. 3, pp. 297-316.
MacCallum, R.C., Browne, M.W. and Sugawara, H.M. (1996), “Power analysis and determination of sample size for covariance structure modeling”, Psychological Methods, Vol. 1 No. 2, p. 130.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C. and Byers, A.H. (2011), Big Data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey Global Institute, Washington, DC.
McAfee, A., Brynjolfsson, E., Davenport, T.H., Patil, D. and Barton, D. (2012), “Big data: the management revolution”, Harvard Business Review, Vol. 90 No. 10, pp. 60-68.
Mishra, D., Luo, Z., Hazen, B., Hassini, E. and Foropon, C. (2019), “Organizational capabilities that enable big data and predictive analytics diffusion and organizational performance”, Management Decision, Vol. 57 No. 8, pp. 1734-1755.
Nagano, H. (2020), “The growth of knowledge through the resource-based view”, Management Decision, Vol. 58 No. 1, pp. 98-111, doi: 10.1108/MD-11-2016-0798.
Nguyen, Q.A. and Hens, L. (2015), “Environmental performance of the cement industry in Vietnam: the influence of ISO 14001 certification”, Journal of Cleaner Production, Vol. 96, pp. 362-378.
Niebel, T., Rasel, F. and Viete, S. (2019), “BIG data–BIG gains? Understanding the link between big data analytics and innovation”, Economics of Innovation and New Technology, Vol. 28 No. 3, pp. 296-316.
Papadas, K.-K., Avlonitis, G.J., Carrigan, M. and Piha, L. (2019), “The interplay of strategic and internal green marketing orientation on competitive advantage”, Journal of Business Research, Vol. 104, pp. 632-643.
Patel, P.C., Azadegan, A. and Ellram, L.M. (2013), “The effects of strategic and structural supply chain orientation on operational and customer-focused performance”, Decision Sciences, Vol. 44 No. 4, pp. 713-753.
Paulraj, A. (2009), “Environmental motivations: a classification scheme and its impact on environmental strategies and practices”, Business Strategy and the Environment, Vol. 18 No. 7, pp. 453-468.
Porter, M.E. and Van der Linde, C. (1995), “Toward a new conception of the environment-competitiveness relationship”, The Journal of Economic Perspectives, Vol. 9 No. 4, pp. 97-118.
Preacher, K.J. and Hayes, A.F. (2008), “Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models”, Behavior Research Methods, Vol. 40 No. 3, pp. 879-891.
Prescott, M.E. (2014), “Big data and competitive advantage at Nielsen”, Management Decision, Vol. 52 No. 3, pp. 573-601.
Ransbotham, S. and Kiron, D. (2017), “Analytics as a source of business innovation”, MIT Sloan Management Review, pp. 1-16.
Rao, P. and Holt, D. (2005), “Do green supply chains lead to competitiveness and economic performance?”, International Journal of Operations and Production Management, Vol. 25 No. 9, pp. 898-916.
Salunke, S., Weerawardena, J. and McColl-Kennedy, J.R. (2019), “The central role of knowledge integration capability in service innovation-based competitive strategy”, Industrial Marketing Management, Vol. 76, pp. 144-156.
Sharma, N., Chakrabarti, A. and Balas, V.E. (2019), “Data management, analytics and innovation”, Proceedings of ICDMAI, Vol. 1, doi: 10.1007/978-981-32-9949-8.
Sindakis, S. (2017), Applying Data Analytics for Innovation and Sustainable Enterprise Excellence. Analytics, Innovation, and Excellence-Driven Enterprise Sustainability, Springer, New York, NY.
Singh, S.K., Del Giudice, M., Chierici, R. and Graziano, D. (2020), “Green innovation and environmental performance: the role of green transformational leadership and green human resource management”, Technological Forecasting and Social Change, Vol. 150, p. 119762.

Sivo, S.A., Fan, X., Witta, E.L. and Willse, J.T. (2006), “The search for ‘optimal’ cutoff properties: fit index criteria in structural equation modeling”, The Journal of Experimental Education, Vol. 74 No. 3, pp. 267-288.
Stadelmann, M. and Schubert, R. (2018), “How do different designs of energy labels influence purchases of household appliances? A field study in Switzerland”, Ecological Economics, Vol. 144, pp. 112-123.
Toga, A.W., Foster, I., Kesselman, C., Madduri, R., Chard, K., Deutsch, E.W., Price, N.D., Glusman, G., Heavner, B.D. and Dinov, I.D. (2015), “Big biomedical data as the key resource for discovery science”, Journal of the American Medical Informatics Association, Vol. 22 No. 6, pp. 1126-1131.
Wang, C.H. (2019), “How organizational green culture influences green performance and competitive advantage: the mediating role of green innovation”, Journal of Manufacturing Technology Management, Vol. 30 No. 4, pp. 666-683.
Wang, Y., Kung, L. and Byrd, T.A. (2018), “Big data analytics: understanding its capabilities and potential benefits for healthcare organizations”, Technological Forecasting and Social Change, Vol. 126, pp. 3-13.
Watson, H.J. (2014), “Tutorial: big data analytics: concepts, technologies, and applications”, Communications of the Association for Information Systems, Vol. 34 No. 1, pp. 1247-1268.
Wernerfelt, B. (1984), “A resource-based view of the firm”, Strategic Management Journal, Vol. 5 No. 2, pp. 171-180.
Yasmeen, H., Wang, Y. and Zameer, H. (2019), “Modeling the role of government, firm, and civil society for environmental sustainability”, International Journal of Agricultural and Environmental Information Systems, Vol. 10 No. 2, pp. 82-97.
Yu, Y. and Huo, B. (2019), “The impact of environmental orientation on supplier green management and financial performance: the moderating role of relational capital”, Journal of Cleaner Production, Vol. 211, pp. 628-639.
Zameer, H., Wang, Y., Yasmeen, H., Mofrad, A.A. and Saeed, R. (2018), “A game-theoretic strategic mechanism to control brand counterfeiting”, Marketing Intelligence and Planning, Vol. 36 No. 5, pp. 585-600.
Zameer, H., Wang, Y. and Yasmeen, H. (2019), “Transformation of firm innovation activities into brand effect”, Marketing Intelligence and Planning, Vol. 37 No. 2, pp. 226-240.
Zameer, H., Wang, Y. and Yasmeen, H. (2020), “Reinforcing green competitive advantage through green production, creativity and green brand image: implications for cleaner production in China”, Journal of Cleaner Production, Vol. 247 February, 119119.
Zhan, Y., Tan, K.H., Ji, G., Chung, L. and Tseng, M. (2017), “A big data framework for facilitating product innovation processes”, Business Process Management Journal, Vol. 23 No. 3, pp. 518-536.

Corresponding author Ying Wang can be contacted at: [email protected]

