Artificial Intelligence in Cancer: Diagnostic to Tailored Treatment [1 ed.] 0128202017, 9780128202012

Artificial Intelligence in Cancer: Diagnostic to Tailored Treatment provides theoretical concepts and practical techniques...


English. Pages: 318 [300]. Year: 2020.


Table of contents :
Cover
ARTIFICIAL INTELLIGENCE IN CANCER: DIAGNOSTIC TO TAILORED TREATMENT
Copyright
Dedication
Foreword
Preface
1
Life challenge. Cancer
Cancer throughout history
Where are we now?
Statistical assessment of research
False positive and false negative
p-Level
Power analysis
The Normal (Gaussian) distribution X ~ N(μ, σ²)
Kolmogorov-Smirnov test for Normal distribution
Lilliefors test
Shapiro-Wilk W test
t-Test or Student's t-test
One group of observations t-test
The sign test
The Wilcoxon signed rank sum test
Two groups of paired observations
Two independent groups of observations
Mann-Whitney test
F test
Levene's test
Bartlett's test
One way ANOVA
z-Test
One sample z-test
Two proportion z-test
χ2 test
Bayes theorem
Kolmogorov's axioms
How to read statistical tables
Hope is around the corner. Artificial Intelligence steps in
References
Further reading
2
The beginnings
Doctor's suspicion. Doctor+artificial intelligence combo's diagnosis
Artificial neural networks
Multiple linear regression
Logistic regression
Softmax classifier
Learning paradigms
The backpropagation algorithm
Evolutionary computation learning paradigm
Crossover or recombination
Mutation
Bayesian learning paradigm
Radial basis function neural networks
Extreme learning machine
Adaptive single layer feedforward neural network
ELM/ant colony optimization hybrid
Ant colony optimization feature selection algorithm
Probabilistic neural networks
Deep learning/convolutional neural networks
k-Nearest neighbor
Clustering
Non-hierarchical clustering
Hierarchical clustering
World collapses. Making an informed decision
References
Further reading
3
Pathologist at work
Building the tumor's pattern
Artificial Intelligence and histology
Cox regression model
Artificial Intelligence and immunohistochemistry
Artificial Intelligence and genetics
References
Further reading
4
Surgeon at work
Learning everything about the tumor: Tumor profiling
Making a clean cut with the help of Artificial Intelligence
References
5
Oncologist at work
Establishing a treatment plan. Oncological guides
Chemotherapy and Artificial Intelligence
Support vector machines
References
6
Radiotherapist at work
Establishing a treatment plan
Radiotherapy and Artificial Intelligence
References
7
Survival analysis
Kaplan-Meier survival curve
Life tables
The logrank test
The hazard ratio
Nelson-Aalen filter
Survival regression
The exponential distribution
Poisson process
The log normal distribution
The Weibull distribution
The log logistic distribution or Fisk distribution
Gamma distribution
The Gompertz distribution
References
Further reading
8
Remission and recurrence. What to do next?
Decision trees
Hunt's algorithm
GINI index
Entropy
Misclassification measure
Self-organizing maps or Kohonen networks
Cluster network
References
9
Artificial Intelligence in cancer: Dreams may come true someday
References
Index
A
B
C
D
E
F
G
H
I
K
L
M
N
O
P
R
S
T
U
V
W
X
Z
Back Cover


ARTIFICIAL INTELLIGENCE IN CANCER: DIAGNOSTIC TO TAILORED TREATMENT SMARANDA BELCIUG

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

© 2020 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices: Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data: A catalog record for this book is available from the Library of Congress.
British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library.

ISBN 978-0-12-820201-2

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Masucci, Stacy Senior Acquisitions Editor: Teixeira, Rafael Editorial Project Manager: Young, Sam W. Production Project Manager: Raviraj, Selvaraj Cover Designer: Hitchen, Miles Typeset by SPi Global, India

Dedication

In loving memory of my father, Florin Gorunescu

"If you remember me, then I don't care if everyone else forgets."
Haruki Murakami—Kafka on the Shore

I will always remember you, Dad!

Foreword

This is an exceptional book. It takes the reader through the cancer journey of a patient, explaining at each stage what exactly it means for the patient and where artificial intelligence (AI) can help, and is helping, all concerned to understand what is happening and ameliorate the journey and the path. As such, it provides a very readable, yet technical, description of the whole area of AI for the cancer patient journey that can be understood and enjoyed by students and researchers from many areas of computer science, mathematics, statistics, physics, and related disciplines. Overall, this is a book which combines passion and emotion with a comprehensive treatment of the whole area, providing a very scholarly approach while communicating the basic ideas in a clear and understandable manner. Throughout there are lots of examples, web-links and illustrations that make it easy for the novice or lay-person to appreciate the concepts, while the more experienced reader can empathize with the patient, their family and the healthcare professionals while enjoying the background anecdotes, instances and explanations. Thus, the patient can learn to understand what is happening and how to interpret the evidence, while the student or

scientist learns about the human journey and the real potential of relevant technical methods to alleviate suffering. As such the book is structured through the stages of the cancer patient journey, starting with the initial presentation of symptoms and diagnosis, through pathology and testing, to treatment, involving the role of AI in surgery, in oncology using Big Data, and in dose control in radiology. This is followed by an explanation of survival analysis, including lots of examples and case studies, so that the reader can better calculate and compute the future likelihood of remission, recurrence, and life expectancy of the patient, following their diagnosis and treatment. In the final chapter the future of cancer research is discussed, including the role of AI in palliative care. Overall, this is an outstanding book which grasps its topic with enthusiasm and compassion while showcasing the breadth, depth and applicability of AI in treating the cancer patient, as well as inspiring and motivating the reader to learn more about these exciting new developments and possibilities.

Sally McClean
Portrush, Northern Ireland


Preface

Dear reader, by continuing to read this book you agree to embark on a rollercoaster ride that will take you through the ups and downs of Artificial Intelligence applied in cancer research. You will pass through every single stage that is lived by a cancer patient: finding out that she/he is suspected of having the disease, followed by all the tests, blood work and imaging, etc. The world collapses. But wait, a sparkle, a little light appears at the end of the tunnel. What is that? It is hope, in the shape of Artificial Intelligence. Step by step, we see how the doctor's suspicion rises, and how the diagnostic is set using Artificial Intelligence. After that we go into the pathology lab and watch the pathologist at work. We learn how Artificial Intelligence is applied in histology, immunohistochemistry, and genetics. One step further and we find ourselves in the operating room, seeing the surgeon resecting tumors after she/he preplanned the operation, learning everything about the tumor using Artificial Intelligence. The surgery is over, the patient lives, and now she/he is about to meet the rest of the oncology team. Using Big Data, Artificial Intelligence helps the oncologist establish a tailored treatment plan, taking into account everything she/he knows about the patient, from genetic data to her/his way of living. Maybe chemotherapy is not the only answer; maybe we need to follow the radiotherapist, and see how she/he can determine the right radiation dosage with the aid of Artificial Intelligence. Through every step of the journey we have only one thing in mind: what are the odds of beating this monster? How long will

the patient survive? We compute the odds with survival analysis. When the miracle phrase "you are in remission" is said, we jump up and down with joy. But we also find out that the next 5 years are crucial. The word recurrence hits us right in the chest. What are the chances that the cancer will come back? Artificial Intelligence is here to help us compute the odds. Last but not least, in the final chapter, we learn what might come next in cancer research and how Artificial Intelligence has started to enter palliative care as well. Whether Artificial Intelligence can help our dreams come true or not, only time can tell. This book is addressed to computer scientists, physicians, mathematicians, and the general public. I decided to write it in a friendly way, so that you will want to keep on reading it. I wanted you to understand that Artificial Intelligence is not voodoo, not magic, nor snake oil. It is math and super computational power. In some parts of cancer research, Artificial Intelligence is just at the beginning. I hope that in the future doctors and data scientists will continue to work together in discovering new ways of fighting this awful disease. This book is dedicated to my father, Prof. Florin Gorunescu, whom I lost in 2018 to a rare, aggressive form of esophageal cancer. So, when I wrote every chapter I had him in mind and in my thoughts, and I relived everything. That is why you may feel my emotions floating around in the pages of the book. I am a data scientist, and still I did not know how to understand the numbers. I wanted a miracle, and unfortunately the miracle did not happen for us.


So, I will say that this book is dedicated to my father, and to all the people who are struggling with this disease: may you succeed in your battle; and to those who have lost the battle, may you all rest in peace. I want to express my gratitude to Rafael Teixeira and Samuel Young for offering me the great opportunity of writing this book, and for so warmly supporting this project. Many thanks go out to Prof. Sally McClean for always supporting me, for taking

the time to review this book before publication, and especially for always being my friend. Last but not least, I would like to thank my family for their understanding, love, and unconditional support. As a final note, thank you, Dad, for being my mentor, my friend, and for teaching me Statistics. Until we meet again, may God hold you gently in the palm of His hand.

Smaranda Belciug
Craiova, Romania

CHAPTER 1

Life challenge. Cancer

1.1 Cancer throughout history

Cancer seems like Russian roulette. It is as if every day you hear that someone you know or heard of has just been diagnosed with it. It is scary. You feel insecure and think that you might be next. You comfort yourself with thoughts that you or your family won't get "hit" by it. You eat the right food, exercise, sleep well, take your minerals and vitamins, you do not smoke, nor drink, and maybe, just maybe, you have no family history of it. Still, you might have heard of people that never smoked, never ate "bad" food, were fit and Zen, and still got cancer. All these thoughts are fueled by alarmists who say that cancer rates are going up. Are they right, or are they just adding fuel to the general paranoia? Should you live in fear? We believe not. Does our answer mean that you will live a long and healthy life, cancer free? Not quite. According to http://ourworldindata.org/cancer (Accessed May 3, 2019), cancer is the second leading cause of death, after cardiovascular diseases. The Institute for Health Metrics and Evaluation (IHME)—https://www.healthdata.org (Accessed May 3, 2019)—states that in 2017 between 9.2 and 9.7 million deaths were caused by various forms of cancer. So, things are not looking great. Now, the alarmists step in and say that in the past the rates were lower. That may be true, but we should take into account the fact that nowadays, through screening tests and new Artificial Intelligence (AI) technology, we detect cancer more often. Thus, before embarking on our cancer history trip, we suggest that you take a deep breath, relax, and enjoy the moment. Carpe diem! You never know what might come next, so do not waste your time fearing it: "What must be, shall be"—Juliet, from William Shakespeare's Romeo and Juliet. There are many types of cancer. Cancer can begin in any part of the body. The moment a cell starts growing out of control and crowding out the healthy cells, that is where it all begins. The fact that any cell can become cancerous implies that cancer is not just a single disease. Cancer can start in any organ, or even in the blood. Cancers that start in the same type of cell are somewhat alike, but they still differ in their speed of growth and spread. That is what makes each cancer unique, tailored to the person who has it. Cancer develops in any living organism, whether it is a human being or an animal. Some of the first proofs of cancer have been found in Ancient Egypt among fossilized bone tumors.



Osteosarcoma, or bone cancer, has been discovered in mummies (Hamada and Rida, 1972). In 1862, at Luxor, Edwin Smith, an American Egyptologist, came into possession of a medical papyrus that was named after its owner—the Edwin Smith Papyrus. The Edwin Smith Papyrus is the oldest known medical record: it dates back to 1700 BC, and it is thought to be a copy of the original manuscript dated to 3000 BC. It is 4.68 m in length and is divided into 17 pages. In 1906, Smith's daughter gave the papyrus to the New York Historical Society. The Edwin Smith Papyrus is a surgical document. In it are described eight cases of breast tumors or ulcerations that were removed through cauterization with the use of a tool called the fire drill. Regarding this disease it is written, "There is no treatment." Georg Moritz Ebers, a German Egyptologist and novelist, acquired another ancient papyrus, in Thebes in 1872. The Ebers papyrus dates back to 1500 BC, and includes information regarding an incurable disease that produced tumors found in the skin, uterus, stomach, and rectum. Compared to the Edwin Smith papyrus, this papyrus is medical, describing pharmacological, mechanical, and magical treatments (Haas, 1999). An example of a cancer remedy is: when dealing with "a tumor against the God Xenus," the recommendation is "do thou nothing there against" (Hajdu, 2011). Around 400 BC, Hippocrates named the non-ulcer-forming and ulcer-forming tumors carcinos and carcinoma, after the Greek word for "crab." Remember that it was Karkinos, also known as Carcinus, the giant crab that helped the Hydra in the legendary battle with Herakles (or Hercules) at Lerna. Even if Herakles' foot crushed Karkinos, the goddess Hera placed it amongst the stars, as the constellation Cancer. There are several explanations for why he chose this name: (a) malignant tumors, together with their swollen blood vessels and adherences, resemble the picture of a crab with its legs spread in a circle; (b) malignant tumors have hard tissue that recalls the hardness of the crab's shell; and (c) the pain caused by the tumor can be associated with the pinch of a crab's claw (Hajdu, 2011; Mitrus et al., 2012). As for the term "oncology," the name comes from the Ancient Greek word "onkos," a tragic mask worn in theater plays that symbolized the burden carried by the person who wore it. On the other hand, onkos in Greek means swelling. Around the year 200 AD, the Greek physician Galen introduced the medical term onkos for all tumor types, whether they were malignant or benign. As time passed and Rome fell, Galen's teachings regarding cancer started to reach physicians in Constantinople, Cairo, Alexandria, and Athens. While in the West sorcery was used to cure diseases, in this part of the world scientists were trying to understand and explain cancer. They agreed with Hippocrates' conclusion that cancer was the result of an excess of black bile (in medieval belief, a humor secreted by the kidneys or spleen that caused melancholy—according to the Merriam-Webster dictionary—https://www.merriam-webster.com/dictionary/black%20bile—Accessed May 2, 2019). Hippocrates hypothesized that cancer was correlated with an imbalance of the four body humors: blood, phlegm, yellow bile, and black bile. They also noted that cancer was curable only in its earliest stages. The treatment was based on arsenic. Fast forwarding to the Middle Ages, due to religious beliefs that forbade scientific progress, cancer was supposed to be an infectious disease.
The excess of black bile idea continued to be present until the 16th century. During the Renaissance, scientists started performing autopsies (remember Rembrandt's The Anatomy Lesson of Dr. Nicolaes Tulp). Andreas Vesalius, a 16th century anatomist, demonstrated that
the black bile does not exist (Di Lonardo et al., 2015). Meanwhile, Gaspare Aselli, an Italian physician, discovered the lacteal vessels of the lymphatic system, and also advocated that abnormalities of the lymph may cause cancer. It was in the same century that Paracelsus, born Philippus Aureolus Theophrastus Bombastus von Hohenheim, a Swiss physician, started the "medical revolution" of the Renaissance. Paracelsus is the "father of toxicology," due to his studies concerning the tumors of mine workers. He suggested that the deposits of sulfur and arsenic salts found in their blood caused the cancer (Hajdu, 2011; Mitrus et al., 2012; Henschen, 1968; Di Lonardo et al., 2015). Another opinion came from Paris in the 1730s, when the physician Claude Gendron stated that cancer results from a local, hard, growing mass that does not respond to drug treatment, and thus must be removed with all its adhesions. Herman Boerhaave, a Dutch physician, first introduced the idea of cancer being hereditary. He implied that "… cancer was most likely induced by viruses present in water or soil. Once acquired, the cancer viruses remained in the body, and they could be transferred by contagious infections or by heredity" (Di Lonardo et al., 2015). We can easily see the similarities between the idea of an infectious disease present in the Middle Ages and the toxicity discovered by Paracelsus. Tumors started to be better investigated when researchers started using the microscope. In 1838, Johannes Peter Muller, a German physiologist, anatomist, ichthyologist, and herpetologist, suggested that cells form cancer, while in 1869, Rudolf Ludwig Carl Virchow, a German physician, proposed that cancer is a disease of cells. Giovanni Battista Morgagni, an Italian anatomist, founded modern anatomical pathology in the 18th century. He taught pathology for 56 years at the University of Padua. Regarding cancer, he reported that it is the result of organ lesions. In the 18th century, hospitals that specialized in cancer were opened. Between 1871 and 1874, it was the English surgeon Campbell de Morgan who defined the term metastasis, "the cancer poison" that spread from the primary tumor to the nearby lymph nodes. Surgery was the first course of treatment, but unfortunately, due to precarious hygiene, few people would survive it. The Scottish surgeon Alexander Monro reported only two survivors out of sixty breast cancer surgeries. The number of survivors went up when aseptic surgeries started to take place. In the late 19th century, Marie Sklodowska Curie and Pierre Curie discovered radium, thus making a non-surgical cancer treatment possible. In 1911, Francis Peyton Rous, an American Nobel Prize-winning virologist, documented a viral cause of cancer in chickens. Being a pathologist, he diagnosed a large lump in the breast of a hen brought to him by a farmer as a sarcoma. He verified whether the tumor could be transplanted into other hens that resembled the original one. After establishing that this was possible, he observed that with each passage the tumor became more and more aggressive. The next step was to test whether a virus caused the cancer; thus Rous shredded a tumor sample in saline solution, and afterwards passed it through a filter that removed the bacteria and tumor cells. This extract was injected into healthy hens and, curiously enough, it produced new tumors (Rous, 1911a,b).
His findings were very controversial, so he abandoned cancer research until the 1930s, when a colleague of his, Richard Shope, revealed that another virus, the papillomavirus, caused tumors in rabbits. In the 1960s, scientists discovered the src gene, which produces a protein that leads to tumors. Fifty years after the discovery of the sarcoma virus, today known as a retrovirus, Rous won the Nobel Prize. It was Rous' study that inspired other scientists; hence,
new studies of tumor-inducing viruses in other nonhuman species (rabbits, cats, mice, etc.) were performed (Shope and Hurst, 1933; Bittner, 1942; Gross, 1951; Sweet and Hilleman, 1960). In 1964, the Epstein-Barr virus, a human herpesvirus that has been associated with lymphoid and epithelial tumors, was observed (Epstein et al., 1964). However, it was not Rous who described the first tumor virus. The Danish physician-veterinarian duo of Vilhelm Ellerman and Oluf Bang proved in 1908 that a filterable extract could transmit leukemia among chickens (Ellerman and Bang, 1908). You may ask yourself why this important finding was overlooked, and the answer is that leukemia was not recognized as a neoplastic disease until 1930 (Van Epps, 2005). In Rous' Nobel lecture, the work of Ellerman and Bang was mentioned. David Paul von Hansemann, a German pathologist, described in his works, published between 1890 and 1919, the multipolar mitoses in animal cells. Theodor Boveri, a German biologist, continued Hansemann's theory and proposed in 1914 the chromosomal theory of cancer, with respect to the abnormal numbers of chromosomes arising in cells by multipolar mitoses in adult cells. Boveri's theory was based on the observation of some cells in early sea urchin embryos that had abnormal chromosome complements, wandering from their normal growth paths (Balmain, 2001; Boveri, 2008). Boveri wrote in his paper "Zur Frage der Entstehung maligner Tumoren" (Boveri, 1914) that "tumor growth is based on … a particular, incorrect chromosome combination which is the cause of the abnormal growth characteristics passed on to daughter cells." Boveri's theory is often recognized as the first chromosomal theory of cancer, when in fact it was Hansemann's contribution that is the original and the more significant hypothesis (Bignold et al., 2006). In 1907, the American Association for Cancer Research was founded (Creech, 1979). In 1913, with all these discoveries, people were still uninformed regarding cancer, and a wave of public fear started. Then a major breakthrough took place: an article regarding cancer's warning signs was published in a women's magazine, Ladies' Home Journal (Adams, 1913). In the 1900s, cancer research was marching at full speed. Some of the most important concepts and accomplishments are presented in Table 1.1.

TABLE 1.1 Cancer concepts and successes in the 1900s.

Year   Concepts and successes
1907   Giuseppe Ciuffo proves that an infectious agent causes human warts (Ciuffo, 1907)
1908   Vilhelm Ellerman and Oluf Bang discover that a virus causes leukemia (Ellerman and Bang, 1908)
1910   P. Rous discovers the sarcoma virus in the breast muscle of a Plymouth Rock hen (Rous, 1911a,b)
1915   Katsusaburo Yamagiwa and Koichi Ichikawa induce cancer in rabbits through the application of coal tar on their skin (Yamagiwa and Ichikawa, 1917)
1933   R. Shope discovers the rabbit papillomavirus (Shope and Hurst, 1933)
1936   J. Bittner finds the mouse mammary tumor agent (an RNA virus) transmitted through breast-feeding (Bittner, 1936)
1953   L. Gross isolates the murine polyomavirus, a virus from mouse lymphoma that induces the same tumor when injected into newborn mice (Gross, 1953)
1957   A. Graffi isolates the murine leukemia virus (Graffi, 1957)
1959   B. Eddy and S. Stewart discover the polyoma virus (Eddy and Stewart, 1959)
1960   B.H. Sweet and M.R. Hilleman discover the Simian Vacuolating virus 40 (Sweet and Hilleman, 1960)
1964   The Epstein-Barr virus is identified as the first human cancer virus (Epstein et al., 1964)
1969   The oncogene hypothesis is proposed by R. Huebner and G. Todaro (Huebner and Todaro, 1969)
1973   J. Rowley discovers chromosome abnormalities in cancer patients (Rowley, 1973)
1973   Ames, Durston, Yamasaki, and Lee find that carcinogens mutate bacterial genes (Ames et al., 1973)
1979   The p53 anti-oncogene is discovered (Kress et al., 1979; Melero et al., 1979; Smith et al., 1979)
1983   Three human transforming genes related to viral ras oncogenes are discovered (Shimizu et al., 1983)
1983   Harald zur Hausen discovers HPV (Human Papilloma Virus) 16 and 18 (Ikenberg et al., 1983)
1986   The Human Genome Project starts
1988   The retinoblastoma anti-oncogene is found (Weinberg, 1988)
1999   The Human Genome Project produces the sequence of the first chromosome (Mayor, 1999)

In the last two decades, cancer research has produced massive amounts of data. Knowledge of human genomes, along with protein expression (mass spectrometry—MS), metabolomics (the comprehensive analysis of metabolites—the small molecules that are the intermediates and products of metabolism—a formidable tool in precision medicine; Clish, 2015), and clinical data, needs to be processed quickly and accurately in order to identify cancer in new patients and to tailor therapy and monitoring. So, as we let go of the past, we shall commence our next section, where the present state of the art in cancer research is discussed.

1.2 Where are we now?

A great part of this book will regard Statistics. Statistics provides the perfect compass for traveling, along with AI, through the depths of cancer research. Still, one must keep in mind the fact that any measure, any result, that we obtain using Statistics is not a certainty. In this day and age, the majority of us use Google to obtain our diagnosis before talking to a healthcare professional. It is a well-known fact that if you Google any symptom, whether we are talking about a toothache or a bunion, cancer will appear in our search. Obviously, after seeing a physician, one shall Google the diagnosis. If that person has been diagnosed with cancer, she/he will start looking at the numbers that appear before her/his eyes. What is the 5-year survival rate? What is the overall survival rate? What is the morbidity or mortality regarding a certain cancer surgery? People Google forums and support groups, trying to see whether they or their loved ones will fall under the "win" category. Seeing the percentages, some build up confidence, others fall into depression.


Statistics is a good measure, but in order to trust it we need to know how to interpret it, because some numbers without proper analysis might lead to wrong conclusions. For example, let us look at the data provided by the World Cancer Research Fund—American Institute for Cancer Research—https://www.wcrf.org/dietandcancer/cancer-trends/data-cancer-frequency-country (Accessed May 4, 2019) in Table 1.2. The data is the age-standardized rate per 100,000.

TABLE 1.2 Cancer frequencies by country.

Rank   Country                    Age-standardized rate per 100,000
1      Australia                  468.0
2      New Zealand                438.1
3      Ireland                    373.7
4      Hungary                    368.1
5      United States              352.2
6      Belgium                    345.8
7      France (metropolitan)      344.1
8      Denmark                    340.4
9      Norway                     337.8
10     Netherlands                334.1
11     Canada                     334.0
12     New Caledonia (France)     324.2
13     United Kingdom             319.2
14     South Korea                313.5
15     Germany                    313.1
16     Switzerland                311.0
17     Luxembourg                 309.3
18     Serbia                     307.9
19     Slovenia                   304.9
20     Latvia                     302.2
21     Slovakia                   297.5
22     Czech Republic             296.7
23     Sweden                     294.7
24     Italy                      290.6
25     Croatia                    287.2
26     Lithuania                  285.8
27     Estonia                    283.3
28     Greece                     279.8
29     Spain                      272.3
30     Finland                    266.2
31     Uruguay                    263.4
32     Belarus                    260.7
33     Portugal                   259.5
34     Iceland                    257.8
35     Guadeloupe (France)        254.6
36     Puerto Rico                254.5
37     Moldova                    254.3
38     Poland                     253.8
39     Cyprus                     250.8
40     Martinique (France)        250.8
41     Malta                      249.4
42     Singapore                  248.9
43     Japan                      248.0
44     Austria                    247.7
45     Barbados                   247.5
46     French Guiana              247.0
47     Bulgaria                   242.8
48     Lebanon                    242.8
49     French Polynesia           240.6
50     Israel                     233.6

From Table 1.2, we see that the highest cancer rate for both men and women was in Australia, and also that the top 12 countries come from Europe, North America, and Oceania. We could draw the conclusion that the developed countries (Australia, New Zealand, Ireland, United States, Belgium, metropolitan France, Denmark, Norway, etc.) have higher rates of cancer, thus implying that some factors related to the lifestyle of their citizens lead to cancer. We strongly believe this is not the case. Our explanation is that in these developed countries the screening procedures and state-of-the-art diagnosing techniques detect cancer at a faster pace than in the other countries. Having screening programs and easy access to medical care no doubt leads to discovering more cancer cases. The fact that other countries do not report cancer cases does not imply that those countries are "safer." More on data and statistics will be covered in what follows.

Statistical assessment of research

Statistical analysis is essential. Most biomedical research regards designing and implementing new algorithms. Sadly, the most important part, the evaluation of their efficiency, is still overlooked. Medical residents and computer science MSc or PhD students lack the know-how to understand the common statistics found in clinical journals (Windish et al., 2007). If one fails to comprehend crucial information in those research papers, then it is clear that she/he cannot apply the newfound knowledge in practice, ultimately stopping the advancement in patient treatment and, subsidiarily, in science. There are several organizations that support and argue for the importance of statistics in medicine, including the International Medical Informatics Association (IMIA) and the National Library of Medicine (NLM) degree-granting programs in biomedical informatics. IMIA includes statistics in its recommendations regarding medical informatics education, and NLM requires an introductory course in biostatistics (Johnson, 2003; IMIA, 2000; NLM, 2007). In 2010, a review of the use of statistical analysis in the biomedical informatics literature was published, and the authors' findings are that the main statistical tools used in research papers are: descriptive statistics, elementary statistics, multivariable statistics, and regression analysis (Scotch et al., 2010). If we look at the numbers presented in the paper, we discover some disturbing facts, such as:
• 18% of the papers published in the Journal of the American Medical Informatics Association (JAMIA) and 36% of the papers published in the International Journal of Medical Informatics (IJMI) between 2000 and 2007 have no statistics whatsoever in them;
• 71% from JAMIA and 62% from IJMI contain basic descriptive statistics;
• 42% from JAMIA and 22% from IJMI contain elementary statistics;
• 12% from JAMIA and 6% from IJMI contain multivariable statistics;
• 9% from JAMIA and 6% from IJMI contain machine learning.
Our goal is to try to change this. So buckle your seatbelts, because in what follows we shall present step-by-step statistical tools, so that the reader gains this very important skill in biomedical informatics. An old saying is "medicine is not mathematics," which is true, because in medicine nothing is 100% accurate. On an endoscopy, for instance, an esophageal tumor may appear to be only stage II, when in fact it is stage IV. On MRI, small tumors (metastases) may not be seen with the naked eye, due to the fact that they are measured in microns. A surgeon knows that she/he cannot rely 100% on the scans that she/he reads before going into surgery. She/he knows that things may not be as they seem on the scans or in their interpretation. An oncologist knows that each person is unique, thus her/his cancer is unique, and it may respond differently from others to a certain course of treatment. Accuracy in medical diagnosis is still not perfect, producing false positives or false negatives.


False positive and false negative

A false positive is when a person receives a positive result when she/he should have received a negative result. For instance, a woman is told after a routine mammogram that she has breast cancer, when in fact it was just a misread of the scan. In statistics, this represents a Type I error, or α. A Type I error implies the incorrect rejection of a true null hypothesis. The null hypothesis, denoted H0, is the generally accepted hypothesis, while its opposite, denoted H1, is the alternative hypothesis. Theoretically, in order to test a statistical hypothesis, the whole population should be examined. Because this is practically impossible, a random sample of the population is chosen, just as Jules Henri Poincaré stated: "Our weakness does not allow us to embrace the entire universe, and obliges us to decompose it into slices." Please keep in mind that the random sample should match the characteristics of the whole population (e.g., if we have 54% women and 46% men in the population, then in the sample the percentages should remain the same, 54% vs 46%). After this step, a hypothesis is formulated. An example of such a hypothesis is: "a new drug produces better outcomes, thus being more effective than the drug that is in use today for a certain cancer treatment." In this case we have:
• H0: states that there is no significant difference between the two drugs, the difference reported by the drug company simply being the result of chance. The null hypothesis is the generally accepted hypothesis, which the statisticians are trying to reject in light of new discoveries. Thus, we can say that the true "research" hypothesis is H1.
• H1: states that there are significant differences between the new drug and the classical one, differences that are not the result of chance.
A false negative is the opposite concept: a person receives a negative result on a medical test when she/he should have received a positive result. When in a private conversation a person uses the term "statistical significance," many of us do not comprehend its meaning, and we just agree and nod our heads, considering her/him some sort of statistics guru who has just performed complex computations proving that her/his point cannot be questioned. Now, it is time to show that this "statistical significance" business is not rocket science, and absolutely everyone should grasp it. Statistical significance implies know-how on three simple concepts: hypothesis testing, the Normal distribution, and the famous p-level. How do we determine which hypothesis is correct, based on the evidence we have? There are two types of statistical tests: parametric and non-parametric.
• Parametric tests refer to statistical hypotheses which regard statistical parameters such as the mean, standard deviation or dispersion, as well as the distributions which govern the data. These types of tests include the t-test and ANOVA. Both tests assume that the data have a Normal distribution.
• Non-parametric tests do not make presumptions about the population parameters or distributions. They are used on data that are not governed by the Normal distribution. These types of tests include the chi-square test and the Mann-Whitney U test.
Each parametric test has its non-parametric equivalent. For instance, if you have parametric data from two independent groups, then you simply apply the t-test for mean comparison;
otherwise, if you do not have parametric data, you apply the Wilcoxon rank-sum test for mean comparison. Before we proceed with the presentation of some of the most used statistical tests, we shall discuss the p-level and the Normal distribution.

p-Level

The p-level is a number between 0 and 1 that defines the significance of the obtained results. We can interpret the p-level values as follows:
• if the p-level has a small value (usually ≤ 0.05), this indicates that there is strong evidence against the null hypothesis, thus the results are significant and one must reject H0;
• if the p-level value is large (> 0.05), this indicates that the evidence is not sufficient to reject the null hypothesis, the results not being statistically significant enough.
We would like to underline the fact that the above-mentioned cut-off values are arbitrary conventions. Let us suppose that our p-value is 0.051. Does this mean that the obtained results are not significant enough? Or what if the p-value is 0.049? Does this mean that the results are significant? Isn't it a little bit weird to give two opposite resolutions taking into account such a small difference between the two values? There is an ongoing debate regarding how researchers should use the p-value: presenting the exact value, p = 0.03, or using p < 0.05. Some prefer the first, because it lets each person decide whether the results are significant or not. This problem increases when dealing with cancer research. So, what is there to be done? How can we make sure that we do not misinterpret the p-value? The answer is: having the appropriate sample size, so that we can achieve a high statistical power for our tests.

Power analysis

The power of a test is the probability that the results obtained by performing a test on a small data sample would be the same when the test is performed on a larger dataset. Grosso modo, it means that if our test shows a significant difference between two cancer treatments, for example, on a small given number of subjects, we can be confident that it will show approximately the same difference when applied to a larger cohort. Unfortunately, the cancer literature contains many clinical trials that have too small a number of patients, and thus cannot prove their true efficiency. Before any researcher, from the medical field or not, performs any statistical test on her/his results, she/he must make sure that the statistical computation regarding the appropriate sample size has been performed. As a general rule, the larger the sample size, the higher the statistical power (Altman, 1991). Power analysis lets us find the statistical power: presuming that a certain effect exists, what is the probability of finding that effect? In other words, if we have two cancer treatments, and the null hypothesis states that there is no significant difference between them, but in fact there is, what is the probability of rejecting this null hypothesis when indeed it is false? Example: We are conducting a cancer clinical trial: the control group receives a placebo, and the other group receives a new drug. After running some tests we find that our power is 0.95, implying that 95% of the time we will obtain a statistically significant result. That being said, it means that the other 5% of the time the results are not statistically significant.


So, why should we perform a power analysis? There are three situations:
• to validate our research;
• to find the sample size that we need in order to trust our results;
• to find the power, given a sample size.
In order to compute the required sample size we either compute difficult formulae by hand, or use statistical software. All computations are based on the standardized difference. This variable is computed differently for continuous data and categorical data. Nevertheless, in both cases we need the ratio of the difference of interest to the standard deviation of the sample data. Obviously, for a smaller ratio we need a larger set of data. If in our trial we have to deal with continuous data, then we need to know the following variables as input:
• the clinically relevant difference, which will be denoted δ;
• the significance level (α, two-sided);
• the power (1 − β);
• the standard deviation of the sample data (sd).

Again, we will presume that the data distribution is approximately Gaussian. The standardized difference will be δ/sd. In the case where we do not know the standard deviation, because the necessary sample size has not yet been computed, we can start our trial, compute the standard deviation from the first patients, and then find the needed sample size. At this moment we cannot present a worked example of how to compute the statistical power, because we have not yet discussed statistical tests. Thus, we shall pass on to the next section.
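Even so, for readers who want to experiment right away, here is a minimal sketch of how the inputs listed above map to a required sample size. It assumes Python with the statsmodels package and a hypothetical effect size; it is an illustration, not the book's own tooling.

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical inputs: clinically relevant difference delta = 5 and sd = 10,
# so the standardized difference (effect size) is delta / sd = 0.5.
effect_size = 5 / 10

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size,
                                   alpha=0.05,   # two-sided significance level
                                   power=0.95)   # 1 - beta

print(round(n_per_group))  # patients needed per arm (round up in practice)
```

A smaller standardized difference makes the required sample size grow quickly, which is exactly why underpowered trials are so common.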

The Normal (Gaussian) distribution X ~ N(μ, σ²)

The Bell curve, Gaussian, or Normal distribution is the distribution that governs many natural phenomena. Mathematically speaking, we say that the random variable X is normally distributed with mean μ and dispersion σ², if the density and distribution functions are given by the following formulas:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$
$$F_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\, dt, \quad -\infty < x < \infty.$$
In Fig. 1.1 we present the Normal distribution graph, also known as the Gauss bell, with the following properties:
• the graph is symmetric about the x = μ line;
• x = μ is a maximum point;
• the inflection points are μ − σ and μ + σ;
• the function admits Ox as a horizontal asymptote for ±∞.


FIG. 1.1 Gaussian bell graph with μ = 8 and σ = 3.
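As a quick numerical check of the density above, and of the empirical rules discussed next, here is a small sketch assuming Python with SciPy (the parameters are those of Fig. 1.1):

```python
from scipy.stats import norm

mu, sigma = 8, 3  # parameters used in Fig. 1.1

# Maximum of the bell curve, attained at x = mu.
print(norm.pdf(mu, loc=mu, scale=sigma))

# Areas within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    area = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(k, round(area, 4))  # 0.6827, 0.9545, 0.9973
```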

Some empirical, yet very useful rules regarding the Gaussian bell graph are:
• the area under the distribution graph between the x = μ − σ and x = μ + σ lines represents 68% of the whole area. In other words, 68% of the observations composing the statistical series fall into this interval;
• the area under the distribution graph between the x = μ − 2σ and x = μ + 2σ lines represents 95% of the whole area. In other words, 95% of the observations composing the statistical series fall into this interval, also known as the 95% confidence interval;
• the area under the distribution graph between the x = μ − 3σ and x = μ + 3σ lines represents 99.7% of the whole area. In other words, 99.7% of the observations composing the statistical series fall into this interval.
Besides the mean and variance, another two variables describe the Normal distribution: the skewness, which measures the asymmetry, and the kurtosis, which measures the flatness or peakedness. Skewed distributions can be (Figs. 1.2–1.4):

FIG. 1.2 Positively skewed distribution.

FIG. 1.3 Normal (unskewed) distribution.

FIG. 1.4 Negatively skewed distribution.

• positively skewed: the most frequently encountered values are low, making the tail go toward the high values;
• normal;
• negatively skewed: the most frequent values are high, making the tail go toward the low values.
Kurtosis can be (Fig. 1.5):
• Mesokurtic—when the distribution is similar to the Normal distribution;
• Platykurtic—when the distribution is flat (negative kurtosis);
• Leptokurtic—when the distribution is thin (positive kurtosis).
There are several methods to test whether your sample data is governed by the Normal distribution or not. The simplest way is to plot the data and see if it appears to be Gaussian or not.

FIG. 1.5 Different types of kurtosis (e.g. the highest peaked distribution is leptokurtic, the flat distribution is platykurtic, and the distribution that has its peak at 0.4 is mesokurtic).
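Skewness and kurtosis can also be computed directly. A minimal sketch, assuming Python with SciPy and a simulated sample, illustrates the conventions used above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=8, scale=3, size=1000)  # simulated Gaussian sample

print(stats.skew(sample))      # ~0 for a symmetric distribution; positive and
                               # negative values match Figs. 1.2 and 1.4

print(stats.kurtosis(sample))  # Fisher definition: ~0 mesokurtic,
                               # negative platykurtic, positive leptokurtic
```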

Another method is to apply the Kolmogorov-Smirnov goodness-of-fit test (K-S test). This test compares your data sample with a known distribution and indicates whether they have the same distribution or not. A corrected version of the K-S test for normality, which returns a better approximation of the test statistic's distribution, is the Lilliefors test.

Kolmogorov-Smirnov test for Normal distribution

At first we shall present the working hypotheses:
• H0: the data has a Gaussian distribution.
• H1: the data does not have a Gaussian distribution.
In order to perform the test, one must follow these steps:
1. Create an empirical distribution function for the sample data.
2. Plot the Normal distribution together with the empirical distribution in the same graph.
3. Compute the greatest vertical distance between the two plots.
4. Compute the test statistic.
5. Find the critical value in the K-S Table A1 at the end of this chapter.
6. Compare the critical value to the test statistic value.
If we denote by $F_{data}(x)$ the empirical distribution, and by $F_0(x)$ the Normal distribution, then the K-S test statistic is given by the following formula:
$$D = \sup_x |F_0(x) - F_{data}(x)|.$$
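In practice, steps 3–6 are rarely done by hand. A minimal sketch, assuming Python with SciPy and a simulated sample, performs the whole comparison in one call:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=8, scale=3, size=100)  # simulated data

# D is the supremum distance between the empirical and the N(8, 3^2) CDFs.
D, p_value = stats.kstest(sample, 'norm', args=(8, 3))
print(D, p_value)  # a large p-value gives no evidence against normality
```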

Lilliefors test

We have mentioned above that the Kolmogorov-Smirnov test has been improved by the Lilliefors test. The Lilliefors test corrects the K-S test for small samples, and can also be applied when you do not know the population's mean or SD. Both the null and the alternative hypotheses remain the same as for the K-S test. Just like for the K-S test, the computations regarding the Lilliefors test are complicated, due to the fact that you need to calculate the z-score for every observation, and thus we recommend that the reader use a statistical software tool.


Even so, we are going to present the steps that you must follow in order to perform a Lilliefors test:
1. Compute the z-score for every sample observation using the following formula:
$$z_i = \frac{X_i - \bar{X}}{s}, \quad i = 1, 2, \dots, n$$
where:
• $z_i$ is the z-score for every sample observation;
• $s$ is the standard deviation of the sample set;
• $X_i$ is a sample observation;
• $\bar{X}$ is the sample mean.
2. Determine $D = \max\{D^+, D^-\}$, where
$$D^+ = \max_{1 \le i \le n} \left( \frac{i}{n} - z_i \right), \qquad D^- = \max_{1 \le i \le n} \left( z_i - \frac{i-1}{n} \right).$$

3. Find the critical value for the test from Table A2, and see whether the null hypothesis is rejected or not.
For example, let us suppose that we have implemented an AI algorithm that can determine whether a tumor is malignant or benign based on DNA microarrays. After running the algorithm 24 times, we want to see whether the sample data, consisting of the 24 recorded performances, has a Normal distribution or not. Table 1.3 presents the accuracies recorded after the 24 computer runs, together with the mean, standard deviation (SD), and 95% confidence interval (CI).

TABLE 1.3 Accuracies of the AI algorithm over 24 computer runs.

No. of run   Accuracy (%)
1            95.23
2            65.08
3            74.60
4            69.84
5            71.42
6            71.42
7            71.42
8            71.42
9            71.42
10           61.90
11           71.42
12           71.42
13           69.84
14           71.42
15           71.42
16           71.42
17           68.25
18           71.42
19           71.42
20           71.42
21           71.42
22           71.42
23           71.42
24           71.42
Mean         71.62
SD           5.60
95% CI = Mean ± 1.96 × SD   (60.64, 82.60)
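The two Lilliefors steps above can be carried out directly on the accuracies in Table 1.3. The sketch below is one possible implementation, assuming Python with NumPy, SciPy, and statsmodels; following the conventional definition, the $z_i$ appearing in the $D^+/D^-$ formulas are read as the Normal CDF values $\Phi(z_i)$ of the sorted z-scores.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.diagnostic import lilliefors

accuracies = np.array([95.23, 65.08, 74.60, 69.84, 71.42, 71.42, 71.42, 71.42,
                       71.42, 61.90, 71.42, 71.42, 69.84, 71.42, 71.42, 71.42,
                       68.25, 71.42, 71.42, 71.42, 71.42, 71.42, 71.42, 71.42])

def lilliefors_stat(x):
    """Steps 1-2 above: z-scores, then D = max{D+, D-}."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)      # step 1: z-score of each observation
    phi = norm.cdf(z)                       # Phi(z_i), compared against i/n
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - phi)
    d_minus = np.max(phi - (i - 1) / n)
    return max(d_plus, d_minus)             # step 2

print(lilliefors_stat(accuracies))          # manual statistic
print(lilliefors(accuracies, dist='norm'))  # library statistic and p-value
```

A very small p-value would reject normality here, which is plausible given the heavy ties at 71.42.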

In Fig. 1.6 we have plotted the empirical sample distribution (the vertical bars) together with the Normal distribution. We can see that the K-S statistic is equal to 0.43079, and looking at the critical values in Table A1, we find the p-value. […]

The χ² statistic follows the χ² distribution, whose density with k degrees of freedom is
$$f_X(x) = \begin{cases} \dfrac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1}\, e^{-x/2}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$
where $\Gamma : (0, \infty) \to \mathbb{R}$ is an improper integral, having the following formula (Fig. 1.15):
$$\Gamma(k) = \int_0^\infty e^{-x}\, x^{k-1}\, dx, \quad k > 0.$$

FIG. 1.15 χ² distribution plot (densities shown for 1, 2, 3, 4, 6, and 9 degrees of freedom).

Let us denote the observed data samples by $O_i$, $i = 1, \dots, n$, and the expected data by $E_i$, $i = 1, \dots, n$. To see whether the two data samples have the same distribution, we need to compute the following statistic:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}.$$
We will use Table A11 to determine the p-value for the χ² statistic with (n − 1) degrees of freedom.
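As an illustration of this statistic, here is a minimal sketch assuming Python with SciPy; the observed and expected counts are hypothetical:

```python
import numpy as np
from scipy import stats

observed = np.array([18, 22, 20, 40])  # hypothetical O_i
expected = np.array([25, 25, 25, 25])  # hypothetical E_i (same total as observed)

# chi2 = sum((O_i - E_i)^2 / E_i), with n - 1 = 3 degrees of freedom here.
chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2_stat, p_value)
```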


If we consider a Gaussian sample with standard deviation σ, and a randomly chosen subsample of this sample having standard deviation s, then we can define the χ² statistic as:
$$\chi^2 = \frac{(n-1) \cdot s^2}{\sigma^2}.$$
Let us see how we can use the χ² test in a practical example. Let us suppose that the mean survival time for a certain type of cancer is 6 years, with a standard deviation of 1 year. We are interested in finding out the following: if we have nine patients who suffer from that type of cancer, what is the probability that the standard deviation of this subsample is less than 0.80? We must compute the χ² statistic taking into account that we have n = 9, thus 8 degrees of freedom, s = 0.8, and σ = 1:
$$\chi^2 = \frac{8 \cdot 0.8^2}{1} = 5.12.$$

If we look at Table A11, on row 8 (for 8 degrees of freedom), we see that the p-value is greater than 0.2, which means that the probability of the standard deviation of the subsample being less than 0.8 years is approximately 20%. If we are interested in the χ² goodness-of-fit test, then all we have to do is split all the possible values of the random variable X into k disjoint classes $A_i$, $i = 1, \dots, k$. We can compute the theoretical probabilities:
$$p_i = P(X \in A_i).$$
Thus, we have the theoretical probabilities and the observed frequencies, $f_i$. In this case, we will use the following statistic, and again Table A11:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - n p_i)^2}{n p_i}.$$

An interesting fact is the false positive paradox. Curiously enough, the false positive paradox appears when a test has a higher chance of giving false positives than the medical condition has a chance of occurring. Grosso modo, it means that some tests that seem very reliable do not produce trustworthy results. So, if you are diagnosed positive after an imaging test that is said to be 99% accurate, what are the odds that you actually have the disease? If your answer is 99%, you will be stunned to find out that you are wrong. If the disease is very common, then your chances of having that disease are close to 99%, but if the disease is a rare one, the odds are lower. Let us presume that a certain type of cancer affects 1 in 10,000 people. An imaging test has the following rates: if a person has the disease, the test is positive 99% of the time, and if a person does not have the disease, the test is positive 0.1% of the time. We are interested in finding the probability that a person has the disease if the result of the imaging test is positive. How can we find the result? Fortunately, since 1763, when the work of the English Nonconformist theologian and mathematician Thomas Bayes, "An Essay Towards Solving a Problem in the Doctrine of Chances," was published posthumously in the Philosophical Transactions of the Royal Society, we have Bayes' theorem (Bayes and Price, 1763).


Bayes' theorem

Mathematically speaking, for any event A ⊆ Ω, where Ω is the measure space, the probability P(A) of the event A satisfies the three axioms of Andrey Kolmogorov.

Kolmogorov's axioms

1. 0 ≤ P(A) ≤ 1;
2. P(Ω) = 1;
3. for any countable sequence of mutually exclusive events $(A_n)_n$, we have
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n).$$

An important concept in probability theory is the conditional probability. The conditional probability is the probability of one event occurring given the fact that another event took place. So, if we have two events A and B with P(B) ≠ 0, then the probability of the event A occurring, given that the event B happened, is:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Bayes' formula is given by:
$$P(A_i \mid B) = \frac{P(B \mid A_i) \cdot P(A_i)}{\sum_{i=1}^{n} P(B \mid A_i) \cdot P(A_i)}, \quad P(B) > 0, \ P(A_i) > 0, \ i = 1, \dots, n,$$
taking into account the total probability law, $P(B) = \sum_{i=1}^{n} P(B \mid A_i) \cdot P(A_i)$.
In practice, we use the following concepts:

• $P(A_i \mid B)$ is the posterior probability;
• $P(A_i)$ is the prior probability;
• $P(B \mid A_i)$ is the likelihood;
• $P(B)$ is the evidence.
In this context, we can say that Bayes' formula can be "translated" as follows:
$$\text{posterior probability} = \frac{\text{likelihood} \times \text{prior probability}}{\text{evidence}}.$$

Returning to our problem, if we denote by B the event that the person has cancer, by $\bar{B}$ the event that the person does not have cancer, and by T the event that the test is positive, we have the following probabilities:
$$P(B) = 0.0001; \quad P(\bar{B}) = 0.9999; \quad P(T \mid B) = 0.99; \quad P(T \mid \bar{B}) = 0.001,$$
$$P(B \mid T) = \frac{P(T \mid B) \cdot P(B)}{P(T \mid B) \cdot P(B) + P(T \mid \bar{B}) \cdot P(\bar{B})} \approx 0.09.$$
The chance that a person who had a positive test result actually has that rare type of cancer is 9%. Surprising, isn't it?
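The same computation, written out as a minimal sketch in Python (the numbers are those of the example above):

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Bayes' formula for P(B | T): probability of disease given a positive test."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Rare cancer: prevalence 1/10,000, sensitivity 99%, false positive rate 0.1%.
print(posterior_given_positive(0.0001, 0.99, 0.001))  # ~0.09, i.e., about 9%
```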


The statistical analysis we have covered so far should be considered a handbook. As data scientists in cancer research we must do this analysis in order to obtain trustworthy results. Statistics was, is, and will be a state-of-the-art tool in cancer research. Let us continue discussing cancer research nowadays. Recall that the last section left us at the beginning of the year 2000, when the sequence of the first chromosome had been produced. On June 26, 2000, the draft version of the Human Genome Project was finished. Both the U.S. President Bill Clinton and the British Prime Minister Tony Blair announced the event—https://web.ornl.gov/sci/techresources/Human_Genome/project/clinton1.shtml (Accessed May 30, 2019). The genetic blueprinting of human beings had begun, opening the era of genetic medicine. Three years later, on April 14, BBC News and The New York Times announced the completion of the genome—http://news.bbc.co.uk/2/hi/science/nature/2940601.stm (Accessed May 29, 2019); The New York Times—https://www.nytimes.com/2013/04/16/science/the-human-genome-project-then-and-now.html (Accessed May 30, 2019). Year by year, data concerning genetics and cancer started piling up. In 2007, DNA from 11 breast and 11 colorectal tumors was isolated, and the genetic changes in these cancers were cataloged (Wood et al., 2007). It was a winter morning on December 14, 2005, when The Washington Post announced that the NIH (National Institutes of Health—https://www.nih.gov—Accessed May 31, 2019) had launched The Cancer Genome Atlas, or TCGA—https://www.washingtonpost.com/wpdyn/content/article/2005/12/13AR2005121301667 (Accessed May 31, 2019). The approximate cost of the TCGA was estimated at $1 billion. TCGA catalogues certain genetic mutations that are considered to trigger cancer. It uses genome sequencing and bioinformatics—https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga (Accessed May 31, 2019). TCGA started as a 3-year pilot project that focused on only three types of human cancer: glioblastoma multiforme, ovarian, and lung cancer. In 2008, the TCGA research network uploaded the first data from 206 glioblastomas (601 genes). The second phase of the TCGA started in 2009, and had as its goal the characterization of 20–25 different tumor types. The deadline was 2014. Impressively, the target was surpassed, and 33 cancer types, including 10 rare cancers, were described. The TCGA work is still in progress. Besides mapping the human genome, another important part of fighting cancer is the use of diagnostic imaging. Valuable information can be retrieved from pictures that contain body structures, tissues, and organs. Imaging can help find and detect tumors, determine whether there are metastases, and evaluate the effectiveness of the treatment. There are several types of imaging tests in use today:
• X-rays use invisible electromagnetic energy beams to produce the images. They can be used on any part of the body.
• CT scans use a merge of X-rays and computer technology to produce horizontal and axial images of the body. CTs present more details than X-rays.
• Mammograms are X-rays of the breast.
• Ultrasound sonography uses high-frequency sound waves, sonograms, in order to create images of organs, tissues, and blood vessels on a computer. Ultrasound can be used for producing images of the abdomen, liver, kidneys, pancreas, esophagus, anus, or vagina. It cannot be used for the chest due to the fact that the rib cage blocks the sound waves.


• MRI (magnetic resonance imaging) uses a large magnet, radiofrequencies, and a computer to produce the body images. The MRI can be used to examine any body part, from the brain to the bones.
• PET scans (positron emission tomography) are a nuclear medicine imaging technique. A small amount of radioactive substance, a radioactive tracer, is injected into the body, and a special camera detects the radioactivity in the body. The PET scan evaluates an organ's or tissue's use of tagged glucose molecules in order to find biochemical changes. These biochemical changes can signal the start of a disease before any anatomical changes can be seen using other imaging techniques.
• Nuclear medicine scans take images of the body after radionuclides are injected. Radionuclides release low-level radiation that makes the tumor appear as a "hot spot."
• The EXPLORER is the world's first medical imaging scanner that can capture a 3D scan of the whole body at once, in just 20–30 s.

Reading a diagnostic image can be done in two ways: the first is the classical method, where a doctor performs the reading, whereas the second uses AI methods to process the image. Which one is better? Which one produces the best results? Each method has its ups and downs. When doctors read an image they can rely both on their experience and on their sixth sense. The downside is that fatigue, or the subjectivity of the human eye, can play tricks when diagnosing a tumor. On the other hand, the use of AI in processing medical images is increasing rapidly. In the last decade alone, the publication rate of AI in radiology has increased eightfold, from 100–150 papers per year to 700–800 papers per year. The most utilized imaging modalities are MRI and CT, followed by ultrasound, radiography, mammography, and PET scans (Pesapane et al., 2018). We shall deepen this subject in the next chapter, where we will explain how the doctor + AI combo sets a diagnosis.

So far we have covered the genetic and imaging approaches that are in use today for diagnosing and monitoring cancer. A very interesting approach to diagnosing cancer is a method that might shock most readers through its simplicity and cost effectiveness. While it sounds like a beautiful, movie-like story, this method is accurate and has opened new possibilities. We are all familiar with the beautiful and smart Border Collie (Fig. 1.16). Besides being a shepherd and a wonderful playmate, this workaholic dog is also a great cancer diagnostician. In 2015, Ted, a 2-year-old Border Collie, began to cry, paw, and nuzzle its owner's chest. Using its super-sensitive sense of smell, Ted was able to detect a grade three tumor in its owner's breast. Fortunately, the tumor was caught in time and removed—http://www.telegraph.co.uk/news/telegraphchristmasappeal/11320715/Rescue-dog-saved-owners-life-sniffing-outagrressive-tumour.html (Accessed June 11, 2019). Bessie, another Border Collie hero, saved the life of a 2-year-old girl. Bessie changed her behavior around the little girl, fussing and looking concerned. The 2-year-old was then diagnosed with lymphoblastic lymphoma and received proper treatment—https://www.express.co.uk/life-style/health/702999/cancerlymphoblastic-leukaemia-border-collie-dog-Bessie (Accessed June 11, 2019). Cancer cells produce and release distinct odor marks. Dogs can detect cancer signs in the skin, breath, urine, feces, and sweat, and research has shown that dogs can spot various forms of cancer.
FIG. 1.16 The author's dog, Murray the Border Collie. Photo credit: Smaranda Belciug.

For instance, a 75-year-old man presented to a dermatology clinic after his female Alsatian dog stubbornly licked an asymptomatic lesion behind his right ear. The patient was not aware he had a lesion. His medical history showed that he had suffered from eczema as an adolescent and had been diagnosed with multiple myeloma 4 years earlier, for which he had received chemotherapy, bortezomib, and radiotherapy. He had no family history of other skin diseases or skin cancer. The dog had discovered that his cancer had relapsed: the diagnostic tests confirmed malignant melanoma (Campbell et al., 2013). Another study showed that a specially trained dog, a 3-year-old crossbreed between a Labrador retriever and a Pitbull, was able to diagnose lung cancer from the breath of patients. The dog sniffed 390 samples of exhaled gas collected from 85 patients with lung cancer, 11 patients who did not suffer from lung cancer, and 17 healthy persons. The procedure was applied 785 times. The results are fantastic: the dog diagnosed lung cancer with a sensitivity of 95%, specificity of 98%, positive predictive value of 95%, and negative predictive value of 98%. The area under the receiver operating characteristic curve was 0.971 (Guirao Montes et al., 2017). The numbers are impressive, but what do they actually express in terms of statistics? First of all, when we refer to cancer we usually have a two-class decision problem: malignant versus benign, or recurrent versus non-recurrent events. Obviously, in case of malignancy, the decision problem can be extended to multiple classes referring to the grade and stage of the disease. When making a decision, whether it is made using our human brain, an AI algorithm, or a dog's ability to sniff, we are interested in its precision, the accuracy per se. We discussed false positives and false negatives above; now it is time to introduce the related concepts of true positive, false positive, true negative, and false negative.


A nice explanation of these concepts is a well-known fable, since turned into a social network meme: Aesop's "The Boy Who Cried Wolf." Long story short: a boy, bored while watching the village's flock, decided to play a prank on the villagers, so he cried "Wolf! Wolf!" even though there was no wolf around. Obviously, the villagers ran to protect the boy and their flock, and when they got to the pasture they immediately got mad, realizing that the joke was on them. The story repeated itself a number n of times. Until, one dark night, a wolf did come near the boy and his flock of sheep. The boy desperately shouted "Wolf! Wolf!," but no one came; the villagers refused to be fooled once more. The wolf ate the flock, and the villagers had nothing left to eat (Figs. 1.17–1.20). Now, let us turn this bedtime story into a statistics lesson:

• True positive (TP): a wolf is on site; the boy shouts "Wolf!"; output: the boy is correct (Fig. 1.17).
• False positive (FP): no wolf is on site; the boy shouts "Wolf!"; output: the boy plays a prank, and the villagers get upset (Fig. 1.18).
• False negative (FN): a wolf is on site; the boy does not shout "Wolf!"; output: the wolf destroys the flock (Fig. 1.19).
• True negative (TN): no wolf is on site; the boy does not shout "Wolf!"; output: everything is OK (Fig. 1.20).

Let us denote the real appearance of the wolf as class A, and the joke as class B. Below we present the confusion matrix, which depicts all the statistical terms mentioned above:


Thus, a true positive means that the classifier has correctly predicted the positive class (wolf); a true negative means that the classifier has correctly predicted the negative class; a false positive means that the classifier has incorrectly predicted the positive class; and a false negative means that the classifier has incorrectly predicted the negative class. The accuracy of a classifier is computed using the following formula:

accuracy = (TP + TN) / (TP + TN + FP + FN).

The values are tabulated in a confusion matrix (Table 1.10). Adding the numbers on the first diagonal, we obtain 21 + 14 = 35 correctly diagnosed cases. The confusion matrix can be plotted as a heat map for a better understanding of the process; Fig. 1.21 presents the heat map for the confusion matrix in Table 1.10.

FIG. 1.21 Confusion matrix heat map (actual class on the vertical axis, predicted class on the horizontal axis).

TABLE 1.10 Confusion matrix example.

                           Predicted class
Actual class               Class = benign    Class = malignant
Class = benign             21                0
Class = malignant          7                 14
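For readers who want to reproduce Table 1.10 and its accuracy programmatically, here is a minimal Python sketch using scikit-learn (not the book's original script); the two label vectors are hypothetical stand-ins chosen to yield exactly the 21/0/7/14 cells.

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# 0 = benign, 1 = malignant; 42 hypothetical patients
y_true = np.array([0] * 21 + [1] * 21)
y_pred = np.array([0] * 21 + [0] * 7 + [1] * 14)   # 7 malignant cases are missed

cm = confusion_matrix(y_true, y_pred)
print(cm)                                # [[21  0]
                                         #  [ 7 14]]
print(accuracy_score(y_true, y_pred))    # (21 + 14) / 42 = 0.8333...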

The confusion matrix can also be used when there are multiple classes. Let us take an example where we have a set of patients with differential diagnoses depending on the stage of the cancer (0, 1, 2, 3, and 4). The heat map for this case, with five decision classes, is depicted in Fig. 1.22. Returning to our example with the diagnosis set by the Labrador retriever-Pitbull mix (Guirao Montes et al., 2017), we mentioned four concepts: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Sensitivity is the proportion of true positives that are correctly identified by the classifier; specificity is the proportion of true negatives that are correctly identified by the classifier.


FIG. 1.22 Confusion matrix heat map example for five decision classes.

sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)

At first sight we might ask ourselves why there is a need for the other two concepts. If you analyze these calculations carefully, you will see that we do not have the whole picture of the diagnosing process. We need to know the proportion of patients diagnosed with cancer who truly had cancer, and the proportion of patients diagnosed without cancer who were truly cancer free. These issues are of huge importance. The probability that a classifier gives the correct diagnosis, positive or negative, is not given by sensitivity or specificity; it is given by the other two concepts, PPV and NPV. PPV is the proportion of patients who received a positive test result and indeed have cancer; NPV is the proportion of patients who received a negative test result and indeed are cancer free.

PPV = TP / (TP + FP)
NPV = TN / (TN + FN)

Most scientists in cancer research assess prognosis results by using receiver operating characteristic (ROC) curves. More on the history of ROC curves can be found in Belciug and Gorunescu (2020). For plotting and understanding a ROC curve, one needs to know two more statistical parameters: the false positive rate (FP rate) and the false negative rate (FN rate). The formulas for these statistical parameters are:

FP rate = FP / (FP + TN) = 1 − specificity
FN rate = FN / (TP + FN) = 1 − sensitivity
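As a quick illustration, the following small sketch computes all six quantities from the Table 1.10 counts, taking "malignant" as the positive class (an assumption we make for the example):

# Counts read off Table 1.10, with "malignant" as the positive class.
TP, FN, TN, FP = 14, 7, 21, 0

sensitivity = TP / (TP + FN)   # 0.667
specificity = TN / (TN + FP)   # 1.0
ppv = TP / (TP + FP)           # 1.0
npv = TN / (TN + FN)           # 0.75
fp_rate = FP / (FP + TN)       # 0.0   (= 1 - specificity)
fn_rate = FN / (TP + FN)       # 0.333 (= 1 - sensitivity)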

52

1. Life challenge. Cancer

The ROC curve is a two-dimensional graph that has the FP rate on its X-axis and the TP rate on its Y-axis. If the two decision classes were perfectly classified (which in practice does not happen), we would have a ROC curve like the one in Fig. 1.23.

FIG. 1.23 A ROC curve example in which the test predicted perfectly both malignant and benign tumors.

If the two decision classes were classified completely incorrectly (fortunately, this is also impossible), we would have a ROC curve like the one in Fig. 1.24.

FIG. 1.24 A ROC curve example in which the test classified incorrectly all the malignant and benign tumors.

The most commonly encountered ROC curve in real life is like the one presented in Fig. 1.25. We can see that in the legend there is a parameter named area, which equals 0.79. The area under the ROC curve (AUC) is the translation of the ROC curve into a single number. For more information regarding the AUC, please see Fawcett (2006) and Hanley and McNeil (1982). Below, we present a guide for interpreting the AUC:


FIG. 1.25 Example of a ROC curve in which the test performed a fair classification of malignant versus benign tumors (area = 0.79).

• 0.90–1.00: excellent prediction
• 0.80–0.90: good classification
• 0.70–0.80: fair classification
• 0.60–0.70: poor classification
• 0.50–0.60: failure
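A ROC curve such as the one in Fig. 1.25 can be drawn with a few lines of Python; the sketch below uses scikit-learn and matplotlib on a synthetic dataset (the data and the logistic regression model are our assumptions, not the ones behind Fig. 1.25):

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

# Synthetic two-class problem standing in for malignant/benign data
X, y = make_classification(n_samples=500, random_state=0)
y_score = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, _ = roc_curve(y, y_score)
plt.plot(fpr, tpr, label="ROC curve (area = %0.2f)" % auc(fpr, tpr))
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()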

We can see that in our example the classification is fair, approaching good. ROC curves should be used when we are dealing with balanced datasets, meaning datasets with roughly the same number of samples in both decision classes. If we have imbalanced datasets, then we should use Precision-Recall (PR) curves instead. Precision is computed as the ratio of the true positives to the sum of true positives and false positives; precision is just another name for the PPV:

Precision = PPV = TP / (TP + FP).

Recall is the ratio of the number of true positives to the sum of true positives and false negatives, making recall another name for sensitivity:

Recall = Sensitivity.

Both terms are connected to the F score, or F1 score, which is computed as the harmonic mean of the precision (P) and recall (R):

F = F1 = 2 · P · R / (P + R).

Another concept related to PR is the average precision (AP), which summarizes the curve as a weighted mean of the precisions achieved at each threshold:

AP = Σ (Ri − Ri−1) · Pi, summed over the thresholds i,

where Pi and Ri are the precision and recall at the ith threshold. We call the pair (Pi, Ri) an operating point.
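The same quantities are available in scikit-learn; a minimal sketch on hypothetical labels and scores:

import numpy as np
from sklearn.metrics import (precision_recall_curve, f1_score,
                             average_precision_score)

# Hypothetical ground truth and classifier scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)   # the weighted mean defined above
f1 = f1_score(y_true, y_score >= 0.5)           # F1 at a single 0.5 threshold
print(ap, f1)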


Besides the F score and the average precision, we also have the AUC of the PR curve. There is an ongoing debate regarding the two measures. Some say that ROC curves are too "optimistic," or even "deceptive" and "leading to incorrect interpretations" of a model's performance on imbalanced datasets, and therefore should not be used in those cases. For more details we refer the reader to Saito and Rehmsmeier (2015) and Davis and Goadrich (2006). Let us see, on a concrete example, how the two curves work on an imbalanced dataset. We have generated a two-class decision problem where the first class has 2700 data samples and the second has 300 data samples. The ROC curve and PR curve, along with the AUC, F1, and AP, are presented in Figs. 1.26 and 1.27. We can see that the AUC for the ROC curve is 0.830, implying a good classification, whereas the AUC for the PR curve is 0.555, implying failure. We hope that this example provides a good demonstration of why you should use PR curves instead of ROC curves when dealing with imbalanced datasets.

FIG. 1.26 Example of a ROC curve on an imbalanced dataset (AUC = 0.830).

FIG. 1.27 Example of a PR curve on an imbalanced dataset (f1 = 0.478, auc = 0.555, ap = 0.555).
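The experiment behind Figs. 1.26 and 1.27 can be approximated as follows; the generator settings and the classifier are our assumptions, so the exact numbers will differ from the figures, but the gap between the two scores is the point:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score

# 2700 majority-class vs. 300 minority-class samples (a 90/10 imbalance);
# flip_y adds label noise so the classes overlap, as in a realistic problem
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1],
                           flip_y=0.2, random_state=0)
y_score = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

print("ROC AUC:", roc_auc_score(y, y_score))
print("PR AUC (AP):", average_precision_score(y, y_score))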


What we have covered so far regards cancer diagnosis. In terms of treating cancer, new therapies are discovered each year. Each person being unique implies that her/his cancer is also unique, and the response to treatment is different for each cancer patient. This is why tailored treatment is so important. All these methods of diagnosing or treating cancer produce an enormous amount of data, data that cannot be processed by humans. This is where AI methods step in. Analyzing data gathered from millions of patients builds up a "super" doctor that has the experience of thousands of other doctors. It analyzes data objectively: no fatigue, no bias, nothing can interfere. It has only one great disadvantage: it lacks the human flair, the sixth sense. That is why there should be a doctor + AI combo. For the young, inexperienced doctor, AI can provide a compass, just like having your attending watching over you. For the experienced doctor, AI can be seen as a check on her/his diagnosis. In the chapters that follow we shall discuss more about diagnosis and all sorts of treatments, and how AI helps the doctor choose the best one for her/his patient. As a short recap of what we have discussed so far: today we have genetics, imaging, AI methods, and even dogs that help us diagnose cancer, and we have shown how all these techniques should be statistically analyzed. That being said, let us look toward the future and try to predict how diagnosing and treating cancer will look.

How to read statistical tables

You have seen throughout this chapter that we need to find the p-value for different statistical tests in order to test our hypotheses. There are various sites that automatically compute the p-value, or that offer the statistical tables—e.g., https://home.ubalt.edu/ntsbarsh/Business-stat/StatisticalTables.pdf (Accessed July 2, 2019)—but not everything that can be found online is well structured and easy to read; worse, sometimes the information is not correct. In what follows, we present the statistical tables and provide some guidance on how they should be read, so that the p-value can be found. If the reader is interested in more detailed tables, we recommend Lentner (1982). We shall begin with the Kolmogorov–Smirnov test table (Table A1). If the computed test statistic is greater than the critical value, then the null hypothesis is rejected.

TABLE A1 Kolmogorov–Smirnov test table.

n     0.001       0.01        0.02        0.05        0.1         0.15        0.2
1     –           0.99500     0.99000     0.97500     0.95000     0.92500     0.90000
2     0.97764     0.92930     0.90000     0.84189     0.77639     0.72614     0.68377
3     0.92063     0.82900     0.78456     0.70760     0.63604     0.59582     0.56481
4     0.85046     0.73421     0.68887     0.62394     0.59582     0.52476     0.49265
5     0.78137     0.66855     0.62718     0.56327     0.50945     0.47439     0.44697
6     0.72479     0.61660     0.57741     0.51926     0.46799     0.43526     0.41035
7     0.67930     0.57580     0.53844     0.48343     0.43607     0.40497     0.38145
8     0.64098     0.54180     0.50654     0.45427     0.40962     0.38062     0.35828
9     0.60846     0.51330     0.47960     0.43001     0.38746     0.36006     0.33907
10    0.58042     0.48895     0.45662     0.40925     0.36866     0.34250     0.32257
11    0.55588     0.46770     0.43670     0.39122     0.35242     0.32734     0.30826
12    0.53422     0.44905     0.41918     0.37543     0.33815     0.31408     0.29573
13    0.51490     0.43246     0.40362     0.36143     0.32548     0.30233     0.28466
14    0.49753     0.41760     0.38970     0.34890     0.31417     0.29181     0.27477
15    0.48182     0.40420     0.37713     0.33760     0.30397     0.28233     0.26585
16    0.46750     0.39200     0.36571     0.32733     0.29471     0.27372     0.25774
17    0.45440     0.38085     0.35528     0.31796     0.28627     0.26587     0.25035
18    0.44234     0.37063     0.34569     0.30936     0.27851     0.25867     0.24356
19    0.43119     0.36116     0.33685     0.30142     0.27135     0.25202     0.23731
20    0.42085     0.35240     0.32866     0.29407     0.26473     0.24587     0.23152
25    0.37843     0.32656     0.30349     0.26404     0.23767     0.22074     0.20786
30    0.34672     0.28988     0.27704     0.24170     0.21756     0.20207     0.19029
35    0.32187     0.26898     0.25649     0.22424     0.20184     0.18748     0.17655
40    0.30169     0.25188     0.23993     0.21017     0.18939     0.17610     0.16601
45    0.28482     0.23780     0.22621     0.19842     0.17881     0.16626     0.15673
50    0.27051     0.22585     0.21460     0.18845     0.16982     0.15790     0.14886
>50   1.94947/√n  1.62762/√n  1.51743/√n  1.35810/√n  1.22385/√n  1.13795/√n  1.07275/√n
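Instead of the table lookup, the test statistic and a p-value can be obtained in SciPy; a minimal sketch, with hypothetical data (note that when the mean and standard deviation are estimated from the sample, the Lilliefors correction of Table A2 below is the appropriate reference):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=50.0, scale=10.0, size=40)   # hypothetical measurements

stat, p = stats.kstest(data, "norm", args=(data.mean(), data.std()))
print(stat, p)   # reject normality when stat exceeds the critical value for n = 40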

In Table A2, we present the critical values for the Lilliefors test for normality.

TABLE A2 Critical values for the Lilliefors test for normality.

n \ p   0.20      0.15      0.10      0.05      0.01
4       0.300     0.319     0.352     0.381     0.417
5       0.285     0.299     0.315     0.337     0.405
6       0.265     0.277     0.294     0.319     0.364
7       0.247     0.258     0.276     0.300     0.348
8       0.233     0.244     0.261     0.285     0.331
9       0.223     0.233     0.249     0.271     0.311
10      0.215     0.224     0.239     0.258     0.294
11      0.206     0.217     0.230     0.249     0.284
12      0.199     0.212     0.223     0.242     0.275
13      0.190     0.202     0.214     0.234     0.268
14      0.183     0.194     0.207     0.227     0.261
15      0.177     0.187     0.201     0.220     0.257
16      0.173     0.182     0.195     0.213     0.250
17      0.169     0.177     0.189     0.206     0.245
18      0.166     0.173     0.184     0.200     0.239
19      0.163     0.169     0.179     0.195     0.235
20      0.160     0.166     0.174     0.190     0.213
25      0.142     0.147     0.158     0.173     0.200
30      0.131     0.136     0.144     0.161     0.187
>30     0.736/√n  0.768/√n  0.805/√n  0.886/√n  1.031/√n
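The Lilliefors variant is implemented in statsmodels; a short sketch on hypothetical data:

import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
data = rng.normal(size=25)   # hypothetical sample

stat, p = lilliefors(data, dist="norm")   # corrects KS for estimated mean/variance
print(stat, p)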

The next table, Table A3, contains the tabulated coefficients ai for the Shapiro–Wilk test. Cells marked "–" do not exist for that sample size.

TABLE A3 Shapiro–Wilk tabulated coefficients ai.

n     2       3       4       5       6       7       8       9       10      11      12      13      14
a1    0.7071  0.7071  0.6872  0.6646  0.6431  0.6233  0.6052  0.5888  0.5739  0.5601  0.5475  0.5359  0.5251
a2    –       –       0.1677  0.2413  0.2805  0.3031  0.3164  0.3244  0.3291  0.3315  0.3325  0.3325  0.3318
a3    –       –       –       –       0.0875  0.1401  0.1743  0.1976  0.2141  0.2260  0.2347  0.2412  0.2460
a4    –       –       –       –       –       –       0.0561  0.0947  0.1224  0.1429  0.1585  0.1707  0.1802
a5    –       –       –       –       –       –       –       –       0.0399  0.0695  0.0922  0.1099  0.1240
a6    –       –       –       –       –       –       –       –       –       –       0.0803  0.0539  0.0727
a7    –       –       –       –       –       –       –       –       –       –       –       –       0.0240

n     15      16      17      18      19      20      21      22      23      24      25      26
a1    0.5150  0.5056  0.4968  0.4886  0.4808  0.4734  0.4643  0.4590  0.4542  0.4493  0.4450  0.4407
a2    0.3306  0.3290  0.3273  0.3253  0.3232  0.3211  0.3185  0.3156  0.3126  0.3098  0.3069  0.3043
a3    0.2495  0.2521  0.2540  0.2553  0.2561  0.2565  0.2578  0.2571  0.2563  0.2554  0.2543  0.2533
a4    0.1878  0.1939  0.1988  0.2027  0.2059  0.2085  0.2119  0.2131  0.2139  0.2145  0.2148  0.2151
a5    0.1353  0.1447  0.1524  0.1587  0.1641  0.1686  0.1736  0.1764  0.1787  0.1807  0.1822  0.1836
a6    0.0880  0.1005  0.1109  0.1197  0.1271  0.1334  0.1399  0.1443  0.1480  0.1512  0.1539  0.1563
a7    0.0433  0.0593  0.0725  0.0837  0.0932  0.1013  0.1092  0.1150  0.1201  0.1245  0.1283  0.1316
a8    –       0.0196  0.0359  0.0496  0.0612  0.0711  0.0804  0.0878  0.0941  0.0997  0.1046  0.1089
a9    –       –       –       0.0163  0.0303  0.0422  0.0530  0.0618  0.0696  0.0764  0.0823  0.0876
a10   –       –       –       –       –       0.0140  0.0263  0.0368  0.0459  0.0539  0.0610  0.0672
a11   –       –       –       –       –       –       –       0.0122  0.0228  0.0321  0.0403  0.0476
a12   –       –       –       –       –       –       –       –       0.0000  0.0107  0.0200  0.0284
a13   –       –       –       –       –       –       –       –       –       –       0.0000  0.0094

n     27      28      29      30      31      32      33      34      35      36      37      38
a1    0.4366  0.4328  0.4291  0.4254  0.4220  0.4188  0.4156  0.4127  0.4096  0.4068  0.4040  0.4015
a2    0.3018  0.2992  0.2968  0.2944  0.2921  0.2898  0.2876  0.2854  0.2834  0.2813  0.2794  0.2774
a3    0.2522  0.2510  0.2499  0.2487  0.2475  0.2463  0.2451  0.2439  0.2427  0.2415  0.2403  0.2391
a4    0.2152  0.2151  0.2150  0.2148  0.2145  0.2141  0.2137  0.2132  0.2127  0.2121  0.2116  0.2110
a5    0.1848  0.1857  0.1864  0.1870  0.1874  0.1878  0.1880  0.1882  0.1883  0.1883  0.1883  0.1881
a6    0.1584  0.1601  0.1616  0.1630  0.1641  0.1651  0.1660  0.1667  0.1673  0.1678  0.1683  0.1686
a7    0.1346  0.1372  0.1395  0.1415  0.1433  0.1449  0.1463  0.1475  0.1487  0.1496  0.1505  0.1513
a8    0.1128  0.1162  0.1192  0.1219  0.1243  0.1265  0.1284  0.1301  0.1317  0.1331  0.1344  0.1356
a9    0.0923  0.0965  0.1002  0.1036  0.1066  0.1093  0.1118  0.1140  0.1160  0.1179  0.1196  0.1211
a10   0.0728  0.0778  0.0822  0.0862  0.0899  0.0931  0.0961  0.0988  0.1013  0.1036  0.1056  0.1075
a11   0.0540  0.0598  0.0650  0.0697  0.0739  0.0777  0.0812  0.0844  0.0873  0.0900  0.0924  0.0947
a12   0.0358  0.0424  0.0483  0.0537  0.0585  0.0629  0.0669  0.0706  0.0739  0.0770  0.0798  0.0824
a13   0.0178  0.0253  0.0320  0.0381  0.0435  0.0485  0.0530  0.0572  0.0610  0.0645  0.0677  0.0706
a14   0.0000  0.0084  0.0159  0.0227  0.0289  0.0344  0.0395  0.0441  0.0484  0.0523  0.0559  0.0592
a15   –       –       0.0000  0.0076  0.0144  0.0206  0.0262  0.0314  0.0361  0.0404  0.0444  0.0481
a16   –       –       –       –       0.0000  0.0068  0.0131  0.0187  0.0239  0.0287  0.0331  0.0372
a17   –       –       –       –       –       –       0.0000  0.0062  0.0119  0.0172  0.0220  0.0264
a18   –       –       –       –       –       –       –       –       0.0000  0.0057  0.0110  0.0158
a19   –       –       –       –       –       –       –       –       –       –       0.0000  0.0053

n     39      40      41      42      43      44      45      46      47      48      49      50
a1    0.3989  0.3964  0.3940  0.3917  0.3894  0.3872  0.3850  0.3830  0.3808  0.3789  0.3770  0.3751
a2    0.2755  0.2737  0.2719  0.2701  0.2684  0.2667  0.2651  0.2635  0.2620  0.2604  0.2589  0.2574
a3    0.2380  0.2368  0.2357  0.2345  0.2334  0.2323  0.2313  0.2302  0.2291  0.2281  0.2271  0.2260
a4    0.2104  0.2098  0.2091  0.2085  0.2078  0.2072  0.2065  0.2058  0.2052  0.2045  0.2038  0.2032
a5    0.1880  0.1878  0.1876  0.1874  0.1871  0.1868  0.1865  0.1862  0.1859  0.1855  0.1851  0.1847
a6    0.1689  0.1691  0.1693  0.1694  0.1695  0.1695  0.1695  0.1695  0.1695  0.1693  0.1692  0.1691
a7    0.1520  0.1526  0.1531  0.1535  0.1539  0.1542  0.1545  0.1548  0.1550  0.1551  0.1553  0.1554
a8    0.1366  0.1376  0.1384  0.1392  0.1398  0.1405  0.1410  0.1415  0.1420  0.1423  0.1427  0.1430
a9    0.1225  0.1237  0.1249  0.1259  0.1269  0.1278  0.1286  0.1293  0.1300  0.1306  0.1312  0.1317
a10   0.1092  0.1108  0.1123  0.1136  0.1149  0.1160  0.1170  0.1180  0.1189  0.1197  0.1205  0.1212
a11   0.0967  0.0986  0.1003  0.1020  0.1035  0.1049  0.1062  0.1073  0.1085  0.1095  0.1105  0.1113
a12   0.0848  0.0870  0.0891  0.0909  0.0927  0.0943  0.0959  0.0972  0.0986  0.0998  0.1010  0.1020
a13   0.0733  0.0759  0.0782  0.0804  0.0824  0.0842  0.0860  0.0876  0.0892  0.0906  0.0919  0.0932
a14   0.0622  0.0651  0.0677  0.0701  0.0724  0.0745  0.0765  0.0783  0.0801  0.0817  0.0832  0.0846
a15   0.0515  0.0546  0.0575  0.0602  0.0628  0.0651  0.0673  0.0694  0.0713  0.0731  0.0748  0.0764
a16   0.0409  0.0444  0.0476  0.0506  0.0534  0.0560  0.0584  0.0607  0.0628  0.0648  0.0667  0.0685
a17   0.0305  0.0343  0.0379  0.0411  0.0442  0.0471  0.0497  0.0522  0.0546  0.0568  0.0588  0.0608
a18   0.0203  0.0244  0.0283  0.0318  0.0352  0.0383  0.0412  0.0439  0.0465  0.0489  0.0511  0.0532
a19   0.0101  0.0146  0.0188  0.0227  0.0263  0.0296  0.0328  0.0357  0.0385  0.0411  0.0436  0.0459
a20   0.0000  0.0049  0.0094  0.0136  0.0175  0.0211  0.0245  0.0277  0.0307  0.0335  0.0361  0.0386
a21   –       –       0.0000  0.0045  0.0087  0.0126  0.0163  0.0197  0.0229  0.0259  0.0288  0.0314
a22   –       –       –       –       0.0000  0.0042  0.0081  0.0118  0.0153  0.0185  0.0215  0.0244
a23   –       –       –       –       –       –       0.0000  0.0039  0.0076  0.0111  0.0143  0.0174
a24   –       –       –       –       –       –       –       –       0.0000  0.0037  0.0071  0.0104
a25   –       –       –       –       –       –       –       –       –       –       0.0000  0.0037

Table A4 presents the p-value for the W statistic computed using the Shapiro–Wilk test.

TABLE A4 p-Value for the Shapiro–Wilk test.

n \ p  0.01   0.02   0.05   0.1    0.5    0.9    0.95   0.98   0.99
3      0.753  0.756  0.767  0.789  0.959  0.998  0.999  1.000  1.000
4      0.687  0.707  0.748  0.792  0.935  0.987  0.992  0.996  0.997
5      0.686  0.715  0.762  0.806  0.927  0.979  0.986  0.991  0.993
6      0.713  0.743  0.788  0.826  0.927  0.974  0.981  0.986  0.989
7      0.730  0.760  0.803  0.838  0.928  0.972  0.979  0.985  0.988
8      0.749  0.778  0.818  0.851  0.932  0.972  0.978  0.984  0.987
9      0.764  0.791  0.829  0.859  0.935  0.972  0.978  0.983  0.986
10     0.781  0.806  0.842  0.869  0.938  0.972  0.978  0.983  0.986
11     0.792  0.817  0.850  0.876  0.940  0.973  0.979  0.984  0.986
12     0.805  0.828  0.859  0.883  0.943  0.973  0.979  0.984  0.986
13     0.814  0.837  0.866  0.889  0.945  0.974  0.979  0.984  0.986
14     0.825  0.846  0.874  0.895  0.947  0.975  0.980  0.984  0.986
15     0.835  0.855  0.881  0.901  0.950  0.975  0.980  0.984  0.987
16     0.844  0.863  0.887  0.906  0.952  0.976  0.981  0.985  0.987
17     0.851  0.869  0.892  0.910  0.954  0.977  0.981  0.985  0.987
18     0.858  0.874  0.897  0.914  0.956  0.978  0.982  0.986  0.988
19     0.863  0.879  0.901  0.917  0.957  0.978  0.982  0.986  0.988
20     0.868  0.884  0.905  0.920  0.959  0.979  0.983  0.987  0.989
21     0.873  0.888  0.908  0.923  0.960  0.980  0.983  0.987  0.989
22     0.878  0.892  0.911  0.926  0.960  0.980  0.983  0.987  0.989
23     0.881  0.895  0.914  0.928  0.962  0.981  0.984  0.987  0.989
24     0.884  0.898  0.916  0.930  0.963  0.981  0.984  0.988  0.989
25     0.888  0.901  0.918  0.931  0.964  0.981  0.985  0.988  0.989
26     0.891  0.904  0.920  0.933  0.965  0.982  0.985  0.988  0.989
27     0.894  0.906  0.923  0.935  0.965  0.982  0.985  0.988  0.990
28     0.896  0.908  0.924  0.936  0.966  0.982  0.985  0.988  0.990
29     0.898  0.910  0.926  0.937  0.966  0.982  0.985  0.988  0.990
30     0.900  0.912  0.927  0.939  0.967  0.983  0.985  0.988  0.990
31     0.902  0.914  0.929  0.940  0.967  0.983  0.986  0.988  0.990
32     0.904  0.915  0.930  0.941  0.968  0.983  0.986  0.988  0.990
33     0.906  0.917  0.931  0.942  0.968  0.983  0.986  0.989  0.990
34     0.908  0.919  0.933  0.943  0.969  0.983  0.986  0.989  0.990
35     0.910  0.920  0.934  0.944  0.969  0.984  0.986  0.989  0.990
36     0.912  0.922  0.935  0.945  0.970  0.984  0.986  0.989  0.990
37     0.914  0.924  0.936  0.946  0.970  0.984  0.987  0.989  0.990
38     0.916  0.925  0.938  0.947  0.971  0.984  0.987  0.989  0.990
39     0.917  0.927  0.939  0.948  0.971  0.984  0.987  0.989  0.991
40     0.919  0.928  0.940  0.949  0.972  0.985  0.987  0.989  0.991
41     0.920  0.929  0.941  0.950  0.972  0.985  0.987  0.989  0.991
42     0.922  0.930  0.942  0.950  0.972  0.985  0.987  0.989  0.991
43     0.923  0.932  0.943  0.951  0.973  0.985  0.987  0.990  0.991
44     0.924  0.933  0.944  0.952  0.973  0.985  0.988  0.990  0.991
45     0.926  0.934  0.945  0.953  0.974  0.985  0.988  0.990  0.991
46     0.927  0.935  0.945  0.953  0.974  0.985  0.988  0.990  0.991
47     0.928  0.936  0.946  0.954  0.974  0.985  0.988  0.990  0.991
48     0.929  0.937  0.947  0.954  0.974  0.985  0.988  0.990  0.991
49     0.929  0.938  0.947  0.955  0.974  0.985  0.988  0.990  0.991
50     0.930  0.939  0.947  0.955  0.974  0.985  0.988  0.990  0.991
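In practice, the W statistic and its p-value are rarely computed by hand from Tables A3 and A4; SciPy does both in one call. A minimal sketch with hypothetical data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(size=30)   # hypothetical sample

W, p = stats.shapiro(data)   # computes W using coefficients like those tabulated above
print(W, p)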

Table A5 presents the t distribution. To find the p-value, look at the row for the given number of degrees of freedom and move across the columns until you reach the obtained t statistic. Example: for the value of the test statistic t = 2.445 with 32 degrees of freedom we have p < 0.02.

TABLE A5 p-Value for the t distribution (two-tailed probability p).

Degrees of freedom   0.2    0.1    0.05    0.02    0.01    0.001
1                    3.078  6.314  12.706  31.821  63.657  636.619
2                    1.886  2.920  4.303   6.965   9.925   31.599
3                    1.638  2.353  3.182   4.541   5.841   12.924
4                    1.533  2.132  2.776   3.747   4.604   8.610
5                    1.476  2.015  2.571   3.365   4.032   6.869
6                    1.440  1.943  2.447   3.143   3.707   5.959
7                    1.415  1.895  2.365   2.998   3.499   5.408
8                    1.397  1.860  2.306   2.896   3.355   5.041
9                    1.383  1.833  2.262   2.821   3.250   4.781
10                   1.372  1.812  2.228   2.764   3.169   4.587
11                   1.363  1.796  2.201   2.718   3.106   4.437
12                   1.356  1.782  2.179   2.681   3.055   4.318
13                   1.350  1.771  2.160   2.650   3.012   4.221
14                   1.345  1.761  2.145   2.624   2.977   4.140
15                   1.341  1.753  2.131   2.602   2.947   4.073
16                   1.337  1.746  2.120   2.583   2.921   4.015
17                   1.333  1.740  2.110   2.567   2.898   3.965
18                   1.330  1.734  2.101   2.552   2.878   3.922
19                   1.328  1.729  2.093   2.539   2.861   3.883
20                   1.325  1.725  2.086   2.528   2.845   3.850
21                   1.323  1.721  2.080   2.518   2.831   3.819
22                   1.323  1.717  2.074   2.508   2.819   3.792
23                   1.319  1.714  2.069   2.500   2.807   3.768
24                   1.318  1.711  2.064   2.492   2.797   3.745
25                   1.316  1.708  2.060   2.485   2.787   3.725
26                   1.315  1.706  2.056   2.479   2.779   3.707
27                   1.314  1.703  2.052   2.473   2.771   3.690
28                   1.313  1.701  2.048   2.467   2.763   3.674
29                   1.311  1.699  2.045   2.462   2.756   3.659
30                   1.310  1.697  2.042   2.457   2.750   3.646
31                   1.309  1.696  2.040   2.453   2.744   3.633
32                   1.309  1.694  2.037   2.449   2.738   3.622
33                   1.308  1.692  2.035   2.445   2.733   3.611
34                   1.307  1.691  2.032   2.441   2.728   3.601
35                   1.306  1.690  2.030   2.438   2.724   3.591
36                   1.306  1.688  2.028   2.434   2.719   3.582
37                   1.305  1.687  2.026   2.431   2.715   3.574
38                   1.304  1.686  2.024   2.429   2.712   3.566
39                   1.304  1.685  2.023   2.426   2.708   3.558
40                   1.303  1.684  2.021   2.423   2.704   3.551
41                   1.303  1.683  2.020   2.421   2.701   3.544
42                   1.302  1.682  2.018   2.418   2.698   3.538
43                   1.302  1.681  2.017   2.416   2.695   3.532
44                   1.301  1.680  2.015   2.414   2.692   3.526
45                   1.301  1.679  2.014   2.412   2.690   3.520
46                   1.300  1.679  2.013   2.410   2.687   3.515
47                   1.300  1.678  2.012   2.408   2.685   3.510
48                   1.299  1.677  2.011   2.407   2.682   3.505
49                   1.299  1.677  2.010   2.403   2.680   3.500
50                   1.299  1.676  2.009   2.403   2.678   3.496
51                   1.298  1.675  2.008   2.402   2.676   3.492
52                   1.298  1.675  2.007   2.400   2.674   3.488
53                   1.297  1.674  2.006   2.399   2.672   3.484
54                   1.297  1.674  2.005   2.397   2.670   3.480
55                   1.297  1.673  2.004   2.396   2.668   3.476
56                   1.297  1.673  2.003   2.395   2.667   3.473
57                   1.297  1.672  2.002   2.394   2.665   3.470
58                   1.296  1.672  2.002   2.392   2.663   3.466
59                   1.296  1.671  2.001   2.391   2.662   3.463
60                   1.296  1.671  2.000   2.390   2.660   3.460
70                   1.294  1.667  1.994   2.381   2.648   3.435
80                   1.292  1.667  1.994   2.381   2.648   3.435
90                   1.292  1.664  1.990   2.374   2.639   3.416
100                  1.290  1.660  1.984   2.364   2.626   3.390
110                  1.289  1.659  1.982   2.361   2.621   3.381
120                  1.289  1.658  1.980   2.358   2.617   3.373
130                  1.288  1.657  1.978   2.355   2.614   3.367
140                  1.288  1.656  1.977   2.353   2.611   3.361
150                  1.287  1.655  1.976   2.351   2.609   3.357
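The exact two-tailed p-value for the worked example above can be checked in SciPy:

from scipy import stats

# t = 2.445 with 32 degrees of freedom, two-tailed
p = 2 * stats.t.sf(2.445, df=32)
print(p)   # roughly 0.02, consistent with the table lookup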

The next table, Table A6, is the two-tailed z table. The values in Table A6 are the proportions of the Gaussian distribution outside the range ±z. To find the p-value that corresponds to a z statistic, look up that specific z value in the table and read the p-value associated with it. Example: if the computed z statistic is 1.34, then the p-value is 0.1802.

TABLE A6 Two-tailed z-test.

z     p       z     p       z     p       z     p
0.00  1.0000  0.78  0.4354  1.56  0.1188  2.34  0.0193
0.01  0.9920  0.79  0.4295  1.57  0.1164  2.35  0.0188
0.02  0.9840  0.80  0.4237  1.58  0.1141  2.36  0.0183
0.03  0.9761  0.81  0.4179  1.59  0.1118  2.37  0.0178
0.04  0.9681  0.82  0.4122  1.60  0.1096  2.38  0.0173
0.05  0.9601  0.83  0.4065  1.61  0.1074  2.39  0.0168
0.06  0.9522  0.84  0.4009  1.62  0.1052  2.40  0.0164
0.07  0.9442  0.85  0.3953  1.63  0.1031  2.41  0.0160
0.08  0.9362  0.86  0.3898  1.64  0.1010  2.42  0.0155
0.09  0.9283  0.87  0.3843  1.65  0.0989  2.43  0.0151
0.10  0.9203  0.88  0.3789  1.66  0.0969  2.44  0.0147
0.11  0.9124  0.89  0.3735  1.67  0.0949  2.45  0.0143
0.12  0.9045  0.90  0.3681  1.68  0.0930  2.46  0.0139
0.13  0.8966  0.91  0.3628  1.69  0.0910  2.47  0.0135
0.14  0.8887  0.92  0.3576  1.70  0.0891  2.48  0.0131
0.15  0.8808  0.93  0.3524  1.71  0.0873  2.49  0.0128
0.16  0.8729  0.94  0.3472  1.72  0.0854  2.50  0.0124
0.17  0.8650  0.95  0.3421  1.73  0.0836  2.51  0.0121
0.18  0.8572  0.96  0.3371  1.74  0.0819  2.52  0.0117
0.19  0.8493  0.97  0.3320  1.75  0.0801  2.53  0.0114
0.20  0.8415  0.98  0.3271  1.76  0.0784  2.54  0.0111
0.21  0.8337  0.99  0.3222  1.77  0.0767  2.55  0.0108
0.22  0.8259  1.00  0.3173  1.78  0.0751  2.56  0.0105
0.23  0.8181  1.01  0.3125  1.79  0.0735  2.57  0.0102
0.24  0.8103  1.02  0.3077  1.80  0.0719  2.58  0.0099
0.25  0.8026  1.03  0.3030  1.81  0.0703  2.59  0.0096
0.26  0.7949  1.04  0.2983  1.82  0.0688  2.60  0.0093
0.27  0.7872  1.05  0.2937  1.83  0.0672  2.61  0.0091
0.28  0.7795  1.06  0.2891  1.84  0.0658  2.62  0.0088
0.29  0.7718  1.07  0.2846  1.85  0.0643  2.63  0.0085
0.30  0.7642  1.08  0.2801  1.86  0.0629  2.64  0.0083
0.31  0.7566  1.09  0.2757  1.87  0.0615  2.65  0.0080
0.32  0.7490  1.10  0.2713  1.88  0.0601  2.66  0.0078
0.33  0.7414  1.11  0.2670  1.89  0.0588  2.67  0.0076
0.34  0.7339  1.12  0.2627  1.90  0.0574  2.68  0.0074
0.35  0.7263  1.13  0.2585  1.91  0.0561  2.69  0.0071
0.36  0.7188  1.14  0.2543  1.92  0.0549  2.70  0.0069
0.37  0.7114  1.15  0.2501  1.93  0.0536  2.71  0.0067
0.38  0.7039  1.16  0.2460  1.94  0.0524  2.72  0.0065
0.39  0.6965  1.17  0.2420  1.95  0.0512  2.73  0.0063
0.40  0.6892  1.18  0.2380  1.96  0.0500  2.74  0.0061
0.41  0.6818  1.19  0.2340  1.97  0.0488  2.75  0.0060
0.42  0.6745  1.20  0.2301  1.98  0.0477  2.76  0.0058
0.43  0.6672  1.21  0.2263  1.99  0.0466  2.77  0.0056
0.44  0.6599  1.22  0.2225  2.00  0.0455  2.78  0.0054
0.45  0.6527  1.23  0.2187  2.01  0.0444  2.79  0.0053
0.46  0.6455  1.24  0.2150  2.02  0.0434  2.80  0.0051
0.47  0.6384  1.25  0.2113  2.03  0.0424  2.81  0.0050
0.48  0.6312  1.26  0.2077  2.04  0.0414  2.82  0.0048
0.49  0.6241  1.27  0.2041  2.05  0.0404  2.83  0.0047
0.50  0.6171  1.28  0.2005  2.06  0.0394  2.84  0.0045
0.51  0.6101  1.29  0.1971  2.07  0.0385  2.85  0.0044
0.52  0.6031  1.30  0.1936  2.08  0.0375  2.86  0.0042
0.53  0.5961  1.31  0.1902  2.09  0.0366  2.87  0.0041
0.54  0.5892  1.32  0.1868  2.10  0.0357  2.88  0.0040
0.55  0.5823  1.33  0.1835  2.11  0.0349  2.89  0.0039
0.56  0.5755  1.34  0.1802  2.12  0.0340  2.90  0.0037
0.57  0.5687  1.35  0.1770  2.13  0.0332  2.91  0.0036
0.58  0.5619  1.36  0.1738  2.14  0.0324  2.92  0.0035
0.59  0.5552  1.37  0.1707  2.15  0.0316  2.93  0.0034
0.60  0.5485  1.38  0.1676  2.16  0.0308  2.94  0.0033
0.61  0.5419  1.39  0.1645  2.17  0.0300  2.95  0.0032
0.62  0.5353  1.40  0.1615  2.18  0.0293  2.96  0.0031
0.63  0.5287  1.41  0.1585  2.19  0.0285  2.97  0.0030
0.64  0.5222  1.42  0.1556  2.20  0.0278  2.98  0.0029
0.65  0.5157  1.43  0.1527  2.21  0.0271  2.99  0.0028
0.66  0.5093  1.44  0.1499  2.22  0.0264  3.00  0.0027
0.67  0.5029  1.45  0.1471  2.23  0.0257  3.10  0.00194
0.68  0.4965  1.46  0.1443  2.24  0.0251  3.20  0.00137
0.69  0.4902  1.47  0.1416  2.25  0.0244  3.30  0.00097
0.70  0.4839  1.48  0.1389  2.26  0.0238  3.40  0.00067
0.71  0.4777  1.49  0.1362  2.27  0.0232  3.50  0.00047
0.72  0.4715  1.50  0.1336  2.28  0.0226  3.60  0.00032
0.73  0.4654  1.51  0.1310  2.29  0.0220  3.70  0.00022
0.74  0.4593  1.52  0.1285  2.30  0.0214  3.80  0.00014
0.75  0.4533  1.53  0.1260  2.31  0.0209  3.90  0.00010
0.76  0.4473  1.54  0.1236  2.32  0.0203  4.00  0.00006
0.77  0.4413  1.55  0.1211  2.33  0.0198
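The same lookup can be done exactly with SciPy's Gaussian survival function:

from scipy import stats

p = 2 * stats.norm.sf(1.34)   # proportion of the Gaussian outside the range ±1.34
print(p)                      # 0.1802..., matching Table A6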

For the non-parametric Wilcoxon one sample test we have Table A7 to work with. The table is read this way: after computing the sum of positive (or, alternatively, negative) ranks, compare it to the values in the table. If the sum is equal to a boundary or falls outside the range shown, the p-value is less than the value shown at the top of the column. The number n is the number of differences. Example: if we have a rank sum of 11 from a sample of 13, we look along row 13, from left to right, until we find the last interval that does not contain 11. In our case the p-value is less than 0.02.

TABLE A7 Wilcoxon one sample (p-value).

n    0.2       0.1       0.05      0.02      0.01      0.001
4    0–10      –         –         –         –         –
5    2–13      0–15      –         –         –         –
6    3–18      2–19      0–21      –         –         –
7    5–23      3–25      2–26      0–28      –         –
8    8–28      5–31      3–33      1–35      0–36      –
9    10–35     8–37      5–40      3–42      1–44      –
10   14–41     10–45     8–47      5–50      3–52      –
11   17–49     13–53     10–56     7–59      5–61      0–66
12   21–57     17–61     13–65     9–69      7–71      1–77
13   26–65     21–70     17–74     12–79     9–82      2–89
14   31–74     25–80     21–84     15–90     12–93     4–101
15   36–84     30–90     25–95     19–101    15–105    6–114
16   42–94     35–101    29–107    23–113    19–117    9–127
17   48–105    41–112    34–119    28–125    23–130    11–142
18   55–116    47–124    40–131    32–139    27–144    14–157
19   62–128    53–137    46–144    37–153    32–158    18–172
20   69–141    60–150    52–158    43–167    37–173    21–189
21   77–154    67–164    58–173    49–182    42–189    26–205
22   86–167    75–178    66–187    55–198    48–205    30–223
23   95–181    83–193    73–203    62–214    54–222    35–241
24   104–196   91–209    81–219    69–231    61–239    40–260
25   114–211   100–225   89–239    76–249    68–257    45–280
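SciPy performs the Wilcoxon signed-rank test directly; a sketch with 13 hypothetical paired differences, matching the n = 13 example above:

import numpy as np
from scipy import stats

diff = np.array([2.0, -1.5, 3.1, 0.8, -0.4, 1.2, 2.7,
                 -2.2, 0.9, 1.6, -0.7, 1.1, 0.5])   # hypothetical differences

stat, p = stats.wilcoxon(diff)   # stat is the smaller of the two rank sums
print(stat, p)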

In Table A8, the values of the F distribution corresponding to the one-tailed p-values are displayed. The columns represent n1, the numerator degrees of freedom, and the rows represent n2, the denominator degrees of freedom. If the computed F statistic is greater than a tabulated value, the p-value is smaller than the p level of that row. Example: for a test statistic F = 3.22 with n1 = 2 and n2 = 20 degrees of freedom we have 0.05 < p < 0.1.

TABLE A8 F distribution.

n2    p     n1=1    2       3       4       5       6       7       8       9       10      12      15      20      ∞
1     0.1   39.9    49.5    53.6    55.8    57.2    58.2    59.1    59.7    60.1    60.5    61.0    61.5    62.0    63.3
      0.05  161.4   199.5   215.8   224.7   230.4   234.2   237.0   239.1   240.8   242.1   244.2   246.2   248.3   254.3
      0.01  4051.8  4999.5  5403.5  5624.8  5763.8  5859.2  5928.6  5981.3  6022.7  6056.1  6106.6  6157.6  6209.0  6265.9
2     0.1   8.53    9.00    9.16    9.24    9.29    9.33    9.35    9.37    9.38    9.39    9.41    9.43    9.44    9.49
      0.05  18.51   19.00   19.16   19.25   19.30   19.33   19.35   19.37   19.38   19.40   19.41   19.43   19.45   19.50
      0.01  98.50   99.00   99.17   99.25   99.30   99.33   99.36   99.75   99.78   99.80   99.83   99.87   99.90   99.50
3     0.1   5.54    5.46    5.39    5.34    5.31    5.28    5.27    5.25    5.24    5.23    5.22    5.20    5.18    5.13
      0.05  10.13   9.55    9.28    9.12    9.01    8.94    8.89    8.85    8.81    8.79    8.74    8.70    8.66    8.53
      0.01  34.11   30.82   29.46   28.71   28.24   27.91   27.67   27.49   27.34   27.23   27.05   26.87   26.69   26.13
4     0.1   4.54    4.32    4.19    4.11    4.05    4.01    3.98    3.95    3.94    3.92    3.90    3.87    3.84    3.76
      0.05  7.71    6.94    6.59    6.39    6.26    6.16    6.09    6.04    6.00    5.96    5.91    5.86    5.80    5.63
      0.01  21.20   18.00   16.69   15.98   15.52   15.21   14.98   14.80   14.66   14.55   14.37   14.20   14.02   13.46
5     0.1   4.06    3.78    3.62    3.52    3.45    3.40    3.37    3.34    3.32    3.30    3.27    3.24    3.21    3.10
      0.05  6.61    5.79    5.41    5.19    5.05    4.95    4.88    4.82    4.77    4.74    4.68    4.62    4.56    4.36
      0.01  16.26   13.27   12.06   11.39   10.97   10.67   10.46   10.29   10.16   10.05   9.89    9.72    9.55    9.02
6     0.1   3.78    3.46    3.29    3.18    3.11    3.05    3.01    2.98    2.96    2.94    2.90    2.87    2.84    2.72
      0.05  5.99    5.14    4.76    4.53    4.39    4.28    4.21    4.15    4.10    4.06    4.00    3.94    3.94    3.67
      0.01  13.74   10.92   9.78    9.15    8.75    8.47    8.26    8.10    7.98    7.87    7.72    7.56    7.56    6.88
7     0.1   3.59    3.26    3.07    2.96    2.88    2.83    2.78    2.75    2.72    2.70    2.67    2.63    2.59    2.47
      0.05  5.59    4.74    4.35    4.12    3.97    3.87    3.79    3.73    3.68    3.64    3.57    3.51    3.44    3.23
      0.01  12.25   9.55    8.45    7.85    7.46    7.19    6.99    6.84    6.72    6.62    6.47    6.31    6.16    5.65
8     0.1   3.46    3.11    2.92    2.81    2.73    2.67    2.62    2.59    2.56    2.54    2.50    2.46    2.43    2.29
      0.05  5.32    4.46    4.07    3.84    3.69    3.58    3.50    3.44    3.39    3.35    3.28    3.22    3.15    2.93
      0.01  11.26   8.65    7.59    7.01    6.63    6.37    6.18    6.03    5.91    5.81    5.67    5.52    5.36    4.86
9     0.1   3.36    3.01    2.81    2.69    2.61    2.55    2.51    2.47    2.44    2.42    2.38    2.34    2.30    2.16
      0.05  5.12    4.26    3.86    3.63    3.48    3.37    3.29    3.23    3.18    3.14    3.07    3.01    2.94    2.71
      0.01  10.56   8.02    6.99    6.42    6.06    5.80    5.61    5.47    5.35    5.26    5.11    4.96    4.81    4.31
10    0.1   3.29    2.92    2.73    2.61    2.52    2.46    2.41    2.38    2.35    2.32    2.28    2.24    2.20    2.01
      0.05  4.96    4.10    3.71    3.48    3.33    3.22    3.13    3.07    3.02    2.98    2.84    2.84    2.77    2.54
      0.01  10.04   7.56    6.55    5.99    5.64    5.39    5.20    5.06    4.94    4.85    4.71    4.56    4.81    3.91
12    0.1   3.18    2.81    2.61    2.48    2.39    2.33    2.28    2.24    2.21    2.19    2.15    2.10    2.06    1.90
      0.05  4.75    3.89    3.49    3.26    3.11    3.00    2.91    2.85    2.80    2.75    2.69    2.62    2.54    2.30
      0.01  9.33    6.93    5.95    5.41    5.06    4.82    4.64    4.50    4.39    4.30    4.16    4.01    3.86    3.36
15    0.1   3.07    2.70    2.49    2.36    2.27    2.21    2.16    2.12    2.09    2.06    2.02    1.97    1.92    1.76
      0.05  4.54    3.68    3.29    3.06    2.90    2.79    2.71    2.64    2.59    2.54    2.48    2.40    2.33    2.07
      0.01  8.68    6.36    5.42    4.89    4.56    4.32    4.14    4.00    3.89    3.80    3.67    3.52    3.37    2.87
20    0.1   2.97    2.59    2.38    2.25    2.16    2.09    2.04    2.00    1.96    1.94    1.89    1.84    1.79    1.61
      0.05  4.35    3.49    2.92    2.87    2.71    2.60    2.51    2.45    2.39    2.35    2.28    2.20    2.12    1.84
      0.01  8.10    5.85    4.94    4.43    4.10    3.87    3.56    3.56    3.46    2.37    3.23    3.09    2.94    2.42
30    0.1   2.88    2.49    2.28    2.14    2.05    1.98    1.93    1.88    1.85    1.82    1.77    1.72    1.67    1.46
      0.05  4.17    3.32    2.92    2.69    2.53    2.42    2.33    2.27    2.21    2.16    2.09    2.01    1.93    1.62
      0.01  7.56    5.39    4.51    4.02    3.70    3.47    3.30    3.17    3.07    2.98    2.84    2.70    2.55    2.01
40    0.1   2.84    2.44    2.23    2.09    2.00    1.93    1.87    1.83    1.79    1.76    1.71    1.66    1.61    1.38
      0.05  4.08    3.23    2.84    2.61    2.45    2.34    2.23    2.18    2.12    2.08    2.00    1.92    1.84    1.51
      0.01  7.31    5.18    4.31    3.83    3.51    3.29    3.12    2.99    2.89    2.80    2.66    2.52    2.37    1.80
60    0.1   2.79    2.39    2.18    2.04    1.95    1.87    1.82    1.77    1.74    1.71    1.66    1.60    1.54    1.29
      0.05  4.00    3.15    2.76    2.53    2.37    2.25    2.17    2.10    2.04    1.99    1.92    1.84    1.75    1.39
      0.01  7.08    4.98    4.13    3.65    3.34    3.12    2.95    2.82    2.72    2.63    2.50    2.35    2.20    1.60
120   0.1   2.75    2.35    2.13    1.99    1.90    1.82    1.77    1.72    1.68    1.65    1.60    1.54    1.48    1.19
      0.05  3.92    3.07    2.68    2.45    2.29    2.18    2.09    2.02    1.96    1.91    1.83    1.75    1.66    1.25
      0.01  6.85    4.79    3.95    3.48    3.17    2.96    2.79    2.66    2.56    2.47    2.34    2.19    2.03    1.38
∞     0.1   2.71    2.30    2.08    1.94    1.85    1.77    1.72    1.67    1.63    1.60    1.55    1.49    1.42    1.13
      0.05  3.84    3.00    2.60    2.37    2.21    2.10    2.01    1.94    1.88    1.83    1.75    1.67    1.57    1.17
      0.01  6.63    4.61    3.78    3.32    3.02    2.80    2.64    2.51    2.41    2.32    2.18    2.04    1.88    1.24
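The worked example above can be verified with SciPy's F survival function:

from scipy import stats

# F = 3.22 with n1 = 2 (numerator) and n2 = 20 (denominator) degrees of freedom
p = stats.f.sf(3.22, dfn=2, dfd=20)
print(p)   # about 0.06, i.e. between the tabulated 0.05 and 0.1 levels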

The following two tables contain the cut-off values for the one sample z-test and the two proportion z-test. Table A9 presents the z-scores for the one sample z-test. To find the cut-off value, search the table for the value determined by the α-level, then see which row and column it lies on. If you cannot find the exact value, find the two table values that bracket it. For example, if you look for 0.45, you see that on row 1.6 you have the value 0.4495, which corresponds to the column 0.04, followed by 0.4505, which corresponds to the column 0.05. Recall that you were searching for 0.45, which is right in the middle. By combining the two you get the z-score 1.645.

TABLE A9 z-Score for one sample z-test.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4843  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.1   0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993
3.2   0.4993  0.4993  0.4994  0.4994  0.4994  0.4994  0.4994  0.4995  0.4995  0.4995
3.3   0.4995  0.4995  0.4995  0.4996  0.4996  0.4996  0.4996  0.4996  0.4996  0.4997
3.4   0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998
3.5   0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998
3.6   0.4998  0.4998  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.7   0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.8   0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.9   0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
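The 1.645 found above is just the inverse Gaussian CDF; in SciPy:

from scipy import stats

# An area of 0.45 above the mean leaves 0.95 below the cut-off in total
z = stats.norm.ppf(0.95)
print(z)   # 1.6448..., the 1.645 obtained by combining the two columns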

Table A10 depicts the z-scores for the two proportions z-test. The table is read in the same way as Table A9 for the one sample z-test.

TABLE A10 z-Score for the two-tailed z-test.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
3.5   0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998
3.6   0.9998  0.9998  0.9999  –       –       –       –       –       –       –

Table A11 depicts the χ² test scores.

TABLE A11 The χ² distribution table.

Degrees of freedom   0.2      0.1      0.05     0.02     0.01     0.001
1                    1.642    2.706    3.841    5.412    6.635    10.827
2                    3.219    4.605    5.991    7.824    9.210    13.815
3                    4.642    6.251    7.815    9.837    11.345   16.268
4                    5.989    7.770    9.488    11.688   13.277   18.465
5                    7.289    9.236    11.070   13.388   15.086   20.517
6                    8.558    10.645   12.592   15.033   16.812   22.457
7                    9.803    12.017   14.067   16.622   18.475   24.322
8                    11.030   13.362   15.507   18.168   20.090   26.125
9                    12.242   14.684   16.919   19.670   21.666   27.877
10                   13.442   15.987   18.307   21.161   23.209   29.588
11                   14.631   17.275   19.675   22.618   24.725   31.264
12                   15.812   18.549   21.026   24.054   26.217   32.909
13                   16.985   19.812   22.362   25.472   27.688   34.528
14                   18.151   21.064   23.685   26.873   29.141   36.123
15                   19.311   22.307   23.685   26.873   29.141   36.123
16                   20.465   23.542   26.296   26.633   32.000   39.252
17                   21.615   24.769   27.587   30.995   33.408   40.790
18                   22.760   25.989   28.869   32.346   34.805   42.312
19                   23.900   27.204   30.144   22.687   36.191   43.820
20                   25.038   28.412   31.410   35.020   37.566   45.315
21                   26.171   29.615   32.671   36.343   38.932   46.797
22                   27.301   30.813   33.924   37.659   40.289   48.268
23                   28.429   32.007   35.172   38.968   41.638   49.728
24                   29.553   33.196   36.415   40.270   42.980   51.179
25                   30.675   34.382   37.652   41.566   44.314   52.620
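As with the other tables, a χ² lookup can be reproduced exactly in SciPy:

from scipy import stats

p = stats.chi2.sf(3.841, df=1)   # the 0.05 cell of the first row of Table A11
print(p)                         # 0.05000...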

1.3 Hope is around the corner. Artificial Intelligence steps in

Cancer is the biggest health challenge we face today. In October 1971, the epidemiologist Abdel Omran published a paper in The Milbank Memorial Fund Quarterly in which he stated that a country's development changes the way its people die (Omran, 1971). If a country is more developed, its nation is wealthier, its medical system is better, and thus life expectancy increases. In underdeveloped countries the causes of death are related to malnutrition, maternal and infant mortality, infectious diseases due to the lack of hygiene, etc. When we eliminate all these factors, we are left with diseases like cancer, heart disease, diabetes, etc. So far, heart disease is the primary cause of death around the world, but apparently things are about to change. A study performed by data scientists at Stanford Medical School, published in the Annals of Internal Medicine in December 2018, shows that cancer is becoming the first cause of death, at least in the United States (Hastings et al., 2018). The researchers used for their study the death certificates of 32 million US citizens who died between 2003 and 2015. Based on their income earned between 2007 and 2011, the


individuals were separated into five categories: the lowest one having an income between $19,340 and $36,400, and the highest group having a median income around $52,400. The annual death rates for cancer and heart disease dropped at different paces, with heart disease death rates falling faster. Why is that? Because we have found the causes of heart disease: smoking, drinking, high blood pressure, high glucose levels, and we can prevent them. But we cannot prevent cancer, not yet at least. These trends imply that people are living longer, and thus live long enough to become sick. But nothing explains cancer in the young population. Genetics has revealed the variability between cancers and even between patients with the same type of cancer. That implies that we should not try to find "a cure for cancer," but rather a cure for each specific cancer patient. In recent years, immuno-oncology, a new field in cancer treatment, has developed. Practically, the patient's immune system is helped to identify and attack tumors. For this method to give results, AI needs to analyze the gene expression in the tumor microenvironment and also the composition of tumor infiltrates. AI can establish certain patterns in this first-class treatment option. An important problem with immunotherapy that needs to be solved is that it is very hard to predict whether the therapy will work against a specific tumor or not. AI could help in finding the specific patients who would respond to this treatment. For example, researchers from the Institut Curie in France paired up with the American startup Freenome and started developing a new non-invasive alternative to surgical biopsy, which finds cancer DNA circulating in the blood. Basically, they are trying to predict a patient's response to the treatment: https://techtransfer.institut-curie.org/news/partnership/institut-curie-and-freenome-announcestrategic-collaboration-cell-free-dna (Accessed July 22, 2019). Another treatment option will be unique cancer vaccines. The Danish company Evaxion—http://exavion-biotech.com (Accessed July 22, 2019)—is working on such a machine-learning project. It has been granted $1 million for this research. The "cancer vaccine" has cured up to 97% of tumors in mice. The difference between immunotherapy and a cancer vaccine is that the vaccine can prevent the tumors from recurring. It activates the immune system's T-cells to eliminate the cancer cells. The cells from the vaccines are trained to recognize cancer-specific proteins. This means that after destroying the tumor, the cells roam through the bloodstream to search for and destroy any other cancer cells that have migrated, the metastases. AI can identify the unique gene expression of each cancer patient and design the vaccine accordingly. We can fight cancer if we detect it early. Cyrcadia Health—http://cyrcadiahealth.com (Accessed July 22, 2019)—is a cancer therapy startup that developed a wearable patch that can be inserted under the bra to detect temperature changes within the breast. Using AI algorithms, the device can detect abnormal patterns and alert the woman. The manufacturer states that after the completion of early tests the smart gadget can detect up to 80% of breast tumors. Scientists from China's National Center for Nanoscience and Technology and Arizona State University developed robots a few hundred nanometers in size, which were injected into the bloodstream of mice. The nanorobots blocked the blood supply of the tumors, and thus the tumors shrunk. The nanorobots are made from sheets of DNA, rolled into a tube that contains a blood-clotting drug. The outside of the tube carries a small DNA molecule that targets a protein found only in tumors. This molecule makes the DNA tube unroll and release the drug when it reaches the tumor. Using AI to determine which patients are fit for a certain type of treatment implies that it can also determine the patients who will not respond to that treatment. That means that we can spare


people from undergoing unnecessary, invasive, life-changing treatment with no good outcome whatsoever. AI will help people decide whether palliative care is the better option for them. We must keep in mind that some people choose quality of life over quantity. Another important aspect that needs to be mentioned is the sad fact that the newest cancer therapies are still a privilege available to the richest individuals, due to their high cost. Using AI in cancer research can help reduce the cost and make cancer treatment more affordable by reducing waste. AI can also help in patient management: beds and resources can be allocated better and costs reduced. This subchapter is just a short preview of what comes next. Some diagnosing methods and some courses of treatment will be discussed in future chapters. So, let us proceed further and see how AI has changed and will change the face of medicine regarding cancer. We will talk about techniques, how to validate them, and how to interpret the results obtained by data scientists.

References

Adams, S.H., 1913. What can we do about cancer? The most vital and insistent question in the medical world. Ladies Home J. 30, 21–22.
Altman, D.G., 1991. Practical Statistics for Medical Research. Chapman and Hall, New York.
Ames, B.N., Durston, W.E., Yamasaki, E., Lee, F.D., 1973. Carcinogens are mutagens: a simple test system combining liver homogenates for activation and bacteria for detection. Proc. Natl. Acad. Sci. U. S. A. 70, 2281–2285.
Balmain, A., 2001. Cancer genetics: from Boveri and Mendel to microarrays. Nat. Rev. Cancer 1, 77–82. https://doi.org/10.1038/35094086.
Bartlett, M.S., 1937. Properties of sufficiency and statistical tests. Proc. R. Stat. Soc. A 160, 268–282.
Bayes, M., Price, M., 1763. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F.R.S. communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S. Philos. Trans. R. Soc. Lond. 53, 370–418. https://doi.org/10.1098/rstl.1763.0053.
Belciug, S., Gorunescu, F., 2020. Era of intelligent systems in healthcare. In: Belciug, S., Gorunescu, F. (Eds.), Intelligent Decision Support Systems—A Journey to Smarter Healthcare. Springer Nature Switzerland. https://doi.org/10.1007/978-3-030-14354-1.
Bignold, L.P., Coghlan, B.L.D., Jersmann, H.P.A., 2006. Hansemann, Boveri, chromosomes and the gametogenesis-related theories of tumors. Cell Biol. Int. 30 (7), 640–644. https://doi.org/10.1016/j.cellbi.2006.04.002.
Bittner, J.J., 1936. Mammary Tumors in Mice in Relation to Nursing. http://cancerres.aacrjournals.org/content/amjcancer/30.3.530.full.pdf.
Bittner, J.J., 1942. The milk-influence of breast tumors in mice. Science 95, 462–463.
Boveri, T., 1914. Zur Frage der Entstehung maligner Tumoren. Gustav Fischer, Jena, Germany.
Boveri, T., 2008. Concerning the origin of malignant tumors by Theodor Boveri. Translated and annotated by Henry Harris. J. Cell Sci. 121 (Suppl. 1), 1–84. https://doi.org/10.1242/jcs.025742.
Brown, M.B., Forsythe, A.B., 1974. Robust tests for the equality of variances. J. Am. Stat. Assoc. 69, 364–367.
Campbell, L.F., Farmery, L., Creighton George, S.M., Farrant, P.B.J., 2013. Canine olfactory detection of malignant melanoma. BMJ Case Rep. https://doi.org/10.1136/bcr-2013-008566.
Ciuffo, G., 1907. Innesto positivo con filtrato di verruca volgare. Giorn. Ital. Mal. Venereol. 48, 12–17.
Clish, C.B., 2015. Metabolomics: an emerging but powerful tool for precision medicine. Cold Spring Harb. Mol. Case Stud. 1 (1), a000588. https://doi.org/10.1101/mcs.a000588.
Creech, H.J., 1979. Historical review of the American Association for Cancer Research, Inc. 1941–1948. Cancer Res. 39, 1863–1890.
Davis, J., Goadrich, M., 2006. The relationship between Precision-Recall and ROC curves. In: ICML'06 Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, June 25–29, pp. 233–240. https://doi.org/10.1145/1143844.1143874.
Di Lonardo, A., Nasi, S., Pulciani, S., 2015. Cancer: we should not forget the past. J. Cancer 6 (1), 29–39.


Eddy, B.E., Stewart, S.E., 1959. Characteristics of the SE polyoma virus. Am. J. Public Health Nations Health 49 (11), 1486–1492.
Ellerman, C., Bang, O., 1908. Experimentelle Leukämie bei Hühnern. Zentralbl. Bakteriol. Parasitenkd. Infectionskr. Hyg. Abt. I 46, 595–609.
Epstein, M.A., Achong, B.G., Barr, Y.M., 1964. Virus particles in cultured lymphoblasts from Burkitt's lymphoma. Lancet 1, 702–703. https://doi.org/10.1016/S0140-6736(64)91524-7.
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874.
Graffi, A., 1957. Chloroleukemia of mice. Ann. N.Y. Acad. Sci. 68 (2), 540–558.
Gross, L., 1951. "Spontaneous" leukemia developing in C3H mice following inoculation, in infancy, with AK-leukemic extracts, or AK-embryos. Proc. Soc. Exp. Biol. Med. 62, 523–548.
Gross, L., 1953. A filterable agent recovered from AK leukemic extracts, causing salivary gland carcinomas in C3H mice. Proc. Soc. Exp. Biol. Med. 83, 414–431.
Guirao Montes, A., Molins Lopez-Rodo, L., Ramon Rodriguez, I., Sunyer Dequigiovanni, G., Vinolas Segarra, N., Marrades Sicart, R.M., Hernandez Ferrandez, J., Fibla Alfara, J.J., Agusti Garcia-Navarro, A., 2017. Lung cancer diagnosis by trained dogs. Eur. J. Cardiothorac. Surg. 52 (6), 1206–1210. https://doi.org/10.1093/ejcts/ezx152.
Haas, L.F., 1999. Papyrus of Ebers and Smith. J. Neurol. Neurosurg. Psychiatry 67, 578.
Hajdu, S.I., 2011. A note from history: landmarks in history of cancer, part 1. Cancer 117, 1097–1102. https://doi.org/10.1002/cncr.25553.
Hamada, R., Rida, A., 1972. Orthopaedics and orthopaedic diseases in ancient and modern Egypt. Clin. Orthop. 8, 253.
Hanley, J.A., McNeil, B.J., 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36.
Hastings, K., Boothroyd, D., Kapphahn, K., Hu, J., Rehkopf, D., Cullen, M., Palaniappan, L., 2018. Socioeconomic differences in the epidemiologic transition from heart disease to cancer as the leading cause of death in the United States, 2003 to 2015: an observational study. Ann. Intern. Med. 169 (12), 836–844. https://doi.org/10.7326/M17-0796.
Henschen, F., 1968. Yamagiwa's tar cancer and its historical significance. From Percival Pott to Katsusaburo Yamagiwa. Gann 59, 447–451.
Huebner, R.J., Todaro, G.J., 1969. Oncogenes of RNA tumor viruses as determinants of cancer. Proc. Natl. Acad. Sci. U. S. A. 64 (3), 1087–1094.
Ikenberg, H., Gissmann, L., Gross, G., Grussendorf-Conen, E.I., zur Hausen, H., 1983. Human papillomavirus type 16-related DNA in genital Bowen's disease and Bowenoid papulosis. Int. J. Cancer 32 (5), 563–565.
IMIA, 2000. Recommendations of the International Medical Informatics Association (IMIA) on education in health and medical informatics. Methods Inf. Med. 39, 267–277.
Johnson, S.B., 2003. A framework for the biomedical informatics curriculum. AMIA Annu. Symp. Proc., 331–335.
Kress, M., May, E., Cassingena, R., May, P., 1979. Simian virus 40-transformed cells express new species of proteins precipitable by anti-simian virus 40 tumor serum. J. Virol. 31, 472–483.
Lentner, C. (Ed.), 1982. Geigy Scientific Tables, vol. 2. Ciba-Geigy, Basle.
Levene, H., 1960. Robust tests for equality of variances. In: Olkin, I., Hotelling, H., et al. (Eds.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford University Press, pp. 278–292.
Mayor, S., 1999. First human chromosome sequenced. BMJ 319 (7223), 1453. https://doi.org/10.1136/bmj.319.7223.1453a.
Melero, J.A., Stitt, D.T., Mangel, W.F., Carroll, R.B., 1979. Identification of new polypeptide species (48–55K) immunoprecipitable by antiserum to purified large T antigen and present in SV40-infected and transformed cells. Virology 93, 466–480.
Mitrus, I., Bryndza, E., Sochanik, A., Szala, S., 2012. Evolving models of tumor origin and progression. Tumor Biol. 33, 911–917. https://doi.org/10.1007/s13277-012-0389-0.
NLM, 2007. NLM's University-Based Biomedical Informatics Research Training Programs. http://nlm.nih.gov/ep/GrantTrainInstitute.html.
Omran, A., 1971. The epidemiologic transition: a theory of the epidemiology of population change. Milbank Mem. Fund Q. 49 (4), 509–538.
Pesapane, F., Codari, M., Sardanelli, F., 2018. Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur. Radiol. Exp. https://doi.org/10.1186/s41747-018-0061-6.


Rous, P., 1911a. Transmission of a malignant new growth by means of a cell-free filtrate. JAMA.
Rous, P., 1911b. A sarcoma of the fowl transmissible by an agent separable from the tumor cells. J. Exp. Med. 13, 397–411.
Rowley, J.D., 1973. A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 243, 290–293.
Saito, T., Rehmsmeier, M., 2015. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 10 (3). https://doi.org/10.1371/journal.pone.0118432.
Scotch, M., Duggal, M., Brandt, C., Lin, Z., Shiffman, R., 2010. Use of statistical analysis in the biomedical informatics literature. J. Am. Med. Inform. Assoc. 17 (1), 3–5. https://doi.org/10.1197/jamia.M2853.
Shimizu, K., Goldfarb, M., Suard, Y., Perucho, M., Li, Y., Kamata, T., Feramisco, J., Stavnezer, E., Fogh, J., Wigler, M.H., 1983. Three human transforming genes are related to the viral ras oncogenes. Proc. Natl. Acad. Sci. U. S. A. 80 (8), 2112–2116.
Shope, R.E., Hurst, E.W., 1933. Infectious papillomatosis in rabbits. J. Exp. Med. 58, 607–624. https://doi.org/10.1084/jem.58.5.607.
Smith, A.E., Smith, R., Paucha, E., 1979. Characterization of different tumor antigens present in cells transformed by simian virus 40. Cell 18, 335–346.
Snedecor, G.W., Cochran, W.G., 1989. Statistical Methods, eighth ed. Iowa State University Press.
Sparano, J.A., 2006. TAILORx: trial assigning individualized options for treatment (Rx). Clin. Breast Cancer 7 (4), 347–350.
Sparano, J.A., Gray, R.J., Makower, D.F., Pritchard, K.I., Albain, K.S., Hayes, D.F., Geyer, C.E., Dees, E.C., Perez, E.A., Olson, J.A., Zujewski, J., Lively, T., et al., 2015. Prospective validation of a 21-gene expression assay in breast cancer. N. Engl. J. Med. 373, 2005–2014. https://doi.org/10.1056/NEJMoa1510764.
Sweet, B.H., Hilleman, M.R., 1960. The vacuolating virus, S.V.40. Proc. Soc. Exp. Biol. Med. 105, 420–427.
Van Epps, H.L., 2005. Peyton Rous: father of the tumor virus. J. Exp. Med. 201 (3), 320. https://doi.org/10.1084/jem.2013fta.
Weinberg, R.A., 1988. Finding the anti-oncogene. Sci. Am. 259 (3), 44–53.
Windish, D.M., Huot, S.J., Green, M.L., 2007. Medicine residents' understanding of the biostatistics and results in medical literature. JAMA 298, 1010–1022.
Wood, L.D., Parsons, D.W., Jones, S., Lin, J., Sjoblom, T., Leary, R.J., Shen, D., Boca, S.M., Barber, T., Ptak, J., Silliman, N., Szabo, S., Dezso, Z., Ustyanksky, V., Nikolskaya, T., Nikolsky, Y., Karchin, R., Wilson, P.A., Kaminker, J.S., Zhang, Z., Croshaw, R., Willis, J., Dawson, D., Shipitsin, M., Willson, J.K., Sukumar, S., Polyak, K., Park, B.H., Pethiyagoda, C.L., Pant, P.V., Ballinger, D.G., Sparks, A.B., Hartigan, J., Smith, D.R., Suh, E., Papadopoulos, N., Buckhaults, P., Markowitz, S.D., Parmigiani, G., Kinzler, K.W., Velculescu, V.E., Vogelstein, B., 2007. The genomic landscapes of human breast and colorectal cancers. Science 318 (5853), 1108–1113.
Yamagiwa, K., Ichikawa, K., 1917. Experimental study of the pathogenesis of carcinoma. The Journal of Cancer Research III (1), 1–29.

Further reading

Davis, C.S., Stephens, M.A., 1978. The Covariance Matrix of Normal Order Statistics (Technical Report No. 14). https://apps.dtic.mil/dtic/tr/fulltext/u2/a053857.pdf.
Marmor, M.F., 2006. Ophthalmology and art: simulation of Monet's cataracts and Degas' retinal disease. Arch. Ophthalmol. 124 (12), 1764–1769.
Ravin, J.G., 1985. Monet's cataracts. JAMA 254 (3), 394–399.
Rous, P., 1936. The virus tumors and the tumor problem (Harvey Lecture). Am. J. Cancer 28, 233–271.

C H A P T E R

2

The beginnings

2.1 Doctor's suspicion. Doctor + artificial intelligence combo's diagnosis

One of the most horrifying experiences a person could ever have is sitting in a doctor's office and finding out that she/he or a loved one has cancer. It feels like the world has stopped moving and thoughts are spinning in your head. The doctor keeps on talking, but all you can think of is why this happened to you and whether you are going to survive it. On the other side, you might be a hypochondriac, and you believe every little symptom you have is cancer. Let's face it, if you have a mild headache, a sore throat, or a bunion, and you Google it, you will most definitely find cancer among the possible causes. We are so scared that when we hear hoof beats we think of zebras, not horses. In this subchapter we will discuss the way a doctor sets a cancer diagnosis, followed by how a doctor + AI combo sets it. In a discussion, when finding out that someone you know has cancer, you always ask: "Did she/he have any symptoms? Did she/he not feel well?", hoping that the answer is yes. As long as you do not have symptoms you are out of the woods. But it does not always work like this. A person can find out she/he has cancer in two ways: she/he came into the doctor's office with some symptoms, or she/he found out incidentally. The incidental finding of cancer can arise from three different situations. The first scenario is when a patient is concerned about a different disease; let us say she/he thinks that she/he has a heart condition that causes her/him to be tired all the time and run out of breath while walking, when in fact her/his heart is perfectly fine, and the fatigue is caused by an anemia provoked by a stomach cancer. After checking her/his heart, the cardiologist might order a set of blood tests that would point out the presence of cancer. The second scenario refers to population screening. Different persons belong to certain types of risk groups that need to make preventive investigations (e.g. colonoscopy, endoscopy, Papanicolaou test, etc.). Depending on the age and the sex of the person, she/he should undergo medical screening. This way different types of cancers are found in early stages, thus making it possible for them to be treated. The third scenario is when patients do their annual check-ups with no symptoms whatsoever, their biological parameter measurements present abnormal values, and it turns out that they have some form of malignancy.



Cancer suspicion arises from two different situations. The first scenario is when a patient with no history of disease has general or organ-specific symptoms that have escalated fast. The other scenario is when the doctor prescribes a treatment for a non-malignant disease, but the patient does not respond to the treatment, or has a low response. For example, a person comes in with a cough after a cold, a cough that simply does not go away. Some coughs last for many weeks, but finally they do go away; only this time it just doesn't. That is how a lung cancer or an esophageal cancer can be discovered. In Fig. 2.1 we present a schema for how cancer is diagnosed in the beginning.

FIG. 2.1 Diagnosing cancers.

When a doctor has a suspicion regarding cancer, she/he starts to perform some tests to clarify the diagnosis. These investigations are:
• physical exam—during this exam the doctor looks at your body in search of lumps, changes in skin color, or enlargement of an organ, all of which may indicate the presence of cancer;
• biological parameters assessment—laboratory tests like blood and urine tests can help detect abnormalities that may be caused by cancer;
• imaging tests—the doctor examines the internal organs and bones through noninvasive methods;
• biopsy—until recently, biopsy was considered the gold-standard technique for assessing cancer, but the method can be wrong if not done properly. The doctor collects a sample from the tumor and sends it to the laboratory.

We mentioned earlier that this technique might have flaws. Why is that? Because the doctor takes just a little piece from the tumor, and maybe that piece is indeed malignant, but corresponds to a lower stage of cancer, when in fact the whole tumor has a higher stage of cancer. For example, during an endoscopy the gastroenterologist discovers an esophageal tumor and performs a biopsy. When the results come back from the pathologist, the tumor is stage 2. After the esophageal cancer surgery has been performed, and the whole tumor has been taken out and sent to the pathologist, the biopsy results look totally different, the tumor being stage 4. The explanation is quite simple: the


tumor was an atypical one, and expanded through the esophagus toward the inside of the body, and only afterwards inside the esophagus, thus deceiving the pathologist. So how do general practitioners (GPs) assess cancer? Maybe you, just like us, were always amazed by how Dr. House set a diagnosis in the famous medical TV series. Thoughtful and rational clinical work can help the early diagnosis of cancer. In Scheel et al. (2013) it has been shown that GPs manage to distinguish reasonably well between cancer and not cancer based only on the warning signs. In a study published in the Scandinavian Journal of Primary Health Care in 2015, Scheel and Holtedahl performed a statistical analysis on questionnaires completed by 396 GPs from Norway (Scheel and Holtedahl, 2015). The registrations regarded 51,073 patients' symptoms. After 6–7 months, on the follow-up report, only 263 GPs reported whether their patients had developed cancer. From the 263 cancer cases, the GPs reported symptoms that helped diagnose cancer in 164 (62%) cases. This percentage went up to 78% when clinical findings and test results were added. Only 12% of the patients had lower-risk symptoms and only 7% of cases had no warning signs. Among the patients who were diagnosed with cancer after the clinical tests were performed, 39% had no symptoms. Seven types of cancer occurred in over 20 cases (colorectal cancer, lung cancer, skin cancer, other digestive organ cancers, breast cancer, prostate cancer, lymphoid cancer), and three other types of cancer in at least 7 cases. The most reported warning signs were: a lump, a non-healing skin lesion, unusual bleeding, persistent digestive problems, cough or hoarseness, a pigmented skin lesion, unintentional weight loss, and unusual pain or fatigue. A worrying fact is that the known symptoms that are linked to cancer were not frequent. For example, only 8 out of 68 digestive organ cancer cases (12%) reported unusual bleeding, and only 25 cases (36%) reported a persistent digestive problem. Only 10 out of 35 patients (29%) having breast cancer reported a lump. Only 8 out of 23 (34%) lung cancer patients had a cough. AI can improve the numbers reported above. Of course, this can be done only with the help of the doctor, whether a GP or a specialist. Using AI is like finding the needle in the haystack: patterns that are not visible to the human eye can be spotted with the help of a computer. But AI is of no use if the doctor does not have a suspicion, or if the patient does not participate in screening programs or does not do her/his check-ups. Sadly, as you have read earlier, some cancers present no symptoms, or when they do, it is too late. So let us suppose that the GP has a cancer suspicion, orders more tests, and sends the patient for referral to a specialist. This is the moment when AI steps in. We shall discuss how AI can improve the assessment of biological parameters and the imaging tests; as for the biopsy, we will present a special chapter (Chapter 3). Before we start we must introduce a statistical method used in AI: cross-validation. Because we shall present different AI algorithms, we need to know how to estimate their "skill" in diagnosing cancer. After all, as data scientists we all want our model to perform well when dealing with new unseen data. A good model must not overfit or underfit on new data.
The Oxford definition of overfitting is: "the production of an analysis which corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably." An underfit model is a model that does not catch the underlying trend of the data. Fig. 2.2 presents both concepts graphically, as well as how an appropriate AI algorithm should perform.


FIG. 2.2 Graphical representation of underfitting, appropriate, overfitting AI algorithms.

Overfitting can be avoided with cross-validation. The method is easy to understand and implement. Cross-validation splits the data set we are working on into two separate subsets: training and testing. The steps are the following:

1. Split the data set into a number of subsets.
2. Hold out a subset at a time, while training your AI algorithm on the remaining subsets.
3. Test the AI algorithm on the held-out subset.
4. Repeat the steps for all the subsets.

Fig. 2.3 is the graphical representation of the cross-validation method.


FIG. 2.3 Cross-validation cycle.

There exist several types of cross-validation:

• n-fold cross-validation: the method uses the parameter n to split the data set into n groups. The most used value for n is 10, the method's name becoming 10-fold cross-validation. The data set is split into 10 equal subsets, after which the algorithm below runs 10 times, each time leaving out a different subsample:
1. The left-out group is going to be the testing data set.
2. The rest of the nine groups are going to be the training data set.
3. Fit the AI algorithm on the training set, and afterwards evaluate its performance on the testing data set.
4. Retain its performance and discard the model.
5. Repeat for all subsets.
The value n is chosen taking into account the number of data samples. This number should be large enough so that the subsamples are representative of the initial data set (Fig. 2.4).


FIG. 2.4 10-Fold cross validation.

• stratified n-fold cross-validation: the method resembles the n-fold cross-validation, the only difference being that when the split is performed one must take into account a certain criterion, such as all the folds being balanced in the sense that they have the same proportion of observations with a given categorical value, for example the class value (i.e. benign vs malignant, recurrent vs non-recurrent). In Fig. 2.5 the stratified 10-fold cross-validation is presented:

FIG. 2.5 Stratified 10-fold cross validation.


• leave-p-out cross-validation: the method splits the N data points from the sample set into N − p data points that will represent the training set and p data points as the testing set. The process repeats until all combinations have been used. The overall performance is computed as the average over all rounds. The most common case is when p = 1. In Fig. 2.6 the leave-p-out cross-validation is presented; a short code sketch of these procedures follows the figure.

FIG. 2.6 Leave-p-out cross validation.
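As a concrete illustration, here is a minimal sketch of 10-fold and stratified 10-fold cross-validation using scikit-learn; the library and the toy data are our choices, not something prescribed here:

```python
# Minimal 10-fold and stratified 10-fold cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

# Toy two-class data standing in for, e.g., benign vs malignant cases
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

for cv in (KFold(n_splits=10, shuffle=True, random_state=0),
           StratifiedKFold(n_splits=10, shuffle=True, random_state=0)):
    # Each round fits on 9 folds and scores on the held-out fold
    scores = cross_val_score(model, X, y, cv=cv)
    print(type(cv).__name__, scores.mean().round(3))
```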

For further details, the reader is encouraged to read Langford (2005), Refaeilzadeh et al. (2009), and Little et al. (2017).

In Chapter 1 we discussed the means that are used to detect cancer. We mentioned gene expression and proteomic data, imaging methods, and other biochemical parameters. The time when cancer diagnosis was made without the use of AI has gone. The mathematical precision of an AI algorithm is far too important to be neglected. We do not know who the person reading this book is. Are you a doctor, a resident, a medical student? Or a data scientist, a computer science student, etc.? If you come from the medical field, you probably do not agree 100% with our statement. If you come from the computer science field, you are probably excited. Let us present a fascinating example of how a simple AI algorithm has changed the way pancreatic cancer is being diagnosed. In 2012, data scientists and medical professionals from Romania, Denmark, Germany, Spain, Italy, France, Norway, and the United Kingdom published a paper regarding the use of neural networks for the differential diagnosis of pancreatic cancer and chronic pancreatitis patients (Saftoiu et al., 2012). The study was prospective, blind, and performed on 258 patients' endoscopic ultrasound elastography (EUS) recordings. The data set was unbalanced, meaning that the two classes (pancreatic cancer vs chronic pancreatitis) had an unequal number of instances. The first class, pancreatic adenocarcinoma, had 211 observations, whereas the second class, chronic pancreatitis, had 47 patients. The diagnosis was obtained by EUS fine needle


aspiration cytology/microhistology, as well as surgical pathology and/or clinical follow-up. Even though there are only 8 countries, the data was collected from 13 academic medical centers in those countries. EUS elastography is similar to color Doppler examinations or, for the non-medical readers, similar to the weather map that shows the heat waves. Real-time elastography is used in many different types of cancers like breast, thyroid, prostate cancer, and more recently liver cancer (Bojunga et al., 2010; Bercoff et al., 2003; Cochlin et al., 2002; Sommerfeld et al., 2003; Friedrich-Rust et al., 2007). Regarding this study, the researchers used 774 EUS recordings. For each EUS examination there were recorded 3 movies that lasted 10 s each. Each video was split into individual frames and the hue histogram of these frames was computed. Each movie had 125 frames with 256 colors, thus producing 125 hue histograms. Two experienced doctors manually selected the tumor in each movie, which was afterward processed by the IT experts. The data scientists used a type of neural network, a multi-layered perceptron (MLP) with two hidden layers, to process the data. After reading the EUS videos, the two doctors obtained the following values: the first doctor had an 84.4% sensitivity and 46.8% specificity, while the second doctor obtained 75.4% sensitivity and 53.2% specificity. As for the neural network, the values were the following: sensitivity 87.59%, specificity 82.94%. The mean training accuracy was 91.14%, with the confidence interval (CI) ranging between 89.87% and 92.42%, while the mean testing accuracy was 84.27%, with the CI ranging between 83.09% and 85.44%. We can see that there is a huge difference between the human performance and the AI performance, and this is normal. Each person sees colors and forms differently. Try to imagine how Claude Monet's paintings would have looked if he hadn't suffered from cataracts, or Degas' if he hadn't suffered from the progressive retinal disease that caused him central macular damage. Monet stated "colors no longer had the same intensity for me," and "my painting was getting more and more darkened." AI software does not have such problems, thus it can analyze each color objectively using mathematics. The above research paper is a follow-up study of the results reported by another study performed on 68 patients using the same EUS-elastography technique at the Department of Gastrointestinal Surgery, Gentofte University Hospital, Copenhagen, Denmark and at the Department of Gastroenterology, University of Medicine and Pharmacy, Craiova, Romania. In Gorunescu et al. (2011), the authors developed an AI model based on a competitive/collaborative neural computing system. We mentioned earlier a new concept: neural networks (NNs). Gorunescu et al. (2011) and Saftoiu et al. (2012) used MLP neural networks. This is just one example of a NN. Before we proceed any further, we must pause and explain these powerful AI algorithms, which led to the development of Deep Learning, trending nowadays.

Artificial neural networks

The brain, through its neural network, makes us think, feel, remember, innovate, and create. Due to our capability of feeling emotions, creativity, and imagination, the human brain cannot be replicated. Our sixth sense, our intuition, is the fine line that separates us from AI. Great ideas do not come from mathematics, formulas, or logic. They come from out of nowhere, a place where AI cannot go. What we can replicate from the human brain has become


known as artificial neural networks (NNs). The first study regarding the artificial neuron was published in 1943 by the mathematician Walter Pitts and the neuropsychiatrist Warren McCulloch (McCulloch and Pitts, 1943). Donald Hebb continued their work, publishing in 1949 his book "The Organization of Behavior" (Hebb, 1949). Frank Rosenblatt, a psychologist at Cornell, designed the first artificial neuron, also known as Rosenblatt's perceptron. The perceptron is based on the McCulloch-Pitts neuron. It performs a computation, a weighted sum, and if the result surpasses a certain previously set threshold the output is "1," otherwise it is "0." The same way we teach a child the difference between a cat and a dog, by repeatedly showing her/him pictures of the animals, the perceptron receives inputs, thus learning and tuning its weights to minimize the error between its output and the ground truth. Mathematically speaking, the perceptron receives a number of real inputs x_i, i = 1, …, p. Each input is afterwards weighted using some weights w_i, then summed up and passed through the activation function φ (Fig. 2.7).

FIG. 2.7 Rosenblatt’s perceptron.

An artificial NN is made up of neurons. Each neuron is connected to another neuron through a synapse, just like in the human brain. A neuron i is connected to a neuron j through w_ij, where w_ij represents the strength of this connection. To compute neuron j's value we use the following equation:

$$s_j = \sum_{i=1}^{p} w_{ji} \cdot x_i = \mathbf{w}_j \cdot \mathbf{x}',$$

where x = (x_1, x_2, x_3, …, x_p) represents the input vector, w_j = (w_{j1}, w_{j2}, …, w_{jp}) is the synaptic weight vector, and s_j is the weighted sum, a scalar product, also known as a linear combiner. The output of the linear combiner is then passed through the activation function, this practically being the "firing" part. If the function's output surpasses the threshold, then the neuron fires; otherwise it is inhibited. The formula is:

$$y_j = \varphi(s_j + b_j) = \begin{cases} h_j, & \text{if } s_j + b_j \geq T_j \\ 0, & \text{if } s_j + b_j < T_j \end{cases}$$

where y_j is the output value, φ is the activation function, and b_j is the bias. The bias is used to increase or decrease the input to the activation. It can be given as an internal parameter, the weight w_{j0} for an extra input 1. The NN works very well when we are dealing with linearly separable classes (see Fig. 2.8), as opposed to non-linearly separable classes (see Fig. 2.9). A minimal implementation of such a perceptron is sketched after Fig. 2.9.


FIG. 2.8 Linear separable classes.

FIG. 2.9 Non-linear separable classes.
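Here is a minimal sketch of Rosenblatt's perceptron and its learning rule, assuming a step activation with threshold T = 0 and a learning rate of 1 — both our illustrative choices, not values prescribed here:

```python
# A minimal perceptron with a step activation, trained with the classic
# perceptron learning rule on a linearly separable toy problem.
import numpy as np

def perceptron_output(x, w, b, T=0.0):
    """Fire (return 1) if the weighted sum plus bias reaches threshold T."""
    s = np.dot(w, x)              # linear combiner: s = sum_i w_i * x_i
    return 1 if s + b >= T else 0

def train(X, y, epochs=20, lr=1.0):
    """Nudge weights toward the ground truth after every misclassification."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - perceptron_output(xi, w, b)
            w += lr * error * xi
            b += lr * error
    return w, b

# Tiny linearly separable example: the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train(X, y)
print([perceptron_output(xi, w, b) for xi in X])  # [0, 0, 0, 1]
```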

For more details regarding activation functions we refer the reader to Belciug and Gorunescu (2020). Here, we will describe the most commonly used ones.

• Sigmoid activation function, given by:

$$\varphi(s) = \frac{1}{1 + \exp(-a \cdot s)}.$$


FIG. 2.10 Sigmoid function plot.

For instance, from Fig. 2.10 we can see that the sigmoid takes a number from ℝ and turns it into a value that belongs to the interval (0, 1).

• Hyperbolic tangent activation function, given by:

$$\varphi(s) = \tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}}.$$

FIG. 2.11 Hyperbolic tangent plot.

Somewhat different from the sigmoid, the hyperbolic tangent takes a number from ℝ and turns it into a value that belongs to the interval (−1, 1). This is why in the last decade the hyperbolic tangent has been preferred over the sigmoid (Fig. 2.11).

• Rectified linear unit (ReLU) activation function, given by:

$$\varphi(s) = \max(0, s).$$


Even if the ReLU activation function seems rather simple, it is very powerful when dealing with images (Fig. 2.12).

FIG. 2.12 ReLU activation function plot.
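To make the three functions concrete, here is a small NumPy sketch; the library choice and the slope a = 1 of the sigmoid are ours, for illustration:

```python
# The three activation functions described above, computed with NumPy.
import numpy as np

def sigmoid(s, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * s))   # maps R into (0, 1)

def tanh(s):
    return np.tanh(s)                     # maps R into (-1, 1)

def relu(s):
    return np.maximum(0.0, s)             # zero for negatives, identity otherwise

s = np.array([-2.0, 0.0, 2.0])
print(sigmoid(s), tanh(s), relu(s))
```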

Depending on how the neurons are interconnected, there are many types of NN architectures. Even so, each NN has three basic elements: the input neuron units that contain the data that we need to process, the output neurons that contain the result computed by the network, and the hidden units that do the actual computation. The NN's architecture describes how the neurons are organized in terms of number of units, number of layers, number of neurons per layer, the signal's direction, etc. One type of NN has a feedforward structure, meaning the signal goes from the input layer through the hidden layers toward the output. This is forward propagation (Fig. 2.13); a short sketch of it follows the figure.

FIG. 2.13 Feedforward neural network architecture. Forward propagation.
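A minimal sketch of forward propagation through a single hidden layer, with randomly drawn weights and arbitrary layer sizes chosen purely for illustration:

```python
# Forward propagation: input -> hidden -> output, signal flowing left to right.
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # input layer: 3 features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden (4 units)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden -> output (1 unit)

h = sigmoid(W1 @ x + b1)   # each hidden unit: weighted sum + activation
y = sigmoid(W2 @ h + b2)   # the output of one layer is the next layer's input
print(y)
```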


If in our NN architecture we have feedback loops, then the network is called a recurrent NN (RNN). Different from a feedforward network, in an RNN we have two inputs: the present and the recent past (Fig. 2.14).

FIG. 2.14 Recurrent network.

We mentioned earlier that in the network it is important how the neurons are organized in layers. Thus we can divide the networks into three categories: • Single-layer feedforward NN: contains only an input and an output layer (see Fig. 2.15).

FIG. 2.15 Single layer feedforward network.

• Multi-layer feedforward NN (MLP): contains an input layer, an output layer, and multiple hidden layers. The data goes from left to right, the output of one layer being the input for the next layer (see Fig. 2.16). Deep learning is a special case of MLPs where we have many hidden layers (see Fig. 2.17).


FIG. 2.16 Multi-layer perceptron.

FIG. 2.17 Deep neural network.

• Recurrent NN: contains at least one loop (see Fig. 2.14).

So far, we have discussed two main aspects: the activation functions and the types of network architectures. In what follows we want to describe the actual process of computing the output. It is clear that the network, by summing and applying a linear combiner, returns a real value. The only question remaining is how that output is transformed so it can give the actual class (e.g. benign vs malignant, recurrent vs non-recurrent, or the cancer's stage and grade). If it is a two-class decision problem, then the simplest way of determining the class is with the use of the logistic regression, which takes a continuous value and transforms it into a binary one; if we have multiple decisions (>2 classes), then we use the Softmax function. In order to understand both methods we need some prerequisites. Imagine you start watching a TV series in the middle of the season; it is natural that you do not understand the connections between


the characters, how they got themselves into that situation, etc. So, we shall begin at the very beginning with the multiple linear regression.

Multiple linear regression

The multiple linear regression is used when we want to define one variable (the output) using its dependent variables (the predictors). Sir Francis Galton, an English Victorian-era statistician, invented the concepts of correlation and regression. Galton wrote in "On men of science, their nature and their nurture" the phrase "nature over nurture" (Galton, 1874). Does a person become a genius because of her/his genes or due to her/his education and nurturing? Basically, this is the principle behind correlation and any type of regression, whether it is linear, non-linear, or logistic. What factors influence the outcome? The French physicist Auguste Bravais first developed the correlation coefficient (Bravais, 1846). Independently, Galton saw that there is a connection between a person's height and the length of her/his forearm (Galton, 1888). Karl Pearson, Galton's colleague, extended the simple linear regression to the multiple linear regression (Pearson, 1896). From a mathematical point of view, we have a dependent variable that can be expressed as the linear combination of other predictor variables, or covariates, using the following equation:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_m X_m,$$

where X_1, X_2, …, X_m represent the predictive variables, Y represents the outcome, b_0 is the intercept, while b_1, b_2, …, b_m are the regression coefficients. The multiple linear regression can give us wrong results if, before applying it, we do not make sure that the predictors are correlated with the outcome. In order to check this, we need to compute Pearson's correlation coefficient r and the p-level. While r measures the connection's strength, the p-level measures the strength's significance. Let us suppose that we want to apply the simplest case of regression, the linear regression. In what follows we shall see how to compute the correlation coefficient between two objects from a data set, X and Y. The objects have features. Each item is a statistical series {x_i}, i = 1, …, n and {y_i}, i = 1, …, n that corresponds to the statistical variables X and Y. We compute r using the following equation:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}},$$

with x̄ and ȳ as the means. The correlation coefficient takes a value from the interval [−1, 1]. The interpretation of the results is as follows:
• if the value of r is close to 1, then the two variables have a strong positive correlation, that is, if the value of one of them increases, the value of the other one will increase also;


• if the value of r equals 0, it means that there is no correlation between the two variables, thus the two variables are linearly independent;
• if the value of r is close to −1, then the two variables have a strong negative correlation, that is, if the value of one of them increases, the value of the other one will decrease.

The correlation coefficient's value alone does not provide us sufficient information to perform regression. We need to use the p-level, which is associated with computing r. If the value of r is close to 0, but the associated p-level is less than 0.05, it means that even though the variables do not have a strong connection, their connection is significant. The other way around is when we have a high value of r and an associated p-level greater than 0.05: even though the variables have a strong connection, their connection is not statistically significant. For example, let us suppose we have two friends. One of them we see and speak to on a daily basis; the other one we don't. One day something bad happens to us, and we give each of our friends a call. The first picks up, but does not help us; the second picks up and comes to the rescue. This is how p and r work together. The geometrical representation is the regression line. The regression line is plotted together with the residuals. The residuals represent the difference between what we have predicted and what is actually the truth. Our goal is to minimize the residuals, the errors, and find the line that best fits our data. In order to do so, we need to use the least squared error (LSE), which minimizes the difference between the ground truth and the predicted values. To plot the regression line, first we need to know the regression equation, which comes down to finding the values of the intercept and the regression coefficient. Mathematically speaking, if we have the simple linear regression expressed by

$$y = a + bx,$$

then the regression coefficient can be computed using the following formula:

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

The intercept is computed as follows:

$$a = \bar{y} - b \cdot \bar{x}.$$

Next, we shall present how to compute the regression line from a specific training set. As stated above, the regression line equation is computed using the LSE. The data from the training set fits the following equation:

$$y_i = a + bx_i + \varepsilon_i, \quad 1 \leq i \leq n,$$

where:
• x_i and y_i, 1 ≤ i ≤ n, are known data from the training set;
• ε_i represent the residuals.


We can write the above equation in matrix form:

$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \cdot \begin{pmatrix} a \\ b \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

In order to proceed with the computation, we presume that the matrix $X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$ has rank 2. Using the LSE we compute a and b:

$$\begin{pmatrix} a \\ b \end{pmatrix} = (X' \cdot X)^{-1} \cdot X' \cdot \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$

where X' is the transpose matrix of X.

Let us take the following example: we have 100 female patients who have undergone chemotherapy. We have measured their creatinine and hemoglobin levels. We want to know whether the two parameters are linearly dependent or not. The creatinine levels in women range from 0.5 to 1.2, and the hemoglobin from 5 to 18. We have the following values:

Creatinine = {0.974, 0.689, 1.014, 1.173, 0.674, 0.903, 0.914, 0.900, 0.656, 1.166, 0.812, 1.092, 0.989, 0.708, 1.069, 0.777, 1.116, 0.906, 1.117, 0.984, 1.007, 0.850, 1.169, 0.950, 0.796, 0.924, 0.513, 0.711, 0.962, 0.703, 0.932, 0.800, 0.594, 0.708, 0.898, 0.913, 0.902, 0.957, 0.956, 0.801, 1.127, 0.757, 0.805, 1.124, 1.064, 0.992, 0.570, 1.143, 0.999, 1.199, 0.604, 1.107, 0.613, 0.930, 0.586, 1.093, 1.065, 0.898, 0.785, 0.548, 0.988, 0.817, 1.005, 1.106, 1.182, 1.099, 0.508, 0.751, 1.010, 0.620, 0.864, 0.5389, 0.639, 0.512, 1.055, 0.656, 0.741, 1.149, 0.993, 0.522, 0.615, 0.935, 0.904, 0.666, 1.153, 0.929, 0.874, 0.912, 1.011, 0.718, 0.778, 0.646, 0.630, 1.161, 1.017, 0.843, 0.659, 0.678, 0.540, 0.804}

Hemoglobin = {11.235, 10.763, 11.421, 11.700, 10.047, 10.777, 11.422, 11.155, 10.505, 12.397, 11.429, 11.494, 11.785, 11.652, 11.425, 11.781, 11.650, 11.631, 10.387, 11.229, 11.091, 11.670, 11.104, 11.939, 10.575, 11.726, 10.227, 10.348, 11.833, 10.840, 11.051, 10.613, 10.302, 10.152, 10.904, 11.165, 11.080, 11.335, 11.147, 10.992, 12.246, 10.389, 10.932, 11.505, 11.909, 11.374, 10.275, 11.614, 11.144, 12.085, 10.169, 12.263, 10.606, 11.541, 10.663, 11.364, 11.747, 11.279, 11.317, 9.937, 11.205, 10.552, 11.032, 12.248, 12.218, 12.082, 9.806, 10.842, 11.096, 10.346, 11.571, 10.490, 10.258, 10.500, 11.398, 10.919, 11.166, 12.248, 11.609, 10.441, 10.138, 11.654, 11.330, 10.012, 11.809, 10.937, 11.606, 11.217, 11.530, 10.794, 10.704, 10.077, 10.713, 11.673, 11.564, 10.754, 10.075, 10.896, 10.594, 11.373}


The correlation coefficient and the p-level are:

r = 0.876, p-level = 0.000.

The regression equation is:

hemoglobin = 8.677 + 2.806 × creatinine.

Fig. 2.18 represents the regression line.

FIG. 2.18 Regression line of hemoglobin on creatinine.
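The same analysis can be reproduced with a few lines of SciPy — an assumption on tooling, as the example above is not tied to any library. Run on the full 100-value lists, it should recover r ≈ 0.876 and the regression line reported here; the truncated lists below only illustrate the calls:

```python
# Correlation coefficient, p-level, and LSE regression line with SciPy.
from scipy.stats import pearsonr, linregress

creatinine = [0.974, 0.689, 1.014, 1.173, 0.674]   # first few values only
hemoglobin = [11.235, 10.763, 11.421, 11.700, 10.047]

r, p = pearsonr(creatinine, hemoglobin)   # connection strength and significance
fit = linregress(creatinine, hemoglobin)  # LSE line: y = intercept + slope * x
print(round(r, 3), round(p, 3))
print(f"hemoglobin = {fit.intercept:.3f} + {fit.slope:.3f} * creatinine")
```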

We can see from this example that it might appear that the two variables are linearly dependent, but the fact that there exists a mathematical correlation between them does not imply that there exists a medical correlation. Both parameters might be influenced by the chemotherapy, and in this case a linear dependency may develop by chance. This is why it is so important that a data scientist works in tandem with a medical professional, and also uses multiple validation data sets. This brings us to a phrase you may have heard before: "correlation does not always imply causation." A nice example comes from history. In the Middle Ages, people strongly believed that lice are good for one's health. Why would they believe such a thing? Because sick people did not have lice, and thus they drew the conclusion that if lice were to leave a person's body, then


that person would get sick. The truth is that lice are sensitive to high temperatures, so when a person develops a fever, the lice find another host. So, in fact, the sickness was the cause, not the effect.

Returning to our multiple linear regression, instead of only one explanatory variable, here we have m variables. The multiple linear regression equation is:

$$Y = b_0 + b_1 \cdot X_1 + b_2 \cdot X_2 + \cdots + b_m \cdot X_m,$$

where b_1, b_2, …, b_m are the regression coefficients, and b_0 is the intercept. Computing the intercept and regression coefficients for a multiple linear regression is quite a bit different from the simple linear regression. Technically, it is a generalization of the simple linear regression, but instead of obtaining a regression line, we obtain a regression hyperplane that represents the "backbone" of the "cloud" of variable points. To compute the regression coefficients and the intercept we will apply the least squares method. Let us presume that our training data fits the following equations:

$$y_i = b_0 + b_1 \cdot x_i^1 + b_2 \cdot x_i^2 + \cdots + b_m \cdot x_i^m + \varepsilon_i, \quad 1 \leq i \leq n,$$

where:
• x_i^j, 1 ≤ j ≤ m, 1 ≤ i ≤ n, and y_i, 1 ≤ i ≤ n, are the explanatory variables from the training data set;
• ε_i are unknown random variables (residuals).

Writing the equations in matrix form gives us:

$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1^1 & \cdots & x_1^m \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n^1 & \cdots & x_n^m \end{pmatrix} \cdot \begin{pmatrix} b_0 \\ \vdots \\ b_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

Thus, using the LSE we compute b:

$$\begin{pmatrix} b_0 \\ \vdots \\ b_m \end{pmatrix} = (X' \cdot X)^{-1} \cdot X' \cdot \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$

where X is the matrix of explanatory variables (with a leading column of 1s) and X' its transpose.

Before proceeding to the actual computation, the first step that needs to be taken is computing the correlation coefficients and p-levels between each feature and the dependent variable. The results form a correlation matrix; each cell of the matrix represents the correlation between two variables. Let us see how the multiple linear regression works on the following example. Suppose we have 25 men who suffer from prostate cancer. The data set contains the following attributes: the prostate-specific antigen (PSA) test score, the nucleoli size, and the number of smoked cigarettes per day. The dependent variable is the survival rate in months. We want to see whether we can predict the survival rate using the other three variables. Table 2.1 contains the prostate cancer data set:


TABLE 2.1 Prostate cancer data set.

Survival rate (months)   PSA   Nucleoli size   No. smoked cigarettes
10                       9     1.00            22
22                       2     0.30            18
40                       3     0.35            4
110                      199   2.50            24
70                       87    0.70            29
103                      144   1.03            16
22                       4     0.22            27
83                       121   0.83            21
64                       2     0.64            11
50                       50    2.10            20
89                       127   0.89            25
77                       121   0.77            23
114                      192   2.00            24
50                       2     0.50            23
102                      144   2.50            26
111                      168   1.11            18
111                      180   2.10            24
30                       2     0.30            15
55                       54    0.55            14
50                       3     0.42            12
113                      191   1.13            21
33                       27    0.33            16
77                       90    0.77            23
62                       62    0.62            29
115                      197   2.40            6

As we mentioned before, the first step that we need to perform is to compute the correlation coefficient between the dependent variable and each explanatory variable, along with the corresponding p-level. Fig. 2.19 represents the correlation matrix heat map, whereas Table 2.2 contains the p-levels.


FIG. 2.19 Correlation matrix heat map.

TABLE 2.2 p-Level for prostate cancer.

Variables                                    p-Level
Survival (months)/PSA                        0.000
Survival (months)/nucleoli size              0.000
Survival (months)/no. smoked cigarettes      0.540

From Fig. 2.19 and Table 2.2 we can see that only the first two explanatory variables, the PSA test score and the nucleoli size, are significantly and strongly correlated with the dependent variable: the correlation coefficient between survival and PSA is 0.94 with a p-level of 0.000, and the correlation coefficient between survival and nucleoli size is 0.68 with a p-level of 0.000. In the case of the number of smoked cigarettes, the correlation coefficient is 0.12 and the significance level is 0.540, implying a weak and insignificant connection between the two variables. Another way to identify the correlation between data is by using the scatter correlation plot (Fig. 2.20).

FIG. 2.20 Scatter correlation matrix for prostate cancer.


Taking into account the information obtained from the correlation matrix and the associated p-levels, we decide that we shall use only the PSA test score and the nucleoli size in our computation. Applying the theory from above we will get the following regression equation:

Survival (months) = 0.418 + 5.467 × PSA + 1.250 × Nucleoli size.

Next, we shall present other fitting methods for the linear models that replace the LSE when this classical procedure might fail. In general, if the number of observations is greater than the number of predictors, we can use the LSE without any issues. On the other hand, if we have more predictors than observations (e.g. DNA gene arrays used to predict cancer), the variance is too large, and LSE can provide the wrong answer. One should keep in mind that LSE finds the best unbiased coefficients. By unbiased coefficients we understand that the LSE does not consider an independent variable more important than another. As we have seen, the classical regression returns only one set of regression coefficients, the ones that fit the data best. But what happens if some variables are more important than others? By stating that some variables are more important than others, we do not mean that we should remove the other variables, but rather give more "power" in the regression equation to the ones that are more important. This implies finding biased coefficients. An unbiased model will try to find the relationship between the features and the outcome; this procedure will fit the observations almost perfectly in order to minimize the LSE. One problem arises at this point: overfitting. To avoid overfitting, a biased model treats each feature differently, taking into account its importance. We will present an alternative to the unbiased model: the ridge regression.

Ridge regression is a shrinking method. It uses a shrinkage estimator, also known as a ridge estimator. Ridge regression uses L2 regularization. The term regularization describes a method that can be performed in order to avoid overfitting. Regularization reduces the parameters, thus shrinking the model. The L2 regularization adds penalties to the model; these L2 penalties equal the square of the magnitude of the coefficients. None of the coefficients are eliminated, just shrunk. The ridge regression uses a new parameter named lambda (λ). By tuning the lambda parameter we change the regression coefficients. If we set the value of λ to 0, then the ridge regression is the same as LSE. If the value of λ converges to ∞, all the regression coefficients converge to 0. Mathematically speaking, recall that in the LSE regression the estimated coefficients are computed using the following formula (a code sketch of this closed form follows):

$$B = (X' \cdot X)^{-1} \cdot X' \cdot Y.$$
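As a concrete illustration, the closed-form computation above is easy to carry out with NumPy (our tooling choice); the five rows below are taken from Table 2.1, and the computation extends directly to the full table:

```python
# Closed-form LSE estimate B = (X'X)^{-1} X'Y with NumPy.
import numpy as np

psa      = np.array([9.0, 199.0, 87.0, 144.0, 4.0])
nucleoli = np.array([1.00, 2.50, 0.70, 1.03, 0.22])
survival = np.array([10.0, 110.0, 70.0, 103.0, 22.0])

X = np.column_stack([np.ones_like(psa), psa, nucleoli])  # leading 1s for b0
B = np.linalg.inv(X.T @ X) @ X.T @ survival              # (X'X)^{-1} X'Y
print(B)  # b0, then the coefficients for PSA and nucleoli size
```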

In ridge regression, we add the new shrinking parameter λ multiplied by the identity matrix, forming a new matrix (X′·X + λ·I). Thus, the new formula for computing the estimated regression coefficients is:

$$\hat{B}_{ridge} = (X' \cdot X + \lambda \cdot I)^{-1} \cdot X' \cdot Y.$$

The hardest part of using the ridge regression is setting the value of λ. For further reading regarding the choice of λ we refer the reader to Dorugade and Kashid (2010) and van Wieringen (2015).
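To make the ridge idea concrete, here is a minimal sketch using scikit-learn's Ridge estimator, whose alpha argument plays the role of λ; the data below are randomly generated placeholders, not the prostate cancer data from the example.

```python
# Illustrative sketch: ridge regression with scikit-learn.
# The feature matrix X and target y are made-up placeholders.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 observations, 3 predictors
y = X @ np.array([5.0, 1.2, 0.1]) + rng.normal(scale=0.5, size=50)

# alpha is the shrinkage parameter lambda; alpha=0 reduces to
# ordinary least squares.
model = Ridge(alpha=1.0).fit(X, y)
print(model.intercept_, model.coef_)  # shrunken, never exactly zero
```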


Another type of regression that uses shrinkage is the least absolute shrinkage and selection operator, or lasso regression. We mentioned above that ridge regression uses L2 regularization. Lasso regression uses L1 regularization, which adds a penalty that is equal to the absolute value of the magnitude of the coefficients. The penalty in L1 regularization can lead to the elimination of coefficients, unlike the L2 regularization. Mathematically speaking, in lasso regression we need to minimize:

Σ_{i=1}^{n} ( y_i − Σ_j x_ij · b_j )² + λ · Σ_j | b_j |.

Again, the hard part of using lasso regression is choosing λ. Depending on its value we can encounter the following situations:
• if λ = 0, then we do not eliminate any parameter; lasso regression is the LSE regression.
• if λ → ∞, then all coefficients are eliminated.
As λ increases, more and more coefficients are set to 0 and thus eliminated, which increases the bias; at the same time, the variance decreases. With all this new information in mind, we can now proceed to explaining how the logistic regression works.
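A companion sketch for the lasso, again with scikit-learn and placeholder data, shows the coefficient-elimination behavior described above; alpha is scikit-learn's name for λ.

```python
# Illustrative sketch: lasso regression; unlike ridge, a large enough
# alpha (lambda) drives some coefficients exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([5.0, 1.2, 0.0]) + rng.normal(scale=0.5, size=50)

for alpha in (0.01, 0.1, 1.0):
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(alpha, coefs)   # weak predictors are eliminated as alpha grows
```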

Logistic regression

Logistic regression is used when the dependent variable is binary. Mathematically speaking, logistic regression takes into account the explanatory variables and computes the probability of a certain output. The transformation of the dependent variable into a binary response is called the logit. The logit takes a parameter, p, which is the proportion of objects that share the same feature. In cancer research we can see p as the probability of a tumor being malignant or benign, the response of a patient to a certain cancer treatment, or the recurrence of cancer. The logit(p) transforms the dependent variable as follows:

logit(p) = ln( p / (1 − p) ) = b_0 + b_1 · X_1 + b_2 · X_2 + ⋯ + b_m · X_m.

Let us denote:

b_0 + b_1 · X_1 + b_2 · X_2 + ⋯ + b_m · X_m = α.

Thus, the probability p is computed using the following formula:

p = 1 / (1 + e^(−α)).

The computation of the intercept and regression coefficients resembles the multiple linear regression process in the cases where we are dealing with categorical features. We have the following matrix form of the model:

logit(p) = X · b + ε,


where:

logit(p) = ( ln(p_1 / (1 − p_1)), ln(p_2 / (1 − p_2)), …, ln(p_n / (1 − p_n)) )ᵀ.

Thus, using the LSE we compute b:

b = (X′ · X)⁻¹ · X′ · logit(p),

where b = (b_0, b_1, …, b_m)ᵀ and X is the n × (m + 1) design matrix whose ith row is (1, x_i^1, …, x_i^m).

In what follows, we shall present an example of how logistic regression can be applied in cancer research when we have categorical explanatory variables. When dealing with numerical data the computation is hard, and it is best to use specialized programs to compute the regression coefficients. We have a data set containing 450 patients, of which 100 patients have lung cancer. The risk factors associated with pulmonary cancer are tobacco, alcohol, and age. We are going to code each variable with a binary value: smoker = 1, non-smoker = 0, drinks a lot of alcohol = 1, does not drink a lot of alcohol = 0, above 60 years old = 1, below 60 years old = 0. We are interested in finding out the risk of a certain individual developing lung cancer, taking into account these three factors and the training data contained in Table 2.3.

TABLE 2.3 Lung cancer risk factors data set.

Tobacco  Alcohol  Age  No. subjects  No. lung cancer cases
0        0        0    75            5 (6%)
1        0        0    17            2 (12%)
0        1        0    8             1 (12%)
0        0        1    190           37 (19%)
1        1        0    2             0 (0%)
1        0        1    58            17 (29%)
0        1        1    53            10 (18%)
1        1        1    47            17 (36%)

Having the above table we will build a new table that contains all the possible situations for individuals with or without lung cancer. For each case we will consider all the combinations of the explanatory variables (Table 2.4).


TABLE 2.4 Working table for lung cancer risk factors data set.

Tobacco  Alcohol  Age  No. subjects  Cancer
0        0        0    5             1
0        0        0    70            0
1        0        0    2             1
1        0        0    15            0
0        1        0    1             1
0        1        0    7             0
0        0        1    37            1
0        0        1    153           0
1        1        0    0             1
1        1        0    2             0
1        0        1    17            1
1        0        1    41            0
0        1        1    10            1
0        1        1    43            0
1        1        1    17            1
1        1        1    30            0

Following the theory, we compute the following logistic regression equation for this example:

logit(p) = −2.563 + 0.926 · tobacco + 0.736 · alcohol + 0.344 · age.

If we were to use this data set to answer the following question: "I am afraid I might have cancer. I am a smoker, I do not drink, and I am 45 years old. What are the odds that I might have cancer?", we would have to do the following computation:

logit(p) = −2.563 + 0.926 · 1 + 0.736 · 0 + 0.344 · 0,

thus, we have:

logit(p) = −1.637.

To obtain the actual probability of that person having lung cancer, we compute:

p = 1 / (1 + e^(1.637)) = 0.16.

So, our answer would be: if you are a smoker, you do not drink, and you are 45 years old, your chance of having lung cancer is 16%.
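The same computation can be packaged as a small risk calculator. This is a minimal sketch using the coefficients fitted in the text; the function name is hypothetical.

```python
# Sketch: turning the chapter's fitted logistic equation into a risk
# calculator. The coefficients are the ones derived in the text.
import math

def lung_cancer_risk(tobacco, alcohol, age_over_60):
    logit = -2.563 + 0.926 * tobacco + 0.736 * alcohol + 0.344 * age_over_60
    return 1.0 / (1.0 + math.exp(-logit))

# Smoker, non-drinker, 45 years old -> roughly 0.16, as in the text.
print(round(lung_cancer_risk(1, 0, 0), 2))
```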


Another important aspect of logistic regression is that we can use it as a prognostic index. For instance, if we wish to compare the risk of developing lung cancer for people below 60 years old and above 60 years old, we should undertake the following steps:
• Compute logit(p_Age<60) and logit(p_Age>60);
• Subtract logit(p_Age>60) − logit(p_Age<60), and find our result.

logit(p_Age<60) = −2.563 + 0.926 · 1 + 0.736 · 1 + 0.344 · 0 = −0.901
logit(p_Age>60) = −2.563 + 0.926 · 1 + 0.736 · 1 + 0.344 · 1 = −0.557
logit(p_Age>60) − logit(p_Age<60) = −0.557 + 0.901 = 0.344

Thus, the risk of having lung cancer due to age is e^0.344 = 1.4105, meaning the risk of having lung cancer after the age of 60 is 1.41 times higher than under the age of 60. In logistic regression only the dependent variable must be categorical; the explanatory variables can be either categorical or numerical.

Softmax classifier

The softmax function is an activation function that takes numbers as inputs and transforms them into probabilities. An important aspect is that the probabilities must sum to 1. In other words, the softmax function returns an array that contains the probability distribution over the potential outcomes. An important assumption is that the class labels are independent; in other words, an item belongs to only one class. The softmax function formula is:

s_i(y) = e^(y_i) / Σ_j e^(y_j).

In order to use the softmax classifier, first we need to one-hot encode the classes. One-hot encoding transforms the class label into an array of zeros and a single one. Technically, all the probability mass is on the right decision class. Let us presume that we have q = 5 classes corresponding to cancer grading: 0—cancer free, 1—stage I cancer, 2—stage II cancer, 3—stage III cancer, and 4—stage IV cancer. After applying the one-hot encoding procedure our data will look like the format below:

• class 0: y → [1, 0, 0, 0, 0];
• class 1: y → [0, 1, 0, 0, 0];
• class 2: y → [0, 0, 1, 0, 0];
• class 3: y → [0, 0, 0, 1, 0];
• class 4: y → [0, 0, 0, 0, 1];

The softmax classifier gives us the probability of an item belonging to each class. Following the winner-takes-all rule, the neural network will assign the class that has the highest probability. For example, from Fig. 2.21 we can see that for the output scores of 13.2, 9.8, 11.4, and 12.4, the resulting probabilities are 0.605 for class 0, 0.020 for class 1, 0.002 for class 2,


0.100 for class 3, and 0.272 for class 4. Applying the winner-takes-all rule, the network’s output is 0.605, thus the patient belongs to class 0, implying she/he is cancer free.

FIG. 2.21 Example of softmax function.
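A numerically stable softmax takes only a few lines. This is a minimal sketch with illustrative scores, not an exact reproduction of Fig. 2.21.

```python
# Sketch of the softmax function: scores in, probabilities (summing
# to 1) out. The input scores below are illustrative.
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # subtract max for stability
    return e / e.sum()

probs = softmax(np.array([13.2, 9.8, 11.4, 12.4]))
print(probs, probs.argmax())  # winner-takes-all picks the largest
```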

Being an exponential activation function, softmax amplifies differences by pushing one result toward 1 and the others toward 0. If we plot both the softmax and ReLU functions, we can see that the softmax function is basically a smooth approximation of the max function (see Fig. 2.22). Softmax softens the max function, hence the name.

FIG. 2.22 Softmax and ReLU plot.


Learning paradigms

Besides the architecture, an important aspect is the way the network learns from the training set, also known as the learning paradigm. A neural network learns by tuning its parameters, just the way an old radio was tuned manually, making small adjustments to the weights' values. There are several types of learning, and choosing one of them depends on the problem we need to solve:
• we have a target (value or class) we want to predict. For example, we want to establish whether a mass is a malignant or benign tumor from different input images (e.g. histological). Then our model will be trained on historical data, previously labeled images, which we are going to use to classify future tumors. Henceforward, our model is supervised; the neural network knows what it needs to learn.
• we have unlabeled data and we are searching for different patterns. For instance, we want to evaluate certain molecular events for a certain type of cancer and to establish the risk of progression. Unsupervised models will automatically discriminate between different groups in the data.
• we want to achieve a specific objective. For example, we want to find the right combination of cytostatic drugs for a tailored cancer treatment. For this, we must follow rules and protocols. Once we know the rules, we can use reinforcement learning models to find the best strategy for accomplishing our mission.
When using supervised learning, or learning with a "teacher," we "feed" the AI algorithm with the output as well, not just the input. Knowing the ground truth, the algorithm starts to learn what it needs to recognize from the raw data (Fig. 2.23). The remaining task for the AI method is to find a way to reach that output from the given input. The algorithm learns from a training data set that guides it toward the correct answer. If the estimated output is different from the ground truth, then the learning paradigm and the training data set play their part and redirect

FIG. 2.23 Supervised learning.


the algorithm toward the right path. When using supervised learning we have the following steps to undertake:
• the training set must contain the value we want to predict (the ground truth output) together with the predictors (the input data);
• the AI model (e.g. a neural network) uses the training set to establish a connection between the inputs and outputs.
Our goal is to create a model that generalizes well from the training data, so that we can use it on new data with high accuracy. Unsupervised learning, or learning without a "teacher," does not receive the expected output together with the training set (Fig. 2.24). Unsupervised learning is not used as much as the supervised one. Using this type of learning, the AI algorithm tries to solve a problem blindly. Obviously, the process is much harder. The algorithm tries to find matching patterns in the data in order to group it.

FIG. 2.24 Unsupervised learning.

Reinforcement learning has its roots in unsupervised learning. Reinforcement learning lets the machine decide its next move after evaluating the current state of the result. Reinforcement learning is somewhat similar to training a dog. If the dog performs "sit" or "leave it" correctly, you give it a piece of chicken meat, in other words a reward. If the dog fails at performing these tasks, you don't reward it. In reinforcement learning there is a reward function that lets the algorithm know that it is on the right path to solving the problem at hand. In what follows we shall present different learning paradigms for neural networks. We shall begin with one of the most used training algorithms: backpropagation.

The backpropagation algorithm

The backpropagation algorithm is used for training multi-layered perceptrons (MLP) and convolutional neural networks (CNN). The backpropagation algorithm is supervised. The underlying idea behind this algorithm is that it computes the error between the ground truth and the network's output, and afterwards propagates it back through the network in order


to change the weights. The difference between the true label and the predicted label represents the loss (error) function of the algorithm. Our goal is to minimize the loss function through a process called optimization. In the case of backpropagation, the algorithm tries to find the set of weights that minimizes the loss function (L). Mathematically, we can write the formula for L as follows:

L = (1/n) · Σ_i | y_i − d_i |²,

where y_i is the output computed by the neural network, while d_i is the ground truth. To minimize the error, or loss, we need to know how to tune the weights. If we proceeded blindly it would take forever, due to the enormous range of possibilities. Luckily, the backpropagation algorithm can give us the correct direction through a mathematical procedure called gradient descent. Gradient descent produces the direction of steepest descent. Another very important parameter must be introduced at this point: the learning rate. You can imagine the learning rate as the size of the steps you might take while hiking; there is a well-known comparison between gradient descent and a hiker's trip back home to safety. The story goes like this: you are a hiker and you got lost in the woods on a mountain top. You have no reception on your smartphone, no map, no compass, and the night sets in. Your only hope is that if you try to descend carefully you might find a village at the bottom. The slope of the ground under your boot represents the gradient. Obviously, you will follow the descending path that seems to be the steepest. The only issue that remains to be solved is how big the steps you take should be. If you take really small steps, you might not get down the mountain in a reasonable amount of time. If you take steps that are too large, you might miss the village and end up on the other side of the mountain (Fig. 2.25). The learning rate is one of the most crucial hyperparameters of a neural network. In what follows we shall present the effect of the learning rate and different ways you can set its value.

FIG. 2.25 The gradient descent.
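To see the interplay between the gradient and the learning rate, here is a minimal sketch, assuming a toy one-dimensional loss L(w) = (w − 3)² whose minimum sits at w = 3; the step sizes are illustrative.

```python
# Sketch: vanilla gradient descent on the toy loss L(w) = (w - 3)^2,
# whose gradient is dL/dw = 2 * (w - 3). eta is the learning rate;
# too small crawls, too large overshoots and diverges.
def gradient_descent(eta, steps=25, w=0.0):
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = w - eta * grad        # step against the gradient
    return w

print(gradient_descent(eta=0.1))   # converges near the minimum w = 3
print(gradient_descent(eta=1.1))   # diverges: the steps are too large
```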


Most of the time, the learning rate has a value between 0.0 and 1.0. Through the learning rate we control how fast the model adapts to the problem at hand. By using a small learning rate, we update the weights slowly, making "baby steps" at each epoch. Using a large learning rate, we have fewer epochs of training by updating the weights very fast. Figs. 2.26 and 2.27 show a slow vs a fast gradient.

FIG. 2.26 Slow gradient. Low learning rate.

FIG. 2.27 Fast gradient. High learning rate.

Regarding setting the learning rate, there are a few things that we should keep in mind. First, the training process should start with a reasonably high learning rate, because in the beginning the weights are chosen randomly. As we pass each epoch, we can decrease the value of the learning rate and allow a slower updating process of the weights. A naive approach to setting the learning rate would be to start with a large value close to 0.1, and then to try lower values: 0.01, 0.001, etc. Please keep in mind that when the training starts with a high learning rate, the loss doesn't improve, and it may even grow in the first epochs of training. On the other hand, if we start the training with a low learning rate, then after a few epochs the loss will start to decrease. A smarter way of setting the learning rate is described in Smith (2015). At every batch we record the learning rate as well as the training loss. After that, we plot both of them in the same graph. We choose the value of the learning rate found on the graph at the point where the loss has the fastest decrease (Fig. 2.28).

FIG. 2.28 Plot loss and learning rate.

The gradient is an array of slopes (derivatives). The loss of a network is computed through a continuous and differentiable function of the weights. Hence, to compute the vanilla gradient, the standard gradient, we need the following mathematical formula:

∇L = ( ∂L/∂w_1, …, ∂L/∂w_p ).

Each synaptic weight of the network will be updated taking into account the learning rate η, where L is the error function and Δw_i = w_{i+1} − w_i:

Δw_i = −η · ∂L/∂w_i.

Before moving on to other types of neural networks, we would like to discuss a type of neural network that is derived from the MLP: the partially connected neural network (PCNN). The philosophy behind this model mimics the human brain, where only certain neurons are excited by a signal, and only those participate in processing it (Belciug and El Darzi, 2010). For the artificial brain, the model proposes a mechanism for selecting which neurons are excited and which neurons are inhibited. The PCNN is an MLP that has a part of its synaptic connections deactivated. Using the backpropagation algorithm we monitor the weights: if after a certain number of training samples the weights have not been modified substantially by the backpropagation algorithm, then those weights are inhibited, having their value set to 0.


The number of training samples and the threshold that measures the modifications are problem dependent. Moving forward, let us discuss another training paradigm for the MLP: the evolutionary computation approach.

Evolutionary computation learning paradigm

Evolutionary computation is an optimization method. It is easy to understand and use, and that is why it makes a perfect learning paradigm for neural networks, whether they are MLPs, probabilistic neural networks (PNN), extreme learning machines, etc. The first time a neural network was trained using evolutionary computation was in 1975 (Holland, 1975). Genetic algorithms are a class of evolutionary computation. They can be used to simulate behaviors, environments, etc. Genetic algorithms are metaheuristic techniques inspired by the natural selection mechanism. The natural selection principle was described by Charles Darwin in the famous book "On the Origin of Species by Means of Natural Selection, or Preservation of Favored Races in the Struggle for Life" (Darwin, 1859). Besides Darwin's theory, Mendel's ideas on genetics were also at the root of genetic algorithms. In his groundbreaking study, "Versuche uber Pflanzenhybride/Research in plant hybrids," 1865, Mendel showed that hereditary factors that are passed on from parents to children have a discrete nature. Genetic algorithms generate good optimization solutions through bio-inspired operators: crossover, mutation, and selection. As we have noticed, in what regards neural networks, the greatest problem we are facing is setting the hyperparameter values. Since genetic algorithms solve optimization problems, the connection between the two AI strategies might come in handy. Henceforth, by using genetic algorithms we can:
• set the values of the synaptic weights;
• interpret the behavior of neural networks;
• learn the topology of a neural network;
• select the items that will form the training data set, etc.

Even if in the state-of-the-art literature we can find many ways of evolving neural networks with the use of genetic algorithms, in this book we are going to focus on how we can set the weights in the network's architecture. Until now we saw that one way of finding the best weights for a neural network is through the backpropagation algorithm, but this method has some disadvantages: it might get trapped in local optima, and it also needs high-level knowledge of differentiability for the gradient's computation. On the other hand, genetic algorithms are very easy to understand and implement. Even so, genetic algorithms do not represent the "golden" solution, since the permutation problem (Radcliffe, 1990) states that the evolved structures can be disrupted by the recombination operator. This problem has been reported under the names of competing conventions problem (Schaffer et al., 1992), structural-functional mapping problem (Whitley et al., 1990), and isomorphism problem (Hancock, 1992). The permutation problem states that it does not make sense to recombine two individuals that are dissimilar from a genetic point of view, or, as Watson and Pollack stated in Watson and Pollack (2000): "parents selected from different fitness peaks are likely to produce an offspring that lands in the valley in between."


Briefly, the process of natural selection consists in the selection of the fittest individuals from a given population. Through crossover, the current population produces offspring that inherit characteristics from their parents and are added to the next generation. The individuals that have better fitness have a better chance of surviving. This process is repeated for a number of generations, and in the end the generation that has the fittest individuals will be discovered. Genetic algorithms start with a set of randomly generated individuals called a population. Each individual represents a solution to the problem that needs to be solved. The individual is called a chromosome, and it is made out of genes. In Fig. 2.29, we have the binary representation of a gene, chromosome, and population. This representation is formed by an array of bits and is used in solving Boolean decision problems. Other types of representation are: the integer representation, where all the genes are integers (e.g. image RGB values, categorical data, etc.); and the real-coded representation, where all the genes are floating point values (e.g. medical analyses, etc.). Fig. 2.30 presents all three representation types.

FIG. 2.29 Population, chromosome, gene.

FIG. 2.30 Chromosome representation: binary, integer, real-coded.

At first the population is initialized by the creation of randomly generated arrays of chromosomes. The population size is chosen by the user and ranges from hundreds to thousands of chromosomes. Another concept is the fitness function, f(x_i), i = 1, …, n, or the ability of an individual to compete with the other individuals in the created environment. The higher the fitness score, the higher the chance for that individual to be selected for reproduction, or for being part of the next generation. The fitness function is nonnegative, and the overall performance of a population consisting of n individuals is given by:

F = Σ_{i=1}^{n} f(x_i).

The idea behind the selection operator is to select the fittest individuals either to replenish the next generation, or to pass their genes to the next generation. Only the best individuals are allowed to produce offspring. This process permits the population to evolve generation by generation. In what follows, we shall present some selection methods for reproduction. More details can be found in Blickle and Thiele (1995) and Jebari and Madiafi (2013):
• Tournament selection: in this type of selection the name speaks for itself, implying a tournament, a contest between the individuals. The first step in this selection is to choose the number of competitors, k. These k competitors compare their fitness scores, and the one that has the better score is chosen. Having set a number of m parents, the process repeats until this number is reached. Using tournament selection we can preserve diversity among the individuals, since each one of them has an equal chance of competing. A downside of using it is the fact that it can lead to a decrease in the convergence speed (Razali and Geraghty, 2011). We can compute the probability of a chromosome being selected using the following formula:

P(i) = C(n − i, k − 1) / C(n, k), if i ∈ [1, n − k − 1]; P(i) = 0, if i ∈ [n − k, n].

• Roulette wheel selection or fitness-proportional selection: this type of selection is proportionate to the fitness of each chromosome. Each chromosome occupies a slot on the wheel directly proportional to its fitness. The higher the fitness value, the higher the probability of being selected. Fitness-proportional selection chooses the solutions by repeated random samples of the population (a short sketch of this method is given just before the crossover section below). The probability of an individual being selected can be computed using the following formula:

P(i) = f(i) / Σ_{j=1}^{n} f(j),

where f(i) is the fitness of chromosome i.
• Ranking selection: in this type of selection each chromosome receives a rank according to its fitness score. When dealing with ranking selection we need to keep in mind that the differences between the fitness scores don't matter, only the ranks. So, an individual can have a fitness score equal to 100, placing it in first place, followed by a second individual with a fitness score of 99 that is the second best, which is followed by a third individual with a fitness score of 10. Even if the fitness score of the third individual is 10 times lower than the others' scores, there is only a slight difference between them in terms of ranking. The probability of a chromosome being selected is given by:

P(i) = rank(i) / (n · (n − 1)).

• Stochastic universal sampling selection is a variant of fitness-proportional selection. Stochastic universal sampling uses a single random value to sample the solutions. By creating evenly spaced intervals, the weaker chromosomes have a better chance of being selected.
• Exponential rank selection: this type of selection is similar to rank selection, the only difference being that the probabilities of the ranked individuals are exponentially weighted. A parameter c, 0 < c < 1, is used for the ranking. c is the base of the exponent, so the closer the value of c is to 0, the lower the "exponentiality." The probability of an individual being selected is given by the following formula:

P(i) = c^(N − i) / Σ_{j=1}^{N} c^(N − j).

• Truncation selection: mimics the truncation selection used in animal breeding. For example, a dog breeder will rank the dogs on some trait such as herding, protection, hunting, etc., and the top percentage is reproduced. In our case, the chromosomes are ordered by fitness, and a certain proportion p of the fittest individuals are selected and reproduced 1/p times. Truncation selection is not often used in practice.
• Boltzmann selection: in this type of selection a continuously varying temperature controls the rate of selection. At first, the temperature is high, making the selection pressure low. As the temperature starts to lower, the pressure increases. The result is a narrow search space plus a diverse population.
In what regards survival selection, we must keep in mind that the environment has limited resources, so in each generation we will have the same number of individuals in a population. Different from parent selection, which is classically a stochastic process, survival selection is in some cases deterministic. To keep the population size constant, we are going to need a mechanism to select the best chromosomes that will form the next generation from the individuals that are selected to be parents, the offspring, and the non-parents of the current generation. There are two approaches to survival selection: either we keep the best individuals based only on their performance, or we take the concept of age into account. For further details we refer the reader to Eiben and Smith (2003).
• Fitness-based survival selection: the process is the same as parent selection. We can choose one of the methods presented in that section to form the next generation's population from the parents, offspring, and non-parents of the current generation.
• Age-based selection: in this type of selection we do not measure the individuals based on their fitness score. It is assumed that each individual can survive in the system only for a certain period of time. If the number of offspring equals the number of parents, the first will replace the latter. If the numbers are not equal, a parent can be chosen randomly for replacement.
We have arrived at the point in our genetic algorithm's journey when we present the two variation operators: crossover or recombination, and mutation. Both operators have the same purpose: to create new individuals based on the individuals that already exist in the current population. Crossover starts with the parent selection. A pair of parents creates offspring. The mutation can be applied only on the resulting offspring, or on the whole population. Through these operators we create diversity among the individuals.
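Before moving on to the variation operators, here is the promised sketch of fitness-proportional (roulette wheel) selection; the population and fitness values are illustrative placeholders.

```python
# Sketch of fitness-proportional (roulette wheel) parent selection:
# each chromosome is picked with probability f(i) / sum of all fitness.
import random

def roulette_wheel(population, fitness, n_parents):
    total = sum(fitness)
    weights = [f / total for f in fitness]
    return random.choices(population, weights=weights, k=n_parents)

pop = ["c1", "c2", "c3", "c4"]
fit = [10.0, 40.0, 30.0, 20.0]       # c2 occupies the largest slot
print(roulette_wheel(pop, fit, n_parents=2))
```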

Crossover or recombination

By applying the crossover operator on two previously selected parents, we obtain one or two offspring. Recall that the fittest parents were selected in order to pass on to their offspring the good traits that they might have. It is likely that good genes will be inherited by the offspring, and thus their fitness scores will be at least as good as their parents' scores. In the crossover process a new parameter is used: the recombination probability, pc ∈ [0, 1]. After two parents are selected for crossover, a number is randomly generated between 0 and 1. If that number is lower than the recombination probability, then the crossover process starts. If that number is higher than the recombination probability, then the process is an asexual crossover, meaning that the offspring will be exact copies of their parents, clones. In practice we use the following recombination scheme: p parents produce q offspring, with p = q = 2. Having different representations for the chromosomes implies that we need different crossover processes directly related to the chromosome's representation. In what follows, let us discover together some of the most used recombination techniques.
• Binary representation crossover operators: we shall use the scheme p = q = 2:
 one-point crossover: having m genes in a chromosome, we randomly generate a number k ∈ {1, 2, …, m − 1}. Splitting the two parents at point k and exchanging the genes of the corresponding segments will create the two offspring. A visual representation is presented in Fig. 2.31, and a short code sketch follows the figure.

FIG. 2.31 One-point crossover example for binary representation.
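A minimal sketch of one-point crossover for binary chromosomes; the parent bit strings are illustrative.

```python
# Sketch of one-point crossover: split both parents at a random point
# k in {1, ..., m-1} and swap the tails.
import random

def one_point_crossover(parent1, parent2):
    m = len(parent1)
    k = random.randint(1, m - 1)
    child1 = parent1[:k] + parent2[k:]
    child2 = parent2[:k] + parent1[k:]
    return child1, child2

p1 = [1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0]
print(one_point_crossover(p1, p2))
```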

 n-point crossover: having m genes in a chromosome, we randomly generate multiple points that split the parent chromosomes into segments. Exchanging these segments will create the two offspring. A visual representation is presented in Fig. 2.32.


FIG. 2.32 n-Point crossover example for binary representation.

 uniform crossover: having m genes, we generate an array of length m that contains uniformly distributed values ranging from 0 to 1. The offspring are created using another threshold p (frequently p = 0.5): for offspring 1, if a generated value surpasses the value of p, then that gene is inherited from parent 2, otherwise from parent 1; for offspring 2 the procedure is vice versa. See Fig. 2.33 for details.

FIG. 2.33 Uniform crossover example for binary representation.

• Integer representation crossover operators: we shall use the scheme p = q = 2:
 one-point crossover: having m genes in a chromosome, we randomly generate a number k ∈ {1, 2, …, m − 1}. Splitting the two parents at point k and exchanging the genes of the corresponding segments will create the two offspring. Special attention is needed so that a gene does not repeat itself. If a gene appears in the first segment of the first parent, and the same value appears in the second segment of the second parent, then we copy the distinct genes until the second segment ends, and afterwards we continue with the first segment of the second parent. A visual representation is presented in Fig. 2.34.

FIG. 2.34 One-point crossover example for integer representation.


 n-point crossover: having m genes in a chromosome, we randomly generate multiple points that split the parent chromosomes into segments. Exchanging these segments will create the two offspring. Keep in mind that the genes should not repeat themselves, just as in the one-point crossover case.
 uniform crossover: is exactly the same as the uniform crossover for the binary representation.
• Real-coded representation crossover operators: these are the one-point crossover, n-point crossover, uniform crossover, or the following:
 arithmetic recombination: there are three types of arithmetic recombination: simple arithmetic recombination, single arithmetic recombination, and total arithmetic recombination. We shall present all of these methods applied on the following configuration: we have two parents x and y, the offspring z1 and z2, and the recombination parameter α. The user sets the value of α, and it takes values from the interval [0, 1]. In our examples we shall use α = 0.3.
▪ simple arithmetic crossover: having m genes, we randomly generate a number k ∈ {1, 2, …, m − 1}. The first offspring is created by copying the first k genes from the first parent, while the remaining genes are computed using the following formula: z_i^1 = α · x_i + (1 − α) · y_i, i = k + 1, …, m. The second offspring is created vice versa. For a visual description see Fig. 2.35.

FIG. 2.35 Simple arithmetic crossovers.

▪ single arithmetic crossover: having m genes, we randomly generate a number k ∈ {1, 2, …, m − 1}. The first offspring is created by copying all the genes from the first parent, except gene k, which is computed using the following formula: z_k = α · x_k + (1 − α) · y_k. The second offspring is created vice versa. For a visual description see Fig. 2.36.

FIG. 2.36 Single arithmetic crossover example.


▪ total arithmetic crossover: all the m genes are computed using the following formula: z_i = α · x_i + (1 − α) · y_i. This type of recombination is the most frequently used method. For a visual description see Fig. 2.37.

FIG. 2.37 Total arithmetic example.

 blend crossover or BLX-α crossover: in this type of crossover, we create the offspring by uniform random generation of values in the range [chromosome_min − I · α, chromosome_max + I · α], where α is a number between 0 and 1, chromosome_min = (min{x_i, y_i}, i = 1, …, n), chromosome_max = (max{x_i, y_i}, i = 1, …, n), and I = chromosome_max − chromosome_min. More details regarding the blend crossover can be found in Eshelman and Schaffer (1993).
 linear BGA crossover: in this type of crossover we use the fitness score of each parent. For example, if the fitness score of chromosome x is better than the fitness score of chromosome y, then by using the following formula we create the two offspring:

z_i = x_i ± r_i · γ · Λ,

where γ = Σ_{k=0}^{m} α_k · 2^(−k), Λ = (x_i − y_i) / ‖x_i − y_i‖, and α is chosen by the user from the [0, 1] range.
 Wright's heuristic crossover: in this type of crossover we also use the fitness score of each parent. For example, if the fitness score of chromosome x is better than the fitness score of chromosome y, then we can produce the two offspring using the following formula:

z_i^k = u · (x_i − y_i) + x_i, with k = 1, 2,

where u is a uniformly distributed random number. For more details on Wright's heuristic crossover we refer the reader to Wright (1991). Another recombination method is the multi-parent crossover, which is an extension of the crossover methods presented above, the only difference being that in this type of recombination more than two parents are used to produce the offspring. For more details regarding this type of crossover we refer the reader to Eiben and Smith (2003) and Eiben (2003).


Mutation

Since this entire book regards cancer, we can draw a parallel between the mutation of a gene in genetic algorithms and the mutation of a cancer cell. A mutation is the alteration of the genetic material of a cell, and it can be permanent or not. The difference between the two types of mutation, in cancer and in genetic algorithms, is that in the artificial intelligence field the aim of the unary mutation operator is to improve the quality of an individual. Mutation is a stochastic operator; it is performed using different randomly generated numbers. If we used only the recombination operator, we might end up with a uniform population. Mutation helps improve the diversity among individuals, speeding up the evolution. Using a parameter called the mutation probability, we can alter the original individual by changing only one gene, several genes, or even the entire chromosome. In what follows, we shall present different types of mutation, taking into account the chromosome's representation (a sketch of the bitwise variant follows this list).
• mutation for binary chromosome representation, also known as bitwise mutation: having set a certain value for the mutation probability, pm, for each gene we randomly generate a number between 0 and 1; if that number is less than pm's value, then we apply the mutation to that gene, transforming its value into the opposite one (e.g. if the gene's value is 0 we set it to 1, and to 0 otherwise); if that number is greater than pm's value, then we do not apply the mutation.
• mutation for integer chromosome representation: two methods can be applied when we are dealing with the integer representation:
 random resetting: this type of mutation can only be applied if the genes code cardinal attributes. The process is an extension of the bitwise mutation. For each gene, we generate a number between 0 and 1; if that number is less than pm's value, then a new value is randomly generated from the value domain and the old value is replaced by it.
 creep mutation: this type of mutation can be applied if the genes code ordinal attributes. For each gene, we generate a number between 0 and 1. If the number is less than pm's value, then we randomly generate another number, sign, between 0 and 1. If sign < 0.5, then we subtract a small value from the original gene; if sign > 0.5, then we add a small value to the original gene.
• mutation for real-coded chromosome representation: we shall present two methods that can be used for this type of representation:
 uniform mutation: each gene's value is chosen using the uniform distribution.
 non-uniform mutation or normally distributed mutation: this type of mutation works like the creep mutation. For each gene, we generate a number between 0 and 1. If the number is less than pm's value, then we randomly generate another number, sign, between 0 and 1. If sign < 0.5, then we subtract a small value from the original gene; if sign > 0.5, then we add a small value to the original gene. The added value is randomly chosen from a normal distribution with mean zero and arbitrary standard deviation, thus from the interval (−1.96 · SD, 1.96 · SD).
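Here is the promised sketch of the bitwise mutation described in the first bullet; the chromosome and the mutation probability pm are illustrative.

```python
# Sketch of bitwise mutation for a binary chromosome: each gene flips
# independently with probability pm.
import random

def bitwise_mutation(chromosome, pm=0.05):
    return [1 - gene if random.random() < pm else gene
            for gene in chromosome]

print(bitwise_mutation([0, 1, 1, 0, 1, 0, 0, 1], pm=0.2))
```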


Before putting all the parts of the genetic algorithm together, we still have to discuss a crucial part of the method: the stopping criterion. To stop the evolutionary process we can choose between two procedures:
• If the fitness value is a priori known, we will stop the evolutionary process when we achieve that value. This case is rarely encountered in practice.
• When dealing with stochastic algorithms, we must keep in mind that reaching the best solution might never happen. Genetic algorithms are stochastic methods, so while having our chromosome population evolve, we might miss the optimum. Thus, we can stop the process by applying one or a combination of the following stopping conditions:
 the fitness doesn't improve anymore as generations pass;
 the population diversity reaches a predetermined threshold;
 a predetermined number of generations is reached;
 a predetermined number of fitness evaluations is reached.
Having all this information, we can now present the general scheme of a genetic algorithm (a code sketch of this loop is given at the end of this subsection):
1. Initialize the population with random candidate solutions.
2. Apply the fitness function to each candidate and obtain the fitness scores.
3. Until the stopping criterion is met, repeat the following steps:
3.1 Select the parents.
3.2 Apply the crossover operator on the parents.
3.3 Apply the mutation operator on the resulting offspring.
3.4 Apply the fitness function on the new individuals.
3.5 Select the chromosomes that will form the next generation.
The most difficult part when dealing with AI is setting the parameters. A wrongful setting of the parameters might make our method fail to return the desired results. In genetic algorithms we have several parameters that need tuning: the population size, the crossover and mutation probabilities, the number of generations, etc. Remember that we also need to make decisions regarding which operators or which fitness function we will use. To make things a little less complicated, we shall present two tips on how to deal with all this:
 parameter tuning: is done before your method is officially released. The procedure implies changing the parameters and redoing the experiments until you find the ones that give you the best results. Once you have achieved this goal, the parameters will remain fixed during the official run. The problems associated with this method are:
▪ the procedure costs too much time;
▪ having multiple parameters that interact implies an exhaustive search, which is not practicable;
▪ user mistakes in the settings can lead to errors or low performance;
▪ initially "good" values can become "bad" values during the official run.
 parameter control: the changing of the parameters is done dynamically during the official run. The process can use a predetermined time-varying schedule, a feedback of the search


process, or even apply evolutionary computation by encoding all the parameters in chromosomes and optimizing them this way. The problems associated with this strategy are:
▪ how to determine the correct time-varying schedule?
▪ how to optimize the feedback?
▪ does natural selection even work in this case? What parameters would we use in the process? Isn't that just running around in circles?
More or less, you have gotten a taste of evolutionary computing by now. Thus, let us discover a way we can set the weights of a neural network using the evolutionary paradigm. The method was first proposed by Belciug and Gorunescu, back in 2013, and addressed a two-class decision problem (Belciug and Gorunescu, 2013). The authors designed the following MLP architecture:
• the input layer contains n neurons that represent the predictive attributes from the data set;
• the hidden layer contains a number of neurons that equals the number of classes (e.g. 2, 3, …, q);
• the output unit contains only one neuron, the predicted class;
• the activation function is the sigmoid;
• the output is computed using the winner-takes-all rule;
• the backpropagation algorithm is replaced by a genetic algorithm.
In Fig. 2.38 we have a visual representation of the MLP/GA architecture for a two-class decision problem.

FIG. 2.38 MLP/GA architecture for a two-class decision problem.

As we can see from Fig. 2.38, the weight vector is a real-numbered array. This vector is encoded in a chromosome. The genes represent the input features. The synaptic weights between the input and hidden layers are kept in a fixed order, w = (w_x11, w_x12, w_x21, w_x22, …, w_xn1, w_xn2), having x_i, i = 1, 2, 3, …, n as the input features. The parameters are obtained heuristically. You need to run the algorithm multiple times to set the optimal parameters such as: population size, number of generations, crossover probability, mutation probability, etc. The best selection, crossover, or mutation operator is chosen by the user depending on the problem


at hand. For each chromosome in the population we use the whole training data set to compute the accuracy. Thus, the accuracy obtained by each chromosome represents its fitness score. The higher the accuracy, the higher the fitness score. The authors chose 40% of the chromosomes to become parents, the best ones according to the selection operator, parents that will produce an equal number (40% of the population) of offspring. The mutation operator uses the chromosome's error as the value that will be added or subtracted, with the following formula:

chromosomeError = (100 − fitnessScore) / 100.

The process is repeated until the maximum number of generations is reached. The fitness function is applied again and the "best" chromosome will represent the weights obtained after the training step. This method can be extended to other types of neural networks.
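To close the subsection, here is a minimal, self-contained sketch of the general genetic algorithm scheme presented above, maximizing a toy fitness (the number of ones in a bit string). The operator choices (truncation-style selection, one-point crossover, bitwise mutation) and parameter values are illustrative, not the authors' exact configuration.

```python
# Minimal skeleton of the general genetic algorithm scheme.
import random

def fitness(c):
    return sum(c)                                # toy fitness: count ones

def evolve(pop_size=20, length=12, generations=50, pc=0.9, pm=0.05):
    # 1. Initialize the population with random candidate solutions.
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):                 # 3. repeat until stopping
        pop.sort(key=fitness, reverse=True)      # 2./3.4 evaluate fitness
        parents = pop[:pop_size // 2]            # 3.1 select the parents
        offspring = []
        while len(offspring) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            if random.random() < pc:             # 3.2 one-point crossover
                k = random.randint(1, length - 1)
                child = a[:k] + b[k:]
            else:
                child = a[:]                     # asexual crossover: clone
            child = [1 - g if random.random() < pm else g
                     for g in child]             # 3.3 bitwise mutation
            offspring.append(child)
        pop = parents + offspring                # 3.5 next generation
    return max(pop, key=fitness)

print(evolve())
```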

Bayesian learning paradigm

In Chapter 1, we presented Bayes' theorem. We can use Bayes' theorem for decision-making/classification problems. Having N items with the attributes {A_1, A_2, …, A_p}, we wish to see which class they belong to. Having multiple classes, Ω_1, Ω_2, …, Ω_q, according to Bayes' formula we will choose the class that maximizes the probability P{A_1, A_2, …, A_p | Ω_i}, i = 1, 2, …, q. The naive Bayes classifier, or Idiot's Bayes, assumes that all the attributes are independent from each other, an assumption that is in general false, turning the above probability into:

P{A_1, A_2, …, A_p | Ω_i} = P{A_1 | Ω_i} · P{A_2 | Ω_i} · … · P{A_p | Ω_i}.

Starting with this idea, Belciug and Gorunescu proposed the following Bayesian learning paradigm (Belciug and Gorunescu, 2014) for an MLP. The new method has the following architecture:
• the input layer contains normalized and shuffled examples;
• the training is performed in batch mode (LeCun et al., 2012; Haykin, 1999);
• a modified form of the hyperbolic tangent is used as the activation function; the modification makes the function converge faster (LeCun et al., 2012):

f(u) = 1.7159 · tanh( (2/3) · u );

• a hidden layer with a number of hidden neurons equaling the number of classes;
• the output is computed using the softmax function;
• the weights are initialized using the Goodman-Kruskal Gamma rank correlation computed between the input features and the decision classes.
Each hidden neuron has a corresponding weight for the attribute x_i belonging to the item x, w_ij, i = 1, 2, …, p, j = 1, 2, …, q. In the Bayesian learning paradigm we consider the synaptic weights to be values of a random variable, W_ij, i = 1, 2, …, p, j = 1, 2, …, q. To minimize the


network’s error, we use the conditional probability P(E j wij) as the “likelihood” from the Bayesian theory. If we consider that the events Aij corresponding to Wij offer a partition of the weight space, W, and E(n) is the network’s error at iteration n, then using the total probability formula, we have: X   P f E ð nÞ g ¼ P EðnÞj Aij  P Aij , ij

for i = 1, 2, …, p and j = 1, 2, …, q. Using subjective Bayesianism, we can interpret the weight between two neurons as a measure of belief (Hajek, 2012; Press, 2003). This means that the weights will have values that range from 0 to 1, and they will encode the strength of the synaptic connection between neurons. The stronger the connection, the higher the probability. We can use the Bayesian rule to update the weights, as long as we consider them to be posterior probabilities. Mathematically speaking, the value of w_ij at iteration (n + 1) is given by:

w_ij(n + 1) = P{A_ij | E(n)} = ( P{E(n) | A_ij} · P{A_ij} ) / ( Σ_ij P{E(n) | A_ij} · P{A_ij} ),

for i = 1, 2, …, p and j = 1, 2, …, q, having P{E(n)} as the evidence, P{A_ij} as the prior probability, and P{E(n) | A_ij} as the likelihood. In order to estimate the initial weights, the likelihood, and the priors, we use a correlation-based method. The modulus of the correlation coefficient Γ can encode the strength of the relationship. In the learning method presented here, a subjective approach is followed: we presume that the weights are related to each feature's influence on the decision class, making the priors P{A_ij} subjective. Thus, we presume that, through the correlation coefficient between each attribute and the decision class, the priors express specific information regarding the item x. Mathematically speaking, we can write the above presumption as:

P{A_ij} = Γ(X_i, Y_j), i = 1, 2, …, p, j = 1, 2, …, q.

Using the same principle, we can write the formula for the likelihood as the Goodman-Kruskal rank correlation between each feature and the error at step n:

P{E(n) | A_ij} = Γ(X_i, E(n)), i = 1, 2, …, p, j = 1, 2, …, q.

More details regarding the objective vs subjective dilemma are presented in Wagenmakers et al. (2008). Having all this information, we can now present the general scheme of the Bayesian learning paradigm:
1. Compute the mean attribute value m_i^j per attribute A_i, i = 1, 2, …, p, and per class Ω_j, j = 1, 2, …, q.
2. Initialize the weights between the input and hidden layers using the following formula:

w_ij = Γ( x_i^k − m_i^j, y^k ), i = 1, 2, …, p, j = 1, 2, …, q, k = 1, …, N.


3. Compute the discriminant linear function u_j, j = 1, 2, …, q, for each unit in the hidden layer:

u_j = Σ_{i=1}^{p} x_i^k · w_ij · ( 1 / (x_i^k − m_i^j)² ),

for i = 1, 2, …, p, j = 1, 2, …, q, and k = 1, 2, …, N.
4. Consider the non-linear activation function for each hidden unit:

f(u_j) = 1.7159 · tanh( (2/3) · u_j ), j = 1, 2, …, q.

5. One-hot encode the classes with the corresponding labels using the 1-of-q rule: y_1 → (1, 0, …, 0), y_2 → (0, 1, …, 0), …, y_q → (0, 0, …, 1).
6. Interpret the hidden layer as a discrete random variable governed by the probability mass function g_j = g(f(u_j)), having the following formula:

g_j = exp( f(u_j) − max_i{ f(u_i) } ) / Σ_{j=1}^{q} exp( f(u_j) − max_i{ f(u_i) } ), j = 1, 2, …, q.

7. Compute the error for each object x^k in the training set:

error_k = Σ_{j=1}^{q} ( y_j^k − g_j^k ), k = 1, 2, …, N.

8. Build the error vector taking into account the error computed for each training sample: E = (error_1, error_2, …, error_N).
9. Update the weights using the Bayesian learning paradigm:

w_ij(n + 1) = w*_ij = ( Γ(X_i, E(n)) · Γ(X_i, y) ) / ( Σ_i Γ(X_i, E(n)) · Γ(X_i, y) ),

for i = 1, 2, …, p and j = 1, 2, …, q.
10. Repeat the process until the stopping criterion is met.

Radial basis function neural networks

Up until now we have discussed learning paradigms applied to MLP networks. Now we shall present another interesting type of neural network: the radial basis function (RBF) network (Broomhead and Lowe, 1988). We have seen so far that the computing neurons use a non-linear activation function applied on the dot product between the input and the weights. In this type of neural network, the activation function of the hidden neurons is applied on


the distance between the input vector and a centered prototype vector. In an RBF neural network's architecture we have:
• the input layer, which connects the rest of the network to the environment;
• one hidden layer, which transforms the input space through a non-linear function into the hidden space;
• the output layer.
The RBF network uses hyperspheres to split the problem space. The hyperspheres have centers and radii. If we have M basis functions, then we have the following RBF mapping:

y_k(x) = Σ_{j=1}^{M} w_kj · φ_j(x),

where φ is the Gaussian basis function:

φ_j(x) = exp( − ‖x − μ_j‖² / (2 · σ_j²) ),

and x is the input vector, μ_j is the vector that determines the center of the basis function φ_j, while σ_j is its width. The training of an RBF network is done in two steps. In the first step, we determine the parameters μ_j and σ_j of the basis functions φ_j using the input training set. In the second step, we compute the synaptic weights w_kj. To minimize the training error we use the sum-of-squares error. The architecture of an RBF network is presented in Fig. 2.39.
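A minimal sketch of the Gaussian basis function and the RBF output mapping above; the centers, widths, and weights are illustrative placeholders, not trained values.

```python
# Sketch of the Gaussian basis function and the RBF mapping
# y_k(x) = sum_j w_kj * phi_j(x).
import numpy as np

def gaussian_basis(x, center, width):
    return np.exp(-np.linalg.norm(x - center) ** 2 / (2.0 * width ** 2))

x = np.array([0.5, 1.0])
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
widths = [1.0, 0.5]
weights = np.array([0.7, -0.3])      # w_kj for a single output unit

phi = np.array([gaussian_basis(x, c, s) for c, s in zip(centers, widths)])
print(weights @ phi)                 # the RBF network output y_k(x)
```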

Extreme learning machine

Extreme learning machines (ELM) are a fairly new type of neural network, introduced in the early 2000s by Guang-Bin Huang. A little bit of scandal arose in the scientific community.

FIG. 2.39 Radial basis function neural network.


In 2004, Huang published his first paper regarding a new type of neural network, the ELM (Huang et al., 2004). A follow-up study was published in 2006 (Liang et al., 2006). The problem was that he excluded references that studied models similar to his new technique. In 2008, Lipo P. Wang and Chunru R. Wan wrote some comments to the editor of the IEEE Transactions on Neural Networks, regarding the fact that there was nothing new about the ELM: the idea of using a hidden layer connected through random, untrained weights to the input layer is in fact the idea behind the RBF network (Wang and Wan, 2008). Naturally, Huang replied that this was nothing but a "malign and attack" of his work (Huang, 2015a), and also explained what an ELM is and how he filled the gap between "Rosenblatt's dream" and "John von Neumann's puzzle" in Huang (2015b). The ELM's architecture contains only one hidden layer. The weights that connect the input layer to the hidden layer are randomly initialized, whereas the weights that connect the hidden layer to the output layer are computed in just a single step, by using the Moore-Penrose generalized inverse (Huang and Zhu, 2006; Huang et al., 2010, 2011). Technically, having N input training objects, the ELM paradigm is:
1. Generate random values for the weights that connect the input layer to the hidden layer, w_i, and for the biases, b_i, i = 1, 2, …, N.
2. Compute M, the hidden layer output matrix.
3. Compute the weights that connect the hidden layer to the output layer, B = M⁺ · Y, where M⁺ is the Moore-Penrose generalized inverse of M, and Y is the output vector.
Even if this method is simple to understand and implement, the results obtained when using it are excellent (a short sketch of the three steps follows below). Many variants of the ELM have been modeled; here we are going to present only two of them: an adaptive single hidden layer feedforward network with knowledge embedded in data, and a hybrid ELM/ant colony optimization method.
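The promised sketch of the three ELM steps with NumPy, assuming tanh hidden activations; the data shapes and values are illustrative placeholders.

```python
# Sketch of the ELM paradigm: random input weights, hidden layer
# matrix M, and output weights via the Moore-Penrose pseudo-inverse.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # N=100 training objects, 5 features
Y = rng.integers(0, 2, size=(100, 1)).astype(float)
n_hidden = 20

W = rng.normal(size=(5, n_hidden))   # step 1: random input weights
b = rng.normal(size=n_hidden)        # ... and random biases
M = np.tanh(X @ W + b)               # step 2: hidden layer output matrix
B = np.linalg.pinv(M) @ Y            # step 3: B = M+ . Y in one step

predictions = M @ B                  # fitted outputs on the training set
```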

Adaptive single layer feedforward neural network

The adaptive single layer feedforward neural network (aSLFN) is inspired by the ELM, but has a mechanism for generating the weights between the input layer and the hidden layer by embedding the knowledge from the data into the AI model. The learning paradigm is again based on the Goodman-Kruskal Gamma rank correlation Γ (Belciug and Gorunescu, 2018). We compute the value of Γ with the following formula (a sketch of this computation is given at the end of this subsection):

Γ = (C − D) / (C + D),

where C is the number of concordant pairs, and D is the number of discordant pairs. The first step of the algorithm is to initialize the synaptic weights between the input and the hidden layer by computing the rank correlation between the features and the decision class, exactly as in the initialization presented in the Bayesian learning paradigm for an MLP. Besides this, the model contains a second step, a filtering module, which reduces the number of attributes used in the classification by taking into account the level of influence of each feature on the output. The influence of each attribute is determined by the p-level. If the p-level


of a certain feature is greater than or equal to a specified threshold, then that feature is removed from the architecture. Statistically speaking, the null hypothesis in this case states that there is no association between the feature and the labels, implying zero correlation. We shall use a p-level threshold of 0.05; thus only the rank correlations with two-sided p < 0.05 are kept in the model's architecture. Due to the fact that the Goodman-Kruskal Gamma Γ rank correlation and the Kendall Tau T are comparable, the filtering module is inspired by the Kendall Tau independence test. Thus, we can compute the significance with the following formula:

z = ( 3 · Γ · √(n · (n − 1)) ) / √(2 · (2n + 5)).

Using the Z normal distribution, two-tailed, 1 − cumulative p, we remove all the features that have a z-score | z | ≤ 1.96. Hence, the aSLFN's learning paradigm is as follows:
1. For each decision class, Ω_j, j = 1, 2, …, q, and for each attribute, A_i, i = 1, 2, …, p, compute the mean attribute m_i^j.
2. Compute the rank correlation between the input layer and the hidden layer. Remove the statistically insignificant attributes.
3. Assign the Goodman-Kruskal rank correlation to each synaptic weight.
4. Compute the hidden layer matrix:

M = [ f(x_k, w_ik) ], k = 1, 2, …, N, i = 1, 2, …, p.

5. Compute the weights between the hidden layer and the output layer in one step: B ¼ M+  Y where M+ is the pseudo-inverse Moore-Penrose matrix, and Y is the output.
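As a rough illustration of the correlation-based weights and the significance filter, consider the sketch below. The $O(n^2)$ pairwise computation of $\Gamma$ is written for clarity, not speed, and the function names are our own.

```python
import numpy as np

def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D), counted over all pairs of observations."""
    c = d = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1          # concordant pair
            elif s < 0:
                d += 1          # discordant pair
    return (c - d) / (c + d) if (c + d) else 0.0

def significant_weights(X, labels):
    """Keep the features whose rank correlation with the labels passes the
    Kendall-style test (|z| >= 1.96); return their indices and Gamma weights."""
    n = X.shape[0]
    kept, weights = [], []
    for i in range(X.shape[1]):
        g = goodman_kruskal_gamma(X[:, i], labels)
        z = 3 * g * np.sqrt(n * (n - 1)) / np.sqrt(2 * (2 * n + 5))
        if abs(z) >= 1.96:      # two-sided p < 0.05: keep the feature
            kept.append(i)
            weights.append(g)
    return np.array(kept), np.array(weights)
```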

ELM/ant colony optimization hybrid

An interesting study adds an ant colony optimization algorithm as a feature selection mechanism for an ELM. This model was proposed by Berglund and Belciug (2018). Ant colony optimization is a metaheuristic that uses populations of artificial ants to find solutions to optimization problems. The method is inspired by biology, mimicking the behavior of ants when they are gathering food. In their colonies, ants indirectly communicate with each other through an odorous substance named pheromone. When ants go searching for food, they lay pheromone once a source of food is found. The distance between the nest and the food determines the quantity of the laid pheromone. If, while randomly moving inside the search space, an ant detects pheromone, it decides to follow that path. When that ant reaches the food source, it lays more pheromone itself; thus, as the quantity of pheromone rises, the path becomes more appealing to the other ants. By this process, the probability of that path being chosen by other ants increases.


In order to perform the ant colony optimization process for feature selection, one must apply the following steps:

Ant colony optimization feature selection algorithm

1. Having $n$ ants in a colony, randomly initialize a subset of features from the data set for each ant, with the constraint that each ant has a unique feature set.
2. Build the pheromone matrix $T$ with all its values set to 0.
3. Compute the pheromone level for each ant. The pheromone values will be transformed into probabilities for the attributes that will be chosen by the ants in the next iteration. Taking into consideration that each ant has unique features in its list, the probability of feature $i$ to belong to ant $S_j$ is given by:
$$p_i = \frac{T_i}{\sum_{i=0}^{n} T_i}.$$
4. Each ant has a number of selected features, $m$. The value of $m$ decreases by 1 at each step, in order to end up with only the most significant attributes selected. If the value of $m$ drops below 2, then we regenerate it using a process similar to the Monte Carlo roulette.
5. Each ant's feature set is evaluated using the ELM method. The ant that obtains the best results will be used during the testing phase.
6. The pheromone is updated using the mean squared error computed over the $k$ best ants:
$$\Delta T_i = \begin{cases} \max(MSE) - MSE_j, & \text{if } f_i \in S_j, \\ 0, & \text{otherwise,} \end{cases}$$
where $T_i$ is the pheromone associated with feature $f_i$, $S_j$ is the subset of features, $j = 1, 2, \dots, k$, and $MSE_j$ is the estimated mean squared error of $S_j$'s performance after applying the ELM.
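A compressed sketch of the pheromone loop is given below. It is an illustration under our own simplifications: the unique-subset constraint of step 1 and the shrinking of m from step 4 are omitted, and `evaluate` stands for any routine that trains an ELM on a feature subset and returns its mean squared error.

```python
import numpy as np

def aco_feature_selection(n_features, evaluate, n_ants=20, n_iters=10,
                          m=8, k=5, seed=0):
    rng = np.random.default_rng(seed)
    T = np.ones(n_features)                  # pheromone per feature
    best_subset, best_mse = None, np.inf
    for _ in range(n_iters):
        p = T / T.sum()                      # step 3: pheromone -> probabilities
        ants = [rng.choice(n_features, size=m, replace=False, p=p)
                for _ in range(n_ants)]
        mses = np.array([evaluate(s) for s in ants])   # step 5: ELM evaluation
        if mses.min() < best_mse:
            best_mse = float(mses.min())
            best_subset = ants[int(mses.argmin())]
        for j in np.argsort(mses)[:k]:       # step 6: reward the k best ants
            T[ants[j]] += mses.max() - mses[j]
    return best_subset, best_mse
```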

Probabilistic neural networks

In the probabilistic neural networks' (PNN) learning paradigm the output values are seen as probabilities, making the whole training process about estimating a probability density function. Specht introduced the PNNs in his studies (Specht, 1967, 1988, 1990). Each class has a probability density function that needs to be approximated. Once all the probability density functions are approximated, the one with the highest probability value is selected as the correct answer. The kernel Fisher statistical algorithm and the Bayesian networks are the starting point of the PNNs. The activation function of a PNN is the exponential function. In order to predict, the PNN takes into account the likelihood of the events and the a priori information. Parzen's estimates are used to approximate the probability density functions. If certain conditions are met, a class of estimates can asymptotically converge to the expected density (Parzen, 1962).


The idea of Parzen's estimates from 1962 was extended in 1966 by Cacoulos for multivariate distributions (Cacoulos, 1966). PNNs surpass MLPs when it comes to computational speed. In the training process of a PNN, the training objects are learned only once by the neural network. The downside is that every single training object is copied into the network. Depending on the problem, and on the type of data it has to deal with, the PNN's performance sometimes surpasses the MLP's performance. The a priori probabilities $h_i = P(\Omega_i)$, $i = 1, 2, \dots, q$, are computed as the number of objects from a certain class over the total number of objects in the data set. To compute the class probability density function $f(x)$ we can use one of the following functions:

• $f(x) = \frac{1}{n(2\lambda)^p} \sum_{j=1}^{m} 1$, when $|x_i - x_{ij}| \le \lambda$, $i = 1, 2, \dots, p$, $j = 1, 2, \dots, m$;
• $f(x) = \frac{1}{m\lambda^p} \sum_{j=1}^{m} \prod_{i=1}^{p} \left(1 - \frac{|x_i - x_{ij}|}{\lambda}\right)$, when $|x_i - x_{ij}| \le \lambda$, $i = 1, 2, \dots, p$, $j = 1, 2, \dots, m$;
• $f(x) = \frac{1}{n(2\pi)^{p/2}\lambda^p} \sum_{j=1}^{m} \prod_{i=1}^{p} \exp\left[-\frac{1}{2}\left(\frac{x_i - x_{ij}}{\lambda}\right)^2\right] = \frac{1}{n(2\pi)^{p/2}\lambda^p} \sum_{j=1}^{m} \exp\left[-\frac{\sum_{i=1}^{p}(x_i - x_{ij})^2}{2\lambda^2}\right]$;
• $f(x) = \frac{1}{n(2\lambda)^p} \sum_{j=1}^{m} \prod_{i=1}^{p} \exp\left(-\frac{|x_i - x_{ij}|}{\lambda}\right) = \frac{1}{n(2\lambda)^p} \sum_{j=1}^{m} \exp\left(-\frac{1}{\lambda}\sum_{i=1}^{p} |x_i - x_{ij}|\right)$;
• $f(x) = \frac{1}{n(\pi\lambda)^p} \sum_{j=1}^{m} \prod_{i=1}^{p} \left[1 + \left(\frac{x_i - x_{ij}}{\lambda}\right)^2\right]^{-1}$;
• $f(x) = \frac{1}{n(2\pi\lambda)^p} \sum_{j=1}^{m} \prod_{i=1}^{p} \left[\frac{\sin\frac{x_i - x_{ij}}{2\lambda}}{\frac{x_i - x_{ij}}{2\lambda}}\right]^2$;
• $f_{kn}(x) = \frac{1}{(2\pi)^{p/2}\sigma^p} \cdot \frac{1}{m} \sum_{j=1}^{m} \exp\left(k - \frac{d(x, x_j)^2}{2\sigma^2}\right)$, $k \ge 2$, $n \ge 1$;
• $f_{r}(x) = \frac{1}{(2\pi)^{p/2}\sigma^p} \cdot \frac{1}{m} \sum_{j=1}^{m} \sum_{k=1}^{r} \exp\left(-\frac{d(x, x_j)^2}{2\sigma^2}\right)\frac{1}{k!}$, $r \ge 1$.

In what follows, we shall use the Gaussian distribution as the density function:

$$f_{\Omega_i}(x) = \frac{1}{(2\pi)^{p/2}\sigma^p} \cdot \frac{1}{m_i} \sum_{j=1}^{m_i} \exp\left(-\frac{d(x, x_j)^2}{2\sigma^2}\right),$$


where:

• $i = 1, 2, \dots, q$;
• $m_i$ is the number of training objects that have the class label $\Omega_i$;
• $x_j$ is the $j$th training object of the class $\Omega_i$;
• $p$ is the input space dimension;
• $\sigma$ is the smoothing parameter.

$\sigma$ acts like the standard deviation, and needs to decrease as the training size increases. The width of the area of influence of each decision class is defined by $\sigma$.

The PNN's architecture has:

• an input layer with the objects from the training data set that are going to be presented to the neural network;
• a pattern layer that contains a pattern neuron for each training object, $z_i = x \cdot w_i^T$, where $x$ is the input vector, $w_i$ is the weight vector, and $w_i^T$ is the transposed weight vector. Having $x$ and $w_i$ normalized to unit length, we perform a non-linear operation: $\exp\left[-(w_i - x)(w_i - x)^T / (2\sigma^2)\right]$;
• a summation layer:
$$\sum_i \exp\left[-(w_i - x)(w_i - x)^T / (2\sigma^2)\right];$$
in the summation layer all the values received from the pattern nodes are summed up;
• an output layer that computes the decision class between two classes, taking into account the following formula:
$$\sum_i \exp\left[-(w_i - x)(w_i - x)^T / (2\sigma^2)\right] > \sum_j \exp\left[-(w_j - x)(w_j - x)^T / (2\sigma^2)\right];$$
if the inequality is true, then the object belongs to class $i$, otherwise to class $j$.

The PNN's training algorithm has the following steps (Belciug and Gorunescu, 2020):

1. For each class $\Omega_i$, $i = 1, 2, \dots, q$, compute the distance between every pair of objects, $d_1, d_2, \dots, d_{r_i}$, where $r_i = C_{m_i}^2 = \frac{m_i!}{2! \cdot (m_i - 2)!}$, and $m_i$ is the total number of objects in class $\Omega_i$.
2. For each class $\Omega_i$, $i = 1, 2, \dots, q$, compute the mean and standard deviation of the distances computed at step 1:
$$D_i = \frac{\sum_{j=1}^{r_i} d_j}{r_i} \quad \text{and} \quad SD_i = \sqrt{\frac{\sum_{j=1}^{r_i} (d_j - D_i)^2}{r_i}}.$$
3. Search for the optimum value of $\sigma$ in the 99.7% confidence interval.
4. Compute the Parzen-Cacoulos estimates for each class $\Omega_i$, $i = 1, 2, \dots, q$:
$$f_{\Omega_i}(x) = \frac{1}{(2\pi)^{p/2}\sigma^p} \cdot \frac{1}{m_i} \sum_{j=1}^{m_i} \exp\left(-\frac{d(x, x_j)^2}{2\sigma^2}\right).$$

5. Select random objects from the training set and apply the Bayesian decision rule for each decision class. Compute the inequality:


IF $l_i \cdot h_i \cdot f_i > l_j \cdot h_j \cdot f_j$, for all $i \ne j$, THEN the object belongs to class $\Omega_i$; otherwise it belongs to class $\Omega_j$.

The search for $\sigma$ can be done in various ways. Here, we shall mention three methods:

• incremental: this method is the classic and most used one. The search domain is split into $N + 1$ equal parts, resulting in $\sigma_1, \sigma_2, \dots, \sigma_N$. We compute the performance of the algorithm using each $\sigma_i$, and select the one that gives the best accuracy.
• genetic algorithms: in this method the chromosome has only one gene, $\sigma$. We use a population formed of $N$ chromosomes; the Monte Carlo method is used as the selection operator; the recombination operator is the total arithmetic crossover, and for the mutation we use the non-uniform method. After a certain number of generations, the chromosome that gives the best accuracy is chosen.
• Monte Carlo: in this method we divide the search space into $N$ uniformly distributed random points. After we run the algorithm for each of these values, we select the one that gives us the best performance.

Now we have arrived at the final point in our magical mystery tour of what neural networks are: deep learning/convolutional neural networks.
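Before moving on, here is a minimal sketch that pulls the PNN pieces together: the Parzen-Cacoulos Gaussian estimate per class, weighted by the a priori probabilities $h_i$, with the largest score winning. The incremental search for $\sigma$ would simply call this classifier for each candidate value; all names here are our own.

```python
import numpy as np

def pnn_classify(X_train, y_train, x, sigma):
    """Return the class with the largest prior-weighted Parzen-Cacoulos score."""
    classes = np.unique(y_train)
    p = X_train.shape[1]
    norm = (2 * np.pi) ** (p / 2) * sigma ** p
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]             # the m_i objects of class c
        d2 = ((Xc - x) ** 2).sum(axis=1)       # squared distances d(x, x_j)^2
        f = np.exp(-d2 / (2 * sigma ** 2)).sum() / (norm * len(Xc))
        h = len(Xc) / len(X_train)             # a priori probability h_i
        scores.append(h * f)
    return classes[int(np.argmax(scores))]
```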

Deep learning/convolutional neural networks

A lot of people want to use deep learning no matter what. It is new, hip, and cool, so it immediately needs to be the best, right? Well, deep learning and convolutional neural networks (CNNs) should be used exactly when needed. If you have a choice and can use another, simpler, less time- and computation-consuming method to solve your problem, then you should choose that. CNNs are just like regular neural networks, except they have a lot of layers, and when we say a lot, well, we mean a lot! CNNs are very good at processing images, video, and sound.

The first CNN was developed by Kunihiko Fukushima (Fukushima, 1980). In 1979, he developed the neocognitron, a neural network with a multitude of layers that were set in a hierarchical manner. The learning paradigm was a reinforcement strategy, different from the modern CNNs. In the neocognitron, some features could be manually tuned in order to increase the weight of some connections between the neurons. In 1989, Yann LeCun presented at Bell Labs a CNN trained with backpropagation that could read handwritten digits (LeCun, 1989). Unfortunately, between 1985 and the 1990s, 15 years after the first AI winter, the second AI winter began. The research on neural networks and deep learning slowed down. AI reached pseudoscience status. Deep learning was reborn in 1999, when computers became faster and graphics processing units (GPUs) were developed.

The vanishing gradient problem appeared around the year 2000. Due to some activation functions that condensed their input, some features that were learned in the lower layers did not reach the upper layers, because of the loss of the learning signal. This problem affected the neural networks that used gradient descent as the learning method. The solutions proposed for solving this were layer-by-layer pre-training and long short-term memory.


In 2001, the Big Data revolution started. Data grew to be three-dimensional. In 2009, thanks to Fei-Fei Li, a Professor of AI at Stanford, the free ImageNet database was released. ImageNet contains 14 million labeled images for training CNNs. In 2011, due to the increase in performance of the GPUs, CNNs could be trained without the layer-by-layer pre-training.

Now let us see what all this fuss is about. We have seen that a regular neural network receives a vector as input. The vector contains the features of the object we want to classify. If we wish to classify an image with 40 px height, 40 px width, and 3 color channels (red, green, and blue—RGB), so an image with a size of 40 × 40 × 3, we would need 4800 synaptic weights to connect each neuron in the first hidden layer to all the neurons in the input layer. But, let us face the real facts: a CT scan, an X-ray, or histopathology images do not have a dimension of 40 × 40 × 3. If they had even a dimension of 400 × 400 × 3, that would imply 480,000 synaptic weights. For this kind of classification, CNNs are definitely needed.

The neurons from a CNN layer are organized in three dimensions: we have a dimension for the height, a dimension for the width, and a dimension for the depth. Here, when we refer to the depth dimension we talk about the three color channels, not the depth of the neural network. Another interesting fact about CNNs is that not all the neurons are interconnected. The neurons from a layer connect with only a small part of the neurons from the previous layer, and so on. The CNN's architecture contains three types of layers: the convolutional layer, the pooling layer, and the fully-connected layer. Through these layers the CNN transforms the three-dimensional input into a three-dimensional output of size 1 × 1 × number_of_classes. The CNN's architecture is depicted in Fig. 2.40.

FIG. 2.40 The CNN’s architecture.

In the convolutional layer the computation part takes place. In this layer we have hyperparameters called filters or kernels. The filter acts like a magnifying glass that examines each part of the image. The filter is also three-dimensional, for example 5 × 5 × 3: 5 px for the height, 5 px for the width, and 3 color channels. In the training phase, we slide the filter across the width and the height of the input image, and compute the dot product between it and the part of the input image that it highlights. This process produces two-dimensional activation maps that contain the response of the filter over every part of the input image.


A filter activates only when it recognizes something in the image: an edge, a spot, or a specific color. Moving forward layer by layer, each filter will create its own two-dimensional activation map that searches within the images for certain patterns. We set the size of the filter by tuning the receptive field hyperparameter. Using the value of the receptive field we can compute the number of weights of each neuron in every convolutional layer. For example, if we have an image with the size 40 × 40 × 3, and the receptive field is 5 × 5 × 3, then each neuron will have 5 × 5 × 3 = 75 weights. If we have another value for the image depth and filter size, say 40 × 40 × 9 with a filter size of 5 × 5 × 9, then every neuron in the convolutional layers has 5 × 5 × 9 = 225 weights. Besides all these connections, one must not forget to also add the bias. A visual representation of a convolutional layer is depicted in Fig. 2.41.

FIG. 2.41 A convolutional layer from a CNN.

In the output layer, we must take into account three other hyperparameters: the depth, the stride, and the zero-padding. When we decide the number of filters we will use, we set the depth. All neurons that are set to discover patterns in a specific part of the image form a depth column, or fiber. The neurons activate only when they discover a certain pattern in their depth column (an edge, a color, etc.). When we decide how fast we are sliding the filter across the image, we are setting the stride. If we have the stride set to 1, then our filter moves 1 px at a time; if we have the stride set to 2, then our filter moves 2 px at a time, and so on. Keep in mind that even if the choice is yours, in general practice the value of the stride is 1 or 2. If we create a border of the image by padding 0 to the edges, we are zero-padding the image. The number of neurons in the output layer equals:

$$\frac{W - F + 2 \cdot P}{S} + 1,$$

where $W$ is the input size, $F$ is the filter (receptive field) size, $P$ is the zero-padding, and $S$ is the stride. Please make sure that the above equation has an integer as its result. We shall continue by presenting a computational example. Fig. 2.42 depicts a two-dimensional convolution with no padding and stride 1.
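The arithmetic is easy to check in code. A tiny helper (ours, purely for illustration) computes the spatial size of the activation map and asserts the integrality requirement:

```python
def conv_output_size(w, f, p, s):
    """Spatial output size: (W - F + 2P)/S + 1 -- must be an integer."""
    out = (w - f + 2 * p) / s + 1
    assert out == int(out), "incompatible filter/stride/padding combination"
    return int(out)

# a 40 x 40 input with a 5 x 5 filter, no padding, stride 1 -> 36 x 36 map
print(conv_output_size(40, 5, 0, 1))   # 36
```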


FIG. 2.42 Example of a two-dimensional convolution.

In 2016, the term dilated convolution appeared (Yu and Koltun, 2016). If we add space between the cells of a filter, we create a dilation. In Fig. 2.43 we present a one-dilated convolution versus a two-dilated convolution. We use dilated convolutions if we wish to increase the global view of the network while keeping the growth of the hyperparameters linear. Basically, we can integrate knowledge of a wide context using fewer resources.

FIG. 2.43 Example of a one-dilated convolution and a two-dilated convolution.

Recall that when we are using classical neural networks, our output is fully connected to all the input neurons. In a CNN we can change that by using sparse connectivity or sparse weights (Goodfellow et al., 2016). By tuning the kernel size we can control the sparse connectivity. A smaller kernel can detect significant details in the input image. This is especially important when processing medical images, due to the fact that some tumors might be very small, and the human eye might not detect them. Sparse connectivity saves resources and gains statistical efficiency. Figs. 2.44 and 2.45 show the differences between full connectivity and sparse connectivity.


FIG. 2.44 Input view of connectivity in a classical neural network.

FIG. 2.45 Input view of connectivity in a CNN.

In a fully connected neural network any neuron from the input layer is connected to all the neurons in the output layer, and the other way around: a neuron from the output layer is connected to all the neurons in the input layer. Using sparse connectivity, we see that a neuron from the output layer interacts with only a certain number of input neurons (see Figs. 2.46 and 2.47). This does not imply that the output neurons are not affected at all

FIG. 2.46 Output view of connectivity in a classical neural network.


FIG. 2.47 Output view of connectivity in a CNN.

FIG. 2.48 Sparse connectivity in a CNN.

by all the neurons from the input layer. The connections still exist, only they are indirect, not direct (see Fig. 2.48). Besides sparse connectivity, in a CNN we can share the same parameter for more than one function. This process is called parameter sharing. Mathematically speaking, in a classical neural network a weight is used by only one neuron; in a CNN, by sliding the kernel across the input, the weights are used by multiple neurons, hence they are shared.

The pooling layer is a special case of layer, and it is usually inserted after several successive convolutional layers. We use the pooling layer to reduce the number of parameters and the spatial size, hence the computational cost, and to control overfitting. There are several types of pooling layers that can be used in a CNN: max pooling, average pooling, and L2-norm pooling. Ultimately, each type of pooling layer replaces a convolutional layer with a layer that contains the summary statistics of the convolutional one. The most frequently used is max pooling, introduced by Zhou and Chellappa in 1988 (Zhou and Chellappa, 1988). Max pooling works like this: we divide the space into rectangular parts according to the size of the pooling layer, and we return the maximum of each rectangular slice, thus reducing the convolutional layer size. In practice the pooling layer is 2 × 2, with stride 2 and no zero-padding. Another pooling layer that can be used is the overlapping pooling, with size 3 × 3, stride 2, and no zero-padding. If we have 4 numbers and we keep only the maximum of them, we reduce the size of the activation by 75%. Fig. 2.49 presents a max pooling example.

FIG. 2.49 Max pooling layer example.

The pooling layer has another major advantage: by applying it we make the image representation invariant to translation. This means that we are more interested in whether a specific pattern exists in the image, rather than where that pattern is positioned in the image. For example, if we are looking for a mass in a mammography, we are interested in whether that mass exists, not whether it is on the right or left side of the breast. Obviously, the idea of using pooling layers is not shared with the same joy by every scientist. Another idea is to use convolutional layers repeatedly, with no pooling layer in between them (Springenberg et al., 2015). For the reduction of the representation, a large stride can be used once in a while.

The fully connected layer is exactly like the layers that are present in a traditional neural network. In general, CNNs are trained using the backpropagation algorithm. Luckily enough, the activation function between the layers is the ReLU, so the computation done during gradient descent is easy, due to the fact that the gradient simply passes through the units that were active in the forward pass. The hard part, just like in any other AI method, is the setting of the hyperparameters. Various methods can be applied depending on the problem that needs to be solved. For more details regarding CNNs we refer the reader to Goodfellow et al. (2016).

Various classification methods can be applied in cancer research. We mentioned above that we do not always have to use complicated techniques such as deep learning to solve problems that can be solved with the use of simpler models. In what follows we are going to present some algorithms that are easy to understand, implement, and apply, while giving excellent results.

k-Nearest neighbor

k-Nearest neighbor (k-NN) is a supervised machine learning method that can easily solve classification problems. The idea behind the k-NN is that we presume that similar objects are in close proximity. Taking into account the method's name, the assumption is that similar objects are found near each other, being neighbors. k-NN is good because it is fast, easy, and does not make any presumption regarding the data (unlike, e.g., the naive Bayes classifier, which assumes that the features are independent from each other). On the other hand, the method also has its disadvantages, such as: the performance of the method depends on the quality of the data it is applied on, and its accuracy drops when dealing with borderline objects that can be classified either way.


The performance of the k-NN is low if the data contains noise or has irrelevant attributes. A way to resolve this is to scale the features according to their importance. For further reading regarding the scaling of features we refer the reader to Garcia Laencina et al. (2009), Nigsch et al. (2006), and Sorjamaa et al. (2005). Another problem with the k-NN is that it is a lazy algorithm, because it does not generalize the data. k-NN is recommended when we want a simple recommendation system (e.g., Netflix or Amazon use k-NN to make recommendations regarding what movies to watch, or what to buy, given what you have been watching/buying so far). Before we dive into the mathematical background of the algorithm, we will present visually how k-NN tries to classify an object given a data set (Fig. 2.50).

FIG. 2.50 k-Nearest neighbor method.

Mathematically speaking, in the k-NN algorithm we compute the distances between the objects in the data set. The most used metric is the Euclidean distance. Thus, the first step that we should undertake is to transform the data into vectors and compute the distances using the following formula:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}.$$

k-NN identifies the k nearest objects (neighbors) by ordering the distances computed between the new object and the vectors in the training data set. After this step, all the k neighbors vote, by using their label, to determine the label of the new object. Technically, we assign the label that is the most encountered among the k training vectors. The hardest part of using this algorithm is finding the optimum value of k. If we use a large value, then we reduce the effect of the noise in our decision problem, but the disadvantage is that the boundaries between classes fade away. One way to set k is to use the cross-validation method. We split the data set into m subsets. The subsets must be randomly drawn and disjoint. We use a fixed value for k and apply it on the mth subset, evaluating its error on a cycle. After the m cycles have passed, we compute the mean of the obtained errors and see how the classifier performed using that value of k.


The process is repeated for different values of k, and the value that obtains the highest performance is the optimal value of k. Over time, k-NN has been improved in various ways. Here we are going to mention only: proximity graphs (Toussaint, 2005), adaptive distance measures (Wang et al., 2007), receiver operating characteristics (Hassan et al., 2008), support vector machines (Srisawat et al., 2006), and the parallel exemplar learning system (Cost and Salzberg, 1993).
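The whole method fits in a few lines. The sketch below is an illustration only, using the Euclidean distance and simple majority voting to classify one new object:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Label x with the most common class among its k nearest training objects."""
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # Euclidean distances
    nearest = np.argsort(dists)[:k]                     # indices of the k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]
```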

Clustering

Up until now we have covered only supervised learning methods. In what follows we shall present the concept of clustering analysis, also known as clustering. Clustering finds groups of objects that resemble one another in some way. A good clustering analysis will put together objects that have great similarity within the group, while objects that belong to different groups will be very different from each other. Fig. 2.51 illustrates the concept of clustering.

FIG. 2.51 Clustering analysis.

When it comes to clustering we have two approaches:

• non-hierarchical/partitional/flat clustering;
• hierarchical clustering.

Three steps must be performed when we want to use clustering analysis: first, we need to define a similarity measure; second, to decide what building process we want for our data; and third, to design and implement the clustering process we have decided upon. Because in clustering we want to find the objects that resemble one another in some way, we need to settle on the similarity measure we will be using during the process. Besides this, one must take into consideration which method she/he wants to use for the validation of the obtained results (Hand et al., 2001; Tan et al., 2005). We can choose the way we want to validate our model between:

• internal validation: evaluates the performance internally (e.g., sum-of-squares, MSE), not taking into account any external information;
• external validation: compares the performance of our clusters relative to another model's performance through statistical testing.


When we want to perform cluster analysis we should use the following checklist:

• select the objects for clustering or the problem statement;
• select an appropriate similarity measure based on the nature of the objects we want to classify;
• select a clustering method;
• select the stopping criteria;
• interpret the results (e.g., graphical interpretation, etc.);
• validate the model.

When selecting the similarity measure we need to take into account two issues: the similarity measure must minimize the intra-cluster distance, and at the same time it must maximize the inter-cluster distance. For this, a definition of the distance between two clusters is in order:

• single linkage or nearest neighbor: the distance between the two closest objects that belong to different clusters determines the distance between the two clusters.
• complete linkage or furthest neighbor: the distance between the two furthest objects that belong to different clusters determines the distance between the two clusters.
• group average-unweighted pair group average: the average distance between all pairs of objects that belong to different clusters determines the distance between the two clusters.
• group average-weighted pair group average: the same as the unweighted pair group average, the only difference being that in the average computation we take into account the number of objects in the clusters as a weight.
• unweighted pair group centroid: we compute each cluster's centroid (the average point in the multidimensional space defined by the objects); the distance between the two centroids determines the distance between the two clusters.
• weighted pair group centroid: the same as the unweighted pair group centroid, the only difference being that we take into consideration the number of objects in each cluster.
• Ward's method: different from the above methods due to the fact that it uses the analysis of variance to determine the distance between two clusters. For example, it tries to minimize the sum of squares of two clusters that can be hypothetically formed (Ward, 1963).

Next, we shall present some of the most frequently used similarity measures. Thus, having two objects with the same number of features $n$, $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$, we can compute their distance using one of the following formulas:

1. Minkowski distance:
$$d_p(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}, \quad p \in \mathbb{N}.$$


We use $p$ to control the weight placed on the differences between the individual dimensions (features). There is a generalization of the Minkowski distance, named the power distance, which has the following formula:
$$d_{p,r}(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/r}.$$
In this case $r$ controls the weight placed on the difference between the objects. If we set the value of $p$ to 1, then we have the Manhattan distance/city block/taxicab/L1 distance:
$$d_M(x, y) = \sum_{i=1}^{n} |x_i - y_i|.$$
If we set the value of $p$ to 2 then we are dealing with the Euclidean distance/L2 distance/as-the-crow-flies distance:
$$d_E(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^2\right)^{1/2}.$$
If we care to use the Euclidean distance, we must take into account that we need to use raw data to compute the distance, not standardized data. If we set the value of $p$ to $\infty$ then we are dealing with the Chebychev distance:
$$d_C(x, y) = \max_i |x_i - y_i|.$$
If we use the Manhattan distance for binary objects then we are dealing with the Hamming distance. The difference is computed as the number of bits that differ between the two objects.
2. Tanimoto distance:
$$d_T(x, y) = \frac{x \cdot y^T}{x \cdot x^T + y \cdot y^T - x \cdot y^T}.$$
3. Cosine distance:
$$d_c(x, y) = \frac{x \cdot y^T}{\lVert x \rVert \cdot \lVert y \rVert}.$$
4. Jaccard distance together with the Jaccard index: these are often used in statistics to measure the similarity of two sample sets. To compute the Jaccard distance, we first have to compute the Jaccard index $J(A, B)$, which is the ratio of the size of the intersection between the two samples $A$ and $B$ and the size of their union:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}.$$


The Jaccard distance is given by the following formula: $1 - J(A, B)$.
5. Pearson's r distance:
$$r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$
6. Mahalanobis distance: uses a symmetric, positive definite matrix $B$:
$$d_B(x, y) = \sqrt{(x - y) \cdot B \cdot (x - y)^T}.$$
If the two samples have the same distribution and covariance matrix, then we can write the Mahalanobis distance as:
$$d_D(x, y) = \sqrt{(x - y) \cdot \mathrm{cov}(D)^{-1} \cdot (x - y)^T}.$$
If the covariance matrix is the identity matrix, then the Mahalanobis distance reduces to the Euclidean distance.
7. Fuzzy extension distances: tools built for comparing vectors that have values from the $[0, 1]$ interval. These vectors are called fuzzy vectors. If the fuzzy vector has only binary elements, then we are dealing with crisp vectors. A fuzzy extension distance is the fuzzy Minkowski measure, which has the following formula:
$$d_F(x, y) = \left(\sum_{i=1}^{n} s(x_i, y_i)^p\right)^{1/p},$$
where $s(x_i, y_i) = \max\{\min\{x_i, y_i\}, \min\{1 - x_i, 1 - y_i\}\}$.
If we wish to rank the importance of the vector's features, we can incorporate weights in the similarity measures. Let us take as an example the weighted Minkowski measure, which has parameters $\alpha_i$, $i = 1, 2, \dots, n$, that weight each feature. We have two constraints regarding $\alpha_i$: $\alpha_i > 0$ and $\sum_i \alpha_i = 1$:
$$d_{p,r}(x, y) = \left(\sum_{i=1}^{n} \alpha_i \cdot |x_i - y_i|^p\right)^{1/r}.$$
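A few of these measures, written as plain numpy helpers (our own illustrative implementations, following the formulas above):

```python
import numpy as np

def minkowski(x, y, p):
    return (np.abs(x - y) ** p).sum() ** (1 / p)   # p=1 Manhattan, p=2 Euclidean

def chebychev(x, y):
    return np.abs(x - y).max()                     # the p -> infinity limit

def cosine(x, y):
    # as defined in the text; note it is the cosine of the angle between x and y
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def jaccard_distance(a, b):
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)             # 1 - J(A, B)
```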

When we are interested in applying AI in cancer, we need to take note of the fact that the objects are represented through mixed vectors that have components of different natures: numerical (e.g., age, weight, hemoglobin, creatinine, etc.), categorical (e.g., cancer type, gender), rank (e.g., cancer stage, cancer grade, tumor size, etc.), and fuzzy (e.g., different risk factors: smoking, alcohol, etc.). If this is the case, we cannot use the similarity measures mentioned before; we need to apply special ones.


An idea is: if we have mixed vectors, why not have mixed similarity measures? Let us take for instance two objects with $s$ mixed types of data in them. We have the following dimensions: $k_1, k_2, \dots, k_s$. Our vectors look like this:
$$x = \left(x_1^1, x_2^1, \dots, x_{k_1}^1, x_1^2, x_2^2, \dots, x_{k_2}^2, \dots, x_1^s, x_2^s, \dots, x_{k_s}^s\right),$$
$$y = \left(y_1^1, y_2^1, \dots, y_{k_1}^1, y_1^2, y_2^2, \dots, y_{k_2}^2, \dots, y_1^s, y_2^s, \dots, y_{k_s}^s\right).$$
The first step we need to perform is to establish a ranking between the attributes, and assign certain weights $\alpha_j$ to them, with $\sum_j \alpha_j = 1$. Next we order them according to their rank. For each type of attribute we shall choose the appropriate similarity measure. Let us suppose that $x^j = (x_1^j, x_2^j, \dots, x_{k_j}^j)$ and $y^j = (y_1^j, y_2^j, \dots, y_{k_j}^j)$; then we would have the following mixed-weighted distance measure:
$$d(x, y) = \sum_{j=1}^{s} \alpha_j \cdot d_j(x^j, y^j).$$
With all this new knowledge in mind, let us see several types of cluster analysis algorithms.

Non-hierarchical clustering

K-means is the most used clustering algorithm (MacQueen, 1967). This is because it is easy to understand, use, and implement. With k-means we assume a priori knowledge of the number of clusters. The problem of determining the optimal number of clusters that leads to the best linear separation of the objects still remains open. Just like in the k-nearest neighbors' case, the most common way to determine k is the cross-validation technique. In order to apply k-means clustering, one must follow these steps:

1. Select k, the number of groups, and randomly generate their center points (centroids). The center points are vectors that have the same length as the objects in the data set.
2. Assign a cluster to each object by computing the distance between each point and each group center, and choosing the group whose center is the closest.
3. Taking into account the objects' groups at this iteration, compute the group centers as the mean of all the objects in that group.
4. Repeat these steps until the clusters do not change too much between iterations, or until we reach a certain predefined number of iterations.

Mathematically speaking, the k-means algorithm is:

1. Having $N$ objects in the training data set, $x_l = (x_{l1}, x_{l2}, \dots, x_{ln})$, $l = 1, 2, \dots, N$, set the $k$ centroids $c_j$, $j = 1, 2, \dots, k$.
2. Split the objects into $k$ disjoint subsets $S_j$, containing $N_j$ objects each, so that the sum-of-squares clustering function is minimized:
$$S = \sum_{j=1}^{k} \sum_{l \in S_j} \lVert x_l - c_j \rVert^2,$$

where $c_j$ is the mean of the objects from $S_j$:
$$c_j = \frac{\sum_{l \in S_j} x_l}{N_j}.$$
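A compact numpy version of the two alternating steps (assignment and centroid update) might look as follows; the random initialization and the empty-cluster guard are our own choices:

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random centroids
    for _ in range(n_iters):
        # assignment step: each object goes to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid becomes the mean of its subset S_j
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # stop when the clusters settle
            break
        centroids = new
    return labels, centroids
```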

Besides its advantages, the k-means algorithm also has disadvantages. The most serious one is the fact that you have to select how many clusters there are. This is not an easy task, and when we are applying clustering analysis we are interested in a method that establishes this on its own. The second disadvantage is that the first step of the algorithm starts with a random choice of the centroids, thus different runs of the algorithm might end up with different results. k-Means can be used for cancer treatment segmentation. For instance, in Gorunescu (2003), the authors considered four types of cancer treatment: chemotherapy; chemotherapy + hormonotherapy; chemotherapy + radiotherapy + curietherapy; and chemotherapy + radiotherapy + curietherapy + hormonotherapy. The features that were taken into consideration were: average tumor diameter, age, and disease stage. Besides this, the authors took into consideration the treatment type of each patient. Thus, they tailored a specific model that correlated a certain patient with a specific treatment. In Belciug et al. (2010) the authors used clustering to detect breast cancer recurrence.

k-Medians is another clustering algorithm that resembles k-means, except that it computes the median vector, not the mean vector (Jain and Dubes, 1988; Bradley et al., 1997). The advantage of this method is just like the advantage of using the median instead of the mean: it is less sensitive to outliers. The disadvantage is that by sorting all the vectors at each iteration, the algorithm is very slow.

The mean-shift clustering method is a mode-seeking algorithm that tries to find dense areas of objects (Cheng, 1995). The mean-shift works with so-called sliding windows. The goal of the algorithm is to locate the centroids of each cluster, by updating candidates for center points so that they are the mean of the objects within the sliding window. A filtering process is done in the post-processing stage in order to eliminate the near duplicates. For a better understanding we shall present the mean-shift algorithm step by step:

1. Consider a set of objects in a two-dimensional space. Having a circular sliding window centered at a random centroid point, with radius r as kernel, we begin to shift it iteratively to a region with a higher density (the density is directly proportional to the number of points that exist in that window), until we reach convergence.
2. At each iteration, we shift the center point to the mean of the objects that are situated in that window. By shifting the center point toward the mean, we gradually reach the higher point density.
3. We continue to shift until we see that the density stops increasing.
4. If two or more windows overlap, the window that contains the most points is kept.

The advantage of this method is that unlike k-means or k-medians, the mean-shift discovers the number of clusters by itself. The disadvantage consists in the fact that we must choose the window size r, which is not an easy job to do.


Density-based spatial clustering of applications with noise, or DBSCAN, is another density-based clustering analysis method that somewhat resembles the mean-shift technique (Martin et al., 1996). When using DBSCAN we randomly choose a starting point that we haven't visited yet. Using a distance epsilon ε we extract the neighbors: if the distance between that pinpointed point and another point is less than ε, then the two points are neighbors. We set a value for a parameter named minimumPoints. If the number of points in the current neighborhood surpasses minimumPoints, then we begin the clustering analysis process: the current point becomes the new cluster's first point and we mark it as visited. If the number of points is not enough, being less than minimumPoints, then we label the current point as an outlier, and we mark it as visited as well. Again, taking into account the distance ε, we populate the new cluster with the points within this distance. We repeat these steps until all the points within that cluster have been visited. We then randomly choose another unvisited point and repeat the steps for it. When the process ends we will have clusters and noise. The main advantage of DBSCAN is that it can figure out by itself the needed number of clusters; on the other hand, setting the distance ε and the value of minimumPoints is difficult.

The last partitional clustering method that we are going to present is Expectation maximization (EM) clustering using Gaussian Mixture Models (GMMs) (Bishop, 2006). GMMs are flexible, and presume that the objects are Gaussian (normally) distributed. Recall that k-means presumes that the objects are circular around the mean. If we use GMMs, besides the mean, we also use the standard deviation. Thus, our clusters won't look like circles; they will have elliptical shapes. Each cluster will have its own Gaussian distribution. A new question arises: how do we find the mean and standard deviation for each cluster? We can find them by using EM. The first step is to select a number of clusters and to randomly set the mean and standard deviation for each cluster. Even if we start with a bad choice of mean and standard deviation, the GMM optimizes quickly. Having these parameters, we can compute the probability that each object belongs to a certain cluster. The closer a point is to the mean of a cluster, the higher the probability that the point belongs to that cluster. Having these probabilities, we estimate the new mean and the new standard deviation of the GMMs. The mean and standard deviation are computed using a weighted sum, with the weights equaling the probabilities. We repeat the process until there are no more changes in the clusters from one iteration to the next. We have several advantages when using GMMs. The GMMs are flexible, and k-means is not. The standard deviation makes the clusters' shape elliptical, not circular. Mathematically speaking, the k-means algorithm is in fact a special case of GMMs, where the covariance of the clusters approaches 0 along all the dimensions. Another advantage is that GMMs are not "black and white." Using probabilities we can say that a point belongs with some probability to the first cluster and with another probability to the second cluster; thus a point can belong to multiple clusters, each with a certain probability.
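Both methods are available off the shelf. A short sketch with scikit-learn follows; the two-dimensional random data is only a stand-in for real patient features, and the parameter values are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))   # stand-in data

# DBSCAN: eps and min_samples play the roles of epsilon and minimumPoints
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
print(np.unique(db.labels_))        # label -1 marks the points left as noise

# EM with a Gaussian Mixture Model: soft, elliptical clusters
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gmm.predict_proba(X)[0])      # one probability per cluster for an object
```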

Hierarchical clustering

When we are dealing with hierarchical clustering we have three approaches: top-down or divisive, bottom-up or agglomerative, and conceptual.


The agglomerative method considers in the beginning that each object represents a cluster, and afterwards it successively merges pairs of small clusters until all the data is merged into a single one. The clusters are graphically represented through a dendrogram. The dendrogram is a tree that has as its root the unique cluster that contains all the data, and as its leaves each data sample. In order to build an agglomerative hierarchical clustering we begin by assigning a small cluster to each object in our data set. If we have n objects, then we will have n clusters. The next natural step is to select the distance metric that we would like to use to obtain the distance between two clusters (e.g., single linkage). At each iteration we combine clusters two by two, taking into account the smallest value of the linkage. This process is repeated until we have just one cluster that contains all the data. We can choose the number of clusters we want our data to be grouped into. Divisive clustering works the other way around, the only difference being the way we build the tree.

In the beginning of this chapter we mentioned that imagistic methods are used to diagnose cancer. Every time a doctor has a suspicion regarding cancer, the radiology industry gets to work by performing X-rays, CT scans, MRIs, and other tools that allow the doctor to look inside the patient's body without making a cut. The radiology technicians take the pictures and the radiologists examine them in order to make their diagnosis. Can AI help them read the images faster and more accurately? The answer is yes. An article published in the Journal of the National Cancer Institute in September 2019 presents a comparison between the performance of a stand-alone AI system that detects breast cancer in mammography and the performance of 101 radiologists (Rodriguez Ruiz et al., 2019). The study was applied on nine data sets (Wallis et al., 2012; Visser et al., 2012; Hupse et al., 2013; Siemens Medical Solutions, 2015; Garayoa et al., 2018; Rodriguez Ruiz et al., 2018; Clauser et al., 2019) from seven countries (Sweden, United Kingdom, Netherlands, Italy, United States, Spain, Austria). The mammograms were taken with devices from Hologic, Siemens, General Electric, and Philips. The reported results show that the AI performed better than the average of the 101 radiologists. The AUROC of the AI was 0.840, with a (0.820, 0.860) 95% confidence interval, and the average AUROC of the radiologists was 0.814, with a (0.787, 0.841) 95% confidence interval. The AI system outperformed 61.4% of the radiologists, if we take into consideration the AUC. The reported radiologists' specificity ranged from 0.49 to 0.79, whereas the sensitivity ranged from 0.76 to 0.84. In what regards the specificity and sensitivity of the AI system, the reported results show a higher sensitivity on five data sets (1.0%–8.0%) and a lower one on three (1%–2%). The AI system uses deep learning CNNs, feature classifiers, and image analysis algorithms for the detection of soft tissue lesions and calcifications (Bria et al., 2014; Hupse and Karssemeijer, 2009; Karssemeijer, 1998; Karssemeijer and Te Brake, 1996; Mordang et al., 2016). The images processed by the system use mediolateral oblique and cranio-caudal views of each breast. The data sets used were unbalanced, the AI system being trained, validated, and tested on 9000 cancer mammograms (one third of which were with lesions with calcifications), and 180,000 cancer-free mammograms.
The data set features were: the digital mammography exam, the radiologists' score of each digital mammography exam, and the ground truth classification. The classes that corresponded to the radiologists' score were reported in the form of the Breast Imaging Reporting and Data System (BI-RADS—a risk assessment and quality assurance tool that has been developed by the American College of Radiology).


The decision classes are: 1—negative, 2—benign finding, 3—probably benign, 4—suspicious abnormality, 5—highly suspicious of malignancy. Probability-of-malignancy scores ranging from 1 to 100 were also used. Different from the standard practice applied in screening programs, where the interpretation involves double reading plus consensus or arbitration, depending on the case, here the digital mammographies were evaluated by individual radiologists. The ground truth was confirmed by histopathology or at follow-up (after at least 1 year). The total number of interpretations was 28,296, from 2652 cases, of which 653 were malignant. The conclusion of the study was that the AI system is as good as the average screening radiologist. Perhaps a more thorough statistical analysis should have been made: taking into account the fact that the data set was unbalanced, precision-recall curves should also have been computed.

A study published in May 2019 in Nature Medicine shows how a three-dimensional deep learning neural network can screen for lung cancer on low dose chest CT (Ardila et al., 2019). The regular chest X-rays that were used for lung cancer screening do not help people live longer. Fortunately, a new test known as a low dose CAT scan or CT scan has proved to be more efficient when it comes to saving lives. The low dose CT can reduce mortality by 20%–43% and is now part of the US screening guidelines (American Lung Association, 2019; Jemal and Fedewa, 2017; US Preventive Services Task Force, 2018; National Lung Screening Trial Research Team et al., 2011; Black et al., 2014). The AI system proposed in this study uses both the patient's current and prior CT volumes for lung cancer risk prediction. The authors reported a 0.944 AUC obtained on predicting 6716 National Lung Cancer Screening Trial cases, and also a similar AUC on the validation set of 1139 cases.

Another research paper tries to establish the prostate cancer diagnosis without measuring the Gleason score of biopsied tissues (Yuan et al., 2018). The authors developed a multiparametric magnetic resonance transfer learning model that can discriminate features in prostate images and automatically stage prostate cancer. A deep learning neural network with three branch architectures was designed. The CNN transferred a pre-trained model to compute features from multiparametric MRI images: T2w transaxial, T2w sagittal, and apparent diffusion coefficient. The multiparametric MRI representations are obtained by concatenating all the learned features. Joint constraints of softmax loss and image similarity loss help the system to classify the images. Two data sets were used for testing the performance and robustness of the proposed model: the first contains 132 cases from the researchers' institutional review board patient data set, and the second contains 112 cases from the PROSTATEx-2 Challenge. The model obtained a mean accuracy of 86.92%.

Another AI system, consisting of three incremental deep neural networks, was developed for fully automatic brain tumor segmentation (Ben Naceur et al., 2018). The method was used for segmenting both low and high grade Glioblastomas. When dealing with Glioblastomas the doctors take into account four features: size, shape, contrast, and the place where they appeared in the brain (they can appear anywhere).
The authors applied an ensemble learning technique to design an efficient model. The AI system was tested on the BRATS-2017 data set.


The authors reported that their model achieved similar performance to other state-of-the-art techniques, while being the fastest on the market, with a mean accuracy of 88% in 20.87 s. Remaining in the brain tumor department, a deep learning CNN was developed for segmenting brain metastases on high contrast enhanced T1-weighted MRI data sets (Liu et al., 2017). The model was tested on the BRATS-2017 data set and clinical patients' data. The reported result on the BRATS data was an AUC of 0.98 ± 0.01.

And more is yet to come, with the release of a massive data set of CT images by the National Institutes of Health that any data scientist can use to build more efficient and robust AI algorithms. The data set contains 32,000 annotated images of 4400 anonymized patients, covering all kinds of radiology findings such as liver tumors, lung nodules, enlarged lymph nodes, etc. The images can be found via Box at https://nihcc.box.com/v/DeepLesion. Besides this massive data set, the NIH also published a roadmap for AI in medical imaging. Their hope is that this roadmap might bring together data scientists from different fields, standards bodies, private industry, governmental agencies, etc., so that their work would enhance the use of AI in imaging, and thus patients around the world would benefit from it. The roadmap has been published in Radiology (Langlotz et al., 2018).

If we look at the numbers, we see that 10% of patients' deaths are caused by diagnostic errors (National Academy of Medicine, 2015). Radiologists misinterpret between 3% and 6% of images (Borgstede et al., 2004; Berlin, 2007; Waite et al., 2017). The report points out the major barrier to progress in AI applied to medical imaging: the lack of standard and accessible data sets for the training and testing of AI techniques. Unfortunately, even if the health care system controls large data sets concerning medical imaging, access to these data sets is off limits for data scientists. For the situation to improve, we need to find efficient methods to collect data, to de-identify data so it remains anonymous, to correctly label it, to link it to the ground truth, and to find the means to manage what has been collected (Wilkinson et al., 2016). For data scientists it is hard to find good medical imaging data sets that are publicly available. Fortunately, Google developed a data set search tool that returns web-based repositories that contain medical data sets (Noy, 2018).

An interesting point raised by the NIH roadmap is the reconstruction and enhancement of CT, MRI, PET, US, and optical scans. An emerging area of extensive study is the conversion of source data from the sensors into reconstructed tomographic images that can be interpreted by humans, or radiomics (the extraction of mineable high dimensional data from clinical images (Rizzo et al., 2018)). Deep learning can be used to reconstruct these images (Wang et al., 2018). MRI reconstruction from a low dose counterpart or a low radiation dose scan can be done with the use of deep learning. The result will be shorter scan times and lower cost imaging devices (Chen et al., 2019). Another important research direction for AI applied to medical imaging is the development of a rapid labeling and annotation method for clinical images. The imaging studies that are stored now are not suitable for the training of AI algorithms.
One idea is to use algorithms that semi-automatically discover patterns in images, so the only thing that a doctor needs to do is to modify the generated pattern rather than generate each annotation from scratch (Hoogi et al., 2017). In Weston et al. (2019), the authors developed a convolutional neural network based on the U-net architecture, which was tested for automated abdominal segmentation of CT scans.


The training data set contains 2430 two-dimensional CT scans and the testing data set contains 270 CT scans. The authors did a second testing phase, this time on a data set of 2369 patients with hepatocellular carcinoma. The patients' ages were between 29 and 94 years for male patients, and 31 and 97 years for female patients. A two-way analysis with Bonferroni correction was used for assessing the differences in segmentation performance. The Bonferroni correction is used for several dependent or independent tests; it is a multiple comparison correction applied when performing statistical tests simultaneously (Bonferroni, 1935, 1936). The reported performances for the first test data set were: for the subcutaneous compartment, mean 0.98 with standard deviation 0.03; for the muscle compartment, mean 0.96 with standard deviation 0.02; and for the adipose tissue compartment, mean 0.97 with standard deviation 0.01. As for the second test data set, the hepatocellular carcinoma one, the authors reported the following performances: for the subcutaneous compartment, mean 0.94 with standard deviation 0.05; for the muscle compartment, mean 0.92 with standard deviation 0.04; and for the adipose tissue compartment, mean 0.98 with standard deviation 0.02.

Some AI methods can be applied to classifying data that has been extracted from medical imaging, for instance features of the tumor extracted from mammographies: radius, texture, perimeter, smoothness, area, concavity, concave points, compactness, symmetry, fractal dimension, etc. For data sets that contain these kinds of attributes we can apply different methods, not just convolutional neural networks. An example is the application of the Bayesian paradigm presented above. In Belciug and Gorunescu (2014) the authors applied this type of learning in classifying breast and lung cancers. The data sets used are publicly available, and can be found at the following links: http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29 (accessed November 9, 2019), http://archive.ics.uci.edu/ml/datasets/Breast+Cancer (accessed November 9, 2019), http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Prognostic%29, and http://archive.ics.uci.edu/ml/machine-learning-databases/lung-cancer/lung-cancer.names (accessed November 9, 2019). The researchers applied their method and obtained accuracies between 62.88% and 81.31%, comparable with those obtained by state-of-the-art AI algorithms (Gorunescu et al., 2010). The evolutionary paradigm mentioned above has been tested on the same breast cancer data sets used in Belciug and Gorunescu (2014). Here, the authors compared their method with classical MLP, RBF, PNN, and PCNN, and obtained higher accuracies, ranging from 81.11% to 93.58%. Their model was surpassed on only one data set, by the MLP, which obtained 95.52% (Belciug and Gorunescu, 2013). The evolutionary paradigm has also been applied in Gorunescu and Belciug (2016), Gorunescu and Belciug (2014), and Gorunescu et al. (2012).

We believe that probably the most interesting point raised by the NIH is the lack of understanding of how an AI algorithm makes a decision. The roadmap article states that this is an important impediment that stops AI from going further toward clinical adoption and regulatory approval. We hope that through the explanations in this book the famous saying that "a neural network is just like a black box" is long gone.
In their report, they point out that deep neural networks depend on bias (Char et al., 2018), and that adversarial attacks can be performed against them (Finlayson et al., 2018). But what exactly is an adversarial attack?—https://towardsdatascience.com/tagged/security-for-ml (accessed November 12, 2019). An adversarial attack is basically an evasion attack that feeds your network with adversarial examples. An adversarial example is a cautiously perturbed input that looks the same to a human eye, but which completely messes up your classifier.

2.1 Doctor’s suspicion. Doctor + artificial intelligence combo’s diagnosis

149

cautiously perturbed input that looks the same for a human eye, but which completely messes up your classifier. This is not a new issue data scientists have to confront with. Battista Biggio and Fabio Roli showed in their paper (Biggio and Roli, 2018) that these attacks on AI algorithms date back from 2004. The first attacks were spam emails, and the examples studied were spam filters. The most recent types of attack came along when deep learning became the star of AI. While trying to understand how deep learning “thinks,” Christian Szegedy from Google AI discovered that deep learning neural networks can be easily fooled by small perturbations (Szegedy et al., 2013). For more details regarding adversarial attacks examples we refer the reader to Sharif et al. (2016) for the fooling face detection by just placing adversarial glasses onto someone’s face example; https://github.com/advboxes/AdvBox/blob/ master/applications/StealthTshirt/README.md (accessed November 12, 2019); for the T-shirt that has a pattern printed on it to hide the person who is wearing it from an object detector (Anderson et al., 2018); for the evasion of a malware classifier using reinforcement learning. But does this phenomenon happen? We encounter one hypothesis in Szegedy’s paper (Szegedy et al., 2013), where it is argued that adversarial attacks exist because of the nonlinearity and poor regularization of neural networks. Another theory, that is opposite to Szegedy’s hypothesis is the one of Goodfellow’s, which states that the reason we have adversarial examples is because of too much linearity in deep neural networks (Goodfellow et al., 2014). It is interesting to see that even if the two theories contradict themselves, they are both co-authored by Christian Szegedy. Goodfellow’s hypothesis is that both the ReLU and the Sigmoid functions are just straight lines in the middle, the exact spot where we keep the gradient so it does not explode or vanish. Thus imagine what happens if tiny perturbations appear in the deep neural networks, which contain a lot of linear functions. If only some pixels are modified, eventually they will build up into a huge difference at the end of the feedforward pass. The third hypothesis is the one of Thomas Tanay and Lewis Griffin (Tanay and Griffin, 2016). Their theory is that due to the fact that the model never fits the data perfectly we will always encounter adversarial pockets of inputs that are between the sample data and the classifier’s boundary. Beside very mathematical theories such as the lack of sufficient training data (Schmidt et al., 2018), or computational intractability of robust classifiers (Bubeck et al., 2018), a recent paper states that the adversarial examples are not bugs, they are features. Weird, right? But if you take a second and think about it, you will understand that even if humans are limited to a three-dimensional universe, the fact that we cannot distinguish noise patterns from another dimension does not imply that those patterns are not actually good features (Ilays et al., 2019). We think that a proper comparison would be the way we process sounds versus how dogs process sounds. The fact that we do not understand a certain pattern does not mean that it is bad. How to prevent adversarial attacks? Well, we again state the famous quote “knowledge is power.” The attacker has power only if she/he knows the targeted system. The more she/he knows, the easier she/he attacks. 
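To see how little a knowledgeable attacker needs, here is a minimal sketch of the fast gradient sign method, the attack proposed alongside the linearity hypothesis in Goodfellow et al. (2014). The PyTorch classifier model, the input batch x, and the label tensor label are assumed to exist; this is an illustration, not a production attack.

```python
import torch.nn.functional as F

def fgsm_example(model, x, label, epsilon=0.01):
    """Fast gradient sign method: nudge every input value a tiny step
    (at most epsilon) in the direction that increases the loss, so the
    image looks unchanged to a human eye but can fool the classifier."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()    # one step of steepest loss increase
    return x_adv.clamp(0.0, 1.0).detach()  # stay in the valid pixel range
```

With epsilon around 1% of the pixel range, the perturbation is essentially invisible, yet it is often enough to flip the predicted class.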
Calypso—calypsoai.com (accessed November 12, 2019)—states that to prevent attacks you must answer the following questions: Who will attack you? Why will they attack you? How will they attack you? Who is the attacker? Is it a hacker? But not just any hacker: a hacker with good know-how regarding AI and a solid mathematical background. Why does she/he attack? To sell what you have produced to another entity? Most probably. The hardest question to answer is how they will attack.



Calypso classified attacks into five classes:
• the one that uses gradients: these types of attacks need access to the model's gradients, making them a type of white-box attack. This type of attack is extremely powerful, since the attacker has access to the math behind the model (Carlini and Wagner, 2017). Today we have knowledge about three gradient-based attacks: EAD (L1 norm) (Chen et al., 2017a); C&W (L2 norm) (Carlini and Wagner, 2017); and Madry (L∞ norm) (Madry et al., 2017).
• the one that uses confidence scores: in this kind of approach, knowledge about the model is not needed, making it a black-box attack. The attack is directed at the output classification confidence, in order to estimate the gradients of the model, and then the same approach as in the gradient-based attack is applied. We mention here three confidence-based attacks: ZOO (Chen et al., 2017b); SPSA (Uesato et al., 2018); and NES (Ilyas et al., 2018).
• the one that uses hard labels: in this kind of attack, the attacker uses only the output decision classes. The attack is not as smart as the other types of attacks. The best known attack that uses hard labels is the Boundary Attack (Brendel et al., 2017).
• the one that uses surrogate models: this type is somewhat similar to the gradient-based attack; the only difference is that the attacker does not have access to the model and thus needs to build a surrogate model. For this task there are the following options:
o if the targeted model acts like an oracle, that is, it answers asked questions, then the attacker can learn the mathematics behind the model by asking a lot of questions and taking note of the input-output pairs.
o if the targeted model performs a standard classification task, then the attacker can presume the architecture of the model and build a surrogate model.
o if the attacker has no knowledge at all regarding the targeted model, then she/he could just take an image classifier and produce imperfect adversarial examples.
• and the brute-force attack: this type of attack just uses simple tactics such as randomly rotating and translating images, adding Gaussian noise with large standard deviations, etc.

The last thing on our agenda regarding adversarial attacks is some pointers on how to defend our AI algorithms against them. There are two lines of defense: the formal defense and the empirical defense. The formal defense practically tries every possible attack scenario and sees how it can be defended against. For example, if we have a 1200 × 1200 × 3 input, that is, an image with 1200 px height, 1200 px width, and three color channels, and we want to test all the perturbations within a range of ±3 around each value, then we would have (3 + 1 + 3)^(3 × 1200 × 1200) = 7^4,320,000 possible perturbed inputs to check. As can be easily seen, the formal defense method is not the most practical one. For further reading we refer you to Katz et al. (2017), Huang et al. (2016), and Tjeng et al. (2017). The empirical defense is a trial-and-error method. Practically, different experiments are involved to demonstrate whether a defense is effective or not. Hopefully, with each simulated attack your defense gets stronger. In the empirical defense, contrary to the formal defense where you try every single possibility, you follow the old saying "if it works, don't fix it." Here we are going to mention some of the most used empirical defenses that exist on the market nowadays.
• adversarial training: this defense method is considered to be the most effective (a minimal sketch follows at the end of this list).
The method simply trains the model by feeding it adversarial examples that are properly labeled. Through this, the model is trained to ignore the noise in the images and to learn only from the "good" features. Unfortunately, this defense method has its flaw: it works only against the same attacks used to build the adversarial examples. If a hacker creates adversarial examples built with a different algorithm, or tries a white-box attack, then our model has no line of defense. If we were to train the model on ever more adversarial examples, to try to make it learn the new fake data, the model would learn too much fake data and become useless for our purpose. For further reading on how to protect your model using adversarial training we refer the reader to: ensemble adversarial training (Tramer et al., 2017), cascade adversarial training (Na et al., 2017), the robust optimization approach to adversarial training (Madry et al., 2017), and adversarial training via spectral normalization (Farnia et al., 2018).
• gradient masking: this type of defense is somewhat old news, since the most popular method, defensive distillation (Papernot et al., 2015), was proved not to work (Carlini and Wagner, 2016). The proposed defense is to simply hide the gradients, but recall that even if the attacker does not have access to your model's gradients, she/he can still build a surrogate.
• input modification: this type of defense preprocesses the input first, to make sure it is clean, with no adversarial noise. Some of the most popular proposed solutions are: high-level representational denoisers (Liao et al., 2017), smoothing (Xu et al., 2017), JPEG compression (Dziugaite et al., 2016), pixel deflection (Prakash et al., 2018), reforming GANs (Samangouei et al., 2018), autoencoders (Gu and Rigazio, 2014), foveation (Luo et al., 2015), general basis function transformations (Shaham et al., 2018), etc.
• detection: this type of defense resembles the input modification defense. Technically, after an input is cleaned, we compare its prediction with the prediction for the original unclean input, and if the two labels are too different, then that input is flagged as perturbed (Meng and Chen, 2017; Akhtar et al., 2017). Other detection defenses actually train a different neural network whose job is to decide whether an input is fake or not (Metzen et al., 2017). Yet other detection defenses statistically filter for adversarial attacks on: the input (Grosse et al., 2017), the convolutional filters (Li and Li, 2016), the ReLU activations (Lu et al., 2017), and the logits (Kannan et al., 2018).
• extra class: this type of defense trains the model on a particular data distribution, which makes it inexperienced with data outside this distribution. So, technically, the model does not decide on any class if it does not know what the input is; it simply refrains from deciding (Hosseini et al., 2017).
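The sketch promised above: one adversarial training step, reusing the hypothetical fgsm_example helper from the earlier sketch; model, optimizer, and the labeled batch (x, y) are again assumptions, not part of any particular library.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.01):
    """One step of adversarial training: the loss rewards the model for
    classifying both the clean batch and its FGSM-perturbed copy as y,
    teaching it to ignore the adversarial noise."""
    x_adv = fgsm_example(model, x, y, epsilon)  # craft adversarial copies
    optimizer.zero_grad()                       # clear grads left over from crafting
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

As the text above warns, a model hardened this way is only robust to the attack family it was trained on; examples crafted with a different algorithm may still get through.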

No matter what line of defense you choose, it will still have flaws and it might break during an adversarial attack. If you want to get your hands dirty and learn how to defend your model, we suggest the compilation of open source tools from the Ethical Institute for AI & Machine Learning—https://github.com/EthicalML/awesome-production-machine-learning#adversarial-robustness-libraries (accessed November 13, 2019).

The European Society of Radiology—myesr.org (accessed November 12, 2019)—is fully aware of the wide impact that AI has on imaging. The impact is rather overwhelming; it affects every corner of radiology from different points of view: technical, scientific, ethical, and economic. The statements released in the media are not comforting for radiologists. Andrew Ng from Stanford said that "a highly trained and specialized radiologist may now be in greater danger of being replaced by a machine than his own executive assistant" (Morgenstern, 2017), while Geoffrey Hinton from Toronto said that "if you work as a radiologist, you're like the coyote that's already over the edge of the cliff, but hasn't yet looked down so doesn't realize there's no ground underneath him. People should stop training radiologists now. It's just completely obvious that within 5 years, deep learning is going to do better than radiologists… We've got plenty of radiologists already" (Mukherjee, 2017). Despite these statements, it is our strong belief that AI will never replace doctors in any specialty; it will just provide guidelines, the final call still being made by the doctor.

With this, we end our presentation regarding classification and clustering analysis. In the chapters that are to come, all the methods presented in this chapter will show you how great they are at taking the fight against cancer to another level: the AI level!

2.2 World collapses. Making an informed decision

The world indeed collapses when you hear the word cancer connected to you. As a doctor, you need to find the means to explain to your patient what choices she/he can make and the costs that come along with each decision. You need to explain with numbers, with statistics; you need to understand that your patient and her/his family are terrified, and you need to "translate" what is happening or going to happen. Science, through statistics and AI, can provide good guidance. AI brings new ways to look at the disease, and with this it brings new ideas on how to win the fight against cancer. A good doctor will always try to keep up with new discoveries in the field, and in order to do that she/he must herself/himself be able to understand the mathematical concepts behind each new AI method. Facts and numbers always act like a compass. And when they fail, there will always be Hope.

As a computer data scientist, you need to make a difference in the world. And what better way to do that than to contribute to new ways of diagnosing the disease more accurately, new tailored treatment plans, new drugs, new therapies, etc. And as a regular person reading this book, you need to always be informed when making a decision. We know that many will say that medicine is not mathematics, but in a troubled state of mind, when you hear this diagnosis, maybe the only thing that you can trust is mathematics.

AI allows the doctors, the computer scientists, and the patients to make faster and more reliable decisions, speeding up the treatment process and making this awful experience less stressful for everyone involved. AI can establish tailored treatment planning, and can bring in the experience of collective oncology care teams by embedding their vast knowledge inside a computer. Through AI, the best practices discovered can be spread globally, and thus cancer treatments can be improved all over the world. As a final note: besides the doctors, besides the AI, you need to be able to make an informed decision regarding your life, and for this to happen you need to understand the odds and you need to understand what it is you are up against: what is the best treatment option, what are your chances taking into consideration your personal health, etc. That is why, in the following chapters, we shall go through each part of the cancer process: from diagnosing it, through cancer surgery, chemotherapy, radiotherapy, survival analysis, remission, recurrence, etc. But through this journey we will walk hand in hand with our two new best friends: AI and Statistics.

References


Akhtar, N., Liu, J., Mian, A., 2017. Defense Against Universal Adversarial Perturbations. arxiv.org/abs/1711.05929.
American Lung Association, 2019. Lung Cancer Fact Sheet. American Lung Association. http://lung.org/lung-health-and-disease-lookup/lung-cancer/resource-library/lung-cancer-fact-sheet.html. (Accessed 11 November 2019).
Anderson, H., Kharkar, A., Filar, B., Evans, D., Roth, P., 2018. Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning. arxiv.org/pdf/1801.08917.pdf.
Ardila, D., Kiraly, A.P., Bharadwaj, S., Choi, B., Reicher, J.J., Peng, L., Tse, D., Etemadi, M., Ye, W., Corrado, G., Naidich, D.P., Shetty, S., 2019. End to end lung cancer screening with three dimensional deep learning on low dose chest computed tomography. Nat. Med. 25, 954–961.
Belciug, S., El Darzi, E., 2010. A partially connected neural network based approach with application to breast cancer detection and recurrence. In: Proceedings of the IEEE Conference on Intelligent Systems, pp. 191–196.
Belciug, S., Gorunescu, F., 2013. A hybrid neural network/genetic algorithm system applied to the breast cancer detection and recurrence. Expert Syst. J. Knowl. Eng. 30 (3), 243–254.
Belciug, S., Gorunescu, F., 2014. Error-correction learning for artificial neural networks using the Bayesian paradigm. Application to automated medical diagnosis. J. Biomed. Inform. 52, 329–337. https://doi.org/10.1016/j.jbi.2014.07.013.
Belciug, S., Gorunescu, F., 2018. Learning a single-hidden layer feedforward neural network using a rank correlation-based strategy with application to high dimensional gene expression and proteomic spectra data sets in cancer detection. J. Biomed. Inform. 83, 159–166. https://doi.org/10.1016/j.jbi.2018.06.003.
Belciug, S., Gorunescu, F., 2020. Data mining based intelligent decision support systems. In: Belciug, S., Gorunescu, F. (Eds.), Intelligent Decision Support Systems—A Journey to Smarter Healthcare. In: Intelligent Systems Reference Library. Springer Nature Switzerland. ISBN 978-3-030-14353-4.
Belciug, S., Gorunescu, F., Gorunescu, M., Salem, A.B., 2010. Clustering based approach for detecting breast cancer recurrence. In: Proceedings of the 10th IEEE International Conference on Intelligent Systems Design and Applications (ISDA), Cairo, pp. 533–538.
Ben Naceur, M., Saouli, R., Akil, M., Kachouri, R., 2018. Fully automatic brain tumor segmentation using end to end incremental deep neural networks in MRI images. Comput. Methods Programs Biomed. 166, 39–49. https://doi.org/10.1016/j.cmpb.2018.09.007.
Bercoff, J., Chaffai, S., Tanter, M., Sandrin, L., Cathelin, S., Fink, M., Gennisson, J.L., Meunier, M., 2003. In vivo breast tumor detection using transient elastography. Ultrasound Med. Biol. 29, 1387–1396.
Berglund, R., Belciug, S., 2018. Improving extreme learning machine performance using ant colony optimization feature selection. Application to automated medical diagnosis. Ann. Univ. Craiova Math. Comput. Sci. Ser. 45 (1), 151–155.
Berlin, L., 2007. Accuracy of diagnostic procedures: has it improved over the past decades? AJR Am. J. Roentgenol. 188 (5), 1173–1178.
Biggio, B., Roli, F., 2018. Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning. arxiv.org/pdf/1712.03141.pdf.
Bishop, C.M., 2006. Pattern Recognition and Machine Learning. Springer.
Black, W.C., Gareen, I.F., Soneji, S.S., Sicks, J.D., Keeler, E.B., Aberle, D.R., Naeim, A., Church, T.R., Silvestri, G.A., Gorelick, J., Gatsonis, C., 2014. Cost effectiveness of CT screening in the national lung screening trial. N. Engl. J. Med. 371, 1793–1802.
Blickle, T., Thiele, L., 1995. A Comparison of Selection Schemes Used in Genetic Algorithms (TIK-Report), p. 11.
Bojunga, J., Herrmann, E., Meyer, G., Weber, S., Zeuzem, S., Friedrich-Rust, M., 2010. Real-time elastography for the differentiation of benign and malignant thyroid nodules: a meta-analysis. Thyroid 20, 1145–1150.
Bonferroni, C.E., 1935. Il calcolo delle assicurazioni su gruppi di teste. In: Studi in Onore del Professore Salvatore Ortu Carboni, Italy, pp. 13–60.
Bonferroni, C.E., 1936. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze 8, 3–62.
Borgstede, J.P., Lewis, R.S., Bhargavan, M., Sunshine, J.H., 2004. RADPEER quality assurance program: a multifacility study of interpretive disagreement rates. J. Am. Coll. Radiol. 1 (1), 59–64.
Bradley, P.S., Mangasarian, O.L., Street, W.N., 1997. Clustering via concave minimization. In: Mozer, M.C., Jordan, M.I., Petsche, T. (Eds.), Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge, MA, pp. 368–374.
Bravais, A., 1846. Analyse mathematique sur les probabilites des erreurs de situation d'un point. Memoires Presentes Par Divers Savants a l'Academie des Sciences de France. Sciences Mathematiques et Physiques 9, 255–332.
Brendel, W., Rauber, J., Bethge, M., 2017. Decision Based Adversarial Attacks: Reliable Attacks Against Black Box Machine Learning Models. arxiv.org/abs/1712.04248.
Bria, A., Karssemeijer, N., Tortella, F., 2014. Learning from unbalanced data: a cascade based approach for detecting clustered microcalcifications. Med. Image Anal. 182, 241–252.
Broomhead, D.H., Lowe, D., 1988. Multivariable functional interpolation and adaptive networks. Complex Syst. 2, 321–355.
Bubeck, S., Price, E., Razenshteyn, I., 2018. Adversarial Examples From Computational Constraints. arxiv.org/abs/1804.11285.
Cacoullos, T., 1966. Estimation of a multivariate density. Ann. Inst. Stat. Math. (Tokyo) 18, 179–189.
Carlini, N., Wagner, D., 2016. Defensive Distillation Is Not Robust to Adversarial Examples. arxiv.org/pdf/1607.04311.pdf.
Carlini, N., Wagner, D., 2017. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. arxiv.org/abs/1705.07263.
Char, D.S., Shah, N.H., Magnus, D., 2018. Implementing machine learning in health care—addressing ethical challenges. N. Engl. J. Med. 378 (11), 981–983. https://doi.org/10.1056/NEJMp1714229.
Chen, P.Y., Sharma, Y., Zhang, H., Yi, J., Hsieh, C.J., 2017a. EAD: Elastic Net Attacks to Deep Neural Networks via Adversarial Examples. arxiv.org/abs/1709.04114.
Chen, P.Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.J., 2017b. ZOO: Zeroth Order Optimization Based Black Box Attacks to Deep Neural Networks Without Training Substitute Models. arxiv.org/abs/1709.03999.
Chen, K.T., Gong, E., de Carvalho Macruz, F.B., Xu, J., Boumis, A., Khalighi, M., Poston, K.L., Sha, S.J., Greicius, M.D., Mormino, E., Pauly, J.M., Srinivas, S., Zaharchuk, G., 2019. Ultra low dose 18F-florbetaben amyloid PET imaging using deep learning with multi contrast MRI inputs. Radiology 290 (3), 649–656. https://doi.org/10.1148/radiol.2018180940.
Cheng, Y., 1995. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17 (8), 790–799.
Clauser, P., Baltzer, P.A.T., Kapetas, P., Woitek, R., Weber, R., Leone, F., Bernathova, M., Helbich, T.H., 2019. Synthetic 2-dimensional mammography can replace digital mammography as an adjunct to wide-angle digital breast tomosynthesis. Invest. Radiol. 54 (2), 83–88.
Cochlin, D.L., Ganatra, R.H., Griffiths, D.F.R., 2002. Elastography in the detection of prostatic cancer. Clin. Radiol. 57, 1014–1020.
Cost, S., Salzberg, S., 1993. A weighted nearest neighbor algorithm for learning with symbolic features. Mach. Learn. 10 (1), 57–78.
Darwin, C., 1859. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray, London.
Dorugade, A.V., Kashid, D.N., 2010. Alternative method for choosing ridge parameter for regression. Appl. Math. Sci. 4 (9), 447–456.
Dziugaite, G.K., Ghahramani, Z., Roy, D.M., 2016. A Study of the Effect of JPG Compression on Adversarial Images. arxiv.org/abs/1608.00853.
Eiben, A.E., 2003. Multiparent recombination in evolutionary computing. In: Ghosh, A., Tsutsui, S. (Eds.), Advances in Evolutionary Computation: Theory and Applications. Springer, Heidelberg, pp. 175–192.
Eiben, A.E., Smith, J.E., 2003. Introduction to Evolutionary Computing. Springer, Heidelberg.
Eshelman, L., Schaffer, D.J., 1993. Real-coded genetic algorithms and interval schemata. In: Foundations of Genetic Algorithms, vol. 2, pp. 187–202. https://doi.org/10.1016/B978-0-08-094832-4.50018-0.
Farnia, F., Zhang, J., Tse, D., 2018. Generalizable Adversarial Training via Spectral Normalization. arxiv.org/abs/1811.07457.
Finlayson, S., Chung, H.W., Kohane, I.S., Beam, A.L., 2018. Adversarial Attacks Against Medical Deep Learning Systems. arxiv.org/abs/1804.05296.
Friedrich-Rust, M., Ong, M.F., Herrmann, E., Dries, V., Samaras, P., Zeuzem, S., Sarrazin, C., 2007. Real-time elastography for noninvasive assessment of liver fibrosis in chronic viral hepatitis. AJR Am. J. Roentgenol. 188 (3), 758–764.
Fukushima, K., 1980. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36 (4), 193–202.
Galton, F., 1874. On men of science, their nature and their nurture. Proc. R. Inst. Great Brit. 7, 227–236.
Galton, F., 1888. Co-relations and their measurement, chiefly from anthropometric data. Proc. R. Soc. Lond. 45 (273–279), 135–145.
Garayoa, J., Chevalier, M., Castillo, M., Mahillo Fernandez, I., Amallal El Ouahabi, N., Estrada, C., Tejerina, A., Benitez, O., Valverde, J., 2018. Diagnostic value of the stand-alone synthetic image in digital breast tomosynthesis examinations. Eur. Radiol. 282, 565–572.
Garcia Laencina, P.J., Sancho Gomez, J.L., Figueiras Vidal, A.R., Verleysen, M., 2009. K-nearest neighbors with mutual information for simultaneous classification and missing data imputation. Neurocomputing 72 (7–9), 1483–1493.
Goodfellow, I., Shlens, J., Szegedy, C., 2014. Explaining and Harnessing Adversarial Examples. arxiv.org/abs/1412.6572.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press, Cambridge, MA/London.
Gorunescu, F., 2003. k-Means clustering: a heuristic approach to improve the treatment effectiveness. Craiova Med. J. 5 (3), 421–433.
Gorunescu, F., Belciug, S., 2014. Evolutionary strategy to develop learning based decision systems. Application to breast cancer and liver fibrosis stadialization. J. Biomed. Inform. 49, 112–118.
Gorunescu, F., Belciug, S., 2016. Boosting backpropagation algorithm by stimulus sampling. Application in computer aided medical diagnosis. J. Biomed. Inform. 63, 74–81. https://doi.org/10.1016/j.jbi.2016.08.004.
Gorunescu, F., Gorunescu, M., El-Darzi, E., Gorunescu, S., 2010. A statistical framework for evaluating neural networks to predict recurrent events in breast cancer. Int. J. Gen. Syst. 39 (5), 471–488.
Gorunescu, F., Gorunescu, M., Saftoiu, A., Vilmann, P., Belciug, S., 2011. Competitive/collaborative neural computing system for medical diagnosis in pancreatic cancer detection. Expert Syst. 28 (1), 33–48.
Gorunescu, F., Belciug, S., Gorunescu, M., Badea, R., 2012. Intelligent decision making for liver fibrosis stadialization based on tandem feature selection and evolutionary driven neural network. Expert Syst. Appl. 39, 12824–12832.
Grosse, K., Manoharan, P., Papernot, N., Backes, M., McDaniel, P., 2017. On the Statistical Detection of Adversarial Examples. arxiv.org/abs/1702.06280.
Gu, S., Rigazio, L., 2014. Towards Deep Neural Network Architectures Robust to Adversarial Examples. arxiv.org/abs/1412.5068.
Hajek, A., 2012. Interpretations of probability. In: Zalta, E.N. (Ed.), The Stanford Encyclopedia of Philosophy, Winter 2012. http://plato.stanford.edu/archives/win2012/entries/probability-interpret/.
Hancock, P.J.B., 1992. Genetic algorithms and permutation problems: a comparison of recombination operators for neural net structure specification. In: Whitley, D.L., Schaffer, J.D. (Eds.), Proc. Int. Workshop on Combinations of Genetic Algorithms and Neural Networks. IEEE Computer Society, Los Alamitos, CA, pp. 108–122.
Hand, D., Mannila, H., Smyth, P., 2001. Principles of Data Mining. MIT Press, Cambridge.
Hassan, M.R., Hossain, M.M., Bailey, J., Ramamohanarao, K., 2008. Improving k-nearest neighbor classification with distance functions based on receiver operating characteristics. In: Daelemans, W., Goethals, B., Morik, K. (Eds.), ECML PKDD 2008, Part I, LNCS (LNAI), vol. 5211. Springer, Heidelberg, pp. 489–504.
Haykin, S., 1999. Neural Networks. A Comprehensive Foundation, second ed. Prentice-Hall.
Hebb, D.O., 1949. The Organization of Behavior. A Neuropsychological Theory. Wiley, New York.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems: An Introductory Analysis With Applications in Biology, Control and Artificial Intelligence. University of Michigan Press.
Hoogi, A., Beaulieu, C.F., Cunha, G.M., Heba, E., Sirlin, C.B., Napel, S., Rubin, D.L., 2017. Adaptive local window for level set segmentation of CT and MRI liver lesions. Med. Image Anal. 37, 46–55. https://doi.org/10.1016/j.media.2017.01.002.
Hosseini, H., Chen, Y., Kannan, S., Zhang, S., Zhang, B., Poovendran, R., 2017. Blocking Transferability of Adversarial Examples in Black Box Learning Systems. arxiv.org/abs/1703.04318.
Huang, G.B., 2015a. WHO Behind the Malign and Attack on ELM, GOAL of the Attack and ESSENCE of ELM. www.extreme-learning-machines.org.
Huang, G.B., 2015b. What are extreme learning machines? Filling the gap between Frank Rosenblatt's dream and John von Neumann's puzzle. Cogn. Comput. 7. https://doi.org/10.1007/s12559-015-9333-0.
Huang, G.B., Zhu, Q.C., Siew, C.-K., 2006. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proc. Intl. Joint Conf. Neur. Net., pp. 985–990.
Huang, G.B., Chen, L., Siew, C.K., 2004. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17 (4), 879–892.
Huang, G.B., Ding, X., Zhou, H., 2010. Optimization method based extreme learning machine for classification. Neurocomputing 74, 155–163.
Huang, G.B., Zhou, H., Ding, X., Zhang, R., 2011. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. 42 (2), 513–529.
Huang, X., Kwiatkowska, M., Wang, S., Wu, M., 2016. Safety Verification of Deep Neural Networks. arxiv.org/abs/1610.06940.
Hupse, R., Karssemeijer, N., 2009. Use of normal tissue context in computer aided detection for masses in mammograms. IEEE Trans. Med. Imaging 2812, 2033–2041.
Hupse, R., Samulski, M., Lobbes, M.B., Mann, R.M., Mus, R., den Heeten, G.J., Beijerinck, D., Pijnappel, R.M., Boetes, C., Karssemeijer, N., 2013. Computer aided detection of masses at mammography: interactive decision support versus prompts. Radiology 2661, 123–129.
Ilyas, A., Engstrom, L., Athalye, A., Lin, J., 2018. Black-Box Adversarial Attacks With Limited Queries and Information. arxiv.org/abs/1804.08598.
Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A., 2019. Adversarial Examples Are Not Bugs, They Are Features. arxiv.org/abs/1905.02175.
Jain, A.K., Dubes, R.C., 1988. Algorithms for Clustering Data. Prentice-Hall.
Jebari, K., Madiafi, M., 2013. Selection methods for genetic algorithm. Int. J. Emerg. Sci. 3 (4), 333–344.
Jemal, A., Fedewa, S.A., 2017. Lung cancer screening with low dose computed tomography in the United States—2010-2015. JAMA Oncol. 3, 1278.
Kannan, H., Kurakin, A., Goodfellow, I., 2018. Adversarial Logit Pairing. arxiv.org/abs/1803.06373.
Karssemeijer, N., 1998. Automated classification of parenchymal patterns in mammograms. Phys. Med. Biol. 432, 365–378.
Karssemeijer, N., Te Brake, G.M., 1996. Detection of stellate distortions in mammograms. IEEE Trans. Med. Imaging 155, 611–619.
Katz, G., Barrett, C., Dill, D., Julian, K., Kochenderfer, M., 2017. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. arxiv.org/abs/1702.01135.
Langford, J., 2005. The cross validation problem. In: Auer, P., Meir, R. (Eds.), Learning Theory. COLT 2005. Lecture Notes in Computer Science, vol. 3559. Springer, Berlin, Heidelberg.
Langlotz, C.P., Allen, B., Erickson, B.J., Kalpathy-Cramer, J., Bigelow, K., Cook, T.S., Flanders, A.E., Lungren, M.P., Mendelson, D.S., Rudie, J.D., Wang, G., Kandarpa, K., 2018. A roadmap for foundational research on artificial intelligence in medical imaging, from the 2018 NIH/RSNA/ACR/The Academy workshop. Radiology 291 (3), 781–791.
LeCun, Y., 1989. Generalization and Network Design Strategies (Technical Report, CRG-TR-89-4). University of Toronto.
LeCun, Y., Bottou, L., Orr, G., Muller, K.-R., 2012. Efficient BackProp. In: Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 1524, pp. 9–50.
Li, X., Li, F., 2016. Adversarial Examples Detection in Deep Networks With Convolutional Filter Statistics. arxiv.org/abs/1612.07767.
Liang, N.Y., Huang, G.B., Saratchandran, P., Sundararajan, N., 2006. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 17 (6), 1411–1423.
Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X., Zhu, J., 2017. Defense Against Adversarial Attacks Using High Level Representation Guided Denoiser. arxiv.org/abs/1712.02976.
Little, M.A., Varoquaux, G., Saeb, S., Lonini, L., Jayaraman, A., Mohr, D.C., Kording, K.P., 2017. Using and understanding cross-validation strategies. Perspectives on Saeb et al. Gigascience 6 (5), 1–6.
Liu, Y., Stojadinovic, S., Hrycushko, B., Wardak, Z., Lau, S., Lu, W., Yulong, Y., Jiang, S.B., Zhen, X., Timmerman, R., Nedzi, L., Gu, X., 2017. A deep convolutional neural network based automatic delineation strategy for multiple brain metastases stereotactic radiosurgery. PLoS One 12 (10), e0185844. https://doi.org/10.1371/journal.pone.0185844.
Lu, J., Issaranon, T., Forsyth, D., 2017. SafetyNet: Detecting and Rejecting Adversarial Examples Robustly. arxiv.org/abs/1704.00103.
Luo, Y., Boix, X., Roig, G., Poggio, T., Zhao, Q., 2015. Foveation-Based Mechanisms Alleviate Adversarial Examples. arxiv.org/abs/1511.06292.
MacQueen, J.B., 1967. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. University of California Press, Berkeley, pp. 281–297.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A., 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. arxiv.org/abs/1706.06083.
Martin, E., Kriegel, H.P., Sander, J., Xu, X., 1996. A density based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis, E., Han, J., Fayyad, U.M. (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press, pp. 226–231.
McCulloch, W., Pitts, W., 1943. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133.
Meng, D., Chen, H., 2017. MagNet: A Two-Pronged Defense Against Adversarial Examples. arxiv.org/abs/1705.09064.
Metzen, J.H., Genewein, T., Fischer, V., Bischoff, B., 2017. On Detecting Adversarial Perturbations. arxiv.org/abs/1702.04267.
Mordang, J.-J., Janssen, T., Bria, A., Kooi, T., Gubern Merida, A., Karssemeijer, N., 2016. Automatic microcalcification detection in multi vendor mammography using convolutional neural networks. In: International Workshop on Digital Mammography (IWDM), Malmo, Sweden.
Morgenstern, M., 2017. Automation and anxiety. The Economist. https://www.economist.com/special-report/2016/06/25/automation-and-anxiety. (Accessed 13 November 2019).
Mukherjee, S., 2017. AI versus MD. New Yorker. http://www.newyorker.com/magazine/2017/04/03/ai-versus-md. (Accessed 13 November 2019).
Na, T., Ko, J.H., Mukhopadhyay, S., 2017. Cascade Adversarial Machine Learning Regularized With Unified Embedding. arxiv.org/abs/1708.02582.
National Academy of Medicine, 2015. Improving Diagnosis in Health Care. The National Academies Press, Washington, DC.
National Lung Screening Trial Research Team, Aberle, D.R., Adams, A.M., Berg, C.D., Black, W.C., Clapp, J.D., Fagerstrom, R.M., Gareen, I.F., Gatsonis, C., Marcus, P.M., Sicks, J.D., 2011. Reduced lung cancer mortality with low dose computed tomography screening. N. Engl. J. Med. 365, 395–409.
Nigsch, F., Bender, A., van Buuren, B., Tissen, J., Nigsch, E., Mitchell, J.B.O., 2006. Melting point prediction employing k-nearest neighbor algorithms and genetic parameter optimization. J. Chem. Inf. Model. 46 (6), 2412–2422.
Noy, N., 2018. Making It Easier to Discover Data Sets. https://www.blog.google/products/search/making-it-easier-discover-datasets/. (Accessed 11 November 2019).
Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A., 2015. Distillation as a Defense to Adversarial Perturbation Against Deep Neural Networks. arxiv.org/abs/1511.04508.
Parzen, E., 1962. On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076.
Pearson, K., 1896. Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia. Phil. Trans. R. Soc. Lond. 187, 253–318.
Prakash, A., Moran, N., Garber, S., DiLillo, A., Storer, J., 2018. Deflecting Adversarial Attacks With Pixel Deflection. arxiv.org/abs/1801.08926.
Press, J., 2003. Subjective and Objective Bayesian Statistics: Principles, Models, and Applications, second ed. Wiley. http://onlinelibrary.wiley.com/doi/10.1002/9780470317105.fmatter/pdf.
Radcliffe, N.J., 1990. Genetic Neural Networks on MIMD Computers (Unpublished D. Phil. Thesis). University of Edinburgh, Edinburgh, Scotland.
Razali, N.M., Geraghty, J., 2011. Genetic algorithm performance with different selection strategies in solving TSP. In: Proceedings of the World Congress on Engineering, II. UK.
Refaeilzadeh, P., Tang, L., Liu, H., 2009. Cross-validation. In: Liu, L., Ozsu, M.T. (Eds.), Encyclopedia of Database Systems. Springer, Boston, MA.
Rizzo, S., Botta, F., Raimondi, S., Origgi, D., Fanciullo, C., Morganti, A.G., Bellomi, M., 2018. Radiomics: the facts and the challenges of image analysis. Eur. Radiol. Exp. 2, 36. https://doi.org/10.1186/s41747-018-0068-z.
Rodriguez Ruiz, A., Gubern Merida, A., Imhof Tas, M., Lardenoije, S., Wanders, A.J.T., Andersson, I., Zackrisson, S., Lang, K., Dustler, M., Karssemeijer, N., Mann, R.M., Sechopoulos, I., 2018. One view digital breast tomosynthesis as a stand alone modality for breast cancer detection: do we need more? Eur. Radiol. 28, 1938–1948.
Rodriguez Ruiz, A., Lang, K., Gubern Merida, A., Broeders, M., Gennaro, G., Clauser, P., Helbich, T.H., Chevalier, M., Tan, T., Mertelmeier, T., Wallis, M.G., Andersson, I., Zackrisson, S., Mann, R.M., Sechopoulos, I., 2019. Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J. Natl. Cancer Inst. 111 (9), 916–922.
Saftoiu, A., Vilmann, P., Gorunescu, F., Janssen, J., Hocke, M., Larsen, M., Iglesias-Garcia, J.M., Arcidiacono, P., Will, U., Giovannini, M., Dietrich, C.F., Havre, R., Gheorghe, C., McKay, C., Gheonea, D.I., Ciurea, T., European EUS Elastography Multicentric Study Group, 2012. Efficacy of an artificial neural network-based approach to endoscopic ultrasound elastography in diagnosis of focal pancreatic masses. Clin. Gastroenterol. Hepatol. 10 (1), 84–90.
Samangouei, P., Kabkab, M., Chellappa, R., 2018. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. arxiv.org/abs/1805.06605.
Schaffer, J.D., Whitley, D.L., Eshelman, L.J., 1992. Combinations of genetic algorithms and neural networks: a survey of the state of the art. In: Whitley, D.L., Schaffer, J.D. (Eds.), Proc. Int. Workshop on Combinations of Genetic Algorithms and Neural Networks. IEEE Computer Society, Los Alamitos, CA, pp. 1–37.
Scheel, B.I., Holtedahl, K., 2015. Symptoms, signs, and tests: the general practitioner's comprehensive approach towards a cancer diagnosis. Scand. J. Prim. Health Care 33 (3), 170–177. https://doi.org/10.3109/02813432.2015.1067512.
Scheel, B.I., Ingebrigtsen, S.G., Thorsen, T., Holtedahl, K., 2013. Cancer suspicion in general practice: the role of symptoms and patient characteristics, and their association with subsequent cancer. Br. J. Gen. Pract. 63, 627–635.
Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., Madry, A., 2018. Adversarially Robust Generalization Requires More Data. arxiv.org/abs/1804.11285.
Shaham, U., Garritano, J., Yamada, Y., Weinberger, E., Cloninger, A., Cheng, X., Staton, K., Kluger, Y., 2018. Defending Against Adversarial Images Using Basis Functions Transformations. arxiv.org/abs/1803.10840.
Sharif, M., Bhagavatula, S., Bauer, L., Reiter, M.K., 2016. Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In: CCS'16, Vienna, Austria, October 24–28. https://doi.org/10.1145/2976749.2978392.
Siemens Medical Solutions USA, Inc., 2015. FDA Application: MAMMOMAT Inspiration With Digital Breast Tomosynthesis. https://www.accessdata.fda.gov/cdrh_docs/pdf14/p140011b.pdf.
Smith, L.N., 2015. Cyclical learning rates for training neural networks. Comput. Vis. Pattern Recognit. https://arxiv.org/abs/1506.01186.
Sommerfeld, H.J., Garcia-Schurmann, J.M., Schewe, J., Kuhne, K., Berges, R.R., Lorenz, A., Pesavento, A., Scheipers, U., Ermert, H., Pannek, J., Philippou, S., Senge, T., 2003. Prostate cancer diagnosis using ultrasound elastography. Introduction of a novel technique and first clinical results. Urologe A 42, 941–945.
Sorjamaa, A., Hao, J., Lendasse, A., 2005. Approximator for time series prediction. In: Duch, W., Kacprzyk, J., Oja, E., Zadrozny, S. (Eds.), ICANN 2005, LNCS, vol. 3697. Springer, Heidelberg, pp. 553–558.
Specht, D.F., 1967. Generation of polynomial discriminant functions for pattern recognition. IEEE Trans. Electron. Comput. EC-16 (3), 308–319. https://doi.org/10.1109/PGEC.1967.264667.
Specht, D.F., 1988. Probabilistic neural networks for classification mapping or associative memory. In: Proc. IEEE International Conference on Neural Networks, vol. 1, pp. 525–532.
Specht, D.F., 1990. Probabilistic neural networks. Neural Netw. 3, 109–118. https://doi.org/10.1016/0893-6080(90)90049-Q.
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M., 2015. Striving for Simplicity: The All Convolutional Net. https://arxiv.org/abs/1412.6806.
Srisawat, A., Phienthrakul, T., Kijsirikul, B., 2006. SV-kNN: an algorithm for improving the efficiency of k-nearest neighbor. In: PRICAI 2006: Trends in Artificial Intelligence, pp. 975–979.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R., 2013. Intriguing Properties of Neural Networks. arxiv.org/abs/1312.6199.
Tan, P.N., Steinbach, M., Kumar, V., 2005. Introduction to Data Mining. Addison-Wesley, Reading.
Tanay, T., Griffin, L., 2016. A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples. arxiv.org/abs/1608.07690.
Tjeng, V., Xiao, K., Tedrake, R., 2017. Evaluating Robustness of Neural Networks With Mixed Integer Programming. arxiv.org/abs/1711.07356.
Toussaint, G.T., 2005. Geometric proximity graphs for improving nearest neighbor methods in instance based learning and data mining. Int. J. Comput. Geom. Appl. 15 (2), 101–150.
Tramer, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., McDaniel, P., 2017. Ensemble Adversarial Training: Attacks and Defenses. arxiv.org/abs/1705.07204.
Uesato, J., O'Donoghue, B., van den Oord, A., Kohli, P., 2018. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks. arxiv.org/abs/1802.05666.
US Preventive Services Task Force, 2018. Final Update Summary: Lung Cancer: Screening. US Preventive Services Task Force. https://www.uspreventiveservicestaskforce.org/Page/Document/UpdateSummaryFinal/lung-cancer-screening.
van Wieringen, W.N., 2015. Lecture Notes on Ridge Regression. https://arxiv.org/abs/1509.09169.
Visser, R., Veldkamp, W.J., Beijerinck, D., Bun, P.A., Deurenberg, J.J., Imhof Tas, M.W., Schuur, K.H., Soeren, M.M., den Heeten, G.J., Karssemeijer, N., Broeders, M.J., 2012. Increase of perceived case suspiciousness due to local contrast optimisation in digital screening mammography. Eur. Radiol. 224, 908–914.
Wagenmakers, E.-J., Lee, M., Lodewyckx, T., Iverson, G., 2008. Bayesian evaluation of informative hypotheses (statistics for social and behavioral science). In: Hoijtink, H., Klugkist, I., Boelen, P. (Eds.), Bayesian Versus Frequentist Inference. Springer, pp. 181–207 (Chapter 9).
Waite, S., Scott, J., Gale, B., Fuchs, T., Kolla, S., Reede, D., 2017. Interpretive error in radiology. AJR Am. J. Roentgenol. 208 (4), 739–749.
Wallis, M.G., Moa, E., Zanca, F., Leifland, K., Danielsson, M., 2012. Two view and single view tomosynthesis versus full-field digital mammography: high resolution X-ray imaging observer study. Radiology 262 (3), 788–796.
Wang, L.P., Wan, C.R., 2008. Comments on "The Extreme Learning Machine". IEEE Trans. Neural Netw. 19 (8), 1494.
Wang, J., Neskovic, P., Cooper, L.N., 2007. Improving nearest neighbor rule with a simple adaptive distance measure. Pattern Recogn. Lett. 28, 207–213.
Wang, G., Ye, J.C., Mueller, K., Fessler, J.A., 2018. Image reconstruction is a new frontier of machine learning. IEEE Trans. Med. Imaging 37 (6), 1289–1296. https://doi.org/10.1109/TMI.2018.2833635.
Ward, J.H., 1963. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58 (301), 236–244.
Watson, R.A., Pollack, J.B., 2000. Recombination without respect: schema combination and disruption in genetic algorithm crossover. In: Whitley, D., Goldberg, E.C.-P., Spector, L., Parmee, I., Beyer, H.-G. (Eds.), Proc. of the 2000 Genetic and Evolutionary Computation Conference. Morgan Kaufmann, San Mateo, CA, pp. 112–119.
Weston, A.D., Korfiatis, P., Kline, T.L., Philbrick, K.A., Kostandy, P., Sakinis, T., Sugimoto, M., Takahashi, N., Erickson, B.J., 2019. Automated abdominal segmentation of CT scans for body composition analysis using deep neural networks. Radiology 290 (3), 669–679. https://doi.org/10.1148/radiol.2018181432.
Whitley, D.L., Starkweather, T., Bogart, C., 1990. Genetic algorithms and neural networks: optimizing connection and connectivity. Parallel Comput. 14 (3), 347–361.
Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J., Groth, P., Goble, C., Grethe, J.S., Heringa, J., 't Hoen, P.A., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca Serra, P., Roos, M., van Schaik, R., Sansone, S.A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., Mons, B., 2016. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3. https://doi.org/10.1038/sdata.2016.18.
Wright, A., 1991. Genetic algorithms for real parameter optimization. In: Rawlins, G.J.E. (Ed.), Foundations of Genetic Algorithms. Morgan Kaufmann, San Mateo, CA, pp. 205–218.
Xu, W., Evans, D., Qi, Y., 2017. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. arxiv.org/abs/1704.01155.
Yu, F., Koltun, V., 2016. Multi-Scale Context Aggregation by Dilated Convolutions. Computer Vision and Pattern Recognition. arxiv.org/abs/1511.07122.
Yuan, Y., Qin, W., Buyyounouski, M., Ibragimov, B., Hancock, S., Han, B., Xing, L., 2018. Prostate cancer classification with multiparametric MRI transfer learning model. Med. Phys. https://doi.org/10.1002/mp.13367.
Zhou, Y., Chellappa, R., 1988. Computation of optical flow using a neural network. In: IEEE International Conference on Neural Networks, pp. 71–78.

C H A P T E R

3

Pathologist at work

3.1 Building the tumor's pattern

Feeling. Suspicion. Evidence. Preliminary diagnostic. Tests. More tests. Waiting. Hoping. Despair. Fear. Hope. The doctor's suspicion was correct. The diagnosis struck like a knife's blade: cancer. While patients wait, think, overthink, and hope, the doctors and data scientists work around the clock to make a plan on how to save someone's life. Just like in a war, you need to know who you are up against, who you are fighting, to know its weaknesses and its strengths. Being a disease that attacks at the cellular level, cancer is unique, just like any person.

The pathologist represents the first line of defense, setting a diagnosis after studying tissue samples. The size of the tissue sample can range from tiny biopsies (e.g., a piece of an esophageal tumor removed during an endoscopy) to whole organs removed during surgery (e.g., a colon affected by cancer). Selected areas are examined under a microscope. The pathologist decides whether the tumor is benign or malignant. By building the tumor's pattern, she/he plays an essential part in a patient's treatment.

Recall the history of cancer, with the first documentation of the disease (17th century BC) in the Edwin Smith Papyrus and the Papyrus Ebers. Those set the grounds of pathology. Still, it was Hippocrates' ideas that established the definition of pathology. With his theories regarding the nature of disease, he documented many pathological features: tumors, inflammations, etc. Other pioneers of pathology were Herophilos (335–280 BC) and Erasistratos (304–250 BC), who performed the first human dissections. Sadly enough, all their documentation has been lost. Both of them tried to determine correlations between anatomical structures and disease. Greek theories regarding medicine were brought to Rome. Cornelius Celsus is the most important Roman medical writer, whose book De Re Medicina has been read by every medical student. The book contains the definition of inflammation: "Notae vero inflammationis sunt quattuor, rubor et tumor, cum calore et dolore." In the first century AD, human dissections were made illegal and thus stopped. In the second century, Galen (129–201 AD) made huge discoveries in the medical field. By performing operations on living animals (vivisections), such as pigs and monkeys, he discovered the "crab" structure of the growth of cancer and also introduced the medical practice of bloodletting. His theories on pathology are found in his two books, "Seats of Diseases" and "Abnormal Tumors."



Up until the late Middle Ages, pathology theories were written by Byzantine and Arab physicians. The Emperor Justinian had a personal physician named Aetius of Amida (502–575). Aetius wrote about uterus cancer and rectum cancer. Avicenna (980–1037) wrote the "Canon Medicinae," a work that was the best of its time. Avenzoar (1070–1162) described cancer of the esophagus and stomach. Arab medicine declined after the crusades, leaving medicine to the monks of the European monasteries. They became physicians and started to copy ancient medical manuscripts. It was when the first Italian universities were established that the interest in pathology revived. Human dissections were introduced in 1270 AD as part of medical teaching. Pathology became a separate wing of medicine at the end of the fifteenth century, through the works of Antonio Benivieni (1443–1502), a Florentine physician. 111 cases were described in his book "De Abditis Nonnullis ac Mirandis Morborum et Sanationum Causis" or, in English, "About the Hidden Causes of Disease." During the sixteenth century, pathological studies were written that are in use even today; some of them, unfortunately, were lost. For instance, a work of Vesalius (1514–1564), who disagreed with Galen's theory, was never found (van de Tweel and Taylor, 2010). To prove that medicine walked hand in hand with mathematics, we mention that it was a mathematician, Jean Fernel (1497–1558), who became a pathologist and wrote "Medicina," a book that had been a standard in Europe. Other pathology theories were added by Felix Plater (1536–1614), Volcher Coiter (1534–1576), and Johann Schenk von Grafenberg (1530–1598). Because there was no oncological treatment, the only therapy provided for cancer patients was surgery, and that was performed only if it was possible.

In the 19th century, the only thing that connected pathology to cancer diagnosis was a postmortem description of how the tumor had spread. Clearly this was good for science in general, but bad for the patient. Luckily, Rudolf Virchow (1821–1902) brought light microscopy into pathology, and thus earned his name as the "father of modern pathology" (Byers, 1989). Tumors can be analyzed under the microscope in order to discover their structure. Another successful milestone was the development of the Papanicolaou test by George N. Papanicolaou in 1920. The test can detect precursor lesions of cervical cancer, thus making it treatable. Various biopsy techniques were developed, allowing the pathologist to detect malignant tissues before surgery. Through this, needless surgeries were prevented (Birner et al., 2016).

The patient's prognosis depends a lot on what the pathologist sees under the microscope. Tumors behave according to their morphology. Some tumors spread to the lymph nodes and may look like normal tissue to the eye of the surgeon or on a CT scan, but at a microscopic level they are actually metastases. All these findings influence the survival of the patient. Besides this, morphologically different tumor tissues might respond differently to chemotherapy or radiotherapy. Even if the role of the pathologist is crucial in diagnosing cancer, the way the job is done hasn't changed much in the last 100 years: she/he still takes the biopsy tissue and slides it under the microscope. But things are about to change. AI can change, and has changed, cancer pathology. In Table 3.1 we present a timeline with the most important milestones that led to AI in pathology.
TABLE 3.1 Timeline of milestones achieved that led to AI in pathology.

Year  Milestone
1956  John McCarthy develops AI
1959  Arthur Samuel develops machine learning
1965  Judith Prewitt and Mortimer Mendelsohn analyze microscopy images of cells and chromosomes with the use of the computer
1986  Rina Dechter develops deep learning (Dechter, 1986)
1990  The whole slide scanners are introduced
1998  Yann LeCun develops convolutional neural networks (LeCun et al., 1998)
2013  Photoacoustic microscopy imaging is developed
2014  Ian Goodfellow develops generative adversarial networks (Goodfellow et al., 2014)
2016  High resolution imaging without tissue consumption is enabled through MUSE microscopy
2018  FDA approves the first AI medical device that detects diabetic retinopathy in adults (IDx-DR)

Supercomputers can be trained to recognize cancer from microscope slides. Pathologists can then focus on the slides that present unusual patterns. The team from the Memorial Sloan Kettering computational pathology lab, the Thomas Fuchs Lab—https://mskcc.org/research-areas/labs/thomas-fuchs/ (accessed November 8, 2019)—published some fascinating results in Nature Medicine, regarding a study done on more than 44,000 digitized microscope slide images from more than 15,000 cancer patients (Campanella et al., 2019). In Chapter 2, we have seen that for a convolutional neural network to classify images correctly, it needs annotated data sets. The data sets are large, and manually annotating areas in images might take forever. In this study, the team used a multiple instance learning convolutional neural network that learns from already reported diagnoses. The AI system analyzed 44,732 whole slides from 15,187 patients. No preprocessing of the data was done. The results are outstanding, the intelligent decision system being able to diagnose prostate cancer, basal cell carcinoma, and breast metastases to axillary lymph nodes with an AUROC over 0.98. The sensitivity reported in the study was nearly 100%, which means that the AI system could recognize cancer in the digitized slides. The data is publicly available at https://wiki.cancerimagingarchive.net/display/Public/Breast+Metastases+to+Axillary+Lymph+Nodes.
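The multiple instance learning idea behind such a study can be illustrated with a toy sketch; this is not the authors' pipeline, just the core trick: a slide is a bag of tiles, only the slide-level diagnosis is known, and the most suspicious tile drives the slide prediction. The tile_model classifier and the tiles tensor are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mil_slide_loss(tile_model, tiles, slide_label):
    """Minimal max-pooling multiple instance learning sketch.

    tiles: tensor of shape (n_tiles, C, H, W) cut from one whole slide.
    slide_label: 1.0 if the slide was reported as cancer, else 0.0.
    Only the slide-level diagnosis is used; no tile annotations needed."""
    logits = tile_model(tiles).squeeze(-1)  # one suspicion score per tile
    slide_logit = logits.max()              # most suspicious tile drives the slide
    target = torch.tensor(float(slide_label))
    return F.binary_cross_entropy_with_logits(slide_logit, target)
```

Training on the slide-level loss slowly teaches the tile model which tiles look malignant, even though no one ever outlined a tumor region by hand.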

AI reshapes pathology. For some, the process is not fast enough. Johan Lundin, Associate Professor for Biomedical Informatics and research director at the Institute for Molecular Medicine Finland (FIMM)—https://fimm.fi (accessed November 8, 2019)—thinks that the digitalization of the equipment is slow. After the 14th European Congress on Digital Pathology (ECDP—ecdp2018.org, accessed November 8, 2019), held together with the 5th Nordic Symposium on Digital Pathology, where 375 delegates from 33 different countries participated, Associate Professor Lundin said: "It was really exciting. Somehow the whole spirit has changed because of the use of artificial intelligence in the form of machine learning for pathology - maybe that's the reason for the high numbers attending." The conference focused on AI for pathology.

Are AI and pathologists frenemies? The answer is no, they are just friends. But in order to understand what AI can do to help a pathologist, we must first explain what a pathologist does. A tissue sample is removed during a biopsy. That tissue sample is called a specimen. The specimen is placed in a container with a fluid that preserves the tissue.

The pathologist writes a report describing the details of the tissue that can be seen with the naked eye: texture, color, size, etc. This macroscopic test is followed by a microscopic assessment. Before the actual examination of the tissue, a technician prepares the histological sections, which are stained with various chemicals. Depending on the tests that need to be performed, the sections can be:

• permanent: the specimen is placed in formalin for some time to prevent autolysis (the destruction of a cell through the action of the enzymes contained in the affected tissue, or self-digestion). Next, the water is removed from the tissue and replaced with paraffin wax. The specimen is embedded in a large paraffin block so it can be stored indefinitely. Once the paraffin cools it hardens, allowing the specimen to be cut into thin slices by a microtome. The thin slices are floated on warm water to be scooped onto glass slides. Once the sliced specimen is on the slide, the paraffin is chemically removed. After this the slide can be stained with different staining techniques, the most common one being hematoxylin and eosin, which makes the nucleus blue and the cytoplasm pink.
• frozen: during surgery, the surgeon removes the tumor from the patient and sends it to the pathology lab to be frozen. The specimen is cut into thin slices using a cryotome and stained with toluidine blue. The method lacks precision, but gains speed: in a matter of minutes the pathologist can decide whether the specimen contains cancer cells. This procedure is usually done during cancer surgery, so that the surgeon knows whether she/he needs to remove more tissue.
• smear: used when the specimen is liquid. The specimen is placed on a microscope slide and left to air-dry. After that it is fixed with a spray or liquid. The cells are stained and examined under the microscope.

The pathology report covers the types of cells seen in the specimen, their architecture, as well as their connection with other biological structures (e.g. vessels, nerves, etc.). The pathologist can ask for more normal tissue before making her/his diagnosis. The diagnosis usually falls under the following categories:

• hyperplasia: an abnormal increase of cells in a tissue or organ.
• atypical: refers to cells that are not normal, but are not cancerous either. Atypical cells might transform into cancer later.
• dysplasia: an increase in the number of atypical cells in an organ. It can be a response to an external stimulus, or a stage between normal and cancer cells.
• carcinoma: the cancer has started from epithelial cells.
• sarcoma: the cancer has started from stromal cells.
• lymphoma: the cancer has started from the lymphatic system.
• leukemia: the cancer has started in the blood or the bone marrow.
• neoplasia: an unusual cell growth; the cells may be either benign or malignant. This is stated when the specimen is too small to make a decision.
• etc.

Two other important facts must be determined in the process of building the tumor's pattern: cancer staging and grading. Staging describes how far the cancer has spread. Grading predicts the tumor's aggressiveness.

The cancer stages are:

• stage I: the cancer is found rather locally.
• stages II and III: the cancer has spread regionally to the nearby tissues and perhaps to the lymph nodes.
• stage IV: the cancer has spread beyond the lymph nodes into other parts of the body (metastases).

The staging labels are: T for the tumor size, N for the spread of the cancer to the nearby lymph nodes, and M for metastasis (Table 3.2).

TABLE 3.2  Staging labels TNM.

T (tumor):
  TX: the tumor could not be assessed.
  T0: no tumor.
  T1–T4: the tumor has been detected, and its size varies by different degrees, depending on the examined tissue. The numbers from 1 to 4 represent the growth of the tumor.

N (lymph nodes):
  NX: the lymph nodes could not be evaluated.
  N0: the lymph nodes have no cancer.
  N1–N3: cancer has spread into the lymph nodes. The numbers represent the degree of the cancer spread (e.g. 3—lots of lymph nodes contain cancer cells).

M (metastasis):
  MX: the metastasis could not be assessed.
  M0: there are no metastases; the cancer did not spread to other parts of the body.
  M1: cancer has spread to other parts of the body.

This TNM staging system is used to describe most cancers (exceptions include blood and bone marrow cancers). The tumor grade is a number between 1 and 4. Tumors that resemble the original cells are grade 1 tumors, and they usually grow slowly. Grade 4 tumors are aggressive tumors that grow and spread fast, and no longer resemble the original cells. Again, this grading system is used for most cancers (cancers such as prostate, breast, and kidney use different grading methods). Another vital aspect is the tumor's relation to nearby structures such as blood vessels and nerves. If the tumor has invaded the surrounding blood vessels, it has a higher chance of producing metastases. If, on the other hand, the tumor has invaded the nerves, the nerve sheath acts like a slide that helps the tumor progress more rapidly. This, more or less, is the pathologist's job; this is how she/he builds the tumor's pattern. Additionally, molecular tests can be performed to determine whether some genes are active, missing, or changed. Other tests determine whether the cancer would respond to a certain type of treatment by checking specific receptors. So, how exactly is AI helping? Buckle up your seatbelts, because further in this chapter you will be reading about AI and histology, AI and immunohistochemistry, and, last but not least, AI and genetics.

3.2 Artificial Intelligence and histology

Training a pathologist takes over 11 years. After this period, a pathologist is able to recognize specific patterns using the microscope. In the near future, traditional histology will become history. AI techniques have already started helping pathologists make more accurate diagnoses, which in the end leads to the identification of targeted therapies for patients.

Besides this, using AI, pathologists can store their cases and share them with other pathologists around the world. The workflow is improved through speed and accuracy. In November 2018, a group of scientists from the Laboratory of Pathology East Netherlands published a study that evaluated the time savings brought by digital pathology (Baidoshvili et al., 2018). The study did not take into account the pathologists' time for making the diagnostic decision; it covered all the measurable workflows at the Laboratorium Pathologie Oost Nederland: routine diagnosis, multidisciplinary meetings, external revision requests, extra staining, and external consultation. The finding was that working digitally saved more than 19 working hours per day, with the greatest time savings in routine diagnosis and multidisciplinary workflows. AI can help pathologists identify different patterns and, by the use of stored databases, find correlations with other cases. Pathologists gain time for analyzing the specimens more thoroughly. Besides this, spending long hours looking through a microscope at different specimens leads to fatigue, and fatigue leads to mistakes. Using AI in the pathology lab can help reduce the risk of potential errors. The most used AI method in histology is deep learning. Convolutional neural networks can make pixel-based distinctions, leading to high performance classification. Deep learning can detect and count mitotic events, classify tissue as cancerous or non-cancerous, segment nuclei, etc. The only thing AI needs is a dataset with labeled images (a minimal sketch of such a patch classifier appears at the end of this passage). On April 12, 2017, the U.S. Food and Drug Administration (FDA) approved the marketing of the Phillips IntelliSite Pathology Solution (PIPS). PIPS is the first whole slide imaging system that reviews pathology slides digitally. The Director of the Office of In Vitro Diagnostic and Radiological Health at the FDA Center for Devices and Radiological Health, Alberto Gutierrez, PhD, stated that "the system enables pathologists to read tissue slides digitally in order to make diagnosis, rather than looking directly at a tissue sample mounted on a glass slide under a conventional light microscope"—https://www.fda.gov/news-events/press-announcements/fda-allows-marketing-first-whole-slide-imaging-system-digital-pathology. The technology has been tested in the largest trial of digital pathology in the United States, and proved its performance to be equal to that of conventional pathology using the light microscope (Mukhapodhyay et al., 2018). All the surgical pathology cases that included biopsies and resections stained with hematoxylin and eosin, followed by immunohistochemistry, etc., were gathered from four institutions, using as ground truth the diagnosis set by traditional pathology. Sixteen pathologists reviewed the cases both by microscopy and by digital pathology; 15,925 reviews were made of 1992 cases. The differences reported were a 4.9% disagreement between the ground truth diagnosis and digital pathology, and a 4.6% disagreement between the ground truth and microscopy. Between digital pathology and microscopy there was a discordance rate of 0.4%. The major differences between the two technologies appeared when reviewing specimens from endocrine, neoplastic kidney, urinary bladder, and gynecological pathologies, and ranged between 1.2% and 1.8%.
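As promised above, here is a minimal sketch of a patch-level convolutional classifier of the kind described in this section, assuming the open-source PyTorch library; the network shape, patch size, and the random batch are purely illustrative, not any published architecture:

```python
import torch
import torch.nn as nn

# A small convolutional network that classifies tissue patches as
# cancerous vs. non-cancerous; sizes here are illustrative only.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),  # two classes, for 64x64 RGB input patches
)

patches = torch.randn(8, 3, 64, 64)   # a mini-batch of labeled patches
labels = torch.randint(0, 2, (8,))    # placeholder patch labels
loss = nn.CrossEntropyLoss()(model(patches), labels)
loss.backward()                       # backpropagate the gradients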
On June 19, 2019, it was announced that LabCorp—labcorp.com (Accessed November 9, 2019)—and the Mount Sinai Health System are going to establish the Mount Sinai Digital and Artificial Intelligence enabled Pathology Center of Excellence. LabCorp wishes to integrate digital pathology into clinical practice throughout all of Mount Sinai's hospitals. LabCorp has already implemented the Phillips IntelliSite Pathology Solution—usa.phillips.com/healthcare/solutions/pathology (Accessed November 9, 2019)—in four laboratories.

In June 2019, Proscia—proscia.com (Accessed November 9, 2019)—released DermAI—proscia.com/concentriq/clinical/derm-ai/ (Accessed November 9, 2019). DermAI is a software module that enables image viewing and workflow management. The module has not yet been cleared for primary diagnosis. So, the wheel is in motion. But to digitize pathology labs, medical professionals must work hand in hand with computer data scientists. All these software modules would not be possible if it weren't for the research studies published in high impact, peer-reviewed journals and conferences. In 2018, the number of academic publications regarding pathology and AI increased, with over 1000 papers registered in PubMed. Pathology AI start-ups are receiving lots of money for their practical AI applications in diagnostics. The United Kingdom announced a $1.3 billion investment in AI for detecting diseases earlier (Hardaker, 2019). In what follows, we shall present some research studies regarding the histology + AI combo. In July 2019, a paper was published regarding LYNA, or Lymph Node Assistant, an AI software that evaluates lymph nodes for early breast cancer micrometastases (Liu et al., 2019). Reviewing lymph nodes is crucial for the treatment plan, but it is time consuming and stressful (Badve et al., 2018). LYNA is a deep learning algorithm, built and tested on the publicly available Camelyon16 challenge dataset—camelyon16.grand-challenge.org (Accessed November 9, 2019). Images from 399 patients contained hematoxylin-eosin-stained lymph nodes. The training set had 270 slides, while the testing set had 129 slides. The results reported in the study were an AUC of 99% and a sensitivity of 91% at 1 false positive per patient. LYNA also identified two slides that had been considered "normal" as having micrometastases. On the second dataset the reported AUC was 99.6%. An interesting paper that uses the same dataset evaluates the performance of 11 pathologists who had to review the images from the dataset under the time-constrained conditions of a 2-h session, versus AI algorithms. The time constraint led to a mean sensitivity of 38% for the detection of micrometastases, whereas the best performing AI algorithm achieved a true positive fraction of 72.4%, comparable with the one obtained by a pathologist working without time constraints (Bejnordi et al., 2017). Another paper (Steiner et al., 2018) on the same subject presents the following study: 6 US board certified pathologists diagnosed 70 digitized slides with and without assistance from AI. Their experience as attending pathologists ranged from 1 to 15 years. Using AI improved the detection of micrometastases from 83% to 91%, p-level = 0.02. One point must be noted: the algorithm by itself and the pathologist by herself/himself were both surpassed by the pathologists who used AI for diagnosing. The average review time per slide that contained micrometastases was 61 s for the pathologist + AI combo versus 116 s for the pathologist without AI, p-level = 0.002; for the negative slides it was 111 s versus 137 s, p-level = 0.018. A survey was done among the six pathologists, and all of them agreed that it is significantly easier to interpret a slide with micrometastases with the help of the AI rather than on their own, p-level = 0.0005. Recall the statistics presented in Chapter 1.
Besides sensitivity, we need to see what the specificity, positive predictive value, and negative predictive value of the model are. If the specificity value is low, then we are dealing with increased false positives, which is bad news. This is why AI and the pathologist must be friends, not enemies: only the experience and the understanding of the clinical context of the pathologist can distinguish between false and true positives. Until an AI algorithm is thoroughly statistically analyzed, we cannot trust the reported results 100%.
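As a quick refresher on these metrics, here is a minimal sketch computing them from confusion-matrix counts; the counts in the example are illustrative, not taken from any study:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-test metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Illustrative counts: 90 TP, 5 FP, 85 TN, 10 FN
print(diagnostic_metrics(tp=90, fp=5, tn=85, fn=10))
```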

Another AI platform can analyze images of lung tumors. The software, based on Google's Inception v3 architecture, can determine cancer types and identify altered genes. The researchers from the New York University (NYU) School of Medicine, led by Associate Professor of Pathology Aristotelis Tsirigos, trained the convolutional neural network Inception v3 on a data set that contained slide images from The Cancer Genome Atlas (Coudray et al., 2018). The dataset contained labeled histology images. Inception v3 performed comparably to an experienced pathologist. It can classify adenocarcinoma, squamous cell carcinoma, and normal lung tissue. Besides this, the AI algorithm can identify 10 mutations in a second, whereas a genetic test that confirms the presence of mutations can take weeks to return results. Six out of 10 mutations, STK11, EGFR, FAT1, SETBP1, KRAS, and TP53, were predicted with AUCs that ranged between 0.733 and 0.856. The data used for training are available at the Genomic Data Commons portal—https://gdc.portal.nci.nih.gov (Accessed November 9, 2019)—and were generated by the TCGA Research Network—http://cancergenome.nih.gov/ (Accessed November 9, 2019). Discovering gene mutations quickly and accurately is vital. The mutation in the epidermal growth factor receptor gene, EGFR, is treatable with approved drugs, and around 20% of patients with adenocarcinoma could benefit from the fast detection of this mutation. The study revealed that it is hard to tell one cancer type apart from another: both Inception v3 and the pathologists misclassified half of a small percentage of tumor images, while 45 out of 54 slides misclassified by the doctors were classified correctly by the AI. Developing AI algorithms for histology implies teamwork. A deep learning software that can establish the Gleason score, trained on 112 million prostatectomy image patches labeled by pathologists, is presented in Nagpal et al. (2019). The Gleason score determines the aggressiveness of prostate cancer. It can range from 1 to 5; unfortunately, when discovered, most cancers have a Gleason score over 3. The final Gleason score is composed of two numbers: the primary grade, which describes the cells found in the largest area of the tumor, and the secondary grade, which describes the cells found in the next largest area. If most of the tumor has a score of 4 and the next largest area has a score of 5, then the overall Gleason score is 9. This implies that most reported Gleason scores range from 6 to 10, where:

• a score of 6 means that the cancer cells look similar to the original ones, and that the cancer might progress at a slower pace;
• a score of 7 describes a cancer of intermediate aggressiveness. Since a score of 7 can be composed either of a primary score of 4 and a secondary score of 3, or vice versa, we need to discuss both situations: if a tumor has primary score 3 and secondary score 4, the outlook is not so bad and the tumor is not very aggressive; if the tumor has primary score 4 and secondary score 3, it might grow and spread more aggressively;
• any score equal to or higher than 8 describes an aggressive cancer.

For training, 112 million labeled image patches from 1226 slides were used; for model validation, 331 slides.
The dataset was then fed to the AI system in a two-step approach: first, a convolutional neural network assessed the regional Gleason pattern, and second, a k-nearest neighbors algorithm was applied for the whole slide classification. The total tissue used measured up to 115,000 mm², and the pathologists worked on it for roughly 900 h. The labeled tissue area is 4 times larger than in Camelyon16.
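A schematic sketch of this two-step idea follows, assuming scikit-learn and NumPy. The patch grades, slide labels, and the simple histogram feature are our own placeholders for illustration, not the authors' exact pipeline:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def slide_feature(patch_grades):
    """Summarize one slide as the fraction of patches assigned by a
    (hypothetical) patch-level CNN to benign (0) or Gleason patterns 3-5
    (encoded here as 1-3)."""
    counts = np.bincount(patch_grades, minlength=4)
    return counts / counts.sum()

rng = np.random.default_rng(0)
# Placeholder patch-level CNN outputs: one grade per patch, 40 slides
patch_grades = [rng.integers(0, 4, size=500) for _ in range(40)]
grade_groups = rng.integers(1, 6, size=40)   # placeholder slide labels

X = np.array([slide_feature(g) for g in patch_grades])
knn = KNeighborsClassifier(n_neighbors=5).fit(X, grade_groups)
print(knn.predict(X[:3]))                    # whole-slide predictions
```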

The reported results are as follows: the mean accuracy among 29 pathologists who independently established the Gleason score for each slide was 61%, with (56%, 66%) as the 95% confidence interval, whereas the mean accuracy of the AI system was 70%, with (65%, 70%) as the 95% confidence interval. Statistically comparing the difference between the two means, the authors obtained a significance level of 0.002. Ten pathologists assessed the entire validation set. Their accuracy ranged from 53% to 73%, with a mean of 64%; the AI system surpassed eight of them. The AUROC scores obtained for the three grade-group decision thresholds 2, 3, and 4 ranged between 0.95 and 0.96. For grade group 4 the AI system obtained high sensitivity as well as high specificity, surpassing 9 out of the 10 pathologists. The study also used Cox models to evaluate the prognostic abilities of the found patterns. We cannot proceed any further until we discuss the Cox models.

Cox regression model

The Cox regression model belongs to one of the main branches of statistics, highly applied in the medical field as well as to mechanical systems, which in cancer research we refer to by its original name: survival analysis. Chapter 7 of this book is dedicated to this concept. The basic event studied in survival analysis is "death." More specifically, the analysis of survival time refers mainly to the actual survival of a patient after she/he has undergone surgery and/or cancer treatment. Technically, the time from the start of the medical process (surgery, chemotherapy, radiotherapy, etc.) until the death of the patient is recorded; this period is known as the survival time. We can use the term survival time when we refer to the time of death, the time when a certain symptom developed, cancer relapse after remission, etc. While performing survival analysis we compute the probability of surviving, that is, the proportion of people who have undergone cancer surgery, chemotherapy, radiotherapy, hormonal treatment, or a combination of the above, who might survive a given length of time under the same circumstances. To calculate this probability we denote by X a random variable that represents the survival time. Next, dividing the time axis into small intervals $(0, t_1), \ldots, (t_{k-1}, t_k)$, we estimate the survival probability as:

$$P\{X \geq t_n\} = P\{X \geq t_1\} \cdot P\{X \geq t_2 \mid X \geq t_1\} \cdots P\{X \geq t_n \mid X \geq t_{n-1}\}.$$

In words, the above formula can be translated as follows: the probability of a patient surviving 2 days after cancer surgery is the probability of her/him surviving the first day multiplied by the probability of her/him surviving the second day, conditioned on the fact that she/he survived the first day. One of the most complicated issues in cancer studies is to determine the effect of a continuous independent variable on the survival time, more exactly to determine whether the survival time and the predictors are correlated. There are several techniques used in survival analysis, such as the Kaplan-Meier survival curve, life table analysis, the hazard ratio, the logrank test, etc., but unfortunately these cannot be used to explore the simultaneous effect of multiple variables on the survival time. One may think that in this case we can apply

multiple linear regression (presented in Chapter 2), but we cannot, for two reasons:

• multiple linear regression cannot be used if the data do not have a Gaussian distribution, and the variable that describes the survival time usually has an exponential or a Weibull distribution;
• survival analysis deals with a lot of censored data. Censored data means that the observations are not complete. Let us presume that we want to do a follow-up cancer study on a group of patients who have received a certain type of treatment. We monitor these patients for a period of time. When this period ends, the monitoring ends, and when we perform a survival time analysis we do not know whether the patients are still alive or not. Another type of missing data arises when a patient no longer wants to participate in the follow-up study, so we cannot monitor her/him anymore.

We can perform survival analysis using special regression models such as:

• Cox proportional hazard regression analysis (Cox, 1972);
• the Cox proportional hazard model with time dependent covariates;
• the exponential regression model;
• the normal linear regression model;
• the log-normal linear regression model.

Since the discussion is related to the Cox proportional hazard regression model, in what follows we shall present only this method. When working with the Cox proportional hazard regression model we deal with two functions: the survival function and the hazard function. The survival function is computed using the following probability:

$$S(t) = P\{T > t\},$$

where T is the time remaining until the patient's death, and t is time. The lifetime distribution can be written using the following formula:

$$F(t) = 1 - S(t).$$

The death rate, the number of recorded deaths per time unit, is $f(t) = \frac{d}{dt}F(t)$. The hazard function is:

$$\lambda(t) = P\{t < T < t + dt \mid T > t\} = \frac{f(t)\,dt}{S(t)} = -\frac{S'(t)\,dt}{S(t)}.$$

The hazard function computes the patient's risk of dying within a short period of time dt, given that she/he has survived up to time t. The Cox proportional hazard regression model does not make any presumption regarding the survival distribution; this is why it is considered a general regression model. The only presumption made is that the hazard is a function of the independent variables $Z_1, Z_2, \ldots, Z_k$, where each variable might be a predictor or covariate:

$$h(t; Z_1, Z_2, \ldots, Z_k) = h_0(t) \cdot \exp(b_1 Z_1 + b_2 Z_2 + \cdots + b_k Z_k),$$

which can be transformed into:

$$\ln\left(\frac{h(t; Z_1, Z_2, \ldots, Z_k)}{h_0(t)}\right) = b_1 Z_1 + b_2 Z_2 + \cdots + b_k Z_k.$$

From this we can draw the following conclusions: no assumption is made regarding the distribution type of the survival times, but we presume that the different variables have a constant effect on the survival over time, and also that these variables are additive on a particular scale. We call $h_0(t)$ the underlying hazard function; it represents the hazard for the particular case when all the independent variables have the value 0. Whatever the values of the variables, two conditions must be fulfilled:

• a multiplicative relationship between the underlying hazard function and the log-linear function of the covariates must exist—the hypothesis of proportionality;
• the hazard and the independent variables must be in a log-linear relationship.

Let us see the Cox regression model in action, in an interesting study regarding the comparison between placebo and azathioprine treatment in patients with primary biliary cirrhosis (Christensen et al., 1985). The effect of azathioprine, an immunosuppressive drug, was studied in a double-blind randomized clinical trial that included 127 patients who received azathioprine and 121 patients who received placebo. The azathioprine group recorded 57 deaths, whereas in the placebo group 62 deaths were recorded. The authors used Cox multiple regression analysis to see the difference between the two groups and reported that the therapeutic effect of azathioprine was statistically significant, with a p-level of 0.01. The drug reduced the risk of dying to 59%, and also improved the survival time of the average patient by 20 months. Based on this result, azathioprine was recommended as a routine treatment for primary biliary cirrhosis. For more details on survival analysis methods please read Chapter 7 of this book (a short code sketch of fitting such a model follows at the end of this passage). Returning to our paper, we mentioned that the authors used Cox models to evaluate the prognostic ability of the Gleason patterns. The reported results were 0.697 for the deep learning system, 0.674 for the cohort of 29 pathologists, and 0.690 for the specialist reference standard. Another research study where the diagnosis was set with the use of a deep learning neural network is Kumagai et al. (2019). The endocytoscopic system performs a virtual histological examination, thus confirming the histological diagnosis in vivo. The authors' aim is to propose the use of the endocytoscopic system for the diagnosis of esophageal squamous cell carcinoma instead of the traditional biopsy-based histology analysis. The endocytoscopic images were analyzed with a deep learning neural network. The convolutional neural network was designed with GoogLeNet as its starting point and was trained using 4715 endocytoscopic images of the esophagus. The data set was unbalanced, with 3574 benign images and only 1141 malignant images. The AI system was evaluated using an independent test set of 1520 images collected from 55 patients, of whom 28 had benign esophageal lesions and 27 had malignant esophageal lesions. The statistical analysis consisted of computing the AUROC, which was 0.85. The system correctly diagnosed 25 out of 27 cancer cases, with an overall sensitivity of 92.6%. Out of the 28 non-cancerous cases the AI system correctly diagnosed 25, giving an overall specificity of 89.3% and an accuracy of 90.9%.
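As promised, here is a minimal sketch of fitting a Cox proportional hazards model in Python, assuming the open-source lifelines library; the follow-up data and column names are entirely hypothetical:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical follow-up data: survival time in months, a death indicator
# (1 = died, 0 = censored), and two covariates Z1 = age, Z2 = treatment arm.
df = pd.DataFrame({
    "time_months": [12, 30, 45, 8, 60, 24, 50, 36],
    "death":       [1,  0,  1,  1,  0,  1,  0,  1],
    "age":         [64, 58, 71, 69, 55, 62, 60, 74],
    "treatment":   [0,  1,  0,  1,  1,  0,  1,  0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_months", event_col="death")
cph.print_summary()  # coefficients b_i, hazard ratios exp(b_i), p-values
```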

A deep learning network, MVPNet, together with a tailored data augmentation technique, NuView, was used for magnification-invariant diagnosis in breast cancer (Jonnalagedda et al., 2018). MVPNet has fewer parameters than standard deep learning neural networks, but can still achieve comparable performance. The NuView technique extracts the tumor nuclei locations and signals them to MVPNet. The AI system was tested on the BreakHis data set, and the reported result was an accuracy of 92.2%, compared to the 83% accuracy reported in the literature. The increasing use of AI in histology is also driven by the "Grand Challenges" (Serag et al., 2019). These Grand Challenges are public competitions that provide open, annotated data sets so that data scientists can develop new algorithms. In 2010, a challenge in histology was held at the International Conference on Pattern Recognition (ICPR) (Gurcan et al., 2010), which addressed the following two issues: (a) the count of lymphocytes on images of breast cancer slides stained with hematoxylin and eosin, and (b) the count of centroblasts on images of follicular lymphoma slides stained with hematoxylin and eosin. Lymphocytic infiltration is correlated with the recurrence of breast cancer. Five of the twenty-three teams that registered for this challenge submitted their results. Two years later, at the same conference, the second grand challenge in histology was held; this time the challenge concerned the detection of mitoses in breast cancer histological images. Professor Frederique Capron of the pathology department at Pitie-Salpetriere Hospital in Paris, France, prepared the data set with colleagues (Roux et al., 2013). The mitotic count varies among pathologists: detecting mitoses is a difficult task, since mitoses have a wide range of shapes and configurations and are also very small. This time 17 out of 129 teams submitted their results. This was the first time in AI histology that a deep learning max-pooling convolutional neural network outperformed all the other methods (Ciresan et al., 2013). Their method obtained a precision-recall F-score of 0.782, significantly higher than the F-score of 0.718 obtained by the closest competitor. The Assessment of Mitosis Detection Algorithms 2013 (AMIDA13) challenge, held at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2013), had the same winner as ICPR 2012 (Veta et al., 2015). The data set used in AMIDA13 was larger, making the contest more difficult; besides the huge amount of data, the set also had many ambiguous cases and imperfectly stained slide images. Eighty-nine universities and companies registered for AMIDA13, but only 14 submitted their findings. In 2014, the ICPR grand challenge theme was to review breast cancer H&E stained biopsies. The goal was to detect mitoses and to score nuclear atypia (Roux, 2014). Nuclear atypia, which measures the aggressiveness of the cancer, can be classified into three classes corresponding to low, moderate, or strong nuclear atypia. The contest was won by the researchers who developed a fast deep cascaded convolutional neural network composed of two parts: a coarse retrieval model for the identification of mitoses that maintained high sensitivity, and a discrimination model (Chen et al., 2016).
The International Symposium on Applied Bioimaging held a grand challenge in 2015, where a new data set containing hematoxylin and eosin stained breast cancer specimens was presented. The data had to be classified into four classes: normal tissue, benign lesion, in situ carcinoma, and invasive carcinoma (Pego, 2015). Convolutional neural networks were once again the best at classifying the images and obtained an accuracy comparable to

state-of-the-art methods. The authors normalized the images using optical density and patch augmentation and normalization, and they trained support vector machines on the features extracted by the convolutional neural network (Araujo et al., 2017). Diagnosing colon cancer was the task of the MICCAI 2015 challenge; the data set contained gland segmentations in hematoxylin and eosin stained slides of colorectal adenocarcinoma specimens. Again, the team that won ICPR 2014 also won MICCAI 2015, using a different convolutional neural network (Chen et al., 2016). The architecture of their deep contour network resembled that of the U-net, which won the IEEE International Symposium on Biomedical Imaging (ISBI) challenge that same year (Ronneberger et al., 2015). For the MICCAI 2016 contest the organizers used a very large dataset, The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas Network, 2012; Heng et al., 2017). The challenge was named the TUmor Proliferation Assessment Challenge (TUPAC 2016) (Veta et al., 2019). The winners developed a unified framework that integrated three modules: an image processing component, a deep learning neural network for the detection of mitoses, and a proliferation score prediction module (Paeng et al., 2016). The CAMELYON16 dataset was used at ISBI 2016. The winners developed a convolutional neural network pipeline that performed tissue detection and normalization as a preprocessing step, followed by a pretrained 22-layer GoogLeNet architecture; after this they used a random forest classifier on the post-processed features (Bandi et al., 2019). Given CAMELYON16's success, the organizers of ISBI brought the CAMELYON17 grand challenge, with the largest publicly available histological data set, around 3 terabytes of data (Litjens et al., 2018). The task changed from reviewing whole slide images to combining the assessment of multiple slides into one. The winning team was the same as in the TUPAC 2016 challenge; their AI system was an ensemble of three pre-trained ResNet-101 networks, each optimized with patch augmentation algorithms. The ISBI 2017 challenge focused on the assessment of thyroid cancer (Wang et al., 2018). In 2018, several challenges took place. MICCAI 2018 held three: the first referred to the assessment of brain tumors using a combination of radiology and pathology images; the second's scope was the assessment of brain tumors from nuclei segmentation using pathology images; and the third was a multi-organ nuclei segmentation. Also in 2018, Kaggle opened a competition that aimed at segmenting nuclei on microscopy images from different organs. Kaggle chose a data set that contained hematoxylin and eosin stained biopsies, but also fluorescence images. Some 700 teams participated, and the winner was an AI system with a committee of four deep convolutional neural networks, with additional augmentation techniques and post-processing for extracting the morphological features, followed by gradient boosted trees. In 2019, the challenges continued with themes like the Lymphocyte Assessment Hackathon (LYSTO), whose data set comprised tissue sections of several cancer types stained with CD3 and CD8 immunohistochemistry. We shall further discuss this subject in Section 3.3. More details about these grand challenges can be found at grand-challenge.org/challenges/ (Accessed November 1, 2019). We have seen how histopathology evolves year by year with the use of AI. Now it is time for us to proceed to another fascinating part of pathology: immunohistochemistry.

3.3 Artificial Intelligence and immunohistochemistry

Immunohistochemistry is a laboratory technique that verifies whether a tissue contains certain antigens, or markers. The verification is done with the use of antibodies. The antibodies are conjugated to enzymes or to a fluorescent dye; after they reach the antigens, the enzyme or dye activates, and the pathologist has a clear image of the antigen under a microscope. Immunohistochemistry was developed in the 1940s (Coons et al., 1941), even if its principles had been known since the 1930s (Childs, 2014). Coons et al. used FITC-labeled antibodies to verify whether an infected tissue contained pneumococcal antigens. The origins of immunohistochemistry date back to the 1890s, when Doctor Emil von Behring was working as an assistant of Robert Koch, Nobel Laureate and a pioneer of bacteriology, at the Pharmacological Institute, University of Bonn. Together with another colleague, Kitasato, he discovered that if an animal is injected with attenuated forms of diphtheria, its body will start producing anti-toxins in its serum. These anti-toxins could then be used in the treatment of the disease; through this discovery, the successful treatment of a child became possible in 1890. Diphtheria caused 50,000 deaths per year in Germany at the time. The discovered serum reduced the death rate from 50% to 13%, and also saved von Behring's wife, Else. In 1901, he received the Nobel Prize in Physiology or Medicine. In the early 1900s, researchers did not know that active immunization stimulated the antibodies to function. In 1913, von Behring developed his first vaccine for diphtheria (Kantha, 1991). Doctor von Behring teamed up with Doctor Paul Ehrlich, and they set up a joint laboratory in Berlin. Professor Ehrlich was a noted pioneer in histochemistry and histopathology; as a student he was already making diagnoses in hematology and bacterial diseases. His work on immunocytochemistry (the staining of single layers of grown cells) started in 1896. It was then that he developed the methods in use today for testing and applying immunolabeling protocols. Besides discovering that the antibody-antigen reaction is accelerated by heat and decelerated by cold, he also recognized the selectivity of the antitoxin reaction, thinking that this discovery could be used in chemotherapy (Kaufmann, 2008; Schmalstieg and Goldman, 2008; Piro et al., 2008). We believe it is fascinating that the basic tools of immunocytochemistry were developed over 100 years ago. In the 1920s, researchers discovered the chemistry of antibodies. Heidelberger, the father of modern immunochemistry, made the first colored antibody-antigen complex. Heidelberger showed that the antibodies for pneumococcal polysaccharides were proteins; he realized this just by detecting nitrogen. The only problem that remained unresolved was that even if the antibodies could be detected with the use of nitrogen, they couldn't be distinguished from the antigen's protein. Heidelberger and Kendall added a purple azo dye to the antigen, egg albumin; when the antibody was added, the complex precipitated. It was Professor John R. Marrack who added dye to the antibodies (Marrack, 1934): adding R-salt tetrazotized benzidine, he made red antityphoid and anticholera antibodies. This was the beginning of immunohistochemistry. Fluorescence immunohistochemistry appeared when trying to solve a problem with lesions in rheumatic fever. During the summer of 1935, Albert H.
Coons started working with Dr. John Enders, and learned how to handle different species of antibodies and the precipitin reaction used to visualize the antigen-antibody complex. Despite his hard work, the

antibodies were still active in binding their antigens, and the signal was faint. Returning to Harvard, he applied for a research fellowship. He needed a dye to improve the signal, so he sought help from organic chemists. Luckily, he was introduced by Doctor Fieser, Professor of organic chemistry at Harvard, to Drs. Hugh Creech and Norman Jones, whose work so far had involved conjugating isocyanates of various carcinogens to protein molecules. The two started trying to couple anthracene isocyanate to antipneumococcal antiserum. The bacteria became brilliantly fluorescent under UV light when the antiserum conjugate was used to label the pneumococci. Because many tissues autofluoresce blue or red, Dr. Coons used fluorescein isocyanate for its green color. The only thing missing was a tool for visualization. Fortunately, in the Department of Anatomy at Harvard, Dr. Allan Grafflin was developing the fluorescence microscope, so Dr. Coons could finish his work and publish it in 1942 (Coons et al., 1942). Since 1942, major discoveries have been made in immunohistochemistry: tissue fixation and sectioning methods, antibody conjugation, antigen/epitope retrieval, immunostaining methods and reagents, etc. Through all these breakthroughs, immunohistochemistry became routine. The immunohistochemistry performed these days was introduced by Paul Nakane in 1967 (Nakane and Pierce, 1967). In cancer, immunohistochemistry is a technique used to determine whether specific cellular proteins are present or not. The presence of antibodies can be detected under the microscope, because an area that contains bound antibodies will have a different color than an area without. Specimens that have more of a protein will bind more antibodies and thus appear darker; the pathologist can therefore see not only whether a protein is present, but also its relative amount. Things are about to change in the routine of immunohistochemistry, due to AI. The growth of immunotherapy has placed immunohistochemistry in a central role, as tailored treatments are directly linked to the antigens found in the tumors. Nowadays there are still some crucial limitations to immunohistochemistry, for instance the need for signal amplification through enzyme-linked conjugation. In this step, the number of antigens in the specimen is difficult to quantify absolutely: being obtained after reaction saturation, the signal shows visual variability, and the diagnostic interpretation rests on the pathologist's visual assessment, which is subjective. A solution proposed by IBM is a new technology, the microfluidic probe, or MFP. Through the MFP the researchers obtained a saturation approach matrix that shows the kinetic curve of the antibody-antigen reaction (Kashyap et al., 2019). Other solutions involve convolutional neural networks. For example, researchers from China's University of Jinan enrolled 103 patients (28 men, 75 women, median age 58 years) with thyroid nodules who had thyroidectomy and immunohistochemistry analysis between January 2013 and January 2016. All of the patients had CT scans before surgery, and their surgically collected specimens were analyzed with 3D Slicer v4.8.1. The researchers used the Kruskal-Wallis test (SPSS v19, IBM) to select the texture features associated with each immunohistochemistry characteristic; the significance level for the characteristics was p < 0.05. The model was trained using support vector machine methods.
Only 86 non-redundant features were selected out of 828. The authors reported a best accuracy for the cytokeratin 19 radiomic model of 84.4% in the training phase and 80% in the testing phase; the thyroperoxidase and galectin accuracies were 81.4% and 82.5% in the training phase, and 84.2% and 85% in the testing phase. The performance of the high molecular weight cytokeratin predictive model was poor, yielding 65.7% accuracy,

and was not validated. The authors state that "this model may be used to identify benign and malignant thyroid nodules" (Gu et al., 2019). A deep neural network that can automatically localize and quantify regions with biomarkers is proposed in Sheikhzadeh et al. (2018). The authors proposed a fully convolutional network named Whole Image (WI)-Net, which takes as input an RGB color image of a tissue and generates as output a map showing the location of each biomarker. The WI-Net uses a different network, named Nuclei (N)-Net, another convolutional neural network, which classifies every nucleus according to its biomarker. The data set consisted of uterine cervical biopsy specimens obtained from colposcopies at the Women's Clinic at Vancouver General Hospital (Sheikhzadeh et al., 2015). The specimens were collected between April 2013 and February 2015. The data set contained 4679 RGB images labeled as p16 positive, Ki-67 positive, p16 and Ki-67 positive, or p16 and Ki-67 negative. The N-Net reached 92% accuracy in the testing phase. The results of the N-Net were fed to the WI-Net, which generated the map. The results were compared with manual human labeling and yielded an average F-score of 0.96. An AI system called morphological based molecular profiling (MBMP), applied in breast cancer, has two components: a logistic regression component that computes the correlations between histomorphology and biomarker expression, and a deep convolutional neural network based on ResNet that is used for predicting the biomarker found in the specimens (Shamai et al., 2019). The data set on which MBMP was tested contains 20,600 digitized hematoxylin and eosin stained sections from 5356 breast cancer patients. The data was collected at the Vancouver General Hospital, British Columbia, Canada, between July 1, 2015 and July 1, 2018. The data set, published by the Genetic Pathology Evaluation Centre, is publicly available at http://bliss.gpec.ubc.ca, http://www.gpecimage.uvc.ca and https://tma.im/tma_portal/C-Path (Accessed November 15, 2019). The study involved two cohorts of patients: the first contained 412 patients with a median age of 61 years, and the second 4944 patients with a median age of 62 years. After performing the logistic regression, the data scientists found 19 most significantly correlated features, among which we mention the estrogen receptor, the progesterone receptor, and HER2 (ERBB2). The reported results were as follows: the estrogen receptor status was predicted for 50.7% of the patients from cohort 1 with a PPV of 97%, and for 51.8% of the patients from cohort 2 with a PPV of 98%; the corresponding NPVs were 68% and 76%. The overall accuracies were 91% and 92%. Using the Bhattacharyya distance, the researchers showed that the morphological analysis of patients with ER-positive status by immunohistochemistry resembled that of patients with ER-positive status (Bhattacharyya distance = 0.03), but not that of patients with ER-negative/PR-negative status (Bhattacharyya distance = 0.25). This is an important result, since it suggests a false negative immunohistochemistry finding, and implies that these patients should receive antihormonal therapy.
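For reference, the Bhattacharyya distance used above compares two discrete distributions; a minimal sketch follows, with purely illustrative histograms:

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions,
    e.g. normalized histograms of morphological features."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient, 1 = identical
    return -np.log(bc)

print(bhattacharyya_distance([0.2, 0.5, 0.3], [0.25, 0.45, 0.3]))   # small
print(bhattacharyya_distance([0.9, 0.05, 0.05], [0.05, 0.05, 0.9])) # large
```

In Catto et al. (2010) the authors developed an AI system for the analysis of microarrays of bladder cancer progression that combines neurofuzzy modeling and artificial neural networks.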
In the United States, it is estimated that 61,700 men and 18,770 women will be diagnosed with bladder cancer in 2019. The 5-year survival rate is 77%, dropping to 65% for the 15-year survival rate, depending on the stage at which the cancer was diagnosed. Catto et al. used AI and statistics to identify the genes related to the progression of bladder cancer in 66 tumors with 2800 genes. The genes selected by the AI were afterwards

investigated using immunohistochemistry in a second cohort that contained 262 tumors. The AI system identified 11 progression-associated genes. Commercial antibodies determined the expression of six selected genes, LIG3, FAS, KRT18, ICAM1, DSG2, and BRCA2, and successfully identified the progression of the tumors (concordance index 0.66, log-rank test p = 0.01; for further details regarding the log-rank test please see Chapter 7). The selected genes proved to be more discriminative than the pathologic criteria in the task of establishing the progression of tumors (Cox multivariate analysis p = 0.01). Immunohistochemistry is not an easy task to tackle. Shariff et al. presented an overview of the issues that exist in immunohistochemistry image analysis, and proposed several solutions for handling them (Shariff et al., 2010). The researchers suggest that four steps should be followed: (a) applying normalization, segmentation, tracing, and tracking methods for image preprocessing; (b) using registration to bring all the images to a common reference frame; (c) computing the image features; and (d) applying AI for modeling and interpreting the data. Preprocessing the images is a crucial part. Images can have uneven illumination due to differences between runs of the imaging pipeline, or due to imperfections of the optical systems. If an image has a brighter part, an AI algorithm might wrongly assign a higher marker expression value to the cells there. The issues found in images are called artifacts. If we wish to eliminate such illumination artifacts, we can compute the mean or the median pixel value and then apply a smoothing method to it (e.g. Gaussian smoothing); the pixels are then normalized by the mean pixel value found at that location, thus eliminating the artifact (a short code sketch follows at the end of this passage). For cell segmentation and object detection, the authors propose Voronoi segmentation, seeded watershed, active contour based approaches, graphical model segmentation, model based methods, and active masks (for further details regarding these methods we refer the reader to Chan and Vese, 2001; Jones et al., 2005; Chen et al., 2006; Gould et al., 2009; Lin et al., 2003). After cell segmentation and object detection, clustering methods can be applied. Depending on the experiment, some measurements might be needed (e.g. lengths, numbers, branches, etc.); in the pathology case these measurements refer to vasculature, neurites, etc. For tracing, the authors suggest three classical methods: skeletonization (further reading in Cohen et al., 1994; He et al., 2003), vectorizing (further reading in Al-Kofani et al., 2002), and superellipsoids (Tyrrell et al., 2007). During the skeletonization process the image is segmented or a threshold is used; the remaining pixels are removed step by step, taking into consideration the surrounding neighborhood, leaving the original image's "skeleton." During the vectorization process we analyze only a small portion of an image at each step: after the automatic or manual discovery of a starting point, the algorithm explores the region of interest recursively. Superellipsoids are cylinders with an elliptical cross section that model the structures of interest in the image. When it comes to choosing which tracing method to use, we just need to run each of them, compare their results to the ground truth, and decide which one fits our problem best.
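As promised, here is a minimal sketch of the flat-field style illumination correction described above, assuming NumPy and SciPy; the image stack and smoothing width are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correct_illumination(images, sigma=25):
    """Divide each image by a smoothed per-pixel mean, removing
    slowly varying illumination artifacts (flat-field correction)."""
    images = np.asarray(images, dtype=float)   # shape (n, height, width)
    mean_img = images.mean(axis=0)             # per-pixel mean over the set
    smooth = gaussian_filter(mean_img, sigma)  # keep only slow variation
    smooth /= smooth.mean()                    # preserve overall intensity
    return images / smooth

corrected = correct_illumination(np.random.rand(10, 256, 256))
```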
Another important aspect is the dynamics of the movements within the cells. For this we need to be able to track objects from one image to the next. An interesting study refers to the cell cycle in an unsynchronized population (Signal et al., 2006). Signal et al. examined the cell cycle in the proteome of human cells by measuring the protein dynamics in each living cell and computationally aligning their trajectories. Other tracking methods can be studied in Genovesio and Olivo Martin (2008). The most used

approach for tracking is first to detect the objects and then to track them. In the beginning we have to know what objects we need to track, and afterwards describe them with a set of measurements, which must include the position of the object and other significant measurements such as size, shape, brightness, etc. By defining a distance between two objects, for instance the Euclidean distance, we can link the objects. The Hungarian algorithm minimizes the total distance of the linked objects in two consecutive frames (Kuhn, 1955) (see the sketch below). In the case of multiple images, we need to model the movement's inertia; techniques that can be used include particle filtering (Smal et al., 2008) and scoring a whole set of images in one step (Li et al., 2008). Registration is the process of aligning an object in one image to a pattern object in another image. There are several methods for applying registration, such as point based, surface based, and intensity based techniques (Fitzpatrick et al., 2000). The point based technique finds a spatial transformation that aligns two feature point sets. The surface based technique computes and aligns a three dimensional boundary surface of the first object and the pattern object. The intensity based technique uses an iterative process that optimizes a similarity score between two consecutive images; the similarity score can be based on cross-correlation, mutual information, or least squares. Intensity based methods are gaining popularity in the field.
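A minimal sketch of Hungarian-algorithm frame-to-frame linking, assuming SciPy; the detected cell centroids are made up for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Made-up cell centroids detected in two consecutive frames
frame1 = np.array([[10.0, 12.0], [40.0, 42.0], [80.0, 15.0]])
frame2 = np.array([[11.0, 13.0], [79.0, 17.0], [41.0, 40.0]])

cost = cdist(frame1, frame2)              # pairwise Euclidean distances
rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
for i, j in zip(rows, cols):
    print(f"object {i} in frame 1 -> object {j} in frame 2 "
          f"(distance {cost[i, j]:.2f})")
```

Slowly, but steadily, we approach our final section from this chapter, and that is how AI can be used together with genetics in cancer research.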

3.4 Artificial Intelligence and genetics

Cancer is a genetic disease. It is caused by mutations in certain genes in our body that control how our cells function, grow, divide, and die. Genes make proteins, and proteins make our cells work. Through the modification of some genes, cells start to grow uncontrollably and become cancer. In some cells a modified gene can increase the production of a certain protein, and that increase can cause the cell to grow. Other cases involve the production of a nonfunctional version of a protein designed to repair cellular damage: being nonfunctional, the protein cannot repair the damage. Some of these genes can be inherited from our parents; others are acquired during our life, caused by different carcinogenic substances that damage the DNA (tobacco smoke, radiation, etc.). Some damage affects just a nucleotide, a single unit of DNA; other damage adds or removes chemical marks that influence gene expression, affecting how and whether the messenger RNA is produced. Each cancer is unique, because it is a unique combination of modified genes. These modifications can be the cause or the result of cancer. As cancer grows, the alteration process worsens; even within the same tumor the cancer cells can be genetically different. Only 5%–10% of all cancers come from inherited genetic mutations. You may have heard that some people undergo genetic testing to see whether they have hereditary cancer syndromes. Experts suggest that genetic testing for cancer risk should be done only when a person has a personal or family history that suggests a high possibility of developing the disease, and that the results should be accurately interpreted. Even if someone might believe that cancer runs in the family, that cancer might not be inherited, but rather the result of a shared lifestyle. If a family appears to have a certain pattern, meaning the family members develop certain types

of cancer around a certain age, etc., that might indicate the presence of a hereditary cancer syndrome. We hope that you are not scared as you read these words. You must know that even if a cancer-predisposing mutation exists in your family, having inherited it does not mean that you will get cancer; but stressing over it will ruin your lifestyle, whether it just leads to depression or even contributes to disease. Studying cancer research papers, you may or may not have encountered discussions of different genes that are responsible for hereditary cancer syndromes. Here we present some of these genes:

• TP53 is the most commonly mutated gene; it produces a protein that suppresses tumor growth. Besides this, inherited TP53 mutations can cause Li-Fraumeni syndrome (Guha and Malkin, 2017). The syndrome is rare, but those who inherit it have a higher risk of developing certain cancers.
• BRCA1 and BRCA2 are the inherited mutated genes related to the hereditary breast and ovarian cancer syndrome. Other cancers, like pancreatic and prostate cancer, have also been associated with this syndrome. Recall that the American actress Angelina Jolie underwent surgery to remove her breasts, ovaries, and fallopian tubes, in order to reduce her chances of developing breast cancer after discovering that she was carrying a mutated BRCA1 gene inherited from her mother, who passed away due to ovarian cancer. About 1 in 1000 people inherits a mutated BRCA1 gene, and 5% of the 50,000 women diagnosed with breast cancer per year carry the BRCA1 mutation. Angelina Jolie's action caused a worldwide debate, women wondering what to do if they discover they have a mutated BRCA1. Is surgery the only option? That is a call each woman must make for herself: she can choose surgery, or intensive monitoring, so that the cancer is caught at its earliest stage and treated successfully.
• the PTEN mutated gene is associated with Cowden syndrome, which increases the risk of breast, thyroid, endometrial, and other cancers.

How are genetic changes identified? Basically, lab tests or DNA sequencing tests read the DNA. Thus, of course, genetics is another place where AI can help. Merging patients' gene expression data with AI, intelligent decision systems can diagnose cancer accurately. To diagnose cancer from a tissue biopsy, pathologists need high quality specimens with identifiable features. Rare cancers do not match this criterion, and are hard to diagnose using classical methods. In Belciug and Gorunescu (2018), an adaptive single hidden layer feedforward neural network was used for classifying DNA microarray gene expression data regarding breast, lung, colon, and ovarian cancer. The data sets are publicly available at http://mldata.org (Accessed May 3, 2019). Because the analysis was performed on DNA microarrays, the researchers faced the "curse of dimensionality" and also the "curse of sparsity": the data sets are highly dimensional, having from 2000 to 24,481 features, and few observations, ranging from 62 to 253 cases. The reported results were competitive with other traditional methods; the accuracies ranged from 54.62% to 92.84%. The AI system also used a feature selection mechanism that reduced the number of attributes used by between 32.29% and 72.27%. Another study performed on the same data sets used a Bayesian learning framework for extreme learning machines in order to classify the DNA gene expression microarrays (Belciug and Ivanescu, 2019).

The authors reported accuracies that ranged between 53.33% and 92.58%. In Berglund and Belciug (2018), an extreme learning machine/ant colony optimization tandem was developed to reduce the size of the DNA gene expression microarrays. The AI system was applied to two of the above data sets, and the reported results ranged from 52% to 93.70%.

Another study, performed by scientists from the BC Cancer Genome Sciences Centre, showed that an AI system can provide a precise cancer diagnosis. In Grewal et al. (2019), the researchers used Supervised Cancer Origin Prediction Using Expression (SCOPE), an AI technique that assesses RNA sequencing data. The testing data set contained untreated primary cancers and treated metastasis cases from volunteer adult patients at BC Cancer in Vancouver, British Columbia. The data were collected between January 2013 and March 2016. The AI system analyzed 17,688 genes and generated a diagnosis across 40 different types of cancer. The reported results were: an accuracy rate of 99% for the epithelioid mesotheliomas tested (125 cases out of 126); the remaining cases were sarcomatoid mesotheliomas, and SCOPE correctly classified them as a mixture of primary components. The overall accuracy was 86% on the 201 treatment resistant cancers.

Sophia Genetics (http://sophiagenetics.com, accessed November 16, 2019) is a company located in Switzerland. It was founded in 2011 and its purpose is to "democratize data driven science" (Kevin Puylaert, the company's General Manager for North America). Hospitals send their gathered data to Sophia's AI platform. Sophia's AI analyzes the data and performs genomic variant detection, annotation, and interpretation, after which it writes a report and sends it back to the hospital. Sophia uses statistical analysis and pattern recognition. It works with 1000 institutions in 81 countries and analyzes over 15,000 cases per month. The main problem of genetics and AI is the same as in all the other cases discussed so far: data. Genomic data is noisy, and what represents noise to one researcher might be a pathogenic variant to another. Thus, totally opposite diagnoses might be established for the same case, depending on who analyzes it. That is why each new lab that collaborates with Sophia needs to preprocess its data first and make sure that it meets the quality standards imposed by Sophia.

Eric Lefkofsky created Tempus (www.tempus.com, accessed November 17, 2019), another AI platform that applies advanced machine learning to genomics. It was created with the same purpose as Sophia: to help doctors make a more accurate diagnosis and decide on the corresponding treatment. In October 2017, the Tempus xT sequencing panel was introduced; it reviews 595 genes related to the diagnosis, prognosis, and therapeutic targeting of cancer. Tempus gathers clinical data (phenotypic, therapeutic), sequences DNA and RNA, and computes a decision class. Tempus has raised over $520 million. Another genomic profiling company, Foundation Medicine (https://www.foundationmedicine.com, accessed November 18, 2019), uses the output of its analyses of the human genome to design targeted clinical trials, therapies, and immunotherapies. In 2017, the FDA approved FoundationOne CDx, a platform that detects mutations in 324 genes, along with genomic signatures and gene rearrangements, in solid tumors.
Thanks to their resources and logistics, these companies do a better job at gathering, annotating, and curating genomic data than academic researchers can. Another option for clinicians who need to interpret genomic data is ClinVar (ncbi.nlm.nih.gov/clinvar, accessed November 18, 2019), a publicly available archive of relationships among genetic variants and phenotypes.
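To make the classification workflow of the microarray studies above more concrete, here is a minimal sketch of gene expression classification with a built-in feature selection step. It is only an illustration: the cited studies used single hidden layer feedforward networks and extreme learning machines, whereas this sketch uses plain logistic regression, and the data, gene counts, and parameter choices are synthetic assumptions.

```python
# Minimal sketch: classifying high dimensional gene expression data
# in the "few samples, many genes" setting of the microarray studies above.
# All data below is synthetic; labels 0/1 stand for benign/malignant.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_genes = 100, 5000          # ~100 cases, thousands of features
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :50] += 1.0                   # make the first 50 "genes" informative

# Pipeline: standardize -> keep the 100 most class-associated genes -> classify
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=100),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(model, X, y, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Note that the feature selection step lives inside the pipeline, so it is refitted on each training fold; selecting genes on the full data set before cross-validation would leak information and inflate the reported accuracy.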

Professor Brendan Frey, from the University of Toronto, and his team are focusing on how to use AI algorithms to understand the genome. During a talk at EmTech Digital, organized by MIT Technology Review, Professor Frey stated that a crucial piece is still missing in the genomic area, and that is a reliable explanatory connection between genotype and phenotype. Technically speaking, we should not be interested only in the classification of a certain mutation, but should also try to provide an explanation, going back through the network to understand why a certain variant leads to a disease. If a solution to this problem is found, then we might be able to modulate protein levels or edit genes with CRISPR (a method that permits editing living organisms' genomes; Ormond et al., 2017; Gupta and Musunuru, 2014; Lander, 2016).

Besides AI, augmented reality (AR), and virtual reality (VR), another technology is rapidly evolving: genomic data visual analytics. In order to speed up decisions and to find the best treatment for an individual patient, personalized medicine needs new genomic visualization tools that can extract knowledge from genomic data. Combining AI and data visualization can support targeted personalized therapy (Nguyen et al., 2011). Three dimensional VR/AR techniques simulate movement in three dimensions to increase the bandwidth of data perceived by our brains (Leung et al., 2016; Nguyen et al., 2016). Different types of alterations, together with clinical experience, can be integrated by the use of visualization techniques. Multidimensional oncogenomic data can be depicted in heat maps, genomic coordinates, and networks (Bhojwani et al., 2008; Rebeiz and Posakony, 2004; Schroeder et al., 2013). There is a need for visual stratification tools that show the tumor's genomic profile and all its connections, in order to produce effective drug treatments (Albuquerque et al., 2017; Schroeder et al., 2013). While performing an analysis, researchers can normalize the experimental differences between samples and afterwards identify differentially expressed genes by taking into account the fold-change level when comparing across samples, for instance diseased versus healthy tissue. Cluster analysis can be used to group the genes that have similar behavior patterns, followed by a scatter plot to visualize the clusters (Ciaramella et al., 2008; Pollard and van der Laan, 2005). Hierarchical clustering based on expression correlation can be performed to group behavior patterns, so that data from distant genome loci can be clustered and visualized together for assessment (Eisen et al., 1998; Huang da et al., 2009).

Big Data visualization is not an easy task. How can you visualize 260 terabytes of human genome data (http://www.1000genomes.org/, accessed November 18, 2019) (Via et al., 2010)? More and more data become available: the Internet Archive Wayback Machine 2015 (http://archive.org/web/web.php, accessed November 18, 2019), or The Lemur Project: The ClueWeb09 Data Set (http://lemurproject.org/clueweb09.php, accessed November 18, 2019). As we have seen, Big Data is used more and more by industry and business, and according to the International Data Corporation (IDC), trading data is becoming a market of its own, with large organizations purchasing external data (Gantz and Reinsel, 2012). By 2020, it is forecast that the number of digital bits will equal the number of stars in the universe: 44 zettabytes (Turner et al., 2014).
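As a concrete illustration of the clustering ideas above, the sketch below groups synthetic gene expression profiles by correlation, in the spirit of Eisen et al. (1998), and draws a heat map with rows reordered by the resulting dendrogram. The data shape and every parameter choice are assumptions made for the example.

```python
# Sketch: hierarchical clustering of (synthetic) gene expression profiles
# by correlation, then a heat map with rows reordered by the dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
expr = rng.normal(size=(60, 12))        # 60 genes x 12 samples (toy data)
expr[:20] += np.linspace(-1, 1, 12)     # give one gene group a shared trend

# Distance = 1 - Pearson correlation between gene profiles
dist = pdist(expr, metric="correlation")
tree = linkage(dist, method="average")
order = leaves_list(tree)               # row order implied by the clustering

plt.imshow(expr[order], aspect="auto", cmap="RdBu_r")
plt.xlabel("samples")
plt.ylabel("genes (clustered)")
plt.title("Expression heat map, rows grouped by correlation")
plt.colorbar(label="expression (a.u.)")
plt.show()
```

Genes that rise and fall together across samples end up adjacent in the reordered heat map, even if they sit at distant loci in the genome, which is exactly the property the visualization tools discussed above exploit.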
For further details on how genomic cancer data can be visualized, we refer the reader to Qu et al. (2019). The potential of AI in genetics is enormous; precision medicine cannot be done without this merger. In 2018, the United Kingdom government announced a new Genomic Medicine Service, which enrolls children and adults with rare diseases, and some types of cancer, so that personalized treatments can be developed for them. The goal is to sequence five million genomes over the next 4 years.

We could go on and on about genetics, cancer, and AI, but for now this is enough. We must proceed further in this book and discover how surgeons can learn everything about a tumor before making the first cut, along with all sorts of exciting new things that AI brings, or can bring, to oncological surgery.

References Albuquerque, M.A., Grande, B.M., Ritch, E., Pararajalingam, P., Jessa, S., Krzywinski, M., Grewal, J.K., Shah, S.P., Boutros, P.C., Morin, R.D., 2017. Enhancing knowledge discovery from cancer genomics data with Galaxy. GigaScience. 6 (5). https://doi.org/10.1093/gigascience/gix015. Al-Kofani, K., Lasek, S., Szarowski, D.H., Page, C.J., Nagy, G., Turner, J.N., 2002. Rapid automated three dimensional tracing of neurons from confocal image stacks. IEEE Trans. Inf. Technol. Biomed. 6, 171–187. Araujo, T., Aresta, G., Castro, E., Rouco, J., Aguiar, P., Eloy, C., Polonia, A., Campilho, A., 2017. Classification of breast cancer histology images using convolutional neural networks. PLoS One. 12, e0177544. https://doi.org/10.1371/ journal.pone.0177544. Badve, S.S., Beitsch, P.D., Bose, S., Byrd, D., Chen, V.W., Connolly, J.L., Dogan, B., D’Orsi, C.J., Edge, S.B., Giuliano, A., Hortobagyi, G.N., Mahar, A.L., Mayer, I.A., McCormick, B., Mitterndorf, E., Recht, A., Reis-Filho, J.S., Rugo, H.S., Simpson, J.F., Solin, L.J., Symmans, F.W., Valerand, T.M., Van Eycken, L.J., Weaver, D.L., Winchester, D.J., 2018. Breast Cancer Staging System: AJCC Cancer Staging Manual, 8th ed., 2017. https://cancerstaging.org/referencestools/deskreferences/Documents/AJCC%20Breast%20Cancer%20Cancer%20Staging%20System.pdf. Baidoshvili, A., Bucur, A., van Leeuwen, J., van der Laak, J., Kluin, P., van Diest, P.J., 2018. Evaluating the benefits of digital pathology implementation: time savings in laboratory logistics. Hispathology 73 (5), 784–794. https://doi. org/10.1111/his.13691. Bandi, P., Geessink, O., Manson, Q., Van Dijk, M., Balkenhol, M., Hermsen, M., Ehteshami Bejnordi, B., Lee, B., Paeng, K., Zhong, A., Li, Q., Zanjani, F.G., Zinger, S., Fukuta, K., Komura, D., Ovtcharov, V., Cheng, S., Zeng, S., Thagaard, J., Dahl, A.B., Lin, H., Chen, H., Jacobsson, L., Hedlund, M., Cetin, M., Halici, E., Jackson, H., Chen, R., Both, F., Franke, J., Kusters Vandelvelde, H., Vreuls, W., Bult, P., van Ginneken, B., van der Laak, J., Litjens, G., 2019. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE Trans. Med. Imaging 38 (2), 550–560. https://doi.org/ 10.1109/TMI.2018.2867350. Bejnordi, B.E., Veta, M., van Diest, P.J., van Ginneken, B., Karssemeijer, N., Litjens, G., van der Laak, J.A.W., Manson, Q., Balkenhol, M., Geessink, O., Stathonikos, N., van Dijk, M., Bult, P., Beca, F., Beck, A., Wang, D., Khosla, A., Gargeya, R., Irshad, H., Zhong, A., Dou, Q., Li, Q., Chen, H., Lin, H.-J., Heng, P.-A., Has, C., Bruni, E., Wong, Q., Halici, U., Oner, M., Cetin-Atalay, R., Berseth, M., Khvatkov, V., Vylegzhanin, A., Kraus, O., Shaban, M., Rajpoot, N., Awan, R., Sirinukunwattana, K., Latonen, L., Ruusuvuori, P., Liimatainen, K., Albarqouni, S., Mungal, B., George, A., Demirci, S., Navab, N., Watanabe, S., Seno, S., Takenake, Y., Matsuda, H., Phoulady, H.A., Kovalev, V., Kalinovsky, A., Liauchuk, V., Bueno, G., FernandezCarrobles, M.M., Serrano, I., Deniz, O., Racoceanu, D., Venancio, R., 2017. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318 (22), 2199–2210. Belciug, S., Gorunescu, F., 2018. Learning a single hidden layer feedforward neural network using rank correlation based strategy with application to high dimensional gene expression and proteomic spectra dataset in cancer detection. J. Biomed. Inform. 83, 159–166. https://doi.org/10.1016/j.jbi.2018.06.003. 
Belciug, S., Ivanescu, R.C., 2019. A Bayesian framework for extreme learning machine with application for automated cancer detection. Ann. Univ. Craiova Math. Comput. Sci. Ser. 46 (1), 189–202. Berglund, R., Belciug, S., 2018. Improving extreme learning machine performance using ant colony optimization feature selection. Application to automated medical diagnosis. Ann. Univ. Craiova Math. Comput. Sci. Ser. 45 (1), 151–155. Bhojwani, D., Kang, H., Menezes, R.X., Yang, W., Sather, H., Moskowitz, N.P., Min, D.J., Potter, J.W., Harvey, R., Hunger, S.P., Seibel, N., Raetz, E.A., Pieters, R., Horstmann, M.A., Relling, M.V., den Boer, M.L., Willman, C.L., Carroll, W.L., 2008. Gene expression signatures predictive of early response and outcome in high risk childhood acute lymphoblastic leukemia: a children’s oncology group study. J. Clin. Oncol. 26, 4376–4384.

Birner, P., Prager, G., Streubel, B., 2016. Molecular pathology of cancer: how to communicate with disease. ESMO Open. 1 (5), e000085. Byers III, J.M., 1989. Rudolf Virchow—father of cellular pathology. Am. J. Clin. Pathol. 92 (1), 2–8. Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Werneck, K.S., Busam, K., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J., 2019. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309. Cancer Genome Atlas Network, 2012. Comprehensive molecular portraits of human breast tumors. Nature 490, 61–70. https://doi.org/10.1038/nature11412. Catto, J.W.F., Maysam, F.A., Wild, P.J., Linkens, D.A., Pilarsky, C., Rehman, I., Rosario, D.J., Denzinger, S., Burger, M., Stoehr, R., Knuechel, R., Hartmann, A., Hamdy, F., 2010. The application of artificial intelligence to microarray data: identification of a novel gene signature to identify bladder cancer progression. Eur. Urol. 57 (3), 398–406. Chan, T.F., Vese, L.A., 2001. Active contours without edges. IEEE Trans. Image Process. 10, 266–277. Chen, S.C., Zhao, T., Gordon, G.J., Murphy, R.F., 2006. A novel graphical model approach to segmenting cell images. In: Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pp. 1–8. Chen, H., Dou, Q., Wang, X., Qin, J., Heng, P.A., 2016. Mitosis detection in breast cancer histology images via deep cascaded networks. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, pp. 1160–1166. Childs, G., 2014. History of immunohistochemistry. In: McManus, L.M., Mitchell, R.N. (Eds.), Pathobiology of Human Disease: A Dynamic Encyclopedia of Disease Mechanisms. https://doi.org/10.1016/B978-0-12-3864567.07401-3. Christensen, E., Neuberger, J., Crowe, J., Altman, D.G., Popper, H., Portmann, B., Doniach, D., Ranek, L., Tygstrup, N., Williams, R., 1985. Beneficial effect of azathioprine and prediction of prognosis in primary biliary cirrhosis: final results of an international trial. Gastroenterology 89 (5), 1084–1091. Ciaramella, A., Cocozza, S., Iorio, F., Miele, G., Napolitano, F., Pinelli, M., Raiconi, G., Tagliaferri, R., 2008. Interactive data analysis and clustering of genomic data. Neural Netw. 21 (2–3), 368–378. Ciresan, D.G., Giusti, A., Gambardella, L.M., Schmidhuber, J., 2013. Mitosis detection in breast cancer histology images with deep neural networks. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navad, N. (Eds.), Medical Image Computing and Computer Assisted Intervention—MICCAI 2013. Springer, Berlin, Heidelberg, pp. 411–418. Cohen, A.R., Roysam, B., Turner, J.N., 1994. Automated tracing and volume measurements of neurons from 3D confocal fluorescence microscopy data. J. Microsc. 173, 103–114. Coons, A.H., Creech, H.J., Jone, R.N., 1941. Immunological properties of an antibody containing a fluorescent group. Exp. Biol. Med. https://doi.org/10.3181/00379727-47-13084P. Coons, A.H., Creech, H.J., Joens, R.N., Berliner, G., 1942. The demonstration of pneumoccocal antigen in tissues by the use of fluorescent antibody. J. Immunol. 45, 159–170. Coudray, N., Ocampo, P.S., Sakellaropoulos, T., Narula, N., Snuderl, M., Fenyo, D., Moreira, A.L., Razavian, N., Tsirigos, A., 2018. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567. Cox, D.R., 1972. Regression models and life tables. J. R. Stat. Soc. B 34, 187–202. Dechter, R., 1986. 
Learning while searching in constraint-satisfaction-problems. In: AAAI-86 Proceedings. http:// www.aai.org/papers/AAAI/AAAi86-o29.pdf. Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D., 1998. Cluster analysis and display of genome wide expression patterns. Proc. Natl. Acad. Sci. U. S. A. 95, 14863–14868. Fitzpatrick, J.M., Hill, D.L.G., Maurer, C.R., 2000. Imagine registration. In: Fitzpatrick, J.M., Sonka, M. (Eds.), Handbook of Medical Imaging: Volume 2. Medical Image Processing and Analysis. SPIE—The International Society for Optical Engineering, Bellingham, WA, pp. 447–513. Gantz, J., Reinsel, D., 2012. The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far east. In: IDC iView IDC Anal. Future, pp. 1–16. Genovesio, A., Olivo Martin, J.C., 2008. Particle tracking in 3 D + t biological imaging. In: Rittscher, J., Machiraju, R., Wong, S.T.C. (Eds.), Microscopic Image Analysis for Life Science Applications. Norwood, MA, Artech House, pp. 223–282. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative Adversarial Networks. arvix.org/abs/1406.2661. Gould, S., Gao, T., Koller, D., 2009. Region based segmentation and object detection. In: Advances in Neural Information Processing Systems (NIPS 2009). Neural Information Processing Systems Foundation, La Jolla, CA.

Grewal, J.K., Tessier Cloutier, B., Jones, M., Gakkhar, S., Ma, Y., Moore, R., Mungall, A.J., Zhao, Y., Taylor, M.D., Elmon, K., Lim, H., Renouf, D., Laskin, J., Marra, M., Yip, S., Jones, S.J.M., 2019. Application of a neural network whole transcriptome based pan cancer method for diagnosis of primary and metastatic cancers. JAMA Netw. Open. 2 (4), e192597. https://doi.org/10.1001/jamanetworkopen.2019.2597. Gu, J., Zhu, J., Qiu, Q., Wang, Y., Bai, T., Yin, Y., 2019. Prediction of immunohistochemistry of suspect thyroid nodules by use of machine learning based radiomics. Am. J. Roentgenol, 1. https://doi.org/10.2214/ AJR.19.21626. Guha, T., Malkin, D., 2017. Inherited TP53 mutations and the Li-Fraumeni Syndrome. Cold Spring Harb. Perspect. Med. 7 (4), a026187. https://doi.org/10.1101/cshperspect.a026187. Gupta, R.M., Musunuru, K., 2014. Expanding the genetic editing tool kit: ZFNs, TALENs, and CRISPR-Cas9. J. Clin. Invest. 124 (10), 4154–4161. https://doi.org/10.1172/JCI72992. Gurcan, M.N., Madabhushi, A., Rajpoot, N., 2010. Pattern recognition in histopathological images: an ICPR 2010 contest. In: Unay, D., Catalpete, Z., Aksoy, S. (Eds.), Recognizing Patterns in Signals, Speech, Images and Video. Springer, Berlin; Heidelberg, pp. 226–234. Hardaker, A., 2019. UK AI Investment Hits $ 1.3 bn as Government Invests in Skills. https://www.businesscloud.co. uk/news/uk-ai-investment-hits-13bn-as-governement-invests-in-skills. He, W., Hamilton, T.A., Cohen, A.R., Holmes, T.J., Pace, C., Szarowki, D.H., 2003. Automated three dimensional tracing of neurons in confocal and brightfield images. Microsc. Microanal. 9, 296–319. Heng, Y.J., Lester, S.C., Tse, G.M., Factor, R.E., Allison, K.H., Collins, L.C., Chen, Y.Y., Jensen, K.C., Johnson, N.B., Jeong, J.C., Punjabi, R., Shin, S.J., Singh, K., Krings, G., Eberhard, D.A., Tan, P.H., Korski, K., Waldman, F.M., Gutman, D.A., Sanders, M., Reis-Filho, J.S., Flanagan, S.R., Gendoo, D.M., Chen, G.M., Haibe-Kains, B., Ciriellp, G., Hoadley, K.A., Perou, C.M., Beck, A.H., 2017. The molecular basis of breast cancer pathological phenotypes. J. Pathol. 241 (3), 375–391. https://doi.org/10.1002/path.4847. Huang da, W., Sherman, B.T., Lempicki, R.A., 2009. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13. Jones, T.R., Carpenter, A.E., Golland, P., 2005. Voronoi based segmentation of cells on image manifolds. In: Liu, Y., Jiang, T., Zhang, C. (Eds.), Computer Vision for Biomedical Image Applications. Springer, Berlin, pp. 535–543. Jonnalagedda, P., Schmolze, B., Bhanu, B., 2018. MVPNets: multi-viewing path deep learning neural networks for magnification invariant diagnosis in breast cancer. In: 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 189–194. https://ieeexplore.ieee.org/document/856783. Kantha, S.S., 1991. A centennial review; the 1890 tetanus antitoxin paper of von Behring and Kitasato and the related developments. Keio J. Med. 40 (1), 35–39. Kashyap, A., Fomitcheva Khartchenko, A., Pati, P., Gabrani, M., Schraml, P., Kaigala, G.V., 2019. Quantitative microimmunohistochemistry for the grading of immunostains on tumor tissues. Nat. Biomed. Eng. 3 (6), 478–490. https://doi.org/10.1038/s41551-019-0386-3. Kaufmann, S.H., 2008. Immunology’s foundation: the 100 year anniversary of the Nobel Prize to Paul Ehrlich and Elie Metchnikoff. Nat. Immunol. 9, 705–712. Kuhn, H.W., 1955. The Hungarian Method for the Assignment Problem. 
https://web.eecs.umich.edu/pettie/ matching/Kuhn-hungarian-assignment.pdf. Kumagai, Y., Takubo, K., Kawada, K., Aoyama, K., Endo, Y., Ozawa, T., Hirasawa, T., Yoshio, T., Ishihara, S., Fujishiro, M., Tamaru, J.I., Mochiki, E., Ishida, H., Tada, T., 2019. Diagnosis using deep-learning artificial intelligence based on the endocytoscopic observation of the esophagus. Esophagus 16 (2), 180–187. https://doi.org/ 10.1007/s10388-018-0651-7. Lander, E.S., 2016. The heroes of CRISPR. Cell 164 (1–2), 18–28. https://doi.org/10.1016/j.cell.2015.12.041. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient based learning applied to document recognition. Proc. IEEE 86 (11), 2278–2324. https://doi.org/10.1109/5.726791. Leung, M.K.K., Delong, A., Alipanahi, B., Freu, B.J., 2016. Machine learning in genomic medicine: a review of computational problems and data sets. Proc. IEEE 104, 176–197. Li, K., Miller, E.D., Chen, M., Kanade, T., Weiss, L.E., Campbell, P.G., 2008. Cell population tracking and lineage construction with spatiotemporal context. Med. Image Anal. 12, 546–566. Lin, G., Adiga, U., Olson, K., Guzowski, J.F., Barnes, C.A., Roysam, B., 2003. A hybrid 3D watershed algorithm incorporating gradient cues and object models for automatic segmentation of nuclei in confocal image stacks. Cytometry A 56A, 23–36.

Litjens, G., Bandi, P., Ehteshami Bejnordi, B., Geessink, P., Balkenhol, M., Bult, P., Halitovic, A., Hermsen, M., van de Loo, R., Vogels, R., Manson, Q.F., Stathonikos, N., Baidoshvili, A., van Diest, P., Wauters, C., van Dijk, M., van der Laak, J., 2018. 1399 H&E stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience. 7 (6). https://doi.org/10.1093/gigasicence/giy065. Liu, Y., Kohlberger, T., Norouzi, M., Dahl, G.E., Smith, J.L., Mohtashamian, A., Olson, N., Peng, L.H., Hipp, J.D., Stumpe, M.C., 2019. Artificial intelligence-based breast cancer nodal metastasis detection: insights into black box for pathologists. Arch. Pathol. Lab. Med. 143 (7), 859–868. https://doi.org/10.5858/arpa.2018-0147-OA. Marrack, J., 1934. Nature of antibodies. Nature 133, 292–293. Mukhapodhyay, S., Feldman, M.D., Abels, E., Ashfaq, R., Beltaifa, S., Cacciabeve, N.G., Cathro, H.P., Cheng, L., Cooper, J., Dickey, G.E., Gill, R.M., Heaton Jr., R.P., Kerstens, R., Lindberg, G.M., Malhotra, R.K., Mandell, J.W., Manlucu, E.D., Mills, A.M., Mills, S.E., Moskaluk, C.A., Nelis, M., Patil, D.T., Przybycin, C.G., Reynolds, J.P., Rubin, B.P., Saboorian, M.H., Salicru, M., Samols, M.A., Sturgis, C.D., Turner, K.O., Wick, M.R., Yoon, J.Y., Zhao, P., Taylor, C.R., 2018. Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: a multicenter blinded randomized noninferiority study of 1992 cases (pivotal study). Am. J. Surg. Pathol. 42 (1), 39–52. https://doi.org/10.1097/PAS.0000000000000948. Nagpal, K., Foote, D., Liu, Y., Cameron Chen, P.-H., Wulczyn, E., Tan, F., Olson, N., Smith, J.L., Mohtashamian, A., Wren, J.H., Corrado, G.S., MacDonald, R., Peng, L.H., Amin, M.B., Evans, A.J., Sangoi, A.R., Mermel, C.H., Hipp, J.D., Stumpe, M.C., 2019. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit. Med., 2–48. https://doi.org/10.10138/s41746-019-0112-2. Nakane, P., Pierce Jr., G.B., 1967. Enzyme labeled antibodies for the light and electron microscopic localization of tissue antigens. J. Cell Biol. 33 (2), 307–318. Nguyen, Q.V., Gleeson, A., Ho, N., Simoff, S., Catchpoole, D., 2011. Visual analytics of clinical and genetic datasets of acute lymphoblastic leukaemia. In: Lu, B.L., Zhang, L., Kwok, J. (Eds.), Neural Information Processing: 18th International Conference (ICONIP 2011), Shanghai, China, November 13–17, 2011, Proceedings, Part I. Springer, Berlin, Germany, pp. 113–120. Nguyen, Q.V., Khalifa, N.H., Alzamora, P., Gleeson, A., Catchpoole, D., Kennedy, P., Simoff, S., 2016. Visual analytics of complex genomics data to guide effective treatment decision. J. Imaging, 2 (4), 2–29. Ormond, K.E., Mortolock, D.P., Scholes, D.T., Bombard, Y., Brody, L.C., Faucett, W.A., Garrisson, N.A., Hercer, L., Isasi, R., Middleton, A., Musunuru, K., Shriner, D., Virani, A., Young, C.E., 2017. Human germline genome editing. Am. J. Hum. Genet. 101 (2), 167–176. Paeng, K., Hwang, S., Park, S., Kim, M., 2016. A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology. arxiv.org/abs/1612.07180. Pego, A.A.P., 2015. Grand Challenge: Bioimaging 2015. http://www.bioimaging2015.ineb.up.pt/challenge_ overview.html. Piro, A., Tagarelli, A., Tagarelli, G., Lagonia, P., Quattrone, A., 2008. Paul Ehrlich: the Nobel Prize in physiology or medicine 1908. Int. Rev. Immunol. 27 (1–2), 1–17. https://doi.org/10.1080/08830180701848995. Pollard, K.S., van der Laan, M.J., 2005. Cluster analysis of genomic data. 
In: Gentleman, R., Carey, V.J., Huber, W., Irizarry, R.A., Dudoit, S. (Eds.), Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health). Springer, New York, NY, pp. 209–228. Qu, Z., Lua, C.W., Nguyen, Q.V., Zhou, Y., Catchpoole, D.R., 2019. Visual analytics of genomic and cancer data: a systematic review. Cancer Informat. 18. https://doi.org/10.1177/1176935119835546. Rebeiz, M., Posakony, J.W., 2004. GenePalette: a universal software tool for genome sequence visualization and analysis. Dev. Biol. 271, 431–438. Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. http://arxiv. org/abs/1505.04597. Roux, L., 2014. Detection of mitosis and evaluation of nuclear atypia score in breast cancer histological images. In: 22nd International Conference on Pattern Recognition 2014. MITOS-ATYPIA Contest. http://ludo17.free. fr/mitos_atypia_2014/icpr2014_MitosAtypia_DataDescription.pdf. Roux, L., Racoceanu, D., Lomenie, N., Kulikova, M., Irshad, H., Klossa, J., Capron, F., Genestie, C., Le Naour, G., Gurcan, M.N., 2013. Mitosis detection in breast cancer histological images an ICPR 2012 contest. J. Pathol. Inform. 4 (8). https://doi.org/10.4103/2153-3539.112693. Schmalstieg Jr., F.C., Goldman, A.S., 2008. Ilya Illich Metchnikoff (1845-1915) and Paul Ehrlich (1854-1915): the centennial of the 1908 Nobel Prize in physiology or medicine. J. Med. Biogr. 16, 96–103.

Schroeder, M.P., Gonzalez Perez, A., Lopez Bigas, N., 2013. Visualizing multidimensional cancer genomics data. Genome Med., 5–9. Serag, A., Margineanu, A.I., Qureshi, H., McMillan, R., Saint Martin, M.J., Diamond, J., O’Reilly, P., Hamilton, P., 2019. Translational AI and deep learning in diagnostic pathology. Front. Med. https://doi.org/10.3389/ fmed.2019.00185. Shamai, G., Binenbaum, Y., Slossberg, R., Duek, I., Gil, Z., Kimmel, R., 2019. Artificial intelligence algorithms to assess hormonal status from tissue microarrays in patients with breast cancer. JAMA Netw. Open. 2 (7), e197700. https://doi.org/10.1001/jamanetworkopen.2019.7700. Shariff, A., Kangas, J., Coelho, L.P., Quinn, S., Murphy, R.F., 2010. Automated image analysis for high-content screening and analysis. J. Biomol. Screen. 15 (7), 726–734. Sheikhzadeh, F., Ward, R.K., Carraro, A., Chen, Z.Y., van Niekerk, D., Miller, D., Ehlen, T., MacAulay, C.E., Follen, M., Lane, P.M., Guillaud, M., 2015. Quantification of confocal fluorescence microscopy for the detection of cervical intraepithelial neoplasia. Biomed. Eng. Online 14, 96. https://doi.org/10.1186/s12938-015-00893-6. Sheikhzadeh, F., Ward, R.K., van Niekerk, D., Guillaud, M., 2018. Automatic labeling of molecular biomarkers of immunohistochemistry images using fully convolutional networks. PLoS One. 13 (1), e0190783. https://doi.org/ 10.1371/journal.pone.0190783. Signal, A., Milo, R., Cohen, A., Geva-Zatorsky, N., Klein, Y., Alaluf, I., 2006. Dynamic proteomics in individual human cells uncovers widespread cell-cycle dependence of nuclear proteins. Nat. Methods 3, 525–531. Smal, I., Draegestein, K., Galjart, N., Niessen, W., Meijering, E., 2008. Particle filtering for multiple object tracking in dynamic fluorescence microscopy images: application to microtubule growth analysis. IEEE Trans. Med. Imaging 27, 789–804. Steiner, D.F., MacDonald, R., Liu, Y., Truszkowski, P., Hipp, J., Gammage, C., Thng, F., Peng, L., Stumpe, M.C., 2018. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. Am. J. Surg. Pathol. 42 (12), 1636–1646. https://doi.org/10.1097/PAS.0000000000001151. Turner, V., Reinsel, D., Gantz, J.F., Minton, S., 2014. The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things. IDC Analyze the Future. Tyrrell, J.A., di Tomaso, E., Fuja, D., Kozak, K., Roysam, B., 2007. Robust 3-D modeling of vasculature imagery using superellipsoids. IEEE Trans. Med. Imaging 26 (2), 223–237. van de Tweel, J.G., Taylor, C.R., 2010. A brief history of pathology. Preface to a forthcoming series that highlights milestones in the evolution of pathology as a discipline. Virchows Arch. 457 (1), 3–10. Veta, M., van Diest, P.J., Willems, S.M., Wang, H., Madabhushi, A., Cruz Roa, A., Gonzalez, F., Larsen, A.B., Vestergaard, J.S., Dahl, A.B., Ciresan, D.C., Schmidhuber, J., Giusti, A., Gambardella, L.M., Tek, F.B., Walter, T., Wang, C.W., Kondo, S., Matuszewski, B.J., Preciose, F., Snell, V., Kittler, J., de Campos, T.E., Khan, A.M., Rajpoot, N.M., Arkoumani, E., Lacle, M.M., Viergever, M.A., Pluim, J.P., 2015. Assessment of algorithms for mitosis detection in breast cancer histopathology images. Med. Image Anal. 20 (1), 237–248. https://doi. org/10.1016/j.media.2014.11.010. 
Veta, M., Heng, Y.J., Stathonikos, N., Bejnordi, B.E., Beca, F., Wollmann, T., Rohr, K., Shah, M.A., Wang, D., Rousson, M., Hedlund, M., Tellez, D., Ciompi, F., Zerhouni, E., Lanyi, D., Viana, M., Kovalev, V., Liauchuk, V., Phoulady, H.A., Qaiser, T., Graham, S., Rajpoot, N., Sjoblom, E., Molin, J., Paeng, K., Hwang, S., Park, S., Jia, Z., Chang, E.I., Xu, Y., Beck, A.H., van Diest, P.J., Pluim, J.P.W., 2019. Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge. Med. Image Anal. 54, 111–121. https://doi.org/ 10.1016/j.media.2019.02.012. Via, M., Gignoux, C., Burchard, E.G., 2010. The Genome Projects: new opportunities for research and social challenges. Genome Med. 2 (3). Wang, C.W., Lee, Y.C., Calista, E., Zhou, F., Zhu, H., Suzuki, R., Komura, D., Ishikawa, S., Cheng, S.P., 2018. A benchmark for comparing precision medicine methods in thyroid cancer diagnosis using tissue microarrays. Bioinformatics 34 (10), 1767–1773.

Further reading

Chen, H., Qi, X., Yu, L., Dou, Q., Qin, J., Heng, P.A., 2017. DCAN: deep contour aware networks for object instance segmentation from histology images. Med. Image Anal. 36, 135–146. https://doi.org/10.1016/j.media.2016.11.004.

CHAPTER 4

Surgeon at work

4.1 Learning everything about the tumor: Tumor profiling

After following the pathologist at work in Chapter 3, we now turn our attention to the surgeon. Surgeons who operate on cancer patients focus on diagnosing, staging, and treating the disease. When there is nothing else that can be done for the patient, the surgical oncologist may perform palliative surgeries to control the pain, increase the patient's comfort level, and manage cancer side effects. Before performing the actual surgery, the surgeon must learn everything about the tumor: type, size, location, grade, and stage, not to mention everything related to the patient that might influence the outcome of the surgery: age, medical conditions, physical status, etc. The oncologist must decide whether the surgery is performed after other treatments have been administered (neoadjuvant therapy such as chemotherapy, radiation therapy, and/or hormone therapy) or before the adjuvant therapies are administered. Once the surgeon has seen all the information related to the cancer type and the size and location of the tumor, she/he decides whether surgery can be performed with minimally invasive techniques, like laparoscopy or robotic surgery, or whether the surgery should be open. Nevertheless, even if the option is laparoscopy, you never know what will happen in the operating room (OR), so the surgical team must always be ready to convert to open surgery.

The primary purpose of cancer surgery is to remove all of the cancer along with a margin of surrounding healthy tissue, in order to be sure that no cancer is left behind. Besides this, the surgeon might also remove some lymph nodes, which she/he will send to the pathologist for evaluation. New surgical techniques are developed every day, but here we present the most used types:
• cryosurgery: in this type of surgery, the tumor is frozen and destroyed with a very cold material, such as liquid nitrogen spray (e.g. cervical cancer);
• electrosurgery: in this type of surgery, the tumor is killed by the use of a high frequency electrical current (e.g. mouth or skin cancer);
• laser surgery: in this type of surgery, the tumor is shrunk or vaporized by the use of beams of high intensity light;

• Mohs surgery: in this type of surgery, the tumor is removed layer by layer with a scalpel. It can be performed in cases where the tumor is localized near the eye, or when the surgeon wants to assess how deep the cancer has gone. After each layer has been removed, the doctor must evaluate it under a microscope until she/he reaches cancer free tissue;
• laparoscopic surgery: this type of surgery uses a laparoscope to see inside the patient's body without major incisions. A camera and the surgical tools are inserted into the body, and the surgeon watches on a monitor what the camera projects while performing the actual surgery;
• robotic surgery: in this type of surgery, the surgeon uses hand controls to maneuver a robot that performs the actual surgery, while watching a screen that projects a three dimensional image of the area. Hard to reach areas can be operated on with the use of the robot;
• natural orifice surgery: this type of surgery is still in its infancy, being experimental. Its aim is to operate on organs located in the abdomen without cutting the skin, instead using a natural body opening (e.g. mouth, vagina, or rectum).

When preparing for a surgery where a tumor must be removed from a patient, the surgeon makes a plan of "attack." Some tumors are "smart" and try to trick medical imaging: they might be hidden in the rib cage, where they are harder to detect. For instance, it is difficult to detect early lung cancer on a chest X-ray, and an esophageal tumor that has developed outside the esophagus cannot be seen even on a CT scan. When entering the OR, surgeons are prepared for the worst; the real "battlefield" looks different than the virtual one. Still, AI is here to help. Surgeons can now create three dimensional reconstructions of tumors from scans and practice the surgery before actually performing it.

We shall present the case of a 3-year-old girl who was admitted to the hospital due to a massive tumor in her abdomen (Zhang et al., 2016). After the physical examination, the doctors found a large solid mass in the right abdomen. The girl had no family medical history, no significant medical history herself, and presented with normal blood pressure. All the blood tests looked normal (blood cell count, renal function, electrolytes, etc.). The results for serum alpha fetoprotein and urine vanillylmandelic acid were both negative. The girl had an ultrasonic examination performed using a Siemens Sequoia 512, which pointed out that she had three tumors: one in the right kidney (13.0 cm × 11.2 cm × 9.6 cm), one in the left kidney (3.0 cm × 2.5 cm × 2.5 cm), and a long tumor embolus that extended from the right renal vein through the inferior vena cava into the right atrium (8.3 cm × 2.0 cm). The patient was diagnosed with Wilms' tumor, a rare cancer of the kidneys that affects children. The tumors are mostly encountered in children between the ages of 3 and 4, and the incidence drops after the age of 5. The child underwent chemotherapy for 6 weeks. After the chemotherapy the ultrasound was repeated and the tumors were assessed: the tumor in the left kidney had reduced in size, but the other two remained the same. In order to prepare for surgery, a CT scan with a GE LightSpeed VCT 64-slice scanner was performed.
The CT scans were used for a three dimensional reconstruction of the tumors, which gave detailed and accurate information that was used in preparing a detailed plan for the surgery (blood supply vessels, adhesions, consistency, relationship with surrounding organs and tissues, etc.). Using the three dimensional virtual tumors, the surgeons were able to remove the actual tumors and also to plan the postoperative management. After the surgery, the girl underwent eight courses of chemotherapy, and the assessment made eight months after surgery revealed that she was cancer free, with renal function back in the normal range.

The 3D volume visualization can be achieved by performing a reconstruction process that uses a series of 2D images, so a collection of 2D scans is required. After all the data have been gathered, the 3D volume is reconstructed using interpolation and approximation algorithms; the most used methods are pixel nearest neighbor, voxel nearest neighbor, radial basis functions, and image-based algorithms. Rendering represents the next step of the process. There are several rendering techniques, among which we mention surface rendering, multiplanar reformatting, and volume rendering. The data are collected using different types of scanning systems. After this step is done, image processing techniques enhance the quality of the 2D images, remove noise, and preserve edge boundaries; among these techniques we mention histogram equalization, 2D Gaussian filtering, and median filtering. After the images have been preprocessed, segmentation is needed in order to distinguish between the scanned objects: skin, bones, etc. Before the actual reconstruction of the volume, the volume size, the axes of the volume, the origin of the axes, and the voxel size need to be set (San Jose Estepar et al., 2003). A minimal code sketch of this slice-stacking pipeline is given after the prostate cancer example below.

Precision medicine has entered the surgical arena as well. In what follows, we present how surgery has changed in uro-oncology due to the new technologies, with a case of prostate cancer. In 2016, the surgeons at Guy's and St Thomas' used, for the first time in history, a 3D-printed model of a tumor in order to enhance precision and accuracy during its dissection. The patient's cancerous prostate was printed, enabling the surgical team to actually feel and see it, so they could plan the robotic surgery step by step. The surgery was presented as a showcase in the World Wide Robotic Surgery 24 Hour Event. Professor Prokar Dasgupta came up with the idea of merging 3D printing technologies with robotic surgery. Recall that in robotic surgery the surgeons use monitors to see the actual organ and the tumor, which makes the procedure harder for them. Having a 3D model before the surgery allows the surgeons to feel the tumor, see its position in the body, measure its distance from vital nerves or muscles, etc. The 3D tumor was reconstructed using MRI scans of the patient's prostate and afterwards was 3D printed. Since then, the surgeons from Guy's and St Thomas' have been using 3D printing to help them with their surgeries; one of their successes was using a 3D model to transplant an adult kidney into a child.

The robotic assisted laparoscopic radical prostatectomy allows the magnification of the surgical field, making possible a better visualization and appreciation of the anatomical details; this type of surgery can therefore be seen as a tailored treatment (Walz et al., 2016). Based on the anatomic findings of each patient's prostate and surrounding tissue, the urologist can use the 3D printed model to individualize the dissection during the surgery, taking into account the patient's and the tumor's characteristics, thus improving the oncological and functional result. The standard outcome measures of robotic assisted laparoscopic radical prostatectomy are cancer control, continence, and potency (Ou et al., 2013).
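As promised above, here is a minimal sketch of the slice-stacking reconstruction pipeline: each 2D slice is denoised, the slices are stacked into a volume, and the volume is interpolated along the slice axis toward isotropic voxels. The synthetic data, spacings, and filter settings are assumptions; a real pipeline would add registration, segmentation, and rendering on top of this.

```python
# Sketch of 3D volume reconstruction from a stack of 2D slices:
# denoise each slice, stack them, then interpolate to near-isotropic voxels.
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter, zoom

# Assume 40 axial slices of 128x128 pixels (synthetic stand-in for CT data)
rng = np.random.default_rng(2)
slices = rng.normal(size=(40, 128, 128))

# Per-slice preprocessing: median filter for speckle, Gaussian for noise
clean = np.stack([gaussian_filter(median_filter(s, size=3), sigma=1.0)
                  for s in slices])

# Assumed voxel size before interpolation: 5.0 mm between slices, 1.0 mm in-plane
slice_spacing_mm, pixel_spacing_mm = 5.0, 1.0
volume = zoom(clean, (slice_spacing_mm / pixel_spacing_mm, 1, 1), order=1)

print(volume.shape)  # (200, 128, 128): roughly isotropic 1 mm voxels
```

Linear interpolation (order=1) is the simplest option; the nearest neighbor and radial basis function methods listed earlier trade smoothness against computational cost in different ways.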
The main objectives during this type of surgery are the preservation of urinary continence and of sexual function. The continence status is determined by parameters such as age, prostate size, body mass index, and membranous urethral length, and by the impact of the surgical dissection, which may damage the neurovascular bundles and cause postoperative fibrosis. The same factors determine the sexual function. Since each patient is unique, the surgeon cannot reproduce the exact same dissection in all patients, so she/he must find a balance between a safe oncological margin and the preservation of the nervous structures. It is clear that robotic assisted surgery is well suited to precision medicine, but many still wonder whether classical open surgery isn't better (Haglind et al., 2015; Hu et al., 2014, 2017; Leow et al., 2016; Seo et al., 2016; Thompson et al., 2014; Yaxley et al., 2016).

Porpiglia et al. evaluated the reliability of a software package that reproduced in vivo the anatomical structures of the kidney during robot assisted partial nephrectomy (Porpiglia et al., 2017). The study was twofold: the authors also compared the management of the renal pedicle after robot assisted partial nephrectomy performed with the hyper accuracy 3D reconstruction software against the same surgery using 2D CT scans for preoperative planning. The enrolled patients underwent abdominal CT scans with angiography between January 1, 2016 and July 31, 2017; since February 1, 2017, all the patients underwent high resolution abdominal CT scans with hyper accuracy 3D reconstruction. A single surgeon with high expertise performed the surgeries. For the patients who had hyper accuracy 3D reconstruction, the dissection of the pedicle was guided by the 3D reconstruction, which was manually oriented by an assistant on a tablet next to the console, taking into account the in vivo anatomy. The main artery dissection went exactly as anticipated by the 3D reconstruction. In the majority of cases where this procedure was applied, global renal ischemia was avoided. The choice of the renal artery branches to be clamped proved to be a success, as the surgeons reported almost bloodless renal tumor resections. The researchers also reported the limitations of the study: the sample size was limited; the segmentation technique is not adaptable to MRI scans without contrast agent; the reconstructed images embedded in the console had to be moved manually in order to overlap the virtual image onto the in vivo image; and some cases had a long interval of ischemia.

A case study by Glybochko et al. (2019) is presented next. The authors used contrast-enhanced multi-slice computed tomography, which is used in the diagnosis of malignant tumors with complex structures; the technology gives information regarding the tumor's size and shape in relation to the affected tissues and organs. The case refers to a patient who was operated on after a 3D reconstruction from contrast-enhanced multi-slice computed tomography was used in the preoperative planning. The patient suffered from late stage sarcomatoid renal cell cancer, a rare type of kidney cancer. The CT showed no metastases to the veins, lungs, or other organs, and the liver margin was negative. Even though survival rates for this disease are poor, the patient was still alive and well 17 months after the computer assisted surgery.

Another paper shows how the 3D reconstruction of complex liver tumors was clinically applied in treating infants and young children. In Su et al. (2016), 3D imaging was used instead of 2D CT scans for the diagnosis and preoperative planning of surgeries on children. A retrospective analysis of 26 children and infants who had giant liver tumors involving the hepatic hilum was performed.
All the patients underwent precise hepatectomy at the Affiliated Hospital of Qingdao University between February 2012 and January 2015. Before the surgery, the children underwent upper abdominal contrast enhanced CT scanning. The Hisense CAS system for 3D reconstruction was used in 16 patients; the other 10 patients, who formed the control group, had their 3D CT reconstruction made using the CT workstation. The outcomes of the two groups were analyzed and statistically compared. Fortunately, all the surgeries went well. The 3D models showed the association between the liver tumors and the intrahepatic vascular system, and using these models the surgeons decided what procedure to use during the surgery. The mean operation time proved to be shorter in the reconstruction group than in the control group, 137.81 ± 17.51 min versus 192.00 ± 34.66 min, with p < 0.01. As regards intraoperative blood loss, the reconstruction group reported less blood loss than the control group, 21.81 ± 14.05 mL versus 53.50 ± 21.35 mL, with p < 0.01.

Chan et al. presented research regarding preoperative 3D CT lung reconstruction before segmentectomy or lobectomy for stage I non-small cell lung cancer (Chan et al., 2015). All the information provided by the preoperative scans enables the surgeons to generate a "mental model" of the lung that helps them anticipate the resectability of the lung nodule and predict whether difficulties might appear when performing the chosen procedure (Shields, 1994). When performing segmentectomy, the cancer localization must be very accurate, and negative resection margins are crucial for the success of the procedure. The study evaluates a software package that allows automated segmentation of the pulmonary parenchyma, a 3D assessment of the size and location of the tumor, and an estimate of the surgical margins. The data set the software was tested on contains 36 patients who received segmentectomy and 15 patients who received lobectomy. All the patients were diagnosed as having stage I non-small cell lung cancer; the tumor sizes were less than 2 cm, and the forced expiratory volume in 1 s was greater than 60%. The data came from the Lung Cancer Database of the Department of Cardiothoracic Surgery of the University of Pittsburgh Medical Center. 53% of the patients were male and 47% female; 22 tumors were adenocarcinoma, 13 were squamous cell carcinoma, and 1 was of an unidentified type. The preoperative scans were obtained within 6 weeks of the surgical procedure. Using the bronchial arborization, the software automatically reconstructs the anatomic pulmonary segments of the lung. Auto segmentation was achieved in only 72.7% of the preoperative CT scans with slice thicknesses of 3 mm or less; multiple factors influenced this outcome: a whole body scan instead of only a chest scan, lower CT resolution, pneumonitis, fibrosis, or local severe emphysema. The localization of the tumor was achieved in all scans. In terms of statistical assessment, a positive predictive value of 87% was achieved for the prediction of the margin to tumor diameter ratio, relative to the surgical pathology assessment. In the case of 11 patients who had lobectomy, the software simulated a segmentectomy to see whether the surgical margins would have permitted this kind of procedure rather than the lobectomy. For 10 out of the 11 cases, the simulated margin to tumor diameter ratios suggested that segmentectomy might not have been the correct choice, since it might not have achieved adequate margins. For each case, the software took approximately 15 to 20 min to complete the segmentation, analyze the lung volume, and evaluate the marginal status. The authors report no statistically significant difference between the actual and the predicted margin to tumor diameter ratios, for both the segmentectomy cohort (p = 0.187) and the lobectomy cohort (p = 0.588).
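As an aside, group comparisons like the operation times reported by Su et al. (2016) can be checked from the published summary statistics alone. The sketch below applies Welch's two sample t-test, which works directly from means, standard deviations, and group sizes; whether the original paper used this exact test variant is an assumption made here.

```python
# Sketch: Welch's two-sample t-test computed from summary statistics alone,
# applied to the operation times reported by Su et al. (2016).
from math import sqrt
from scipy.stats import t

def welch_t_test(m1, s1, n1, m2, s2, n2):
    """Two-sided Welch t-test from means, standard deviations, sample sizes."""
    se1, se2 = s1**2 / n1, s2**2 / n2
    t_stat = (m1 - m2) / sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se1 + se2)**2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))
    p = 2 * t.sf(abs(t_stat), df)
    return t_stat, df, p

# Reconstruction group (n=16) vs control group (n=10), operation time in minutes
t_stat, df, p = welch_t_test(137.81, 17.51, 16, 192.00, 34.66, 10)
print(f"t = {t_stat:.2f}, df = {df:.1f}, p = {p:.4f}")
```

Run on the reported values, this gives a p-value well below 0.01, consistent with the significance level stated in the paper.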
Besides this software package, several other programs on the market perform similar tasks, except for the automatic segmentation of the lungs: VelocityAI (Varian Medical Systems, Palo Alto, CA) and Apollo (Vida Diagnostics, Inc., Coralville, IA). Another computer aided preoperative planning system is Liver 2.0, a software package that has been used for liver surgery based on CT images (Song et al., 2011). Liver 2.0 is twofold: it provides image analysis and suggests treatment planning. The first module creates the 3D visualization of the relevant anatomic and pathologic structures, whereas the second module provides the measurement and resection tools for virtual liver surgery planning. The goal is to provide a safer liver resection and the possibility of anticipating risks, while also being a confidence booster. Using this software, surgeons can elaborate their surgical strategy and analyze the volume of healthy liver tissue that will remain once the procedure is over.

The liver parenchyma can suffer malignant transformation and develop into several types of cancer. CT texture features of the liver parenchyma can be used to predict metastasis and overall survival in colorectal cancer patients (Lee et al., 2018). The manual segmentation of the liver parenchyma takes approximately one hour per data set, and as the number of slices increases, so does the time. Here, the authors used a fast marching method (Huang et al., 2009), a technique that takes into account liver shape continuity and comparability and relies on a curve-fitting algorithm. The method takes up to 5 min for a study of good quality and average size. The next step in the process regards vessel segmentation and visualization. There are four vessel systems that supply and drain the liver: the hepatic veins, the hepatic artery, the portal vein, and the biliary ducts. A tenuous tree structure is extracted from in vivo 3D medical images using a seed point. In these images the spatial continuity is poor when it comes to seeing elongated objects, and intensity based segmentation procedures truncate the smaller branches (Cheng et al., 2008). The processing takes about 20 s. After the segmentation of the liver and of the vessels is done, the tumor is segmented as well, also using a seed point; one tumor can be segmented in 5 s. The next step is treatment planning. Having the 3D reconstruction, the surgeon assesses the tumor's location and all the major branches of the vessels, and using this information she/he decides where the virtual line of dissection will be. Once this virtual resection is set, the program displays it in different ways. Using this tool, the surgeon analyzes the risk of different dissection approaches before the actual surgery. For this, she/he must use the measurement tools provided by the software; the measurements include analyses of area, volume, angle, and distance, which are unique for each patient. This software bridges operative planning and surgical intervention, which enhances the safety of the operation.

Yamanaka et al. used the data output by a 3D CT analytic program for comparison with the values estimated from manually traced boundaries on CT scans using an electronic cursor (Yamanaka et al., 2007). The study was performed on 113 patients with impaired liver function who were undergoing hepatectomy for hepatocellular carcinoma. A detailed 3D vascular structure was reconstructed and the liver volume was computed using hepatectomy simulation software. The reported results stated that the simulation showed a higher correlation between the virtually simulated and the actual liver resection volume, r = 0.96, than the conventional method, r = 0.74. The situation was the same in the case of the discrepancy: the simulated liver volume had a discrepancy of 9.3 mL, whereas the conventional method had one of 174 mL.
The study showed that there is a stronger and more significant correlation between the simulated liver resection volumes and the actual resected specimens, r = 0.96, p < 0.0001, than between the traditional measurements and the resected specimens, r = 0.74, p < 0.05. Practically, the difference between the prediction and the ground truth was 1.6 ± 2.6 mm.

Park et al. wrote a research paper concerning a retrospective cohort study of 3D multidetector CT (MDCT) for preoperative local staging of gastric cancer (Park et al., 2010). The data set contained 113 consecutive patients, of whom 72 were men and 41 women. All of the patients had confirmed gastric cancer. The patients underwent preoperative MDCT, 55 of them with effervescent granules taken orally, and the rest after having drunk 1000 mL of tap water to create gastric distention. For the clinical interpretation, the CT scans were used for 3D reconstruction on a dedicated workstation (Rapidia, Infinitt). Two board certified radiologists (with 10 and 12 years of experience) reviewed all CT images; the reviews were independent and performed in two sessions. The radiologists had to locate the lesions with respect to the cardia, fundus, upper body, lower body, proximal antrum, distal antrum, anterior wall, posterior wall, lesser curvature, and greater curvature. The detection rate for the primary gastric tumor was higher on gas distention CT as a result of the 3D reconstruction of the images, which provided information regarding subtle changes in the gastric wall. In the group where the doctors studied only the 2D MDCT images, the gastric cancer detection rate was 26.7% for the first radiologist and 30% for the second. This shows that 2D MDCT fails to depict flat or depressed lesions, which can be seen on the 3D reconstructed volume. Thus, using the 3D module, doctors can devise an optimal surgical treatment plan for patients with gastric cancer.

3D reconstruction for thoracic surgery was performed in the paper of Heuts et al. (2016). At the Maastricht University Medical Center, the Netherlands, the 3D model was reconstructed using the Vesalius 3D software (PS Medtech, Amsterdam, the Netherlands). The software was used to segment different tissues from each other; by applying this method, the exact lesion in the thorax could be assessed. Besides the location of the lesion, the affiliated vessels and bronchial branches were also determined. By segmenting the blood vessels, the pulmonary artery, vein, and bronchus, potential vascular or bronchial anomalies were detected. A similar approach was presented by Ikeda et al., where the authors reported a detection rate of 95%–97% of the pulmonary arteries for preoperative planning of lobectomy or segmentectomy (Ikeda et al., 2013). Small pulmonary nodules are difficult to detect during thoracoscopic procedures; by localizing and marking lesions preoperatively, the surgeon can create a plan in order to resect within the correct surgical margin.

Besides the 3D reconstruction of the volume, the surgery can be rehearsed in virtual reality. Saji et al. presented a surgery that was simulated in a virtual simulator (Saji et al., 2013). After the 3D model was constructed from multi detector CT angiography images, it was loaded into simulation software. The tumor was reconstructed virtually along with the pulmonary vessels and bronchi, and the computer calculated the bronchial ventilation area. After this, the resection lines for vessels, bronchi, and veins were chosen, and once this step was done, the surgical margin was calculated. None of the patients who underwent surgery after the doctors had trained in virtual reality mode had any recurrence of the tumor during the 12–14 month follow-up. Another virtual reality study is presented in Kanzaki et al. (2015). Ten patients underwent CT scanning of the tumor and hilum; the tumor was 3D reconstructed and loaded into virtual reality software, in which the surgeons marked the arteries, veins, tumor, and bronchi. The surgeons wore 3D glasses, which allowed them to view a 3D screen during surgery.
Preoperative planning for a segmentectomy or lobectomy was done in two cases using 3D printing (Akiba et al., 2015). The 3D reconstructed volume was printed, with bronchi and vessels in different colors, so that the doctors could tell them apart. The pulmonary branches that had deviations were detected more easily. Using surgical simulation reduces intraoperative errors and improves the surgeons' skills. Besides these advantages, the operating time decreases, the end results are better, and fewer complications are reported (Ahlberg et al., 2007; Seymour et al., 2002; van Sickle et al., 2008).
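Since several of the studies above revolve around building a 3D model from a stack of CT slices, here is a minimal sketch of that step using the marching cubes algorithm from scikit-image. It is only an illustration of the general technique: the synthetic volume and the iso-level threshold are invented placeholders, not the commercial pipelines (Rapidia, Vesalius 3D) used in the cited papers.

```python
# Minimal sketch: reconstruct a 3D surface mesh from a CT-like volume
# with the marching cubes algorithm (scikit-image). The synthetic
# "volume" below stands in for a real CT stack; the iso-level 0.5 is
# an arbitrary threshold separating "tissue" from background.
import numpy as np
from skimage import measure

# Fake 64x64x64 volume containing a bright sphere (a mock "lesion").
grid = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
volume = (np.sqrt((grid ** 2).sum(axis=0)) < 0.4).astype(float)

# Marching cubes extracts the triangulated isosurface of the lesion,
# which can then be rendered, measured, or 3D printed.
verts, faces, normals, values = measure.marching_cubes(volume, level=0.5)
print(f"Surface mesh: {len(verts)} vertices, {len(faces)} triangles")
```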


With the information received from the pathology department, plus the 3D reconstruction of the organ that contains the tumor and its surrounding blood vessels and tissues, surgeons can learn everything about the tumor. They know how the procedure should go, and they can practice it in virtual reality before actually performing it, but can they use AI to make a clean cut? Find out the answer in the next section.

4.2 Making a clean cut with the help of Artificial Intelligence

You are on the operating table. Dark thoughts are running through your head, back and forth. You know you have a tumor, and you know that undergoing this surgery might save your life. You are scared, and you wish that your surgeons would make their cut as precise, clean, and accurate as humanly possible: all of the tumor gone, leaving no fragment that might cause the cancer to come back, while letting your other functions remain intact. No blood loss, no nerve damage, no healthy tissue removed for nothing. But what if your tumor could be cut with accuracy beyond what is humanly possible? What if the clean cut were made with the help of a robot with AI?

Artificial Intelligence continues to grow even in the surgical field. In 2016, the Smart Tissue Autonomous Robot (STAR) performed a complete surgery in vivo: an anastomosis on porcine intestine (Leonard et al., 2014; Shademan et al., 2016). The surgery was performed in a controlled setting, and STAR outperformed human surgeons. These experiments opened a new path to autonomous soft tissue surgeries. STAR is controlled by an AI system that receives its input from an array of visual and haptic sensors. Using the best human surgical practices, the AI system generates a roadmap with all the steps needed to complete highly complex surgical tasks, such as suturing and intestinal anastomosis. The method was measured in terms of: the consistency of suturing, computed from the average suture spacing; the number of mistakes that required the needle to be taken out of the tissue; lumen reduction; and the pressure at which leaking appeared from the anastomosis. These measurements were compared between STAR, manual laparoscopic surgery, and robot-assisted surgery. STAR proved to be more accurate: four times more consistent and four times faster than surgeons who performed with the Endo360 suturing device, five times faster than robot-assisted surgeons, and nine times faster than surgeons who performed a manual laparoscopy.

Other robots that operate with variable autonomy are: the DaVinci from Intuitive Surgical, Sunnyvale, CA, with no autonomy, completely dependent on a human; the TSolution One from THINK Surgical, Fremont, CA, an orthopedic robot with reduced human intervention; the Mazor X from Caesarea, Israel, a partially autonomous spinal robot, also with reduced human intervention; and the CyberKnife (Accuray, Sunnyvale, CA), a radiation therapy robot that uses external radiation beams and is already in clinical use.

You might have concerns as to why we should use Artificial Intelligence to make a clean cut. Well, the most simplistic answer is that we are humans, and they are robots. We humans suffer from a multitude of symptoms, such as mental and physical fatigue and lack of concentration due to certain events that might or might not have happened in our lives. Surgeons are human too, no matter what superpowers they have in the OR. All of the above factors can influence outcomes, complication rates, survival rates, etc.


On the other hand, robots are, well, robots. They do not suffer from fatigue, do not have tremor, and they do have a greater range of axial movement and a scalable motion (Lanfranco et al., 2004), all of which has proved to determine lower morbidity rates and enhanced margins (Ramsay et al., 2012). The combination of AI and surgical robots may reduce technical errors and operative time, which, when prolonged, can lead to other health problems (e.g. surgical site infection, venous thromboembolism, bleeding, hematoma, necrosis, etc. (Cheng et al., 2018; Shah and Hamilton, 2013; Ng et al., 2010; Phillips et al., 2012; Saxena et al., 2010; Tranchart et al., 2015)). Besides the clean cut that an autonomous surgical robot could make, such an AI system might rapidly share and disseminate surgical skills over the Internet, providing a standardization of surgical practice around the globe. Panesar and Ashkan's paper brings a new perspective on autonomous surgery in areas where healthcare infrastructure is missing, that is, surgeries performed aboard a spacecraft in deep space, or surgeries performed after environmental disasters or in war zones (Panesar and Ashkan, 2018).

Autonomous surgery is just at the beginning, and the road until the goal is achieved is hard and long. AI robots must have the ability to see, think, and act according to the situation faced, without human intervention. Huang defined three parameters for the surgical robot: mission complexity, environmental difficulty, and human independence (Huang, 2006). The AI robot must possess visual and physical sensors so it can understand the environment, an AI unit that takes as input all the information sent by the sensors and computes the outputs, and a mechanical part that executes the output of the AI unit. The AI robot has to be taught two different things: first, the standard procedure that needs to be performed, which ultimately is not such a difficult task, and secondly, how to respond dynamically in the case of an unexpected event. The robot has to map all the information received from the sensors and then process it. Robots that have been trained using AI can act on novel cases using prior knowledge; if they have been taught that a certain event might happen during a surgery, they can predict it and act accordingly (Moustris et al., 2011; Kassahun et al., 2016). Another hard task is teaching the robot to actually perform the surgery. This can be achieved either by explicit learning, where the robot is directly programmed to perform certain steps, or by implicit learning, where the robot is indirectly trained by observing a surgeon performing the surgery, live or on video. The training and testing of the robot should be done in virtual reality. The training data set must contain all the information provided by the sensors (visual and tactile information from the surgical field), as well as a priori knowledge learned from previous surgeries, so that the robot can anticipate, based on what it has seen before, what might happen during the actual operation. Probably the best way to teach a robot is to blend the two types of learning, explicit and implicit, and also add reinforcement learning to them (Panesar et al., 2019). As robots are used more and more in surgeries, it is clear that we are going to need AI for better data management and analysis.
Procedures that today are considered too dangerous, or even impossible, will become reality with more flexible surgical robots. For this to happen, data scientists need access to surgery recordings, and, obviously, surgeons and engineers need to be paired up. Looking toward the future, we now take a further step and reach Chapter 5.


References

Ahlberg, G., Enochsson, L., Gallagher, A.G., Hedman, L., Hogman, C., McCluscky, D.A., Ramel, S., Smith, C.D., Arvidsson, D., 2007. Proficiency based virtual reality training significantly reduces the error rate for residents during their first 10 laparoscopic cholecystectomies. Am. J. Surg. 193 (6), 797–804.
Akiba, T., Nakada, T., Inagaki, T., 2015. Simulation of the fissureless technique for thoracoscopic segmentectomy using rapid prototyping. Ann. Thorac. Cardiovasc. Surg. 21 (1), 84–86.
Chan, E.G., Landreneau, J.R., Schuchert, M.J., Odell, D.D., Gu, S., Pu, J., Luketich, J.D., Landreneau, R.J., 2015. Preoperative (3-dimensional) computed tomography lung reconstruction before anatomic segmentectomy or lobectomy for stage I non-small cell lung cancer. J. Thorac. Cardiovasc. Surg. 150 (3), 523–528.
Cheng, H., Huang, X., Huang, S., Wang, B., 2008. Extraction of tenuous vasculature in medical images. In: 2nd International Conference in Bioinformatics and Biomedical Engineering, ICBBE, pp. 2430–2433.
Cheng, H., Clymer, J.W., Po Han Chen, B., Sadeghirad, B., Ferko, N.C., Cameron, C.G., Hinoul, P., 2018. Prolonged operative duration is associated with complications: a systematic review and meta-analysis. J. Surg. Res. 229, 134–144.
Glybochko, P.V., Alyaev, Y.G., Khokhlachev, S.B., Fiev, D.N., Shpot, E.V., Petrovsky, N.V., Zhang, D., Proskura, A.V., Yurova, M., Matz, E.L., Wang, X., Atala, A., Zhang, Y., Butnaru, D.V., 2019. 3D reconstruction of CT scans aid in preoperative planning for sarcomatoid renal cancer: a case report and mini-review. J. X-ray Sci. Technol. 27 (2), 389–395. https://doi.org/10.3233/XST-180387.
Haglind, E., Carlsson, S., Stranne, J., Wallerstedt, A., Wilderang, U., Thorsteinsdottir, T., Lagerkvist, M., Damber, J.E., Bjartell, A., Hugosson, J., Wiklund, P., Steineck, G., 2015. LAPPRO steering committee. Eur. Urol. 68 (2), 216–225. https://doi.org/10.1016/j.eururo.2015.02.029.
Heuts, S., Nia, P.S., Maessen, J.G., 2016. Preoperative planning of thoracic surgery with use of three dimensional reconstruction, rapid prototyping, simulation and virtual navigation. J. Visc. Surg. 2, 4. jovs.amegroups.com/article/view/10407.
Hu, J.C., Gandaglia, G., Karakiewicz, P.I., Nguyen, P.L., Trinh, Q.D., Shih, Y.C., Abdollah, F., Chamie, K., Wright, J.L., Ganz, P.A., Sun, M., 2014. Comparative effectiveness of robot assisted versus open radical prostatectomy cancer control. Eur. Urol. 66 (4). https://doi.org/10.1016/j.eururo.2014.02.015.
Hu, J.C., O'Malley, P., Chughtai, B., Isaacs, A., Mao, J., Wright, J.D., Hershman, D., Sedrakyan, A., 2017. Comparative effectiveness of cancer control and survival after robot assisted versus open radical prostatectomy. J. Urol. 197 (1), 115–121. https://doi.org/10.1016/j.juro.2016.09.115.
Huang, H.M., 2006. The autonomy levels for unmanned systems (ALFUS) framework: interim results. In: Performance Metrics for Intelligent Systems (PerMIS) Workshop. https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication1011-II-1.0.pdf.
Huang, S., Wang, B., Hou, X., Min, X., 2009. A novel post process approach for fast marching method in liver CT slides automatic segmentation. In: WRI World Congress on Computer Science and Information Engineering, CSIE, vol. 1, pp. 573–577.
Ikeda, N., Yoshimura, A., Hagiwara, M., Akata, S., Saji, H., 2013. Three dimensional computed tomography lung modeling is useful in simulation and navigation of lung cancer surgery. Ann. Thorac. Cardiovasc. Surg. 19 (1), 1–5.
Kanzaki, M., Isaka, T., Kikkawa, T., Sakamoto, K., Yoshiya, T., Mitsuboshi, S., Oyama, K., Murasugi, M., Onuki, T., 2015. Binocular stereo navigation for three dimensional thoracoscopic lung resection. BMC Surg. 15, 56. https://doi.org/10.1186/s12893-015-0044-y.
Kassahun, Y., Yu, B., Tibebu, A.T., Stoyanov, D., Giannarou, S., Metzen, J.H., Vander Poorten, E., 2016. Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions. Int. J. Comput. Assist. Radiol. Surg. 11 (4), 553–568. https://doi.org/10.1007/s11548-015-1305-z.
Lanfranco, A.R., Castellanos, A.E., Desai, J.P., Meyers, W.C., 2004. Robotic surgery: a current perspective. Ann. Surg. 239 (1), 14–21.
Lee, S.J., Zea, R., Kim, D.H., Lubner, M.G., Deming, D.A., Pickhardt, P.J., 2018. CT texture features of liver parenchyma for predicting development of metastatic disease and overall survival in patients with colorectal cancer. Eur. Radiol. 28 (4), 1520–1528. https://doi.org/10.1007/s00330-017-5111-6.
Leonard, S., Wu, K.L., Kim, Y., Krieger, A., Kim, P.C., 2014. Smart tissue anastomosis robot (STAR): a vision guided robotics system for laparoscopic suturing. IEEE Trans. Biomed. Eng. 61 (4), 1305–1317. https://doi.org/10.1109/TBME.2014.2302385.


Leow, J.J., Chang, S.L., Meyer, C.P., Wang, Y., Hanske, J., Sammon, J.D., Cole, A.P., Preston, M.A., Dasgupta, P., Menon, M., Chung, B.I., Trinh, Q.D., 2016. Robot assisted versus open radical prostatectomy: a contemporary analysis of an all payer discharge database. Eur. Urol. 70 (5), 837–845.
Moustris, G.P., Hirids, S.C., Deliparaschos, K.M., Konstantinidis, K.M., 2011. Evolution of autonomous and semiautonomous robotic surgical systems: a review of the literature. Int. J. Med. Robot. 7 (4), 375–392. https://doi.org/10.1002/rcs.408.
Ng, C.K., Kauffman, E.C., Lee, M.M., Otto, B.J., Portnoff, A., Ehrlich, J.R., Schwartz, M.J., Wang, G.J., Scherr, D.S., 2010. A comparison of postoperative complications in open versus robotic cystectomy. Eur. Urol. 57 (2), 274–281. https://doi.org/10.1016/j.eururo.2009.06.001.
Ou, Y.C., Yang, C.K., Wang, J., Hung, S.W., Cheng, C.L., Tewari, A.K., Patel, V.R., 2013. The trifecta outcome in 300 consecutive cases of robotic assisted laparoscopic radical prostatectomy according to D'Amico risk criteria. Eur. J. Surg. Oncol. 39 (1), 107–113. https://doi.org/10.1016/j.ejso.2012.10.003.
Panesar, S.S., Ashkan, K., 2018. Surgery in space. Br. J. Surg. 105, 1234–1243.
Panesar, S., Cagle, Y., Chander, D., Morey, J., Fernandez Miranda, J., Kliot, M., 2019. Artificial intelligence and the future of surgical robotics. Ann. Surg. 270 (2), 223–226.
Park, H.S., Lee, J.M., Kim, S.H., Yang, H.K., Han, J.K., Choi, B.I., 2010. Three dimensional MDCT for preoperative local staging of gastric cancer using gas and water distention methods: a retrospective cohort study. AJR Am. J. Roentgenol. 195 (6), 1316–1323.
Phillips, B.T., Wang, E.D., Rodman, A.J., Watterson, P.A., Smith, K.L., Finical, S.J., Eaves 3rd, F.F., Beasley, M.E., Khan, S.U., 2012. Anesthesia duration as a marker for surgical complications in office based plastic surgery. Ann. Plast. Surg. 69 (4), 408–411. https://doi.org/10.1097/SAP.0b013e31825f4e5a.
Porpiglia, F., Fiori, C., Checcucci, E., Amparore, D., Bertolo, R., 2017. Hyperaccuracy three dimensional reconstruction is able to maximize the efficacy of selective sampling during robot-assisted partial nephrectomy for complex renal masses. Eur. Urol. 74, 651–660.
Ramsay, C., Pickard, R., Robertson, C., Close, A., Vale, L., Armstrong, N., Barocas, D.A., Eden, C.G., Fraser, C., Gurung, T., Jenkinson, D., Jia, X., Lam, T.B., Mowatt, G., Neal, D.E., Robinson, M.C., Royle, J., Rushton, S.P., Sharma, P., Shirley, M.D., Soomro, N., 2012. Systematic review and economic modelling of the relative clinical benefit and cost-effectiveness of laparoscopic surgery and robotic surgery for removal of the prostate in men with localised prostate cancer. Health Technol. Assess. 16 (41), 1–313. https://doi.org/10.3310/hta16410.
Saji, H., Inoue, T., Kato, Y., Shimada, Y., Hagiwara, M., Kudo, Y., Akata, S., Ikeda, N., 2013. Virtual segmentectomy based on high quality three dimensional lung modeling from computed tomography images. Interact. Cardiovasc. Thorac. Surg. 17 (2), 227–232.
San Jose Estepar, R., Martin Fernandez, M., Caballero Martinez, P.P., Alberola Lopez, C., Ruiz Alzola, J., 2003. A theoretical framework to three dimensional ultrasound reconstruction from irregularly sampled data. Ultrasound Med. Biol. 29 (2), 225–269.
Saxena, A., Yan, T.D., Chua, T.C., Morris, D.L., 2010. Critical assessment of risk factors for complications after cytoreductive surgery and perioperative intraperitoneal chemotherapy for pseudomyxoma peritonei. Ann. Surg. Oncol. 17, 1291–1301.
Seo, H.J., Lee, N.R., Son, S.K., Kim, D.K., Rha, K.H., Lee, S.H., 2016. Comparison of robot assisted radical prostatectomy and open radical prostatectomy outcomes: a systematic review and meta-analysis. Yonsei Med. J. 57, 1165–1177.
Seymour, N.E., Gallagher, A.G., Roman, S.A., O'Brien, M.K., Bansal, V.K., Andersen, D.K., Satava, R.M., 2002. Virtual reality training improves operating room performance: results of a randomized double blinded study. Ann. Surg. 236 (4), 458–463.
Shademan, A., Decker, R.S., Opfermann, J.D., Leonard, S., Krieger, A., Kim, P.C., 2016. Supervised autonomous robotic soft tissue surgery. Sci. Transl. Med. 8 (337), 337ra64. https://doi.org/10.1126/scitranslmed.aad9398.
Shah, N., Hamilton, M., 2013. Clinical review: can we predict which patients are at risk of complications following surgery? Crit. Care 17, 226.
Shields, T.W., 1994. Surgical anatomy of the lungs. In: Shields, T.W., LoCicero, J., Reed, C.E., Feins, R.H. (Eds.), General Thoracic Surgery, seventh ed. In: vol. 1. Lippincott Williams & Wilkins, Philadelphia, PA, pp. 171–185.
Song, X., Cheng, M., Wang, B., Huang, S., Huang, X., 2011. Computer aided preoperative planning for liver surgery based on CT images. Procedia Eng. 24, 133–137. https://doi.org/10.1016/j.proeng.2011.11.2615.
Su, L., Dong, Q., Zhang, H., Zhoy, X., Chen, Y., Hao, X., Li, X., 2016. Clinical application of a three dimensional imaging technique in infants and young children with complex liver tumors. Pediatr. Surg. Int. 32 (4). https://doi.org/10.1007/s00383-016-3864-7.


Thompson, J.E., Egger, S., Bohm, M., Haynes, A.M., Matthews, J., Rasiah, K., Stricker, P.D., 2014. Superior quality of life and improved surgical margins are achievable with robotic radical prostatectomy after a long learning curve: a prospective single surgeon study of 1552 consecutive cases. Eur. Urol. 65 (3), 521–531. https://doi.org/10.1016/j.eururo.2013.10.030.
Tranchart, H., Gaillard, M., Chirica, M., Feretti, S., Perlemuter, G., Naveau, S., Dagher, I., 2015. Multivariate analysis of risk factors for postoperative complications after laparoscopic liver resection. Surg. Endosc. 29 (9), 2538–2544. https://doi.org/10.1007/s00464-014-3965-0.
van Sickle, K.R., Ritter, E.M., Baghai, M., Goldenberg, A.E., Huang, I.P., Gallagher, A.G., Smith, C.D., 2008. Prospective, randomized double blind trial of curriculum based training for intracorporeal suturing and knot tying. J. Am. Coll. Surg. 207 (4), 560–568.
Walz, J., Epstein, J.I., Ganzer, R., Graefen, M., Guazzoni, G., Kaouk, J., Menon, M., Mottrie, A., Myers, R.P., Patel, V., Tewari, A., Villers, A., Artibani, W., 2016. A critical analysis of the current knowledge of surgical anatomy of the prostate related to optimization of cancer control and preservation of continence and erection in candidates for radical prostatectomy: an update. Eur. Urol. 70 (2), 301–311. https://doi.org/10.1016/j.eururo.2016.01.026.
Yamanaka, J., Saito, S., Fujimoto, J., 2007. Impact of preoperative planning using virtual segmental volumetry on liver resection of hepatocellular carcinoma. World J. Surg. 31, 1249–1255.
Yaxley, J.W., Coughlin, G.D., Chambers, S.K., Occhipinti, S., Samaratunga, H., Zajdlewicz, L., Dunglison, N., Carter, R., Williams, S., Payton, D.J., Perry Keene, J., Lavin, M.F., Gardiner, R.A., 2016. Robot assisted laparoscopic prostatectomy versus open radical retropubic prostatectomy: early outcomes from a randomized controlled phase 3 study. Lancet 388 (10049), 1057–1066. https://doi.org/10.1016/S0140-6736(16)30592-X.
Zhang, D., Zeng, G., Zhang, Y., Liu, X., Wu, S., Hua, Y., Liu, F., Lu, P., Feng, C., Qin, B., Cai, J., Zhang, Y., He, D., Lin, T., Wei, G., 2016. 3D reconstruction computed tomography scan in diagnosis of bilateral Wilms' tumor with its embolus in right atrium. J. X-ray Sci. Technol. 24 (5), 657–660.

CHAPTER 5

Oncologist at work

5.1 Establishing a treatment plan. Oncological guides

Oncology studies cancer. Once a person has been diagnosed with cancer, she/he is referred to an oncologist, who tries to treat the cancer and provide all the medical care that is needed. Oncology is divided into three main parts: surgical, medical, and radiation. In the previous chapter we discussed the oncological surgery area and how it can be improved through the use of AI. In this chapter we shall discuss the role of the medical oncologist.

A medical oncologist tries to treat cancer through chemotherapy, hormone therapy, targeted therapy, or immunotherapy. After she/he receives information from the pathologist, she/he must explain the cancer diagnosis and staging; she/he talks about all the treatment options that are possible in that specific case, and also must indicate the preferred choice given the situation. The oncologist needs to be compassionate and help the patient manage the side effects of the cancer and its treatment. Oncology is linked with hematology, the branch of medicine that deals with blood disorders.

Before cancer spreads, the treatment is often surgical. If the surgeon can remove the whole tumor, then the patient has the best chance of being cured. On the other hand, if the tumor has spread or cannot be resected due to its location in the body, the cancer may respond to chemotherapy. Chemotherapy works throughout the whole body, killing cancer cells that have metastasized in parts of the body far away from the primary tumor. Chemotherapy may be used:

• to cure cancer as the sole treatment; no other treatments such as surgery or radiotherapy are employed;
• as adjuvant therapy; chemotherapy is used in this case to kill the hidden cancer cells that cannot be seen on CT or MRI scans. Even if the whole tumor is resected, cancer cells may have already spread in the body, but they are too small to be detected;
• as neoadjuvant therapy; chemotherapy is used before any other cancer treatment in order to shrink the tumor. If a tumor is too big to be removed, then chemotherapy is needed before the surgery;



• as palliative chemotherapy; when cancer cannot be cured, but its signs and symptoms are hard on the patient, chemotherapy can be used to relieve them by killing some of the cancer cells.

Chemotherapy is given with the intent of curing cancer, but it does not guarantee a cure. Years can go by before a person finds out if she/he is really cancer free. Even when cure is not possible, chemotherapy can be used to control the cancer by shrinking the tumors and/or stopping the cancer cells from growing and spreading. In some cases, cancer transforms into a chronic disease. In those cases the cancer is carefully monitored and treated, but it never goes away, just like other chronic diseases. Through different tests or scans the doctors monitor whether the disease is stable. We shall discuss this topic further in Chapter 8, which deals with remission and recurrence.

The oncologist, together with the patient, decides what drug or combination of drugs will be used for treatment. The oncologist makes the drug scheme, that is, what doses the patient will receive, how she/he will respond to those doses, and also how often and for how long the treatment will continue. When making a decision, the doctor takes into account the cancer type, its location, how much it has affected the body, and the patient's overall health. Using more than one drug in the chemotherapy session helps lower the chances that the cancer becomes resistant to a certain chemo drug. In some cases the chemotherapy scheme is clear, and all the doctors recommend the same choice of doses and schedules. In other cases, things aren't so clear, so different doctors choose different drug combinations, different doses, and different schedules. The characteristics that are taken into consideration when choosing the chemotherapy scheme are:

• the type and stage of the cancer;
• the patient's age and overall health;
• other health problems such as heart, liver, or kidney diseases;
• other types of cancer treatment received in the past.

Besides this information, the doctors take into account new research data published in medical journals, textbooks, and oncological guides, which describe the outcomes of other, similar patients who have been treated with chemotherapy.

The doses of chemo must be accurately computed: if a patient takes too little of a drug, it might not be enough to fight the cancer, and if she/he takes too much, it may be life threatening. Some of the drugs used in chemotherapy are measured in milligrams. In general, the dose is computed taking into account the patient's weight, and the standard dose is 10 mg/kg (1 kg is 2.2 pounds). Other drugs are dosed according to body surface area (m²), computed using the height and the weight of the patient. Besides this protocol, when computing the right dosage the oncologist must take into account the following factors (a small sketch of the basic dose arithmetic follows the list):

• whether the patient is a child or an elderly person; their bodies process the drugs differently, due to their levels of sensitivity;
• whether the patient is obese or has a poor nutritional status;
• whether the patient has taken or is currently taking other drugs;
• whether the patient undergoes radiation therapy;
• whether the patient has a low blood cell count, or a liver or kidney disease.
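To make the dose arithmetic above concrete, here is a minimal sketch. The 10 mg/kg weight-based dose comes from the text; the Mosteller formula for body surface area and the 75 mg/m² example dose are assumptions added for illustration, since the text does not name a specific BSA formula.

```python
# Minimal sketch of chemo dose arithmetic. The 10 mg/kg standard dose is
# from the text; the Mosteller BSA formula and the 75 mg/m^2 example
# dose are illustrative assumptions.
import math

def weight_based_dose(weight_kg: float, mg_per_kg: float = 10.0) -> float:
    # Weight-based dosing: dose (mg) = weight (kg) * dose rate (mg/kg).
    return weight_kg * mg_per_kg

def mosteller_bsa(height_cm: float, weight_kg: float) -> float:
    # Mosteller formula (assumed here): BSA (m^2) = sqrt(h_cm * w_kg / 3600).
    return math.sqrt(height_cm * weight_kg / 3600.0)

def bsa_based_dose(height_cm: float, weight_kg: float, mg_per_m2: float) -> float:
    # BSA-based dosing: dose (mg) = BSA (m^2) * dose rate (mg/m^2).
    return mosteller_bsa(height_cm, weight_kg) * mg_per_m2

print(weight_based_dose(70.0))                       # 700.0 mg at 10 mg/kg
print(round(bsa_based_dose(170.0, 70.0, 75.0), 1))   # ~136.4 mg at 75 mg/m^2
```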


Chemotherapy is given in cycles. A cycle can be a dose of a single drug or of a combination of drugs. The session is afterwards followed by days or weeks without treatment, so that the normal cells can recover from the chemo's side effects. The chemo session can be given several days in a row, or every other day, depending on the drug. Even if the number of cycles is decided at the beginning of the treatment, it can be modified taking into account the response of the patient's body and the response of the cancer. Using clinical trials, the oncologist finds the best doses and schedule for specific types of cancer. When the body can't take it anymore, supportive medicines are given to help the body recover.

The Lynx Group—www.thelynxgroup.com (Accessed November 22, 2019)—has released the "Fourth Annual Oncology Guide to New FDA Approvals." The guide presents an overview of the new drugs approved by the US Food and Drug Administration (FDA) in 2018 for different types of cancer. The year 2018 was a record year for the FDA due to the large number of new molecular entities and new biologic license applications approved: 59 novel drugs were approved in 2018, compared with 46 in 2017 and 22 in 2016. New therapies for cancer were developed in 2018, including treatments for patients who had no approved treatment before, for example for two rare types of non-Hodgkin lymphoma.

The National Comprehensive Cancer Network (NCCN)—www.nccn.org (Accessed November 22, 2019)—is a non-profit alliance of 28 cancer centers. The NCCN offers clinicians access to knowledge as well as tools to help them make the best decisions when it comes to cancer management. The NCCN guidelines contain algorithms and flowcharts, data descriptions and clinical information, and references for the evidence behind the recommendations. Fifty-three panels develop and update the guidelines; each panel is multidisciplinary and contains both clinicians and researchers.

In Europe, the guidelines are provided by the European Society for Medical Oncology (ESMO)—www.esmo.org (Accessed November 22, 2019). ESMO developed Pocket Guidelines, which are distributed at ESMO events, while the full guides are published in Annals of Oncology. The pocket booklets provide a quick reference guide. The guidelines vary from country to country, due to differences in the approval and licensing indications of drugs.

These are the bases of medical oncological treatment planning. What we are interested in is to see how AI can offer support in making these decisions. Thus, let us proceed to the next section, Chemotherapy and Artificial Intelligence.

5.2 Chemotherapy and Artificial Intelligence

July 2019—a new gadget appears on the market, and we are not talking about just any type of gadget: we are talking about a device that can determine whether targeted chemotherapy drugs are working on a certain patient or not. All this was possible thanks to researchers from Rutgers University, New Jersey, who used AI and biosensors to obtain 95.9% accuracy in counting live cancer cells (Ahuja et al., 2019). The device uses multifrequency impedance spectroscopy merged with supervised machine learning in order to rapidly assess tumor cells' sensitivity to drugs. This approach differs from others in that it does not use cell staining or labeling. The device works on targeted cancer therapies, where antineoplastic agents are tied to antibodies that target surface markers on tumor cells, in cancers such as B-cell lymphoma, multiple myeloma, and epithelial carcinomas.


If the cancer responds to the treatment, the tumor cells will generate a certain protein called matriptase (Antalis et al., 2011; Bertino et al., 2010; Lin et al., 2016). By analyzing the surface markers on the tumor cells, we can assess whether the cancer cell is sensitive to the drugs or not. If a cancer cell is sensitive to the anti-matriptase drug, then it will die; otherwise it will go on living. The novel device can assess cancer cell viability in connection to its response to the anti-matriptase drug, without the need for cell staining or labeling. The research team tested the device by using different concentrations of the targeted anticancer drug on the cancer cells. Based on a shift in its electrical properties, the device can spot whether a cell is alive or not. The purpose of the device is to test cancer therapies on patient tumor samples before any treatment is initiated.

To improve the accuracy rate of the device, the authors used support vector machines with a Gaussian kernel, which is based on the squared Euclidian distance between two feature vectors. The training set consisted of features extracted from 100% live and 100% dead cells. The features of the live cells were labeled 1, and the features of the dead cells were labeled 0. The training set contained over 1000 events, an event being a peak of impedance data that corresponds to a cell passing over the electrodes. With this many training items, the support vector machine classifier did not overfit. Three different tumor cell test samples, with viability percentages of 50%, 82%, and 90% live cells, were used to test the accuracy of the support vector machine classifier.

Support vector machines

We have encountered this type of machine learning technique before, so we guess it is now high time to explain what support vector machines are. A support vector machine can be seen as the widest possible "road" that separates left-side and right-side cars, buildings, and pedestrians. The cars, buildings, and pedestrians that are closest to this "road" are the support vectors. Let us take a look at Fig. 5.1. The objects from our road analogy are the points in the plane: the circles represent the objects on the left side of the road, and the squares represent the objects on the right side of the road. Support vector machines find the decision border that separates the different classes while maximizing the margins. The margins measure the distances between the line and the dots that are closest to it. In the example from Fig. 5.1 we can draw multiple lines that separate the circles and the squares, but the optimal one maximizes the margins. Technically, we deal with a constrained optimization task: the constraint that must be satisfied is finding separating hyperplanes that classify the classes correctly; after finding the hyperplanes, we must choose the one that maximizes the margin.

FIG. 5.1 Support vector machines.


A hyperplane is an $(n-1)$-dimensional subspace of an $n$-dimensional space. That is, if we have a 2D space, then the hyperplane will be 1D, which mathematically speaking is just a line. If we have a 3D space, then the hyperplane will be a 2D space, which mathematically speaking is a plane. A hyperplane of an $n$-dimensional space can be written using the following formula:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n = 0.$$

For a 2D space, the hyperplane equation will be a linear equation given by:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0.$$

All the data that satisfies the following formula represents points that are above the line:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 > 0,$$

whereas the data that satisfies the next formula represents points below the line:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 < 0.$$

Let us assume that the circles are labeled $1$, and the squares are labeled $-1$. Then, for the hyperplane to separate the two classes, we have the following mathematical writing:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 > 0, \text{ if } y = 1,$$
$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 < 0, \text{ if } y = -1.$$

We can rewrite the two conditions as:

$$y \cdot (\beta_0 + \beta_1 x_1 + \beta_2 x_2) > 0.$$

Regarding the margin, let us imagine that we have a hyperplane $H$ that separates 60 points. We compute the distances from all 60 points to the hyperplane $H$, and select as margin the smallest distance.

All of the above can be used only if the classes are linearly separable, but, let us face the facts, this is almost never encountered in real-life problems. So, in order to handle non-linearly separable classes, we must discuss two new concepts: the Soft Margin and the Kernel. The two concepts provide a solution to our problem. If we use the Soft Margin, then we are searching for a line that separates the two classes with a little tolerance, allowing one or a few cases to be misclassified. If we use a Kernel, then we are searching for a non-linear decision margin. When it comes to the Soft Margin approach, we can accept two types of misclassifications. The first accepted misclassification is when a case is on the wrong side of the decision line, but on the margin (see Fig. 5.2). The second accepted misclassification is when the case is on the wrong side of the decision line, but not on the margin (see Fig. 5.3). Technically, when using the Soft Margin, the classifier tries to find the trade-off between finding the hyperplane that minimizes the misclassification and maximizing the margin.
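To make the sign test above concrete, take a made-up line $-3 + x_1 + 2x_2 = 0$ (so $\beta_0 = -3$, $\beta_1 = 1$, $\beta_2 = 2$). The point $(2, 2)$ gives $-3 + 2 + 4 = 3 > 0$, so it lies above the line and would be labeled $y = 1$; the point $(1, 0.5)$ gives $-3 + 1 + 1 = -1 < 0$, so it lies below the line and would be labeled $y = -1$. In both cases $y \cdot (\beta_0 + \beta_1 x_1 + \beta_2 x_2) > 0$, as required.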


FIG. 5.2 Wrong side of the decision line, but on the margin.

FIG. 5.3 Wrong side of the decision line, but not on the margin.

The difficult part comes when we have to decide how much tolerance we want to allow when finding the decision hyperplane. This tolerance is determined by a hyperparameter named the penalty. On the other hand, what the Kernel does is use the existing features and apply some transformations to them, thus creating new features. The newly created features help the support vector machine find the non-linear decision borderline. There are several kernels we can choose from: linear, polynomial, RBF, sigmoid, Gaussian, etc. In order to classify the data, we need to move it into another dimension, for example from 2D to 3D. Imagine the circles and squares launched into a 3D space: this is what the kernel is used for.

Until now we have presented a friendly approach to understanding support vector machines. Here is where the hard part begins, where math steps in. Mathematically speaking, building a support vector machine means computing the inner product kernel between a support vector and a random vector chosen from the input (Haykin, 1999; Tan et al., 2005; Zaknich, 2003). Presume that $x$ is a vector from the input space, and $g(x) = (g_1(x), g_2(x), \ldots, g_{m_1}(x))$ is the non-linear transformation of the input space $x$ to the feature space $z$, with $m_1$ the feature space's dimension. Thus, the decision surface will be defined as follows:

$$\sum_{j=1}^{m_1} w_j \cdot g_j(x) + b = 0,$$

where $w = (w_1, w_2, \ldots, w_{m_1})$ is the weight vector that connects $z$ to the output space $d$, and $b$ represents the bias. If we set $g_0(x) = 1$ and $w_0 = b$, then we can change the above formula into:

$$\sum_{j=0}^{m_1} w_j \cdot g_j(x) = 0.$$


Thus, by adding $g_0(x)$, $g(x)$ becomes $g(x) = (g_0(x), g_1(x), \ldots, g_{m_1}(x))$, and it will be weighted by $w = (w_0, w_1, \ldots, w_{m_1})$. The decision surface will be given by:

$$w \cdot g^T(x) = 0.$$

If we take as input the vectors $x_i$, $i = 1, 2, \ldots, N$, then we can build the inner product kernel using the following formula:

$$K(x, x_i) = g(x) \cdot g^T(x_i) = \sum_{j=0}^{m_1} g_j(x) \cdot g_j(x_i), \quad i = 1, 2, \ldots, N.$$

Going deeper into the theory, let us presume that we have the training set $T = \{(x_i, d_i), i = 1, 2, \ldots, N\}$, where $x_i$ is the input pattern, and $d_i$ is the associated output. We shall describe a two-class decision problem, where we have positive patterns, $d_i = +1$, and negative patterns, $d_i = -1$. For the sake of simplicity we shall also presume that the classes are linearly separable. Recall that in this case we shall have:

$$w \cdot x_i^T + b \geq 0, \text{ if } d_i = +1,$$
$$w \cdot x_i^T + b < 0, \text{ if } d_i = -1.$$

Fig. 5.4 shows multiple hyperplanes that separate the data, from which we need to choose the optimal one.

FIG. 5.4 Optimal hyperplane.

From our example, we see that the $H_1$ hyperplane has the largest margin of separation, thus we shall choose it. Let us denote by $w_0$ and $b_0$ the values that correspond to $H_1$; then we shall have:

$$w_0 \cdot x^T + b_0 = 0,$$

making the discriminant function, which measures the distance from the vector $x$ to $H_1$:

$$g(x) = w_0 \cdot x^T + b_0.$$

If we denote by $r$ the distance we desire, then we have:

$$g(x) = w_0 \cdot x^T + b_0 = r \cdot \|w_0\|,$$

where $\|w_0\|$ is the Euclidian norm of the vector $w_0$. Thus, we have $r = \dfrac{g(x)}{\|w_0\|}$.


In order to find the optimal hyperplane, we need to estimate $w_0$ and $b_0$ (Fig. 5.5).

FIG. 5.5 Optimal hyperplane together with the corresponding margin.

The points $(x_i, d_i)$ which belong to the two lines defined by:

$$w_0 \cdot x_i^T + b_0 = 1, \text{ if } d_i = +1,$$
$$w_0 \cdot x_i^T + b_0 = -1, \text{ if } d_i = -1,$$

are the support vectors. These points are placed on the decision boundaries that separate the two classes, which means that they are the most difficult points to classify. If $x_i$ is a support vector, by definition we have:

$$g(x_i) = w_0 \cdot x_i^T + b_0 = \pm 1, \text{ for } d_i = \pm 1.$$

The distance from the support vector $x_i$ to $H_1$ is:

$$r = \frac{g(x_i)}{\|w_0\|} = \begin{cases} \dfrac{1}{\|w_0\|}, & d_i = +1 \\[2mm] -\dfrac{1}{\|w_0\|}, & d_i = -1 \end{cases}.$$

If we denote by $\rho$ the optimal value of the margin of separation, then from the above equation we have:

$$\rho = 2r = \frac{2}{\|w_0\|}.$$

By maximizing the distance we compute the value of the margin:

$$\rho = \frac{2}{\|w\|},$$

which actually means that we have to minimize the following function:

$$L(w) = \frac{\|w\|^2}{2},$$

that is, the minimization of the Euclidian norm of the weight vector $w$, under the following constraint:

$$f(x_i) = \begin{cases} +1, & \text{if } w \cdot x_i^T + b \geq 1 \\ -1, & \text{if } w \cdot x_i^T + b \leq -1 \end{cases}.$$


What we have discussed so far is applicable only in the case when we have to deal with linearly separable decision classes. Let us see now how the paradigm changes when we encounter non-linear decision classes. We already mentioned in the friendly presentation above that other parameters are needed to solve this problem. In this case, we are going to use a set of additional non-negative scalars $\{\xi_i, i = 1, 2, \ldots, N\}$ that are called slack variables. The slack variables measure the deviation of a point from the ideal situation of pattern separability. Our goal has changed, and now we have to minimize a cost function given by:

$$L(w, \xi) = \frac{\|w\|^2}{2} + C \sum_{i=1}^{N} \xi_i,$$

having as constraints:

$$f(x_i) = \begin{cases} +1, & \text{if } w \cdot x_i^T + b \geq 1 - \xi_i \\ -1, & \text{if } w \cdot x_i^T + b \leq -1 + \xi_i \end{cases},$$

where $C$ is a regularization parameter. To set the value of $C$ we have two choices: either we use experimental determination, or we use analytical determination. The experimental determination is done by using the training and testing data sets, whereas the analytical determination is done via the estimation of the VC dimension (see Haykin, 1999; Zaknich, 2003). The final step implies choosing an appropriate kernel type from the ones mentioned at the beginning. Here we are going to present only a few kernel types:

• Polynomial learning machine:

$$K(x, x_i) = \left( x \cdot x_i^T + 1 \right)^p,$$

where $p$ is called the power and is set by the user.

• Radial basis function networks:

$$K(x, x_i) = \exp\left( -\frac{1}{2\sigma^2} \, \|x - x_i\|^2 \right),$$

where $\sigma^2$ is a smoothing parameter, set by the user.

• Two-layer perceptron:

$$K(x, x_i) = \tanh\left( \beta_0 \cdot x \cdot x_i^T + \beta_1 \right),$$

where $\beta_0$ and $\beta_1$ satisfy Mercer's theorem. For technical details we refer the reader to Haykin (1999). Support vector machines obtain high accuracy, work well when we do not have large data sets, and are efficient due to the fact that they use subsets of the training points. On the other hand, if the data sets are large, the training time can get quite high. Besides this, another disadvantage is that they become less and less effective when applied to data sets that contain noise and overlapping classes. More details on support vector machines can be found in Stoean and Stoean (2014).
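To ground the theory above, here is a minimal sketch of a soft-margin SVM with a Gaussian kernel, fit on synthetic, non-linearly separable data with scikit-learn. All data and parameter values are invented for illustration; the custom kernel function simply implements the Gaussian kernel formula given above.

```python
# Minimal sketch: soft-margin SVM with the Gaussian kernel defined above,
# trained on synthetic two-class data that is not linearly separable.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

def gaussian_kernel(X, Y, sigma=0.5):
    # K(x, x_i) = exp(-||x - x_i||^2 / (2 * sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Two concentric rings: a classic non-linearly separable problem.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C is the soft-margin regularization (penalty) parameter discussed above.
clf = SVC(kernel=gaussian_kernel, C=1.0)
clf.fit(X_train, y_train)

print("number of support vectors:", clf.support_.shape[0])
print("test accuracy:", clf.score(X_test, y_test))
```

With a built-in RBF kernel this could equally be written as `SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2))`; the explicit function above is only meant to mirror the formula in the text.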


In the first section of this chapter we discussed the fact that the oncologist must find the right dosage of chemo drug for each patient, taking into account several factors. This process can be painful for the patient. Adverse drug reactions are estimated to reach 280,000 hospital admissions per year in the United States (Bourgeois et al., 2010; U.S. Department of Health, 2014). These reactions are caused by multiple factors, such as: a certain condition that the patient might have, like a renal or liver dysfunction; the interactions between drugs, if two different drugs have the same enzyme as target (e.g. fluoxetine and tamoxifen); the interaction between drugs and food, for instance it is well known that grapefruit interacts with a lot of drugs, not only chemo drugs, and may inhibit drug metabolism enzymes and cause drug toxicity; and genetics, since certain genomic variations can influence the drug response (Steward et al., 2007).

Fortunately, things are starting to change. For example, Adrien Coulet, a lecturer at the University of Lorraine and a researcher in a joint Inria and Loria team, together with other researchers from Stanford, developed a new algorithm that predicts in advance whether the dosage should be lowered for a certain patient (Coulet et al., 2018). The research team analyzed the Electronic Health Records of different patients and, by using a random forest classifier, were able to identify the patients that needed a lower dosage than the standard one for certain drugs. This prediction was made before the treatment had started. Technically, they focused on the interaction between the drugs and the P450 enzyme family, knowing a priori that dosing for drugs metabolized by these enzymes is sensitive. The innovation of their study lies in the selection of the data that was used in the training phase. Electronic Health Records can be used for: studying drug effects and interactions (Tatonetti et al., 2012), patients' outcomes after receiving a certain drug (Jensen et al., 2012), or the detection of adverse events (Neuraz et al., 2013).

The drug dose changes recorded in the Electronic Health Records were considered to be a sign of each patient's drug response. If the dosage was unchanged during a period of time, it meant that the patient received the right drug dosage. If a reduction in dosage was spotted, it meant that the patient had an adverse reaction, whereas if an increase was found, it meant that there was no response to the chemotherapy treatment. Thirty-four drugs that are metabolized by the P450 enzymes were used in the study, each prescribed in at least 300 intervals. The phenotype profiles of the patients that presented an adverse reaction and had their dosage reduced were compared with the profiles of the patients whose dosage remained unchanged. The same procedure was repeated for the patients whose drug dosage was increased. The only question that remained was whether the phenotype profile could indeed predict a patient's drug sensitivity. The method proved that it can predict dose reduction for 23 out of 34 drugs, but it could not predict dose increases. The researchers developed a web tool that was used by three doctors to provide a clinical interpretation of the results obtained. The phenotype profile contained three types of characteristics: diagnostic codes, health conditions, and lab test orders.
For a certain patient’s phenotype profile, which contained 300 features, 100 feature for each characteristic type, there was generated another profile that contained all the features that were statistical significant. Some phenotypes ended up being empty, due to the fact that no feature was statistically significant, that is there was nothing written in the Electronic Health Record that could indicate why the dosage had to be changed.


The web tool, or phenotype profile browser, is available at http://snowflake.loria.fr/p450/ (Accessed November 23, 2019).

The evaluation of the phenotype profile's predictions was performed using 10-fold cross validation, and also by holding out the last year's data as the testing data set. Since we have already covered the concept of cross validation in Chapter 2, we are not going to discuss it any further. The second type of model evaluation was done on the data from 2014: the AI system was trained on the data collected between 2008 and 2013. The reported results were: for the 10-fold cross validation, an average AUC of 0.76 with an F-measure of 0.69; for the hold-out testing method, an AUC of 0.68 with an F-measure of 0.64. The authors mention that the discrepancy between the two AUCs is caused by the different sizes of the data sets that the training was done on. These results are achieved when trying to predict sets of drugs; if the model is applied to a particular drug, the results are better. For example, using 10-fold cross validation for tacrolimus, the model reaches 0.94 AUC and 0.89 F-measure; for class L (Antineoplastic and Immunomodulating agents) the AUC was 0.93 with an F-measure of 0.88. For 10 single drugs the AI system surpassed 0.70 AUC. Future work will probably improve these numbers. The researchers are trying to adapt this AI model to the social and legal setting of French hospitals. Only through the digitalization of the health care system will more data be collected, and ultimately the AI performance will go up.

Having discussed all the side effects of chemotherapy, it would mean a great deal if a patient knew from the beginning whether she/he will benefit from chemotherapy treatment and improve her/his chances of getting better, or whether it is all in vain, and the only outcome is the horrible side effects, a great price paid for nothing. Luckily, steps are being made in this direction through AI. An AI system has been taught to recognize which patients will benefit from chemotherapy. VTT has developed a model that reviews MRI images and identifies which breast cancer patients are going to benefit from pre-operative chemotherapy. This project is part of the EU-funded BigMedilytics (Big Data for Medical Analytics). VTT has two other partners: the Institut Curie hospital, which focuses on the treatment of breast cancer patients, and IBM Haifa, Israel, which works with mammography images. The technique classifies tissues depending on their shape and volume as seen on the MRI images. This information, together with other patient data, such as metabolism, etc., is used for predicting who benefits from neoadjuvant chemotherapy. The researchers used a deep neural network for this task. Results of this pilot were presented at the "Big Data: Fueling the transformation of Europe's Healthcare sector" event. Here is the link for the poster: https://www.bigmedilytics.eu/wp-content/uploads/2019/10/2.3.Breast-cancer.pdf (Accessed November 24, 2019).

In the meantime, MIT researchers are using AI to improve patients' quality of life by trying to reduce the chemotherapy and radiotherapy dosage for glioblastoma. Glioblastoma is the most aggressive type of brain cancer, with survival rarely surpassing five years. In general, oncologists give the highest possible drug dosages in order to shrink the tumor. The study was presented at the Machine Learning for Healthcare conference at Stanford University in 2018 (Yauney and Shah, 2018).
The AI system reviews the treatments that are in use today, and iteratively adjusts the doses. In the end, the goal is to find the optimal treatment plan, which gives the lowest possible dosage that still reduces the tumor size, with a shrinkage comparable to that achieved by traditional treatments.
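To give a feel for this dose-adjusting idea, here is a toy reinforcement-learning sketch: a tabular Q-learning agent chooses, at each cycle, to withhold, halve, or give a full dose, and is rewarded when a deliberately simplistic simulated tumor shrinks, with an extra penalty for full doses (echoing the reward/penalty scheme described below). The tumor dynamics and all constants are invented for illustration and have no relation to the MIT group's clinical simulator.

```python
# Toy Q-learning sketch of dose adjustment. Everything here (tumor
# dynamics, rewards, discretization) is invented for illustration.
import random

ACTIONS = [0.0, 0.5, 1.0]          # fraction of a full dose per cycle
Q = {}                              # Q[(state, action_index)] -> value
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2   # learning rate, discount, exploration

def step(size, dose):
    # Invented dynamics: the drug shrinks the tumor, which otherwise regrows.
    new_size = max(0.0, size * (1.03 - 0.15 * dose) + random.gauss(0, 0.01))
    # +1 when the tumor shrinks, -1 otherwise; extra penalty for full doses.
    reward = (1.0 if new_size < size else -1.0) - (0.5 if dose == 1.0 else 0.0)
    return new_size, reward

def bucket(size):
    return min(int(size * 10), 10)  # coarse discrete tumor-size state

random.seed(0)
for episode in range(2000):
    size = 1.0
    for cycle in range(20):
        s = bucket(size)
        if random.random() < EPS:                      # explore
            a = random.randrange(len(ACTIONS))
        else:                                          # exploit
            a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
        size, r = step(size, ACTIONS[a])
        s2 = bucket(size)
        best_next = max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

# Learned policy: preferred dose fraction per tumor-size bucket.
policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))]
          for s in range(11)]
print(policy)
```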


The study was conducted on simulated trials of 50 patients, and the MIT researchers reported that the AI model designed treatment cycles that reduced the dosages to a quarter or a half of the classical treatment dosages, and still reached the desired shrinkage of the tumor. The chemo cycles were reduced to twice a year or, in some cases, skipped altogether.

The researchers used reinforcement learning to train the AI system. The model starts with a combination of temozolomide and procarbazine, lomustine, and vincristine, administered over weeks or months of treatment. In order to predict how many doses a patient needs based on her/his weight, an oncologist uses regimens, which are based on protocols developed from animal testing and clinical trials and scenarios, and which have been in practice for ages. The model is "fed" these regimens; when it analyzes one of them, it decides whether to initiate a dose or withhold it. If the decision is to initiate a dose, then it needs to decide whether to administer the whole dose or just a part of it. After the deciding process is over, the model checks with another clinical model to see whether the tumor has shrunk or not. If the tumor has shrunk, then it receives a reward, +1; otherwise it receives a penalty, −1. One crucial problem for the researchers was to make sure that the model did not surpass the maximum permitted safety dosage, nor the maximum number of doses, just to have the tumor's size reduced. The solution to this problem was that the moment the model tried to administer full doses, it received a penalty. The 50 patients were randomly selected, and all of them had undergone traditional treatment prior to this study. About 20,000 trial-and-error tests were run for each patient in the training phase. Once the training was done, the model was tested on 50 new simulated patients, using the tailored treatments learned in the training phase. The AI regimen was compared to the results obtained when using the drug combinations of temozolomide and procarbazine, lomustine, and vincristine (Ricard et al., 2007; Peyre et al., 2010).

Another study of the response of breast cancer patients to neoadjuvant chemotherapy was published in Oncotarget in 2019 (Tadayyon et al., 2019). A research team from the Sunnybrook Research Institute in Toronto developed a new model that can predict whether neoadjuvant chemotherapy is needed for certain patients with locally advanced breast cancer, before they begin the treatment. The model uses neural networks to review quantitative ultrasound tumor images. The neural network learned how to find patterns between the images and the corresponding patient outcome data. The authors reported 96% accuracy and 0.96 AUC. The neural network's performance was compared with the performance of a kNN, which obtained 67%. This study provides a framework for personalized, a priori planning of neoadjuvant chemotherapy.

In DeWan et al. (2018) the authors developed an AI system to predict the risk of neutropenia within six months of chemotherapy in breast cancer patients. People that suffer from neutropenia have a low number of neutrophils. The role of neutrophils is to attack bacteria or other organisms when they invade our body. The data set contained 10,288 breast cancer patients from the ASCO CancerLinQ Discovery. All the patients underwent chemotherapy treatment with doxorubicin and cyclophosphamide, followed by paclitaxel.
Some of them had received pre-chemotherapy WBC growth factor prophylaxis. The reported results show a positive predictive value of 0.56 and a negative predictive value of 0.92.

Staying with breast cancer chemotherapy, we next present a paper regarding the prediction of the complete pathological response after neoadjuvant chemotherapy (Bhardwaj and Hooda, 2019).


The prediction was made using an ensemble of machine learning techniques, called the Deux Machine Learning framework. The Deux Machine Learning framework brings a new method of validating its performance: a multi-criteria decision-making technique named weighted simple additive weighting, or WSAW. The WSAW performance is computed taking into consideration the accuracy, mean absolute error, root mean square error, true positive rate, false positive rate, precision, recall, F-measure, Matthews correlation coefficient (MCC), and ROC curves. The validation was done using k-fold cross validation. The authors reported 99.08% accuracy, and compared their AI system with support vector machines, random forests, and AdaBoost. The study has three steps/layers: layer 1, chemotherapy: in this layer anthracycline and taxane are used as treatment, blocking the cancer cell growth; layer 2, machine learning: the Deux Machine Learning framework predicts the pathological complete response; layer 3, post-surgical treatment and follow-up. The training data set contains 222 cases that were preprocessed before being "fed" to the Deux Machines. Nine prediction models were used for classification: Bayes Net, Naive Bayes, logistic regression, multilayer perceptron neural network, sequential minimal optimization, voted perceptron neural network, random forests, AdaBoost, and Adabag. Evaluating the performance of the nine AI models, the authors chose random forests, AdaBoost, and Adabag.

Another prediction of patients' response to chemotherapy was published in Radiology: Artificial Intelligence (Khorrami et al., 2019a,b). The data set contained 125 patients diagnosed with non-small cell lung cancer. The study is retrospective, all the patients having previously been treated with pemetrexed-based platinum doublet chemotherapy at the Cleveland Clinic. The data set was divided into two subsets, with an equal number of responders and non-responders in the training set. In order to predict the chemotherapy response, an AI system was trained using radiomic texture features extracted from non-contrast CT scans of the intra- and peritumoral regions. The Cox regression model, together with the least absolute shrinkage and selection operator (LASSO), was used in this research paper. The authors reported a mean AUC of 0.82, with a standard deviation of 0.09, for the training set, and an AUC of 0.77 for the testing set.
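Several of the criteria in the WSAW list above are standard classification metrics; as a quick illustration (with placeholder predictions, not study data), they can be computed as follows.

```python
# Minimal sketch computing several of the evaluation criteria listed
# above (accuracy, precision, recall, F-measure, MCC, ROC AUC) with
# scikit-learn; y_true/y_score are placeholder values, not study data.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.3, 0.8]  # model probabilities
y_pred  = [1 if p >= 0.5 else 0 for p in y_score]   # thresholded labels

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
print("MCC      :", matthews_corrcoef(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))
```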
In another recent study, scientists from the University of California and the University of Surrey used AI to discover unrelieved symptoms of cancer patients who underwent chemotherapy treatment (Papachristou et al., 2019). On average, cancer patients have fifteen unrelieved symptoms after receiving chemotherapy treatment (Esther Kim et al., 2009; Miaskowski et al., 2015; Papachristou et al., 2018). Most research in this field involves cluster analysis, the researchers working under the presumption that studying the symptoms that group themselves into the same cluster might provide the key to tailored therapeutic treatments (Barsevick, 2016; Miaskowski et al., 2017). The authors used network analysis to explore the relationships among 38 common symptoms: difficulty concentrating, pain, lack of energy, cough, feeling nervous, hot flashes, dry mouth, nausea, feeling drowsy, numbness or tingling in hands or feet, chest tightness, difficulty breathing, difficulty sleeping, feeling bloated, problems with urination, vomiting, shortness of breath, diarrhea, feeling sad, sweats, problems with sexual interest and/or activity, worrying, difficulty swallowing, feeling irritable, mouth sores, weight loss, hair loss, constipation, swelling, change in the way food tastes, problems with oneself, and changes in skin. Network analysis belongs to graph theory; it gives a visual representation of data in the form of graphs, helping us gain knowledge in order to make better decisions or predictions (Chen, 1971; Bollobas, 1982; Read, 1972).

212

5. Oncologist at work

knowledge in order to make better decisions or predictions (Chen, 1971; Bollobas, 1982; Read, 1972). Network graphs were used in studying depression (Bringmann et al., 2015; Fried et al., 2016), posttraumatic stress (Frewen et al., 2013), quality of life (Kossakowski et al., 2016), and in the identification of high-risk cancer population (Zou and Wang, 2018) among other areas of applications. Network analysis has been also used in oncology, studying the occurrence of 18 symptoms in 665 patients (Bhavnani et al., 2010). These symptoms are severe and cause distress, thus researchers are not interested only in the occurrence, but also on their variability (McCorckle, 1987; McCorckle and Young, 1978; Portenoy et al., 1994a,b). Papachristou et al. studied three symptoms dimensions: occurrence, severity and distress. The patients used in this study suffered from breast, gastrointestinal, gynecological, or lung cancer. All 1328 patients underwent 4 weeks of chemotherapy, and had been scheduled for another 2 chemotherapy cycles. In order to evaluate the occurrence, severity, and distress, the authors used a modified version of the Memorial Symptom Assessment Scale (Portenoy et al., 1994a,b), a questionnaire that measures the experience of dealing with symptoms. Technically, each patient had to answer whether she/he had experienced a certain symptom, and if the answer was yes she/he needed to indicate the rate of its severity and distress. The severity of a symptom was measured using the Likert scale of size 4, which is: slight ¼ 1, moderate ¼ 2, severe ¼ 3, and very severe ¼ 4 (Robinson, 2014), whereas the distress was measured using a Likert scale of size 5, which is: not at all ¼ 0, a little bit ¼ 1, somewhat ¼ 2, quite a bit ¼ 3, very much ¼ 4. In the original study, only 32 symptoms were reviewed. Papachristou et al. added six more symptoms: hot flashes, difficulty breathing, weight gain, increased appetite, chest tightness, and abdominal cramps. For this, the researchers used two models of Pairwise Markov Random Fields (PMRF) (Koller and Friedman, 2009). The PRMF is an undirected graph, where the random variables have a Markov property. The nodes in the graph are represented by symptoms. If there is no edge between two nodes, then that means that the two symptoms are independent of each other; if there exists an edge between two nodes that means that the relationship between the two symptoms cannot be explained by any other symptom in the network. In a PRMF the number of parameters grows directly proportional to the size of the network. In this study 741 parameters had to be estimated. The networks were created using the IsingFit R-package and R-package qgraph. The certainty of the edges was examined using bootstrap, having α ¼ 0.05 the significance between edges based on 1000 bootstrap iterations, whereas the way the symptoms clustered together was done using the Walktrap algorithm (Orman and Labatut, 2009; Yang et al., 2016). The Walktrap algorithm clusters nodes that are highly connected with each other. 
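The study fitted the Ising form of the PMRF with the IsingFit R package. As a rough Python analogue of the underlying idea, the sketch below estimates a symptom network by nodewise L1-penalized logistic regression on simulated binary data, keeping an edge only when both nodewise regressions select it (the "AND rule" that IsingFit also applies). The data, network size, and penalty strength are assumptions for illustration only.

```python
# Rough Python analogue of Ising-network estimation: regress each binary
# symptom on all the others with an L1-penalized logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_patients, n_symptoms = 500, 8
X = (rng.random((n_patients, n_symptoms)) < 0.4).astype(int)  # 0/1 symptom matrix
X[:, 1] = X[:, 0] ^ (rng.random(n_patients) < 0.1)            # make symptoms 0 and 1 co-occur

coef = np.zeros((n_symptoms, n_symptoms))
for j in range(n_symptoms):
    y = X[:, j]                          # current symptom as the outcome
    others = np.delete(X, j, axis=1)     # remaining symptoms as predictors
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(others, y)
    coef[j, np.arange(n_symptoms) != j] = model.coef_[0]

adjacency = (coef != 0) & (coef.T != 0)  # AND rule: both directions must agree
print(adjacency.astype(int))             # 1 marks an estimated symptom-symptom edge
```

In the published analysis, edge stability would then be checked by refitting on bootstrap resamples, and communities of tightly linked symptoms would be extracted with a random-walk algorithm such as Walktrap.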
The results reported by Papachristou et al. are as follows: for occurrence, 36.42% of potential connections were observed, the majority of them positive, except for the edge between weight gain and weight loss; for severity, 54.48% of possible connections, all positive except those involving increased appetite, lack of appetite, hair loss, difficulty with urination, diarrhea, and constipation; for distress, 50.92% of possible connections, all positive except those involving lack of appetite, weight gain, weight loss, diarrhea, and hot flashes, including the edge between hot flashes and swelling of the arms and legs. By viewing all these symptoms as clusters, further research can produce new methods that may ease the many side effects cancer patients experience while undergoing chemotherapy.
In the first chapter we discussed that besides chemotherapy, the oncologist may prescribe hormone therapy. In a 2019 study, Ferreira et al. showed that hormone therapy has a greater
impact on women's quality of life than chemotherapy does (Ferreira et al., 2019). In recent years, hormonal therapy has been used more and more for patients with early stage breast cancer, in preference to chemotherapy. Ferreira et al. studied the impact of hormonal therapy versus chemotherapy on quality of life. The data set contained 4262 patients' responses to the European Organization for Research and Treatment of Cancer QLQ-C30/BR23 questionnaires within the CANTO trial. The reported results show that, 2 years after diagnosis, hormone therapy worsens quality of life and its side effects last longer, especially in menopausal patients; 37.2% of the patients were premenopausal and 62.8% postmenopausal. Hormonal therapy had a negative impact on social function, pain, insomnia, breast symptoms, and emotional function. Chemotherapy had a negative impact on physical and cognitive function, dyspnea, financial difficulties, body image, and breast symptoms. The primary treatment was surgery followed by chemotherapy and/or radiotherapy. Around 81.9% of patients received hormone therapy for at least 5 years. The aim of the study was to see whether doctors are able to predict which women will develop severe side effects when undergoing hormonal therapy, in order to find means to support them. Hormonal therapy has been shown to reduce the relapse rate in hormone-dependent breast cancer, which represents about 75% of all breast cancers (Burstein et al., 2019).
AI has also proved that it can predict the response to immunotherapy. By processing CT scans, an AI system developed by researchers from Gustave Roussy, Centrale Supelec, Inserm, Paris Sud University, and TheraPanacea, a spin-off from Centrale Supelec specialized in AI applied to oncology, radiotherapy, and precision medicine, is able to create a radiomic signature (Sun et al., 2018). Practically, the radiomic signature expresses the lymphocyte infiltration of a tumor and uses it to predict the efficacy of immunotherapy treatment in a patient. This study could possibly enable physicians to identify biological phenomena in a tumor just by using imaging, without having to perform a biopsy anymore. We know, this might sound like a sci-fi movie. Only 15%–30% of patients respond to anti-PD-1/PD-L1 immunotherapy. PD-1 and PD-L1 are proteins present on the cells' surface; drugs that block them, called checkpoint inhibitors, are used as treatment for several types of cancer: metastatic melanoma, non-small cell lung cancer, renal cell carcinoma, and bladder or urothelial cancer (Alsaab et al., 2017). Currently, immune checkpoint inhibitors are being tested in breast and head and neck cancers, as well as in hematological malignancies. The more lymphocytes are present in the tumor, the greater the chance that the patient will respond to immunotherapy. Thus, estimating the amount of lymphocytes in the tumor from CT scans was the goal of Sun et al. The study was applied on four independent cohorts of patients. The whole data set contained 135 patients from the MOSCATO trial, which took place in France. Their CT scans were merged with the RNA-seq genomic data obtained from the tumors' biopsies, and the T-lymphocyte tumor infiltration was predicted. Using AI, the authors predicted the presence of lymphocytes and established the radiomic signature. This signature was afterwards tested on a Cancer Genome Atlas data set that included 119 patients.
The radiomic predictor of T-lymphocyte infiltration was built using an elastic net regularized regression model. The predictor was evaluated on two cohorts of patients with advanced solid tumors, randomly selected from the Gustave Roussy Cancer Campus data set, which contains medical records on tumor immune phenotypes such as immune-inflamed (dense T-lymphocyte infiltration) and immune-desert (low T-lymphocyte infiltration).
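As a hedged illustration of the elastic net step, the sketch below regresses a simulated T-lymphocyte abundance score on synthetic radiomic features; the mixing parameter l1_ratio trades the LASSO's sparsity against the ridge penalty's stability on correlated features. Nothing here reproduces the actual variables of Sun et al. (2018).

```python
# Minimal elastic net sketch on synthetic radiomic data (sizes and feature
# structure are assumptions, not those of the published study).
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_patients, n_features = 135, 80
radiomics = rng.normal(size=(n_patients, n_features))   # radiomic feature matrix
true_coef = np.zeros(n_features)
true_coef[:5] = 1.0                                     # only a few informative features
cd8_score = radiomics @ true_coef + rng.normal(scale=0.5, size=n_patients)

X = StandardScaler().fit_transform(radiomics)
# The elastic net mixes an L1 term (sparsity) with an L2 term; l1_ratio and
# the overall penalty strength are chosen by internal cross-validation.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, cd8_score)
print("selected radiomic features:", int(np.sum(model.coef_ != 0)))
```

The appeal of this penalty for radiomics is that many texture features are strongly correlated, and a pure LASSO would arbitrarily keep one of each correlated group.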
In the end, the immunotherapy-treated data set, which contained 137 patients treated with anti-PD-1 or anti-PD-L1 monotherapy, was used to measure the performance of the radiomic signature. The reported results were: for the Cancer Genome Atlas set, the AUC was 0.67; on the data set that contained immune-inflamed and immune-desert tumors, the AUC was 0.76; as for the patients treated with anti-PD-1 and anti-PD-L1, a high baseline radiomic score was associated with a higher proportion of patients responding after 3 and 6 months of treatment, and with better overall survival.
Another research group studied CT scans in order to predict how well lung cancer patients will respond to immunotherapy (Khorrami et al., 2019a,b). After predicting whether patients with lung cancer will respond to chemotherapy, the data scientists from the Case Western Reserve University digital imaging lab tried to determine whether patients will respond to immunotherapy. The method was to teach the AI system to spot changes between the CT scans done prior to the immunotherapy treatment and the ones done after 2 or 3 cycles of treatment. Those changes were discovered inside, but also outside, the tumor. 139 patients were used in this study. The patients were divided into a discovery set that contained 50 patients and two validation sets, one containing 62 patients and the other 27. The response evaluation criteria in solid tumors outcome was predicted using linear discriminant analysis, which yielded an AUC of 0.88 in classifying responders versus non-responders. The problem with immunotherapy is that even though it entirely changed the way cancer is treated, it is very expensive, around $200,000 per patient, so finding out who will and who will not respond to it is crucial.
Moving forward, scientists are trying to see whether there is a connection between the gut and the brain in what regards the "chemo brain"—https://news.osu.edu/a-possible-gut-brain-connection-to-chemo-brain/ (Accessed November 26, 2019). Chemo brain is a term used by cancer survivors to describe problems related to the side effects of chemotherapy, which include cognitive dysfunction and memory problems. The term is used very often, but the causes that determine it are still not understood. The signs most encountered in chemo brain are: being unusually disorganized, having difficulty concentrating, finding the right word, learning new skills, or multitasking, confusion, short-term memory problems, not being able to remember a conversation, not being able to recall an image, feeling mental fogginess, etc. The researchers believe that the fact that chemotherapy is hard on the digestive system (diarrhea, constipation, nausea, anorexia, etc.) might point to a connection between the gut and the brain. Leah Pyter, an assistant professor of psychiatry and behavioral health, and also an investigator in the Institute for Behavioral Medicine Research at Ohio State, says: "It may be that part of why cancer patients get chemo brain is because the gut is changed and is talking to the brain differently." To test this theory, the scientists are analyzing chemo's effects on mice. One experiment involves feeding the mice antibiotics, whereas a second experiment involves coprophagia (mice eating their own and other mice's feces). Through coprophagia, the mice experience a situation similar to a fecal microbial transplant.
Housing mice that underwent chemo together with untreated mice showed that all the animals' gut bacteria changed. The mice that received chemotherapy lost less weight, because they ate the feces of the untreated mice, compared with mice that underwent chemotherapy and lived alone. This implies that their gut bacteria changed and partially reversed one symptom of chemotherapy. If there is indeed a connection between the gut and the chemo brain, then patients could be prescribed probiotics or prebiotics, eat differently, etc., so that they preserve the gut bacteria
that is beneficial to the brain. The study is still in its starting phase, and we look forward to the statistical validation of its results.
Precision medicine keeps growing. By studying genes and the genetic mutations that show up in tumors, doctors can match them with drugs that can stop their growth and even destroy them. Today, drugs can target dozens of cancer genes, but data scientists are hoping to target hundreds of them. Even if the discovery of targeted chemotherapy drugs has evolved, there is still a long road ahead of us. Today, there is still no cure for cancer. Only one in ten patients diagnosed with advanced cancers has genes that are presently known to make the cancer respond to a new drug. Dr. Sameek Roychowdhury says that even if the goal is to give all patients a new therapy based on genomic testing, still "today we don't know how to provide a special treatment for the results of nine of ten genomic tests we do." And things are even worse than they appear at first. Because many doctors do not have the expertise or the means to perform genetic tests, their patients cannot benefit from precision medicine. Besides this, the cost of genetic testing is enormous. Insurance companies do not always cover these tests, thus only 10% of cancer patients undergo genetic testing.
It is clear that AI needs to step in. We have seen throughout this chapter that AI can find the relationships between tumor genes, the effect of chemotherapy or immunotherapy on cancer growth, the effect of chemotherapy or hormonal therapy on patients, etc. An army of doctors could not interpret the huge amount of data that is generated. Just picture the fact that the human genome includes three billion DNA nucleotides, and each one of them can be mutated or repeated and ultimately cause cancer. If this isn't proof enough, you should know that DNA is just part of the whole cancer scheme. We mentioned in the beginning that cells work through proteins, which means that cancer growth, and also the immune system that fights it, are both controlled by proteins. There are around six million proteins, and the chemotherapy drugs target the proteins, not the genes.
Medicine is not math, but ultimately maybe it should be. AI does not process data like humans do. AI does not understand the biological structures and processes behind cancer. AI is fed data regarding tissue samples of patients who suffer from a certain type of cancer and mathematically computes different correlations between the initial state of the cancer and its response to different treatments. Things go even further, because we are interested not only in the cancer's response to the treatment, but also in the treatment's effect on the patient's body and mind. This is why, whether doctors like it or not, mathematics has entered the medical field through AI. Using math, Nucleai, an Israeli company, trained its AI system on 20 million digitized biopsy samples, and it is now able to recognize cancer with 97% accuracy. And this is just the beginning: Nucleai wants to achieve something that pathologists have failed to do, namely extract information from slides so that the correct drugs can be matched to the tumor. Avi Veidman states that the human eye cannot spot some subtle signs of the interaction between the cancer cells and the immune system cells, but the software can. These signs may be the answer to boosting the patient's immune system through immunotherapy so it can fight the tumor.
Another example is the South Korean firm Lunit—www.lunit.io (Accessed November 26, 2019)—which also developed an AI system that analyzes digitized pathology slides in order to predict whether a patient will respond to checkpoint inhibitor cancer drugs. Unfortunately, the fight against cancer is difficult because cancer is constantly changing. The cancer cells develop new mutations so that they can trick the immune cells and
survive cancer drugs. Cancer cells continue to change their shape, and that makes the training of an AI system almost impossible, since for it to learn to recognize a certain pattern it needs thousands of examples. A particular pattern of mutation can appear in only a few patients, and that causes a dilemma. As can easily be seen, the problems are related to the data sets. In order to test new chemotherapy drugs, data scientists need to recruit patients for clinical trials, and this does not imply only gathering patients to test the new drugs, but also gathering patients to be part of the control group, for comparison purposes. Demonstrating that a drug is efficient might take years and years. A solution to this problem comes in the form of statistics: data from past studies can be used to predict how a real control group would respond.
Even if the issue concerning the patients from the control group is resolved, what happens when we, researchers, must find a cure against the clock? The most aggressive type of cancer is glioblastoma, a brain cancer with a median survival time of about 15 months. Can researchers find a cure that fast? Can a drug be designed at this rhythm in order to kill a glioblastoma tumor? The tumor is so vicious that it does not give a new drug time to show whether it is good or not; before we can see the result, the patient is already dead. The Ivy Brain Tumor Center—http://ivybraintumorcenter.org (Accessed November 26, 2019)—is developing a new type of trial, called accelerated trials, for brain cancer patients. Immediately when a brain cancer patient turns to them, she/he is given a dose of a new experimental drug. The dosage is big enough to reach the tumor, but small enough not to kill the patient through its toxicity. After performing brain surgery, the doctors measure the tumor to see whether the drug was effective. If the results are good, the patient receives an increased dose of the drug; otherwise, the oncologist chooses another course of treatment for that particular patient. This procedure might appear hard-core, but it pays off: a patient beat a form of brain cancer, malignant meningioma, through a personalized treatment based on an accelerated trial.
Precision medicine is still far from achieving its goal. But even if we still have a long road ahead of us, the future looks bright. Let us suppose that performing a genetic test leads us to help only one patient out of 100. At first we might be disappointed, but in the long run we spared 99 patients from undergoing unnecessary, excruciating, expensive treatments. If a patient is tested and the result shows that she/he can benefit from a new drug, the insurance company might not pay for an experimental drug that has not yet been approved by the FDA—https://www.cancer.net/research-and-advocacy/clinical-trials/health-insurance-coverage-clinical-trials. The patients' only chance is to become part of a clinical trial, in which case they will receive the experimental drug for free. Unfortunately, the patients who really need the experimental drug are the ones that have the most advanced cancers, and they are usually rejected from clinical trials. Sadly enough, drug companies together with data scientists want to avoid patients that have no chance of surviving, because otherwise their study might fail. Even when money is not the issue, overall healthcare is. Around the world, people do not have access to the newest treatments available, and most people are treated by oncologists who treat multiple types of cancer.
What can be done? Find another oncologist. Get a second opinion before starting treatment. Find a trade-off between your limited time and the fear that you might receive the wrong treatment. We believe that we have reached the goal of this chapter. We shall see you in the next chapter, where we will be dealing with radiotherapy and Artificial Intelligence.

References

Ahuja, K., Rather, G.M., Lin, Z., Sui, J., Xie, P., Le, T., Bertino, J.R., Javanmard, M., 2019. Toward point of care assessment of patient response: a portable tool for rapidly assessing cancer drug efficacy using multi-frequency impedance cytometry and supervised machine learning. Microsyst. Nanoeng. 5, 34. https://doi.org/10.1038/s41378-019-0073-2.
Alsaab, H.O., Sau, S., Alzhrani, R., Tatiparti, K., Bhise, K., Kashaw, S.K., Iyer, A.K., 2017. PD-1 and PD-L1 checkpoint signaling inhibition for cancer immunotherapy: mechanism, combinations, and clinical outcome. Front. Pharmacol. https://doi.org/10.3389/fphar.2017.00561.
Antalis, T.M., Bugge, T.H., Wu, Q., 2011. Membrane-anchored serine proteases in health and disease. Prog. Mol. Biol. Transl. Sci. 99, 1–50. https://doi.org/10.1016/b978-0-12-385504-6.00001-4 (Chapter 1).
Barsevick, A., 2016. Defining the symptom cluster: how far have we come? Semin. Oncol. Nurs. 32, 334–350.
Bertino, J.R., Lin, S.Y., Lin, C.Y., 2010. Targeted delivery of doxorubicin conjugated with anti-matriptase antibody to treat multiple myeloma. In: Proceedings: AACR 101st Annual Meeting, Washington, DC.
Bhardwaj, R., Hooda, N., 2019. Prediction of pathological complete response after neoadjuvant chemotherapy for breast cancer using ensemble machine learning. Inform. Med. Unlocked 16. https://doi.org/10.1016/j.imu.2019.100219.
Bhavnani, S.K., Bellala, G., Ganesan, A., Krishna, R., Saxman, P., Scott, C., Silveira, M., Given, C., 2010. The nested structure of cancer symptoms: implications for analyzing co-occurrence and managing symptoms. Methods Inf. Med. 49 (6), 581–591.
Bollobas, B., 1982. Graph Theory, vols. 1 & 2. Elsevier.
Bourgeois, F.T., Shannon, M.W., Valim, C., Mandl, K.D., 2010. Adverse drug events in the outpatient setting: an 11-year national analysis. Pharmacoepidemiol. Drug Saf. 19, 901–910.
Bringmann, L.F., Lemmens, L.H., Huibers, M.J., Borsboom, D., Tuerlinckx, F., 2015. Revealing the dynamic network structure of the Beck depression inventory. Psychol. Med. 45, 747–757.
Burstein, H.J., Lacchetti, C., Anderson, H., Buchholz, T.A., Davidson, N.E., Gelmon, K.A., Giordano, S.H., Hudis, C.A., Solky, A.J., Stearns, V., Winer, E.P., Griggs, J.J., 2019. Adjuvant endocrine therapy for women with hormone receptor-positive breast cancer: ASCO clinical practice guideline focused update. J. Clin. Oncol. 37 (5), 423–438.
Chen, W.K., 1971. Applied Graph Theory. Elsevier.
Coulet, A., Shah, N.H., Wack, M., Chawki, M.B., Jay, N., Dumontier, M., 2018. Predicting the need for a reduced drug dose, at first prescription. Sci. Rep. 8, 15558.
DeWan, P.A., Inbar, O., Spina, C.S., Rudeen, K., Lagor, C., Walker, M.S., Stepanski, E.J., Nwankwo, J.O., Hyde, B., 2018. Artificial intelligence methods to predict chemotherapy-induced neutropenia in breast cancer patients. J. Clin. Oncol. 36, 6555.
Esther Kim, J.E., Dodd, M.J., Aouizerat, B.E., Jahan, T., Miaskowski, C., 2009. A review of the prevalence and impact of multiple symptoms in oncology patients. J. Pain Symptom Manage. 37 (4), 715–736. https://doi.org/10.1016/j.jpainsymman.2008.04.018.
Ferreira, A.R., Di Meglio, A., Pistili, B., Gbenou, A.S., El Mouhebb, M., Dauchy, S., Charles, C., Joly, F., Everhard, S., Lambertini, M., Coutant, C., Cottu, P., Lerebours, F., Petit, T., Dalenc, F., Rouanet, P., Arnaud, A., Martin, A., Berille, J., Ganz, P.A., Partridge, A.H., Delaloge, S., Michiels, S., Andre, F., Vaz Luis, I., 2019. Differential impact of endocrine therapy and chemotherapy on quality of life of breast cancer survivors: a prospective patient-reported outcomes analysis. Ann. Oncol. https://doi.org/10.1093/annonc/mdz298.
Frewen, P.A., Schmittmann, V.D., Bringmann, L.F., Borsboom, D., 2013. Perceived causal relations between anxiety, posttraumatic stress and depression: extension to moderation, mediation and network analysis. Eur. J. Psychotraumatol. 4. https://doi.org/10.3402/ejpt.v4i0.20656.
Fried, E.I., Epskamp, S., Nesse, R.M., Tuerlinckx, F., Borsboom, D., 2016. What are 'good' depression symptoms? Comparing the centrality of DSM and non-DSM symptoms of depression in network analysis. J. Affect. Disord. 189, 314–320.
Haykin, S., 1999. Neural Networks: A Comprehensive Foundation, second ed. Prentice Hall, Englewood Cliffs.
Jensen, P.B., Jensen, L.J., Brunak, S., 2012. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395–405.
Khorrami, M., Khunger, M., Zagoruas, A., Patil, P., Thawani, R., Bera, K., Rajiah, P., Fu, P., Velcheti, V., Madabhushi, A., 2019a. Combination of peri- and intratumoral radiomic features on baseline CT scans predicts response to chemotherapy in lung adenocarcinoma. Radiol. Artif. Intell. 1 (2), e180012. https://doi.org/10.1148/ryai.2019180012.
Khorrami, M., Prasanna, P., Gupta, A., Patil, P., Velu, P.D., Thawani, R., Corredor, G., Alilou, M., Bera, K., Fu, P., Feldman, M., Velcheti, V., Madabhushi, A., 2019b. Changes in CT radiomic features associated with lymphocyte distribution predict overall survival and response to immunotherapy in non-small cell lung cancer. Cancer Immunol. Res. https://doi.org/10.1158/2326-6066.CIR-19-0476.
Koller, D., Friedman, N., 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Kossakowski, J.J., Epskamp, S., Kieffer, J.M., van Borkulo, C.D., Rhemtulla, M., Borsboom, D., 2016. The application of a network approach to health-related quality of life (HRQoL): introducing a new method for assessing HRQoL in healthy adults and cancer patients. Qual. Life Res. 25 (4), 781–792.
Lin, S.Y., Bertino, J.R., Lin, C.Y., 2016. Targeting Tumor Cells With Chemotherapeutic Agent Conjugated to Matriptase Antibodies. Google Patents.
McCorckle, R., 1987. The measurement of symptom distress. Semin. Oncol. Nurs. 3, 248–256.
McCorckle, R., Young, K., 1978. Development of a symptom distress scale. Cancer Nurs. 1, 373–378.
Miaskowski, C., Dunn, L., Ritchie, C., Paul, S.M., Cooper, B., Aouizerat, B.E., Alexander, K., Skerman, H., Yates, P., 2015. Latent class analysis reveals distinct subgroups of patients based on symptom occurrence and demographic and clinical characteristics. J. Pain Symptom Manage. 50 (1), 28–37. https://doi.org/10.1016/j.jpainsymman.2014.12.011.
Miaskowski, C., Barsevick, A., Berger, A., Casagrande, R., Grady, P.A., Jacobsen, P., Kutner, J., Patrick, D., Zimmerman, L., Xiao, C., Matocha, M., Marden, S., 2017. Advancing symptom science through symptom cluster research: expert panel proceedings and recommendations. J. Natl. Cancer Inst. 109 (4). https://doi.org/10.1093/jnci/djw253.
Neuraz, A., Chouchana, L., Malamut, G., Le Beller, C., Roche, D., Beaune, P., Degoulet, P., Burgun, A., Loriot, M.A., Avillach, P., 2013. Phenome-wide association studies on quantitative trait: application to TPMT enzyme activity and thiopurine therapy in pharmacogenomics. PLoS Comput. Biol. 9 (12), e1003405.
Orman, G., Labatut, V., 2009. A comparison of community detection algorithms on artificial networks. In: International Conference on Discovery Science, pp. 242–256.
Papachristou, N., Barnaghi, P., Cooper, B.A., Hu, X., Maguire, R., Apostolidis, K., Armes, J., Conley, Y.P., Hammer, M., Katsaragakis, S., Kober, K.M., Levine, J.D., McCann, L., Patiraki, E., Paul, S.M., Ream, E., Wright, F., Miaskowski, C., 2018. Congruence between latent class and k-modes analyses in the identification of oncology patients with distinct symptom experiences. J. Pain Symptom Manage. 55 (2), 318–333.
Papachristou, N., Barnaghi, P., Cooper, B., Kober, K.M., Maguire, R., Steven, M., Hammer, M., Wright, F., Armes, J., Furlong, E.P., McCann, L., Conley, Y.P., Patiraky, E., Katsaragakis, S., Levine, J.D., Miaskowski, C., 2019. Network analysis of the multidimensional symptom experience of oncology. Sci. Rep. 9, 2258. https://doi.org/10.1038/s41598-018-36973-1.
Peyre, M., Cartalat-Carel, S., Meyronet, D., Ricard, D., Jouvet, A., Pallud, J., Mokhtari, K., Guyotat, J., Jouanneau, E., Sunyach, M.R., Frappaz, D., Honnorat, J., Ducray, F., 2010. Prolonged response without prolonged chemotherapy: a lesson from PCV chemotherapy in low-grade gliomas. Neuro Oncol. 12 (10), 1078–1082.
Portenoy, R.K., Thaler, H.T., Kornblith, A.B., McCarthy Lepore, J., Friedlander-Klar, H., Coyle, N., Smart-Curley, T., Kemeny, N., Norton, L., Hoskins, W., Scher, H., 1994a. Symptom prevalence, characteristics and distress in a cancer population. Qual. Life Res. 3 (3), 183–189.
Portenoy, R.K., Thaler, H.T., Kornblith, A.B., McCarthy Lepore, J., Friedlander-Klar, H., Kiyasu, E., Sobel, K., Coyle, N., Kemeny, N., Norton, L., Scher, H., 1994b. The Memorial Symptom Assessment Scale: an instrument for the evaluation of symptom prevalence, characteristics and distress. Eur. J. Cancer 30A (9), 1326–1336.
Read, R.C., 1972. Graph Theory and Computing. Elsevier.
Ricard, D., Kaloshi, G., Amiel-Benouaich, A., Lejeune, J., Marie, Y., Mandonnet, E., Kujas, M., Mokhtari, K., Tailliebert, S., Laigle-Donadey, F., Carpentier, A.F., Omuro, A., Capelle, L., Duffau, H., Cornu, P., Guillevin, R., Sanson, M., Hoang-Xuan, K., Delattre, J.Y., 2007. Dynamic history of low-grade gliomas before and after temozolomide treatment. Ann. Neurol. 61 (5), 484–490.
Robinson, J., 2014. Likert scale. In: Michalos, A.C. (Ed.), Encyclopedia of Quality of Life and Well-Being Research. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0735-5.
Steward, W.F., Shah, N.R., Selna, M.J., Paulus, R.A., Walker, J.M., 2007. Bridging the inferential gap: the electronic health record and clinical evidence. Health Aff. 26, 181–191.
Stoean, C., Stoean, R., 2014. Support Vector Machines and Evolutionary Algorithms for Classification: Single or Together? Springer.
Sun, R., Limkin, E.J., Vakalopoulou, M., Dercle, L., Champiat, S., Han, S.R., Verlingue, L., Brandao, D., Lancia, A., Ammari, S., Hollebecque, A., Scoazec, J.Y., Marabelle, A., Massard, C., Soria, J.C., Robert, C., Paragios, N., Deutsch, E., Ferte, C., 2018. A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study. Lancet Oncol. https://doi.org/10.1016/S1470-2045(18)30413-3.
Tadayyon, H., Gangeh, M., Sannach, L., Trudeau, M., Pritchard, K., Ghandi, S., Eisen, A., Look Hong, N., Holloway, C., Wright, F., Rakovitch, E., Vesprini, D., Tran, W.T., Curpen, B., Czarnota, G., 2019. A priori prediction of breast tumor response to chemotherapy using quantitative ultrasound imaging and artificial neural networks. Oncotarget 10 (39), 3910–3923. https://doi.org/10.18632/oncotarget.26996.
Tan, P.N., Steinbach, M., Kumar, V., 2005. Introduction to Data Mining. Addison Wesley.
Tatonetti, N.P., Ye, P.P., Daneshjou, R., Altman, R.B., 2012. Data-driven prediction of drug effects and interactions. Sci. Transl. Med. 4, 125–132.
U.S. Department of Health, 2014. https://health.gov/sites/default/files/2019-09/ADE-Action-Plan-508c.pdf.
Yang, Z., Algesheimer, R., Tessone, C.J., 2016. A comparative analysis of community detection algorithms on artificial networks. Sci. Rep. 6, 30750.
Yauney, G., Shah, P., 2018. Reinforcement learning with action-derived rewards for chemotherapy and clinical trial dosing regimen selection. In: Proceedings of Machine Learning Research, p. 85. https://static1.squarespace.com/static/59d5ac1780bd5ef9c396eda6/t/5b7373b44fa51a1e232dbadd/1534292925951/29.pdf.
Zaknich, A., 2003. Neural networks for intelligent signal processing. In: Series in Innovative Intelligence, vol. 4. World Scientific, Singapore.
Zou, J., Wang, E., 2018. eTumorRisk, an algorithm predicts cancer risk based on co-mutated gene networks in an individual's germline genome. bioRxiv. https://doi.org/10.1101/393090.

CHAPTER 6

Radiotherapist at work

6.1 Establishing a treatment plan

Radiotherapy, or radiation therapy, is another cancer treatment; it uses high doses of radiation in order to shrink tumors and destroy cancer cells. Almost everyone has experienced low doses of radiation when having X-rays of the chest, teeth, sinuses, or broken bones. Radiotherapy works by using high doses of radiation that damage the DNA of cancer cells. With their DNA damaged so much that it cannot be repaired, the cancer cells stop dividing and eventually die. The cancer cells cannot be killed in just one dose of radiotherapy; the treatment must be repeated for days or weeks before the DNA is damaged beyond repair. Even after the radiotherapy treatment ends, the cancer cells keep on dying, the process taking weeks or even months to complete. Radiotherapy uses high-energy waves or particles such as X-rays, gamma rays, protons, and electron beams.
We saw in Chapter 5 that chemotherapy typically affects the whole body, whereas radiotherapy is usually a local treatment. Right now there are two kinds of radiation therapy: internal and external beam. The doctor chooses the correct type after taking into consideration factors such as the type of cancer, the tumor's location in the body, the tumor's size, whether sensitive normal tissues are near the tumor, the patient's general health and medical history, and whether the patient is receiving other types of treatment. External beam radiotherapy is done with the help of a large, noisy machine that moves around the patient's body while sending radiation to a certain part of the body from multiple directions. External beam radiotherapy is local, radiating only the part of the body where the tumor is located; for example, a patient with brain cancer will receive radiation only to her/his head. Internal radiotherapy uses radioactive liquid or solid substances that are given in a vein or taken orally. You might think that by entering the patient's blood or digestive system, the radiation travels throughout the body, but in fact the radioactive substance mostly reaches the area of the tumor, having little effect on the rest of the patient's body. Internal radiation that uses liquid radioactive substances is called systemic radiation, and it makes the patient's body fluids (sweat, saliva, or urine) give off radiation for a while. The treatment
is administered through an IV line or injection. Internal radiation that uses solid radioactive substances is called brachytherapy: seeds, ribbons, or capsules containing a radiation source are placed in the body, in or near the tumor. With this type of radiation, the patient's body gives off radiation for a while.
Radiotherapy can cure, prevent, stop, or slow cancer. When the treatment is administered only to ease cancer symptoms, it belongs to the palliative treatment type. For symptoms such as breathing problems or loss of bladder or bowel control, doctors use external beam radiation, whereas for pain control systemic radiotherapy is used. A lot of cancers can be treated with radiotherapy. External beam radiation is used in many cancers, systemic radiation is used in thyroid cancer, advanced prostate cancer, or gastroenteropancreatic neuroendocrine tumors, and brachytherapy is used in breast, cervical, prostate, eye, or head and neck cancer.
Just like chemotherapy, radiotherapy can be given before, during, or after other cancer treatments have been administered. For instance, if radiotherapy is given before cancer surgery, the aim is to shrink the tumor, so the surgeons can remove it more easily. If radiotherapy is given during surgery, the aim is for the radiation to go straight to the tumor without passing through the patient's skin; this type of radiotherapy is also known as intraoperative radiation, and the sensitive normal tissues near the tumor can be protected. If radiotherapy is given after surgery, the aim is to kill any cancer cells that might have remained in the body (for instance, when a brain tumor could not be removed entirely because important portions of the brain would have been permanently affected).
A patient cannot receive more than a certain amount of radiation during her/his lifetime. If a person has received that amount, she/he will not be able to receive radiotherapy in that area ever again; another area of the body can be treated with radiotherapy only if there is a large enough distance between the two areas. Radiotherapy also has side effects, since it can damage healthy cells and thus cause other health problems or diseases, such as leukemia.
Radiotherapy has proved to be an effective treatment for cancer for over 100 years. Before 1895 there were few options for treating cancer, but things were about to change when Wilhelm Conrad Roentgen discovered X-rays (Roentgen, 1985). One year later, Emil Herman Grubbe treated a patient with breast cancer using X-rays (Gruebbe, 1933). While Grubbe was treating cancer with X-rays, Antoine Henri Becquerel started searching for natural sources of radiation. Marie Sklodowska-Curie and Pierre Curie discovered radium in 1898; three years later, the physiologic effects of radium were published (Becquerel and Curie, 1901). At first, skin cancers were the most treated type of cancer, due to the low tissue penetration of the radiation available. It was around 1910 when deeper cancers started to be treated, thanks to the high-energy device developed by Coolidge (Lawrence and Livingston, 1932). At that time, researchers were having a hard time finding the trade-off between treating the cancer and not killing the patient with radiation. By 1920, scientists had come to the conclusion that it is safer for a patient to receive radiotherapy in multiple sessions rather than in just one (Coutard, 1934).
The ionization chamber was introduced in 1932, making possible the measurement of the radiation dose in Roentgen units (Thoraeus, 1932). Between 1930 and 1950, brachytherapy started to be used widely in treating deep cancers. In the same period the supervoltage X-ray tubes were developed. They delivered
energy from 50 to 200 kV. The supervoltage X-ray tubes led to the development of electron beam therapy (Courant, 2008). Besides electron beam therapy, cobalt teletherapy, which produces high-energy gamma rays, was introduced, along with electron linacs that delivered megavoltage X-rays (Boone et al., 1977; Fry et al., 1948). In 1954 the first clinical use of the proton beam was reported. Still, it was only between 1970 and 1980 that the computer-assisted accelerator for protons was used to treat different tumors (Hall, 1994; Ying, 2001). The end of the 1990s brought computers into radiotherapy: the development of stereotactic radiotherapy and 3D conformal radiotherapy machines provided more efficient and safer treatments (Mohan, 1995).
Leaving history behind, it should be noted that over half of the patients suffering from cancer undergo radiation therapy. The treatment is planned individually for each patient: using the scans taken of the area that needs to be treated, the radiotherapist decides the exact dosage of radiation needed. The only question that remains is: how can AI help in this department?

6.2 Radiotherapy and Artificial Intelligence

Since doctors analyze the patients' scans in order to establish the right radiotherapy dose, it is clear that this could be done assisted by our AI friend. We already know that when establishing a diagnosis, the main question is whether there is a malignant tumor in the image. When it comes to tumor segmentation for radiotherapy purposes, things get quite difficult. The image must be analyzed voxel by voxel to discriminate cancer from healthy tissue, and also to determine the risk/benefit trade-off of where the radiation dose will be targeted. This process can be mastered only after years and years of training, and with a lot of intuition and experience, since the impact of a wrong output is a death sentence (Wang et al., 2015).
Experienced human resources are lacking at a global level (Grover et al., 2015). The numbers look awful concerning Africa: on this continent radiation therapy resources are the fewest. And this is not the most frightening fact about radiotherapy in Africa. The most frightening is that 60% of the machines and human resources are in Egypt and South Africa, meaning that 29 countries lack any radiation therapy resources. Globally, more than 90% of patients in low-income countries have no access to radiotherapy (Abdel-Wahab et al., 2017). Let us take lung cancer as an example: 58% of cases occur in low- and middle-income countries (Ferlay et al., 2015), where the estimated need for human resources is 23,952 radiation oncologists by 2020; in 2012 there were only 11,803 (Datta et al., 2014). A solution to the human resource crisis would be the use of AI.
In this respect, the researchers from the Cleveland Clinic developed a deep learning AI system that uses CT scans and electronic health records in order to tailor the right dose of radiation for each patient (Lou et al., 2019). Their Deep Profiler reduces the probability of treatment failure to less than 5%, due to the fact that it can generate a personalized radiation dose plan. The AI system was trained on a data set that contained patients with primary or recurrent lung cancer, as well as patients who had other cancer types with solitary or oligometastases to the lung. All the patients had CT scans. Deep Profiler is a multi-task neural network that also has radiomics embedded in the training. Using the CT scans of the lungs, it is able to
generate a fingerprint image that can predict the outcome and also estimate radiomic features. Radiomics allows the extraction of features that describe the tumor more objectively than a human possibly could: data characterization algorithms are used to convert the image data into a high-dimensional feature space (Lambin et al., 2012, 2017). Some of the characteristics extracted through radiomics have been shown to describe distinct tumor features and to be usable in computing the prognosis (Aerts et al., 2014). Local tumor control and the avoidance of perioperative and long-term surgical morbidity can be achieved through high doses of radiation delivered via stereotactic body radiotherapy. Successful cases of local tumor control in inoperable patients suffering from lung cancer were found in several prospective clinical trials (Timmerman et al., 2010, 2014; Videtic et al., 2015). Even so, things are not as bright as they seem, since recent studies pointed out very high rates of local failure in some patients (Baine et al., 2018; Horner Rieber et al., 2017; Woody et al., 2017).
All 944 patients included in the study had been treated with lung stereotactic body radiotherapy and had electronic health records and CT scans. The treatment was chosen based on pathological or radiographical findings. Chest CT scans were used for primary lung cancer staging, and PET, MRI, or CT scans of the brain were taken when clinically indicated. The Deep Profiler has three parts: an encoder that extracts the features of the images and builds the fingerprint, a decoder for the handcrafted radiomic features, and a network that generates image signatures for the outcome prediction. The encoder is a 3D convolutional neural network. The network's performance was assessed using fivefold cross-validation, with 20% of the data set held out as testing data. After the Deep Profiler generated the image signature, this signature was merged with clinical variables in order to approximate the iGray, an individualized radiation dose, and to predict failures. The 365 3D handcrafted radiomic features extracted from the tumor fall into four groups: intensity (the statistical distribution of the voxel intensities), geometry (3D shape features of the tumor), texture (the spatial distribution of the voxel intensities, that is, the intratumoral heterogeneity), and wavelet features. The researchers used a fivefold cross-validation experiment and feature selection. The feature selection was performed by computing the performance of each individual feature using the concordance index, afterwards selecting the best features from each of the four feature groups. The concordance index, or C-index, is a measurement that takes values between 0 and 1; it indicates how well the model can order the event times, 1 indicating perfect concordance and 0.5 indicating ordering by chance. The authors used the Wilcoxon signed rank test to compare the distributions of the C-index obtained by different models. The performance of the handcrafted features was also assessed through Ridge (L2) regularization on the regression coefficients. The complementary effect of the image score was assessed using the biologically effective dose and histological subtypes. The biologically effective dose is a continuous variable, whereas the histological subtypes, adenocarcinoma and squamous cell carcinoma, are coded as categorical data.
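For readers unfamiliar with it, the following short sketch computes Harrell's concordance index exactly as defined above: the fraction of comparable patient pairs whose predicted risk ordering agrees with their observed event-time ordering. The survival times, event indicators, and risk scores are invented for illustration.

```python
# Minimal Harrell's C-index on made-up data (production code would use a
# library such as lifelines or scikit-survival).
def concordance_index(times, events, risk_scores):
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable if patient i's event precedes time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0          # risk ordering matches outcome
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5          # ties count as half
    return concordant / comparable

times  = [5, 8, 12, 20, 25]         # months to local failure or censoring
events = [1, 1, 0, 1, 0]            # 1 = failure observed, 0 = censored
scores = [0.9, 0.7, 0.4, 0.5, 0.1]  # model risk scores (higher = riskier)
print(concordance_index(times, events, scores))  # 1.0 = perfect, 0.5 = chance
```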
Fine and Gray regression modeling was used to examine the effect of factors on local failure while accounting for the competing risk of death. The radiation dose was obtained by using a multivariable regression model with the score obtained from the Deep Profiler and the biologically effective dose. The statistical analysis contained a risk assessment in order to estimate the cumulative incidence of local failure. Because the death of a patient censors the primary outcome, the Kaplan-Meier method was not used for estimating the
therapy failure. Death is dependent on therapy failure, thus in cases where no evidence was found to explain the death of a patient, the death was treated as a competing event. The authors computed the median score of the training set, then applied it as a threshold to stratify patients into low and high risk groups; each patient was classified into one of the two classes. For each group the cumulative incidence curves were estimated, together with Gray's test, which determined the significance of the difference between them. The cumulative incidence of local failure at 3 years was 13.5%. Of the 849 patients of the internal study cohort, 55% were stratified into the high risk group and 45% into the low risk group. The significance level obtained after performing Gray's test was p < 0.0001. Radiotherapy proved to be more efficient in the patients belonging to the low risk group, the 3-year cumulative incidence of local failure being 5.7% in the low risk class versus 20.3% in the high risk class. The AI system predicted treatment failures with a C-index of 0.72, a score significantly better than classical radiomics (p < 0.0001) or 3D volume (p < 0.0001). No other AI system similar to Deep Profiler can achieve what it can: this AI system can determine how much prior knowledge is needed to be able to predict whether the radiotherapy treatment will fail or not.
Another study that also regards lung tumors is the one of Mak et al. (2019). The authors found that a crowd innovation contest could produce AI methods able to segment lung tumors for radiotherapy targeting. The contest lasted for 10 weeks, had three phases, was prize-based and online, and its prizes totaled $55,000. The data set consisted of CT scans and lung tumor segmentations from 461 patients, with a median of 157 images per scan, totaling 77,942 images, of which 8133 contained tumor. All the teams received a training set of 229 CT scans together with expert contours. A single expert, with 4 years of radiotherapy specialty and 7 years of subspecialty experience in lung cancer, segmented the tumors. The training data set contained 229 cases, the validation data set 96 cases, and the test data set 136 cases. The tumor volume ranged from 0.28 to 1103.74 cm3, with a median of 16.40 cm3. A total of 564 contestants from 62 countries entered the competition, but only 34 submitted their methods.
Traditionally, a radiation oncologist does the tumor segmentation manually. Two problems appear in this process. The first is that manual segmentation takes a lot of time; the second is that there is significant interobserver variation among experts. Clinical outcomes are directly influenced by the quality of the segmentation (Cui et al., 2015; Eaton et al., 2016; Peters et al., 2010; Van de Steene et al., 2002; Wuthrick et al., 2015). During one clinical trial, deviations from the original radiation therapy plan occurred in 71% of patients and were associated with treatment failure and increased mortality (Ohri et al., 2013).
The teams had access to the training set, which contained CT scans, expert lung tumor and organ segmentations, plus clinical data, and to the validation set, which contained CT scans without segmentations. After producing their segmentations, the contestants received real-time evaluation; having this feedback, they could modify their algorithms to obtain a better performance. The final AI algorithms were submitted at the end of each of the three phases.
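The contest's exact scoring formula is not reproduced here, but the core of any evaluation of this kind is a volumetric overlap measure between a submitted mask and the expert contour. The sketch below computes the widely used Dice coefficient on synthetic 3D masks; it is shown as a representative metric under that assumption, not as the contest's official score.

```python
# Dice overlap between two binary 3D segmentation masks (synthetic example).
import numpy as np

def dice(pred, truth):
    """2 * |A ∩ B| / (|A| + |B|); 1.0 means perfect agreement."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom else 1.0

truth = np.zeros((64, 64, 64), dtype=bool)
truth[20:40, 20:40, 20:40] = True          # "expert" tumor contour
pred = np.zeros_like(truth)
pred[22:42, 20:40, 20:40] = True           # slightly shifted submission
print(f"Dice = {dice(pred, truth):.3f}")   # prints 0.900 for this example
```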
In the first phase of the contest, the goal was to locate the tumor and match the expert's segmentations. The review of the performances obtained in phase 1 showed that tumor localization was a limiting factor, thus phase 2 aimed at tumor targeting, not tumor diagnosis. An ensemble model that combined the segmentations of the top five AI algorithms from phase 2 entered phase 3. The aim of phase 3, a collaborative phase, was to resolve the problems found in phase 2. The top contestants used convolutional neural networks,
clustering, and random forests. The convolutional neural networks applied were OverFeat (Sermanet et al., 2014) for detection and localization, and SegNet (Kendall et al., 2015; Badrinarayanan et al., 2015, 2017) and U-Net (Ronneberger et al., 2015) for segmentation. The collaborative phase 3 algorithms produced a segmentation in 15 s to 2 min per scan. A comparison between the contest algorithms, human experts, and commercially available AI software was performed. The reported results show that the phase 2 algorithms perform better than the commercial ones. The ensemble phase 3 algorithms achieved scores comparable to the interobserver mean between the expert used in the study and 5 other radiation oncologists. Still, the performance of the ensemble did not exceed the intraobserver benchmark.
Competitions regarding AI in radiotherapy are always open. For example, the Institute of Cancer Research was looking for a PhD candidate to monitor heterogeneous radiotherapy response in soft tissue sarcoma. Around 3300 new cases of soft tissue sarcoma are diagnosed each year, and only 50% of patients survive 5 years or more. Soft tissue sarcoma is a rare cancer that begins in the tissues that support, surround, and connect other body structures; basically, it develops in muscle, fat, blood vessels, nerves, tendons, and joint linings. Surgery is the first line of treatment, but radiation and chemotherapy might be recommended depending on the tumor's location, size, type, and aggressiveness. A soft tissue tumor is considered heterogeneous if it consists of different components such as fat, cystic, cellular, and hemorrhagic tissues. Analyzing conventional imaging to determine the treatment response of patients undergoing external beam radiotherapy is difficult: the tumors sometimes do not change in size, and sometimes they appear to have grown more. So if you are a PhD in search of a job similar to this one, do not hesitate to browse the following site: https://findaphd.com (Accessed November 29, 2019).
Quantum annealing, a meta-heuristic similar to simulated annealing, can also be applied in radiotherapy. Such an algorithm has been applied to optimizing intensity modulated radiotherapy (Nazareth and Spaans, 2015). Intensity modulated radiotherapy allows the radiation dose to conform to the target volume; local control and the reduction of long-term morbidity are its main advantages (Taylor and Powell, 2004). Nazareth and Spaans applied quantum annealing to two prostate cancer cases. The objective function was defined taking into account clinical constraints and dose-volume objectives. The quantum annealing's performance was compared with the performances obtained by simulated annealing and by Tabu search run on a clustering algorithm. Even though quantum annealing was 3-4 times faster than the other two, it could not surpass the simulated annealing method in terms of solution quality.
An AI system was able to predict two side effects that occur in patients undergoing radiation therapy for head and neck cancer. Around 53,000 patients are diagnosed with head and neck cancers in the United States per year. If the cancer is detected early, the treatment consists of surgery and radiation therapy; discovered at a later stage, the cancer is treated with radiotherapy and chemotherapy. Radiation is effective, but it produces side effects such as mouth sores, sore throat, loss of taste, etc.
These side effects lead to significant weight loss and ultimately the need for the patient to have a feeding tube placed. Predicting them might make it possible to prevent the weight loss and decrease the need for feeding tube placement. The study was presented at the 61st Annual Meeting of the American Society for
Radiation Oncology, which took place in Chicago, September 15–18, 2019—https://www.astro.org/News-and-Publications/News-and-Media-Center/Press-Kits/2019/2019-ASTRO-Annual-Meeting-Press-Kit. The novel AI system reviewed a large data set that combined data from electronic health records, a web-based charting tool, and the record system. The patients were 75% men and 25% women, with a median age of 62 years, and they received over 2000 courses of radiation therapy. The data set contained over 700 clinical and treatment variables. The authors tried to predict three side effects: weight loss, feeding tube placement, and unplanned hospitalizations. All the models that surpassed an AUC of 0.70 were considered valid. The two most encountered side effects were predicted with an AUC of 0.751 for significant weight loss and an AUC of 0.755 for feeding tube placement, whereas the AUC for unplanned hospitalization was 0.64.
At the University of Toronto, a new AI system has been developed using a knowledge-based automated planning pipeline that predicts 3D doses using deep neural networks (Babier et al., 2019). The pipeline consists of a generative adversarial network that predicts the dose from the CT scan, followed by two optimization models that are used to learn objective function weights and to generate fluence-based plans. Three generative adversarial networks were trained on 130 oropharyngeal treatment plans; the testing was done on 87 plans. The comparison between the plans generated by the pipeline and the clinical plans showed that the pipeline satisfied almost 77% of all clinical criteria, whereas the clinical plans satisfied 64%.
In the last year there has been increasing interest in AI applied to radiotherapy. Besides the growing academic interest, commercial AI has boomed as well. For instance, we have Google's DeepMind—https://deepmind.com/blog/announcements/applying-machine-learning-radiotherapy-planning-head-neck-cancer (Accessed November 30, 2019)—a Google partnership with the Radiotherapy Department at University College London Hospitals NHS Foundation Trust. DeepMind focuses on applying AI to radiotherapy planning for head and neck cancer. Their aim is to reduce the side effects of radiotherapy, a treatment that has indeed improved survival rates; the disadvantage of this therapy is that it is applied in an area with lots of delicate structures and may cause damage to nerves and organs. Cancer located in the sinuses is hard to treat, and the aim of AI in this case is to reduce the treatment planning time.
Another commercial AI is Siris Medical's QuickMatch (Valdes et al., 2017), a commercial software product that can export, anonymize, and process data for an AI system that allows doctors to identify which previously approved treatment would bring the most benefit to a new patient. The data the AI system was trained on were collected from 2009 to 2015 from the University of Pennsylvania Health System. The data set contained 104 consecutive early stage cancer patients who received stereotactic body radiation therapy as treatment. Ultimately, the AI system's goal in this case is to classify treatment plans by matching a new patient with treated patients that had similar features. The feature set used to predict the radiation dose consisted of: anatomical data (e.g. volume, distance, geometric relationship, surrounding structures and their importance), medical record data (e.g.
ICD-9/10 code, gender, ethnicity), treatment intent (fractionation schedule, treatment margin, number of beams, and also the clinicians who designed the treatment plan), and radiation transport (penumbra, aperture, beam energy, incident angle, radiation type, depth
of structure, bolus existence). The treatment plan classification was done using learning curve analysis (Friedman et al., 2001).

Mirada's DLCExpert (Aljabar and Gooding, 2001)—https://mirada-medical.com/radiationoncology/dlc-expert (Accessed November 30, 2019)—is a deep learning contour expert for radiotherapy. At the annual meeting of the American Society for Radiation Oncology (ASTRO), held in 2018 in San Francisco, Mirada Medical showcased DLCExpert. The software received FDA clearance and is the first AI autocontouring solution in the world; it brings quality and consistency to radiation treatment planning.

Varian—https://varian.com (Accessed November 30, 2019)—started to invest in Oncora Medical's Precision Radiation Oncology Platform—https://oncoramedical.com (Accessed November 30, 2019)—in order to accelerate the development of precision medicine in the radiotherapy field. Oncora's AI system can encode knowledge regarding cancer patients, their treatments, and their clinical outcomes in computational models that can accurately predict the response of a new patient to the oncological treatment. The AI software is used at 13 sites of care in the United States. Oncora partnered with MD Anderson Cancer Center oncologists in order to develop the software. Phase 1 of the study focused on data regarding breast cancer patients collected from 2000 onwards. The Precision Radiation Oncology Platform analyzed data from MD Anderson's electronic medical record system, tumor registry, and radiation therapy planning system, as well as Brocade, a clinical documentation tool for medical records developed by Dr. Benjamin Smith, an Associate Professor of Radiation Oncology at MD Anderson. The task is to create interoperability between the two software systems and see what the potential value of the merged product would be.

To sum things up, the clinical decision support offered by AI in Radiation Oncology has at least three main applications:
• pre-planning prediction of dosimetric tradeoffs between the efficiency of the treatment and its side effects (Cheng et al., 2016; Langendijk et al., 2013; Hall et al., 2017);
• integration of dosimetric data with orthogonal data (e.g. imaging, electronic medical records, genomics, etc.) (Bradley et al., 2004; Hope et al., 2006; Naqa et al., 2006; Naqa et al., 2010; Oberije et al., 2014; Valdes et al., 2017; Oncora Medical);
• radiomics (Huynh et al., 2016).

It is obvious that developing AI solutions for radiation therapy will increase productivity, thus allowing doctors to spend more time with their patients. Besides this, AI can help overcome the worldwide workforce crisis in the oncological field. The largest charity investment in radiotherapy research is £56 million. Cancer Research UK leads the research network, which will explore how AI can improve radiation treatment therapies. Seven centers of excellence are united through this network: the Universities of Cambridge, Glasgow, Leeds, Manchester, and Oxford, the Cancer Research UK City of London Centre and The Institute of Cancer Research, London, as well as the Royal Marsden NHS Foundation Trust. The network will study new radiotherapy plus drug combinations and personalized treatment through radiation. Besides this, the data scientists will also pay attention to reducing the long-term side effects of the treatment.

Not everybody is thrilled about the role of AI in radiotherapy.
Guidelines regarding errors that arise from AI usage have already been published (Royal College of Radiologists, 2019; Jaremko et al., 2019). Because AI needs big data in order to learn how to detect, diagnose, and predict, there is a potential threat of patient privacy and confidentiality breaches (Winfield and Jirotka, 2018). Jaremko et al. summarize the key ethical and legal issues regarding AI use in radiology, focusing on imaging data and electronic medical records. In (Bridge and Bridge, 2019) the authors state that using AI in radiotherapy has the potential to speed up the workflow and make the workforce more efficient, but they also point out the danger that these successes could be achieved at the cost of creativity, innovation, and safety. Their major concern is that AI lacks empathy and intuition. Their theory is sustained by Karches (2018).

Another concern regarding AI in cancer was published in the New England Journal of Medicine (Char et al., 2018). Char et al. cite Casalino et al. (2016) and state that researchers predict that familiarity with machine learning tools will become a requirement for the next generation of doctors. Their point is that AI raises ethical concerns, and that algorithms used in other fields have been proven to make problematic decisions. AI algorithms can become biased because of the cases they learn from in their data sets. Such an example is the well-known situation when judges used an AI system that was supposed to predict an offender's risk of recidivism in order to decide which sentence to give. Given the data set it had been trained on, the AI showed an alarming tendency toward racially biased predictions (Penso, 2017). We will politely state that this is a false alarm; an AI system cannot be labeled as racist just because it constructed some patterns based on the data set it had been trained on. The authors also state that there exists a possibility for an AI algorithm to be biased even when it comes to health care: health care delivery varies with race, and an AI might become biased if there are few or no genetic studies in certain populations. In order to prove their hypothesis, they present the case where the use of the Framingham Heart Study for predicting cardiovascular event risk in nonwhite populations led to biased results, with the risk both over- and underestimated (Meyer et al., 2012).

Let us take a minute and think things through. Is AI really racist? For instance, to compute the expected weight of a fetus, you can use multiple linear regression and compute the weight from the head circumference, the thoracic circumference, and the femur's length. If the regression coefficients and intercept were computed on a nonwhite data set, and the model wrongly estimates the weight of a Caucasian baby, does that mean it is racist? Or simply that we should have used another data set? We agree that ethical guidelines need to be created. We strongly believe that physicians must learn to understand AI algorithms, otherwise we would not have written this book, right? The only logical solution is for the doctors to partner up with computer scientists, data scientists, and statisticians and truly start learning. Learn about an AI algorithm's construction, learn about the data sets that the AI is using, and learn about its limitations. Paranoia rises from lack of knowledge. If we continue to believe that neural networks are black boxes, then all of this will lead to disaster.
Once and for all we need to understand that AI is not magic; far from it, actually, it is simply math. AI is not snake oil. It is math. One reasonable question arises from the skeptics: if things go wrong, who will take the blame? Or will it be decided that it was negligent behavior? Who is responsible: the medical staff that approved the radiotherapy treatment, the computer scientists that developed the algorithms, or the statistician that assessed and validated the model? All these scientists do have a point, but it is our strong belief that the hypothesis that the iDoctor will replace the
human doctor is wrong. We state again that AI is our friend, not our foe, and the only solution is to stop being ignorant and learn the math. With this thought in mind, we shall proceed further in our book. Chapter 7 brings new theoretical concepts regarding survival analysis. Enjoy the ride!

References

Abdel-Wahab, M., Zubizarreta, E., Polo, A., Meghzifene, A., 2017. Improving quality and access to radiation therapy—an IAEA perspective. Semin. Radiat. Oncol. 27 (2), 109–117.
Aerts, H.J., Velazquez, E.R., Leijenaar, R.T., Parmar, C., Grossmann, P., Carvalho, S., Bussink, J., Monshouwer, R., Haibe Kains, B., Rietveld, D., Hoebers, F., Reitbergen, M.M., Leemans, C.R., Dekker, A., Quackenbush, J., Gillies, R.J., Lambin, P., 2014. Decoding tumor phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 5, 4006. https://doi.org/10.1038/ncomms5006.
Aljabar, P., Gooding, M.J., 2001. The cutting edge: delineating contours with deep learning. Mach. Learn. https://pdfs.semanticscholar.org/53df/c19e71c8062ebe5101c975f139fbea654246.pdf.
Babier, A., Mahmood, R., McNiven, A.L., Diamant, A., Chan, T.C.Y., 2019. Knowledge-based automated planning with three-dimensional generative adversarial networks. Med. Phys. arxiv.org/abs/1812.09309.
Badrinarayanan, V., Handa, A., Cipolla, R., 2015. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling. https://arxiv.org/abs/1505.07293.
Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39 (12), 2481–2495.
Baine, M.J., Verma, V., Schonewolf, C.A., Lin, C., Simone, C.B., 2018. Histology significantly affects recurrence and survival following SBRT for early stage non-small cell lung cancer. Lung Cancer 118, 20–26.
Becquerel, A.H., Curie, P., 1901. Action physiologique des rayons de radium. C. R. Acad. Sci. 132, 1289–1291.
Boone, M.L.M., Lawrence, J.H., Connor, W.G., Morgado, R., Hicks, J.A., Brown, R.C., 1977. Introduction to the use of protons and heavy ions in radiation therapy: historical perspective. Int. J. Radiat. Oncol. Biol. Phys. 3, 65–69.
Bradley, J., Deasy, J.O., Bentzen, S., El Naqa, I., 2004. Dosimetric correlates for acute esophagitis in patients treated with radiotherapy for lung carcinoma. Int. J. Radiat. Oncol. Biol. Phys. 58 (4), 1106–1113.
Bridge, P., Bridge, R., 2019. Artificial intelligence in radiotherapy: a philosophical perspective. J. Med. Imaging Radiat. Sci. https://doi.org/10.1016/j.jmir.2019.09.003.
Casalino, L.P., Gans, D., Weber, R., Cea, M., Tuchovsky, A., Bishop, T.F., Miranda, Y., Frankel, B.A., Ziehler, K.B., Wong, M.M., Evenson, T.B., 2016. US physician practices spend more than $15.4 billion annually to report quality measures. Health Aff. (Millwood) 35 (3), 401–406.
Char, D.S., Shah, N.H., Magnus, D., 2018. Implementing machine learning in health care—addressing ethical challenges. N. Engl. J. Med. 378, 981–983.
Cheng, Q., Roelofs, E., Ramaekers, B.L., Eekers, D., van Soest, J., Lustberg, T., Hendriks, T., Heobers, F., van der Laan, H.P., Korevaar, E.W., Dekker, A., Langendijk, J.A., Lambin, P., 2016. Development and evaluation of an online three-level proton vs photon decision support prototype for head and neck cancer—comparison of dose, toxicity and cost-effectiveness. Radiother. Oncol. 118 (2), 281–285.
Courant, E.D., 2008. Early milestones in the evolution of accelerators. In: Chao, A.W. (Ed.), Reviews of Accelerator Science and Technology. vol. 1. World Scientific, Singapore, pp. 1–5. https://doi.org/10.1142/s179362808000022.
Coutard, H., 1934. Principles of X-ray therapy for malignant disease. Lancet 2, 1–12. https://doi.org/10.1016/S0140-6736(00)90085-0.
Cui, Y., Chen, W., Kong, F.M., Olsen, L.A., Beatty, R.E., Maxim, P.G., Ritter, T., Sohn, J.W., Higgins, J., Galvin, J., Xiao, Y., 2015. Contouring variations and the role of atlas in non-small cell lung cancer radiotherapy: analysis of a multi-institutional pre-clinical trial planning study. Pract. Radiat. Oncol. 5 (2), 67–75.
Datta, N.R., Samiei, M., Bodis, S., 2014. Radiation therapy infrastructure and human resources in low- and middle-income countries: present status and projections for 2020. Int. J. Radiat. Oncol. Biol. Phys. 89 (3), 448–457.
Eaton, B.R., Pugh, S.L., Bradley, J.D., Masters, G., Kavadi, V.S., Narayan, S., Nedzi, L., Robinson, C., Wynn, R.B., Koprowski, C., Johnson, D.W., Meng, J., Curran Jr., W.J., 2016. Institutional enrollment and survival among NSCLC patients receiving chemoradiation: NRG oncology radiation therapy oncology group (RTOG) 0617. J. Natl. Cancer Inst. 108 (9). https://doi.org/10.1093/jnci/djw034.
Ferlay, J., Soerjomataram, I., Dikshit, R., Eser, S., Mathers, C., Rebelo, M., Parkin, D.M., Forman, D., Bray, F., 2015. Cancer incidence and mortality worldwide: sources, methods, and major patterns in GLOBOCAN 2012. Int. J. Cancer 136 (5), E359–E386.
Friedman, J., Hastie, T., Tibshirani, R., 2001. The Elements of Statistical Learning. Springer Series in Statistics, Springer, Berlin.
Fry, D.W., Harvie, R.B., Mullett, L.B., Walkinshaw, W., 1948. A travelling wave linear accelerator for 4 MeV electrons. Nature 162, 859–861.
Grover, S., Xu, M.J., Yeager, A., Rosman, L., Groen, R.S., Chackungal, S., Rodin, D., Mangaali, M., Nurkic, S., Fernandes, A., Lin, L.L., Thomas, G., Tergas, A.I., 2015. A systematic review of radiotherapy capacity in low- and middle-income countries. Front. Oncol. 4, 380. https://doi.org/10.3389/fonc.2014.00380.
Gruebbe, E.H., 1933. Priority in the therapeutic use of X-rays. Radiology 21, 156–162. https://doi.org/10.1148/21.2.156.
Hall, E.J., 1994. The physics and chemistry of radiation absorption. In: Radiobiology for the Radiologists, fourth ed. JB Lippincott, Philadelphia, pp. 8–10.
Hall, D.C., Trofimov, A.V., Winey, B.A., Liebsch, N.J., Paganetti, H., 2017. Predicting patient-specific dosimetric benefits of proton therapy for skull base tumors using a geometric knowledge-based method. Int. J. Radiat. Oncol. Biol. Phys. 97 (5), 1087–1094.
Hope, A.J., Lindsay, P.E., El Naqa, I., Alaly, J.R., Vicic, M., Bradley, J.D., Deasy, J.O., 2006. Modeling radiation pneumonitis risk with clinical, dosimetric, and spatial parameters. Int. J. Radiat. Oncol. Biol. Phys. 65 (1), 112–124.
Horner Rieber, J., Bernhardt, D., Dern, J., Konig, L., Adeberg, S., Paul, A., Heussel, C.P., Kappes, J., Hoffmann, H., Herth, F.J.P., Debus, J., Warth, A., Rieken, S., 2017. Histology of non-small cell lung cancer predicts the response to stereotactic body radiotherapy. Radiother. Oncol. 125 (2), 317–324.
Huynh, E., Coroller, T.P., Narayan, V., Agrawal, V., Hou, Y., Romano, J., Franco, I., Mak, R.H., Aerts, H.J., 2016. CT-based radiomic analysis of stereotactic body radiation therapy patients with lung cancer. Radiother. Oncol. 120 (2), 258–266.
Jaremko, J.L., Azar, M., Bromwich, R., Lum, A., Cheong, K.H.A., Gibert, M., Laviolette, F., Gray, B., Reinhold, C., Cicero, M., Chong, J., Shaw, J., Rybicki, F., Hurell, C., Lee, E., Tang, A., 2019. Canadian Association of Radiologists white paper on ethical and legal issues related to artificial intelligence. Can. Assoc. Radiol. J. 70 (2), 107–118. https://doi.org/10.1016/j.carj.2019.03.001.
Karches, K.E., 2018. Against the iDoctor: why artificial intelligence should not replace physician judgment. Theor. Med. Bioeth. 39, 91–110.
Kendall, A., Badrinarayanan, V., Cipolla, R., 2015. Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding. https://arxiv.org/abs/1511.02680.
Lambin, P., Rios Velazquez, E., Leijenaar, R., Carvalho, S., van Stiphout, R.G., Granton, P., Zegers, C.M., Gillies, R., Boellard, R., Dekker, A., Aerts, H.J., 2012. Radiomics: extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 48 (4), 441–446.
Lambin, P., Leijenaar, R.T.H., Deist, T.M., Peerlings, J., de Jong, E.E.C., van Timmeren, J., Sanduleanu, S., Larue, R.T.H.M., Even, A.J.G., Jochems, A., van Wijk, T., Woodruff, H., van Soest, J., Lustberg, T., Roelofs, E., van Elmpt, W., Dekker, A., Mottaghy, F.M., Wildberger, J.E., Walsh, S., 2017. Radiomics: the bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol. 14 (12), 749–762.
Langendijk, J.A., Lambin, P., De Ruysscher, D., Widder, J., Bos, M., Verheij, M., 2013. Selection of patients for radiotherapy with protons aiming at reduction of side effects: the model-based approach. Radiother. Oncol. 107 (3), 267–273.
Lawrence, E.O., Livingston, M.S., 1932. The production of high speed light ions without the use of high voltages. Phys. Rev. 40, 19–35. https://doi.org/10.1103/PhysRev.40.19.
Lou, B., Doken, S., Zhuang, T., Wintergerter, D., Gidwani, M., Mistry, N., Ladic, L., Kamen, A., Abazeed, M.E., 2019. An image-based deep learning framework for individualizing radiotherapy dose: a retrospective analysis of outcome prediction. Lancet Digit. Health 1 (3), 136–147.
Mak, R.H., Endres, M.G., Paik, J.H., Sergeev, R.A., Aerts, H., Williams, C.L., Lakhani, K.R., Guinan, E.C., 2019. Use of crowd innovation to develop an artificial intelligence-based solution for radiation therapy targeting. JAMA Oncol. 5 (5), 654–661. https://doi.org/10.1001/jamaoncol.2019.0159.
Meyer, G.S., Nelson, E.C., Pryor, D.B., James, B., Swensen, S.J., Kaplan, G.S., Weissberg, J.I., Bisognano, M., Yates, G.R., Hunt, G.C., 2012. More quality measures versus measuring what matters: a call for balance and parsimony. BMJ Qual. Saf. 21 (11), 964–968.
Mohan, R., 1995. Field shaping for three-dimensional conformal radiation therapy and multileaf collimation. Semin. Radiat. Oncol. 5, 86–99.
Naqa, I.E., Bradley, J., Blanco, A.I., Lindsay, P.E., Vicic, M., Hope, A., Deasy, J.O., 2006. Multivariable modeling of radiotherapy outcomes, including dose-volume and clinical factors. Int. J. Radiat. Oncol. Biol. Phys. 64 (4), 1275–1286.
Naqa, I.E., Deasy, J.O., Mu, Y., Huang, E., Hope, A.J., Lindsay, P.E., Apte, A., Alalt, J., Bradley, J.D., 2010. Data mining approaches for modeling tumor control probability. Acta Oncol. 49 (8), 1363–1373.
Nazareth, D.P., Spaans, J.D., 2015. First application of quantum annealing to IMRT beamlet intensity optimization. Phys. Med. Biol. 60 (10), 4137–4148.
Oberije, C., Nalbantov, G., Dekker, A., Boersma, L., Borger, J., Reymen, B., van Baardwijk, A., Wanders, R., De Ruysscher, D., Steyerberg, E., Dingemans, A.M., Lambin, P., 2014. A prospective study comparing the predictions of doctors versus models for treatment outcome of lung cancer patients: a step toward individualized care and shared decision making. Radiother. Oncol. 112 (1), 37–42.
Ohri, N., Shen, X., Dicker, A.P., Doyle, L.A., Harrison, A.S., Showalter, T.N., 2013. Radiotherapy protocol deviations and clinical outcomes: a meta-analysis of cooperative group clinical trials. J. Natl. Cancer Inst. 105 (6), 387–393. https://doi.org/10.1093/jnci/djt001.
Penso, J., 2017. A health care paradox: measuring and reporting quality has become a barrier to improving it. STAT News. https://www.statenews.com/2017/12/13/health-care-quality/.
Peters, L.J., O'Sullivan, B., Giralt, J., Fitzgerald, T.J., Trotti, A., Bernier, J., Bourhis, J., Yuen, K., Fisher, R., Rischin, D., 2010. Critical impact of radiotherapy protocol compliance and quality in the treatment of advanced head and neck cancer: results from TROG 02.02. J. Clin. Oncol. 28 (18), 2996–3001.
Roentgen, W.C., 1895. Über eine neue Art von Strahlen. Vorläufige Mitteilung. 30, Sitzung: Sitzungsberichte der physikalisch-medicinischen Gesellschaft zu Würzburg, pp. 132–141.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (Eds.), Medical Image Computing and Computer Assisted Intervention—MICCAI 2015, 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Springer International Publishing.
Royal College of Radiologists (RCR), 2019. RCR Position Statement on Artificial Intelligence. https://www.rcr.ac.uk/posts/rcr-position-statement-artificial-intelligence. (Accessed 28 November 2019).
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y., 2014. OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks. https://arxiv.org/abs/1312.6229.
Tailor, A., Powell, M.E.B., 2004. Intensity modulated radiotherapy—what is it? Cancer Imaging 4 (2), 68–73.
Thoraeus, R.A., 1932. A study of ionization methods for measuring the intensity and absorption of roentgen rays and of the efficiency of different filters used in therapy. Acta Radiol. 15, 1–86.
Timmerman, R.D., Paulus, R., Galvin, J., Michalski, J., Straube, W., Bradley, J., Fakiris, A., Bezjak, A., Videtic, G., Johnstone, D., Fowler, J., Gore, E., Choy, H., 2010. Stereotactic body radiation therapy for inoperable early stage lung cancer. JAMA 303 (11), 1070–1076.
Timmerman, R.D., Hu, C., Michalski, J., Straube, W., Galvin, J., Johnstone, D., Bradley, J., Barringer, R., Bezjak, A., Videtic, G.M., Nedzi, L., Werner Wasik, M., Chen, Y., Komaki, R.U., Choy, H., 2014. Long term results of RTOG 0236: a phase II trial of stereotactic body radiation therapy (SBRT) in the treatment of patients with medically inoperable stage I non-small cell lung cancer. Int. J. Radiat. Oncol. Biol. Phys. 90 (1), s30.
Valdes, G., Simone II, C.B., Chen, J., Lin, A., Yom, S.S., Pattison, A.J., Carpenter, C.M., Solberg, T.D., 2017. Clinical decision support of radiotherapy treatment planning: a data-driven machine learning strategy for patient-specific dosimetric decision making. Radiother. Oncol. 125 (3), 392–397.
van de Steene, J., Linthout, N., de Mey, J., Vinh Hung, V., Claassens, C., Noppen, M., Bel, A., Storme, G., 2002. Definition of gross tumor volume in lung cancer: inter-observer variability. Radiother. Oncol. 62 (1), 37–49.
Videtic, G.M., Hu, C., Singh, A.K., Chang, J.Y., Parker, W., Olivier, K.R., Schild, S.E., Komak, R., Urbanic, J.J., Timmerman, R.D., Choy, H., 2015. A randomized phase 2 study comparing 2 stereotactic body radiation therapy schedules for medically inoperable patients with stage 1 peripheral non-small cell lung cancer: NRG Oncology RTOG 0915 (NCCTG N0927). Int. J. Radiat. Oncol. Biol. Phys. 93 (4), 757–764.
Wang, E.H., Rutter, C.E., Corso, C.D., Decker, R.H., Wilson, L.D., Kim, A.W., Yu, J.B., Park, H.S., 2015. Patients selected for definitive concurrent chemoradiation at high-volume facilities achieve improved survival in stage III non-small cell lung cancer. J. Thorac. Oncol. 10 (6), 937–943.
Winfield, A.F.T., Jirotka, M., 2018. Ethical governance is essential to building trust in robotics and artificial intelligence systems. Philos. Transact. A: Math. Phys. Eng. Sci. 376.
Woody, N.M., Stephans, K.L., Andrews, M., Zhuang, T., Gopal, P., Xia, P., Farver, C.F., Raymond, D.P., Peacock, C.D., Cicenia, J., Reddy, C.A., Videtic, G.C., Abazeed, M.E., 2017. A histologic basis for the efficacy of SBRT to the lung. J. Thorac. Oncol. 12 (3), 510–519.
Wuthrick, E.J., Zhang, Q., Machtay, M., Rosenthal, D.I., Nguyen Tan, P.F., Fortin, A., Silverman, C.L., Raben, A., Kim, H.E., Horwitz, E.M., Read, N.E., Harris, J., Wu, Q., Le, Q.T., Gillison, M.L., 2015. Institutional clinical trial accrual volume and survival of patients with head and neck cancer. J. Clin. Oncol. 33 (2), 156–164.
Ying, C.H., 2001. Update of radiotherapy for skin cancer. H. K. Dermatol. Venereol. Bull. 9 (2), 52–59.

CHAPTER 7

Survival analysis

Throughout the first chapters we have discussed different statistical analysis and AI concepts regarding the processing of different types of data for cancer diagnosis, detection, segmentation, etc. It is high time to browse another crucial aspect of statistical analysis, known as survival analysis. Survival analysis regards the time that passes until a certain event happens. For instance, we can consider the time that has elapsed from a surgical procedure, or from the start of chemotherapy or radiotherapy treatment, to the time of death of the patient; in other words, how long the patient survived. When dealing with survival times, we encounter two practical issues:
• in some cases the start time is difficult to specify. For instance, if we wanted to use the onset of the disease as the starting time, it would be impossible to establish it correctly;
• the end time can also be hard to determine. If we recorded the time of death, that would not cause any difficulty, but what about the situation when a person simply decides that she/he wants to leave the study, or survives beyond the time we established to record (e.g. we want to see how many patients survive glioblastoma for 2 years, and a patient surpasses that time period)? This type of survival time is called a censored survival time.

Henceforth, in survival analysis we deal with two types of records: times of death and censored survival times. Fig. 7.1 presents a hypothetical situation of how the survival times can be recorded. Fig. 7.1 can be "translated" into a table, such as Table 7.1. Let us explain Fig. 7.1 a little. We see that there is a dotted line, right at the value 3. The dotted line tells us that the patients have been recruited during the first 3 months of the study, and observed for another 6 months. Patient 1 is marked on the figure as the first line counting from the bottom up; she/he was recruited at the beginning of the study, that is month 0, and stayed in the study for the whole observation period, up to month 9, at the end of which she/he passed away. The second patient enrolled in the second month, that is month 1, but died at the end of month 3, exactly when the observation process would have begun; she/he survived exactly 2 months. Patient 3 enrolled at the very beginning of the study and survived the observation period, thus there is no record of what happened to her/him after the study was over. Patient number 4 enrolled in the second month of the study, month 1, but left the study after only 1 month; there is no record of her/his current state, thus another censored survival time.



FIG. 7.1 Survival times recordings.

TABLE 7.1 Tabulated survival times recordings.

Patient   Start time   Time of death/censored   Death/censored   Survival time
1         0            9                        D                9
2         1            3                        D                2
3         0            9                        C                9*
4         1            2                        C                1*
5         2            7                        D                5
6         2            9                        C                9*
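For readers who want to experiment, here is a minimal Python sketch (not from the original text) of how the recordings in Table 7.1 can be encoded as (start, end, event) triples, where event marks an observed death and the asterisked censored times carry event = False:

```python
# Hypothetical encoding of Table 7.1: (start month, end month, death observed).
records = [
    (0, 9, True),   # patient 1: died at month 9
    (1, 3, True),   # patient 2: died at month 3
    (0, 9, False),  # patient 3: alive when the study ended (censored)
    (1, 2, False),  # patient 4: left the study (censored)
    (2, 7, True),   # patient 5: died at month 7
    (2, 9, False),  # patient 6: alive when the study ended (censored)
]

for i, (start, end, event) in enumerate(records, 1):
    survival_time = end - start
    mark = "" if event else "*"     # censored times get the (*) mark
    print(f"patient {i}: survival time {survival_time}{mark}")
```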

Patient 5 enrolled in the last month of the recruitment period, month 2, and died in month 7, thus she/he survived only 5 months. The last patient, patient 6, also enrolled in the last month of the recruitment period and survived the observation period; she/he represents another censored survival time, because there is no record of her/his current state. The last column of the table has the censored times marked with a (*). When using survival analysis in cancer research, our aim is to review clinical trial outcomes, cohort studies, etc. For example, let us consider the following case: we have a cohort of 40 women who were diagnosed with breast cancer between 2009 and 2017, and were observed until the end of 2019 in order to review their survival rate. We shall define the time period elapsed from the start time to the time of death as the survival time. Technically, a survival analysis has a starting period, in which all the patients are recruited, an observation period or follow-up, in which the patients are observed, and a final period in which all the collected data is analyzed and the conclusions are drawn.


Returning to our fictional example, let us presume that during the observation period we lost 6 women, who were thus excluded from the statistical analysis. Henceforth, only 34 women completed the 2-year follow-up period, of whom 10 passed away, leaving 24 women alive. This result can be represented with a tree structure diagram, such as the one in Fig. 7.2.

FIG. 7.2 Tree structure diagram.

The death rate or risk for the 2-year follow-up can be computed using the formula:

$$\text{death rate} = \frac{\text{number of deaths}}{\text{number of subjects}}.$$

In our case, if we apply the formula, we shall have the death rate equal to $\frac{10}{34} = 0.294$, that is 29.4%. If we have censored times in a cohort of size $N$, with $D$ the number of deaths, and $L$ the number of patients lost during the follow-up, then we can estimate the death probability using the following equation:

$$\text{death probability} = \frac{D}{N - 0.5 \cdot L}.$$

Thus, in our fictional example, we have the following data:
• 10 women died during the follow-up period, making D = 10;
• 24 women survived the follow-up period;
• 6 women were lost during the follow-up, making L = 6.

Let us see how the tree structure diagram will look in this case (Fig. 7.3):

FIG. 7.3 Tree structure diagram 2.

The death probability will be $\frac{10}{40 - 0.5 \cdot 6} = \frac{10}{37} = 0.27$, that is 27%. The follow-up period can be divided into several shorter periods; thus, we can obtain a more accurate observation of the survival rate. Fig. 7.4 depicts a fictional tree structure diagram of such a division of the follow-up period.
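A minimal sketch of these two estimates in Python (the function names are ours, not from the text), reproducing the 29.4% and 27% figures above:

```python
def death_rate(deaths, subjects):
    # Crude death rate: deaths / subjects, ignoring censoring.
    return deaths / subjects

def death_probability(deaths, cohort_size, lost):
    # Actuarial estimate: each censored subject counts as half a
    # subject at risk, i.e. D / (N - 0.5 * L).
    return deaths / (cohort_size - 0.5 * lost)

print(round(death_rate(10, 34), 3))            # 0.294 -> 29.4%
print(round(death_probability(10, 40, 6), 2))  # 0.27  -> 27%
```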


FIG. 7.4 Divided follow-up tree structure.

Knowing the death risk, the survival probability for a certain time period is given by 1 − the death probability for that interval. The cumulative survival probability can be plotted, obtaining the survival curve. The curve starts at the value 1, meaning all the patients are alive at the beginning of the study, and approaches the value 0 as patients start to die. Let us return to our example:
• in the first year of the follow-up we have no censored data, making the death risk $\frac{7}{40} = 17.5\%$; thus the survival probability will be equal to 82.5%;
• in the second year of the follow-up we have six censored data points, and the cohort size is estimated at $33 - (0.5 \times 6) = 30$; the death risk is $\frac{3}{30} = 10\%$, thus the survival probability will be 90%;
• in the third year of the follow-up we have three censored data points, and the cohort size is estimated at $24 - (0.5 \times 3) = 22.5$; the death risk is $\frac{4}{22.5} = 17.8\%$, making the survival probability 82.2%.

We can build a tree structure diagram based on the death and survival probabilities for each year (Fig. 7.5):

FIG. 7.5 Tree structure of the death and survival probabilities.


The overall death probability can be computed using the death rates per year, as follows:

$$\text{Overall death probability}_{\text{year}(i)} = \text{Survival probability}_{\text{year}(i-1)} \times \text{Death probability}_{\text{year}(i)}.$$

In our example, we shall have the overall death probability for the first year equal to 17.5%, for the second year equal to 0.825 × 0.10 = 8.3%, and for the third year equal to 0.825 × 0.90 × 0.178 = 13.2%. The overall survival probability for 3 years will be 0.825 × 0.90 × 0.822 = 61%. Using the overall death probabilities, we can compute the overall survival rates as such: for the first year 82.5%, for the second year 91.7%, and for the third year 86.8%. If we add the overall death probabilities for each year, we obtain 17.5% + 8.3% + 13.2% = 39%, thus the cumulative survival probability will be 1 − 0.39 = 0.61, that is 61%, in concordance with the result obtained above. Data regarding survival are presented in tables called life tables. Table 7.2 represents a life table; a short code sketch after the table reproduces these year-by-year computations. We can plot the values from the last three columns of Table 7.2 (see Figs. 7.6–7.8):

TABLE 7.2 Life table for breast cancer patients.

Year    No. of subjects at the beginning of the year   No. of deaths (D)   No. of censors (C)   N − 0.5 × L   Annual death prob.   Annual survival prob.   Cumulative survival prob.
0       40                                              7                   0                    40            17.5%                82.5%                   82.5%
1       33                                              3                   6                    30            10%                  90%                     74.3%
2       24                                              4                   3                    22.5          17.8%                82.2%                   61%
3       17                                              4                   4                    15            26.7%                73.3%                   44.7%
4       9                                               2                   3                    7.5           26.7%                73.3%                   32.8%
5       4                                               1                   2                    3             33.3%                66.7%                   21.9%
6       1                                               0                   1                    0.5           0%                   100%                    21.9%
Total                                                   21                  19
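A minimal Python sketch (our own, not from the text) that reproduces the last three columns of Table 7.2 from the yearly counts of this fictional cohort, using the actuarial half-censor adjustment described above:

```python
# Each tuple is (subjects at start of year, deaths D, censors C).
rows = [(40, 7, 0), (33, 3, 6), (24, 4, 3), (17, 4, 4), (9, 2, 3), (4, 1, 2), (1, 0, 1)]

cumulative = 1.0
for year, (n, d, c) in enumerate(rows):
    at_risk = n - 0.5 * c            # actuarial adjustment for censors
    death_prob = d / at_risk
    survival_prob = 1 - death_prob
    cumulative *= survival_prob      # chain of annual survival probabilities
    print(f"year {year}: death {death_prob:.1%}, "
          f"survival {survival_prob:.1%}, cumulative {cumulative:.1%}")
```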

FIG. 7.6 Annual death probabilities.


FIG. 7.7 Annual survival probabilities.


FIG. 7.8 Cumulative survival probabilities.

Recall that in Chapter 3 we discussed the Cox regression model, which is a survival analysis method. Besides it, there are several others that we are going to present in the following pages. We shall first turn our attention to the Kaplan-Meier survival curves, which are often used when the predictor variable is categorical (e.g. new therapy drug vs. placebo), or when it takes only a few values, so that it can be considered categorical.

7.1 Kaplan-Meier survival curve

Edward L. Kaplan and Paul Meier developed and published in 1958 a groundbreaking statistical model that was able to estimate survival curves in the presence of incomplete observations (Kaplan and Meier, 1958). Kaplan-Meier curves soon became the standard way


of reporting the survival of patients in medical research. More than 70% of oncology papers that perform survival analysis use Kaplan-Meier (Stalpers and Kaplan, 2018). The beauty of the Kaplan-Meier curves is that they are simple. With just a little knowledge of math, one can calculate Kaplan-Meier survival curves. All that needs to be done is to collect three pieces of information for each patient: the date the patient entered the study, as the first date of observation; the last date of observation, that is, the last time the patient was seen alive; and whether the last date was recorded due to the death of the patient or due to her/him leaving the study.

As we have seen before, when performing survival analysis we are most interested in computing the survival probability of a certain patient under very specific conditions. Having recorded the survival times for a set of patients, we can estimate the proportion of the population that might survive a certain period of time. For example, having a set of patients that have undergone chemotherapy, we can estimate the probability that a new patient who undergoes the exact same chemotherapy treatment survives a certain period of time.

In our first example, only the fact that a patient had died by the end of a certain time interval was recorded. If we want a more thorough analysis, we need to record the exact time of death of each patient. In this way we can re-estimate the survival probability after every single death, without having to cumulate all the data from a time interval. This method is called the Kaplan-Meier survival curve. We established, when discussing the Cox regression model, the way the survival time can be computed: if we denote by $P_n$ the probability of a patient surviving the $n$th day after cancer surgery, conditioned by the fact that she/he survived all the $n-1$ days before, then the survival probability for the $n$th day is given by:

$$P_1 \times P_2 \times P_3 \times \ldots \times P_{n-1} \times P_n,$$

where $P_i$ is the probability that the patient survives day $i$, conditioned by the fact that she/he has survived the first $i-1$ days. The intermediate survival probabilities are given by:

$$p_k = p_{k-1} \times \frac{r_k - f_k}{r_k},$$

where $p_k$ is the probability of surviving $k$ time units, $r_k$ is the number of subjects that still present a death risk at time $k$ (they survived the $k$ units of time, but they are not out of the woods yet), and $f_k$ is the number of deaths recorded at time $k$. The probability before the first death is recorded equals 1, because none of the patients has died so far, making the survival 100%. The standard error, SE, of the probability $p_k$ is given by:

$$SE_{p_k} = p_k \times \sqrt{\frac{1 - p_k}{r_k}}.$$

Having the standard error, we can compute the 95% confidence interval, if we make the presumption that $p_k$ has a Gaussian distribution:

$$\left( p_k - 1.96 \times SE_{p_k},\; p_k + 1.96 \times SE_{p_k} \right).$$
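A short Python sketch of the confidence interval formula (the function name and the input values are illustrative, not from the text):

```python
import math

def km_confidence_interval(p_k, r_k):
    # SE of an intermediate survival probability and its 95% CI,
    # assuming p_k is approximately Gaussian.
    se = p_k * math.sqrt((1 - p_k) / r_k)
    return p_k - 1.96 * se, p_k + 1.96 * se

# Hypothetical values in the spirit of the running example.
low, high = km_confidence_interval(p_k=0.8476, r_k=19)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```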


If we have a small lot of patients, or if there are extreme survival probabilities, then the standard error might not give accurate approximations. In such a case, we recommend using the Greenwood formula, given by:

$$SE_{p_k} = p_k \times \sqrt{\sum_{j=1}^{k} \frac{f_j}{r_j \left( r_j - f_j \right)}}.$$

For a better understanding of the above formulas, let us present a lot of 21 patients that have been diagnosed with stage III breast cancer and have undergone chemotherapy treatment with a drug A. We monitor the patients for a time period of 120 months. The survival time of a patient is computed from day 0.

TABLE 7.3 Life table for stage III breast cancer patients (drug A).

No. of the subject   Survival time (months)
1                    30
2                    50
3                    50*
4                    51
5                    66*
6                    82
7                    92
8                    120*
9                    120*
…                    …
21                   120*

From Table 7.3 we can see that:
• the first 7 patients survived between 30 and 92 months after the chemotherapy treatment, and there have been two censored events, at 50 and 66 months (patients 3 and 5);
• the rest of the patients have been censored at the end of the follow-up study, that is, after 120 months, at which moment they were still alive;
• because 5 deaths were recorded, only 5 survival probabilities were computed.

Table 7.4 depicts the survival probabilities for the five uncensored data points, computed as follows:

$$P_1 = \frac{18}{19} = 0.9473, \quad P_2 = \frac{17}{19} = 0.8947, \quad P_4 = \frac{15}{18} = 0.8333, \quad P_6 = \frac{13}{17} = 0.7647, \quad P_7 = \frac{12}{17} = 0.7058.$$

We shall present how we can compute the five intermediate probabilities:
1. for month 30, the intermediate survival probability is $p_{30} = \frac{18}{19} = 0.9473$;
2. at the beginning of the interval (30, 50), the percentage of patients that are still in the study equals 0.9473, which we are going to multiply by the percentage of patients that are still alive at the end of the (30, 50) interval, that is $\frac{17}{19}$; thus the intermediate survival probability for month 50 is $p_{50} = 0.9473 \times \frac{17}{19} = 0.8476$;
3. on the day a censored observation appears, both the denominator and the numerator must be decreased correspondingly; thus, the probability of survival after 51 months is $p_{51} = 0.8476 \times \frac{15}{18} = 0.7063$;
4. the survival probability after 82 months is $p_{82} = 0.7063 \times \frac{13}{17} = 0.5401$;
5. the survival probability after 92 months is $p_{92} = 0.5401 \times \frac{12}{17} = 0.3812$.

TABLE 7.4 Survival probabilities.

No. subjects   Survival time (months)   rk   fk   (rk − fk)/rk   pk
1              30                       19   1    0.9473         0.9473
2              50                       19   2    0.8947         0.8476
3              50*
4              51                       18   3    0.8333         0.7063
5              66*
6              82                       17   4    0.7647         0.5401
7              92                       17   5    0.7058         0.3812
8              120*
9              120*
…              …
21             120*
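As a hands-on illustration, here is a sketch of the standard product-limit computation with the Greenwood standard error, applied to the drug A group. Note that this is the textbook product-limit convention, which starts with all 21 patients at risk, so its numbers differ slightly from the worked risk-set counts in the table above; the function and variable names are ours:

```python
import math

def kaplan_meier(times, events):
    # Standard product-limit estimator. times: survival times;
    # events: True = observed death, False = censored.
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, greenwood_sum = 1.0, 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        removed = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surv *= (n_at_risk - deaths) / n_at_risk
            greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            curve.append((t, surv, surv * math.sqrt(greenwood_sum)))
        n_at_risk -= removed   # both deaths and censors leave the risk set
        i += removed
    return curve

# Drug A group from Table 7.3 (the asterisked times are censored).
times = [30, 50, 50, 51, 66, 82, 92] + [120] * 14
events = [True, True, False, True, False, True, True] + [False] * 14
for t, s, se in kaplan_meier(times, events):
    print(f"t = {t}: S(t) = {s:.4f} (Greenwood SE {se:.4f})")
```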



FIG. 7.9 Kaplan-Meier survival curve.

Now we can plot the survival curve (Fig. 7.9). In general, we are interested in comparing the survival rate between two groups of patients. In this case, we must compute the Kaplan-Meier curve for each of the two groups and plot both curves in one graph. Thus, we can analyze how two or more groups of patients behave when undergoing the same treatment. In the following example, we are interested in the difference between two treatments, thus we are going to plot the Kaplan-Meier curves of two groups of patients that undergo different treatments. So, let us consider a second group of patients diagnosed with stage III breast cancer, who have undergone chemotherapy treatment with drug B. Table 7.5 depicts the life table for these patients:

TABLE 7.5 Life table for stage III breast cancer patients (drug B).

No. of patients   Survival time (months)   pk
1                 10                       0.93
2                 20                       0.89
3                 50                       0.82
4                 60                       0.81
5                 80                       0.74
6                 90                       0.61
7                 100                      0.54
8                 110                      0.41
9                 120*
…                 …
21                120*
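A brief sketch of how such a comparison plot can be drawn in Python with matplotlib, using the cumulative survival columns of Tables 7.4 and 7.5 as step curves (the arrays simply transcribe those columns):

```python
import matplotlib.pyplot as plt

t_a = [0, 30, 50, 51, 82, 92]
p_a = [1.0, 0.9473, 0.8476, 0.7063, 0.5401, 0.3812]
t_b = [0, 10, 20, 50, 60, 80, 90, 100, 110]
p_b = [1.0, 0.93, 0.89, 0.82, 0.81, 0.74, 0.61, 0.54, 0.41]

# Survival curves are step functions: the probability stays constant
# until the next recorded event.
plt.step(t_a, p_a, where="post", label="drug A")
plt.step(t_b, p_b, where="post", label="drug B")
plt.xlabel("Survival time (months)")
plt.ylabel("Survival probability")
plt.legend()
plt.show()
```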


FIG. 7.10 Kaplan-Meier curves for two groups of patients.

Fig. 7.10 represents the Kaplan-Meier curves for the two groups of patients plotted in one graph. For more details about the Kaplan-Meier method we refer the reader to Balakrishnan and Rao (2004), Rich et al. (2010), Gorunescu and Belciug (2014), and Stalpers and Kaplan (2018).

Let us see how Kaplan-Meier survival analysis works on real life examples. For instance, Henschke et al. (2006) presented a study regarding the survival of patients that received treatment for stage I lung cancer. The study was collaborative and covered the screening of 31,567 persons that did not have any symptoms but were considered to be at risk for lung cancer; the screening was done between 1993 and 2005. At 7–18 months after the first screening, an annual screening took place, this time for 27,456 persons. The authors tried to estimate the 10-year lung cancer specific survival rate for the patients that had been diagnosed with stage I lung cancer after the CT screening and the corresponding biopsy. The estimation was done regardless of the treatment received. The patients who underwent surgical resection did so within 1 month of the diagnosis. All the patients that were diagnosed with lung cancer were monitored year by year by the research team. When a patient died, the date and cause of death were obtained from the patient's doctor and/or family. If the treatment caused death, then the cause of death was considered to be lung cancer. The duration of the follow-up study was 123 months. A total of 484 patients were diagnosed with lung cancer, of whom 411 received surgical resection, 56 underwent chemotherapy, radiotherapy, or both, while 16 received no treatment whatsoever. The authors used Kaplan-Meier estimates for all the patients. The estimated 10-year survival was 80% for all the patients, with the 95% confidence interval (74, 85). The operative mortality rate was 0.5%, 2 out of 411 participants dying within 1 month after surgery. Overall, 75 patients out of 484 died of lung cancer. Out of the 484 patients, 412 were diagnosed as having stage I lung cancer. Their 10-year survival rate, no matter the treatment, was 88%, with the 95% confidence interval (84, 91); 39 out of the 412 patients passed away. Of the 412 patients, 375 underwent surgical resection, 29 did not receive surgery but had chemotherapy and/or radiotherapy, and the remaining 8 received no treatment. Within 5 years after the diagnosis, all 8 patients who did not receive treatment had died.


7.2 Life tables

We have seen that the tables that correspond to the Kaplan-Meier curves are called life tables. They are also known as mortality tables or actuarial tables. These tables are built in order to measure the death rate in a group of subjects during a period of time. The first life table was built in ancient Rome, under the praetorian prefect Ulpian, a famous Roman jurist. The table contains data regarding life expectancy in the early 3rd century AD. It can be found in Justinian's Digest, a compendium of Roman law (Pflaumer, 2014). To this day, we still do not know for sure what population Ulpian's table refers to, or even how he gathered the data. In his book, Richard Duncan-Jones suggested that the life table refers to slaves and ex-slaves (Duncan-Jones, 2009).

Modern life tables appeared in the 17th century, with the Bills of Mortality analyzed by John Graunt in 1662 (Walford, 1878). After the Black Death took its toll between 1348 and 1350, there were several plague outbreaks in Europe between the 14th and the 17th century. With each return of the plague, the cities became paralyzed. Kings, queens, and their courts, wealthy people, politicians, magistrates, and sometimes physicians left them behind. With no authority in the cities, riots and looting were common. Thus, measures needed to be taken. The Crown commanded the Church of England to come up with a surveillance system that would monitor the plague deaths. These were the London Bills of Mortality (Rusnock, 2002). Thirty years later, Neumann sent the demographic data for the years 1687–1691 from the city of Breslau to Henry Justel, the secretary of the Royal Society (Bacaer, 2011). Justel died, and Halley took the data, analyzed it, and in 1693 published Halley's table in the Philosophical Transactions of the Royal Society.

There are two types of life tables: the cohort life table and the current life table. The cohort life table, or generation life table, presents the overall mortality of a cohort of subjects from their birth to the death of the last member of the group. All the subjects must be born during the same time interval. The current life table considers the mortality of a certain population for a short period of time. Life tables can contain complete data, meaning that they include the data for each year of the time period, or they can include data computed for certain time periods (e.g. 0–1 years, 1–5 years, 5–10 years, and so on). In what follows we shall present how a complete life table is created. In a life table we shall have the following columns:
• column 1 represents the age interval $(x, x+1)$, $x = 0, 1, \ldots, w$, the last interval remaining open and starting with the value $w$;
• column 2 represents the estimate $\hat{q}_x$ of the probability that an individual alive at moment $x$ dies in the $(x, x+1)$ time interval. These values are computed based on the mortality rates for that specific age group in the current population, and the values in the other columns are computed from them;
• column 3 represents the number $l_x$ of subjects that are alive at moment $x$;
• column 4 represents the number $d_x$ of subjects that died during the $(x, x+1)$ interval;
• column 5 represents the time fraction $a'_x$: for the subjects that died during the time interval $(x, x+1)$, we compute the average of the fractions of time they were alive in year $x+1$, obtaining thus $a'_x$;


• column 6 represents the number $L_x$, the total number of years lived in the time interval $(x, x+1)$: every member of the cohort who survives to year $x+1$ contributes 1 year, the rest contribute their fraction $a'_x$;
• column 7 represents the total number $T_x$ of years lived, having year $x$ as starting point;
• column 8 represents the observed mean $\hat{e}_x$ of life at moment $x$, that is, the average number of years that a subject who is now $x$ years old still has to live.

Before presenting such a table, some formulas must be briefed:

$$d_x = l_x \cdot \hat{q}_x, \quad x = 0, 1, \ldots, w$$
$$l_{x+1} = l_x - d_x, \quad x = 0, 1, \ldots, w-1$$
$$L_x = (l_x - d_x) + a'_x \cdot d_x, \quad x = 0, 1, \ldots, w-1$$
$$T_x = L_x + L_{x+1} + \ldots + L_w, \quad x = 0, 1, \ldots, w \quad (\text{equivalently } T_x = L_x + T_{x+1})$$
$$\hat{e}_x = \frac{T_x}{l_x}, \quad x = 0, 1, \ldots, w$$

Table 7.6 depicts a complete life table. For further details regarding its construction we refer the reader to Chiang (1968), Kramer (1988), Fink and Brown (2006), and Bruce et al. (2017).
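The recurrences above can be sketched in a few lines of Python; the input values below are the first five rows of the fictional Table 7.6, and rounding conventions may make the last digits differ slightly from the printed table:

```python
q = [0.02258, 0.0014, 0.00093, 0.00075, 0.00061]   # estimated death probabilities
a = [0.1, 0.41, 0.44, 0.46, 0.49]                  # time fractions a'_x

l, d, L = [100000], [], []
for qx, ax in zip(q, a):
    dx = round(l[-1] * qx)              # d_x = l_x * q_x
    d.append(dx)
    L.append((l[-1] - dx) + ax * dx)    # L_x = (l_x - d_x) + a'_x * d_x
    l.append(l[-1] - dx)                # l_{x+1} = l_x - d_x

# T_x = L_x + T_{x+1}; with only five rows, T is truncated at the last
# row shown, so the e_x values printed here are illustrative only.
T = [0.0] * len(L)
T[-1] = L[-1]
for x in range(len(L) - 2, -1, -1):
    T[x] = L[x] + T[x + 1]

for x in range(len(L)):
    print(f"({x}, {x + 1}): l={l[x]}, d={d[x]}, L={L[x]:.0f}, e={T[x] / l[x]:.2f}")
```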

TABLE 7.6 Example of fictional complete life table.

(x, x+1)   q̂x        lx        dx     a′x    Lx       Tx          êx
0–1        0.02258   100,000   2258   0.1    97,968   6,562,327   65.62
1–2        0.0014    97,742    136    0.41   97,661   6,464,359   66.13
2–3        0.00093   97,605    90     0.44   97,554   6,366,698   65.22
3–4        0.00075   97,514    73     0.46   97,475   6,269,144   64.28
4–5        0.00061   97,441    59     0.49   97,411   6,171,669   63.33
…          …         …         …      …      …        …           …
20–21      0.00117   96,406    112    0.5    96,349   4,621,054   47.93
21–22      0.00133   96,293    128    0.5    96,229   4,524,704   46.98
22–23      0.00114   96,165    109    0.5    96,110   4,428,475   46.05
23–24      0.0013    96,055    124    0.5    95,993   4,332,365   45.10
24–25      0.00105   95,930    100    0.5    95,880   2,797,322   29.15
…          …         …         …      …      …        …           …
40–41      0.00286   93,516    267    0.5    93,382   2,701,441   28.88
41–42      0.00294   93,248    274    0.5    93,111   2,608,059   27.96
42–43      0.00362   92,974    336    0.5    92,806   2,514,948   27.04
43–44      0.00391   92,637    362    0.5    92,456   2,422,141   26.14
44–45      0.00443   92,275    408    0.5    92,071   2,329,685   25.24
…          …         …         …      …      …        …           …
60–61      0.0172    79,927    1367   0.5    79,243   952,753     11.92
61–62      0.01659   78,560    1303   0.5    77,908   873,510     11.11
62–63      0.02045   77,256    1580   0.5    76,466   795,601     10.29
63–64      0.02056   75,676    1556   0.5    74,899   719,135     9.50
64–65      0.02351   74,120    1742   0.5    73,249   644,236     8.69
…          …         …         …      …      …        …           …
90–91      0.23825   8156      1899   0.5    7206     21,419      2.62
91–92      0.26615   6257      1665   0.5    5424     14,213      2.27
92–93      0.27769   4591      1275   0.5    3954     8788        1.91
93–94      0.29254   3316      970    0.5    2831     4834        1.45
94–95      0.29254   2346      686    0.5    2003     2003        0.85
95–        1         1659      1659

If we are not interested in a complete table, we can build a shorter version, taking into account that we have to replace the age intervals $(x, x+1)$ with time periods $(x_i, x_{i+1})$, and also all the parameters $\hat{q}_x, l_x, d_x, a'_x, L_x, T_x$, and $\hat{e}_x$ with $\hat{q}_i, l_i, d_i, a_i, L_i, T_i$, and $\hat{e}_i$. Table 7.7 depicts an example of a periodical life table.

TABLE 7.7 Example of fictional periodical life table.

(xi, xi+1)   q̂i        li        di       ai     Li       Ti          êi
0–1          0.02633   100,000   2633     0.1    97,630   1,490,086   14.90
1–5          0.00432   97,367    420      0.37   97,102   1,392,456   14.30
5–10         0.00245   96,946    237      0.44   96,813   1,295,354   13.36
10–15        0.00219   96,708    211      0.54   96,611   1,198,541   12.39
15–20        0.00458   96,497    441      0.56   96,302   1,101,929   11.41
20–25        0.00616   96,055    591      0.49   95,753   1,005,627   10.46
25–30        0.00642   95,463    612      0.5    95,157   909,873     9.53
30–35        0.008     94,850    758      0.51   94,478   814,716     8.58
35–40        0.01159   94,091    1090     0.54   93,590   720,238     7.65
40–45        0.0134    93,001    1246     0.54   92,428   626,648     6.73
45–50        0.02902   91,754    2662     0.54   90,530   534,220     5.82
50–55        0.04571   89,092    4072     0.53   87,178   443,690     4.98
55–60        0.06575   85,019    5590     0.54   82,448   356,511     4.19
60–65        0.10457   79,429    8306     0.52   75,443   274,063     3.45
65–70        0.14563   71,123    10,357   0.52   66,152   198,620     2.79
70–75        0.21471   60,766    13,047   0.51   54,373   132,468     2.17
75–80        0.3428    47,718    16,358   0.51   39,704   78,095      1.63
80–85        0.46312   31,360    14,523   0.48   23,808   38,391      1.22
85–90        0.61437   16,837    10,344   0.45   11,147   14,583      0.86
90–95        0.79812   6492      5182     0.41   3435     3435        0.52
95–          1         1310      1310            0        0

In oncology, we are interested in comparing the survival experience of two or more groups when it comes to different chemotherapy, radiotherapy, etc. treatments. Up to now we have seen that the simplest way to perform a comparison is by using the Kaplan-Meier survival curves. Besides the actual visualization of the curves, we can compute the standard error of the differences between the subjects that survive at a given moment in time, and the corresponding confidence intervals. Even if these comparisons are very important and interesting to know, using Kaplan-Meier we cannot obtain a global survival comparison of the two groups, just a comparison at a certain moment. In order to resolve this issue, we can use other statistical methods for the survival comparison between two groups, such as: the logrank test, the hazard ratio, and the Nelson-Aalen filter.

The logrank test

The logrank test is a statistical test in which the null hypothesis H0 states that there is no difference between the two groups. The logrank test is a non-parametric method. In order to perform this test we need to divide the survival time scale taking into account the moments when deaths have been observed, ignoring censored survival times. Next, we compute for each separate interval the observed number of deaths and the expected one, after which we sum them up.

Let us presume that we have two groups. Technically, we divide the survival time into periods that finish with one or multiple deaths. For each death time we compute the number of subjects that present a death risk. We do this computation for each of the two groups, denoting with r1 the number obtained for the first group, and with r2 the number obtained for the second group. Identically, we compute the number of observed deaths for the first group, f1, and for the second group, f2. Next, we compute the expected number of deaths, under the assumption that the null hypothesis holds. Using all these data, we can build a table such as Table 7.8.

TABLE 7.8 Deaths and survivals of two groups.

            Group 1   Group 2   Total
Deaths      f1        f2        f
Survivals   r1 − f1   r2 − f2   r − f
Total       r1        r2        r

The two formulas below give the expected numbers of deaths:

$$e_1 = \frac{r_1 \cdot f}{r}, \qquad e_2 = \frac{r_2 \cdot f}{r}.$$

By summing up the observed values $O_i$ and the expected ones $E_i$ over the whole table, we shall obtain:

$$O_i = \sum_j f_{ji}, \quad i = 1, 2,$$
$$E_i = \sum_j e_{ji}, \quad i = 1, 2.$$

In order to verify the hypothesis, we use $O_1 + O_2 = E_1 + E_2$ as a control equality. Now, we can compute the logrank statistical test, given by:

$$X^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2}.$$
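As a quick illustration, a few lines of Python evaluate this two-group statistic; the observed/expected totals below are hypothetical, just to show the call:

```python
def logrank_chi_square(o1, e1, o2, e2):
    # Two-group logrank statistic: (O1 - E1)^2 / E1 + (O2 - E2)^2 / E2.
    return (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2

# Hypothetical totals for illustration only.
print(logrank_chi_square(o1=7, e1=9.2, o2=6, e2=3.8))
```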

If we have more than two groups, then the logrank test becomes:

$$T = \sum_{j=1}^{n} \sum_{i=1}^{m} \frac{\left( O_{ij} - E_{ij} \right)^2}{E_{ij}}.$$

The obtained value will be compared with a $\chi^2$ distribution with $(n-1)(m-1)$ degrees of freedom, where $m$ is the number of groups, and $n$ is the number of time intervals. Computing the p-value, we decide whether we accept or reject the null hypothesis (Bland and Altman, 2004; Gorunescu and Belciug, 2014).

Let us apply the logrank test to an example where we are trying to determine which of two new cancer drugs is better. We gave the two drugs to two groups of lab rats that had previously been exposed to a carcinogenic substance. During 72 h, the two groups, which contained 10 and 8 lab rats, were monitored. The next two tables, Tables 7.9 and 7.10, present the survival times. Recall that (*) represents censored data; in our case the lab rat died of other causes or was withdrawn from the study.

TABLE 7.9 G1—first group of lab rats and their survival times.

Rat             1    2    3    4    5    6    7    8    9    10
Survival time   9*   27   36   45   45   54*  58   63   68*  71

TABLE 7.10 G2—second group of lab rats and their survival times.

Rat             1    2    3    4    5    6    7    8
Survival time   18   18   27*  38   47   48*  65   69

In the following computations and tables we will use the following notations:
• t—the time when the event took place (in our case, the hour);
• n—the number of subjects that are still under observation at hour t;
• n1—the number of subjects from the first group, G1, that are still under observation at hour t;
• n2—the number of subjects from the second group, G2, that are still under observation at hour t;
• r—the number of deaths that have been reported at hour t;
• c—the number of censored data that have been reported at hour t;
• σ1—the number of deaths that have occurred within group 1 and have been reported at hour t;
• σ2—the number of deaths that have occurred within group 2 and have been reported at hour t;
• e1—the number of expected deaths that are likely to occur in group 1 at hour t;
• e2—the number of expected deaths that are likely to occur in group 2 at hour t.

From Table 7.9 we can see that we have seven death events, distributed as such: one death event at each of the following hours: 27, 36, 58, 63, and 71, and two death events in the 45th hour. From Table 7.10 we can see that we have six death events with the following distribution: one death event at each of the hours 38, 47, 65, and 69, and two death events in the 18th hour. By combining the two groups according to the death events, we will obtain 11 distinct time values, thus having 11 time intervals: (0, 18], (18, 27], (27, 36], (36, 38], (38, 45], (45, 47], (47, 58], (58, 63], (63, 65], (65, 69], (69, 71]. Let us see how the algorithm works, step by step. Table 7.11 presents the data regarding the time frame (0, 18].

TABLE 7.11 Data regarding the time frame (0, 18].

t         n    n1   n2   r   c   σ1   σ2   e1     e2
(0, 18]   18   10   8    2   1   0    2    1.12   0.88


The ratio of observed subjects from group 1 is $\frac{10}{18} = 0.555$, and the ratio of observed subjects from group 2 is $\frac{8}{18} = 0.444$. From these numbers we can compute the expected events:

$$e_1 = 2 \times \frac{10}{18} = 1.12, \quad \text{and} \quad e_2 = 2 \times \frac{8}{18} = 0.88.$$

Table 7.12 presents the data regarding the time frames up to (18, 27].

TABLE 7.12 Data regarding the time frames up to (18, 27].

t          n    n1   n2   r   c   σ1   σ2   e1     e2
(0, 18]    18   10   8    2   1   0    2    1.12   0.88
(18, 27]   15   9    6    1   1   1    1    0.6    0.4

The ratio of observed subjects from group 1 is $\frac{9}{15} = 0.6$, and the ratio of observed subjects from group 2 is $\frac{6}{15} = 0.4$. From these numbers we can compute the expected events:

$$e_1 = 1 \times \frac{9}{15} = 0.6, \quad \text{and} \quad e_2 = 1 \times \frac{6}{15} = 0.4.$$

Following the same procedure, we will achieve in the end Table 7.13.

TABLE 7.13 Computational table for log-rank test.

t          n    n1   n2   r   c   σ1   σ2   e1     e2
(0, 18]    18   10   8    2   1   0    2    1.12   0.88
(18, 27]   15   9    6    1   1   1    1    0.6    0.4
(27, 36]   13   8    5    1   0   1    0    0.62   0.38
(36, 38]   12   7    5    1   0   0    1    0.58   0.42
(38, 45]   11   7    4    2   0   2    0    1.27   0.73
(45, 47]   9    5    4    1   0   0    1    0.56   0.44
(47, 58]   8    5    3    1   2   1    0    0.63   0.37
(58, 63]   5    3    2    1   0   1    0    0.6    0.4
(63, 65]   4    2    2    1   0   0    1    0.5    0.5
(65, 69]   3    2    1    1   1   0    1    0.67   0.33
(69, 71]   1    1    0    1   0   1    0    1      0

All that is left to do is to compute the T statistic and obtain the p-level from the $\chi^2$ table (Table A11, found at the end of Chapter 1):

$$T = \sum_{j=1}^{11}\sum_{i=1}^{2}\frac{\left(O_{ij}-E_{ij}\right)^{2}}{E_{ij}} = 10.8643.$$
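Instead of reading the printed $\chi^2$ table, the p-level can also be obtained numerically. A short sketch, assuming SciPy is available:

```python
from scipy.stats import chi2

T = 10.8643
p = chi2.sf(T, df=1)  # upper tail of chi-square, m - 1 = 1 degree of freedom
print(p)              # ~0.001
```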


In this case we have 0.001 < p < 0.002, thus we shall reject the null hypothesis, which implies that there is a significant difference between the survival experiences of the two groups of lab rats. Let us consider another example. We have two groups of patients, each containing 21 individuals who suffer from uterine cancer. The first group of patients undergoes chemotherapy using a new drug, D1, and the second group undergoes chemotherapy using a second new drug, D2. The data regarding their survival experience are depicted in Tables 7.14 and 7.15.

TABLE 7.14  Group of patients that were treated with drug D1.

Patient:  1   2   3    4   5    6   7   8     9     …  21
Time:     30  50  50*  51  66*  81  92  120*  120*  …  120*

TABLE 7.15  Group of patients that were treated with drug D2.

Patient:  1  2   3   4   5   6   7   8   9   10  11  12  13  14   15   16    17    …  21
Time:     5  6*  11  11  13  24  63  65  69  69  79  81  82  102  115  120*  120*  …  120*

We want to know whether there is a significant difference between the survival experiences of the two groups of patients. In the first group we have five death events, on the following days: 30, 50, 51, 81, and 92. In the second group the death events occurred as follows: one death event on each of the days 5, 13, 24, 63, 65, 79, 81, 82, 102, and 115, and two death events on each of the days 11 and 69. Thus we shall divide the time period into these time intervals: (0, 5], (5, 11], (11, 13], (13, 24], (24, 30], (30, 50], (50, 51], (51, 63], (63, 65], (65, 79], (79, 81], (81, 92], (92, 102], (102, 115], (115, 120]. By applying the logrank test algorithm, we build Table 7.16:

Computational table for log-rank test.

t           n   n1  n2  r  c  σ1  σ2  e1    e2
(0, 5]      42  21  21  1  0  0   1   0.5   0.5
(5, 11]     41  21  20  2  1  0   2   1.03  0.97
(11, 13]    38  21  17  1  0  0   1   0.55  0.45
(13, 24]    37  21  16  1  0  0   1   0.55  0.45
(24, 30]    36  21  16  1  0  0   1   0.58  0.42
(30, 50]    35  21  15  1  1  1   0   0.57  0.43
(50, 51]    33  18  15  1  0  1   0   0.56  0.44
(51, 63]    32  17  15  1  0  0   1   0.53  0.47
(63, 65]    31  17  14  1  0  0   1   0.55  0.45
(65, 79]    30  17  13  2  1  0   2   1.86  1.14
(79, 81]    27  16  11  1  0  0   1   0.59  0.41
(81, 92]    26  16  10  3  0  1   2   1.86  1.14
(92, 102]   23  15  8   1  0  1   0   0.65  0.35
(102, 115]  22  14  8   1  0  0   1   0.64  0.36
(115, 120]  21  14  7   1  0  0   1   0.67  0.33

We compute:

$$T = \sum_{j=1}^{15}\sum_{i=1}^{2}\frac{\left(O_{ij}-E_{ij}\right)^{2}}{E_{ij}} = 20.3261.$$

Looking in the $\chi^2$ table (Table A11, in Chapter 1), we see that the corresponding p-level is approximately 0, which means that we must reject the null hypothesis: there are significant differences between the survival experiences of the two groups of patients.
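In practice, one rarely builds these tables by hand. Below is a sketch of the same comparison for the lab-rat data, assuming the Python lifelines package is installed; note that lifelines uses the standard variance-based form of the logrank statistic, so its value will not coincide exactly with the simplified interval sums computed above:

```python
from lifelines.statistics import logrank_test

# Lab-rat data from Tables 7.9 and 7.10 (event: 1 = death, 0 = censored).
g1_time  = [9, 27, 36, 45, 45, 54, 58, 63, 68, 71]
g1_event = [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]
g2_time  = [18, 18, 27, 38, 47, 48, 65, 69]
g2_event = [1, 1, 0, 1, 1, 0, 1, 1]

result = logrank_test(g1_time, g2_time,
                      event_observed_A=g1_event,
                      event_observed_B=g2_event)
print(result.test_statistic, result.p_value)
```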

The hazard ratio

Sometimes, besides finding out whether two or more groups differ in terms of survival experience, we might be interested in finding out how different the groups really are. Unfortunately, the logrank test provides no such information. We can measure the relative survival between two groups by comparing the observed number of events with the expected number. The ratio $O_i/E_i$, $i = 1, 2$, gives the observed event rate in each group. Thus, by computing

$$R = \frac{O_1/E_1}{O_2/E_2}$$

we obtain the relative event rate of the two groups. R is called the hazard ratio (Altman, 1991; Barraclough et al., 2011; Rouaum, 2013). Using the above example, we have R = 0.4087, which means that the estimated relative risk of dying when undergoing chemotherapy with drug D1 is 0.41 times the estimated relative risk of dying when undergoing chemotherapy with drug D2.
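Expressed as code, the computation is a one-liner; a plain-Python sketch with our own helper name:

```python
def hazard_ratio(o1, e1, o2, e2):
    """Relative event rate of group 1 versus group 2: (O1/E1) / (O2/E2)."""
    return (o1 / e1) / (o2 / e2)

# Called with the observed and expected totals of a logrank table,
# a value below 1 favors group 1, a value above 1 favors group 2.
```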

Bitencourt et al. studied the prognostic significance of preoperative MRI findings in breast cancer patients younger than 40 years old (Bitencourt et al., 2019). Even though no more than 7% of all breast cancer cases in the United States occur in women younger than 40 years, it has been noticed that younger patients have an increased frequency of hormone receptor-negative tumors, that their tumors are more aggressive and have an advanced TNM stage, and that the overall outcome is worse (Copson et al., 2013; Eugenio et al., 2016; Kim et al., 2011). As we have seen throughout this book, MRI is used for detecting and diagnosing breast cancer; some studies show it to be more accurate than traditional mammography or ultrasound in the assessment of tumor spread, in the detection of multifocal and multicentric


cancer (Plana et al., 2012), and also in preoperative surgical planning (An et al., 2012; Petrillo et al., 2013; Mukherjee et al., 2016). Bitencourt et al. wanted to study the importance of MRI for the prognosis of young patients diagnosed with breast cancer. The study was performed on 120 breast cancer patients who received their diagnosis between November 2008 and August 2012. Being a retrospective study, not all the patients had breast MRI scans before the treatment started; consequently, 28 out of the 120 were removed, leaving 92 patients in the study. After the main tumor's size, multifocality, and multicentricity were determined on the MRI scans, surgery took place, after which histopathology was performed on the resected tissues. The breast tumors were classified by immunohistochemical data into four classes: luminal A (positive estrogen/progesterone receptors and low proliferation index); luminal B (positive results for estrogen and/or progesterone receptors with high proliferation index, or HER-2 overexpression); HER-2 (overexpression of HER-2 combined with negative hormone receptor expression); and triple negative (negative for HER-2 and for both hormone receptors). As for the statistical analysis, the authors used the Chi-square test for the comparison of categorical variables, the Student's t test for the continuous variables that were normally distributed, and the Mann-Whitney test for the continuous variables that were not governed by the Gaussian distribution. Taking the age of 30 years as a threshold, the patients were split into two groups: below 30 and above 30 years old. Following the TNM classification guidelines, the tumor size was classified into three classes: T1 (tumor size less than 2 cm), T2 (tumor size between 2 and 5 cm), and T3 (tumor size above 5 cm). For the survival analysis, the authors used Kaplan-Meier curves to analyze the overall survival, the logrank test for the comparison between groups, and Cox regression to estimate the hazard functions. The reported statistical findings were: the mean age was 34 years, with 20 patients younger than 30. The tumor size was evaluated only in the patients who did not receive neoadjuvant chemotherapy, thus in only 70 patients. The mean size was 4.8 cm, ranging from 0.2 to 12.0 cm; 28.3% of the tumors were classified as type T1, 35.5% as type T2, and the rest (12%) as type T3. Multifocality was observed in 13% of patients, and multicentricity in 12% of patients. The mean follow-up was 5.4 years. Unfortunately, 15 patients (16.3%) did not make it; in all cases the cause of death was cancer related. Recurrence appeared in 21 patients, of whom 20 presented distant metastases in bones (15), liver (11), lung (10), brain (7), lymph nodes (5), kidney (1), and ovary (1). Using the Kaplan-Meier survival curve and the logrank test on the MRI findings, the authors reported that worse overall survival depended on: the size of the tumor, greater than 5 cm (logrank test p-level = 0.0001); the presence of non-mass enhancement (logrank test p-level = 0.010); and multifocality (logrank test p-level = 0.019). In terms of the same survival analysis methods applied to the pathological findings associated with worse overall survival, the reported results were: tumor size greater than 5 cm (logrank test p-level = 0.003), and metastases found in the axillary lymph nodes (logrank test p-level = 0.005).

The Cox regression analysis pointed out that there were no significant correlations between survival and age (p-level = 0.784), survival and multicentricity (p-level = 0.472), survival and multifocality (p-level = 0.906), survival and molecular subtypes (p-level = 0.779), or survival and associated intraductal component (p-level = 0.833).


Nelson Aalen Filter

The Nelson Aalen Filter resembles the Kaplan-Meier estimator, in the sense that it also provides an average view of the studied population. Technically, at each event time we must compute the ratio between the number of deaths observed at time t and the number of subjects at risk. The Nelson Aalen Filter is also a non-parametric method. Denoting by $n_i$ the number of subjects considered to be at risk just before the moment $t_i$, and by $d_i$ the number of deaths observed at time $t_i$, we can compute the cumulative hazard:

$$H(t) = \sum_{t_i \le t} \frac{d_i}{n_i}.$$
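A minimal sketch of this estimator in plain Python (function and variable names are ours); libraries such as lifelines expose the same estimator as NelsonAalenFitter:

```python
def nelson_aalen(event_times, deaths, at_risk):
    """Cumulative hazard H(t_i) = sum over t_j <= t_i of d_j / n_j.

    event_times -- sorted distinct event times t_i
    deaths      -- d_i, deaths observed at each t_i
    at_risk     -- n_i, subjects at risk just before t_i
    """
    H, curve = 0.0, []
    for t, d, n in zip(event_times, deaths, at_risk):
        H += d / n
        curve.append((t, H))
    return curve

# First three event times of the lab-rat data (Tables 7.9 and 7.10):
print(nelson_aalen([18, 27, 36], [2, 1, 1], [18, 15, 13]))
# [(18, 0.111...), (27, 0.177...), (36, 0.254...)]
```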

7.3 Survival regression

Survival regression is a method that, besides the duration and the censoring information, takes additional data into consideration (e.g., gender, salary, age). These are the covariates. One may ask: why use survival regression instead of old fashioned multiple linear regression? Because linear regression ignores the censoring. What if we are performing a survival analysis on a group of patients in a clinical drug trial, and one patient dies in an accident? Are we going to record her/his death as being related to the new drug? OK, then one may continue to ask: why shouldn't we just compare the proportions of events in our groups using logistic regression? Because logistic regression ignores time. Up until now we have discussed non-parametric models. As we saw before, non-parametric models make no assumptions regarding the data distribution. Parametric models, on the other hand, are "school like": we always have an equation that defines how long a cancer patient will survive. In survival analysis, the most commonly used distributions are: the exponential distribution, the log normal distribution, the Weibull distribution, the log logistic distribution, the Gamma distribution, and the Gompertz distribution. Let us review them one by one.

7.4 The exponential distribution

The exponential distribution is also known as the negative exponential distribution. It is a probability distribution that describes the time between events happening in a Poisson process. Let us presume that a Poisson distribution describes the number of cancer deaths in a certain period of time. The time that passes between each death event can then be modeled using the exponential distribution. Everything would be clear right now if we knew what a Poisson process is. Let us try a simple explanation of this new concept.

7.5 Poisson process

A Poisson process assigns probabilities to random events happening in time, in our case the cancer death events that occur in a given time frame. Using the Poisson process, we are able to forecast when one of these death events might happen. This computation is not


rocket science; it is just a counting process: it counts the number of times a death event has occurred since a given moment in time, e.g., 3 deaths since month 6 of the clinical trial. In order to use this kind of process, we must assume that the events are independent. Returning to our exponential distribution, it can help us answer questions like: how long until a new cancer death might appear in this new drug clinical trial? If we assume that the answer is unknown, then we can think of the elapsed time as a random variable, which has an exponential distribution. Obviously, we must assume that the events occur continuously and independently, and also at a constant rate. The exponential distribution has a memoryless property, practically forgetting what happened in the past: the length of time that has already passed neither decreases nor increases the probability of an event happening. Say we registered a cancer death this week; the time we have already waited for the next death does not change the distribution of the remaining waiting time. The probability density function of the exponential distribution has the following formula:

$$f(x, \lambda) = \lambda\, e^{-\lambda x}, \quad x > 0,$$

where λ is the rate parameter (the mean time between events is 1/λ), and x is a random variable (see Fig. 7.11).

FIG. 7.11  Probability density function for the exponential distribution.
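The link between the two notions is easy to check numerically. A small sketch, assuming NumPy is installed (the rate and time values are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5                                            # event rate per unit time
gaps = rng.exponential(scale=1 / lam, size=200_000)  # times between events
arrivals = np.cumsum(gaps)          # event times of a Poisson process, rate lam

# Memoryless property: P(T > s + t | T > s) equals P(T > t).
s, t = 2.0, 3.0
lhs = (gaps > s + t).mean() / (gaps > s).mean()
rhs = (gaps > t).mean()
print(round(lhs, 3), round(rhs, 3))  # both ~ exp(-lam * t) = 0.223
```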

7.6 The log normal distribution

The log normal distribution, or the Galton distribution, is a probability distribution with a normally distributed logarithm; that is, if the logarithm of a random variable is normally distributed, then that random variable is log normally distributed. Data with low mean values, large variances, and strictly positive values fit the log normal distribution well (see Fig. 7.12). Thus, the probability density function for the log normal distribution is:


"

N ðlnx; μ, σ Þ ¼

1 pffiffiffiffiffiffiffiffi  e σ 2π

ðlnx  μÞ2  2  σ2

# , x > 0,

where:
• σ is the shape parameter; as you can see from the log normal formula, it plays the role of the standard deviation of ln x. The shape parameter is computed from known historical data, or it can be estimated from the current data. Changing σ affects only the overall shape of the curve, leaving its location and height unchanged;
• μ is the location parameter, the one that points out where the graph is located on the x-axis.

FIG. 7.12  Probability density function for the log normal distribution (μ = 0; σ = 0.5, 1, 2).

7.7 The Weibull distribution

The Swedish mathematician Waloddi Weibull developed the Weibull distribution, which is a continuous probability distribution (Weibull, 1951). It is used to analyze life data, model failure times, and assess the reliability of a product. There are two versions of the Weibull probability density function: one that uses two parameters, and one that uses three. In our presentation, we will use the following notations:
• σ denotes the shape parameter;
• x is the variable;
• μ denotes the location parameter.

The two parameter Weibull has the following formula:

$$f(x) = \frac{\sigma}{\alpha}\left(\frac{x}{\alpha}\right)^{\sigma-1} e^{-\left(\frac{x}{\alpha}\right)^{\sigma}}, \quad x \ge 0,$$


where α is the scale parameter, or the characteristic life parameter. The two parameter Weibull is used in failure analysis, since no failure can happen before the moment 0 in time. As you can see, the location parameter μ is not included in the two parameter distribution. The three parameter Weibull has the following formula:

$$f(x) = \frac{\sigma}{\alpha}\left(\frac{x-\mu}{\alpha}\right)^{\sigma-1} e^{-\left(\frac{x-\mu}{\alpha}\right)^{\sigma}}, \quad x \ge \mu,\ \alpha > 0.$$

Taking into account the values attributed to the shape parameter, σ, we can have the following situations:
• if it is less than 1, then the death rate decreases with time; that is, we have a larger number of observed deaths at the beginning of the process, and then fewer deaths are observed as time passes;
• if it equals 1, then the death rate is constant;
• if it is greater than 1, then the death rate increases as time passes.

Depending on what parameters the user chooses, we obtain different shapes of the Weibull distribution. Fig. 7.13 shows the different shapes the probability density function can take depending on the value of σ.

FIG. 7.13  Probability density functions of the Weibull distribution (σ = 0.5, 1, 2, 5).

From Fig. 7.13 we see that the Weibull probability density function can be exponential in shape, right skewed, or nearly symmetric. If we change the scale parameter, α, the shape does not change; the curve only stretches. If we increase α, the graph stretches to the right and its height decreases, whereas if we decrease α, the graph shrinks to the left and its height increases.

7.8 The log logistic distribution or Fisk distribution

The log logistic distribution is a continuous probability distribution. It is used to model situations in which the rate of events increases in the beginning, after which it starts to decrease. Just as in the other presented distributions, we have to deal with two parameters: the scale, λ, and the shape, σ (Fig. 7.14). Both parameters take positive values (Al-Shomrani et al., 2016).


The probability density function has the following formula:

$$f(x) = \frac{\lambda\,\sigma\,(\lambda x)^{\sigma-1}}{\left(1+(\lambda x)^{\sigma}\right)^{2}}, \quad x > 0.$$

FIG. 7.14  Probability density functions of the log logistic distribution (σ = 0.5, 1, 2, 4, 8).

7.9 Gamma distribution

The Gamma distribution is a right skewed continuous probability distribution. It is used for elapsed times and for Poisson processes. Two parameters are involved in the probability density function: the shape, σ, and the scale, λ (Bowman and Shenton, 2014). The probability density function has the following formula:

$$f(x) = \begin{cases} \dfrac{1}{\lambda^{\sigma}\,\Gamma(\sigma)}\, x^{\sigma-1} e^{-x/\lambda}, & x \ge 0,\\ 0, & \text{otherwise,} \end{cases}$$

where $\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\, dt$.

FIG. 7.15  Probability density function for the gamma distribution (σ = 1, 2, 3 with λ = 1; σ = 3 with λ = 0.5).


If σ = 1, the Gamma distribution becomes the exponential distribution, whereas if λ = 1 we are dealing with the standard gamma distribution. We can think of the shape parameter as the number of events we are waiting for, and of the scale as the mean waiting time until the first event takes place. If the number of events stays the same but the mean time between events increases, then the graph shifts to the right. If the mean time stays the same but the number of events increases, then we again obtain a right shift of the graph. As the number of events continues to increase, approaching infinity, the gamma distribution starts to match the Gaussian distribution (see Fig. 7.15).

7.10 The Gompertz distribution

Benjamin Gompertz, a self educated mathematician, developed the Gompertz distribution, an exponentially increasing continuous probability distribution (Pollard and Valkovics, 1992; Willekens, 2001). In 1820 he started a study regarding the modeling of human mortality caused by events other than external causes. In his paper (Gompertz, 1825), he was interested in how one can compute the probability of a person living to a certain age, if nothing unexpected happens to her/him. By comparing the proportions of people of different ages who died across four cities in England, he interpolated the mortality curve, which showed that mortality increases exponentially as age increases. The Gompertz distribution is a truncated extreme value distribution. In this kind of distribution we have two shape parameters, σ and δ (Fig. 7.16). The probability density function is given by the following formula:

$$f(x) = \delta\,\sigma^{x}\, e^{-\frac{\delta\left(\sigma^{x}-1\right)}{\ln \sigma}}, \quad x > 0,$$

for σ > 1 and δ > 0.

FIG. 7.16  Probability density function for the Gompertz distribution (σ = 2, 3; δ = 0.1, 0.5, 1).
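All of the parametric families reviewed above are available off the shelf. A brief sketch, assuming SciPy is installed; SciPy's parameter names and parameterizations differ slightly from the formulas above, so the shape and scale values here are only illustrative:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 5, 200)

pdfs = {
    "exponential":  stats.expon.pdf(x, scale=2.0),    # scale = 1/lambda
    "log normal":   stats.lognorm.pdf(x, s=1.0),      # s plays the role of sigma
    "Weibull":      stats.weibull_min.pdf(x, c=2.0),  # c is the shape
    "log logistic": stats.fisk.pdf(x, c=2.0),         # SciPy calls it "fisk"
    "gamma":        stats.gamma.pdf(x, a=2.0),        # a is the shape
    "Gompertz":     stats.gompertz.pdf(x, c=1.0),     # c is the shape
}
for name, y in pdfs.items():
    print(name, y.max().round(3))  # peak height of each density on (0, 5]
```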

Let us now see how these parametric survival models can be used in real life applications. Alfonso and de Oca presented in their paper an application of hazard models in a study of breast cancer patients from Cuba (Alfonso and de Oca, 2011). The data used in this study covered 6381 breast cancer patients, diagnosed from


January 2000 to December 2002. The data were obtained from the National Cancer Register of Cuba. The follow-up was performed until December 2007, by which time 2167 patients had already passed away. The authors used the following parametric models for the survival analysis: Exponential, Weibull, Log-logistic, Lognormal, Generalized Gamma, and Gompertz. The parameters for the first five models were set using the accelerated failure-time metric, while for the sixth the proportional hazard metric was used. Alfonso and de Oca analyzed the survival curves for different lots of patients, and also investigated the differences between them. They were interested in whether, when plotting the survival curves for different groups, the proportionality assumption still holds, implying that the survival probability of an individual patient relative to another patient's survival probability does not change as time passes. The proportionality assumption was verified using two variables: the age and the clinical stage of the patient, classified according to the TNM system. The six parametric models were used to determine the possible factors that may significantly influence the survival time. For each of the six models, three covariates were included: the province where the patient lived, the patient's age, and the clinical stage. The reported results were: the overall median survival time was 5 years and 8 months. The chances of a patient surviving the first year are 85%, the probability decreasing by up to 10% for 5-year survival. Of the 2167 women who died before the follow-up ended, 1101 died within the first 2 years, and only 3% in the last 2 years. The more serious the clinical stage, the more likely the patient is to die: for patients diagnosed with clinical stage I and II, the chances of dying were 4.8% and 8.7%, whereas for stage IV the chances went up to 43.6%. Regarding the fitting performances of the six parametric survival models, the authors reported that the Generalized Gamma and Gompertz models fitted the data best, while the exponential, log logistic, log normal, and Weibull models fitted the data poorly. Further on, due to its performance, the Generalized Gamma was used for the assessment of the multivariate effect sizes. It appears that there is no significant difference between the survival times across different provinces in Cuba, implying that access to the health care system is similar throughout the country. The differences appear when the clinical stage is taken into account: the survival times for patients diagnosed with stage I breast cancer are 40% longer than for those diagnosed with stage II, and the difference continues to increase, up to 50% and even 60%, for patients diagnosed with stage III and IV. After one and a half years since the diagnosis, patients diagnosed with stage IV have a predicted median survival time of 6 months. Now that we are done covering the basic concepts and applications of survival analysis, we believe it is time to move on to the next chapter, where we will be discussing remission and recurrence.

References

Alfonso, A.G., de Oca, N.A.M., 2011. Application of hazard models for patients with breast cancer in Cuba. Int. J. Clin. Exp. Med. 4 (2), 148–156.
Al-Shomrani, A.A., Shawky, A.I., Arif, O.H., Aslam, M., 2016. Log-logistic distribution for survival data analysis using MCMC. SpringerPlus 5, 1774.
Altman, D.G., 1991. Practical Statistics for Medical Research. Chapman and Hall.
An, Y.Y., Kim, S.H., Kang, B.J., 2012. Characteristic features and usefulness of MRI in breast cancer in patients under 40 years old: correlations with conventional imaging and prognostic factors. Breast Cancer 21, 302–315.


Bacaer, N., 2011. Halley's life table (1693). In: A Short History of Mathematical Population Dynamics, pp. 5–10.
Balakrishnan, N., Rao, C.R., 2004. Advances in Survival Analysis. Elsevier.
Barraclough, H., Simms, L., Govindan, R., 2011. Biostatistics primer: what a clinician ought to know: hazard ratios. J. Thorac. Oncol. 6 (6), 978–982.
Bitencourt, A.G.V., Eugenio, D.S.G., Souza, J.A., Souza, J.O., Makdissi, F.B.A., Marques, E.F., Chojniak, R., 2019. Prognostic significance of preoperative MRI findings in young patients with breast cancer. Sci. Rep. 9, 3106.
Bland, J.M., Altman, D.G., 2004. The logrank test. BMJ 328 (7447), 1073.
Bowman, K.O., Shenton, L.R., 2014. Gamma distribution. In: Lovric, M. (Ed.), International Encyclopedia of Statistical Science. Springer, Berlin, Heidelberg.
Bruce, N.G., Pope, D., Stanistreet, D., 2017. Life tables, survival analysis, and Cox regression. In: Quantitative Methods for Health Research: A Practical Interactive Guide to Epidemiology and Statistics, second ed. Wiley. https://doi.org/10.1002/9781118665374.ch8.
Chiang, C.L., 1968. Introduction to Stochastic Processes in Biostatistics. John Wiley & Sons.
Copson, E., Eccles, B., Maishman, T., Gerty, S., Staton, L., Cutress, R.I., Altman, D.G., Durcan, L., Simmonds, P., Lawrence, G., Jones, L., Bliss, J., Eccles, D., POSH Study Steering Group, 2013. Prospective observational study of breast cancer treatment outcomes for UK women aged 18-40 years at diagnosis: the POSH study. J. Natl. Cancer Inst. 105 (13), 978–988.
Duncan-Jones, R., 2009. Structure and Scale in the Roman Economy. Cambridge University Press.
Eugenio, D.S., Souza, J.A., Chojniak, R., Bitencourt, A.G., Graziano, L., Souza, E.F., 2016. Breast cancer features in women under the age of 40 years. Rev. Assoc. Med. Bras. 62 (8), 755–761.
Fink, S.A., Brown Jr., R.S., 2006. Survival analysis. Gastroenterol. Hepatol. (NY) 2 (5), 380–383.
Gompertz, B., 1825. On the nature of the function expressive of the law of human mortality, and on the mode of determining the value of life contingencies. Philos. Trans. R. Soc. 115, 513–585.
Gorunescu, F., Belciug, S., 2014. Incursiune in Biostatistica. Editura Albastra—Grupul Microinformatica.
Henschke, C.I., Yankelevitz, D.F., Libby, D.M., Pasmantier, M.W., Smith, J.P., Miettinen, O.S., 2006. Survival of patients with stage I lung cancer detected on CT screening. N. Engl. J. Med. 355 (17), 1763–1771.
Kaplan, E.L., Meier, P., 1958. Nonparametric estimations from incomplete observations. J. Am. Stat. Assoc. 53 (282), 457–481.
Kim, E.K., Noh, W.C., Han, W., Noh, D.Y., 2011. Prognostic significance of young age (<35 years) in breast cancer. World J. Surg. 35 (6), 1244–1253.

In the study of Hao et al. (2015), elevated XPO6 expression was found in patients with high prostate-specific antigen levels (>20 ng/mL) prior to the radical prostatectomy, as well as in patients with a combined Gleason score over 8. A correlation was also found between XPO6 and lymph node metastasis occurrence, biochemical recurrence, and distant metastasis occurrence. Overall, the reported results show that elevated expression of XPO6 is highly and significantly correlated with aggressive prostate cancers and poor clinical outcomes. The authors were also interested in computing the correlations between XPO3 and XPO6 expression and biochemical recurrence. The analysis was performed using the Kaplan-Meier survival curves and the logrank test. The Kaplan-Meier results showed that XPO3 is not significantly correlated with biochemical recurrence, whereas the patients who had elevated expression of XPO6 had a significantly shorter time period before the cancer relapsed. The average recurrence-free survival time was 6.5 months shorter compared to the rest of the patients. Besides the Kaplan-Meier analysis, Cox proportional-hazards regression was used to see whether there was a correlation between XPO3 and XPO6 and recurrence. The likelihood ratio test confirmed that elevated expression of the XPO3 gene is not correlated with the recurrence free survival time, and that elevated expression of XPO6 is indeed significantly correlated with it. Ultimately, the authors found that XPO6 gene expression is a novel biomarker for identifying potential high-risk patients who had been labeled as low risk. Saso et al. developed a new prognostic model that can predict recurrence in patients with stage II colon cancer after curative resection (Saso et al., 2018). In recent years, the incidence of colorectal cancer has started to increase (Colvin et al., 2017; NCCN, 2017). Just as in other cancer types, the risk of recurrence varies among patients. It has been reported that chemotherapy might bring significant benefits to patients with stage II colon cancer (Labianca et al., 2013). Still, giving patients adjuvant chemotherapy remains a debated subject: the European Society for Medical Oncology guidelines recommend it for patients diagnosed with stage II colon cancer, whereas the American Society of Clinical Oncology guidelines do not (Benson et al., 2004). Some reports provided by the Surveillance, Epidemiology, and End Results (SEER) program in the United States show that the prognosis of stage IIIA (T1-2, N1) disease is better than the prognosis of stage IIB (T4, N0) disease (O'Connell et al., 2004). Thus, the aim of Saso's study was to predict the recurrence of stage II colon cancer after the patients had undergone curative resection. Besides this, they tried to predict whether adjuvant chemotherapy would be beneficial for individual patients. The study used 436 patients from the Osaka International Cancer Institute and Yao Municipal Hospital, all of whom underwent surgical resection. Of the 436 patients, 84 received adjuvant or neoadjuvant chemotherapy and were thus excluded from the study. A Cox proportional hazards regression model was used to predict the 5-year recurrence free survival.


The validation was done on an independent group of 213 patients from the Osaka University Hospital. The following features were extracted from the medical records: age, sex, body mass index, pre-perforation, pre-obstruction, post-anastomotic leakage, preoperative serum level of the tumor marker carcinoembryonic antigen (CEA), surgery, dissection for lymph node, number of lymph nodes sampled, tumor location, histology, histological grade, lymph node sampling, tumor invasion, lymphatic invasion, and venous invasion. During the follow-up period, all the patients underwent blood examinations to evaluate the serum levels of the tumor markers, abdominal CT scans, chest X-rays, and/or PET scans every 3–6 months. Besides this, all the patients underwent colonoscopy every year (Watanabe et al., 2018). The authors used a classification and regression tree (CART) to predict the recurrence free survival, after which they applied the Kaplan-Meier survival curves and the logrank test. The median follow-up time was 4.67 years, and the 5-year recurrence free survival rate was 89.2%. After the surgical resection, 314 patients had no recurrent events. The factors that were highly and significantly correlated with recurrent events were: high preoperative serum CEA level, pre-obstruction, tumor invasion, lymphatic invasion, and venous invasion. The patients were divided into six subgroups taking into account the CEA level, tumor invasion, lymphatic invasion, and venous invasion. The recurrence free survival was computed for each subgroup, and the reported results were: 96.6% for the first group, 85.1% for the second, 87.8% for the third, 88.7% for the fourth, 81.3% for the fifth, and 68.6% for the sixth. Patients belonging to group 6 would benefit from adjuvant chemotherapy. Mao et al. studied the prediction of cervical cancer recurrence with the use of a nine-lncRNA signature (Mao et al., 2019). Cervical cancer is ranked as the fourth most diagnosed cancer, and it has been correlated with human papillomavirus infection (Castellsague et al., 2006; Torre et al., 2015). Even though medicine has advanced, the prognosis of cervical cancer patients remains poor (Fuller et al., 2007). Previous studies published by Mao et al. demonstrate that genomic factors can predict cervical cancer prognosis (Mao et al., 2018). The authors used the GSE44001 data set from the Gene Expression Omnibus (Kodahl et al., 2014; Clough and Barrett, 2016), which contains microarray data for 300 cervical cancer patients. The data set was split into two equal subsets, one of them becoming the training set and the other the internal validation set. The RNA sequencing data, recurrence status, and recurrence free survival times were downloaded from the TCGA database; 49 of these patients were used for the external testing set. The data had been preprocessed using quantile normalization and log2-scale transformation, followed by a Cox regression analysis with the Least Absolute Shrinkage and Selection Operator (LASSO). The nine-lncRNA signature was found using the expression of the lncRNAs weighted by the coefficients computed by the LASSO Cox regression. For each sample, a score was computed taking into account the expression levels of the RNAs and the LASSO coefficients. Using this score, the patients were divided into two groups: high risk and low risk. For the survival analysis, the authors used Kaplan-Meier and Chi-square analysis. Besides this, the authors computed the dynamic AUC under the time-specific ROC curves. The reported results were: high expression levels of three lncRNAs, ATXN8OS, C5orf60, and INE1, are correlated with shorter recurrence free survival, while the expression levels of DIO3OS, EMX2OS, KCNQ1DN, KCNQ1T1, LOH12CR2, and RFPL1S were negatively correlated with the risk of cervical cancer recurrence. The Chi-square analysis confirmed that the recurrence


rate is lower in the low risk group and higher in the high-risk group. Further studies need to be conducted in order to confirm the established signature. Jeong et al. developed a nomogram that predicts the 5-year recurrence free survival of gastric cancer patients using prognostic biomarker gene expression (Jeong et al., 2019). T1 gastric cancer incidence has increased over the years in South Korea, from 30.4% in 1995 and 47.4% in 2004 to 61% in 2014 (Information Committee of Korean Gastric Cancer, 2016). Five gastric cancer candidate molecules, PPASE, CAPZA, gamma-enolase, SHH, and OCT-1, had previously been established by the research team (Jeong et al., 2012, 2014; Kim et al., 2012a,b; Lee et al., 2013; Park et al., 2017). In this research, using the prognostic biomarker gene expression, they wanted to develop a nomogram for the 5-year recurrence free survival of gastric cancer patients. The training data set contained gastric cancer patients diagnosed with stage I (206 patients), stage II (62 patients), and stage III (92 patients), while the validation data set contained 157 patients for stage I, 102 for stage II, and 136 for stage III. The overall reported recurrence rate was 46.6% for the training set and 52.8% for the testing set, over a median 60-month follow-up period. After analyzing the differences between recurrent and non-recurrent events using Fisher's test, the authors found that 6 out of 10 genes were significantly different (CAPZA, gamma-enolase, PRDX4, OCT-1, c-Myc, and c-Met). Besides this, Cox proportional hazards models were also employed, and the results obtained were that PPASE, CAPZA, gamma-enolase, PRDX4, OCT-1, and c-Myc, together with age and sex, were significantly different between the two groups of patients. Using the information obtained from the statistical analysis, the researchers developed a nomogram that computes a patient's risk of recurrence taking into account the above genes, plus age and sex. A high score means that the patient has a high risk of disease relapse. Afterwards, all the patients were divided into four groups according to their risk of recurrence: low, intermediate, high, and very high. Thus, in the training set, the probability of no disease relapse was 89% in the low-risk group, 75% in the intermediate risk group, 54% in the high-risk group, and 32% in the very high-risk group. As for the validation set, the recurrence free probabilities were 89% for the low risk group, 75% for the intermediate risk group, 63% for the high-risk group, and 60% for the very high-risk group. The reported AUCs were 0.718 for the training set and 0.640 for the validation set. We need to point out that it would have been interesting to see the precision recall score, not the ROC score, since both data sets, training and validation, are unbalanced. Using the Kaplan-Meier survival curves, the recurrence free probability was computed for the training set. The reported results for the very high-risk group were: up to a year, the probability was 0.92, and 2 patients out of 25 (2 were censored) had a recurrent event; between 1 and 3 years, the probability was 0.44, and 11 out of 21 patients had a relapse; between 3 and 5 years, the probability was 0.26, and 4 out of 10 patients had a recurrent event. For the high-risk group, the reported probabilities were: up to 1 year, 0.96, with 2 recurrent events in 50 patients (1 was censored); between 1 and 3 years, 0.69, with 13 recurrent events in 47 patients; between 3 and 5 years, 0.59, with 5 recurrent events in 34 patients. For the intermediate risk group, the results were: up to 1 year, 0.95, with 4 recurrent events in 89 patients; between 1 and 3 years, 0.83, with 10 recurrences in 88 patients; between 3 and 5 years, 0.77, with 5 recurrences in 67 patients. In the low risk group, the probabilities were: up to 1 year, 0.99, with only one recurrent event in 151 patients; between 1 and 3 years, 0.94, with 3 recurrent events in 145 patients; between 3


and 5 years, 0.91, with 4 recurrent events in 134 patients. As for the recurrence free probabilities on the validation data set, the results were: for the very high-risk group, 0.88 (up to 1 year), 0.63 (1–3 years), and 0.53 (3–5 years); for the high-risk group, 0.94 (up to 1 year), 0.74 (1–3 years), and 0.47 (3–5 years); for the intermediate risk group, 0.96 (up to 1 year), 0.79 (1–3 years), and 0.74 (3–5 years); for the low risk group, 0.96 (up to 1 year), 0.94 (1–3 years), and 0.483 (3–5 years). Slowly but steadily, we have reached the end of this chapter. We have seen different AI techniques and models that let data scientists predict, with a certain accuracy, the odds of a patient having a recurrent event in the 5-year follow-up after the first diagnosis. We have learned that recurrent events are fewer when the cancer is caught early. We have discussed remission and spontaneous remission, and the fact that there is still no scientific explanation for them. Using AI, doctors can communicate better with their patients and tell them what the chances are of the cancer coming back, so they can hope for the best but prepare for the worst. Even if the recurrence rates are high and the patient does not believe it will happen to her/him, a part of her/his brain will always know that the possibility definitely exists, so she/he will be more psychologically prepared if and when that event does happen. It is time to enter the last chapter of the book, Chapter 9, and get a glimpse of what the future may lay before us.

References

Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., Rudnev, D., Evangelista, C., Kim, I.F., Soboleva, A., Tomaschevsky, M., Edgar, R., 2007. NCBI GEO: mining tens of millions of expression profiles—database and tools update. Nucleic Acids Res. 35, D760–D765.
Belciug, S., Gorunescu, F., Gorunescu, M., Salem, A.B., 2010. Clustering based approach for detecting breast cancer recurrence. In: Proceedings of the 10th IEEE International Conference on Intelligent Systems Design and Applications, Cairo, pp. 533–538.
Benson III, A.B., Schrag, D., Somerfield, M.R., Cohen, A.M., Figueredo, A.T., Flynn, P.J., Krzyzanowska, M.K., Maroun, J., McAllister, P., Van Cutsem, E., Brouwers, M., Charette, M., Haller, D.G., 2004. American Society of Clinical Oncology recommendations on adjuvant chemotherapy for stage II colon cancer. J. Clin. Oncol. 22 (16), 3408–3419.
Bloom, H.J., Richardson, W.W., 1957. Histological grading and prognosis in breast cancer: a study of 1409 cases of which 359 have been followed for 15 years. Br. J. Cancer 11 (3), 359–377.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth and Brooks/Cole Advanced Books and Software, Monterey.
Buchholz, T.A., Hunt, K.K., Whitman, G.J., Sahin, A.A., Hortobagyi, G.N., 2003. Neoadjuvant chemotherapy for breast carcinoma: multidisciplinary considerations of benefits and risks. Cancer 98 (6), 1150–1160.
Castellsague, X., Diaz, M., Sanjose, S.D., Munoz, N., Herrero, R., Franceschi, S., Peeling, R.W., Ashley, R., Smith, J.S., Snijders, P.J., Meijer, C.J., Bosch, F.X., International Agency for Research on Cancer Multicenter Cervical Cancer Study Group, 2006. Worldwide human papillomavirus etiology of cervical adenocarcinoma and its cofactors: implications for screening and prevention. J. Natl. Cancer Inst. 98 (5), 303–315.
Clough, E., Barrett, T., 2016. The gene expression omnibus database. Methods Mol. Biol. 1418, 93–110.
Coley, W.B., 1891. Contribution to the knowledge of sarcoma. Ann. Surg. 14, 199–220.
Coley, W.B., 1893. Fluctuations in the growth energy of tumors in man, with especial reference to spontaneous recession. J. Cancer Res. 3, 193–225.
Coley, W.B., 1894. Treatment of inoperable malignant tumors with toxins of erysipelas and Bacillus prodigiosus. Trans. Am. Surg. Assn. 12, 183–212.
Colvin, H., Mizushima, T., Eguchi, H., Takiguchi, S., Doki, Y., Mori, M., 2017. Gastroenterological surgery in Japan: the past, the present, and the future. Ann. Gastroenterol. Surg. 1, 5–10.
de Vicente, J.C., Esteban, I., Germana, P., Germana, A., Vega, J.A., 2003. Expression of ErbB-3 and ErbB-4 protooncogene proteins in oral squamous cell carcinoma: a pilot study. Med. Oral 8 (5), 374–381.


de Vicente, J.C., Fresno, M.F., Villalain, L., Vega, J.A., Hernandez Vallejo, G., 2005a. Expression and clinical significance of matrix metalloproteinase-2 and matrix metalloproteinase-9 in oral squamous cell carcinoma. Oral Oncol. 41 (3), 283–293.
de Vicente, J.C., Fresno, M.F., Villalain, L., Vega, J.A., Arranz, L., 2005b. Immunoexpression and prognostic significance of TIMP-1 and -2 in oral squamous cell carcinoma. Oral Oncol. 41 (6), 568–579.
de Vicente, J.C., Olay, S., Lequerica Fernandez, P., Sanchez Mayoral, J., Junquera, L.M., Fresno, M., 2006. Expression of Bcl-2 but not Bax has a prognostic significance in tongue carcinoma. J. Oral Pathol. Med. 35 (3), 140–145.
de Vicente, J.C., Lequerica Fernandez, P., Santamaria, J., Fresno, M.F., 2007. Expression of MMP-7 and MT1-MMP in oral squamous cell carcinoma as predictive indicator for tumor invasion and prognosis. J. Oral Pathol. Med. 36 (7), 415–424.
Exarchos, K.P., Goletsis, Y., Fotiadis, D.I., 2012. Multiparametric decision support system for the prediction of oral cancer reoccurrence. IEEE Trans. Inf. Technol. Biomed. 16, 1127–1134.
Fuller, C.D., Wang, S.J., Thomas, C.R., Hoffman, H.T., Weber, R.S., Rosenthal, D.I., 2007. Conditional survival in head and neck squamous cell carcinoma. Cancer 109, 1331–1343.
Gorunescu, F., Gorunescu, M., El-Darzi, E., Gorunescu, S., 2010. A statistical framework for evaluating neural networks to predict recurrent events in breast cancer. Int. J. Gen. Syst. 39, 471–488.
Grange, J.M., Stanford, J.L., Stanford, C.A., 2002. Campbell de Morgan's 'Observations on cancer', and their relevance today. J. R. Soc. Med. 95, 296–299.
Hall, M.A., 1999. Feature selection for discrete and numeric class machine learning. In: ICML: Proc. 17th Int. Conf. on Machine Learning, pp. 359–366.
Hao, J., Chiang, Y.T., Gout, P.W., Wang, Y., 2015. Elevated XPO6 expression as a potential prognostic biomarker for prostate cancer recurrence. Front. Biosci. 8, 44–55.
Haykin, S.H., 1999. Neural Networks: A Comprehensive Foundation. Prentice Hall.
Hoption Cann, S.A., van Netten, J.P., van Netten, C., 2003. Dr William Coley and tumor regression: a place in history or in the future? Postgrad. Med. J. 79, 672–680.
Hunt, E.B., Marin, J., Stone, P.J., 1966. Experiments in Induction. Academic Press, New York.
Information Committee of Korean Gastric Cancer, 2016. Korean gastric cancer association nationwide survey on gastric cancer in 2014. J. Gastric Cancer 16, 131–140.
Isenberg, J., Stoffel, B., Wolters, U., Beuth, J., Stutzer, H., Ko, H.L., Pichlmaier, H., 1995. Immunostimulation by propionibacteria—effects on immune status and antineoplastic treatment. Anticancer Res. 15, 2363–2368.
Jackson, R., 1974. Saint Peregrine, O.S.M.—the patron saint of cancer patients. Can. Med. Assoc. J. 111, 824–827.
Jeong, S.H., Ko, G.H., Cho, Y.H., Lee, Y.J., Cho, B.I., Ha, W.S., Choi, S.K., Kim, J.W., Lee, C.W., Heo, Y.S., Shin, S.H., Yoo, J., Hong, S.C., 2012. Pyrophosphatase overexpression is associated with cell migration, invasion, and poor prognosis in gastric cancer. Tumor Biol. 33, 1889–1898.
Jeong, S.H., Lee, Y.J., Cho, B.I., Ha, W.S., Choi, S.K., Jung, E.J., Ju, Y.T., Jeong, C.Y., Yoo, J., Hong, S.C., 2014. OCT-1 overexpression is associated with poor prognosis in patients with well-differentiated gastric cancer. Tumor Biol. 35 (6), 5501–5509.
Jeong, S.H., Kim, R.B., Park, S.Y., Park, J., Jung, E.J., Ju, Y.T., Jeong, C.Y., Park, M., Ko, G.H., Song, D.H., Koh, H.M., Kim, W.H., Yang, H.K., Lee, Y.L., Hong, S.C., 2019. Nomogram for predicting gastric cancer recurrence using biomarker gene expression. Eur. J. Surg. Oncol. https://doi.org/10.1016/j.ejso.2019.09.143.
Johnston, B.J., 1962. Clinical effects of Coley's toxin. I. A controlled study. Cancer Chemother. Rep. 21, 19–41.
Johnston, B.J., Novales, E.T., 1962. Clinical effects of Coley's toxin. II. A seven-year study. Cancer Chemother. Rep. 21, 43–68.
Joyce, J.M., 2014. Kullback-Leibler divergence. In: Lovric, M. (Ed.), International Encyclopedia of Statistical Science. Springer, Berlin, Heidelberg.
Kim, W., Kim, K.S., Lee, J.E., Noh, D.Y., Kim, S.W., Jung, Y.S., Park, M.Y., Park, R.W., 2012a. Development of novel breast cancer recurrence prediction model using support vector machine. J. Breast Cancer 15 (2), 230–238.
Kim, J.Y., Ko, G.H., Lee, Y.J., Ha, W.S., Choi, S.K., Jung, E.J., Jeong, C.Y., Ju, Y.T., Jeong, S.H., Hong, S.C., 2012b. Prognostic values of sonic hedgehog protein expression in gastric cancer. Jpn. J. Clin. Oncol. 42 (11), 1054–1059.
Kodahl, A.R., Lyng, M.B., Binder, H., Cold, S., Gravgaard, K., Knoop, A.S., Ditzel, H.J., 2014. Novel circulating microRNA signature as a potential non-invasive multi-marker test in ER-positive early stage breast cancer: a case control study. Mol. Oncol. 8 (5), 874–883.
Kohavi, R., John, G.H., 1997. Wrappers for feature subset selection. Artif. Intell. 97, 273–324.
Krone, B., Kolmel, K., Grange, J.M., 2014. The biography of the immune system and the control of cancer: from St Peregrine to contemporary vaccination strategies. BMC Cancer 14, 595.
Kullback, S., Leibler, R.A., 1951. On information and sufficiency. Ann. Math. Stat. 22, 79–86.


Labianca, R., Nordlinger, B., Beretta, G.D., Mosconi, S., Mandala, M., Cervantes, A., Arnold, D., ESMO Guidelines Working Group, 2013. Early colon cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann. Oncol. 24 (6), 64–72.
Lee, Y.J., Jeong, S.H., Hong, S.C., Cho, B.I., Ha, W.S., Park, S.T., Choi, S.K., Jung, E.J., Ju, Y.T., Jeong, C.Y., Kim, J.W., Lee, C.W., Yoo, J., Ko, G.H., 2013. Prognostic value of CAPZA1 overexpression in gastric cancer. Int. J. Oncol. 42 (5), 1569–1577.
Mao, Y., Fu, Z., Zhang, Y., Dong, L., Zhang, Y., Zhang, Q., Li, X., Liu, J., 2018. A seven-lncRNA signature predicts overall survival in esophageal squamous cell carcinoma. Sci. Rep. 8, 8823.
Mao, Y., Dong, L., Zheng, Y., Dong, J., Li, X., 2019. Prediction of recurrence in cervical cancer using a nine-lncRNA signature. Front. Genet. https://doi.org/10.3389/fgene.2019.00284.
Nauts, H.C., 1980. The Beneficial Effects of Bacterial Infections on Host Resistance to Cancer. End Results in 449 Cases (Monograph No. 8), second ed. Cancer Research Institute, New York.
NCCN, 2017. National Comprehensive Cancer Network Clinical Practice Guidelines in Oncology, Colon Cancer. Version 2.
Niu, Y., Otasek, D., Jurisica, I., 2010. Evaluation of linguistic features useful in extraction of interactions from PubMed; application to annotating known, high-throughput and predicted interactions in I2D. Bioinformatics 26, 111–119.
O'Connell, J.B., Maggard, M.A., Ko, C.Y., 2004. Colon cancer survival rates with the new American Joint Committee on Cancer sixth edition staging. J. Natl. Cancer Inst. 96 (19), 1420–1425.
Park, C., Ahn, J., Kim, H., Park, S., 2014. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning. PLoS One 9, e86309.
Park, T., Lee, Y.J., Jeong, S.H., Choi, S.K., Jung, E.J., Ju, Y.T., Jeong, C.Y., Park, M., Hah, Y.S., Yoo, J., Ha, W.S., Hong, S.C., Ko, G.H., 2017. Overexpression of neuron specific enolase as a prognostic factor in patients with gastric cancer. J. Gastric Cancer 17 (3), 228–236.
Parkin, D.M., Bray, F., Ferlay, J., Pisani, P., 2005. Global cancer statistics. CA Cancer J. Clin. 55, 74–108.
Rhodes, D.R., Yu, J., Shanker, K., Deshpande, N., Varambally, R., Ghosh, D., Barrette, T., Pandey, A., Chinnaiyan, A.M., 2004. ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia 6 (1), 1–6.
Rohdenburg, G.L., 1918. Fluctuations in the growth energy of tumors in man, with especial reference to spontaneous recession. J. Cancer Res. 3, 193–225.
Rosado, P., Lequerica, F.P., Villalain, L., Pena, I., Sanchez Lasheras, F., de Vicente, J.C., 2013. Survival model in oral squamous cell carcinoma based on clinicopathological parameters, molecular markers and support vector machines. Expert Syst. Appl. 40 (12), 4770–4776.
Saso, K., Miyoshi, N., Fujino, S., Takenaka, Y., Takahashi, Y., Nishimura, J., Yasui, M., Ohue, M., Tokuoka, M., Ide, Y., Takahashi, H., Haraguchi, N., Hata, T., Matsuda, C., Mizushima, T., Doki, Y., Mori, M., 2018. A novel prognostic prediction model for recurrence in patients with stage II colon cancer after curative resection. Mol. Clin. Oncol. 9, 697–701.
Starnes, C.O., 1992. Coley's toxins in perspective. Nature 357, 11–12.
Tang, Z.Y., Zhou, H.Y., Zhao, G., Chai, L.M., Lu, J.Z., Liu, K.D., Havas, H.F., Nauts, H.C., 1991. Preliminary results of mixed bacterial vaccine as adjuvant treatment of hepatocellular carcinoma. Med. Oncol. Tumor Pharmacother. 8, 23–28.
Torre, L.A., Bray, F., Siegel, R.L., Ferlay, J., Lortet-Tieulent, J., Jemal, A., 2015. Global cancer statistics, 2012. CA Cancer J. Clin. 65, 87–108.
Tsung, K., Norton, J.A., 2006. Lessons from Coley's toxin. Surg. Oncol. 15, 25–28.
Ture, M., Tokatli, F., Kurt, I., 2009. Using Kaplan-Meier analysis together with decision tree methods (CART, CHAID, QUEST, C4.5, and ID3) in determining recurrence-free survival of breast cancer patients. Expert Syst. Appl. 36 (2), 2017–2026.
Watanabe, T., Muro, K., Ajioka, Y., Hashiguchi, Y., Ito, Y., Saito, Y., Hamaguchi, T., Ishida, H., Ishiguro, M., Ishihara, S., Kanemitsu, Y., Kawano, H., Kinugasa, Y., Kokudo, N., Murofushi, K., Nakajima, T., Oka, S., Sakai, Y., Tsuji, A., Uehara, K., Ueno, H., Yamazaki, K., Yoshida, M., Yoshino, T., Boku, N., Fujimori, T., Itabashi, M., Koinuma, N., Morita, T., Nishimura, G., Sakata, Y., Shimada, Y., Takahashi, K., Tanaka, S., Tsuruta, O., Yamaguchi, T., Yamaguchi, N., Tanaka, T., Kotake, K., Sugihara, K., Japanese Society for Cancer of the Colon and Rectum, 2018. Japanese Society for Cancer of the Colon and Rectum (JSCCR) guidelines 2016 for the treatment of colorectal cancer. Int. J. Clin. Oncol. 23 (1), 1–34.
Wiemann, B., Starnes, C.O., 1994. Coley's toxins, tumor necrosis factor and cancer research: a historical perspective. Pharmacol. Ther. 64, 529–564.

CHAPTER 9

Artificial Intelligence in cancer: Dreams may come true someday

Believe it or not, we have reached the end of our book. Going through each phase of a cancer patient's life, we could not help but wonder whether, indeed, one day dreams may come true. When thinking about this, two songs come to our minds: John Lennon's Imagine and the Beatles' Let It Be. Can we imagine a world where the cure for cancer has been found? Or a world where we can stop cancer before it attacks? One may say that these are just dreams, but if a nightmare like cancer can happen, why not a dream? There are some skeptics who believe in various conspiracies, such as: there is no cancer cure, and all the patients who healed were simply misdiagnosed in the first place. Others believe that there is a cure for cancer, but we do not have access to it. Anyone can think and believe anything they want, but the truth remains that medicine has truly progressed over the years. The only problem is that cancer gets smarter too, and finds new ways to bypass the new treatments. We have seen that AI is not voodoo, magic, or anything of the sort. It is math and powerful computers. That is it. Can we use our knowledge of AI applied in cancer research to predict how the AI + cancer combo will look in the future? One thing is certain: no matter what new AI technology is developed, guidelines on how to statistically validate the results must be established. The written word has great power over the person who reads it, so invading the academic scholarly literature with snake oil is devastating. A research paper that presents extraordinary results that cannot be replicated does more harm than good. Obviously, news like "the researchers from X institute have discovered the cure for breast cancer" will go viral, even if it is fake. Doctors, and other people in general, need to be educated to understand what they read. If someone says they obtained 100% accuracy, then something is definitely wrong with that result. Maybe they did not use sufficient data, or maybe they tailored the data in such a way that their method would give those extraordinary results. If we see a research paper that reports only the obtained sensitivity, we need to understand that this is not enough. Question everything. Why did they not report the specificity? Who computes only one parameter of the pair, and why? Lack of knowledge couldn't possibly


be the answer. Maybe they just want their paper published, so, they make the results more appealing. Another case would be when we start reading about a new chemotherapy drug that is said it treats cancer better than a drug that is already on the market. We need to see if the statistical analysis that was performed between the two groups of patients was correctly applied or not. If the scientists used statistical tests that are based on the supposition that the data is normally distributed, but they did not check whether the assumption is correct or not, that implies that the obtained results are basically wrong. Another problem with most research papers is the lack of power analysis computation. Everybody uses a certain data set without finding out what is the statistical power that the sample size could provide. Many statistical and AI software are now on the market and are easy to use by everybody. In just a few clicks you can ask the platform to perform survival analysis, classification using support vector machine, or convolutional neural networks, or regression, etc. Besides using these techniques everybody can “play” with statistical analysis. How hard could it be to perform a Levene’s test? Or Mann Whitney U test? You just click, and voila! The results are there. The question that needs to be asked is: do you know how to interpret what you see on the screen? Do you understand basic concepts like p-level, etc. Everyone can plot receiver operating curves and compute the area under the curve, but not everybody knows when it is proper to use them, and when we should use precision recall curves. Our point is that in order for AI to progress in cancer research, doctors should start getting trained in AI, and especially statistics, or better yet combos of doctors + data scientists +should become a regular thing. One of issues with AI applied in cancer is that sometimes the costs are high. The healthcare system is expensive, and many medical centers around the world will not be able to afford to invest in the tools that are needed. More so, if we think things through, even if the budgets might afford the equipment, there is need for medical professionals that are trained in this new area of healthcare. The odds of wide spreading AI in healthcare systems in the near future are pretty low. However, if indeed the implementation of AI within hospitals will be done, clinicians will have extra time for patient interaction. This face time with the patients will be really necessary because some patients might become reluctant to the use of AI in their diagnosis, or treatment plan. How hard will it be to tell a patient that a computer states that her/his chances to survive the next 5 years are below 20%? Or that her/his genomic data shows that she/he will not benefit from chemotherapy? Imagine that she/he still wants to undergo chemo, but the insurance companies will not settle the expenses. Why should they? A computer said that this is an unnecessary course of treatment. No money, no chemo, no chance. Will she/he understand why an AI has taken away her/his hope of what she/he believed is her/his only chance of surviving? How will a regular person perceive the adoption of AI by their doctor? A plus? Or a disruptive element? The key answer here is communication. Most probably the oncologist should pair up with a psychologist when breaking the news to her/his patient. This is just one side of the coin. 
One of the issues with AI applied in cancer is that the costs are sometimes high. Healthcare is expensive, and many medical centers around the world will not be able to afford the necessary tools. Moreover, even where budgets can cover the equipment, there is a need for medical professionals trained in this new area of healthcare. The odds of AI spreading widely through healthcare systems in the near future are therefore pretty low. However, if AI is indeed implemented within hospitals, clinicians will gain extra time for patient interaction. This face time will be truly necessary, because some patients might become reluctant about the use of AI in their diagnosis or treatment plan. How hard will it be to tell a patient that a computer states that her/his chances of surviving the next 5 years are below 20%? Or that her/his genomic data shows that she/he will not benefit from chemotherapy? Imagine that she/he still wants to undergo chemo, but the insurance companies will not settle the expenses. Why should they? A computer said that this is an unnecessary course of treatment. No money, no chemo, no chance. Will she/he understand why an AI has taken away what she/he believed was her/his only chance of surviving? How will a regular person perceive the adoption of AI by their doctor? As a plus? Or as a disruptive element? The key here is communication. Most probably the oncologist should pair up with a psychologist when breaking the news to her/his patient. This is just one side of the coin.

Maybe some patients will be so fond of AI that they will truly believe that computers can diagnose better, or choose a better treatment for them. Will they stop going to their doctor and just ask a computer? We don't know what the future holds.


But you know as well as we do that people are already Googling their symptoms instead of going to see a doctor. What will happen when this technology is so widespread that any type of cancer can be detected using your smartphone? We already have skin cancer apps on the market. And we are not talking about just one app; we are talking about at least five. All of these apps claim that they are able to identify moles and lesions that might become skin cancer. Here we shall mention the most popular ones:

• UMSkinCheck—https://www.uofmhealth.org/patient%20and%20visitor%20guide/myskin-check-app (Accessed December 7, 2019)—an app developed by researchers at the University of Michigan that guides users through a home skin check exam. You can also save pictures of a mole in the app and compare them over time to notice whether anything changes.
• MoleMapper—www.molemapper.org (Accessed December 7, 2019)—an app developed by the Oregon Health and Science University. It lets users take photos and also measure the moles on their bodies. Just like UMSkinCheck, it lets you regularly save pictures of a mole and check whether it has changed or not.
• Miiskin—https://miiskin.com/app/ (Accessed December 7, 2019)—the free version lets users take pictures of their moles for tracking purposes. For extra money it can help you track large areas of skin; the paid version spots marks and moles that would otherwise have gone unnoticed.
• MoleScope—www.molescope.com (Accessed December 7, 2019)—a high-resolution camera that works with multiple smartphones. It takes better quality pictures than any other app; otherwise it does the same thing as the other skin cancer apps.
• SkinVision—www.skinvision.com (Accessed December 7, 2019)—classifies moles into two categories: those that can be left alone and those that need further monitoring. The photos you take of moles are classified as low or high risk. Besides this feature, SkinVision can also advise you on how to proceed further.

So, we have at least five apps that do the same thing: they let you take pictures of your moles, track their changes, and maybe classify them into low- and high-risk categories. First of all, if we used all five apps and they gave different answers for the same mole, what would we do? Which one would we trust?

Let us see what some researchers discovered when assessing the publicly available skin cancer apps. Wise's article, published in the BMJ in 2018 (Wise, 2018), lists the reasons why using smartphones for diagnosing skin cancer is not recommended: a lack of statistical analysis to test the apps' effectiveness, insufficient input from experts during development, and issues with the technology itself. Wolf et al. (2013) showed that skin cancer apps have very poor diagnostic accuracy. Their data set consisted of digital clinical images of pigmented cutaneous lesions, 60 of them melanomas and 128 benign control lesions, with diagnoses established by a board-certified dermatopathologist. The aim of the study was to compute the sensitivity, specificity, and positive and negative predictive values of four smartphone apps used by laypeople to determine whether a mole is benign or malignant.


The reported results were very poor: sensitivity ranged from 6.8% to 98.1%, specificity from 30.4% to 93.7%, the positive predictive value from 33.3% to 42.1%, and the negative predictive value from 65.4% to 97%. The app that obtained the highest sensitivity sends the images directly to a board-certified dermatologist for analysis; the others, which obtained poor results, use AI to analyze the images. Let us dig even deeper: if a person checks herself/himself without any app, the reported sensitivity ranges from 25% to 93%, and the specificity from 83% to 97% (Hamidi et al., 2010). Thus, if you do not have access to the statistical assessment of the study behind an app, you probably should not use it. If we need doctors to prescribe antibiotics, wouldn't it be natural not to let regular people have access to powerful AI cancer-related applications? If one thinks that using AI in diagnosing, or even treating, cancer is so simple that any person who did not learn and work hard for many years through med school, residency, specialty, etc., could use it, then one could not be further from the truth. This is why AI in cancer must be used only if it is medically CE and/or FDA approved, and only if it has been thoroughly statistically assessed and validated.

Another reason not to trust free apps is that they do not notice every symptom. If an app has not been exposed to pictures of rare or unusual cancers, red flags such as scaly, amelanotic melanomas (i.e., melanomas that do not produce pigment) or ulcerated areas might be missed. This issue can produce false negatives (University of Birmingham, 2018). Besides this, people who play doctor do not know what to look for: they would go on studying moles that may look suspicious but are not, while ignoring others that are truly worrying, lesions that a real doctor would examine carefully. Even if apps use high-resolution cameras, they still don't match the accuracy of a dermatoscope. In clinical images, asymmetries or blue-white pigmentations are hard to see, and the photos taken with your phone can be influenced by angles, light, etc. (Maier et al., 2015). The authors of the latter study reviewed 195 lesions, which histopathological analysis classified as 40 melanomas, 42 dysplastic nevi, and 113 benign nevi. The sensitivity obtained with smartphone images was 73% and the specificity 83%, whereas the dermatoscope achieved a sensitivity of 88% and a specificity of 97%.

Returning to the most crucial thing these apps lack, scientific validation: the Federal Trade Commission (FTC) took action in 2015 against two marketers who developed skin cancer apps. The apps claimed to detect melanoma at its earliest stage, yet no scientific proof was offered to support the claim. In two separate cases (MelApp and Mole Detective) the marketers settled with the FTC and could no longer claim that their apps are able to detect melanoma; in two other cases the marketers of Mole Detective did not want to settle, so the FTC continued pursuing accusations against them. Mole Detective was sold online on the Apple and Google app stores for $4.99; MelApp was sold for $1.99—https://www.ftc.gov/news-events/press-releases/2015/02/ftc-cracks-down-marketers-melanoma-detection-apps (Accessed December 8, 2019). Just look at the price tag. Detecting melanoma at an early stage for only $1.99? Or $4.99?
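Since this discussion leans heavily on four metrics, here is a minimal sketch of how they are computed from a 2 × 2 confusion matrix; the counts below are hypothetical, invented for illustration, and not taken from any of the cited studies:

    # Hypothetical counts for a mole classifier (not from any cited study).
    tp, fn = 45, 15    # melanomas correctly flagged / missed
    fp, tn = 20, 108   # benign lesions wrongly flagged / correctly cleared

    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value

    print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}, "
          f"PPV={ppv:.1%}, NPV={npv:.1%}")

Note that PPV and NPV, unlike sensitivity and specificity, depend on how common melanoma is in the sample, which is one reason the same app can report very different values across studies.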
Now let us look at the cost of seeing a dermatologist around the world:

• United States: for people who have insurance, the copay for a dermatologist visit is around $30, and the average wait time is 32 days. For people without insurance, a visit costs between $150 and $200; cheaper clinics charge $70–$80 per visit.


• Europe: for people insured through the health care system, the dermatologist visit is free, with an average wait time between 2 and 6 weeks. If you pay for the visit yourself, the cost is around £220–£250, and the visit lasts around 6 min.
• Canada: if you have insurance through the health care system, the dermatologist visit is free; otherwise the cost of a visit starts at $125.
• Australia: if you go through the health care system, it can take up to 3 months to get an appointment. If you pay yourself, the first visit costs between $110 and $200, and subsequent visits between $75 and $100.
• Middle East: in Dubai, people are charged 400 AED, about $108, for a dermatologist visit.
• Asia: a dermatologist visit in New Delhi, Hong Kong, etc., costs between $100 and $200.
• Central and South America: in Mexico, a dermatologist visit costs between $40 and $95.

These prices were listed on the following website: https://www.firstderm.com/cost-see-dermatologist-country/ (Accessed December 8, 2019). After this long list, the site offers you the possibility of taking a picture of your mole and sending it to a vetted board-certified dermatologist for evaluation, with a 24-h turnaround. If you look at the prices and you cannot afford to see a specialist, this offer might be tempting. Many people are not familiar with the consequences of using cheap medical services, and they are not to blame. This is why it is crucial for people to become more and more educated about what AI in cancer really is. They need to understand that if an automated app costs as little as $1.99, they cannot trust it. Designing, implementing, testing, and assessing an AI application takes time, money, and a lot of human resources. We have seen throughout this book that it is a long way from the idea proposed by AI data scientists, published in peer-reviewed journals, to a final product built by AI engineering. Besides being deceiving, these almost-free apps also make AI applied in cancer research seem untrustworthy. There are two potential problems here: first, people may really start to believe that AI is snake oil and stop trusting its potential; second, because of these fake AI prices, people may stop going to real doctors and use these apps instead. This can lead to an aggravation of the disease, which ultimately means more money being spent than in the first place, and possibly even the death of the patient.

As we speak, the FDA has down-regulated AI to class II, that is, "it doesn't kill you." Thus, more and more companies are developing new tools in order to get their FDA approval. Sujay Kakarmath, a post-doctoral research fellow at Partners Connected Health and Harvard Medical School, stated at the HIMSS Precision Medicine Summit in 2018 that one of the most important things is to evaluate the performance of AI technology on the data used: "Idiosyncrasies of a health care system can affect the performance of A.I. tools in unexpected ways." Keep in mind that each hospital or clinic manages data differently. Another crucial point raised by Kakarmath is the following:
"The technical performance of an algorithm for a given task is far from being the only metric that determines its potential impact. Evaluation of the true cost of implementing an algorithm should take into account factors such as the technical infrastructure and human resources required, cost of acting on false positives, cost of inaction on false negatives as well as decay in algorithm performance that occurs in diseases where medical science is evolving rapidly, such as cancer"—https://www.healthcareitnews.com/news/testing-algorithms-key-applying-ai-and-machine-learning-healthcare (Accessed December 8, 2019).


The success of AI in cancer research depends on the relationship with the data provider: what we teach AI algorithms determines their future behavior. If we teach them to learn from noisy data, they will fail to perform well when dealing with new cases. If you are a dog owner, you know what I am talking about, especially if you own a border collie. It is said that if you teach a border collie something it is not supposed to do, or rather let it learn by itself, it will continue with that bad behavior, and correcting it will be much harder than if you had paid attention at the beginning, when you started training it. AI is exactly like this: feed it wrongly labeled, noisy examples, and it will fit the noise and fail you when you least expect it (a toy illustration follows a few paragraphs below).

Dr. Jack Kreindler, from the Centre for Health and Human Performance, has stated that he already trusts technology more than human doctors. His exact words were: "I would sooner today trust computer scientists and data scientists to tell me how to treat a really complex system like cancer than my fellow oncologists. I would not have said that two to three years ago." The comment was made regarding LYNA, the Google artificial intelligence tool that can spot whether breast cancer has spread to the lymph nodes with an accuracy of 99%—https://www.newsmax.com/Health/health-news/google-artificial-intelligence-breast-cancer-lymph-nodes/2018/10/16/id/886462/ (Accessed December 8, 2019).

McKinsey—https://www.mckinsey.com/industries/pharmaceuticals-and-medical-products/our-insights/how-big-data-can-revolutionize-pharmaceutical-r-and-d (Accessed December 8, 2019)—predicts that Big Pharma might generate over $100 billion per year using AI in research and development. Using data from patients, caregivers, and retailers, Big Pharma companies could identify new drug candidates faster. Thinking of John Lennon's Imagine, imagine:

• using AI, we could model biological processes, and efficient drugs would become widespread;
• patients would be identified for clinical trials from social media, not just from doctor's visits; taking into account their genetic information, the trials would be smaller, thus less expensive, and ultimately more accurate;
• adverse events could be spotted before they happen.

Fiona Nielsen, from Repositive.io, believes that curing cancer with AI is still a way off, but might be possible if we unlock data in order to make new discoveries. Just like her, Dr. Hugh Harvey from Kheiron Medical says that AI algorithms are the new drugs, and Nielsen believes that pharmaceutical companies will make use of AI technologies to reduce the cost of developing new, efficient drugs. Indeed, AI will improve cancer treatment, because its computing power makes it possible to tailor a special treatment plan for each individual. A team from Intel is already working on this: they developed a Collaborative Cancer Cloud that aims to tailor unique treatment plans—https://itpeernetwork.intel.com/intel-ohsu-announce-collaborative-cancer-cloud-at-intel-developers-forum/#gs.kky8gi (Accessed December 8, 2019).
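Returning to the warning about noisy labels: here is a toy sketch on synthetic data with scikit-learn, our own illustration under invented settings rather than an experiment from any study cited above, showing how flipping a growing fraction of training labels typically drags test accuracy down:

    # Toy illustration: training on corrupted labels degrades test accuracy.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    for noise in (0.0, 0.2, 0.4):
        y_noisy = y_tr.copy()
        flip = rng.random(len(y_noisy)) < noise    # flip this fraction of labels
        y_noisy[flip] = 1 - y_noisy[flip]
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)
        print(f"label noise {noise:.0%}: test accuracy {model.score(X_te, y_te):.2f}")

The model never sees the clean labels, only the corrupted ones, which is exactly the situation of a tool trained on badly curated hospital data.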


Until now we have discussed AI and curing cancer. But what about the cases when nothing can be done, when the cancer has spread so much that no drug and no surgery can help the patient? Can AI help when there is nothing left to do? Apparently yes. Bob Gramling is a palliative care physician and researcher whose work focuses on how dying patients, family members, and doctors talk about palliative care, imminent death, end of treatment, and pain management—https://qz.com/1700778/a-doctor-and-a-linguist-are-using-ai-to-improve-palliative-care/ (Accessed December 8, 2019). Recently, he received over $1 million from the American Cancer Society to develop the most extensive study of palliative care conversations. He and his team built a database at the Vermont Conversation Lab, which contains over 12,000 min and 1.2 million words of conversations involving 231 patients. Their aim is to analyze this data and find the characteristics of the conversations that make patients and their families feel understood. His research assistant, Brigitte Durieux, listened to moments of silence and classified them. Once labeled, all these moments of silence were fed to an AI algorithm in order to automatically detect moments of emotional connection between doctor and patient.

A milestone study is the US Institute of Medicine's Dying in America research (US Institute of Medicine, 2015). A similar study was published in the United Kingdom under the name Ambitions for Palliative and End of Life Care (Wee, 2016). Up until now, training in palliative care has offered doctors templates on how to deliver bad news and help families decide on further treatment. Apparently these training courses are not enough. In general, oncologists prefer to focus on protocols and clinical trials, because talking about death makes them uncomfortable. So, why not let AI lend a helping hand?

Bob Gramling teamed up with his brother, David, a linguistics professor at the University of Arizona. Their work resulted in a book, Palliative Care Conversations: Clinical and Applied Linguistic Perspectives (Gramling and Gramling, 2019). Their aim is to show doctors how their words are interpreted by patients. David Gramling reviewed the conversations using conversation analysis: he listened for hours and hours to doctor-patient conversations and labeled the important moments. Robert Gramling, on the other hand, studied at the Stanford Literary Lab how digital tools can be used to recognize patterns in large literary corpora that are too dispersed for humans to catch. Communication in health care often consists of monologues from doctors, but palliative care involves another kind of conversation, back-and-forth. An essential element is often forgotten when talking with older patients: the fact that they do not hear well. Smith et al. published a survey of 510 responses from hospice and palliative care facilities across the United States (Smith et al., 2016). Out of the 510 respondents, 315 were physicians, 50 nurses, 48 nurse practitioners, 58 social workers, and 39 chaplains. Of these, 87% reported that they did not screen for hearing loss; 61% felt comfortable with their communication skills for patients with hearing loss, but only 21% had attended training courses on how to deal with this issue; 31% were not familiar with the existence of resources for patients with hearing loss; and 38% had not even heard of a pocket talker amplification device. Besides this, Bob and David Gramling paid attention to other features in the communication between patient and doctor, such as pain, fatigue, shortness of breath, or medications.
In the lab, the team uses AI to identify pauses of 2 s or longer, which other researchers then label, searching for pauses that are more than silence. Unfortunately, building such a data set is hard because of the lack of ground truth. The data scientists do not have access to the minds of doctors or patients to see what they were actually thinking while talking or pausing, so they started looking for other signs that signal the presence of emotions: words, sighs, or crying.
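The first, mechanical step of that pipeline, finding every pause of at least 2 s, is simple signal processing. Here is a minimal sketch on a synthetic per-frame loudness envelope; the frame rate and silence threshold are our own illustrative assumptions, not the lab's actual settings:

    # Find pauses of >= 2 s in a per-frame loudness envelope (10 frames/s).
    import numpy as np

    FPS = 10                    # frames per second
    SILENCE_THRESHOLD = 0.05    # loudness below this counts as silence
    MIN_PAUSE_S = 2.0           # minimum pause length, as in the lab's setup

    rng = np.random.default_rng(1)
    loudness = rng.uniform(0.1, 1.0, 300)   # 30 s of simulated speech
    loudness[120:150] = 0.01                # inject a 3 s pause at t = 12 s

    silent = loudness < SILENCE_THRESHOLD
    pauses, start = [], None
    for i, s in enumerate(np.append(silent, False)):   # sentinel closes a final run
        if s and start is None:
            start = i
        elif not s and start is not None:
            if (i - start) / FPS >= MIN_PAUSE_S:
                pauses.append((start / FPS, i / FPS))
            start = None

    print(pauses)   # [(12.0, 15.0)]

The hard part, deciding which of those pauses carry emotional weight, is what the human labelers and the learning algorithm are for.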


Out of 100 clips with pauses, the team found only 32 powerful pauses that lasted less than 4 s. After this type of pause the conversation shifts, and the patient starts talking more than before, directing the conversation. This is just the beginning of AI in palliative care. The researchers have only started studying English speakers; speech and pauses differ from language to language, so data sets must be built for other languages as well. Another difficulty arises from the fact that such data sets cannot be built on speech from healthy people, because terminal cancer patients talk differently.

Besides the Gramlings, this area is being explored by James Tulsky, a palliative care physician at the Dana-Farber Cancer Institute in Boston and a professor at Harvard. He teamed up with Panayiotis Georgiou, a computer engineer from the University of Southern California, to develop an AI system that could detect the emotional connections between doctors and patients. Tulsky and his team published a paper on using natural language processing to identify serious illnesses (Udelsman et al., 2019). The results are impressive: patients who suffered from pneumoperitoneum, and patients with leptomeningeal metastasis from breast cancer, were identified through natural language processing tools and administrative codes. Using administrative codes alone resulted in 6438 patients suspected of pneumoperitoneum and 557 patients suspected of leptomeningeal metastasis; the natural language processing analysis reduced these numbers to 869 patients with pneumoperitoneum and 187 with leptomeningeal metastasis. The statistical analysis revealed, for the administrative codes, a positive predictive value of 13% for pneumoperitoneum and 25% for leptomeningeal metastasis; when the two approaches were combined, the reported positive predictive value was 100%.

Obviously, there are skeptics who do not believe AI can be of any use in palliative care. We believe that we just have to wait and see how things turn out. Will AI bring more humanity to health care? Quoting agent Mulder from the X-Files series: "the truth is out there."

Will our dreams regarding cancer come true one day? Maybe yes, maybe no. The truth is that cancer changes fast, and we must do the same, or else it will always be one step ahead of us. In this whole book we have talked about all sorts of things, except one. AI and doctors can do everything in their power to help diagnose, treat, and manage symptoms, and still, in some cases, nothing can be done. We cannot control everything; we cannot control the human mind, and sometimes the attitude of a patient toward her/his illness is the best medicine there ever was. Besides this, just as we have spontaneous remissions that cannot be scientifically explained, we also have spontaneous deaths that cannot be scientifically explained. This is the reason we mentioned Paul McCartney's Let It Be: sometimes, after we have done everything that could have been done, after we have crossed barriers we thought we could not cross, we just need to let it be, knowing, as doctors, data scientists, patients, and family members, that we did what needed to be done and that from now on nothing is under our control. That is why we believe that AI helps and will keep helping cancer research in the future: because in a world of uncertainty, mathematics has always provided a compass.
Thank you for reading this book. We hope you learned new things, understood difficult concepts, and ultimately found out that the two most important things you can do are "to go home and love your family" (Mother Teresa) and "carpe diem" (Horace).


References

Gramling, D., Gramling, R., 2019. Palliative Care Conversations: Clinical and Applied Linguistic Perspectives. De Gruyter Mouton.
Hamidi, R., Peng, D., Cockburn, M., 2010. Efficacy of skin self-examination for the early detection of melanoma. Int. J. Dermatol. 49, 126–134.
Maier, T., Kulichova, D., Schotten, K., Astrid, R., Ruzicka, T., Berking, C., Udrea, A., 2015. Accuracy of a smartphone application using fractal image analysis of pigmented moles compared to clinical diagnosis and histological results. J. Eur. Acad. Dermatol. Venereol. 29 (4), 663–667.
Smith, A.K., Ritchie, C.S., Wallhagen, M.L., 2016. Hearing loss in hospice and palliative care: a national survey of providers. J. Pain Symptom Manage. 52 (2), 254–258.
Udelsman, B., Chien, I., Ouchi, K., Brizzi, K., Tulsky, J.A., Lindvall, C., 2019. Needle in a haystack: natural language processing to identify serious illness. J. Palliat. Med. 22 (2). https://doi.org/10.1089/jpm.2018.0294.
University of Birmingham, 2018. Three Major Failings in Some Apps Used for the Diagnosis of Skin Cancer. https://www.sciencedaily.com/releases/2018/07/180705113940.htm.
US Institute of Medicine, 2015. Dying in America: Improving Quality and Honoring Individual Preferences Near the End of Life. Committee on Approaching Death: Addressing Key End of Life Issues. National Academies Press, Washington, DC.
Wee, B., 2016. Ambitions for palliative and end-of-life care. Clin. Med. 16 (3), 213–214.
Wise, J., 2018. Skin cancer: smartphone diagnostic apps may offer false reassurance, warn dermatologists. BMJ 362, k2999.
Wolf, J.A., Moreau, J.F., Akilov, O., Patton, T., English 3rd, J.C., Ho, J., Ferris, L.K., 2013. Diagnostic inaccuracy of smartphone applications for melanoma detection. JAMA Dermatol. 149 (4), 422–426.

Index

Note: Page numbers followed by f indicate figures, and t indicate tables.

A Accuracy, 25, 48, 50, 85, 106, 120–121, 136–137, 146, 169, 171–173, 175–176, 180, 189–190, 194, 201–202, 207, 210–211, 215, 228, 271–274, 278, 281–282, 284, 286 Activation function hyperbolic tangent, 88, 121 rectified linear unit, 88 sigmoid, 88 Actuarial tables, 246 Adversarial attacks brute-force, 150 confidence score, 150 gradient, 107 hard labels, 150 surrogate models, 150 Algorithm ant colony optimization feature selection, 127 backpropagation, 106–110 bayesian learning algorithm, 121–123 genetic algorithm, 110–111, 130 Hunt’s, 269–270 incremental, 130 Monte Carlo, 130 Analysis of variance, 22 ANOVA one way ANOVA, 37–40 Area under the ROC curve, 52–53

B Bartlett's test, 37 Bayes' formula, 45 Bayes' theorem, 44–55, 121 Bernoulli experiments, 26 Bias, 86, 100, 132, 148, 204 Biological parameters assessment, 80–81 Boltzmann, 113 Box-and-whiskers plot, 23, 24f, 27, 28f, 29, 31f Branch, 146, 190, 192–193, 199, 268–269 Brown-Forsythe (B-F) test, 38

C Categorical data, 11, 22, 111, 224, 271–272 Censored data, 170, 238, 243

Central limit theorem, 16, 24 Chi-square test, 255 Chromosome, 4, 46, 111–117, 111f, 120–121, 130 Clinical data, 5, 180, 274 Clinically relevant difference, 11 Clinical trial, 10, 22, 29, 32, 40–41, 171, 180, 201, 216, 224–225, 236, 256–257, 286–287 Clustering complete linkage, 139 external validation, 139 furthest neighbor, 139 group average - unweighted pair group average, 139 group average - weighted pair group average, 139 Cluster network, 272–278 Cohort life table, 246 Conditional probability, 45, 121–122 Confidence interval, 12, 15, 22–23, 85, 129, 145, 169, 241, 245, 249 Confusion matrix, 49–50, 50t, 50–51f Continuous data, 11 Correlation coefficient, 92–93, 95–98, 122 Goodman-Kruskal rank, 121–122, 125 Covariates, 92, 256, 262 Cox regression model, 169–173, 211, 240–241, 273 Critical value, 14–16, 40, 55–57, 56–57t Crossover blend, 117 BLX-α, 117 linear BGA, 117 n-point, 114, 115f, 116 probability, 120–121 single arithmetic, 116, 116f total arithmetic crossover, 117 uniform, 115–116, 115f Wright’s heuristic, 117 Cross-validation leave-p-out, 84, 84f n-fold, 82–83 stratified n-fold, 83 CT scans, 46, 145, 147–148, 175–176, 188–193, 211, 213–214, 223–225, 275–276 Current life table, 246


D Death overall death probability, 239 probability, 237–239 rate, 73–74, 170, 174, 237, 239, 246, 259 Decision performance, 25 Decision trees, 267–272 Deep learning, 85, 90, 130–136, 146–149, 166–168, 171–173, 223–224, 228 Defense adversarial training, 150 detection, 151 empirical, 150 extra class, 151 formal, 150–151 gradient masking, 151 input modification, 151 Degrees of freedom, 22, 25, 29, 32, 36, 38, 43–44, 61–64, 67, 250 Density, 11, 127–128, 143–144, 257–258 Density based spatial clustering of applications with noise, 144 Dependent group of observations, 21, 29–33 Depth, 5, 131–132 Derivatives, 109 Determination analytical, 207 experimental, 207 Dilated convolution, 132–133, 133f Dilation, 132–133 Disease free survival curves, 267 Dispersion, 9, 11 Distance Chebychev, 140 city block, 140 cosine, 140 crow flies, 140 euclidian, 140–141, 272 fuzzy extensions, 141 fuzzy Minkowski, 141 Jaccard, 140–141 L1-norm, 150 L2-norm, 150 Mahalanobis, 141 Manhattan distance, 140–141 Pearson’s r, 141 Tanimoto, 140 taxicab, 140 Distribution binomial, 25–26, 26f exponential, 256–257, 257f, 261 F, 36, 39, 67, 67–69t gamma, 256, 260–261

Gaussian, 11–14, 16–17, 19–20, 35–36, 39, 64, 128, 144, 170, 241, 255, 261 Gompertz, 256, 261–262 graph, 11–12, 24 lifetime, 170 log normal, 256–258 normal, 9–12, 14, 16–20, 22, 25–26, 37–38, 40, 42 t distribution, 22, 25, 29, 61, 61–63t Weibull, 170, 256 DNA microarrays, 15, 179–180

E End time, 235 Entropy, 270–271 Evidence, 45 Expectation maximization clustering using Gaussian Mixture Model, 144 EXPLORER, 47

F False negative rate, 51 False negatives, 8–10, 48, 49f, 53, 284 False positive paradox, 44 False positive rate, 51, 210–211 False positives, 8–10, 49f, 50, 53, 167, 285–286 Feedforward structure, 89 Fiber, 132 Filter, 3–4, 131–132, 151, 256 Fisk, 259–260 Fitness function, 111–112, 119, 121 Follow-up, 36, 81, 84–85, 124–125, 170, 193, 236–238, 238f, 242, 245, 255, 261–262, 267–268, 273–278 Forward propagation, 89, 89f F test, 36–37, 271–272

G Gauss Bell, 11–12 GINI impurity index, 270 Gradient descent, 107, 107f, 130–131, 136 Ground truth, 85–86, 93, 105–107, 145–147, 166, 177, 192, 287–288

H Hazard function, 170–171, 255 ratio, 169–170, 249, 254–255 Heat map, 50, 50–51f, 97, 98f, 181 Hidden units, 89, 123 Hierarchical internal validation, 138, 276 nearest neighbor, 136–138 non-hierarchical, 138, 142–144 single linkage, 139 unweighted pair group centroid, 139 Ward's method, 139 weighted pair group centroid, 139 Histological sections frozen, 164 permanent, 164 smear, 164 Human genome project, 46 Hyperplanes, 202–206, 206f Hypothesis alternative, 9, 14, 38, 40–41 null, 9–10, 15, 17–21, 26, 29, 36–42, 55, 249–250, 253–254 testing, 9, 40

I Imaging test, 44, 46–47, 80–81 Independent group of observations, 21 Inflection point, 11 Input neuron, 89, 133–135 Intercept, 92–93, 96, 100–101, 229

K Kaplan-Meier survival curve, 169–170, 240–245 Kernel, 131–133, 135, 203–204, 207 k-means, 142–144, 272–273 k-medians, 143 k-nearest neighbors (k-NN), 136–138, 137f, 142 Kolmogorov's axioms, 45–55 Kolmogorov-Smirnov Goodness of Fit test (K-S test), 13–14 Kurtosis leptokurtic, 13 mesokurtic, 13 negative, 13 platykurtic, 13 positive, 13

L Lasso regression, 100 Layer convolutional, 131–132, 132f, 135–136 fully-connected, 131 hidden, 85, 89–90, 120–121, 123, 125–126, 131, 179–180 input, 89–90, 122, 125–126, 129, 131, 134–135 output, 90, 125, 129, 132, 134–135, 273 pattern, 129 pooling, 131, 135–136, 136f summation, 129 Learning paradigms Bayesian, 121–123, 125–126 evolutionary computation, 110–114 reinforcement learning, 106 supervised learning, 105–106, 105f unsupervised learning, 106 Learning rate, 107–109 Least significant difference, 39 Least squared error, 93 Levene's test, 37–38 Life expectancy, 22 Life tables, 169–170, 239, 239t, 242t, 244t, 246–256 Likelihood, 121–122, 275 Lilliefors test, 13–18, 55–57, 56–57t Linear separable classes, 86, 87f Logistic regression, 91–92, 100–103 Log logistic distribution, 259–260 Logrank test, 169–170, 249–254 Loss (error) function, 106–107

M Magnetic resonance imaging (MRI), 8, 47, 145–147, 189–190, 199, 209, 224, 254–255, 271–272 Magnitude, 27 Mammograms, 46, 145 Mann Whitney U test, 9, 33–36, 33–35t Mass-spectrometry, 5 Mean hypothesized, 25 sample, 14, 25 Mean-shift clustering, 143 Medical diagnosis, 8 Memoryless property, 257 Metabolomic, 5 Metastasis, 8, 165 Misclassification measure, 270–272 Mortality tables, 246 MRI. See Magnetic resonance imaging (MRI) Multi-layer feedforward, 90 Multiple linear regression, 91–100 Mutation binary chromosome, 118 creep, 118 integer chromosome, 118 non-uniform, 118 normally distributed, 11 probability, 118–119 random setting, 118 uniform, 118

N Naive Bayes classifier, 121, 136–137 Negative predictive value, 48, 50–51, 167, 283–284 Nelson Aalen Fitter, 249, 256 Neural networks adaptive single layer feedforward, 125–126 convolutional, 106–107, 130–136, 166, 171–173, 175–176 ELM/ant colony optimization hybrid, 125–127 extreme learning machine, 110, 124–125 Kohonen, 272–273

Neural networks (continued) multi-layer perceptron, 91f partially connected neural network, 109–110 probabilistic, 110, 127–130 radial basis function, 123–124 self organizing maps, 272–273 single-layer feedforward, 90 Node child, 269 decision, 269 leaf, 269 parent, 269 root, 269 Non-linear separable classes, 86, 87f Nuclear medicine scan, 47

O Observation period, 235–237 Oncogene, 4–5t Oncology, 2, 199 One hot encode, 103, 123 Operating point, 53 Output neurons, 89 Overfit, 81–82, 135

P Palliative, 200, 287–288 Parameter control, 119 sharing, 135 tuning, 119 Pathology report atypical, 164 carcinoma, 164 dysplasia, 164 hyperplasia, 164 leukemia, 164 lymphoma, 164 neoplasia, 164 sarcoma, 164 PET. See Positron emission tomography (PET) scan Physical exam, 80 Placebo, 10, 22, 171, 240 p-level, 9–10, 255, 282 Poisson process, 256–257, 260 Polynomial learning machine, 207 Pooled variance, 29 Pooling average, 135 L2-norm, 135 max, 135 overlapping, 135 Population, 9, 41, 111


Positive predictive value, 48, 50–51, 167, 191, 288 Positron emission tomography (PET) scan, 47, 147, 224, 275–276 Posterior probability, 45 Power analysis, 10–11, 282 Precision, 53–54, 84–85, 181–182, 189, 215–216, 267 Precision-recall curves, 146, 282 Predictor variables, 92, 240 Prior probability, 45, 122 Prognostic index, 103 Progression free survival curves, 267 Protein expression, 5 Pruning, 269

R Random sample, 9, 40, 44, 112 Recall, 19, 46, 53, 69, 99, 114, 133, 144, 146, 151, 161, 167, 179, 189, 205, 210–211, 214–215, 240 Receiver operating characteristic (ROC) curves, 48, 51–54, 52–54f Receptive field, 132 Recombination blend, 117 BLX-α, 117 linear BGA, 117 n-point, 114, 116 probability, 114 single arithmetic, 116 total arithmetic crossover, 117 uniform, 115 Wright’s heuristic, 117 Recurrence distant, 267 local, 267 regional, 267 Regression coefficients, 82–84, 92–93, 96, 99–101, 229 Regression line, 93–96, 95f Regularization, 99–100, 207, 224, 273–274 Remission partial, 265 spontaneous, 265–267, 278 Representation binary, 111, 111f, 114, 114–115f, 116 integer, 111, 111f, 115, 118 real-coded representation, 111, 116 Residuals, 39, 93 Ridge regression, 99–100 RNN. See Recurrent neural network (RNN) Robustness, 25, 146

S Sarcoma, 3–4, 164, 226, 266 Scatter correlation plot, 98


Selection age-based, 113 exponential rank, 113 fitness based survival, 113 fitness proportional, 112 ranking, 112, 142 roulette wheel, 112 stochastic universal sampling, 113 tournament, 112 truncation, 113 Sensitivity, 48, 50–51, 53, 85, 145, 167, 169, 171–172, 201–202, 208, 274, 281–284 Shapiro-Wilk W test, 18–22 Significance level, 20, 25, 38, 98, 169, 175–176, 224–225 Sign test, 25–26 Skeletonization, 177 Skewness negative, 13 normal, 12, 13f positive, 12f, 13 Slack variables, 207 Smart Tissue Autonomous Robot (STAR), 194 Smoothing parameter, 129, 207 Soft margin, 203 Softmax, 91–92, 103–104, 104f Sparse connectivity, 133–135, 135f Sparse weights, 133 Specificity, 48, 50–51, 85, 145, 167, 169, 171, 274, 281–284 Split, 21, 32, 44, 82–85, 114, 116, 124, 130, 137–138, 142, 255, 269–271, 276 Standard deviation, 9, 11, 15, 22–23, 26–27, 29, 35–36, 40–41, 44, 118, 129, 144, 147–148, 150, 258 Standardized difference, 11 STAR. See Smart Tissue Autonomous Robot (STAR) Start time, 235–236 Statistical power, 10–11, 41, 282 Statistical series, 12 Statistical significance, 9 Statistical tests non-parametric, 9 parametric, 9 Statistics D statistics, 18 T, 35 U, 35 W statistics, 18–20, 37–38, 57–61 Z2 statistics, 18–19 Stochastic nature, 25 Stopping criterion, 119, 123 Stride, 132, 135–136 Student’s t-test, 22 Superellipsoids, 177 Support vector machines, 138, 172–173, 175–176, 201–216, 273–274, 282


Surgery cryosurgery, 187 electrosurgery, 187 laparoscopic, 188 laser, 187 Mohs, 188 natural orifice, 188 robotic, 188 Survival analysis, 169–170, 235–262, 267–268, 274–277, 282 curve, 169–170, 238, 240–245, 244f, 255 function, 170 probability, 169, 238–239, 241, 243, 261–262 rate, 5, 23–26, 23t, 32, 40, 44, 96, 176–177, 209–210, 226–227, 236–239, 244–245, 274, 276 regression, 256 time Synapse, 86

T Testing, 82 Test statistic, 21 Total probability law, 45 Training, 82 True negative, 48, 49f, 50 True positive, 48, 49f, 50–51, 53 t-test independent samples t-test, 22 one sample, 22, 25 paired sample, 22 Tukey Honest significant difference, 39 Type I error, 9

U Ultrasound sonography, 46 Underfit, 81

V Variance, 12, 29, 31, 36–39, 99, 257–258 Variance ratio test, 36 Vectorization, 177

W Weighted sum, 85–86, 144 Weight vector, 86, 120–121, 129, 204, 206, 272 Wilcoxon rank-sum test, 9–10 Winner-takes-all rule, 103–104, 120

X X-rays, 46, 145–146, 221–223

Z Zero-padding, 132, 135 z-score, 14–15, 17, 40–42, 40f, 69–71, 69–72t, 126 z-test one sample, 40–41, 69–71, 69–71t two proportion, 41–43, 42f, 69–71