Genome Chaos: Rethinking Genetics, Evolution, and Molecular Medicine 0128136359, 9780128136355

Genome Chaos: Rethinking Genetics, Evolution, and Molecular Medicine transports readers from Mendelian Genetics to 4D-ge


English Pages 400 [555] Year 2019



Genome Chaos: Rethinking Genetics, Evolution, and Molecular Medicine
0128136359, 9780128136355

Author / Uploaded
Henry H. Heng

Table of contents :
Cover
Genome Chaos: Rethinking Genetics, Evolution, and Molecular Medicine
Copyright
Dedication
Preface
Acknowledgments
1. From Mendelian Genetics to 4D Genomics
1.1 Summary
1.2 The Emergence of Genomics
1.2.1 A Brief History of Genomics
1.2.2 Genetics or Genomics?
1.2.3 Fundamental Limitations of Traditional Genetics
1.3 Diminishing Power of Gene-Based Genomics
1.3.1 The Ignored Voice of Antigenetic Determinism
1.3.2 The Rise and Fall of the Gene
1.3.3 A Reality Check of the “Industry Gene” Concept
1.3.4 Gene-Based 1D Genomics Is Not Enough
1.4 New Genomic Science on the Horizon
1.4.1 Time to Rethink Genetics and Genomics
1.4.2 Crisis Created New Opportunities for Future Genetics/Genomics
1.4.3 4D Genomics: the New Paradigm
2. Genes and Genomes Represent Different Biological Entities
2.1 Summary
2.2 The Definition of the Genome
2.3 “Parts Versus the Whole”: The Emergent Relationship (Which Challenges Reductionism)
2.4 Reexamining Gene Theory Predictions
2.4.1 Selfish Gene or Constrained Genome?
2.4.2 Genomes Not Genes Define Biosystems
2.4.2.1 There Are No Common Minimal Gene Sets in Nature
2.4.2.2 Are Gene or Genome Alterations Mainly Responsible for Speciation?
2.5 The Conflicting Relationship Between the Gene and the Genome
2.5.1 Chromosomal Position and Loop Size: Overall Genomic Architecture Constrains Local Structures
2.5.2 Gene Expression and Chromatin Loops: Chromatin Loop Domains Constrain Gene Function
2.5.3 Loops/Chromosome Length and AT/GC Composition: Why Little Clarification Comes from the Highest Resolution
2.6 Genome Context Determines Gene Function
2.7 Action Needed
3. Genome Chaos and Macrocellular Evolution: How Evolutionary Cytogenetics Unravels the Mystery of Cancer
3.1 Summary
3.2 SOS: We Need a New Conceptual Framework for Cancer Research
3.2.1 Exceptions Versus the General Rule: The Chronic Myeloid Leukemia Story
1. The Unique Evolution of CML
2. Population Genomic Structure and Microenvironments Influence the Pattern of Cancer Evolution and Their Responses to Trea ...
3. Heterogeneity and Treatment Response
4. Unforeseeable Negative Impact of CML on Research Community
5. Exceptions of Model Systems and the Reality of Cancer
3.2.2 Cancer Genome Sequencing: The Results Challenge the Rationale
3.2.2.1 Initial Goal and Controversy
3.2.2.2 Major Discoveries and Surprises
1. Validation of Known Cancer Gene Mutations
2. There are far Fewer Newly Identified Driver Genes Than Expected
3. Interesting Gene Mutation/Genomic Alteration Patterns
4. Chromosomal-level Alterations are Overwhelming
5. Multiple Levels of Genetic/Genomic/Epigenomic Landscapes
6. The Landscape Dynamics During Cancer Progression and After Treatment
3.2.2.3 The Ultimate Challenge to Current Cancer Theory
1. Too Many Mutations, not Enough Common Drivers
2. Disagree with the stepwise cancer model of accumulating gene mutations
3.2.3 The Somatic Gene Mutation Theory Is No Longer Relevant
3.2.3.1 Challenging the Obvious: Can a Few Key Gene Mutations Be the Molecular Basis of Carcinogenesis?
3.2.3.2 Challenging the Concept of Sequential Accumulation of Gene Mutations in Cancer
3.2.3.3 The Limitations of Searching for Hallmarks of Cancer
3.2.3.4 Clinic Facts Do Not Support the Cancer Gene Mutation Theory of Cancer
3.2.4 Increased Calls for New Cancer Theories
3.2.4.1 Noted Competing Theories/Concepts
1. Aneuploidy Theory
2. Tissue Organization Field Theory
3. Cancer Attractor Theory
4. Theories Influenced by the Natural History of Evolution
5. Theories Influenced by Developmental Biology and Epigenetics
6. Theories Related to Genetic and Environmental Factors
3.2.4.2 The Search for New Framework
3.3 Genome Chaos: Rediscovery of the Importance of the Karyotype in Cancer
3.3.1 Linking Incidental NCCAs to CIN and Evolutionary Potential
3.3.2 Two Phases of Cancer Evolution
1. Terminologies:
2. The Dynamics of the Two Phases of Evolution
3. Implications
3.3.3 Genome Chaos: Reorganizing the Genomic Landscape
1. What Happened?
2. Causative Factors
3. Mechanisms
4. Implications
3.3.4 The Evolutionary Mechanism of Cancer
3.3.4.1 Linking Genome Heterogeneity to Tumorigenesis, Metastasis, and Drug Resistance
3.3.4.2 Focus on the Evolutionary Mechanism of Cancer Rather Than the Diverse Individual Mechanisms
3.4 A New Genomic Model for Cancer Evolution
1. The Ultimate Cause of Cancer
2. Cancer Evolution: The Game for Outliers
3. Cancers Represent Emergent New Genome Systems
4. Chromosomal Coding and Fuzzy Inheritance: The Genomic Basis of Bio-information and Heterogeneity
4.1 Summary
4.2 Chromosomal or Karyotype Coding
4.2.1 The Rationale of Searching for New Types of Inheritance
4.2.2 New Challenge: What Defines Inheritance?
4.2.3 Genes Code “Parts Inheritance”
4.2.4 A Chromosomal Set Codes “System Inheritance”
4.2.4.1 Background and Rationale
4.2.4.2 The Model and Its Prediction
4.2.4.3 The Mechanism and Significance of Preserving Chromosomal Coding
4.2.5 Why Has Chromosomal Coding Long Been Ignored (If It Is Indeed Important)?
4.2.5.1 Historical Lessons: Topology Is a Key Piece of Bioinformation
4.2.5.2 Accepting System Inheritance Is Necessary in the Search for the Correct Context of Genomic Information
4.2.5.3 The Limitations of Reductionist Tradition and the Power of Metaphor
4.3 Fuzzy Inheritance
4.3.1 Rationales for Searching for New Types of Inheritance
4.3.2 A New Inheritance Needs to Explain Heterogeneity: A Key Genomic Feature of Cellular Population
1. Inheritance of a given unstable cellular population can pass the degree of genomic changes, but not specific changes
2. System inheritance is unstable for many cell lines and can be drastically altered during crises
3. A single cell can pass heterogeneity to an entire population
4. Karyotype heterogeneity is associated with other cellular heterogeneities
5. Inheritance of heterogeneity: the mechanism of heterogeneity
4.3.3 The Inheritance of Heterogeneity in Organismal Systems in Both Physiological and Pathological Conditions
1. Inheritance of heterogeneity can be universally observed in all types of organisms
2. Abnormal phenotype and the hidden inheritance of heterogeneity from normal genomes
4.3.4 The Definition of Fuzzy Inheritance and Its Key Differences Compared to Traditional Inheritance
4.3.5 The Mechanisms of Fuzzy Inheritance
a) Mechanisms of fuzzy inheritance at the karyotype level
b) Mechanisms of fuzzy inheritance at CNV level
c) Mechanisms of fuzzy inheritance at gene level
d) Mechanisms of fuzzy inheritance at epigenetic level
e) Mechanisms of fuzzy inheritance for mitochondrion
f) Mechanisms of fuzzy inheritance of other interesting observations
4.3.6 Potential Significance and Implications of Fuzzy Inheritance
4.4 Overlooked Genome Variations
4.4.1 Generally Accepted Chromosomal Variations
4.4.2 Ignored and Unclassified Chromosomal/Nuclear Aberrations
4.4.2.1 Free Chromatin
4.4.2.2 Defective Mitotic Figures
4.4.2.3 Chromosome Fragmentations
4.4.2.4 Unit Fibers
4.4.2.5 Sticky Chromosomes
4.4.2.6 Genome Chaos
4.4.2.7 Micronuclei Cluster
4.4.2.8 Unclassified Chromosomal or Nuclear Abnormalities/Variations
4.4.2.9 Unification of the Different Types of Chromosomal Aberrations
5. Why Sex? Genome Reinterpretation Dethrones the Queen
5.1 Summary
5.2 What Is the Purpose of Sex? The Answer Is Not Obvious
5.3 Surprise! Asexual Reproduction Does Not Generate Clonal Progenitors!
5.4 The Search for the Main Function and Common Mechanism of Sex
5.5 The Battle Is On: Changing Concepts
5.6 Simulation: Ask the Simplest Question About the Function of Sex
5.7 Case Studies: Reinterpretation Using New Framework
5.8 Lessons Learned
1. Focus on the first principle
2. Follow the paradoxes
3. Respect the facts
4. Fill in knowledge gaps: Dare to think big
6. Breaking the Genome Constraint: The Mechanism of Macroevolution
6.1 Summary
6.2 Pattern of Cellular Evolution Challenges Current Evolutionary Theory
6.2.1 Simple Evolutionary Principles Are No Longer Simple
6.2.2 Why the Cancer Model Is an Excellent Platform for Studying Evolution in General
6.2.3 Similarities and Differences Between Somatic Cell Evolution and Natural Evolution
6.2.4 The Conflict Between Observations From Somatic Cell Evolution and Neo-Darwinian Concepts
6.2.5 Time to Compare/Reexamine Evolutionary Theories
6.3 Artificial Selection and Natural Selection Are Fundamentally Different
6.4 Both Isolated Cases and Isolated Natural Environments Represent Exceptions That Fail to Demonstrate the Relationship Betwee ...
6.5 Maintaining Genome Integrity: The Major Evolutionary Constraint
6.5.1 Why Are Evolutionary Constraints Important?
6.5.2 Genome Integrity Represents the Major Evolutionary Constraint
6.5.2.1 Why Is It Essential to Discuss Genome-Level Constraint?
a. The genome codes for the package of an entire system.
b. Sex safeguards genomic integrity.
c. Sex-mediated genome integrity ensures long-term evolutionary stability.
d. The genome—not the gene—is the macroevolutionary selection unit.
6.5.2.2 Different Factors Contribute to Genome Constraint
6.6 Implications of Genome Theory to Evolutionary Concepts
6.6.1 The Concept of Species
6.6.2 The Origin of Adaptation Differs From Speciation
6.6.3 Genome Theory: Defining the Concept of Chromosome-Mediated Speciation
1. Spontaneous chromosome alterations mediated speciation
2. Hybrid speciation: chromosomes play an important role
3. Genome chaos: massive speciation during crisis
4. The core genome concept: limited fuzzy inheritance within populations
5. Altered karyotypes and evolutionary certainty: understanding the genotype–phenotype relationship
6. New species formation occurs much more frequently than suggested by natural selection, but the chance of the formation o ...
7. What is the role of geographic isolation in speciation?
6.7 Evolution Is True but Its Mechanism Must Be Reexamined
6.7.1 The Integrated Model of Speciation: How Micro- and Macroevolution Create and Maintain Species
6.7.2 Time for Reinterpretations
a. Genome-based alternative mechanisms for evolution
b. Fast or sluggish evolution?
c. The irrelevance of the neutral theory
d. The genome basis of punctuated equilibrium
e. The invisible missing link in macroevolution
f. Multilevel evolution and constraint
g. Somatic cell dynamics and germline constraint
h. Extinction issues
i. The unified evolutionary theory?
6.8 Implications: Creating Artificial Species by Shattering the Genome Followed by Artificial Mating/Genome Selection
a. Creating new cell lines
b. Creating artificial laboratory species
c. Creating artificial animals/plants
7. The Genome Theory: A New Framework
7.1 Summary
7.2 The Rationale for Establishing a Genome-Based Genomic Theory
7.3 Unique Considerations for Genome Theory
7.3.1 The Genome Is an Integrated Information Unit That Defines the Boundary of the System
7.3.2 Emergent Properties in Biological Systems
7.3.3 Understanding Genomic Principles Through the Lens of Evolution
7.4 Outline of the Genome Theory
7.5 The Predictions, Implications, Limitations, and Falsifiability of the Genome Theory
7.5.1 Predictions
7.5.2 Implications
7.5.3 Limitations
7.5.4 Falsifications
7.6 Challenges Ahead
8. The Rationale and Challenges of Molecular Medicine
8.1 Summary
8.2 A Brief History: The Promises of Molecular Medicine
8.3 The Challenges and Opportunities for Precision Medicine
8.3.1 The 40-Year Journey of Studying p53, From Certainty to Increased Uncertainty
8.3.2 The Relationship Between Stress, Variation, Adaptation and Trade-Off, and Disease
8.3.3 Genome Alterations and Common/Complex Diseases
8.3.3.1 Key Features and Types of Common and Complex Diseases
8.3.3.2 Stochastic Genomic Alterations Contribute to Most Common Diseases
8.3.3.3 The Search for the General Model for Common and Complex Diseases/Illnesses: A Case Study for Gulf War Illness
8.3.3.4 New Model With New Explanations
8.4 Future Direction
8.4.1 Facing Reality: The Increased Bio-Uncertainty
8.4.2 Big Data, Artificial Intelligence, and Biomarkers for Adaptive Biosystems
8.4.2.1 The Future of Big Data in Biological Systems
8.4.2.2 Big Data Versus Theories: The End of Theories or the Beginning of Better Theories
8.4.2.3 How to Collect the Necessary Data to Create a New Generation of Biomarkers?
8.4.2.4 Big Data and Phenotypes
8.4.3 Education and the Future of Biomedical Science
8.4.3.1 Knowledge Structure
8.4.3.2 Scientific Culture and Professionalism
8.4.3.3 Policy Matters
Epilogue (or Why We Did What We Did)
Bibliography
Index
A
B
C
D
E
F
G
H
I
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Back Cover

Citation preview

GENOME CHAOS
RETHINKING GENETICS, EVOLUTION, AND MOLECULAR MEDICINE

HENRY H. HENG
Wayne State University School of Medicine
Detroit, MI, United States

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2019 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-813635-5

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Andre G. Wolff
Acquisition Editor: Peter Linsley
Editorial Project Manager: Timothy Bennett
Production Project Manager: Poulouse Joseph
Cover Designer: Matthew Limbert
Typeset by TNQ Technologies

To Julie, Eric, and Christine and those scientists who are brave enough to climb out of the shadows of giants.

Preface

This is not a typical book about genomics, evolution, and molecular medicine where you are provided with comprehensive information reinforcing what you already know. Rather, this book will invite you to rethink the fundamental theories of genetics and evolution you have held for so long, knowledge that has formed the style of your scientific thinking and practice and that has sculpted you into who you are today.

Perhaps you are an undergraduate student or a fresh graduate, younger than the age of genomics, who grew up amid large-scale -omics technical platforms and promises. You dreamed of becoming a hero by finally curing cancer in the age of big data and personalized medicine. “Is this necessary?” you may wonder.

Or perhaps you are established in the field. “Why bother?” you may ask. “Why do we need to reexamine the frameworks in a blossoming field such as genomics and molecular medicine? We have headline after headline trumpeting the success of the Human Genome Project and the many large-scale -omics projects that followed.” Your career is going well; you are widely published, well-funded, and respected. “Yes, there are many big surprises,” you argue, “but science is naturally a half-full or half-empty story. The logical way to advance the field is by continuing to improve technologies to collect more data. With more funding, the mystery of life will be solved, and all diseases will be eradicated.”

After reading report after report on the major failures of molecular targeting-based clinical applications or after witnessing the disappointment of large-scale genome-wide association studies or after the cancer genome project bluntly questioned the value of your personal research (when the list of the top 100 most mutated genes in cancer does not include your favorite gene), occasionally, in the back of your mind, you might have asked yourself, “Geez, how did we miss it again? So many years of hard work and experimentation had clearly already proved the importance of this gene mutation in cancer.”

Then came a series of shocking and nerve-racking reports: most landmark studies in oncology (nearly 90%) are unrepeatable. The success rate of translational research is extremely low, akin to “crossing the valley of death.” “Genomics is good for research, but not for medicine.” “Current US biomedical research needs to be rescued from its systemic flaws.” These conclusions are based on solid analyses and can no longer be ignored.

How does one reconcile positive and negative news in the genomic era? What are the key implications of these surprises? How should the fields of genomics and molecular medicine move forward? And more profoundly, do those stunning reports reflect the increased anomalies and paradoxes that are crying out for a paradigm shift in biomedical research?

In this book, these issues will be systematically and candidly discussed. In particular, you will be introduced to a different conceptual framework of biological reasoning, exposing holes in traditional ways of thinking and conducting research. By reexamining many paradoxes through the lens of the genome theory, rather than the gene theory you are so familiar with, this book will lead you to discover a new way to appreciate how genomics and evolution work in the context of heterogeneity/emergence-defined reality. Perhaps you will realize that many well-known milestone findings in classic genetics and evolution are in fact less important as they represent exceptions rather than generalities. These new “discoveries” and further syntheses, many of which may initially seem counterintuitive and controversial, will surely challenge your own knowledge and beliefs. At the start, they may make you uncomfortable. But hopefully, at the end of the day, many of these proposed concepts and statements will make sense to you.

We explore this paradigm shift in eight chapters:

In Chapter 1, “From Mendelian Genetics to 4D Genomics,” a brief review of the birth of genomics and its relationship with the Human Genome Project is presented, which illustrates how the concept of the gene has changed during the genomics era. Further discussions highlight some key limitations of traditional genetic/genomic research and call for a new genomic paradigm.

Chapter 2, “Genes and Genomes Represent Different Biological Entities,” supports the concept that the emergent genome, rather than isolated genes, defines a biosystem. Many gene-centric concepts and their limitations are briefly reviewed. Experimental observations are presented to illustrate the conflicting relationship between genes and the genome, as the genome-level operation is not simply a matter of “adding up” the functions of individual genes.

Chapter 3, “Genome Chaos and Macrocellular Evolution: How Evolutionary Cytogenetics Unravels the Mystery of Cancer,” discusses why a new conceptual framework of cancer research is urgently needed and describes the journey of searching for a genome-based cancer evolutionary theory. This journey has led to many important discoveries, including two-phased cancer evolution (macro- and microcellular evolution), the importance of nonclonal chromosome aberrations, the key function of genome chaos, and the evolutionary mechanism of cancer. A new genomic model for cancer evolution is proposed to relate the contributions of genome and gene in cancer evolution.


In Chapter 4, “Chromosomal Coding and Fuzzy Inheritance: The Genomic Basis of Bio-information and Heterogeneity,” the many genomic surprises observed from the first three chapters are explained by the novel concepts of system inheritance and fuzzy inheritance. Unlike gene-defined “parts inheritance,” chromosome-encoded “system inheritance” defines the genomic blueprint. In addition, the multiple levels of genomic and nongenomic information are fuzzy, which allows them to code for a spectrum of potential genotypes for the environment to select. Furthermore, fuzzy inheritance is the genomic mechanism of bio-heterogeneity, which is the key to understanding many common and complex diseases.

In Chapter 5, “Why Sex? Genome Reinterpretation Dethrones the Queen,” an unexpected story likely solves the century-long mystery behind the main function of sexual reproduction, the “queen of problems” in evolutionary biology. Initially, the two phases of cellular evolution suggested that asexual reproduction can produce highly diverse genomes, whereas sexual reproduction should produce identical genomes. Ample evidence from different organisms now supports this new concept. Furthermore, meiosis has been identified as the mechanism to maintain species identity by preserving system inheritance. Thus, the primary function of sex is to preserve the genome-defined system (the species).

In Chapter 6, “Breaking the Genome Constraint: The Mechanism of Macroevolution,” following a brief review of the fundamental differences between artificial selection, highly isolated natural selection, and natural selection in general, the maintenance of genome integrity is linked to major evolutionary constraints. After emphasizing the importance of evolutionary constraint, a new model of speciation is proposed. In this model, speciation is characterized by genome reorganization-mediated macroevolution, followed by mating with a partner of a similar genome to produce fertile offspring. Finally, microevolution might promote the formation of lasting species with large populations. This model drastically departs from the explanation of speciation through natural selection, where the accumulation of small changes over a long period of time is key.

In Chapter 7, “The Genome Theory: A New Framework,” the rationale for integrating genome-based genomics with evolutionary concepts is laid out alongside the key assumptions that validate them (treating the genome as an information unit, evolutionary selection unit, and platform of bio-emergence). The genome theory is cohesively outlined with 12 principles. In addition, the genome theory’s key predictions and limitations, as well as its falsifiability, are discussed.

In Chapter 8, “The Rationale and Challenges of Molecular Medicine,” the history of precision medicine, as well as its challenges and opportunities, is briefly traced. The future direction of molecular medicine is discussed, including the relationship between big data and theory, the increase in bio-uncertainty, and how education can play an important role in the future of biomedical science.

Although the chapters are arranged in a logical order that reflects our journey of thinking and searching, each chapter has been written as a potential standalone unit for the ease of readers. Therefore, there is some minimal overlap among chapters. The commonly shared message that threads all eight chapters together is the importance of genome-defined genomic information and its implications for evolution and molecular medicine. More specifically, the often-ignored role of genome constraint in evolution is emphasized. Within this new perspective, both the system’s variable features (reflected as short-term adaptive dynamics) and the existence of the system itself (reflected as long-term stasis) are essential components, marking a departure from current genomic and evolutionary theories. Such new theories based on real-world complexity will not only challenge current genomic and evolutionary mechanisms, but also explain the relationships between micro-adaptation and macro-speciation, germline stability and somatic dynamics, and fuzzy inheritance-encoded phenotype potentials and environment-selected realities.

By analyzing initial confusions, identifying paradoxes, thoroughly reinterpreting key data, rethinking ignored phenomena, introducing new discoveries, and searching for new frameworks, this book invites you to join us on this journey of rediscovery. If you are a genome-based reader, help us to improve the genome theory and establish a technological platform to study human diseases. If you are a hard-core gene-based reader, bear with us and momentarily set aside the ideas you know best to have a conversation about ideas that you may have dismissed, ignored, or even disliked. We are always searching for a better theory, after all. Maybe you will come up with a strong argument to convince us to join you.

No matter what, the only goal of a true scientist should be to search for truth. In that light, I hope this process, no matter how difficult, will be enjoyable. Sometimes the truth hurts. Often, it is not easily appreciated, especially when an improper framework has previously dominated our thinking. But ultimately, truth will prevail. When you finish reading and ponder our message, you will likely start to ask a few questions. “Is this really real?” you may wonder. And “If so, how did I miss something this big for so long?” And hopefully, “What should I do next?”

Acknowledgments

First, let me thank Julie Heng, Barbara Spyropoulos, Sarah Regan, Sarah Alemara, Steven Horne, and Batoul Abdallah for editing the manuscript. I also would like to thank all the members of my research team from the Wayne State University School of Medicine for believing in me and my work on the genome theory when others were highly skeptical: Gao Liu, Joshua Stevens, Steve Bremer, Karen Ye, Lesley Lawrenson, Steve Horne, Batoul Abdallah, Sarah Regan, Wei Lu, and Christine Ye.

Second, my sincere appreciation belongs to many of my mentors for supporting my efforts to characterize genome-level alterations when most people considered them insignificant: Lap-Chee Tsui, Peter Moens, F. T. Kao, Clement Markert, Y. C. Wang, and W. Y. Chen. Their support allowed me to develop new methods such as high-resolution fiber FISH and DNA/chromosome/protein in situ codetection.

Third, I must thank many thinkers and scientists for their encouragement, suggestions, and candid opinions to improve our concepts and to articulate our message: Bill Brinkley, Mina Bissell, Linda Cannizzaro, Don Coffey, Jim Crow, Peter Duesberg, Wayt Gibbs, Arny Glazier, Morris Goodman, Root Gorelick, Dean Hamer, Gloria Heppner, Sui Huang, Steve Krawetz, Rong Li, Thomas Liehr, Larry Loeb, Carlo Maley, O. J. Miller, Brian Reid, Harry Rubin, Bill Shields, L.-J. Shi, Jeremy Squire, Gary Stein, Joachim Sturmberg, David Ward, Douglas Wallace, Adam Wilkins, and T. H. Yosida. Thanks also to my friends, colleagues, and collaborators, with whom I shared many interesting discussions regarding 4D genomics and evolution: Ping Ao, Rodrigo Fernandez-Valdivia, Jing-Bing Fan, Y.-B. Fu, Rafael Fridman, Markus Friedrich, Rafe Furst, Edward Golenberg, Alex Gow, Larry Grossman, Weilong Hao, ZhuoCheng Hou, Markku Kurkinen, Joshua Liao, R. Lin, J. S. Liu, Fred Miller, William Moore, Avraham Raz, Zachary Sharpe, Sureyya Savasan, John Tomkiel, Jeffrey Tseng, Derek Wildman, Alan Wang, J. Wang, H.-Y. Wu, G.-S. Wu, Y.-M. Xie, Ping Xue, Yang Yang, Weining Yang, Hao Ying, Holly Yu, J.-W. Yu, Kezhong Zhang, and Ren Zhang.

Fourth, I owe my gratitude to my editors from Elsevier: Peter Linsley for his initiation of this project, and Timothy Bennett for his valuable editorial help.


Finally, I would like to thank my wife, Christine, and children, Julie and Eric. It takes a family to write this book. In addition to their unconditional support, their enthusiasm made the writing process a highly enjoyable journey. From breakfast to bedtime and in between, my discussions with them have motivated me to climb out of the shadows of giants as much as I hope to inspire them to.

CHAPTER 1

From Mendelian Genetics to 4D Genomics

1.1 SUMMARY

The gene frames much of modern genetics by acting as an independent unit of genetic information. The gene-defined genotype–phenotype relationship has been demonstrated by classical studies linking genes to specific genetic traits and Mendelian diseases. However, it is now apparent that most genetic traits cannot be explained by single genes or even a combination of many. Genomics was positioned to solve this challenge by searching for more genetic variants and quantitatively illustrating their combinatorial mechanisms. Although this approach appears promising to many, genomics has failed to identify common mechanisms of most complex traits. Where then do genetics and genomics fall short? A review of the field reveals that most genes do not, in reality, have independent functions, leading to a great deal of confusion about the role of genes in determining the phenotype. One could say that Mendel’s original pea experiments, which formed the foundation of modern genetics, should have already generated such confusion upon close analysis.

In this chapter, the transition from genetics to genomics is briefly reviewed, as reflected by how the concept of the gene has changed during the genomics era. The initial enthusiasm for the Human Genome Project and the subsequent disappointment are addressed, as well as the lack of fundamental progress despite overwhelming data accumulation, which slows down bio-industry and medicine. This journey has now brought us to an urgent need for a new biological paradigm, which focuses on genome and evolution-based genomics and incorporates both emergent properties and cytogenetic organization.

Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00001-X

Copyright © 2019 Elsevier Inc. All rights reserved.

1.2 THE EMERGENCE OF GENOMICS

“Genetics” had already come a long way when British botanist William Bateson coined the term at the first International Congress on Genetics in 1906 to describe a new science that explored heredity and variation, as initiated by Mendel’s publication on heredity in peas (Mendel, 1866). In the past 150 years, to understand the mechanism of Mendelian inheritance, researchers have zoomed in from the nucleus to chromosomes, from chromosomes to genes, and then from genes to DNA motifs. Such reductionist analyses have triumphed, leading to our understanding of the physical and chemical properties and structure of the gene, the mechanism by which genes code for RNAs and proteins, the various models of gene regulation, protein modifications/degradation, macromolecule assembly, and the link between gene mutations and phenotypic variants, including many human diseases. We also understand how to identify and manipulate specific genes and apply this knowledge to produce genetically modified foods and improve human health through molecular medicine. The introduction of the double-helix model of DNA in 1953 and recombinant DNA technology in 1972 changed genetics forever (Watson and Crick, 1953a; Jackson et al., 1972). Molecular genetics has become the go-to field for new generations of biologists. Many bio-disciplines that were not gene-based withered. Moreover, the power of the gene has become a cultural phenomenon by capturing the general population’s imagination, thanks to many popular ideas. Richard Dawkins’s The Selfish Gene marked the onset of the gene-era hype in which everything was apparently controlled by genes, from individual proteins to specific biological traits and from evolutionary history to current health and behavior (Dawkins, 1976). This mode of thought assumed that all biological systems, including humans, serve the gene masters. We are merely the unwitting vehicles of genes. Genes are dominant, powerful, selfish, and mysterious.
Such gene-centric concepts have shaped modern biology, generating a great deal of excitement and expectation within science, medicine, bio-industry, and society in general. If only the path of future genetics was as clear and simple as just following the gene!

1.2.1 A Brief History of Genomics

Naturally, the ultimate goal of human genetics became hunting down all “disease genes” by molecular cloning and then correcting them by genetic manipulation such as gene therapy or eliminating them through prenatal screening. Suddenly, gene-based molecular genetics became the flagship of science, and the success of identifying gene defects responsible for human diseases further validated gene-based genetic approaches.

Positional cloning initiated an exciting wave of gene hunting. Following the first gene cloning success in 1986 for X-linked chronic granulomatous disease by Harvard Medical School’s Stuart Orkin, gene after gene associated with important disorders was cloned, including those for Duchenne muscular dystrophy (cloned by Louis Kunkel at Boston Children’s Hospital and Ronald Worton from the Hospital for Sick Children in Toronto), cystic fibrosis (cloned by Lap-Chee Tsui from the Hospital for Sick Children in Toronto in cooperation with Francis Collins from the University of Michigan), Huntington disease, adult polycystic kidney disease, certain forms of colorectal cancer, and breast cancer. By 1995, about 50 inherited disease genes had been identified, highlighting the triumphant era of human molecular genetics (Collins, 1995). Interestingly, even before the gene hunting movement reached its peak in the late 1980s to early 1990s, there were increasing concerns about the gene-centric reductionist approach, which led to calls for genome-based research, notably by Barbara McClintock and a number of evolutionary biologists and scientists who questioned genetic determinism. McClintock, the Nobel laureate who recognized the importance of the genome in biology, specifically emphasized this in her 1983 Nobel Prize acceptance lecture at the Karolinska Institute in Stockholm.

In the future, attention undoubtedly will be centered on the genome, with greater appreciation of its significance as a highly sensitive organ of the cell that monitors genomic activities and corrects common errors, senses unusual and unexpected events and responds to them, often by restructuring the genome. We know about the components of genomes that could be made available for such restructuring. We know nothing, however, about how the cell senses danger and instigates responses to it that often are truly remarkable.

McClintock, 1984

It gradually became obvious that most genes do not have dominant phenotypes that display high penetrance in populations. Researchers also realized that even though it is possible to identify specific gene mutations in many single-gene Mendelian diseases, this success might not be transferable to many common and complex diseases because of the large number of potential genes involved. Clearly, a better strategy was to search for more genes throughout the entire genome, which was the rationale to move from single-gene hunting to whole genome searches. For many, the advantage of focusing on the genome was merely to include more genes. In the mid-1980s, some key technologies became capable of analyzing more genes, such as DNA panels of rodent-human somatic cell hybrids for physical mapping, DNA restriction fragment length polymorphisms (RFLPs) as variation markers for genetic mapping, polymerase chain reaction, automated DNA sequencing, and partial sequencing or mapping of several small genomes of microbes. These methodologies and the increased use of computers for data storage and analysis served as the necessary platforms for this new frontier of genetics.

Then, the “perfect storm” came. In May 1985, Robert Sinsheimer, the Chancellor of the University of California, Santa Cruz, held a workshop there titled “Can we sequence the human genome?” Sinsheimer organized this workshop to present a stronger argument that such a project was significant and feasible following an unsuccessful attempt to extract funding from his University. Many leading researchers attended, including David Botstein, George Church, Ron Davis, Walter Gilbert, Lee Hood, and John Sulston, and they discussed potential problems, technologies, and a timeline as well as costs for the genome project. Despite the success of this workshop, Sinsheimer still failed to obtain any funding for his project. However, the meeting initiated a chain reaction (Sinsheimer, 2006). In March 1986, new on the job and eager to establish a novel megaproject to bolster the genetic programs within the US Department of Energy (DoE), Charles DeLisi, the Director of the Office of Health and Environmental Research of the DoE, organized a conference at Santa Fe. Influenced by Sinsheimer’s workshop, this meeting also addressed the goal of determining the complete sequence of the human genome and mapping the location of each gene. Most significantly, in addition to discussing the desirability and feasibility of implementing a Human Genome Project, this meeting was crucial to pushing the idea of a full genome sequence onto the national scientific stage and converting it into a reality. DeLisi and others were able to begin the key task of garnering support from the DoE, the Reagan administration, and Congress (DeLisi, 2008). At the same time, Renato Dulbecco, a Nobel laureate recognized for discoveries concerning the interaction between tumor viruses and the genetic materials of the cell, published an influential editorial piece in Science arguing that sequencing the entire human genome was the best way to solve the puzzles of cancer.
His argument has often been used as the rationale for genome sequencing, especially in later cancer genome sequencing. Another meeting worth mentioning is the 1986 Cold Spring Harbor symposium “The Molecular Biology of Homo Sapiens,” where the Human Genome Project was also debated in a “rump session” moderated by Paul Berg and Walter Gilbert. Despite the fact that there were more voices urging caution, the discussion among many molecular geneticists in attendance was essential to maturing this idea (Robertson, 1986). Also in late 1986, the National Academy of Sciences/National Research Council formed a committee on mapping and sequencing the human genome. Collectively, all these events led to the Human Genome Project becoming a reality. The genome research center was established in 1987 and included three National Laboratories of the Energy Department. An office of Human Genome Research at the NIH opened its doors in 1988. Finally, an international organization named the Human Genome Organization (HUGO) was established in 1988, and the rest is history.

It is interesting to ask what caused Sinsheimer to act. He says he was influenced by other “Big Science” projects outside biology.

... As Chancellor, I had been involved in the conception of several large-scale scientific enterprises, involving telescopes (the TMT project) and accelerators, which were “Big Science,” scientific projects requiring, in some instances, billions of dollars and the joint efforts of many scientists and engineers. It was thus evident to me that physicists and astronomers were not hesitant to ask for large sums of money to support programs they believed to be essential to advance their science. Biology was still very much a cottage industry, which was fine, but I wondered if we were missing some possibilities of major advances because we did not think on a large enough scale ...

Sinsheimer, 2006

Similarly, why did the DoE initially play the leading role rather than the NIH? The NIH was correctly concerned about the potential shift of money away from investigator-initiated proposals to this big science project. Although the DoE had funded studies of the biological effects of radiation for years, perhaps it was its historical link to big projects such as the construction of the atomic bomb in the Manhattan Project that influenced the Department to undertake this gigantic project. The idea of sequencing the human genome to bolster the DoE’s research program was already circulating before DeLisi’s arrival. The report titled “Technologies for Detecting Heritable Mutations in Human Beings” by the Office of Technology Assessment hinted at the idea of sequencing the whole genome. A new wave of big science was coming. The birth of such an enormous initiative as the Human Genome Project meant that genetics and biology would never be the same. It certainly marked the maturation of genetics, and it also transformed genetics into genomics. There are different opinions regarding the relationship between the birth of the Human Genome Project and genomics. Some believe the Human Genome Project spawned a new science called genomics, while others think the birth of genomics was a gradual process that began with earlier efforts in gene mapping and sequencing, which were a necessary preliminary step before the feasibility of the Human Genome Project could even be considered. Just as genetic research predated the use of the term “genetics,” genomics research predated the creation of its official name. It is hard to determine a defined timeline for the official birth of genomics compared with the Human Genome Project, as they are intimately intertwined in both research context and historical timing. One thing was certain, however: the Human Genome Project became the primary goal and a major challenge for the young field of genomics.

The journal Genomics was launched in 1987 by Victor McKusick, a medical genetics pioneer who published a catalog of all known genes and genetic disorders called Mendelian Inheritance in Man (MIM), and Frank Ruddle, a gene mapping pioneer. In their introduction of Genomics, “A new discipline, a new name, a new journal,” they stated the following:

For the newly developing discipline of mapping/sequencing (including analysis of the information) we have adopted the term genomics ... The new discipline is born from a marriage of molecular and cell biology with classical genetics and is fostered by computational science. Genomics involves workers competent in constructing and interpreting various types of genetic maps and interested in learning their biologic significance. Genetic mapping and nucleic acid sequencing should be viewed as parts of the same analytic process, a process intertwined with our efforts to understand development and diseases.

The initial focus of the journal reflected the focus of the field of genomics, which was well laid out by McKusick and Ruddle in their first editorial piece (McKusick and Ruddle, 1987). It included the following topics: chromosomal mapping of genes, DNA fragments and gene families; sequence characterization of cloned genes and/or other interesting portions of genomes; comparative analyses of genomes to understand structural, regulatory, functional, developmental or evolutionary mechanisms; methods for large-scale genomic cloning, restriction mapping, and DNA sequencing; computational platforms/methods and algorithms to illuminate DNA and protein sequence data; understanding the hierarchy of chromosome structure; analysis of genetic linkage data related to inherited disorders; development of a genomic database; and parallel studies on genomes from different organisms. Thomas Roderick of the Jackson Laboratory coined the term “genomics” that would become the name of the new journal as well as of the new scientific field. According to Roderick, recalling a 1986 meeting with the future editors in chief, McKusick and Ruddle:

One evening, about 10 of us were at a raw bar, drinking beer and discussing possible titles for the new journal. We were on our second or third pitcher when I suggested ’genomics’. Little did we know then that it would become such a widely used term.

Keim, 2008

1.2.2 Genetics or Genomics?

Since the emergence of genomics, the terms “genetics” and “genomics” have been associated with diverse definitions within the literature. Despite some definitional overlap, “genetics” is generally defined as the science of individual genes, heredity, and variation in living organisms, whereas “genomics” is a new discipline that studies the genomes of organisms.

The main difference between genetics and genomics is that genetics scrutinizes the functions and composition of single genes to illustrate how individual traits are transmitted from parent to offspring, whereas genomics addresses the structure, organization, and function (inheritance) of a genome by dealing with a large number (or the complete set) of genes and noncoding sequences and their nuclear topological and/or biological interrelationships (see Chapters 2 and 4). As no gene is an island and most genetic traits involve multiple genes and their complex interactions within environments, the scope of genomic research is drastically increasing. In particular, because the genome is not just a bag of genes (see Chapter 2), genomics has expanded past its genetic roots. Now, genomic concepts and methodologies generally dominate biological science. Single-gene research no longer fits under the genomics umbrella unless the aim of a specific study is to incorporate the gene or its associated pathway and elucidate its effect on the entire genome’s network (Genome.gov). It would be safe to say that genomics represents a new phase of genetics. Some scholars even refer to genomics as 21st century genetics. Knowing that the future studying of genetic information will undoubtedly involve the genome system rather than individual genes in isolation, the holistic platform of genomics might someday replace genetics altogether. The emergence of genomics is of ultimate significance to genetic research. First and foremost, it turned traditionally highly selective genetic research into less selective genomic investigation. Such a transition is reflected at both the research subject level and the system used. New genomics research focuses on large regions of the genome or the entire genome rather than specific and isolated genes of interest. 
Equally important, genomics allows a new research approach more amenable to direct analyses of natural populations rather than traditional genetics studies that are mainly dependent on highly specific model systems under controlled laboratory conditions. In fact, most genetics laboratories focus on model organisms as experimental systems, including Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis, various inbred mouse strains, and established cell lines. These model organisms/systems clearly lack the diversity and heterogeneity of natural populations. Although some classic genetics studies have analyzed some natural populations, the scale is incomparable with genomics studies in terms of the whole genome approach and the size of the natural populations that are studied. Second, the birth of “Big Science” in biology has transformed genetics, challenging the previous small-scale hypothesis-driven system that was best suited to studying causative relationships in a defined linear system. By revealing the true complexity of biological systems, researchers will likely begin to question the genetic traditions of searching for causal isolated genes or well-defined molecular pathways. The “big science projects” approach has brought both enthusiasm and uneasiness to the scientific community, as this is a change from traditional biological science where individual researchers carve out their own unique niche, testing their own hypotheses, sometimes for many years. A key challenge of large-scale genomic projects is using the correct framework to best integrate technologies (Heng and Regan, 2017). These large-scale genomic projects will likely conflict with traditional genetic knowledge, because different principles often apply when more variants are involved. For example, when dealing with a complex adaptive system involving many different factors, the correlation study becomes more important as it is too hard to identify true causation. Third, the biomedical industry requires pragmatic reality checks more often than traditional academic institutions do. Companies require rigorous reviews to select molecular targets derived from basic research. The failure of any clinical trial could be devastating to a company regardless of how solid the lab-based research is. Such reality checks, which are now fed back into the research community, influence the direction of basic research, as policy makers and researchers are increasingly paying attention. For instance, it was industrial researchers who reported that the majority of representative “high-quality” cancer research papers are unrepeatable (Mullard, 2011; Begley, 2012). There is no incentive for academic research to carry out such analyses. However, it is crucial for pharmaceutical companies to make sure that their billion dollar drug development effort has a solid basis. Finally, because of the scale of funding, public interest is now a key component in genomic research. It is no longer enough to just explore billion dollar hypotheses for curiosity’s sake. Many basic genetic researchers are not happy with this new trend.
They firmly believe that basic research takes time and will ultimately pay off in the long run and that scientific progress should not be unduly influenced by factors outside of science. However, the good old days of doing science purely for the accumulation of academic knowledge will likely not return. The days of moderate research budgets supporting individual labs and their genetic discoveries have given way to the megaprojects of the genomics age. This large-scale approach requires more public support and associated scrutiny. Understanding the new reality of genomics is critical, as the research community must educate the general public and be careful to avoid harmful overreaching promises. For this reason, many previously “off-topic” issues have become inseparable parts of genomics itself. Science policy, ethical issues, and public interest are now on the agenda of most genomics conferences.

1.2.3 Fundamental Limitations of Traditional Genetics

Throughout the history of modern genetics, a chain of many “brilliant experimental designs” has generated our core knowledge of genetics, which formed the backbone of the gene theory. An interesting “open secret,” though, is that most of these famous milestone experiments are actually based on exceptional cases that can only be effectively demonstrated using specific model systems and under well-defined experimental conditions. For example, it is well known that Mendel’s classic paper, which perfectly illustrated the genotype-phenotype relationship between parent and offspring, demonstrated that genes function as defined independent informational units. This is still the basis of current genetic theory. However, it is much less appreciated that there were many preconditions or limitations for his beautiful illustrations. First, it is difficult to replicate Mendel’s clear-cut patterns using most other species. In fact, Mendel himself failed to confirm his hypothesis when he used hawkweed (as suggested by Karl von Nägeli, one of the leading scientists at that time who had read Mendel’s seminal paper) and beans. Rather, some upsetting data began to appear: only for certain characteristics did the flowers follow the same pattern as his peas. The drastically increased data diversity presented in these other systems clearly caused increased confusion for Mendel. Second, Mendel only selected 7 traits among 34 initially studied traits in peas to demonstrate his points. The rationale for reporting seven selected traits was likely that only these seven traits produced the most appropriate results to support his concept. It would be interesting to know what the data looked like for the majority of the traits unreported by Mendel. We know today that the phenotypes of most genes do not follow the Mendelian 3:1 pattern because a majority of genes do not truly function as straightforward independent units.
Instead, the expression of a genotype often involves multiple genes and complicated genomic and environmental interactions. Now, Mendel’s seven genetic factors have been linked to seven genes with molecular characterization. These famous characteristics likely arise from a range of genetic causes (including simple base substitutions, changes to splice sites, and the insertion of a transposon-like element). Interestingly, these seven genes were either not linked or, if linked, possibly not subject to his analysis (Reid and Ross, 2011), which allowed Mendel to see a distinct pattern of segregation. Clearly, in contrast to the popular viewpoint, it was not by luck that Mendel chose these seven “perfect” characteristics, but by extreme trait selection. Indeed, Mendel’s character selection was described in his paper:

Some of the characters noted do not permit of a sharp and certain separation, since the difference is of a “more or less” nature, which is often difficult to define. Such characters could not be utilized for the separate experiments ...

Mendel, 1866

Third, Mendel had a strict selection criterion for each sample. He purposely avoided collecting “average data” by using exceptional samples in his experiments. For example, to compare the difference in stem length (one of his 7 traits), a long axis of 6 to 7 ft was always crossed with a short one of 0.75 to 1.5 ft. By pushing extreme cases rather than using average long and short populations, the certainty of the data becomes much more impressive. Paradoxically, however, the pattern he discovered based on selection will not represent the majority of the data he ignored. Fourth, Mendel tried his best to reduce environmental variations that could influence the data, such as growth conditions, timing of experiments, and the effect of all foreign pollen, which invariably created ideal systems with minimal environmental influences. Taken together, Mendel created a perfect yet highly exceptional system: perfect for a manipulated linear model with reduced variants, exceptional for the reality of genetics, where most genetic traits are not determined by a single gene and heterogeneity dominates within a population. Mendel’s approach might be the reason why many scientists have had trouble replicating the same simple ratios he reported for these carefully selected traits. For example, when the sweet pea (a closely related species of the garden pea that Mendel used) was examined, the pattern of heredity was considerably more complicated than Mendel’s results (Bateson and Saunders, 1902). In fact, the independence of genes can be diluted as they are passed between generations. Furthermore, the rationale of classifying genes into dominant or recessive status has been challenged since the beginning of the last century, when data showed that genetic traits can be dominant, recessive, neither, both, or one of many statuses in between (Weldon, 1902; Radick, 2015).
The effect of a gene is constrained or defined by the hereditary background (ancestry) and environments, and the determinist’s viewpoint of the gene might be an illusion for the majority of species. After carefully analyzing data from Mendel and other well-known researchers working on related systems, Weldon concluded the following:

... I think we can only conclude that segregation of seed-characters is not of universal occurrence among cross-bred Peas, and that when it does occur, it may or may not follow Mendel’s law. The law of segregation, like the law of dominance, appears therefore to hold only for races of particular ancestry. In special cases, other formulae expressing segregation have been offered, especially by De Vries and by Tschermak for other plants, but these seem as little likely to prove generally valid as Mendel’s formula itself.

Weldon, 1902

Interestingly, the above paper systematically challenged the data presentation and legitimacy of Mendel’s theory immediately following the rediscovery of the laws of genetics. Based on his understanding of the pea varieties and their pedigrees, Weldon was convinced that Mendel’s law had no validity beyond artificially purified experimental systems. He not only calculated the odds of getting worse results as 16 to 1 (based on Mendel’s data) but also illustrated the challenge of classifying continuously variable characteristics of the pea (green or yellow for seed color, round or wrinkled for seed shape) using binary categories (dominant vs. recessive). His analyses hinted at the strong possibility of cherry-picked results on Mendel’s part. Ronald Fisher also thought that the data from Mendel were too good to be true. Given Fisher’s reputation in data analysis, his viewpoint is more influential than Weldon’s. Three decades after Weldon, Fisher published a paper to elaborate on this issue. In his paper, Fisher argued that Mendel knew what his data should look like according to his theory, and that he carefully planned his experiments to support it. Fisher even guessed that some data must have been quietly removed to support the theoretical prediction (Fisher, 1936). This paper, in addition to some later more direct accusations of data falsification, has formed the so-called Mendel-Fisher controversy. Still, it is now accepted that most accusations and suspicions have turned out to be groundless (Hartl and Fairbanks, 2007; Franklin et al., 2008). The editor of Classic Papers in Genetics, James A. Peters, wrote the following introduction to Mendel’s original paper that laid the foundation of modern genetics (Peters, 1959):

... There have been comments made that Mendel was either very lucky or tampered with his data, because his results are almost miraculously close to perfect ... As to the second charge, that he might have arranged his data so as to shed the best possible light on his conclusions, I believe that the only way he might have manipulated his data is through omission of certain results that would have led to unnecessary complications ... Mendel probably knew of these interrelationships ... The fact he chose to utilize only those characteristics that fitted his concepts cannot be interpreted as an act of dishonesty on his part ...

Peters, 1959
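To make the “too good to be true” argument concrete, the following minimal sketch shows how closely an observed F2 count matches the expected 3:1 ratio, using a Pearson chi-square goodness-of-fit statistic with one degree of freedom. It is an illustration only, not a reconstruction of Fisher’s (1936) analysis, which pooled results across many of Mendel’s experiments. The function names are hypothetical, and the counts are the F2 figures commonly quoted from Mendel’s paper for seed shape and seed color, used here as illustrative input.

```python
import math


def chi_square_3_to_1(dominant: int, recessive: int) -> float:
    """Pearson chi-square statistic for observed counts against a 3:1 expectation."""
    total = dominant + recessive
    expected = (0.75 * total, 0.25 * total)
    observed = (dominant, recessive)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(chi_square: float) -> float:
    """Upper-tail probability of a chi-square variable with one degree of freedom.

    For one degree of freedom the survival function reduces to erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


# F2 counts commonly quoted from Mendel (1866); treated here as illustrative input.
F2_COUNTS = {
    "seed shape (round vs. wrinkled)": (5474, 1850),
    "seed color (yellow vs. green)": (6022, 2001),
}

if __name__ == "__main__":
    for trait, (dominant, recessive) in F2_COUNTS.items():
        stat = chi_square_3_to_1(dominant, recessive)
        print(
            f"{trait}: ratio {dominant / recessive:.2f}:1, "
            f"chi-square {stat:.3f}, p {p_value_one_df(stat):.2f}"
        )
```

A large p-value in a single experiment only says that the counts are compatible with 3:1. The suspicion discussed above arises when such close agreement recurs across many experiments more often than chance alone would predict.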

Judging by Mendel’s candid presentation in his publication with all the details of data selection and, in particular, knowing that he was increasingly puzzled when he worked on other species, it is clear to us that his extreme selective reporting was not because of dishonesty but because of the natural unconscious bias that comes with scientific research, as these improbably “perfect” data can only be generated from the highly selected artificial systems that he created. The dilemma Mendel faced was how to balance the art of selecting beautiful but exceptional data to unveil hidden scientific principles while avoiding the fundamental misunderstanding that comes from ignoring the general features of the system under study. The majority of genetic researchers favor Mendel’s approaches. They argue that it is absolutely necessary, and sometimes the only option, to select specific conditions or unique models to illustrate certain aspects of nature, which is the rationale for using models to simplify nature and eliminate variables. In fact, the selection of an appropriate system to address the right questions is a key to success in science. Only when we discover the mechanisms in a specific and often the simplest system can we add more elements to modify our theory so that it best fits reality. Of course, they often cite Mendel as an excellent example. Mendel’s law fits single-gene heredity best and thus provides the basis for understanding the heredity of multiple genes in major complex cases. However, scientists must be aware of the fundamental limitations of this approach, as the more unique and elegant the model system used, the more limited the conclusions will be for generalization. For example, the following questions need to be addressed to understand the limitations of Mendel’s laws: should we classify genes based on a dominant-recessive relationship while knowing that a large number of genetic variants cannot be explained by such binary categories of genes? If there is no clear-cut relationship between most individual genes and phenotypes, should we still consider Mendel’s law the law of genetics? What if laws based on a simplified system (like the single gene-phenotype relationship detected by Mendel) are drastically different from real-world complex systems (the majority of biological cases where all genes are connected and environmental variation plays an important role)? Moreover, should we clearly point out that Mendel’s hypotheses only represent exceptions before crowning them as the law of genetics?
Such a dilemma has occurred throughout the modern history of genetics, yet most textbooks fail to warn readers that many well-accepted concepts of genetics are fundamentally limited because of key differences between simplified models and reality. By comparing classical model systems and their derived laws of genetics, common features can be summarized using single-factor analysis to link a single genetic element to limited phenotypes by ignoring links with high complexity. Often, the selected model system ideally illustrates a causative relationship of an investigator’s favorite concept, as these model systems become more or less linear with artificially enhanced certainty. In a sense, each model system offers some low-hanging fruit, but they are the exceptions in the real world as these pure systems artificially amplify given genetic contributions by eliminating other important factors that exist in natural systems. This way of thinking in genetics has lasted for over a century without any serious challenge. We often validate data using artificial models

rather than real-world situations. A major and unfortunate trend in the field is to publish “positive” data and not report “negative” data or data that do not make sense. Many “clear-cut” stories have been published. Although these stories are academically interesting, they have limited practical implications. With the advances of human genetics and medical genetics and the increased popularity of the gene in society, there has been high hope that fixing gene mutations will help fight many common and complex diseases. For the first time in the history of genetics, theories can be directly examined using various molecular genetic methods on many human diseases. Paradoxically, however, progress has been slow, and the knowledge gap between genetic concepts and clinical realities has drastically increased. First, it has been challenging to link non-Mendelian diseases to specific genes. Second, the genetic heterogeneity is overwhelming. Third, environment–gene interaction plays a dominant role in disease phenotypes. Fourth, disease progression and response to treatment form an adaptive system in which the power of genetic prediction is drastically reduced. All of these features raise some profound questions: if Mendel’s law is correct, and if many diseases are caused by multiple genes, then why is it so difficult to identify these key gene mutations in most common and complex human diseases, given our advanced molecular tools? Do most genes really serve as independent informational units (Pigliucci, 2010), given the fact that the function of individual genes does not divulge the emergent properties of a genetic network (Chouard, 2008)? Obviously, the time to rethink the laws of genetics and move the field forward (from Mendel’s extreme selection to the real world of genetic and environmental complexity) is long overdue. Such a challenging transition will likely generate much confusion, as it did for Mendel.
It is thus interesting to consider Mendel’s thoughts about his hypotheses following his unsuccessful experiments on hawkweed and other species, which ultimately might have called his observations in pea plants into question. This issue might also relate to the regrettable fact that he did not continue his research after his milestone publication. A common explanation is that he interrupted his research because of increased duties and issues at the monastery. Knowing his increased confusion when dealing with different species, it is not totally unreasonable to speculate that this confusion also contributed to the discontinuation of his remarkable experiments. Finally, the point of discussing the key limitation of Mendel’s laws is not simply to discredit research based on simplified systems, as initially building knowledge based on low-hanging fruit is a common practice in science. That is why both Newton’s laws and Einstein’s theory of relativity represent keystones in physics. However, there is a crucial difference between many physical/chemical laws and genetic laws. Physical/chemical laws are supported by the vast majority of experiments, with
limited exceptions, whereas genetic laws are only correct in exceptional cases. For example, Newton’s second law, F = ma (force = mass × acceleration), is supported by nearly all experimental data, except when velocity is close to the speed of light (when the special theory of relativity is needed). In contrast, as we just discussed, Mendel’s laws can only be supported by limited experimental data from very limited cases. Why is there such a huge difference between physical laws and Mendel’s laws of genetics in terms of application toward a majority of cases? This question is not only highly significant for rethinking the future of genetics but also interesting in relation to the philosophy of science. In addition to the obvious feature of heterogeneity in biology, which could complicate the predictive power of genetic laws, we need to examine what Mendel did to initially discover and then establish the laws of genetics. On the surface, Mendel followed patterns similar to those other giants of science used when searching for a scientific theory: his initial, surprising observation was that the same characteristics kept appearing with unexpected regularity when he crossed certain varieties. He thus designed systematic cross experiments with highly selected traits. His observations included the disappearance of the recessive phenotype from F1, the recovery of the recessive phenotype in F2, and a dominant-to-recessive phenotype relationship that closely matched a 3:1 ratio. He then introduced a model to explain how the independent genetic factors (both dominant and recessive) can be separated and recombined during the cross without dilution by their counterparts. By scoring the number of offspring, the genetic factors and their defined phenotypes can be illustrated simply by the numbers! His analyses thus validated his models, which led to the laws of genetics. What were the potential problems then?
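To make the idealized model described above concrete, the following is a minimal sketch of a monohybrid cross, not a reconstruction of Mendel's own analysis. It assumes exactly the simplifications the text questions: one factor with two alleles, equal segregation, complete dominance, and only two cleanly separable phenotype classes. The function names `simulate_f2` and `chi_square_3_to_1` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def simulate_f2(n_offspring, seed=0):
    """Simulate F2 phenotype counts from an idealized Aa x Aa cross.

    Assumptions (illustrative only): one factor, two alleles, equal
    segregation, complete dominance, no intermediate phenotypes.
    """
    rng = random.Random(seed)
    counts = Counter({"dominant": 0, "recessive": 0})
    for _ in range(n_offspring):
        # Each F1 parent (Aa) passes on one of its two alleles at random.
        genotype = rng.choice("Aa") + rng.choice("Aa")
        # With complete dominance, a single "A" gives the dominant phenotype.
        phenotype = "dominant" if "A" in genotype else "recessive"
        counts[phenotype] += 1
    return counts


def chi_square_3_to_1(dominant, recessive):
    """Chi-square statistic (1 degree of freedom) against a 3:1 expectation."""
    total = dominant + recessive
    expected_dominant = 0.75 * total
    expected_recessive = 0.25 * total
    return ((dominant - expected_dominant) ** 2 / expected_dominant
            + (recessive - expected_recessive) ** 2 / expected_recessive)


if __name__ == "__main__":
    counts = simulate_f2(8000, seed=42)
    statistic = chi_square_3_to_1(counts["dominant"], counts["recessive"])
    print(counts, round(statistic, 3))
```

A statistic below roughly 3.84 (the conventional 5% threshold for one degree of freedom) means the counts are compatible with 3:1. The counts commonly reported for Mendel's seed-color experiment (6,022 yellow and 2,001 green) pass this check easily. The sketch only works because every offspring is forced into one of two classes; once intermediate phenotypes must be scored, the two-class test no longer applies directly, which is the concern raised in the text that follows.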
First, the phenotypes were not correctly classified (the initial observations were not very solid). There was no clear-cut boundary between dominant and recessive phenotypes; rather, there were many “in between” phenotypes. For example, in between the green and yellow seeds, there are many nontypical greens and nontypical yellows. If a careful classification is used, the data distribution would be far from a 3:1 ratio. The same is true for seed shape, as well as other traits, which challenges the most basic assumption that phenotypes should be divided into dominant and recessive classifications. From the initial observation to the validation of the model, the data presentation was problematic. Without a solid factual basis, any “law” will inevitably fail. Second, it is possible that under specific conditions, some exceptional “strong” traits might display a pattern close to a 3:1 ratio. However, we
should not generalize these into a general law. A more realistic model should be established to explain most biological cases. It should be pointed out that, in fact, Mendel did describe some inconsistencies in other species. However, these important discussions/confusions were ignored by other educators who were keen to tell the successful and easy-to-understand story of Mendel’s law. Again, in James Peters’ introduction for Mendel’s classical paper, he wrote:

I have not included the last few pages of Mendel’s original paper, which dealt with experiments on hybrids of other species of plants, and with remarks on certain other questions of heredity. These paragraphs have little bearing on the principles Mendel proposed in this paper, and I have found from experience with my students that these pages serve primarily to confuse rather than to clarify.

That is potentially problematic. Many scientific concepts, in a clear-cut and well-designed system, are simple, precise, and even beautiful. However, when put into a broad context, or viewed through the lens of reality, they can become confusing, conflicting, and hard to understand. We do need to show the real picture of science to students. It is a crucial way to illustrate the limitations of some beautiful theories; we cannot just retell rosy stories. That is part of the reason why most researchers nowadays firmly believe they can finally identify key common genetic factors for most complex traits, no matter how difficult the task actually is. They say, “Mendel did it, why not us? It’s just a bit more complicated than his single gene traits.” Similar examples can be found in cancer research, evolutionary research, and many other fields of biology. Now, knowing the reality behind Mendel’s data, it is up to us to change our attitudes toward genetic approaches.

1.3 DIMINISHING POWER OF GENE-BASED GENOMICS

The past 30 years of genomic research has transformed biological research as well as increased the interest and expectations of the general public toward science. This has been particularly true since the sequencing of the human genome was successfully completed, an achievement that has been praised as the greatest scientific achievement of mankind, as the entire sequence represents the “book of humanity” and the “language with which God created life.” During a joint announcement of the United States and United Kingdom on June 26, 2000, surrounded by two teams of scientists, US president Bill Clinton proudly announced that
“It is now conceivable that our children and our children’s children will know the term cancer only as a constellation of stars”. According to him, “Genome science [...] will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases” (The White House Office of the Press Secretary, 2000a).

Headlines appeared all over the world following this announcement. The New York Times’ banner proclaimed, “Genetic code of human life is cracked by scientists.” Time Magazine made it their cover story. The Guardian called it, “The breakthrough that changes everything.” The Wall Street Journal opined, “This is truly big stuff,” and the Economist read, “The results are a huge step toward a proper understanding of how humans work.” Such hyperbole was not created by politicians in concert with the media but came directly from the scientific genomics community, particularly from many of the leaders who functioned as scientific advisors to politicians as well as spokespersons to the general public. All media information came from these scientists’ estimates of the impact that sequencing would have. The following are a few examples. Francis Collins, then the head of the US Genome Agency at the National Institutes of Health, said: “It is probably the most important scientific effort mankind has ever mounted, and that includes splitting the atom and going to the moon.” He predicted that genetic diagnosis for cancer would be achieved in 10 years, followed by the development of treatments in another 5 years. “Over the longer time, perhaps in another 15 or 20 years, you will see a complete transformation in therapeutic medicine” (The White House Press Release, 2000b). Roland Wolf of the Imperial Cancer Research Fund said: “The sequencing of the human genome represents one of the great achievements in human science. It really will be a landmark in the evolution of man.” Mike Stratton, head of the cancer genome project at the Sanger Center in Cambridge, stated: “Today is the day in which the scientific community hands over its gift of the human genome sequence to humanity. This is a gift that is very delicate, very fragile, very beautiful ...” The genome was picked by Science as “Breakthrough of the Year” in 2000.
According to Science, compiling maps and sequences of genetic patterns “might well be the breakthrough of the Decade, perhaps even the Century, for all its potential to alter our view of the world we live in.” Nearly two decades have elapsed since these pronouncements and exciting predictions were made. During these hopeful 19 years of hard work, with the genome sequencing information in hand, many other large-scale methods have been developed and used in genome research and clinical studies (Heng et al., 2011a). Some examples include whole genome scanning to hunt for genetic loci responsible for human diseases; global gene expression profiling to identify the pattern of
diseases needed for diagnosis and treatment; copy number variation analysis to understand the genetic mysteries that gene mutations cannot explain; classification of noncoding sequences, such as the ENCODE project (the Encyclopedia of DNA Elements) (Harrow et al., 2012); Human Epigenome projects; and other “omics” projects such as massive parallel sequencing and analysis of personal genomes. Because of these cutting-edge technologies, the future of medicine now appears much clearer to us. In the near future, the prediction is that each physician’s office will be equipped with desktop genetic diagnostic machines. A few drops of blood should offer a look into the future disease potential of each individual patient. Such documentation of personal genomic information will then be the basis for selecting target-specific drugs or even gene therapy. This ultimate goal of individualized medicine, referred to by the buzzword “precision medicine”, appears to be around the corner. Such expectations are obviously overstated, as they are based solely on the belief that individual genes and proteins have a linear causative relationship with diseases and treatment response. And so there continues to be disappointment after disappointment that contests such linear relationships, and there is growing concern (previously dismissed) about the current direction of genomics. A typical argument is that the only way to overcome these obstacles is by being positive and continuing to work hard: additional data will ultimately reveal the truth. In today’s world of positive outlooks, it is more fashionable to have a “half full” outlook rather than a “half empty” one. Clearly, incorrect approaches and irresponsible media promises have been interpreted in a positive and forward-looking light. The issue here is not whether scientific attitude is positive (half full) or negative (half empty).
Rather, we must critically evaluate and choose the correct scientific framework when there is clear evidence that the current paradigm is not working. A more rational question is why, in spite of all these technological breakthroughs, have only limited knowledge and applications been achieved? Why do scientists not understand the full picture? By just amassing more data, will they be able to figure out the correct framework? From the Ptolemaic view to the Copernican concept or from Newton’s laws to the world of Einstein, it is known that the simple accumulation of data does not generate a revolution. Unfortunately, many biologists keep busy only with data collection. There is an entrenched mindset that if it is not the gene itself, then it must be something interconnected such as regulatory elements, or copy number variation, or noncoding RNA, or something that is on the horizon and we just have not dug deep enough yet to find it. This has led to wave after wave of popular approaches within the research community where
few have questioned the gene-based framework of thought. Grouping all these isolated concerns that represent different factions of both the academic and bio-industry communities sends a message that very powerfully questions the status quo. Researchers cannot afford to continue to ignore this message.

1.3.1 The Ignored Voice of Antigenetic Determinism

A broader view of genetic determinism (that genes determine biological fate) was with us long before we obtained knowledge of how genes code proteins. The narrow view of genetic determinism (that genes determine human behavioral phenotypes), in fact, was closely associated with the eugenics movement in the late 19th and early 20th century. For example, besides the infamous actions of Nazi Germany, the sterilization of people with so-called “bad genes” was encouraged. Many states in the United States even adopted laws to reduce “unfit” populations. It is thus ironic to see that a similar but much broader idea once again comes into play with the onset of gene-centric molecular genetics, and in particular with the growing excitement of the Human Genome Project (Kevles and Hood, 1992). In the late 1980s, to lobby the US Congress into funding the Human Genome Project, James Watson declared that “We used to think our fate was in our stars. Now we know, in large measure, our fate is in our genes,” which highlighted the popular viewpoint of genetic determinism. The core tenet of genetic determinism is that genes determine how an organism works: if we know enough about what genes are and how genes “act,” we could understand all of biology (Keller, 1993); and the gene is an independent genetic information unit. No matter how complicated a given genetic process is, it can be dissected into its basic genes as its causative units. Despite the fact that there is no strong direct evidence to back it up, genetic determinism has become incredibly popular in the field of molecular genetics. According to Keller, “... it is hard to see what might be controversial in such claims. The attribution of agency, autonomy, and causal primacy to genes has become so familiar as to seem obvious, even self-evident” (Keller, 1993).
Nevertheless, there have been a handful of scholars who continuously pointed out the fundamental flaws of genetic determinism. Their arguments include the following: organisms not only inherit genes made of DNA but also a complex structure of cellular machinery made of proteins (Lewontin, 1993); there are complex dynamics between nuclear and cytoplasmic elements; epigenetic features are also inheritable; and bioprocesses are adaptive systems with emergent properties.

Unfortunately, most of these credible points failed to transform the field (beyond the increased appreciation of the importance of epigenetics). The majority of epigenetic studies still focus on how to make sense of gene-centric frameworks rather than search for new concepts outside the “box of genes”. Interestingly, people do read these antigenetic determinism ideas, and some admit (often in private) that these points make sense, but few have changed their research methods. There is thus a big gap between the logical way science should be done and the way people practice science in their daily life. In particular, the excitement and high expectations of the Human Genome Project pushed genetic determinism to a new high. For example, genetic determinists have predicted that sequencing the human genome would solve multiple theoretical and practical problems within human biology (Pigliucci, 2010). When the Human Genome Project was completed 17 years ago, some well-known scientists cautioned against overcelebrating and overestimating the meaning of these findings on the basis of genetic determinism. Nobel laureate David Baltimore wrote the following in his piece “DNA is a reality beyond metaphor”:

The drumbeats get louder as we approach the day when the first draft of the entire structure of the human genome is to be announced. Pundits appear on television shows, trying to tell the public what this means. Many are my good friends. But I must tell their dirty little secret. They are not telling the whole story. ... they tell the world that the genome is like a book, with words, sentences and chapters. ... the periodic table for biologists. But these and other metaphoric links miss the real story. The genome is like no other object that science has elucidated. No mere tool devised by humans has the complexity of representation found in the genome. (Baltimore, 2000)

After reflecting on why it is so challenging to understand sequencing information, he concluded that we should not mistake progress for a solution. According to Baltimore, it will take at least another 50 years to fully understand the meaning of this DNA. In that time, we should “try harder, but with richer and more honest” analyses of the true nature of the genome. In fact, even before the drumbeat of celebration for the genome project began, there were serious concerns about the direction in which genomics was heading as influenced by genetic reductionism. The late Richard Strohman, an emeritus professor of molecular and cell biology at UC Berkeley, wrote numerous papers offering his criticism of the highly publicized genome project. In contrast to genetic determinism, Strohman proposed that the most common and complex human diseases and behaviors (including beliefs and desires) could not be reduced to purely
genetic influences. Moreover, he believed that genetic determinism was unable to reconcile the increasing findings of enormous biological complexity and that awareness of this fact demanded a new holistic scientific theory of living systems. He further called for a biological revolution using Thomas Kuhn’s criterion. Interestingly, in his view, biological science had not been on the cusp of a Kuhnian revolution until now. Additionally, he argued that the most acclaimed Watson-Crick era had, in fact, marked a departure from organismal biology and caused a wrong turn toward gene-dominated genetic research.

The Watson-Crick era, which began as a narrowly defined and proper theory and paradigm of the gene, has mistakenly evolved into a revived and thoroughly molecular form of genetic determinism. (Strohman, 1997)

He realized that the current genetic paradigm no longer worked in light of the importance of epigenesis and bio-complexity.

This is the first time in history of the life science that a single generation has been able to live through the rise and fall of a single dominant paradigm. It is a deeply disturbing experience, especially for those who have followed the radical change from a distance, and especially given the enormous investment our culture has made in ideas tied to a hopelessly ineffective, linear causality and determinism. (Strohman, 1999; permission from Wild Duck Review)

Unfortunately, he did not witness the onset of his revolution before his death. He probably understood the reasons why this did not happen. He listed a few key challenges for Kuhn’s paradigm shift:

One is incommensurability, where the scientists on either side of the paradigmatic divide experience great difficulty in understanding the other’s point of view or reasons for adopting it. Second is the accumulation of anomalies, wherein “normal science” of the current paradigm unintentionally generates a body of observations which not only fails to support that paradigm, but also points to glaring weaknesses in its method and theoretical outlook. ... Third, paradigm shifts encounter resistance to change from the old guard, which is based not only upon a scientific incommensurability but on traditional ways of teaching and training the new generations in the (old) ways of research ... (permission from Wild Duck Review)

He then emphasized the importance of the arrival of a competent paradigm capable of replacing the old one. Clearly, key elements instrumental to arriving at a new and contending paradigm were not yet in place, as there is no new and competent paradigm in the field of genomics. Strohman had great interest in epigenesis and he had searched for a holistic theory of living systems. But he realized that the epigenesist approach provided only partial answers. He represented a fighter for a new paradigm before its arrival. But just what exactly is the new paradigm?

An interesting observation is that many scientists who vehemently criticize current reductionism-centered genomics and the “sequencing for the sake of sequencing” projects are senior scientists who are less directly involved in sequencing, and many are close to retirement. While Kuhn stated that “a scientist cannot remain a scientist and at the same time be without a paradigm,” the reason that “scientists cannot at the same time practice and renounce the paradigm under which they work” is also a political and economic one. Senior, established scientists have the luxury of being able to critically review their own careers, and they may decide that the direction they took did not work despite a lifetime of effort. They are allowed this luxury because they are no longer constrained by outside influences: they are free to speak the truth as they see it without worrying about academic politics in relation to funding, publications, or tenure. Perhaps it is more interesting to note that in the postgenome era, even leading genomic researchers who highly praise the achievements of the Human Genome Project frankly tell their audiences that many of the basic concepts of genetics they have taught students in recent years have simply been deemed incorrect. While such true statements are often used to glorify their ongoing research, they evidently reveal that there is a big elephant in the lecture hall: if the new data disprove previous concepts, where is the new paradigm in genomics? Scientific revolution constantly brings down previous paradigms no matter how solid and valued they once appeared. Practically speaking, large portions of our current knowledge will eventually be found to be somewhat wrong or inaccurate.
When we look back on our current scientific era in the future, one intriguing question might be: why did genomics scientists in the 21st century not actively search for a new paradigm, given that they (1) knew the history of science, (2) were familiar with Karl Popper’s and Thomas Kuhn’s concepts of how science works, and (3) faced daily accumulating facts that did not make sense and contradicted the current paradigm?

1.3.2 The Rise and Fall of the Gene

Genetic determinism was spawned from the gene concept and has served to further cement veneration of this concept that scientists created. The gene is the foundation of genetics, and genetics itself can be described as the history of understanding the gene. The definition of the gene has been constantly refined during the history of genetics and genomics. What started as an abstract idea acquired the physical identity of coded DNA molecules. The idea developed from a one gene–one enzyme hypothesis to a one gene–one peptide idea and progressed to increasingly complex explanations that had increasing uncertainty. With the development of
genomics, however, the concept of the gene has been seriously challenged. First, the classic notion of genes as discrete units in the genome is no longer correct, as there is separation between coding sequences and control regions, which can even reside on different chromosomes. Given the complex relationship among all genes and different splicing forms, the “unit of inheritance” is also questionable, as most genes cannot be functionally isolated within a complex genome system. The general definition that “genes are functional regions within DNA molecules, and their mission is to code the instructions to make specific proteins” is also no longer correct. Gerstein et al. have briefly reviewed the major conceptual changes in the definition of the gene over time. From the 1860s to the 1960s, the gene was accepted as a discrete unit of inheritance. Specifically, during the 1910s, the gene was considered as a distinct genetic locus; during the 1940s, as what codes an enzyme; during the 1950s, as a DNA molecule; during the 1960s, as a transcribed code; from the 1970s to the 1980s, as an open reading frame sequence pattern; from the 1990s to the 2000s, as a genomic sequence defined by annotating methods; and finally in the post-ENCODE era, as a union of genomic sequences encoding a coherent set of potentially overlapping functional products (Gerstein et al., 2007). Clearly, the genome project has paradoxically only brought increased uncertainty to the concept of the gene. The definition of a gene is clearly influenced by concepts of inheritance and technology-defined experimental findings. There has been a struggle to balance data generation and synthesis. During synthesis, the framework used and the types of data on which this framework is based are most crucial. The history of the rise and fall of the gene concept clearly reflects this.
Despite drastic changes brought on by different technologies during the past 150 years regarding genetic material, what has not changed is the notion that the genotype determines phenotype, and genes are key information units that determine genotype. Like many scientific concepts, the definition of a gene has gone through a cycle of uncertainty, certainty, and then uncertainty. Before genetic elements were linked to chromosomes, the gene concept was uncertain. Once the DNA model was established, it became highly certain. But on completion of the human genome sequencing and particularly the ENCODE project, once again it became highly uncertain, as there were many doubtful components within the current definition. For example, the post-ENCODE definition tries to comprehend the complexity revealed by sequencing analyses. In fact, long before the genome sequencing era, there were serious concerns about the gene and its definition from some well-known thinkers. For example, R. C. Lewontin suggested that the process of inheritance would be better understood by developing a “geneless” theory of
heredity (Lewontin and Lewontin, 1974). With the increased knowledge of the biology of genes, an even bigger challenge is that many generally accepted definitions of the gene do not address the key issue that “it’s rare that a gene can be determined to have caused any particular trait, characteristic or behavior” (Keller, 2002). According to Denis Noble:

Relating genotypes to phenotypes is problematic not only owing to the extreme complexity of the interactions between genes, proteins and high-level physiological functions but also because the paradigms for genetic causality in biological systems are seriously confused. (Noble, 2008)

To address this confusion, researchers must reevaluate the gene theory that considers the gene as an independent informational unit and move to a new paradigm that considers genes to be parts or tools of a coding system rather than precisely defined informational entities. Such a new paradigm holds that the genotype is not simply a collection of individual genes that define traits and that given DNA sequences do not directly determine a defined function; instead, it sees the final function as an emergent property of the genome context (Ye et al., 2007; Heng, 2009; 2015; Heng et al., 2011a). With the rapid accumulation of genomic information, the significance of the individual gene has declined. It is interesting to note that 30 years after its publication, even the author of The Selfish Gene perceives the reduced star power of the gene. In 2006, in an event celebrating “The selfish gene: thirty years on,” Richard Dawkins stated:

I can see how the title “The Selfish Gene” could be misunderstood, especially by those philosophers, not here present, who prefer to read a book by title only, omitting the rather extensive footnote which is the book itself. Alternative titles could well have been “The immortal gene,” “The Altruistic vehicle,” or indeed “The Cooperative Gene.” (Edge, 2006; with permission from Edge)

How the power of the gene has been altered in 30 years! The change from “Selfish Gene” to “Cooperative Gene” as suggested by the author himself is really a complete reversal of the basic premise of the Selfish Gene concept, a concept that was born amid the heyday of the gene. Its popularity was further enhanced in the nonscientific community by Dawkins’s book. The “Selfish Gene” was a brilliant title for capturing the attention of the public. The “Cooperative Gene” likely would not have enjoyed the same level of success. Daniel Dennett recorded his first impression on reading The Selfish Gene at the same 2006 meeting:

When I first read The Selfish Gene ... I was struck by the very first paragraph, and by one of the chief sentences in it:
We are survival machines, robot machines, blindly programmed to preserve the selfish molecules known as genes.

The author goes on to say, “This is a truth which still fills me with astonishment.” Thirty years on I think the question that can be raised is, are we still astonished by this remarkable inversion, this strange inversion of reasoning that we find in this claim?

Dennett has perhaps already found his answer in Dawkins’s statements regarding the book’s title. Interestingly, after 30 years of veritably worshiping the gene, there is more astonishment in the reality that the imagined superpower of the gene does not in fact exist. Discussion of the changes in prevailing ideas about genes is important. It signifies a growing shift in the gene-centric outlook on biology. The selfish gene concept fits well with the gene-centric view, whereas the cooperative gene concept implies that the function of an individual gene does not translate directly to phenotypes and calls for a new theory beyond genes. There is extensive discussion in the literature regarding the relationship between competing selfish genes and cooperative systems, particularly when game theory is used to address the issue (Nowak, 2006). However, this subject needs to be reexamined from evolutionary and genome perspectives (see Chapters 2 and 4). Perhaps it is now time to write a new chapter on genomics that no longer relies on the power of individual genes. It was long thought that all DNA sequences that did not code for proteins were junk DNA, but the ENCODE project has all but declared that the concept of junk DNA is no longer relevant (Pennisi, 2012). Because the idea of junk DNA was originally defined by the gene concept of coding proteins, the death of the “junk DNA” idea also challenges the concept of the gene. For example, researchers have argued that the basic unit of heredity should be the transcript (the piece of RNA decoded from DNA) and not the gene. However, even the transcript cannot be considered an independent informational unit from a genome point of view. As for the majority of traits in the real world, only the genome package (not its parts) can serve as the platform for emergent genetic information. 
Defining the function of genes and noncoding sequences using the genome “package” concept rather than as individual dissected parts is now needed, in particular within the context of micro and macroevolution (Chapters 6 and 7).

1.3.3 A Reality Check of the “Industry Gene” Concept

The birth of the biotech industry was surrounded by the excitement of various gene manipulation technologies. The first biotechnology company, Genentech Inc. (Genetic Engineering Technology), was founded in 1976 (the same year The Selfish Gene was published) by a venture capitalist and a pioneer in the field of recombinant DNA technology. Within a short

1.3 DIMINISHING POWER OF GENE-BASED GENOMICS


period, the biotech industry was booming. One incredible example is the success story of AMGen Inc (the company’s original name was Applied Molecular Genetics) in the ‘80s. AMGen cloned the erythropoietin gene and marketed its gene product as “Epogen” (EPO, a glycoprotein hormone that controls erythropoiesis with therapeutic uses in diseases such as anemia). It remains one of the most successful biotech products to date. Following the advice of bio-scientists (many of whom were company advisors), everything seemed to be going as planned. With the cloning of increasing numbers of disease genes, there were high hopes that such gene products would soon revolutionize medicine. Experts even introduced the “industry gene” concept and declared genes to be independent industry entities that could be defined, patented, owned, tracked, proven to be acceptably safe, shown to have uniform functions and useful effects, then sold, and possibly even recalled. This approach reached its ultimate level in the form of the Human Genome Project, which was expected to catalog and develop a huge treasure trove for the biotech industry. The tens of thousands of products, including large numbers of disease genes, would ensure the success of many companies. Thus, the vigorous race and battle to patent genes was on. Unexpectedly for many, the Human Genome Project and the information derived from genomic research have been a total surprise to the biotech industry. An article published in The New York Times by Denise Caruso in 2007 outlined this surprise (Caruso, 2007):

The $73.5 billion global biotech business may soon have to grapple with a discovery that calls into question the scientific principles on which it was founded.

By describing a recent unexpected scientific finding that the human genome might not function as a “tidy collection of independent genes,” this author emphasized the challenges to the current concept of how individual genes work. The current theory links each gene to a single function, such as a predisposition to specific diseases like diabetes or heart disease. The finding indicated that genes operate in a complex network with overlapping functions and complicated relationships, which calls the current theory into question: ... these findings will challenge scientists “to rethink some long-held views about what genes are and what they do.”

In addition to patent and safety issues that are critical to any company (according to a 2005 report, over 4000 human genes had been patented in the United States alone), what does the future hold for the


gene-based bio-industry if the core belief in genes is no longer valid (the concept that assumes genes operate independently has been the foundation for many biotech companies)? In fact, it is much easier for academia to face such a surprise. Many academics already had their doubts, as their results are rarely as clear-cut as those in high-profile publications, especially the award-winning experiments cited in textbooks. They often doubt themselves and consider that maybe the conditions they used were not optimal. Maybe their data would have been “better” if they had used even higher resolution methods. Maybe they were not lucky or competent enough to get the “perfect” data. In light of the complexity revealed by genomics, they may finally be relieved that they were not that unlucky after all, because the perfect gene-based world of textbook genetics and the concept of the gene have been wrong all along when used to describe the general rules (Wilkins, 2007; Heng, 2008b) (see Section 1.2.3). It is interesting to point out that the US Supreme Court unanimously ruled in 2013 that human genes are not patentable, as the act of isolating a DNA sequence does not make it sufficiently different from native DNA to be patent-eligible. In contrast, synthetic DNA, or cDNA, is patent-eligible because it does not occur naturally. This decision changed the US Patent and Trademark Office’s policy on this issue and was met with criticism from the biotech industry. In a statement, the president of the Biotechnology Industry Organization called the ruling “a troubling departure from decades of judicial and Patent and Trademark Office precedent supporting the patentability of DNA molecules that mimic naturally-occurring sequences. In addition, the Court’s decision could unnecessarily create business uncertainty for a broader range of biotechnology inventions” (Genome web, 2013). 
Ironically, in the future, the biotech industry will likely appreciate the US Supreme Court’s ruling, when the drastically decreased value of most individual genes is finally realized and accepted. There is yet another alarming phenomenon in the current basic biological research community: there is an increasing separation between reality and artificial, experimentally generated knowledge, particularly when a high degree of cherry-picking is used to create the perfect story. In private, scientists acknowledge the limitations of their experiments and are often frustrated by the uncertainty that they generate. In public lectures, publications, and grant applications, they prefer to tell a simpler, more clear-cut, and more convincing story by ignoring the uncertainties. The accumulation of cherry-picked stories has had profound negative effects on the progress of science, as the current dominant paradigm favors most of these biased results even though they are not a reflection of reality. Science appears to be making daily progress using carefully selected model systems, but there seems to be no interest in what this knowledge


is based on or whether it reflects real natural systems. After all, this is basic research. In the world of basic research, curiosity trumps reality. The bio-industry has thus provided a key reality check to our genomic knowledge. Regardless of its textbook support, a company will fail if it cannot translate the popular gene concept into reality. Companies do not have the luxury of failure that the academic world has. For a professor to admit the failure of a project, he or she can simply say, “well, science is much more complicated than I thought” and move on to a new hypothesis. For companies, it is rather a different story. Their investments are costly and their success relies on the correct concept. Before human genome sequencing was finished, there was high hope that pharmaceutical companies could use cutting-edge genomic knowledge for drug discovery. On paper, it made sense that gene-based drug discoveries would be superior to traditional drugs, and most of the academic community agreed. Many high-profile academics were recruited to head research branches at big pharmaceutical companies. This move, however, did not meet expectations: despite the drastic increase in spending on genomic research, the pipeline of new drugs has actually diminished significantly. The following is a quote from a feature article in The Economist in 2004, titled “Fixing the drug pipeline,” which discusses the challenges the pharmaceutical industry is facing (with annual sales of about $400 billion, it is one of the biggest and most lucrative industries in the world).

Drug design: The more pharmaceutical companies spend on research and development, the less they have to show for it. The “pipelines” of forthcoming drugs on which its future health depends have been drying up for some time ... the sequencing of the human genome was expected to revolutionize the process of drug discovery. It is undeniably a remarkable achievement, but looked at squarely, it represents a “parts list” of genes whose connection with disease is still obscure. ... The flood of information has caused a kind of “paralysis by novelty” ... The Economist, 2004

Obviously, the flood of genomic information did not help Big Pharma and is a huge disappointment in light of the promises made by the Human Genome Project. Once again, currently popular concepts such as systems biology, networks, proteomics, and big data science offer the hope of delivering powerful drugs (Ideker et al., 2001). Their rationale is that if individual gene identification and characterization is not that useful, then there must be some other molecular targets. Unfortunately, without a new genome-based framework, disappointing results will continue to prevail. In fact, it is not just the pharmaceutical industry facing this challenge; academic communities have their own headaches. In recent


years, the issue of low reproducibility in cancer research has shocked the research community. The effort to systematically repeat many highly cited, important experiments has revealed the sad fact that the majority of them cannot be reproduced (Mullard, 2011; Begley and Ellis, 2012)! Because the examined papers were published in top journals by laboratories with high reputations, and their conclusions are highly influential, the message is especially chilling. It obviously raises doubt about the majority of the bio-literature (if the best literature is not reliable, what about the average?). Interestingly, it was pharma/bio-companies rather than academic institutions that performed these experiments, as they wanted to make sure that the scientific discoveries were solid before applying them to drug development pipelines. Heated discussions followed. Some cited possible cell line misuse, while others criticized scientific misconduct and dishonesty. The hidden answer for the majority of cases, however, might lie in the conceptual framework of genomics and the current methods used to study it.

1.3.4 Gene-Based 1D Genomics Is Not Enough

The frustration of knowing more about genes and other genetic details but understanding less about biology (with increasing uncertainty and reduced medical implications) applies not only to the biotech and pharma industries; a growing number of examples have struck at the very core of genomics research itself. One of the priorities of current human genomics is to identify defective genes related to diseases and to establish a comprehensive catalog of all human diseases. The logic seemed very straightforward and solid: by sequencing all normal and defective genes using large patient populations, we can hunt down disease genes by simply comparing the sequences. The reality is much more challenging. A score of genetic abnormalities have been detected; however, most of them contribute to diseases with low penetration in populations, as many of them are not shared by the majority of patients. Even among patients who have the same genetic sequence profiles that influence a disease, only a portion will eventually develop the disease. At the same time, there are also high levels of diverse genetic variation among normal individuals, which makes the job of identifying disease genes extremely difficult for most common afflictions. On the one hand, in most solid cancers, sequencing has revealed so many gene mutations associated with the same cancer type that sorting out which mutations are important has become a challenge. On the other hand, it has been very difficult to identify “meaningful and/or useful” genetic loci that contribute to other common diseases such as obesity, diabetes, and an


array of neurogenetic diseases following large-scale whole genome scanning. For example, the ballyhooed success of identifying over 10,000 different genetic variants associated with schizophrenia in fact represents the biggest failure yet of genetic determinism, as each of these identified genetic variations is relatively rare and responsible for only a tiny increase in disease risk, rendering them clinically useless (Wade, 2009). The difficulty in identifying genetic elements in obvious genetic diseases has been blamed on so-called “genetic dark matter,” as a genetic link clearly exists but is hard to detect (Manolio et al., 2009). There has been extensive debate regarding genetic dark matter, or “missing heritability” (Pennisi, 2010). The so-called missing heritability of common traits refers to a continuing mystery in human genetics. A multitude of large-scale genetic studies, including genome-wide association studies, have identified many loci that are associated with common human diseases and traits, but together these results can explain only a small proportion of the “heritability” of the traits. For most traits, the majority of the heritability remains unexplained (Eichler et al., 2010). Some believe that it has not been detected because we have not been looking hard enough. Others think that targets exist beyond the gene, such as epigenetics, copy number variation, noncoding RNA, and genetic interaction. Some even argue that these issues will eventually be solved by improving the mathematical models. While there have been increasing reports in recent years that illustrate the involvement of copy number variation and DNA methylation in some traits in defined systems, these contributions are clearly limited, and the so-called phantom heritability seems real (Zuk et al., 2012). Genome-level alterations have been suggested as a key component of the missing heritability (Heng, 2010) (see Chapters 4 and 7). 
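The arithmetic behind the missing heritability argument can be made concrete with a small sketch. It assumes a purely additive model with independent loci in Hardy-Weinberg proportions, under which a locus with allele frequency p and standardized effect size beta explains 2p(1 - p)beta^2 of the trait variance. The function names, the forty loci, their effect sizes, and the heritability value of 0.8 are all hypothetical and chosen only for illustration; none of them comes from the studies cited above.

```python
# Illustrative only: the loci, effect sizes, and heritability value below are
# hypothetical numbers chosen for this sketch, not data from any study.


def variance_explained(allele_freq, effect_size):
    """Trait variance explained by one biallelic locus under a purely
    additive model with Hardy-Weinberg proportions: 2 * p * (1 - p) * beta^2.
    effect_size is expressed in units of trait standard deviations."""
    return 2.0 * allele_freq * (1.0 - allele_freq) * effect_size ** 2


def heritability_gap(loci, total_heritability):
    """Return (explained, missing, fraction_explained) for a list of
    (allele_freq, effect_size) pairs, assuming the loci act independently."""
    explained = sum(variance_explained(p, b) for p, b in loci)
    missing = total_heritability - explained
    return explained, missing, explained / total_heritability


# Forty hypothetical associated loci, each common (p = 0.3) but weak (0.05 SD).
hypothetical_loci = [(0.3, 0.05)] * 40
explained, missing, fraction = heritability_gap(
    hypothetical_loci, total_heritability=0.8
)
print(f"explained: {explained:.3f}, missing: {missing:.3f}, fraction: {fraction:.1%}")
```

In this toy setting, forty statistically detectable loci together account for only about 5% of an assumed heritability of 0.8, which is the kind of gap the debate refers to. The additive, independent-locus assumption is exactly what the genome-level view questions, so the sketch illustrates the bookkeeping rather than a model of how traits arise.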
The situation gets worse when faced with the additional challenges that come with applying hard-to-identify genetic markers in a clinical setting. For example, 101 of these genetic markers were found to be useless in predicting heart disease in a clinical setting. Application of these markers had no clinical value in forecasting disease among 19,000 women who had been monitored for 12 years, despite the fact that all 101 identified genetic variants had been statistically linked to heart disease in various genome-scanning studies. In fact, the old-fashioned way of asking about family history had better prediction success (Paynter et al., 2010). Questions have been raised regarding the new trend of sequencing everything, made possible by increasing technical capabilities. According to the Nature article “Human genome: Genomes by the thousand,” there were 2700 finished human genomes in October 2010, and 30,000 sequenced human genomes were expected by the end of 2011. The 1000 Genomes Project is an international collaboration to produce a comprehensive catalog of human genetic variation for medical research. The genomes of over 1000


unidentified individuals from around the world will be sequenced using the next generation of sequencing technologies. That venture, as well as the Cancer Genome Project and the Personal Genome Project, contributes to this trend (Mardis, 2010). A few years ago, the Personal Genome Project captured the general public’s imagination and stimulated new hopes. This approach is a way to jump-start the whole process of integrating human genomic data into clinical medicine, which failed to deliver in the first decade after the original sequencing project. The new logic is that if whole genome scanning has not worked in terms of finding the genetic dark matter, then we must sequence more samples, thousands of them: the answer must be there! Here researchers are again using the same reasoning and the same genetic determinism, only now in different clothes. The fundamental flaw of such an approach lies in the fact that most common diseases are not caused by Mendelian factors! The more samples we analyze, the more diversity we will detect because of the very nature of bio-system heterogeneity. Let us not forget that the main argument for personal genome sequencing is to provide genetic profiles for common diseases. As soon as the initial personal genome data became available (from a few celebrities in the genome field), they were at odds with the rationale for carrying on the personal genome project, as they raised more questions than answers. Some of these questions may shake the very core of genetics. James Watson had 310 gene mutations in his genome that could affect his health, including mutations in DNA repair genes linked to cancer. That he is, at the age of 89, currently without cancer illustrates the uncertainty of trying to predict diseases based on individual gene information. 
In addition to the fact that there are no defined recommendations on how to improve his health based on the sequencing data, there is also some “unwanted information.” For example, Watson requested that he not be told about the status of his own ApoE gene to reduce potential anxiety. This gene is associated with late-onset Alzheimer’s disease, which affected one of his grandmothers. If a highly informed genetic scientist like Watson would lose sleep over this type of information, imagine the impact of this information on a 20-year-old without genomics training. What about the impact on making life decisions regarding marriage, kids, medical treatment, lifestyle, and even types of jobs and financial planning? Not to mention more complicated social issues regarding privacy and discrimination. (Again, even the leading molecular geneticist James Watson expresses his bias based on his own understanding of genetics. Such gene determinism has also generated many controversies regarding social issues. Many try to brush over his controversial points as a case of “smart people saying dumb things,” but such judgments are also actually made based on personal knowledge and scientific beliefs. It was Watson’s genetic determinist view that made him voice his highly controversial version of genetics, in addition to his worry over a potential gene mutation. In fact, Watson’s support of the Human Genome Project began with his personal quest to find a treatment for his oldest son’s schizophrenia.) So how can people prevent potential discrimination generated among nonscientific populations based on DNA sequences? Clearly, the technology will be available to the general population soon, but should it be applied in this way? These are all critical questions. More important, just sequencing without really understanding all of the repercussions is not a wise course of action. This difficult situation is most apparent in the cancer genome sequencing project. Cancer genomic research has been a front runner in using whole genome sequencing to attempt to pinpoint genetic causes. This approach was taken not solely because of the original prediction by Renato Dulbecco, who stated that sequencing the human genome is the key to solving the puzzles of cancer. It was also because of the generous funding available to the field and the public’s desire to win the war on cancer. In December 2005, the US National Institutes of Health announced plans to sequence every genetic mutation involved in cancer. According to Francis Collins, then Director of the National Human Genome Research Institute, such an effort was a natural extension of the Human Genome Project. Based on the estimation of Anna Barker, who was the US National Cancer Institute (NCI) Deputy Director, there were 5 to 15 identified genetic changes for each type of cancer at that time, and there are probably 100 or more such changes involved in the formation, growth, and metastasis of each type of cancer. This unrealistic view has faced serious challenges. 
George Miklos, an Australian geneticist who anticipated the original Human Genome Project and is a widely recognized expert in genomics, stated in his article “The human cancer genome project - one more misstep in the war on cancer,” that:

No one doubts that primary tumors accumulate somatic mutations over time. However, the Achilles’ heel of cancer is not the mutational baggage train of the primary tumor, but the genomic imbalances and methylation changes of the deadly cohort of cells that metastasize in different genetic backgrounds. As a megaproject in advancing cancer research and ultimate cures, the human cancer genome project thus is fundamentally flawed. Miklos, 2005

A few more articles supported Miklos’ voice. In the article “Cancer genome sequencing: the challenges ahead,” the major problem of this project is identified as existing at the conceptual level with regard to gene-based cancer research.

A major challenge ... is solving the high level of genetic and epigenetic heterogeneity of cancer. For the majority of solid tumors, evolution patterns are stochastic and the end products are unpredictable, in contrast to the relatively predictable


stepwise patterns classically described in many hematological cancers ... These features of cancer could significantly reduce the impact of the sequencing approach, as it is only when mutated genes are the main cause of cancer that directly sequencing them is justified. Many biological factors (genetic and epigenetic variations, metabolic processes) and environmental influences can increase the probability of cancer formation, depending on the given circumstances. The common link between these factors is the stochastic genome variations that provide the driving force behind the cancer evolutionary process within multiple levels of a biological system. This analysis suggests that cancer is a disease of probability and the most-challenging issue to the project, as well as the development of general strategies for fighting cancer, lie at the conceptual level. Heng, 2007a (with permission from John Wiley & Sons)

Unfortunately, the big sequencing machine had already been set in motion, and the leadership at the NCI as well as the research community followed the rationale of sequencing them all and identifying the gene mutation patterns because, according to the gene mutation theory of cancer, gene mutations are the driving force of cancer. With whole genome sequencing methods, scientists can leave no stone unturned, and thus it is believed that the patterns must eventually emerge after sequencing thousands of samples. Eleven years later, the cancer genome sequencing project has generated enough data to be realistically evaluated. Despite a high level of excitement and news reports claiming that the cancer genome has been cracked, the results are fundamentally disappointing. In most cancers, there are many gene mutations, most of them not commonly shared among patients. It is a major challenge to distinguish the important ones. For certain types of cancer, there are some highly penetrant genes, such as the known p53 mutations, but without knowing how to evaluate the other diverse gene mutations, the field has been pushed back to square one. A much more serious challenge is that genome alterations are the general rule and not the exception in most cancers, and the meaning of the same gene mutation differs within different genome systems. The truth is we now have a big mess (see Chapter 8). The following is from a “news and view” piece in Nature published in 2010, 10 years after the Human Genome Project was finished.

... Bert Vogelstein, ... has watched first-hand as complexity dashed one of the biggest hopes of the genome era: that knowing the sequence of healthy and diseased genomes would allow researchers to find the genetic glitches that cause disease, paving the way for new treatments. Cancer, like other common diseases, is much more complicated than researchers hoped. By sequencing the genomes of cancer cells, for example, researchers now know that an individual patient’s cancer has about 50 genetic mutations, but that they differ between individuals. Check Hayden 2010

Vogelstein is a well-known leading cancer researcher. His concept of the stepwise accumulation of gene mutations causing cancer has been referred to as the Vogelgram and has had significant impact on current


cancer theory. Echoing Vogelstein’s new view, Harold Varmus, the director of the NCI, was quoted by The New York Times in 2012 as saying, “Genomics is a way to do science, not medicine.” Remember, just a few years earlier, in 2005, Varmus, a former director of the NIH, was quoted by The New York Times as believing that the cancer genome project could “completely change how we view cancer.” His prediction was right but with a very different twist. Indeed, the gene view of cancer is no longer working, as illustrated by the high levels of diverse gene mutations, in contrast to the original hope of finding a handful of key gene mutations for each cancer type. For this reason, genomics is not applicable to medicine. It is interesting to point out that without a solid paradigm, opinions can frequently swing in opposite directions, even for the same scientists. For example, in an act that surprised many, Dr. Robert Weinberg from MIT recently published a notable piece that is critical of the current molecular reductionist approaches to cancer research (Weinberg, 2014). As he is a leading scientist behind the gene mutation theory of cancer, his candid and well-thought-out opinions regarding the fundamental limitations of current molecular research should receive much attention from the research community. Equally surprising, this is not the case at all. While there are many discussions about Weinberg’s perspective among scientists who challenge the gene mutation theory of cancer (Horne et al., 2015a-c; Heng, 2015; Liu et al., 2014), no serious discussion is taking place among the majority of researchers who have followed him for decades. Ironically, as pointed out by a former trainee in private, Weinberg’s piece seems to have nothing to do with his current research. It is also noticeable that Vogelstein’s group has found renewed interest in how a few gene mutations can lead to cancer. 
It is thus understandable that it might take a while for the field of cancer research to change. An earlier reality check happened on the 10th year anniversary of the sequencing of the human genome (Check Hayden, 2010). In contrast to many promises, life is complicated. The New York Times 2010 piece states: “A decade later, genetic map yields few new cures,” which highlighted the needed debate (Wade, 2010). Responding to this piece and the increasing dissatisfaction by the public, genomic scientists reminded the general public that science needs time to deliver (they apparently forgot it was scientists themselves that made the promise of rapidly delivering results in the first place). Increasing numbers of scientists are now questioning the direction of current research despite many surprising discoveries in basic genomics. Craig Venter’s viewpoint is worth mentioning given his status within the human genome sequencing project. (He is most famous for his role in being one of the first to sequence the human genome using private funding.) In his 2010 interview with the International online edition of


Germany’s newsmagazine DER SPIEGEL, the surprising title was “we have learned nothing from the genome” (Spiegel, 2010). Excerpts:

SPIEGEL: So the significance of the genome isn’t so great after all?
Venter: Not at all ... We couldn’t even be certain from my genome what my eye color was. Isn’t that sad? ...
SPIEGEL: So the Human Genome Project has had very little medical benefits so far?
Venter: Close to zero to put it precisely ... Because we have, in truth, learned nothing from the genome other than probabilities. How does a 1% or 3% increased risk for something translate into the clinic? It is useless information.

When a reporter from DER SPIEGEL stated that there are hundreds of hereditary diseases which can be linked to individual gene mutations, Venter simply responded, “There were false expectations.” Wow, what an interview! Venter offered some very candid views about the disappointment over the lack of payoff from the Human Genome Project. Most interesting is that he had completely changed his opinion regarding the significance of genome sequencing since the early 2000s, soon after the completion of the human genome sequencing. At the time of the interview, relatively few individual genomes had been sequenced, and many experts in genomics considered his viewpoint too extreme, arguing that the high value of sequencing would certainly become visible once more genomes were sequenced. Fast forward 9 years after his interview: individual genome sequencing has become a routine approach for many patients and normal individuals. It is about time to reevaluate the value of sequencing. As CEO of HLI, Human Longevity Inc., Venter has been pushing deep sequencing of large numbers of individuals. By October 2016, this company had sequenced over 10,000 individuals. At a coverage of 30x to 40x, the whole genome sequencing effort has revealed 150 million mutations. According to the company web page, “By combining the largest collection of genomic and phenotypic data, HLI is able to use machine learning and expert analysis to transform the data into meaningful and useful insights. This turns the information into new discoveries that can inform health decisions leading to new treatment options, personal health plans, and the potential for longer, healthier human lifespans.” HLI also established the Health Nucleus program to profile individual customers at a price tag of $25,000 per person.

The Health Nucleus platform uses whole genome sequence analysis, advanced clinical imaging and innovative machine learning, combined with a comprehensive curation of personal health history, to deliver the most complete picture of individual health. http://www.humanlongevity.com/about/overview

All these activities from HLI represent only the tip of the iceberg of the global sequencing movement. The level of academic and commercial interest in sequencing is overwhelming. Some cancer centers now offer whole genome sequencing to all willing patients, promising that their gene mutation profile will certainly help medical treatment decisions. Judging by how popular sequencing has become, one logical conclusion would be that experts like Venter must have become far more optimistic about how sequencing information can change medicine, and that, with current sequencing technology, we must now understand much more about the relationship between DNA sequence and disease. Surprisingly, despite all these accumulated data, Venter still holds the same viewpoint as he did 7 years ago (without increased confidence that DNA information can predict human diseases). Last year, when discussing this question at an exciting talk in Beijing University’s medical college, he said that we currently understand only about 1% (of the relationship between genes and diseases), even though the value of DNA sequencing should have become more and more obvious with the further development of computational capability. He is not alone. Early in 2011, the journal Science published a piece titled “Deflating the genomic bubble,” borrowing the popular term used to describe the meltdown of the economy during the greatest recession in decades (Evans et al., 2011):

... Although it may be hard to overestimate the significance of that achievement, it is easy to misconstrue its meaning and promise. People argue about whether mapping the human genome was worth the investment. With global funding for genomics approaching $3 billion/year, some wonder what became of all the genomic medicine we were promised. ... Recent methodological progress in genomics has been breathtaking. ... But claims of near-term applications are too often unrealistic and ultimately counterproductive. From the South Sea and dot-com “bubbles” to the ongoing housing market crisis, the world has seen its share of inflated expectations and attendant dangers. Science is immune to neither.

Little by little, increasing doubts and contradictions are emerging to challenge genetic determinism. The direct implication, that current gene-based genetic information translates into medicine only to a limited extent, is indeed the reality. In a perspective piece published in Nature Reviews Genetics in 2010, titled “Viewpoint: Missing heritability and strategies for finding the underlying causes of complex disease,” Jason Moore, a leading expert, wrote:

... Such biomolecular interactions that depend on multiple genetic variations can substantially complicate the relationship between genotype and phenotype, making it impossible to explain phenotypic variation simply by adding together independent genetic effects. This hypothesis is completely consistent with the current results of GWA studies. ... High-throughput technology alone will not solve this problem. The time is now to philosophically and analytically retool for a complex genetic architecture or we will continue to underdeliver on the promises of human genetics. Indeed, life, and thus genetics, is complicated and some will soon ask, as seismologists have, whether we are trying to predict the unpredictable. (Eichler et al., 2010)
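
Moore’s point that interacting variants cannot be captured “simply by adding together independent genetic effects” can be illustrated with a toy calculation. The sketch below is an illustrative assumption and is not taken from Eichler et al.: it uses two hypothetical loci, each coded 0 or 1 and equally frequent, and a trait that appears only when exactly one of the two variants is present (a purely interactive, XOR-like pattern). An additive model fitted to this trait explains none of its variance, whereas the same model fully explains a trait built from independent effects.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def additive_r2(locus_a, locus_b, trait):
    """Fraction of trait variance explained by trait ~ c + a*A + b*B.

    Assumes the two loci are uncorrelated (true for the balanced toy
    population below), so each slope is cov(locus, trait) / var(locus).
    """
    slope_a = covariance(locus_a, trait) / covariance(locus_a, locus_a)
    slope_b = covariance(locus_b, trait) / covariance(locus_b, locus_b)
    intercept = mean(trait) - slope_a * mean(locus_a) - slope_b * mean(locus_b)
    predicted = [intercept + slope_a * a + slope_b * b
                 for a, b in zip(locus_a, locus_b)]
    residual_var = mean([(t - p) ** 2 for t, p in zip(trait, predicted)])
    return 1 - residual_var / covariance(trait, trait)


# Hypothetical balanced population: every two-locus genotype equally common.
genotypes = list(product([0, 1], repeat=2))
locus_a = [a for a, _ in genotypes]
locus_b = [b for _, b in genotypes]

additive_trait = [a + b for a, b in genotypes]   # independent effects
epistatic_trait = [a ^ b for a, b in genotypes]  # pure interaction (XOR)

r2_additive = additive_r2(locus_a, locus_b, additive_trait)
r2_epistatic = additive_r2(locus_a, locus_b, epistatic_trait)
print(f"additive trait:  R^2 = {r2_additive:.2f}")
print(f"epistatic trait: R^2 = {r2_epistatic:.2f}")
```

Real genetic architectures lie between these two extremes; the example only shows that a variant can matter a great deal while its marginal, additive effect is zero.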

Even some highly optimistic leaders in the field have modified their position or unintentionally pointed out key paradoxes within their own reasoning. Eric Lander, a major player in the Human Genome Project, summarized the achievements of the genome project in the decade since its publication in his Nature review “Initial impact of the sequencing of the human genome” (Lander, 2011). In addition to his customary praise of accelerated biomedical research, there were some surprising analyses worth quoting:

It is important to distinguish between two distinct goals. The primary goal of human genetics is to transform the treatment of common disease through an understanding of the underlying molecular pathways. Knowledge of these pathways can lead to therapies with broad utility, often applicable to patients regardless of their genotype ... Some seek a secondary goal: to provide patients with personalized risk prediction. Although partial risk prediction will be feasible and medically useful in some cases, there are likely to be fundamental limits on precise prediction due to the complex architecture of common traits, including common variants of tiny effect, rare variants that cannot be fully enumerated and complex epistatic interactions, as well as many non-genetic factors.

Lander admits that, given the complexity and uncertainty of genetic information, it is unlikely to deliver the expected personal risk prediction for most common diseases. This sharply contradicts the goal of the personal genome project! If risk prediction has limited value, why raise expectations of sequencing personal genomes as a basis for medical prediction? Unfortunately, because of the same complexity and uncertainty, the same analysis also applies to the primary goal of transforming the treatment of common diseases through an understanding of the underlying molecular pathways. Based on the cancer genome sequencing data, the answers are already known. In Lander’s own words, “Knowledge of these pathways can lead to therapies with broad utility, often applicable to patients regardless of their genotype.” If this is true, why bother to search for genotypes? Clearly, there are some profound paradoxes amid the current popular genomic reasoning. A few years ago, the journal Science also published the editorial “Genomics is not enough,” which recommends that we move to other fields to develop clinically useful applications:

... Translating current knowledge into medical practice is an important goal for the public who support medical research, and for the scientists and clinicians who articulate the critical research needs of our time. However, despite innumerable successful gene discoveries through genomics, a major impediment is our lack of knowledge of how these genes affect the fundamental biological mechanisms that are dysregulated in disease. If genomic medicine is to prosper, we need to turn our attention to this gaping hole. ... The lessons from genome biology are quite clear. Genes and their products almost never act alone, but in networks with other genes and proteins and in context of the environment. ... (Chakravarti, 2011)

It should be further argued that it is not simply that genomics is not enough, but rather that gene-based 1D genomics is not enough. We cannot merely move away from genomics just yet, as we have not yet searched for and established the correct framework of the new genomics field. Moreover, to understand evolution (both organismal and cellular), genomics, which represents one of three key elements for evolution, should not be ignored. It is clear that a genome-based genomic revolution will arise as soon as the proper framework is established.

1.4 NEW GENOMIC SCIENCE ON THE HORIZON

In 2013, to the surprise of many, DNA pioneer James Watson stated that he had changed his position on the cancer genome sequencing project: “While I initially supported the Cancer Genome Atlas project getting big money, I no longer do so. Further 100 million dollar annual injections so spent are not likely to produce the truly breakthrough drugs that we now so desperately need” (Watson, 2013). Given Watson’s passion and his key early role in the human genome sequencing project (he was the first director of the US Human Genome Project), this change is not a trivial one. He clearly realizes that the current cancer genome sequencing approach is ineffective. However, he then proposed a “new” strategy of screening and targeting cancer genes based on various pathways. Unfortunately, while he criticized the continuation of large-scale DNA sequencing, his newly favored approach rests on the same gene-centric framework and will likely be just as ineffective. In fact, the rationale for sequencing the cancer genome is to provide exactly the list of genes for molecular targeting that Watson wishes to obtain.

This case forcefully illustrates the ultimate importance of rethinking the genetic and genomic concepts based on the current cancer theory. There is no use in simply modifying our strategies without a solid conceptual basis. Fundamentally, the current limitations of disease research are rooted in genomic and evolutionary theories.

1.4.1 Time to Rethink Genetics and Genomics

Increased genomic data have suggested a common message: a large number of new genomic facts do not fit the traditional concepts of genetics/genomics. Surprisingly, however, the majority of publications have so far avoided spelling out this obvious conclusion. Various factions of the research community have discussed this issue, and some have called for new genetic concepts because new facts demand new conceptual frameworks. For example, following decades of genomics studies coupled with the popularity of systems biology, the field of genetics seems to have gradually accepted the general viewpoint that there are major limitations in the gene concept, specifically concerning how to define the gene and how to understand the relationship between individual genes and their phenotypes. In his recent book, Making Sense of Genes, Kostas Kampourakis stated that:

More recent research has shown that it is impossible to structurally individuate genes, and that the best way we can do is to identify them on the basis of their functional products. In many cases, single genes cannot explain the variation observed for simple monogenic ones. ... genes “operate” in the context of developmental process only. This means that genes are implicated in development of characters but do not determine them. (Kampourakis, 2017)

With that being said, most researchers admit only the limitation of using a single gene to explain a complex phenotype. They insist that the quantitative deterministic power of genes is factual, so that the polygenic model will ultimately reveal the genotype-phenotype relationship for complex phenotypes when enough samples are analyzed (all the while believing in the one-to-one correlation for characteristics defined by single genes). That is the main reason why GWAS have become very popular in recent years: the research community firmly believed that common diseases are caused by common genetic alterations. Based on the nature of genomic and environmental heterogeneity, we have disagreed with this idea for years (Heng et al., 2006a; Heng, 2007a). Seven years ago, the limitations of GWAS were openly discussed in the leading journal Cell. Based on the high levels of genetic heterogeneity in human diseases, McClellan and King analyzed the genetic reasons why the strategy of identifying a fixed set of commonly shared genetic loci is challenging, supporting the view that the idea of common diseases being caused by common loci is incorrect for many common and complex diseases. They thus favored next-generation sequencing technologies over GWAS to find rare disease-causing mutations and the genes that harbor them (McClellan and King, 2010). As expected, this piece generated heated debate. The GWAS community firmly claimed success and promised to improve its strategies in order to deliver. In 2017, the journal Cell again published a notable piece entitled “An expanded view of complex traits: from polygenic to omnigenic.” In this perspective, Pritchard’s group reanalyzed the recent GWAS on human height with over 205,000 individuals. The original report, published in 2014, identified 700 variants that affect human height, which collectively explain just 16% of the variation in height in people of European ancestry. Compared with the general estimate that about 80% of all human height variation should be explained by genetic factors, the missing fraction seemed too large given the number of individuals investigated, which triggered their curiosity and led to their reanalysis. This reanalysis led to the realization that more than 100,000 variants affect human height, although most of these variants contribute only one-seventh of a millimeter. Because of their tiny contribution, these variants are often considered statistical noise and are ignored. Furthermore, these variants are distributed across the entire genome, making them less useful for pattern identification (such as in GWAS) (Boyle et al., 2017). The authors further compared large genetic studies of rheumatoid arthritis, schizophrenia, and Crohn’s disease (the success stories). Although some of the identified variants fit the mechanisms of the disease in question, the majority of genetic variants are not related to a given disease based on current knowledge. Some are detected in many different diseases (and many are linked to normal and basic functions as well). They concluded that

... there is an extremely large number of causal variants with tiny effect sizes on height and, moreover, that these are spread very widely across the genome, ... implying that a substantial fraction of all genes contribute to variation in disease risk. These observations seem inconsistent with the expectation that complex trait variants are primarily in specific biologically relevant genes and pathways
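
The size of the gap in the height example follows from simple arithmetic on the figures quoted in the text (about 700 variants explaining 16% of height variation, against an estimated 80% attributable to genetic factors). The short sketch below only restates those numbers; the function name `heritability_gap` is introduced here for illustration and does not come from the cited studies.

```python
def heritability_gap(heritable_fraction, explained_fraction):
    """Return (missing, share_found).

    missing     -- part of the trait variation expected to be genetic but
                   not accounted for by the identified variants
    share_found -- identified variants' share of the expected genetic part
    """
    if not 0 < heritable_fraction <= 1:
        raise ValueError("heritable_fraction must be in (0, 1]")
    if not 0 <= explained_fraction <= heritable_fraction:
        raise ValueError("explained_fraction must be in [0, heritable_fraction]")
    missing = heritable_fraction - explained_fraction
    share_found = explained_fraction / heritable_fraction
    return missing, share_found


# Figures as quoted in the text for human height.
missing, share_found = heritability_gap(heritable_fraction=0.80,
                                        explained_fraction=0.16)
print(f"unexplained: {missing:.0%} of height variation")
print(f"identified variants cover {share_found:.0%} of the expected genetic part")
```

Under these figures, roughly four-fifths of the expected genetic contribution to height remained unaccounted for by the 700 identified variants.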

After a series of data analyses, they realized that even though a central goal of genetics is to understand the links between genetic variation and disease, and disease-causing variants are expected to cluster into key pathways that drive disease etiology, such common variants are hard to identify. In particular, for complex traits, association signals can spread across the entire genome, including near many genes that have no obvious connection to a given disease or phenotype (for example, most 100-kb windows can contribute to variance in human height).

... We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an “omnigenic” model (Boyle et al., 2017)

Clearly, a polygenic model is no longer sufficient. In other words, a quantitative relationship between a group of common genes and a given disease is hard to establish. While focusing on a gene network seems better than focusing on sets of individual genes or genetic loci, this hypothesis is rather vague at its current stage. For example, it does not explain why there exist many non-disease-related genetic loci, many of them shared among different diseases as well as with healthy individuals, or how somatic evolution impacts the gene-environment interaction, which often reduces predictability based on the genetic potential coded by the germline. In particular, it is certainly challenging to apply this concept in clinical practice. Moreover, if (almost) every gene affects (almost) everything, how does genetics actually work? According to an interview with one of the authors, it is evident that they clearly understand the challenge. “It is a really hard problem.” “Historically, even understanding the role of one gene in one disease has been considered a major success. Now we have to somehow understand how combinations of seemingly hundreds or thousands of genes work together in very complicated ways. It’s beyond our current ability.” (Yong, 2017)

Nevertheless, the authors think that nature is telling us something profound about how our cells and genes work. As such, they are trying to shift the focus from common genes to complex gene networks. Others are willing to move further by considering factors other than genes. One main faction favors the worldview of epigenetics (see Müller, 2007; Jablonka, 2012; 2013; Noble, 2013; Omholt, 2013; Strohman, 1997). This viewpoint has become increasingly popular following the failure to identify common gene mutations in many diseases, including cancer. Our group, in contrast, pushes for the genome theory, in which “chromosome coding or karyotype coding” defines the multiple levels of genetic/epigenetic and environmental interaction of cells, with the phenotypes representing emergent properties of such interactions within the context of somatic cell evolution (Heng et al., 2006a-c; Heng, 2007a, 2009, 2015). Together, the critical voices against the current concepts are getting louder. For example, in their 2014 article “Chasing Mendel: five questions for personalized medicine,” Joyner and Prendergast state:

We close this essay by postulating that there has been a pervasive influence of the gene centrism inherent in the Modern Synthesis in conjunction with the Central Dogma of Molecular Biology on biomedical thinking. We believe this influence has now become counterproductive. Thus, it is critical for new ideas stemming from evolutionary biology highlighted in this special issue of The Journal of Physiology and elsewhere to more fully inform biomedical thinking about the complex relationship between DNA and phenotype. The time has come to stop chasing Mendel. (Joyner and Prendergast, 2014)

Indeed, now is the time to search for a new genomic concept, the effective way to stop chasing Mendel. This new concept needs to realize the difference between gene and genome, as well as integrate environmental interaction within evolutionary processes. Specifically, individual genes and environmental factors should function as lower level agents, while the phenotypes should represent emergent properties of the genome as a whole. More and more genomic researchers explain away the challenges that current genomic research is facing by saying that “Mendelian simplicity belied true complexity.” But we should realize that this is not the complete reason: Mendel ignored the true nature of fuzzy inheritance when he incorrectly classified his data and proposed his theory in the first place.

1.4.2 Crisis Created New Opportunities for Future Genetics/Genomics

According to Thomas Kuhn, science is not a stepwise and cumulative acquisition of knowledge. The pattern of scientific progress can be described as a series of peaceful intervals of normal science punctuated by intellectually violent revolutions. While the main goal of the normal science phase is to bring the accepted theory and fact into closer agreement, the main goal of the revolution phase is to shatter the no-longer-sufficient framework and establish a new paradigm. It is thus necessary to ask whether or not a paradigm shift is currently underway in genetics/genomics. One of the key preconditions and signals of a paradigm shift is the transition of a given scientific field from a stage of routine progression to a stage of crisis. In a real crisis stage, the dominant paradigm loses its capability to explain fundamental facts (most of which are newly discovered), even though many superficial technical achievements are being made and a large amount of data is being collected. In other words, the more data that are collected, the more confusion there is and the less we can comprehend, as the new discoveries contradict the expectations of the current paradigm. Such mounting anomalies reflect a poor fit between the existing theory and reality.

These crises can be resolved in the following ways: (1) normal science advances and finally solves the crisis-provoking problems through new technologies or new realizations, which brings the field back to “normal”; (2) the crisis continues because no key solution can be found, and the challenging problems are often passed to future generations; and (3) a competing paradigm emerges, battling the old paradigm for acceptance. Only the success of the new paradigm represents a paradigm shift. Such a process is very rare and can sometimes last a long time. “Successive transition from one paradigm to another via revolution is the usual developmental pattern of mature science.” (Kuhn, 1962).

Being familiar with these definitions and characteristics is essential for judging our current status in scientific progression. Equally important, however, we must not confuse or even abuse key terms such as “paradigm shift,” “crisis,” and “revolution” in a scientific context. Unfortunately, “paradigm shift” has become an overused term in routine communication among scientists, as reflected by various research papers and seminar titles. Most scientific discoveries or technical improvements are referred to as “paradigm shifts” when they clearly belong to normal science progression and are not actual paradigm shifts. In fact, in Kuhn’s view, a paradigm is not just the basic theory of the field but the entire worldview in which it exists; such a worldview also shapes the specific landscape of an individual’s knowledge, rationale, and way of thinking. So to have a true paradigm shift, one must change the framework of a field as well as its related worldview, not just make a simple discovery based on the already existing paradigm. Similarly, one should not confuse crisis with the “technical difficulties” of the normal science phase, which most scientists face in routine work.

Although ample evidence strongly suggests that genetics is now in the crisis stage (new data actually change the very foundation of its theory), the majority of researchers are clearly not aware of this situation and will likely continue to practice science as usual, because the paradigm they believe in prevents them from realizing its own fundamental limitations. For example, one of the most popular viewpoints is that the recently discovered genomic heterogeneity in most common and complex diseases can be resolved by accumulating more data and eliminating more “noise,” when the results state otherwise: the more samples used, the more heterogeneity is detected. The issue here is not about the sample size, but about the need for a better framework which can be used to correctly explain the data and understand the
mechanism of bio-heterogeneity. Only by changing the current way of thinking will we accept that (1) heterogeneity is the key functional feature of bio-systems rather than useless “noise” and (2) the majority of common and complex genetic traits often do not share common genetic loci, meaning that the strategy of sequencing more and more samples to identify a clinically useful pattern is flawed. When rational decisions cannot be made in a given field, increased efforts that are driven by emotion and political interests rather than scientific logic will not only waste valuable resources but also worsen the crisis. Nevertheless, no matter which direction genetics/genomics moves toward, the crisis will certainly offer great opportunities for those who open their minds and actively search for new paradigms. Even researchers who still firmly believe in the current paradigm might be able to advance by recognizing the theoretical limitations of their scientific practice, and may perhaps even start to wonder whether they too should change their views in the face of constant surprises and confusion. The following opportunities are on the horizon:

a. Potential new paradigms

When studying the history of bioscience, specifically when analyzing famous milestone experiments and the individual scientists behind them, most genetics students have mixed feelings. On one hand, the century-long accumulation of experimental data and theories seems very impressive, which inspires them. On the other hand, the great success of previous scientists overwhelms and discourages newcomers, as if they were born too late and have missed the only opportunity to become genetic pioneers. Facing these overwhelming achievements in genetics, what pathway should newcomers take? And how can they make contributions of equal historical importance? A motivating fact is that there is currently a unique opportunity to rethink and redefine genetics/genomics and evolution, an essential step in searching for and establishing the new paradigm. Knowing the long history of genetic and evolutionary studies, readers should not take this historical opportunity lightly, as it represents a once-in-a-lifetime chance for all scientists to make their breakthrough. Most scientists are not lucky enough to witness and participate in a paradigm shift, as the normal phase of science is often much longer than the crisis stage. Within the crisis stage, even ordinary researchers suddenly have the same opportunities as the most brilliant researchers to be extraordinary by contributing to the new paradigm, which will certainly rewrite the history of bioscience. For example, there are some crucial steps that lie ahead, which also represent exciting opportunities to advance the field of genomics.

First, it must be demonstrated that the current gene-based genetic paradigm is fundamentally limited and that merely continuing to push for more data accumulation will not provide satisfactory outcomes. This is the hardest step toward establishing a new paradigm. It requires us to be honest with ourselves, and it also requires clarity to see through the massive conflicting data, courage to challenge the status quo, personal sacrifice, support for new ideas and opportunities, and finally passion as well as patience to seek the truth.

Second, the new genomic paradigm needs to move from ultrareductionism to holistic genomics. The realization that decoding genes and decoding the genome are fundamentally different will certainly help such a transition (Chapters 2 and 4). It is essential that researchers acknowledge that the various levels of genetic organization follow distinct biological laws, and that the knowledge gaps between these levels and laws cannot simply be bridged by data accumulation. For example, understanding how the genome works is very different from simply accumulating gene data. In particular, there is an urgent need to complete the transition from studies of 1D genes to the analysis of the 3D genome and then to 4D genomics, which includes time within the evolutionary concept, to achieve the new paradigm. System inheritance is ensured by 4D genomics, which is fundamentally different from the inheritance of the parts as reflected by the function of genes (Chapter 7).

Third, the call to search for the new genome-based paradigm should not be viewed simply as anti-gene. It is meant to provide the evolutionary context of genomic research, including gene-genome integration. It is obvious that gene research has generated a great deal of knowledge about the parts of the system, including the mechanistic functions of cells and organisms at the molecular level. However, when so many genes and environmental factors can be linked to the same cellular functions, the molecular understanding of each mechanism becomes limited. Paradoxically, the success of gene-based research also challenges the rationale of understanding the genome through the accumulation of knowledge at the gene level, as emergent properties have little to do with individual parts in the face of high levels of heterogeneity and complexity. This situation has been increasingly appreciated by genomic researchers when considering the genetic basis of human diseases. For example, in cancer research, a great deal is known about the individual molecular mechanisms of many cancer gene mutations. However, clinical predictions based on these mutations are extremely unreliable (Chapters 3 and 4). Clearly, a new paradigm that provides clinical relevance is urgently needed.

b. New scientific expectations

Traditional genetic analyses have favored strategies that search for patterns or bio-certainty. Examples can be traced back to Mendel, as discussed in an earlier section of this chapter. Assuming that the genetic factor functions as an independent unit provides the basis for tracing the pattern of how bio-systems pass genetic information between generations (such as 3:1 segregation). This search for a molecular pattern in the name of understanding a mechanism became the most popular strategy following the arrival of molecular genetics. Because of the availability of various in vitro and in vivo model systems, as well as arrays of biotechnologies and molecular agents, many beautiful experimental systems can be designed and used to search for molecular patterns. Coupled with biostatistical methods, different patterns can be identified by disregarding bio-heterogeneity or “noise.” The success of bio-research using well-designed experimental systems has often been considered the art of bioscience, which has led to many milestone experiments and brought fame to many scientists. In the field of evolutionary research, in contrast, it is much more challenging to design clear-cut linear models, resulting in much less certainty. Remember the quote from Theodosius Dobzhansky: “Nothing in biology makes sense except in the light of evolution”? While this statement points out the ultimate importance of evolution in biology, it also portrays the frustration of confronting bio-uncertainty. Scientists cannot confidently make predictions in biology based simply on “laws.” That is the reason why it is so hard to achieve certainty in biology (to make sense of many things according to the rules). Unfortunately, few have realized this hidden message. For most bioscientists, uncertainty is a weak and even bad word. With determination and data accumulation, they say, science (with certainty) will prevail. It is thus necessary to mention Karl Popper’s viewpoint on this issue. He sharply distinguished truth from certainty and held that the search for truth is not the search for certainty. According to Popper, “All human knowledge is fallible and therefore uncertain.” (Popper, 1996). Surely, large-scale genomics and the various -omics have revealed this overwhelming uncertainty. The continuation of “sequencing everything” with large numbers of heterogeneous samples will only increase the uncertainty. Bio-researchers must realize the limitations of certainty and appreciate uncertainty as reality. To do so, researchers need to reconsider their treatment of genetic heterogeneity as noise, which calls for the establishment of new technical platforms. Similarly, the expectation of simplified model
systems needs to be drastically modified, as any model will reveal only limited information and drastic simplification will reduce the value of their applications or clinical relevance. It often takes time to apply theories to practical matters. However, the future challenge in genomics goes beyond this tradition, as the gap between the parts (genes) and the system (genome) will most likely not be filled by an accumulation of more knowledge of the parts but will require new principles/knowledge of the genome. The key difference between genes and genomes and the way each is studied is intimately involved in this discrepancy (Chapters 2e5); yes, the knowledge of individual genes can be obtained by analyzing many defined experimental systems, but the task of assembling a functional genome represents different types of research. Yes, many gene mutations can be linked to cancer using various in vitro and animal models and these genes may be detectable in a portion of tumor samples, but these genes are not commonly detectable in patients and possess minimal prediction value in the clinic for most cancer types (Chapter 4); yes, a large number of pathways have been characterized as potential targets for cancer therapy, but as soon as effective drugs are used, pathway switching reduces the effectiveness of these drugs that actually creates a moving target; yes, in a defined “pure” experimental system, a specific gene function can be understood within the fixed context of that system, but increasing research in different systems (such as a heterogeneous cell population with an altered genome) indicates that the function of the same gene becomes less certain, and the dynamic interaction among all genes alters its function (the function of a given gene is genome context determined); yes, the importance of key development genes can be illustrated beautifully during the developmental process, but it is challenging to apply how genes work in disease conditions where genome 
alteration-mediated stochasticity dominates (Chapters 7, 8). Yes, one can study the involvement of individual genes in various evolutionary stages, as most related species have similar key genes, but studies of these genes often do not reveal the mechanism of macroevolution, as the major force of evolution is the reorganization of the genome using similar gene sets rather than mainly an accumulation of novel genes (Chapter 6).

c. New approaches

New approaches should meet the new expectations and play an important role in illustrating, testing, and even falsifying a new paradigm. For example, rather than pushing the development of methods to collect more data at the single cell level (yet another current hot topic), new approaches should focus more on the

1.4 NEW GENOMIC SCIENCE ON THE HORIZON


integration of single cell profiles within the population dynamics and, in particular, on how different single cells display heterogeneity-mediated emergent relationships with phenotypes under normal physiological and pathological conditions. The new approaches need to cover the following aspects:

1. We need a truly holistic and system-oriented approach. While further molecular characterization of agents (genes, regulatory elements, pathways, and different cellular parts) will continue, more effort will focus on system behavior, and in particular, on the mechanism and dynamics of the emergent properties of the system. For systems biology, focusing only on the characterization of drastically increased numbers of parts and their distribution patterns is not a true system approach. In contrast, studying how “the topological relationship among genes” (the physical genetic interaction platform) defines the dynamics of network interaction and system behavior at higher levels (above agents) might be a key (Heng, 2013c; Heng et al., 2019).

2. Most platforms need to integrate the two key components of biological processes: genomics and evolution. Understanding why different cellular parts function in such complex ways requires an evolutionary understanding beyond molecular interactions. In this sense, evolutionary mechanisms can unify diverse molecular mechanisms. In disease studies, one promising approach is to watch evolution in action. When too many moving parts are involved, a better way is to analyze the final phenotype in the context of evolution.

3. We need to quantitatively compare the contribution of each type of genetic/epigenetic alteration in a given evolutionary process of a particular trait. Currently, different scientific groups often push the importance of their favorite types of variants, and there is a lack of studies illustrating which level is most important for which problem. Quantitative estimation of the contribution of the various levels is therefore important. This can be achieved by systematically comparing the gene, chromosome, and epigene in terms of their contribution to a given phenotype within the same experimental system. With such knowledge, further integration of the correct levels of genetic/epigenetic organization becomes possible. For example, in the punctuated macrocellular phase, genome-level reorganization dominates (and is much more significant than gene mutations), whereas in the stepwise phase, gene mutation and epigenetic dysfunction play an obvious role.


d. New methods

Comparing the different levels of genetic/epigenetic contributions will lead us to adjust our rationale of searching for the molecular methods with the highest resolution. It will allow us to realize that many methods that focus on the cellular level (with relatively lower molecular resolution) may be more appropriate for somatic evolutionary studies, where the unit of selection is a cell rather than a specific gene or molecular pathway. As we previously illustrated, pushing for higher molecular resolution will often lower the biological significance (chapter; debating Cancer). For example, monitoring karyotype changes will likely provide more predictive clinical information than DNA sequencing (Chapter 4). A similar argument can be made for studying the relationship between diseases and stress response pathways: despite the complex interactions among various stress response pathways, the most important outcome is the phenotype at the cellular level rather than at the level of a specific pathway (as so many pathways can contribute to a similar cellular phenotype). The evolutionary mechanism of cancer is the main rationale we have been pushing to unify the highly diverse molecular pathways (Heng et al., 2011b; Ye et al., 2009; Horne et al., 2015a-c; Heng, 2015, 2017a). Accordingly, more methods are needed to monitor the levels above genes or molecular pathways, such as the cellular, tissue, and individual patient levels. Karyotype analysis and its 3D representation/emergence, for instance, represent a new frontier.

e. Big data, new challenges

Because of its ultimate importance in current genomics, the issue of big data generation and analysis needs to be separated from the above “new methods” category. In a departure from the traditional genetic era, recent large-scale genomic data require sophisticated computational tools to handle the overwhelming amount and variety of data.
This challenges the ability of researchers to present their data in the most meaningful way. For example, what type and which portion of the large amount of data is best to report when the diverse pieces of information often conflict with each other and do not make sense according to experimental assumptions? Accordingly, various bioinformatics platforms and mathematical models have been introduced. Now, there are high expectations for big data, as many excited biologists believe that big data will finally deliver, especially in overcoming the issue of high bio-heterogeneity and increasingly observed bio-nonspecificity. Despite the advances in today’s machine learning and the application of artificial intelligence to bio-data analyses, effective computational and bioinformatics platforms cannot replace


the theoretical framework or change the biological facts. Their success or failure depends entirely on correct biological concepts and data collection. Biologists should not simply depend on bioinformatics to reveal biological truths, as the validity of computational models does not equate to the reality of biological systems. A few key issues need the close attention of bioinformatics researchers.

1. For many models, the key assumptions and the projected goal do not have a solid biological basis. For example, when profiling a large number of cells, it is often assumed that cells of the same type share the same dominant source of variation. Similarly, when separating different clusters, it is assumed that each cluster is defined by a different small set of genes. Increased biological understanding has now challenged these assumptions.

2. More generally, most models aim to filter out bio-noise or heterogeneity, for which various statistical tools have been developed. Unfortunately, these efforts represent the wrong approach, as heterogeneity is the key feature of biological systems, particularly for the evolution of disease conditions.

3. Accordingly, simply increasing the sample size will neither solve the challenges of pattern recognition nor benefit individual patients. More heterogeneous data will not solve the issue of heterogeneity. In fact, it will worsen it, because the number of variants in the data set will increase as well. As long as variants are continuously being added to the database, no magical breakthrough can be expected to solve the current crisis of bioinformatics. Cancer genome sequencing represents such an example.

4. There is a gap between the understanding of a population’s profile and the prediction for an individual. For example, listing most of the cancer gene mutations detected in a large number of breast cancer patients is one thing, but predicting the likelihood of a given individual acquiring breast cancer based on a few gene mutations is another.

5. “Chance” also contributes to biological processes. This is even more difficult to predict. Because of the large number of elements involved, it is extremely challenging to predict the perfect storm.

6. Finally, it is necessary to remind bioinformatics researchers of the challenge of solving the three-body problem in science. In fact, most bio-systems display much more complexity than the classical three-body problem. Clearly, many bio-problems cannot be reduced to two-body problems. The complex interactions among large numbers of heterogeneous agents, and the


emergence of multiple system levels, will likely lead to nearly unlimited evolutionary potential. Altogether, we need to acknowledge the limitations of the big data movement and put more effort into searching for a better biological framework. Bioinformatics will certainly speed up biological discoveries, as long as the data collection is at the correct level and the data presentation is based on solid bio-facts. Furthermore, genomics needs to integrate with other fields rather than simply relying on physics and mathematics. It is crucial to realize that, because of their high level of heterogeneity and their evolutionary processes, biological systems are as distinct from nonbiological systems as are the laws used to describe them. Although there are some successful examples of using mathematical models to predict evolution in laboratory settings, it is hard to apply these models to natural conditions. The search for a new framework should include the integration of various theories of evolution and complex systems. These important ideas include but are not limited to complex adaptive systems, collapsing chaos, network theory, self-organization, ordered heterogeneity, and various evolutionary concepts that focus on the dynamics and patterns of microevolution and macroevolution.

f. New materials or model systems

For over a century, genetic research has used a dozen model systems (from Drosophila to yeast to mice) to characterize genetic parts (mostly to understand genes and their functions). Now, at a new stage of data synthesis, it has become obvious that most of these simple-to-use systems differ from the reality of human systems, meaning that the knowledge obtained from these highly simplified model systems is difficult to apply to human systems, which contain a high degree of genomic and environmental heterogeneity (Heng, 2015).
It is necessary to try to overcome such challenges and directly analyze human systems to the best of our ability, despite the difficulty involved. Yes, it is much easier to obtain highly reproducible results from model systems, but it will likely be more difficult to translate them into clinical meaning for real individuals. Choosing the most appropriate cellular system to analyze has also become important. Traditionally, to obtain consistent data, researchers have preferred homogeneous cellular populations (following cell cloning). Now, knowing that outliers are the key to cancer evolution, the heterogeneous population instead becomes the key material for understanding the mechanism of cancer evolution.
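Point 3 of the big-data discussion above (a larger sample brings more variants, not less heterogeneity) can be made concrete with a toy simulation. The sketch below is purely illustrative and is not taken from the cited studies: the function name `variant_catalog` and all parameter values (a small pool of recurrent variants, a fixed number of private variants per patient) are assumptions chosen only to show the trend.

```python
import random


def variant_catalog(n_patients, shared_pool=50, shared_per_patient=5,
                    private_per_patient=20, seed=0):
    """Toy model (hypothetical parameters): each patient carries a few
    recurrent variants drawn from a small shared pool plus a set of
    private variants seen in no other patient."""
    rng = random.Random(seed)
    recurrent = set()
    private = set()
    for patient in range(n_patients):
        # Recurrent variants: sampled from the same small pool for everyone.
        recurrent.update(rng.sample(range(shared_pool), shared_per_patient))
        # Private variants: unique to this patient by construction.
        private.update((patient, k) for k in range(private_per_patient))
    total = len(recurrent) + len(private)
    return total, len(recurrent) / total


for n in (10, 100, 1000):
    total, recurrent_fraction = variant_catalog(n)
    print(f"patients={n:5d}  distinct variants={total:6d}  "
          f"recurrent fraction={recurrent_fraction:.3f}")
```

Under these assumptions the catalog of distinct variants grows roughly in proportion to the number of patients, while the recurrent share of the catalog shrinks, which is the sense in which adding more heterogeneous data enlarges the problem rather than resolving it. Real cohorts will of course not follow such a simple model.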


1.4.3 4D Genomics: the New Paradigm

By now, most readers should clearly see the increasing conflicts in the current genomic field. On one hand, the mainstream research community declares that the victory of genomic medicine is around the corner: with impressive big data sets, precision medicine will finally be achieved by profiling genomic landscapes at the resolution of single DNA base pairs across the entire genome, coupled with the power of gene editing technology (such as CRISPR). Hopes are high: researchers plan to change the eating habits of insects by altering their DNA coding, cure most human diseases by DNA design/manipulation, and extend the human lifespan beyond our imagination. On the other hand, many fundamental limitations of genetics and genomics discussed in this chapter are equally obvious and hard to ignore, and they undeniably challenge the very basis of the genetic/genomic concepts we know. This is evident not only because many candid comments have come from highly respected scientists but also because the cases analyzed are easy to understand and agree with, especially through the lens of scientific revolution. Such increasing conflicts (more will be discussed in the coming chapters) precisely reflect the collision between different “worldviews,” as suggested by Thomas Kuhn. Such a state of affairs in science calls for a paradigm shift. The genome theory represents an important “worldview” of future genomics. To articulate the genome theory (a genome evolution-based concept of inheritance) and its implications for biomedical science, the term “4D genomics” was proposed to distinguish it from the traditional “2D genetics” on which the gene theory is based. By combining 3D genome complexity with time, 4D genomics serves as the biological platform for passing on genetic information and provides a selective landscape for evolution, including the somatic cell evolution that drives disease progression.
The following quote illustrates the rationale to introduce 4D genomics: Genes and genomes represent different levels of genetic organization with distinct genetic coding systems. According to the traditional gene theory, the DNA sequence codes for all the genetic information necessary for the life of an organism. Information transfers from DNA to RNA to proteins, and this exchange lies in the foundation of modern biology. However, under the genome theory, the information regarding assembly of parts is most likely not stored within the individual gene or genetic locus. DNA only encodes for the parts and some tools of the system (RNAs, proteins, regulatory elements). The complete interactive genetic network is coded by genome topology-mediated self-organization. Under genome theory, the genome is not merely the entire DNA sequence or the vehicle of all genes. Rather, the genome context or landscape (the genomic topologic relationship among genes and other sequences within three-dimensional nuclei) defines the genetic system and ensures system inheritance. Altered genomes yield altered genetic networks, and understanding the


pattern of genome dynamics provides key information to how the entire genomic system works. The concept of 4D-Genomics was formed based on the genomic reality that genetic information is preserved by the three-dimensional topology of the genome through time. This new concept calls for a departure from the less informative 1D gene-defined traditional genetics and recognizes that the genomic topology represents the framework for ‘system inheritance,’ which is distinctly different from ‘part inheritance’ (e.g., genetic information encoded in individual genes).

Horne et al., 2013a (with permission from Taylor & Francis)

By increasing the 1D gene to a 4D genome, the framework is changed not by a quantitative increase of individual gene numbers but by the introduction of an entirely new type of system inheritance with fuzziness and by integrating how evolution works based on heterogeneous genotypes and their environmental interactions. This new adaptive paradigm requires many transformations, from attitudes and expectations to specific research strategies. Specifically, research priorities need to be drastically adjusted from currently focusing on individual genes to analyzing genome dynamics, from mainly characterizing lower level agents to monitoring the higher level of emergent properties (system behavior), from primarily tracing certain molecular pathways in isolation to profiling the evolutionary reality which includes almost an unlimited combination of pathways, and finally, from appreciating only the molecular certainty to embracing the real world with its inevitable mixture of certainty and uncertainty.

CHAPTER 2

Genes and Genomes Represent Different Biological Entities

2.1 SUMMARY

It is the emergent genome rather than isolated genes that defines a biosystem. The gene-centric view, by ignoring the importance of the genome system, has been limited by its unrealistic simplicity. To illustrate the distinctive features of genes and the genome, the definition of the genome and a number of gene-centric concepts, together with their limitations, are briefly reviewed, including the “selfish gene,” “minimal gene sets,” and “speciation genes.” Furthermore, experimental examples are discussed to highlight the conflicting relationship between genes and the genome. These analyses illustrate that gene functions are not only constrained by the genome, but, perhaps more importantly, that the characterization of genetic parts will not lead to an understanding of the emergent genomic properties of a system. The genome-level operation is not simply a matter of “adding up” the functions of individual genes. Thus, examples of genome contexts determining or influencing gene function are presented. Alongside Chapter 1, this chapter establishes the rationale for searching for a new framework of genome-based genomic theories.

2.2 THE DEFINITION OF THE GENOME

A few decades after they were first observed by German botanist Wilhelm Hofmeister in 1848, chromosomes were suggested to be the carriers of inheritance. The chromosome theory of inheritance, introduced independently by Walter Sutton and Theodor Boveri in 1902, purported that chromosomes are the basis for all genetic inheritance. In 1889, Hugo

Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00002-1


Copyright © 2019 Elsevier Inc. All rights reserved.


2. GENES AND GENOMES REPRESENT DIFFERENT BIOLOGICAL ENTITIES

de Vries used the term “pangen” to describe Mendel’s abstract concept of an isolated genetic factor, the smallest hereditary particle (de Vries, 1889). Not until 1909 did Danish botanist Wilhelm Johannsen use the word “gene” to define the fundamental physical and functional units of heredity (Johannsen, 1909). German botanist Hans Winkler coined the term “genome” in 1920 by the elision of two terms: “GENe” and “chromosOME” (Winkler, 1920). On this historical basis, the meaning of the word “genome” should clearly include both the whole genomic basis (chromosomes) and the units of heredity (genes). It is important to note that Winkler hoped to use this expression to link the genome to the foundation of the species (the whole genome system). Similarly, the genome was referred to as “a set of chromosomes” before the establishment of molecular genetics. Unfortunately, during the gene era and the dominance of the gene-centric view, the importance of the chromosome as a system organizer was ignored. The chromosome came to be considered primarily as a vehicle for genes. The term “genome” has been applied specifically to mean the complete set of DNA molecules of a cell (both the nuclear genome and the organelle genomes, which include those of the mitochondria and chloroplasts of a given species). Gradually, the “chromosome” portion of the genome has been chipped away; in practice, the definition of the term “genome” is now relegated to merely “a collection of genes,” or “the whole of organism’s hereditary information encoded in its DNA (or, for some viruses, RNA).” Here are some representative examples:

A genome is an organism’s complete set of DNA, including all of its genes. Each genome contains all of the information needed to build and maintain that organism.
US National Library of Medicine (NIH) https://ghr.nlm.nih.gov/primer/hgp/genome

The genome of an organism is the whole of its hereditary information encoded in its DNA (or, for some viruses, RNA). This includes both the genes and the non-coding sequences of the DNA.
https://simple.wikipedia.org/wiki/Genome

A genome is the full set of instructions needed to make every cell, tissue, and organ in your body. Almost every one of your cells contains a complete copy of these instructions, written in the four letter language of DNA (A, C, T, and G).
http://www.broadinstitute.org/education/glossary/genome

All the genetic material in the chromosomes of a particular organism. The size of a genome is generally given as its total number of base pairs.
Kevles and Hood, 1992

The haploid set of chromosomes in a gamete or microorganism, or in each cell of a multicellular organism; and, more specifically, “The complete set of genes or genetic material present in a cell or organism.”
Oxford living dictionaries


Keller has listed a range of definitions for the genome (both “official” and semi-official), coupled with historical discussions about the relationship between genes, the genome, and genomics (Keller, 2011). Among all these different definitions, a common thread is the mention of “genes or genetic materials.” In addition, “genetic instructions” are sometimes vaguely mentioned. For those definitions that include chromosomes, they mainly focus on the fact that genes are located on chromosomes. No specific function of inheritance is discussed at the chromosomal level, and no topological and organizing roles of chromosomes are mentioned. In summary, all definitions have focused on the genetic materials, rather than the organization of these materials. It is no wonder then that the project of sequencing all genes has been termed “the genome project” rather than “the gene sequencing project,” and the decoding of individual genes equals the decoding of the genome (the relationship among genes) to most researchers. As a consequence, the current improper use of the term “genome” represents one of the biggest confusions in current genomics, which has also generated many misconceptions in other related fields of bioscience. For example, many consider the gene to be the independent unit of inheritance, and the individual gene or combinations thereof to be responsible for genetic traits, whereas few believe that chromosomes are more important for organizing genomic information; the gene is extensively used as the unit for cellular and organismal evolution studies, and only a few researchers still use karyotypes to study macroevolution (and if they do, they are often considered outdated); it is believed that most complex diseases can be understood by tracing these diseases back to less than a handful of genes, and studying chromosomal aberrations serves only as a tool to identify gene mutations. Almost all molecular genetic research follows the gene and its related aspects. 
Ironically, the characterization of chromosomal aberrations is often dismissed as descriptive (a very negative comment in NIH study sections, one intended to kill proposals). However, the characterization of gene mutations (clearly, a description at another level) is considered not descriptive but mechanistic research. Such an unfortunate bias is based on a common belief that the gene-centric reductionist approach is the preferred method and that studies of gene-mediated genetic information are more important than studies of other types of genomic organization (simply because of the high resolution of the research). However, these wishful ideas are at odds with the basic principles of complexity science, as well as with accumulated genomic data, which demonstrate that the genome context differs from gene content (Heng, 2009). Altogether, rather than reexamining the basis of genetics, scientists are now pushing strategies to search for the answer of genomics beyond


the genome, as studying individual genes is not working. This situation has been discussed as follows:

Perhaps also due to a common misconception that finishing the genome sequencing phase means the mission for genome-level research has been accomplished, the term post-genome era has been used, as if decoding DNA by sequencing is equal to decoding the genome itself. Furthermore, some have even suggested that the goal of this post-genome era is to search for molecular mechanisms beyond the genome. In fact, “beyond the genome” has become a buzzword, even though it is clear that the genome is not just a bag of all genes and DNA sequences, and the functional organization of the genome is virtually unknown. Obviously, the major promises of sequencing the human genome did not pay off in terms of solving the mystery of life or providing an understanding of the genetic basis for most common human diseases. Such a disappointment has quickly led to the common viewpoint that: “If the answer cannot be obtained from genome sequencing, why not move beyond the genome?” On the surface, this seems logical. However, this view is based on a deep misunderstanding of genetic organization, because the genome system is defined by genomic context rather than gene content. ... many features of bio-systems, such as protein interaction pattern and network dynamics (including systems-specific boundaries), are defined by the genome context. Characterization of the genome’s features or behavior is not beyond genomic research; it is the very core of it.

Heng and Regan, 2017 (with permission)

To initiate a timely conversation on this subject, a definition of the genome has been introduced: A genome is the complete set of genetic material (including gene content) of an organism, which is organized by the unique composition of chromosomes or karyotypes. While genes represent parts inheritance (how individual genes code/regulate individual proteins), the karyotype determines the topological order of genes along and among chromosomes, which ultimately defines interactive relationship among genes, representing the system inheritance (how the blueprint works to instruct the protein network). Heng, 2017a

Note that the usage of “karyotype” here is not strictly accurate but is necessary. Rather than carrying its original definition (the entire set of chromosomes of a cell from a given species, usually displayed as a systematized arrangement of chromosome pairs in descending order of size), it in fact refers to the species-specific gene order (or topological relationship) along and among chromosomes, which serves as a physical platform for gene interaction within the 3D nucleus (see Chapter 4 for more details). The main reason for using “karyotype coding” to refer to the “chromosomal code” is to get readers’ attention, as it will likely prompt them to think more. If we just used the term “chromosomal coding,” readers would probably interpret it as DNA coding and continue to ignore it. In addition, different species often display different karyotypes; karyotype coding can thus emphasize that this new coding is species-specific.
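The distinction drawn above between gene content (parts inheritance) and the order of genes along and among chromosomes (system inheritance) can be illustrated with a small data-structure sketch. Everything in it is a hypothetical simplification: the gene names, the two toy karyotypes, and the helper functions `gene_content` and `adjacencies` are invented for illustration, and linear adjacency along a chromosome is only a crude one-dimensional stand-in for the topological relationship among genes within the 3D nucleus.

```python
def gene_content(karyotype):
    """Return the set of genes, ignoring where they sit."""
    return frozenset(gene for chromosome in karyotype for gene in chromosome)


def adjacencies(karyotype):
    """Return the set of neighboring gene pairs along each chromosome.

    Each pair is stored as a frozenset so that orientation is ignored.
    """
    pairs = set()
    for chromosome in karyotype:
        for left, right in zip(chromosome, chromosome[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


# Two hypothetical genomes: each inner list is one chromosome, in gene order.
genome_a = [["g1", "g2", "g3"], ["g4", "g5", "g6"]]
genome_b = [["g1", "g5", "g3"], ["g4", "g2", "g6"]]  # g2 and g5 exchanged

same_parts = gene_content(genome_a) == gene_content(genome_b)
shared_neighbors = adjacencies(genome_a) & adjacencies(genome_b)

print("identical gene content:", same_parts)
print("neighbor pairs in common:", len(shared_neighbors))
```

In this toy case a comparison based only on gene content treats the two genomes as identical, whereas a comparison of gene order shows that their neighborhoods differ. This is only an analogy for the argument in the text, not a method used by the cited authors.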


2.3 “PARTS VERSUS THE WHOLE”: THE EMERGENT RELATIONSHIP (WHICH CHALLENGES REDUCTIONISM)

The bias of favoring genes over chromosomes is no surprise given the popularity of reductionism in current bioscience. According to the gene-centric view, the gene represents the basic unit of inheritance, and the phenotypic contribution of higher levels of genomic organization, such as the chromosome, should be understood by simply dissecting the functions of individual genes and then integrating the information. Furthermore, science education has reinforced the belief that physics and chemistry comprise the mechanistic basis of how biosystems work, and that biological science needs to follow the approaches which led to the success of physics and chemistry (as they represent more mature scientific disciplines). Indeed, many physicists and chemists became biologists and contributed greatly to the birth of molecular biology, and mathematical tools have shaped the research landscape of population genetics. In the past two to three decades, computational and bioinformatic analyses have gradually become a key component of biological research, as reflected by the Human Genome Project and many other large-scale -omics projects. In fact, the journal “Bioinformatics” (initially named Computer Applications in the Biosciences before switching to its current name in 1998) was established in 1985, 2 years before the birth of the journal “Genomics.” Following the massive molecular data collection and bioinformatic analyses that have taken place for over three decades, a large number of data generation platforms and bioinformatic tools/packages are now available.
Despite wave after wave of excitement (e.g., who can forget how promising gene expression microarray technologies were when they were initially introduced?), in contrast to initial expectations, the flood of data has only increased overall bio-uncertainty; as more diverse data have often meant more complexity and led to more confusion, the issues seem bigger than computational power itself, which ultimately challenges many of our genetic or biological foundations. When the data do not support the concepts, most researchers blame the data rather than question the biological concepts. One viewpoint is that inconsistencies between data generated and genomic concepts are caused by insufficient data. Once more data are collected, the pattern predicted by current concepts will become reliable. Another viewpoint is that we do not know how to filter out the noise from the available data. Once we have better computational power to filter out this noise, the pattern will become apparent.


Both viewpoints are based on the same rationale: similar to the relationship between defined molecules and chemical reactions, the relationship between genes and phenotypes should have a high degree of certainty/predictability. The key in searching for certainty (reflected by highly repeatable patterns) is to eliminate the noise. Most scientists still believe or hope that gene-based predictions will be precise once the data issue is solved, despite three observations: (1) it has been increasingly realized that many genetic factors are highly dynamic and heterogeneous (thus it is not a simple issue of eliminating noise); (2) the data distribution does not follow the simple patterns defined by the Mendelian laws of genetics (thus the laws of genetics need to be reexamined); and (3) environmental factors can have a significant impact on the phenotype associated with a gene, especially for complex traits (thus it is not a simple issue of accumulating data from different factors). After all, most genetic/genomic researchers are typical reductionists. Given the historical relationship between reductionism and classical physics, the general appreciation of the limitations of Newton’s view of the universe in the modern scientific era, and the fact that complexity science has become increasingly popular since the 1980s, it is extremely puzzling that most molecular biologists still firmly hold the reductionist viewpoint to guide their way of practicing science. Biologists should know better, as life is both complicated and complex, and evolution is not a linear process. Among all scientists, they should embrace complexity science with the highest enthusiasm. Unfortunately, the emergence of complexity science during the 1980s and 1990s overlapped with the golden age of molecular genetics/genomics, a field in which the cloning of disease genes became dominant.
It is likely that the exciting discoveries then coming from molecular genetics drew biologists’ attention away from the new scientific frontier of complexity. When molecular researchers who were trained during the 1980s are asked whether they are familiar with complexity science, they answer that they have all heard about it. When asked why they have not taken it seriously in regard to their own research, they give a common answer: “Back then, we were too busy cloning genes. By the way, the reductionist molecular approaches have been working well. Why change things when they are fine?” Most bio-researchers are skillful in designing and applying linear models, which lack the complexity of real life. Successfully performed artificial experiments might have given them the illusion that molecular research is going well so far, and thus that there is no reason to consider complexity. Perhaps even more surprisingly, when asked about complexity science, most current students majoring in the biological sciences (at both the undergraduate and graduate levels) have not paid any attention to this timely subject at all. This is most likely because most genetics/genomics

2.3 “PARTS VERSUS THE WHOLE”: THE EMERGENT RELATIONSHIP


textbooks have failed to link current genomic challenges to the fundamental limitations of reductionism and the ultimately important fact that biological systems are complex adaptive systems (a fact that demands the new science of complexity). Moreover, because biological science will lead all scientific disciplines in the 21st century, it is rather troubling to realize that complexity science has not yet been integrated into mainstream bioscience. Clearly, without the necessary understanding and appreciation of the emergent properties of complex systems, there is no way to comprehensively reconcile many conflicting issues in genomics, such as the relationship between genes and genome, between specific pathways and phenotypes, and between laboratory experimental results and clinical reality. Complexity science can be regarded as a new scientific discipline for studying complex systems, within which many parts (agents) interact to generate emergent global collective behavior that cannot be easily explained by the function of individual parts or their interaction. Two simple examples are frequently used to illustrate emergent properties: (1) edible table salt is made up of sodium and chlorine, and the properties of NaCl differ drastically from those of a metal and a poisonous gas; and (2) carbon atoms, when arranged differently to result in different molecular architectures, can form many types of materials (including graphite, diamonds, and C60 buckyballs, also called fullerenes, one of the first discovered nanoparticles), each of which displays different properties. It is important to point out that those examples only represent simple emergent molecules from well-defined homogeneous atoms. In these situations, the relationship between the parts and the emergent properties is highly certain; thus, the emergent properties are highly predictable from the parts and the conditions they are in. 
In contrast, for most biosystems, there are often multiple levels of emergence, the parts involved come in a large variety of types, and each type of these parts, including environments, is highly heterogeneous. Beyond these features, there is also the important factor of time. All of the complicated interactions occurring within biosystems in an adaptive fashion make mechanistic studies aimed to predict emergent properties based on parts characterization highly challenging, if not impossible (Heng et al., 2018; 2019). The key rationale of developing complexity science is to address the issue of complexity and to depart from reductionism. When applied to genomics, complexity science acknowledges that the genotype-phenotype relationship is an adaptive relationship; as such, focusing on the characterization of the initial genetic condition will offer limited prediction of the final phenotype, as there is often no strong causation between individual parts and their emergent key features. Furthermore,


2. GENES AND GENOMES REPRESENT DIFFERENT BIOLOGICAL ENTITIES

the nonlinear feature of complex systems requires new platforms for understanding. Nevertheless, it explains the current confusion and challenges of large-scale genomic data and demands that we adopt a new way of understanding and doing genomic science. If the basic principles of complexity science are correct, particularly that the heterogeneity of emergence represents a key feature of biocomplexity (for more, see Chapter 8), many genomic concepts/approaches need to be changed. For example, a new type of biomarker needs to be developed to monitor the adaptive process by measuring evolutionary potential. Furthermore, attention should be paid to the physiological and pathological relationship, pre- and posttreatment dynamics, as well as short-term response and long-term benefits. All of these transitions involve punctuated somatic cell evolution (Heng, 2015). Of equal importance, only when we accept the reality of biosystems as adaptive systems, and that there is no simple relationship between genetic parts and phenotypes, can the needed change of attitude regarding reductionism finally occur. Right now, we should use the lens of complexity science to reexamine some key predictions of the gene theory.

2.4 REEXAMINING GENE THEORY PREDICTIONS

Chapter 1 gave a brief introduction of the rise and fall of the concept of the gene. To appreciate the fundamental limitations of the gene theory, a series of paradoxes must be presented that represent “the accumulation of anomalies” phase of the scientific revolution, according to Kuhn’s criteria. Similar to Kuhn’s description, the current paradigm of gene-based genomics unintentionally generates considerable observations which challenge the paradigm itself. The following are some examples that have triggered both my curiosity and critical evaluation.

2.4.1 Selfish Gene or Constrained Genome?

The selfish gene concept generated a powerful impact when it was first introduced (Dawkins, 1976). The idea that the sole implicit purpose of the gene was to replicate itself seemed to touch the core of evolution, as it explained evolution in a simple and vivid way. This gene-centric view of evolution proposed that evolution occurs through the differential survival of competing genes. On the surface, gene competition can lead to the fitness of phenotypes or vehicles. But fundamentally, it is the gene’s own propagation that matters the most. At the time, the gene-centric concept seemed to make sense, as genes were


thought to be the genetic material and the units of natural evolution. The chromosome was merely a vehicle of the gene, as was the individual a vehicle of the species. The birth of the selfish gene concept was not an isolated event. For decades, population genetics focused on gene frequencies. Molecular cloning technologies further pushed the gene to center stage of genetics and biology and spawned the biotech industry. As a result, geneticists and society at large embraced this concept. Dawkins’ The Selfish Gene sold over a million copies. Some credited it with causing a silent and almost immediate revolution in biology. However, despite its popularity, many criticized this extreme idea of evolutionary selection based on genetic parts rather than the genetic system. Evolutionary biologist Stephen Jay Gould, who along with Niles Eldredge established the theory of punctuated equilibrium, was critical of the selfish gene concept. According to Gould, the fatal flaw of the selfish gene concept is that the gene is not a selection unit because “no matter how much power Dawkins wishes to assign to genes, there is one thing that he cannot give them - direct visibility to natural selection” (Gould, 1990). Eldredge has nicely summarized the difference between Dawkins and Gould using Dawkins’s own analysis: Dawkins sees genes playing a causal role in evolution, while Gould and Eldredge see genes as passive recorders of evolutionary changes (Eldredge, 2004). Similarly, Ernst Mayr, one of the 20th century’s leading evolutionary biologists, also insisted that the notion of the gene as the object of selection is not a valid evolutionary idea and is totally anti-Darwinian. 
He stated:

The idea that a few people have about the gene being the target of selection is completely impractical; a gene is never visible to natural selection, and in the genotype, it is always in the context with other genes, and the interaction with those other genes make a particular gene either more favorable or less favorable. In fact, Dobzhansky, for instance, worked quite a bit on so-called lethal chromosomes which are highly successful in one combination, and lethal in another. Therefore people like Dawkins in England who still think the gene is the target of selection are evidently wrong. In the 30s and 40s, it was widely accepted that genes were the target of selection, because that was the only way they could be made accessible to mathematics, but now we know that it is really the whole genotype of the individual, not the gene.

EDGE interview, 2001 (with permission)

Lynn Margulis, best known for her contribution to the endosymbiotic theory of evolution, even criticized the term “selfish gene”:

... The terminology of most modern evolutionists is not only fallacious but dangerously so, because it leads people to think they know about the evolution of life when in fact they are confused and baffled. The ‘selfish gene’ provides a fine example. What is Richard Dawkins’s selfish gene? A gene is never a self to begin with. A gene alone is only a piece of DNA long enough to have a function. The gene by itself can be flushed down the sink; ... There is no life in a gene.

Margulis and Sagan, 2003


Because eukaryotic cells require a much longer time to replicate than do prokaryotic cells, and if the selfish gene concept is correct that the ultimate goal of life is gene duplication, then it is difficult to explain the evolution of eukaryotes from prokaryotes, as the most effective strategy for selfish genes would be to confine themselves to bacteria, which are the most effective vehicle of gene duplication. The fact that evolution has resulted in the propagation of more complicated yet slower duplicating genomes is strong evidence for the reduced importance of the selfish gene in the eukaryotic world (Heng, 2009). Does the fact that bacteria are one of the most abundant life forms on Earth support the validity of the selfish gene? The answer is also negative. Before the genomic era, individuals from the same “species” of bacteria were thought to share identical copies of “selfish” genes. However, the discovery of a high level of gene dynamics in bacteria indicates that bacterial genes do not just self-replicate but generate large numbers of variants or “non-self-oriented” genes (Heng, 2007a). In addition, extensive horizontal gene transfer (HGT) also indicates the success of cooperation rather than mere replication of selfish genes. It is no wonder that now even Dawkins uses the term “the cooperative gene” rather than just the “selfish gene” (see also Chapter 1). If the gene is the primary unit of evolution, then gene selfishness is of ultimate importance in evolutionary competition. However, if higher levels of the genomic system are the primary unit of evolution rather than interdependent parts such as genes, then cooperation within the genome is the key for evolution (especially macroevolution) as this higher level serves as a regulatory constraint for the lower parts. It should be noted that emphasizing the importance of the genome in evolution is not to deny evolutionary selection observed in multiple levels. 
One of the main purposes of this book is to raise the following question: which level is more important given a particular context of genetic organization or type of evolution? In other words, when are genes or genomes more important? It is very tempting to speculate, however, that there might have been a period in history when dominant selfish genes did exist (in a specific context). During this period, early life forms contained limited numbers of genes. For these life forms, each gene was essential, directly serving as a unit of evolutionary selection; therefore, each gene was selfish. Yet with the success of genes and genome duplication, the increased number of genes actually changed the game, as paradoxically, the more genes involved in a system, the less important each gene becomes. At a certain evolutionary tipping point, there would be so many genes within each individual that individual genes are not selected upon. When bio-features are generated by the emergent properties of genes requiring a certain degree of complexity, the selection pressures shift focus to the higher level of the whole genome package and reach the point of no return in regard to


gene-based selection. The complexity of such biosystems becomes dominant, and inheritance serves as a powerful constraint preventing a return to a much simpler system with a smaller number of genes. There are many examples in current life forms where the constraints of a system are dominated by the genome rather than by selfish genes. The classification of bacteria according to similarities in their DNA sequences reflects this constraint. High levels of intraspecies genetic diversity make it difficult to define bacterial species (Konstantinidis et al., 2006). Nevertheless, a bacterial species is defined as a collection of strains characterized by DNA with at least 70% cross-hybridization (Wayne et al., 1987) (for more discussion, see Chapter 5). HGT is a common phenomenon which on the surface supports the selfishness of individual genes. However, even the efficiency of HGT is influenced by the genome constraint across evolutionary distances. HGT is most commonly detected in closely related microorganisms. In contrast, in distantly related microorganisms, such as bacteria and archaea, HGT has not been demonstrated to occur on a large scale (Glansdorff et al., 2009), as there is a donor-recipient similarity barrier (Popa and Dagan, 2011; Popa et al., 2011; Tuller et al., 2011). For example, many horizontally acquired genes were in fact compatible with the recipient genome’s constraints such as codon usage (related to GC content and/or amino acid usage). Once integrated into the genome, acquired genes still have to adapt within the genome to be retained during evolution. Recently, it was demonstrated that there are bidirectional associations between similar tRNA pools of organisms and the number of HGT events occurring between them. Here, the similar tRNA pools reflect a similar genome context. Interestingly, this study also suggested that frequent HGTs may be a homogenizing force that increases the similarity in the tRNA pools of organisms within the same community. 
Clearly, genomic constraint plays a dominant role here. There is a restriction on HGT in multicellular eukaryotes: further genome constraint is visible in the separation of germ cells from somatic cells where the function of sex plays the key role of maintaining the genome in germ cells while the somatic genome makeup is more flexible (Heng, 2007b; Gorelick and Heng, 2011; Horne et al., 2013a; Ying et al., 2018). For more in depth discussion on this point, see Chapter 5. System constraints ensure that evolutionary selection acts at the level of the genetic network rather than on the genes or the individual pathways. This conclusion has been illustrated by directly manipulating bacterial genetic networks where master regulators (transcription factors) in 598 Escherichia coli gene networks were rewired by reconstructing new regulatory links in the network. Surprisingly, in the majority of altered systems, the newly formed networks were functionally unchanged or even superior to the original system in terms of growth, reflecting plasticity of


individual gene function and demonstrating lack of importance of specific genes within a given system (Isalan et al., 2008). Another important and convincing experiment in yeast illustrates the power of genome system alteration in forming new systems to compensate for the loss of a key gene. MYO1 plays a central role in cytokinesis, and as expected, MYO1 deletion is lethal. However, researchers from Rong Li’s laboratory at the Stowers Institute for Medical Research noticed that, surprisingly, some cells grew back after culture plates were left on the laboratory bench for many days (the plates were supposed to have been disposed of on completion of the deletion experiments)! Detailed studies of the resurrected cells demonstrated that yeast cells with deleted MYO1 rapidly evolved divergent pathways to restore growth and cytokinesis, and the evolved cytokinesis phenotypes correlated with specific changes in the transcriptome rather than the restoration of the MYO1 gene itself. Significantly, extensive polyploidy and aneuploidy were the initial evolutionary changes detected from these new systems. Similar incidents involving other genes had previously been observed by others, but this resurrection phenomenon was ignored as most investigators are interested only in the causative relationship between a specific gene and a phenotype in the short term. When one key gene is deleted, cell death occurs, seemingly demonstrating the importance of the gene of interest, and that is the end of the story. However, Li’s laboratory realized the importance of such an observation and carefully studied this issue from an evolutionary perspective. These results demonstrate the evolvability of even a well-conserved process and suggest that changes in chromosome stoichiometry provide a source of heritable variation driving the emergence of adaptive phenotypes when the cell division machinery is strongly perturbed. Rancati et al. (2008)

This elegant experiment demonstrated that the system robustness and macroevolution of a biological organism are not dependent on individual key genes, but rather on the formation of new genome-defined systems. In this case, extensive polyploidy and aneuploidy represent new systems that characterize a new emergent potential irrespective of the individual genes. This experiment also answers the question why the genome-based evolutionary concept insists that the individual gene’s function is extremely limited during macroevolution, despite that its function could sometimes be obvious in a given individual. The key is whether or not genome reorganization is involved. When there is genome reorganization, any chromosomal changes can impact thousands of genes (which key gene is more important than hundreds and thousands of genes?), and, as recently illustrated, genome reorganization often completely changes the transcriptome (Chapters 3 and 4). Within the new genome-defined


transcriptome, an individual gene’s function could be drastically altered. When the MYO1 gene was reintroduced to the altered genome, its original function no longer existed (Rancati et al., 2008). Ongoing experiments are actively testing how many individual genes’ functions can be altered by changing the genome. Human genomics originally set out to illustrate the connection between genes and most human diseases with an eye to using this knowledge for medical purposes. And yet, inadvertently, these studies have also supported the idea that an individual gene has limited impact on a biological system. For example, the data from the 1000 Genomes Project revealed a high level of gene defects in “normal” individuals where on average, each individual in the study carried 250-300 loss-of-function variants in annotated genes. More strikingly, 50-100 variants were previously implicated in inherited disorders (Abecasis et al., 1000 Genomes Project Consortium, 2010)! Recent data from personal genome sequencing demonstrate that more mutations exist in each individual than are necessary for most diseases to occur, which downplays the contribution of these disease genes to disease phenotypes. It is estimated that each human individual, on average, has 60 gene mutations per generation (Conrad et al., 2011). As most mutable parts of the genome are in tandem repeats and satellite DNA, the average number of mutations in the germline of an individual could even be much higher. And there is an even higher rate of somatic mutations (see Chapters 3 and 7). Yet, most of us are seemingly normal. This will be discussed in more detail in Chapter 6, but all these surprising observations cry out the same message: genome constraints, not gene mutations, matter most for the survival and evolution of biological systems. Individual parts become less important in a genome-defined system regardless of the selfishness of the genes. 
Even transposable elements (TEs) (one type of so-called “selfish DNA”) must acquiesce to genome constraints. The fact that different types of TEs are detected in different species suggests a genome-defined species constraint. It has been hypothesized that TEs can escape the sexual filters (see Chapter 5) by spreading within the host genomes, but there is a limit on how much spreading can occur in each generation. More importantly, the newly invading TEs are not directly selected but come as part of the whole genome package. The surprising result of sequencing various genomes is that the positions of genes are reorganized among different species, which downplays the importance of the selfishness of the gene and underscores the importance of the topological relationship among genes within a given genome. Before the genome sequencing era, however, most researchers believed that different species arise from the accumulation of different selfish genes, which fits the neo-Darwinian evolutionary principle (see Chapter 6).


By now it should be clear that a preponderance of evidence points to the importance of genome constraint versus selfish genes and the importance of the genome concept versus the gene-centric concept. Some might argue that with overwhelming genomic evidence, analyzing the fundamental limitations of the gene-centric concept is akin to fighting a straw man or beating a dead horse. However, the fact of the matter is that, even though increasing numbers of scholars are abandoning the extreme version of the selfish gene concept, the gene-centric view of life is still very dominant in biology. How many researchers agree that gene-based research does not paint a full picture but are still studying genes or gene-based biology? The gene-genome transition has surely not yet happened. Consider the following excerpt from the public conversation between Craig Venter and Richard Dawkins at Digital Life Design in 2008, titled “Life: a gene-centric view” (EDGE conversation: Digital Life Design, 2008).

JOHN BROCKMAN (moderator): “... Dawkins is responsible for possibly the most important science book of the last century, The Selfish Gene, ... which has become the basic science agenda for biologists for the last quarter century ...”

Venter: “... Richard’s book on The Selfish Gene really influenced most thinking in modern biology. I actually didn’t like his book initially ... But I’ve come to appreciate it immensely. I was looking at the world from a genome-centric view - the collection of genes that put together to lead to any one species - ... But I’ve switched, and I’ve really come to view the world from a gene-centric point of view ...”

(with permission from Edge)

Venter should be the last person to switch from the genome-centric view back to the gene-centric view, as genomics has illustrated the fundamental limitations of the gene-centric view partially because of his personal efforts. Venter found that in most microorganisms, it is hard to identify the defined genome because of sequence diversity. This lack of a genomic identity is because of the lack of sexual filters commonly used in eukaryotic organisms (see Chapter 5). The fact that it is difficult to define systems of microorganisms by the same gene sets clearly downplays the importance of genes. It provides further support to the genome-centric concept. This should have convinced Venter to confirm the genome-centric view, not revert to his pregenome thinking. The independent behavior of the parts (genes or chromosomes alike) is only meaningful within the context of the system. Notably, in the experiment of synthetic life form creation, Venter’s group synthetically copied the genome of a bacterium and incorporated it into a cell to make what they call the world’s first synthetic life form (Gibson et al., 2010). One key message from this study is that they used the coding information of an already existing biosystem, and changing the order of genes resulted in failure to create a functional system. This further illustrates the importance of the


genome context rather than just the importance of individual genes. Clearly, this observation favors a genome-centric rather than gene-centric view. Obviously, changing views of genetics to incorporate the relative unimportance of genes will require great effort, as the gene-centric concept continues to influence genomic researchers in the face of mounting evidence that seriously challenges the gene theory itself. This situation illustrates the important need to establish a new genome-based paradigm to replace the old gene-based one. Without a new genome theory, gene-based genomic research will likely continue to fail to deliver relevant clinical results. It is true that the selfish gene has played an important role prioritizing gene-based research, but the time has come to consider the genome as the top priority after decades of focusing on individual genes. Yes, there has been a great deal of discussion regarding the selfish or cooperative gene perspective. However, these discussions still fall within the gene-centric realm where the gene is the motor of evolution. Despite many critical analyses from many well-respected evolutionary scholars (including Ernst Mayr, Stephen Jay Gould, and Lynn Margulis) on the selfish gene concept, the notion that the genome itself is a key selection unit that constrains genes has not been systematically discussed and, in particular, has not been discussed within the genome-based evolutionary concept. As a result, the relationship between genetic parts and the genome system is less clear, even though it is clear to some that a genetic program cannot reside solely within the genes (Keller, 2000). Similar to the “selfish gene” expressing the gene-centric viewpoint of evolution, genomic constraint expresses the genome-centric view of evolution (Chapter 6).

2.4.2 Genomes Not Genes Define Biosystems

One crucial step to further understand the limitation of individual genes is to examine the issue of whether genes define a biological system and, if not, what genetic components or structures actually define a biological system. A key feature of biological systems is inheritance, and therefore, the involvement of genetics is crucial. The following examples address this issue and are applicable to many related topics important for genomics and evolutionary biology. Further discussions based on genomic coding can be found in Chapter 4.

2.4.2.1 There Are No Common Minimal Gene Sets in Nature

A “minimal gene set” refers to the smallest possible group of genes sufficient to maintain a functioning cellular life form under ideal conditions (sufficient nutrients and absence of environmental stresses) (Koonin, 2000). These minimal gene sets, if they exist, should be evident from the


simplest forms of life such as obligate host-associated bacteria when they are sequenced and compared. Alternatively, they should become apparent if we delete nonessential genes from a few species and compare the remaining essential genes common to them. The existence of minimal gene sets would powerfully illustrate the importance of these evolutionary conserved key genes in evolution and contribute to the understanding of the universal principles of life (Glass et al., 2009). In addition, minimal gene set reconstructions could be experimentally testable. These minimal gene sets serve as the basic foundation of synthetic biology, a new scientific discipline that focuses on artificially generated organisms using genetic information and material. Surprisingly, these essential gene sets are difficult to identify, despite the supposition that some genes and their coded functions are absolutely necessary for the survival of any living entity (Juhas et al., 2011). Examination of endosymbiotic genomes revealed that the expected minimal gene sets are highly diverse and are largely host-defined and species-specific, with the exception of a small set of genes required to process information (Klasson and Andersson, 2004). Mycoplasmas are important models to analyze essential genes because of their small genome size and easy cultivation. But there, too, the presence of diverse gene sets challenges the idea of a rigid set of minimal genes and suggests a minimal set of functional niches (or genome packages in our view). The global knockout mutagenesis of mycoplasmal genes has further illustrated that some of the so-called universal or highly conserved genes may not be necessary, suggesting the importance of evolutionary selection based on the overall system rather than individual components like genes. 
Interestingly, as more sequences of genomes are compared, there are a diminishing number of protein-coding essential genes that can be identified, contradicting the notion of a minimal gene set (Table 2.1). Based on this current trend, as more and more genomes are added to this analysis,

TABLE 2.1 The Reduced Minimal Gene Set

Number of Genomes Compared    Number of Shared Protein-Coding Genes
2                             256
5                             180
7                             156
100                           63
147                           35


the number of minimal essential genes will become unrealistically small or vanish altogether. Despite the disappointing results from comparative genomics and experimental biology, computational reconstructions and modeling of minimal genes or metabolic machinery necessary to sustain life are still actively pursued and numerous models have proposed a variety of numbers of minimal genes (Gil et al., 2004). It is extremely difficult to predict evolution by modeling, particularly in light of the diverse gene sets discovered to date; thus, the value of these models is limited. While the rationale of identifying clear-cut, simple general genetic patterns is ubiquitous in biological research, it has largely failed to produce useful results. The same rationale that led to the search for the minimal essential genes for life has been used to attempt to identify the common gene mutations in cancer. Such a rationale is based on the gene-centric concept that mutation of a handful of key genes leads to clonal expansion and cancer and that finding the key genes will lead to an understanding of the underlying science and provide therapeutic targets. This approach ignores genomic and system heterogeneity, which is pervasive in cancer. A key assumption of minimal gene sets is the imagined optimal conditions that limit the all-important system heterogeneity. As each system has its own conditions, there are no such universal optimal conditions in the first place. Artificial conditions can be created to illustrate the significance or insignificance of genes. For example, under experimental conditions, over 70% of all genes from yeast can be deleted without serious consequences. Similarly, E. coli genomes can eliminate 10%-30% of their original genes without any detectable effect on bacterial viability. 
In fact, the fraction of essential genes proved to be surprisingly low in almost all organisms studied, typically in the range of 10%-30% of the whole gene set (Koonin, 2003; Feher et al., 2007). Such a conclusion has little relevance in natural systems where survival and competition occur within dynamic natural environments. Not only are individuals with serious deficiencies unable to survive competition in the real world, but system constraints (like the mechanism of sexual reproduction, see Chapter 5) will not allow such a drastic gene elimination to occur while simultaneously maintaining the same system. Indeed, when culture conditions were varied in the same yeast model, many of the previously characterized “nonessential genes” proved to be essential for viability (Hillenmeyer et al., 2008). No wonder Venter has faced such a challenge trying to assemble a “standard” genome for many marine microorganisms collected in nature. Not only is there a tremendous diversity at the gene level, but whether or not a gene is essential largely depends on its internal genetic and external environment (Nealson and Venter, 2007). It is thus clear that there are likely no minimal gene sets with fixed conserved genes in nature, as


2. GENES AND GENOMES REPRESENT DIFFERENT BIOLOGICAL ENTITIES

heterogeneity is the inseparable feature of biological systems. There also is no standard genome with fixed genes in microorganisms because of a lack of genome constraint provided by sexual reproduction. So where should we look for the inheritance that defines the pattern of evolution, knowing that inheritance plays a major role in evolution? If it’s not in the genes, then within what genetic component does it exist? Interestingly, the efforts to illustrate the ultimate importance of genes (including efforts to identify minimal gene sets) have in fact led to the search for the correct level of inheritance (Heng, 2009; Heng et al., 2011a). It has been stated that “some genes and the functions encoded by them are absolutely necessary for the survival of any living entity” (Juhas et al., 2011). While some functions are absolutely necessary for the survival of any living system, survival can be achieved by a variety of emergent mechanisms based on different genes. The importance of individual genes is accentuated under artificial experimental conditions, but in the natural world their individual importance can be lost or changed amid the collective interactive genomic network effect. Thus, these contradictions call for a new concept that focuses on the whole (genome package with unlimited potential) rather than the parts (diverse genes). Gene coding and system coding are fundamentally different, and it is genome coding that will provide answers for the nonexistent minimal essential gene sets. Interestingly, based on genomic information, many researchers now accept that approximately 300 genes are required to support cellular life (Juhas et al., 2011). A new suggestion is that a certain minimal number of variable parts (genes or DNA sequences) is likely needed, but the specific parts list could be very different in different systems: as long as a minimal function is achieved, there is no requirement for the same parts.
This idea is consistent with the finding that there often is a conservation of networks across distant species despite a lesser amount of conservation of specific genes. It would be valuable to analyze the relationship between the number of genes and degree of system complexity to investigate whether there is a required number of parts for the emergence of a certain degree of complexity. Both computational simulations and synthetic genomics could be applied to achieve this goal.

2.4.2.2 Are Gene or Genome Alterations Mainly Responsible for Speciation?

There has been a significant bias largely caused by the confusion between the function of genes and chromosomes. Ever since Dobzhansky’s work over 80 years ago, the speciation gene concept has been favored as the underlying genetic mechanism of evolution. Using the classic approach to study speciation genes, Dobzhansky initially linked hybrid testis size (a proxy for fertility) to a number of mutant markers by

2.4 REEXAMINING GENE THEORY PREDICTIONS


crossing different species of Drosophila pseudoobscura (Dobzhansky, 1936). The Dobzhansky–Muller model of hybrid incompatibility suggested that genetic incompatibility is caused by new mutation combinations unique to the hybrids (mating between species). While in parental populations (different species) each incompatible allele can arise and become fixed without combinational effects, it was the cross-hybridization that created the harmful combinations. Two important issues need to be pointed out. First, the Dobzhansky–Muller model is based on the assumption that a new gene mutation is the basis of a new species. According to this model, different species should have different genes or mutational profiles. If this key assumption is not accurate, the whole model is invalid. Second, many of Dobzhansky’s wonderful experiments were actually based on chromosomal studies rather than gene studies. Conclusions at the gene level using chromosomal studies likely will be incorrect in his cases because these two levels involve different genetic roles. In fact, the reduced hybrid compatibility he observed might simply be caused by chromosomal incompatibility. The jump from the chromosome-based concept to Dobzhansky’s gene-based concept may represent his greatest limitation. Speciation studies have now entered the molecular era, and there have been increasingly exciting reports in top science journals that claim to have identified speciation genes. In particular, equipped with high-throughput large-scale genomic analysis, speciation genomics has provided a great deal of expectation. In contrast, there has been limited interest in how chromosomes or genomes initiate speciation. To further explore this issue, we first must decide on the definition of a species. According to Ernst Mayr’s “Biological Species Concept,” species are “groups of interbreeding natural populations that are reproductively isolated from other such groups” (Mayr, 1963).
The modern version of the definition emphasizes the aspect of potential interbreeding and deemphasizes natural populations. This could apply to experimental conditions where most speciation genes have been identified. The following statement was quoted from Douglas Futuyma (Futuyma, 1998): “Species are groups of actually or potentially interbreeding populations that are reproductively isolated from other such groups.” Alternatively, the “Morphological Species Concept” considers that two organisms are the same species if they look similar, while the “Evolutionary and Ecological Concept” defines organisms as the same species if they share the same evolutionary or ecological history. Despite some limitations, the Biological Species Concept still dominates. Major limitations of this concept include not explaining how asexual organisms, preserved museum specimens, and extinct taxa fit into this definition of a species, as well as not defining what “potentially interbreeding” actually means.


Second, one needs to know the mechanisms of speciation. The commonly accepted view suggests that reproductive isolation interrupts gene flow. Reproductive isolation can be geographic, genetic, or even behavioral. There are many models to explain how reproductive isolation happens. The ecological model states that ecological circumstances can cause two populations to become reproductively isolated and that following a sufficiently long period, a new species will develop because such isolation provides the opportunity for the original species to split after accumulating enough variation. Genetic models also address the mechanism that leads to genetic incompatibility. Within the modern synthesis school of thought, ecological reproductive isolation plays an important role in addition to genetic mechanisms, as macroevolution (of speciation) can be viewed as the compounded effects of microevolution where ecological isolation provides the time needed for the accumulation of microevolution. One of the commonly used and simple definitions of genetic speciation (for sexually reproducing organisms) is the process that transforms within-population variation into taxonomic differences through the evolution of inherent barriers to gene flow (Noor and Feder, 2006). Interestingly, recent data surprisingly suggest that much of this divergence, which led to incompatibility between species (an assumed key step for speciation), does not appear to be driven by ecological adaptation but may instead result from responses to purely mutational mechanisms or to internal genetic conflicts (Maheshwari and Barbash, 2011). Third, what type of genetic changes result in speciation (Noor and Feder, 2006)? Currently, there are three types of genetic factors known to contribute to inherent barriers that serve as the genetic basis of reproductive isolation. From the new genome perspective, the most significant type is chromosomal or genome changes. 
Chromosomal rearrangements and number changes, particularly polyploidization, are the most commonly identified phenomena associated with speciation. This mechanism has been commonly observed in plants. The generation of new plant species through autopolyploidy was long ago observed by Hugo de Vries (1905), one of the rediscoverers of Mendel’s experiments. He isolated a tetraploid version of the normally diploid evening primrose, Oenothera lamarckiana, which he named Oenothera gigas (Brown, 2002). There are examples where the duplication of an entire genome or massive chromosome changes occurs in a single or a few generations, resulting in speciation. This example of rapid chromosomal speciation explains cases that are not dependent on physical isolation occurring over a long duration of time. The fact that the majority of eukaryotic species display different karyotypes strongly supports the importance of chromosome-mediated speciation (White, 1978; King, 1995; Ye et al., 2007; Heng, 2007b, 2009, 2015). Unfortunately, the concept of chromosomal speciation


is largely ignored by current research, as it fundamentally contradicts the common belief of gene-mediated stepwise evolution. Extrachromosomal elements including cytoplasmic symbionts and TEs can also contribute to speciation (Margulis and Sagan, 2002; Rebollo et al., 2010). It should be pointed out that this category is closely associated with chromosomal speciation. The symbiotic process involves the interaction of the symbiont and host’s genome. TE-mediated speciation also usually requires genome reorganization. TEs also represent a powerful mechanism to induce genome alterations by changing the topology of genes within a genome, even without gross chromosomal alterations (more discussion on this process will follow) (Rebollo et al., 2010). Finally, genic elements or speciation genes receive the most attention, as they fit well with the current mainstream evolutionary belief that a gradual accumulation of new genes is the key to speciation. The importance of incompatibilities between the genes of diverging species causing reproductive isolation was previously suggested by the Dobzhansky–Muller model of hybrid incompatibility (Wu and Ting, 2004). Despite its importance to the gene theory and the efforts to trace down speciation genes using cutting-edge technologies, to date there has been rather limited success as only a few such putative genes have been identified in various species, the majority of which come from the Drosophila species (Noor and Feder, 2006). Recently, using the latest methods, increasing numbers of speciation genes have been identified, including in plants (Rieseberg and Blackman, 2010), which seems to support the Dobzhansky–Muller model. However, many of the cases of speciation genes are probably better explained by the chromosomal-based concept (again, see discussion below).
Rather than immediately presenting the genome-mediated speciation model as the major form of speciation, this chapter will focus on the difference between genes and the genome. Because of its ultimate importance and sharp contrast to popular gene-centric views, genome-mediated speciation will be discussed in Chapter 6. The following are some interesting observations/questions to illustrate the bias and challenges of the gene-speciation concept, which help to illustrate the differential roles of genes and the genome. While it is true that increasing numbers of papers have linked genes to hybrid incompatibility, it would be comforting if these purported speciation genes were actually the real causative factors of the hybrid incompatibility that has been observed. When hybrids are formed, there are usually multiple genetic alterations involved, which provide opportunities for different researchers to make specific links based on their unique concepts. This situation is explained using cancer research as an example. Researchers with different interests using different technologies will each make their own discoveries. DNA sequencing will surely reveal


an array of gene mutations in a tumor, while proteomic studies will surely reveal differential posttranslational modifications of certain proteins, which lead to different and conflicting conclusions. Cancer is caused by these gene mutations in one case and posttranslational modification in the other! Others will find a strong correlation with any number of changes, including but not limited to copy number variations, specific RNA splicing forms, noncoding RNA, TEs, defects of mitochondria, altered DNA methylation, protease overexpression, endoplasmic reticulum stress, inflammation, abnormal gene expression, microenvironment changes, and, of course, chromosomal alterations. Although efforts have been made to integrate various datasets using computational or systems biology methods, there is currently no unified, established approach to integrate data from multiple levels of detection and in particular, no systematic comparison to determine which level of alteration is most important and which level should be the main window of our observation. In cancer research, seemingly unlimited genetic and epigenetic factors can be linked to the cancer phenotype and genome dynamics (see Chapter 3). Similar to cancer, if multiple levels of genetic and epigenetic alterations are examined in the hybrids where speciation genes are identified, multiple different links could be established. Moreover, the more cases of hybrid incompatibility that are examined, the more diverse molecular mechanisms behind the incompatibility will be identified, and the number of speciation genes will continue to increase. 
A quick scan of the recent literature shows that many different genes have been found to underlie hybrid incompatibility, and these genes indeed represent a wide array of functions, including those involved in oxidative respiration, nuclear trafficking, DNA binding, plant defense, pollination, and nuclear-mitochondrial conflicts (Rieseberg and Blackman, 2010; Johnson, 2010; Chou et al., 2010). In addition, protein-coding genes (sequence and location), copy number variations, TEs, interactions among heterochromatin, noncoding RNA, and other genetic and epigenetic factors are all involved. Clearly, there are no general gene sets responsible for speciation. Similar to the search for cancer-causing genes, there likely will be unlimited genes or combinations of genes that can be linked to speciation. At the end of the day, the speciation gene concept will become insignificant. That is the exact reason researchers now call for a search for a more fundamental evolutionary mechanism of speciation rather than continuing to focus on individual links. It is interesting to cite John Wilkins’s similar viewpoint (Wilkins, 2008, with permission):

There is a widespread tendency of biologists to overgeneralise from their study group of organisms to the whole of biology ... For some time now various researchers ... have sought speciation genes. These are genes that cause speciation, in a general sense, but the slide appears to be made to the conclusion that there are particular genes in many if not most cases of speciation. I want to consider this now.


Phadnis and Orr have found that a particular gene is both a gene causing sterility between hybrids of two Drosophilid subspecies, and a segregation distorter - that is, the gene causes itself to be differentially copied when gametes are formed. A similar process of meiotic distortion occurs in mice as Mihola et al. show. A great result, but how general? In the commentary accompanying the online advance publication, Asher Mullard writes and quotes this: “... With more genes should come greater insight into speciation. Some geneticists wonder whether only particular classes of genes are important in speciation - such as epigenetic genes or segregation distorters - or whether many sorts of genes help to drive species apart. ‘What is surprising about the speciation genes that have been identified [so far] is that there is a whole hodgepodge of different kinds of genes with different functions,’ says Nachman. ‘I don’t think we’re going to see [trends] until dozens of genes are identified, and there’s just a handful now.’” (Mullard, 2008) But why think that there should be particular classes of genes that contribute to speciation? Sure, there may be genes that are implicated in Drosophilid speciation, or maybe even in insect speciation, but all that matters in sexual species is that some barrier to reproduction exists. It need not be a particular barrier. Consider this - how many ways are there to impede the flow of traffic? Should we expect there are only a couple of ways? Witches’ hats and workers’ signs may be common, but there are sinkholes, burning barriers of demonstrations, collapsed cranes, street parties ... and the list could be indefinitely extended. I suspect that speciation is like that - it’s a negative property, and one that can be arrived at in an indefinitely large number of ways ...

Surprisingly to many speciation gene researchers, these identified diverse speciation genes are often nonessential and have functions only loosely linked to reproductive isolation (Wu and Ting, 2004)! Many other important issues are associated with the speciation genes identified. First, many specific models do not reflect real-world situations. It is difficult to validate many of the identified speciation genes in natural populations (Noor and Feder, 2006). Furthermore, it has been gradually realized that many of the speciation models are seriously misleading, as the modeling process changes the nature of the systems under study (Chapter 3). Second, hybrid isolation (HI) is not equal to speciation. There is a long-lasting concern within the field as to whether hybrid incompatibility genes are directly involved in causing speciation or if they evolved after full reproductive isolation (Noor and Feder, 2006). Not all genes affecting reproductive isolation today had a role in speciation (Nosil and Schluter, 2011). When the HI is severe, the hybrids cannot survive. This type of HI represents an end to a species rather than the beginning of a new one. More importantly, speciation genes do not address the key question of how the initial gene led to genome alterations, a common signature of differentiating species. Specifically, because speciation is a whole genome concept, it is challenging to study the process of evolution by increasing genetic isolation at the gene level.


Third, regarding the mechanism whereby evolution causes genetic divergence within a species and subsequent incompatibility between species, unexpectedly, much of the genetic/epigenetic divergence does not seem to be driven by ecological adaptation. Rather, it results from a response to genetic mechanisms (the process of genetic alterations and internal genetic conflict), despite the fact that ecological adaptation is considered an important evolutionary mechanism causing genetic/epigenetic divergence (Noor and Feder, 2006). Interestingly, by searching for a common mechanism rather than diverse speciation genes, using a similar analysis as was used to develop the evolutionary mechanism of cancer (Chapter 3), many genetic isolation cases are in fact linked to genome-level alterations. In other words, most diverse “speciation genes” associated with hybrid incompatibility can also be linked to genome-defined system changes reflected as genetic conflicts and chromosomal incompatibility in general. For example, many cases of “gene” speciation occur in addition to chromosomal changes. Alternatively, the data can be reinterpreted as evidence to support the speciation by chromosomal changes if one thinks outside of the (gene) box. Many investigations have also directly linked the diverse cases of reproductive isolation to chromosomes (Faria et al., 2011). For example, the divergence of noncoding repetitive sequences between species can directly cause reproductive isolation by altering chromosome segregation (Ferree and Barbash, 2009), and rapid heterochromatin evolution affects the onset of hybrid sterility (Bayes and Malik, 2009). On the surface, there are many cases that are less obviously related to specific chromosomal alterations (Maheshwari and Barbash, 2011). These studies have identified “internal genetic conflicts” as key factors contributing to HI.
At a fundamental level, however, this internal genetic conflict is best explained as genome system incompatibility rather than individually identified superficial factors. For example, the small DNA sequence differences among closely related species can in fact be associated with a significant genome-level difference. Even identical genes can form different chromosomes. In addition, the gene only represents a tiny portion of the DNA sequences. As the genome defines the system and its dynamic boundary, and speciation is the definitive process creating new biosystems, the genome-level changes should represent the main mechanism of speciation in a majority of cases (Heng, 2009). One important fact is that when genome system changes occur (reflected on a genome scale rather than an individual gene or genetic element scale), they often involve multiple factors, each of which has been discovered by individual researchers who consider “their” factor to be the key. But unfortunately, as was mentioned earlier, these are not the common or universal factors for system changes as there are so many similar and different factors out there which make each of these “significant”


factors less so. Furthermore, the majority of these identified factors often have alternate explanations (this is particularly true when researchers are focused on an individual hypothesis and study them in linear models). Again, this situation is very similar to the current cancer research situation where multiple and varying factors have been identified as the likely cause of cancer while the overall molecular understanding is actually becoming increasingly complex and confusing (Chapter 4). The problem is the same: most researchers have ignored the central importance of the genome. The suggestion that chromosome-based genomics is the predominant form of speciation over evolutionary time rather than individual gene-defined genetic elements can find its roots long before the establishment of the field of genetics. The nongenetic “chromosomal” viewpoint on speciation can be traced back to George Romanes (1886) and William Bateson (1909), as both were convinced of the importance of “nongenic factors” to use modern terminology (Forsdyke, 2003, 2004). Bateson focused his research on the problem of species. Despite his influence (he introduced the term “genetics” and brought Gregor Mendel’s work to the attention of the English-speaking world), he failed to convince contemporary and modern scholars. This represents yet another interesting example of how the holistic or systematic view often loses the battle with its reductionist rival. According to Forsdyke, Bateson’s position on nongenic factors was often misquoted to support the genic viewpoint. Richard Goldschmidt, one of the most controversial biologists of the 20th century, championed the chromosomal speciation concept and argued that micro- and macroevolution are distinctively different and use different mechanisms. He proposed two main mechanisms for macroevolution: systemic mutation and developmental macromutations.
Systemic mutation, he suggested, involved chromosomal changes in speciation where the linear arrangement of the genetic material had a marked impact on the set of reactions created by its immediate products (Goldschmidt, 1940). Unfortunately, and interestingly, his brilliant idea of chromosomal speciation was ignored, rejected, or overshadowed by his hopeful monster idea. For example, the prominent geneticist and evolutionist Theodosius Dobzhansky stated with regard to Goldschmidt’s theory:

... systemic mutations ... have never been observed. It is possible to imagine a mutation so drastic that its product becomes a monster hurling itself beyond the confines of a species, genus, family or class ... The assumption that such a product may, however rarely, walk the earth, overtaxes one’s credulity ... (Dobzhansky, 1941)

Considering that Dobzhansky worked on chromosomal research for decades and was familiar with the diverse chromosomal differences


among species, and that Goldschmidt explained systemic mutation as chromosomal alterations, Dobzhansky’s comments are puzzling to me. On the other hand, however, knowing his belief in the accumulation of small genetic changes leading to macroevolution over an extended historical period, such a blind spot is not a total surprise. Dobzhansky clearly became biased when he jumped from chromosome-based research to the gene concept (Heng, 2009). Even those scholars who appreciated Goldschmidt’s idea of macroevolution, like Stephen Gould, misunderstood Goldschmidt’s systemic mutation and developmental macromutations. According to Gould:

I do, however, predict that during this decade Goldschmidt will be largely vindicated in the world of evolutionary biology ... As a Darwinian, I wish to defend Goldschmidt’s postulate that macroevolution is not simply microevolution extrapolated, and that major structural transitions can occur rapidly without a smooth series of intermediate stages ... In my own, strongly biased opinion, the problem of reconciling evident discontinuity in macroevolution with Darwinism is largely solved by the observation that small changes early in embryology accumulate through growth to yield profound differences among adults ... Indeed, if we do not invoke discontinuous change by small alteration in rates of development, I do not see how most major evolutionary transitions can be accomplished at all. Few systems are more resistant to basic change than the strongly differentiated, highly specified, complex adults of “higher” animal groups. How could we ever convert an adult rhinoceros or a mosquito into something fundamentally different? Yet transitions between major groups have occurred in the history of life. (Gould, 1977)

It should be pointed out that, even though both involve morphological changes, the gross morphological changes that occur during development differ fundamentally from the sudden emergence of a species as well as key evolutionary transitions. Morphological changes occurring during development are mainly determined by gene regulation within a given genome system (with high predictability), while speciation occurs mainly through genome alterations (with less predictability). Attributing a major influence to small gene alterations that have a significant impact on the developmental processes without genome alterations cannot explain speciation. Gould wanted to reconcile macroevolution and gene-mediated microevolution (the relationship between stepwise gene accumulation and the formation of major features) by using developmental genes. While this seems reasonable within gene-influenced evolutionary thinking and in particular the overall stepwise patterns of evolution, it does not solve the basic issue. In fact, this confusion was partially initiated by Goldschmidt himself when he proposed the two opposite mechanisms of macroevolution where one is independent of genes and the other is closely associated with genes. This also reflected his effort to reconcile the confusion between the two genetic levels. When two very different mechanisms were proposed


to explain macroevolution, others could simply pick the one that best fits their own ideas. That is what happened. According to Michael Dietrich:

The second mechanism that Goldschmidt proposed to explain macroevolution did not depend on the rejection of the classical gene ... Goldschmidt proposed that mutations in developmentally important genes could produce large phenotypic effects. He called the results of these developmental macromutations ‘hopeful monsters’ because they were the embodiment of large phenotypic changes that had the potential to succeed as new species. It is important to note that Goldschmidt’s idea of a hopeful monster was not tied to his idea of systemic mutation. In fact, Goldschmidt used hopeful monsters to argue, by analogy, for evolution by systemic mutations. The possibility of mutations in developmentally important genes was intended to make the genetic mechanism of systemic mutation more plausible ... (Dietrich, 2003)

As pointed out by Dietrich, the idea of developmentally important mutations with large effects was accepted by many well-known researchers. In contrast, the more important theory of speciation through systemic mutation received no support at all (Dietrich, 2003). Looking back, the field was looking for answers, but only chose to see ones that fit the popular gene-centric paradigm. The “hopeful monster” has had a life of its own but in a different way than Goldschmidt had originally intended. Rather than making the genetic mechanism of systemic mutation more plausible, the idea of developmental macromutation has in fact caused many researchers to take their eyes off the ball of chromosomal speciation. It is sad that Goldschmidt has been most remembered as the originator of the idea of the hopeful monster, an idea that in reality represented only a minor component of his concept of macroevolution (Forsdyke, 2003; Schlichting and Pigliucci, 1998). Perhaps even more unfortunate is that the proposed linkage between genes and developmental changes delayed or diluted the efforts of searching for genome-mediated macroevolution. It is interesting to point out that by focusing on features (phenotypic traits) rather than a genome-defined system, many researchers including Bateson and Goldschmidt had searched for the causes of profound new features during macroevolution and nongenic complementary factors involved in speciation. Some had finally realized that chromosomal alterations themselves are a major force of speciation and the newly gained features are part of the emergent properties of the newly formed genome. Similarly, many secondary features including identifiable genes are the consequence of genome alteration-defined speciation. As will be discussed in Chapter 6, this realization comes as a result of genome-based cancer research. Barbara McClintock’s earlier work, in fact, supports “systemic mutation” (chromosome alteration) causing speciation.
By crossing strains of corn, she observed a genetic earthquake triggered by


introducing broken chromosomes from both parents into a zygote. Neil Jones nicely summarizes McClintock’s genetic earthquake experiments: In 1944 one of the crosses, in which a broken chromosome 9 was contributed by both parents, led to some unexpected results. It triggered a “genetic earthquake” in the kernels of the ear concerned. The embryos in the kernels were undergoing the chromosome BFB (Chromatid breakage-fusion-bridge) cycle in their early development, and when the seeds were germinated and grown McClintock witnessed an enormous burst of genetic instabilities among the progeny plants. Jones, 2005

Among these products of “genetic earthquake,” 87 of 677 kernels did not germinate, 134 died as seedlings, and 73 died as young plants. The remaining 383 plants (approximately 57%) grew to maturity and were capable of self-pollination (150 of which were used for meiotic analysis) (Jones, 2005). The “genetic earthquake” experiment revealed the following:

1. Extensive genome rearrangement was involved.
2. Despite death at different stages, some plants can survive and reproduce.
3. Stable phenotypes were noticed.

The most important message from the genetic earthquake experiment is that the genome rearrangements can not only produce diverse features and phenotypes but also generate new genome systems. Despite the fact that many of these new genomes are unstable and associated with massive cell death, infertility, and growth abnormalities, they represent greatly enhanced potential for new speciation. At the 1980 Miami Winter Symposium, McClintock stated (McClintock, 1980):

There is little doubt that genomes of some if not all organisms are fragile and that drastic changes may occur at rapid rates ... It is reasonable to believe that such genome shocks are responsible for the release of otherwise silent elements, which can then initiate changes to overcome disruptive challenges. Since the types of genome restructuring induced by such elements know few limits, their extensive release, followed by stabilization, could give rise to new species or even new genera.
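
As a quick check on the kernel counts reported from the genetic earthquake experiment (Jones, 2005), the short sketch below tallies the losses at each stage and the fraction of kernels that reached maturity. The counts are taken from the text; the variable names are only illustrative.

```python
# Kernel counts as reported in the text (Jones, 2005).
total_kernels = 677
losses = {
    "did not germinate": 87,
    "died as seedlings": 134,
    "died as young plants": 73,
}

# Kernels lost before maturity, and those that survived to maturity.
total_lost = sum(losses.values())
survivors = total_kernels - total_lost
survival_fraction = survivors / total_kernels

for stage, count in losses.items():
    print(f"{stage}: {count} ({count / total_kernels:.1%})")
print(f"grew to maturity: {survivors} ({survival_fraction:.1%})")
```

The tally reproduces the 383 mature plants given in the text, which is a little under 57% of the 677 kernels; the remaining 294 kernels were lost at one of the three earlier stages.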

Following McClintock’s lead, other molecular biologists also observed evidence linking drastic morphological changes to genome alterations. According to Nina Fedoroff, who cloned jumping genes:

It is as if transposable elements can amplify a small disturbance, turning it into a genetic earthquake. Perhaps such genetic turbulence is an important source of genetic variability, the raw material from which natural selection can sift what is useful for the species. (Fedoroff, 1984)

I cannot say for sure that transposable elements are useful to the corn plant, I do know from experience, however, that corn lines with too many active transposable elements are in trouble: Some of their offspring look more like cabbages than corn plants. If I try to think like a corn plant (although sometimes I’m convinced that I think more like a cabbage), I conclude that my best bet is to keep my options open by hanging on to some of these principles of radical change, but shackling them as securely as possible. (Fedoroff, 1992)

The message is clear that TEs can induce drastic morphological changes. Based on McClintock’s hypothesis and the genome theory (Chapter 7), it is clear that a high level of TE activity can be induced under stress. Recent studies demonstrate that stress conditions do induce TE activity (Wilkins, 2010). However, sex-mediated genome constraints purify most genomes with each generation, defining the extent to which chromosomal restructuring contributes to organismal evolution. The less stringent sexual filter in plants might also allow higher levels of genome dynamics, as drastic genome alterations are better tolerated in plants. In fact, polyploidy is pervasive in plants (some estimates suggest that 30%–80% of living plant species are polyploid). In contrast, polyploid mammals are rare, and polyploidy most often results in prenatal death. It is thus realized that the jumping gene is not really a gene-based concept, but rather a concept of genome reorganization, as it is really about the changing relationship of genes within a genome. When a gene jumps, the event can on the surface be traced to a specific gene function, but in actuality something more profound has occurred: the genome context has changed through reorganization of the physical interactive relationship between genes within the genome. This leads to a new genetic package with the same genes in an altered organizational topology. Thus, in some well-controlled experimental systems, it is possible to link newly formed morphological features to distinct individual genes. In reality, system alterations occur, mediated by chromosomal alterations that affect many genes, and direct linear relationships between genotype and phenotype can be illustrated only in exceptional cases. No wonder McClintock had such a high level of appreciation of the genome rather than the gene, even though her work led to the concept of the jumping gene!
As will be discussed in Chapter 4, the jumping gene likely contributes to fuzzy inheritance and changes the chromosomal coding in addition to interrupting specific genes. Of course, large-scale karyotype changes represent a more powerful means of speciation. Michael White has estimated that over 90% of speciation events are accompanied by karyotypic changes (White, 1978). The significance of chromosomal speciation is further emphasized by Max King (King, 1993). Together, George Romanes’ nongenic torch was passed on by a series of scholars including William Bateson, Nicolai Kholodkovskii, Richard Goldschmidt, Barbara McClintock, Michael White, Max King, and Donald Forsdyke. The nongenic factor really was the result of the influence of the chromosomal factor and should now be referred to as the genome context.


2. GENES AND GENOMES REPRESENT DIFFERENT BIOLOGICAL ENTITIES

Because of the highly heterogeneous nature of biology, it is likely that some genes can play an important role in speciation. The cases of gene speciation, however, represent exceptions compared with chromosomal or genome speciation, as genome-level alterations play a much more dominant role in macroevolution. New testing has shown that some known speciation genes in yeast do not play a major role in yeast speciation, and the likely key factor of speciation is the diverse sequences that interfere with meiosis and the chromosomal behavior of the system (Greig, 2007). Regarding the previously noted contrary cases of chromosomal speciation (there are some cases that are inconsistent with the chromosomal speciation theory, as different species can share the same karyotypes and some species that produce sterile hybrids without obvious changes in chromosome arrangements), one explanation is the lack of resolution of classic karyotype analyses methods. Using advanced cytogenomic methods, many intrachromosomal rearrangements can now be detected that distinguish these species (Skinner and Griffin, 2012). Large-scale copy number variation can potentially contribute to this phenomenon as well; the key is to change genome compatibility as defined by sexual reproduction (see Chapters 5 and 6). In addition to large-scale copy number variation contributing to speciation, speciation can be driven by sequence-level change; however, this change must be on a large scale, effectively working as genome-level change. For instance, in the closely related yeast species such as Saccharomyces cerevisiae and Saccharomyces paradoxus, there are no identified gross chromosomal translocations, but about 15% of sequences are divergent. 
The genome sequence divergence can affect fertility when two species are hybridized, suggesting that it is the chromosomal or genome difference that matters despite the fact that there are no visible chromosomal alterations at the karyotype level in these exceptional cases. It is not surprising that the significance of chromosomal speciation is obfuscated when viewed through the gene-centric lens. In their 1998 article, Coyne and Orr listed some of the empirical problems of chromosomal speciation (Coyne and Orr, 1998). These problems are in fact more relevant to the gene-speciation concept. For example, they wrote: “even in species hybrids whose chromosomes fail to pair during meiosis, we do not know whether this failure is caused by difference in chromosome arrangement or difference in genes.” So what is the “conclusive evidence” that genes rather than chromosomes cause reproductive isolation? Why one over the other? Once again, such drastically different conclusions suggest the importance of using a correct framework for data interpretation and call for further study of the differences between genes and genomes. Interestingly, an increasing number of researchers favor chromosomes over genes for speciation (Ye et al., 2007; Heng, 2015). Even those who study how new gene formation leads to new phenotypes realize that the insertion of a new gene into a network can change the gene network by altering the network topology. This change can create new pathways by rewiring previous gene networks, ultimately leading to new phenotypes (Chen et al., 2013). What people have not realized is that karyotype changes bear a much greater impact on the network topology, as the physical order of genes along and among chromosomes functions as the physical platform for gene networks (Chapter 4).

2.5 THE CONFLICTING RELATIONSHIP BETWEEN THE GENE AND THE GENOME

The above examples forcefully demonstrate the conflicting interpretations of evolutionary mechanisms based on gene-centric and genome-based genetic concepts. To understand genome-based thinking, one first needs to illustrate that the genome represents a unique level of genetic organization. Demonstrating that there is a conflicting relationship between the gene and the genome, and that genes and genomes represent distinct levels of genetic organization, will help to eliminate the confusion between the two. The cooperative and conflicting relationship between the gene and the genome is best illustrated in Chapters 5 and 6. The following are descriptions of experiments showing that the genome is a unique level of genetic organization that functions separately from the sequence level.

2.5.1 Chromosomal Position and Loop Size: Overall Genomic Architecture Constrains Local Structures

During the gene era, attention was focused on cloning and characterizing individual genes. Much is known about the molecular mechanisms of genes and their regulation under experimental conditions, but there is no clear idea of how genes are organized within the nucleus. It is known that DNA is organized into chromatin that forms loops within the nucleus that attach to the nuclear matrix, but how genes and regulatory elements are located along chromatin loops, what the precise loop size is, and how loop behavior contributes to gene function, particularly within a genetic network, are not clear. This situation seems not to bother the majority of molecular researchers at all, as the gene has been viewed as a powerful independent information unit. In addition, molecular biology has been deeply influenced by biochemists, many of whom consider cells to be mini test tubes and hence are not interested in studying cell-based systems. Furthermore, it is much easier to perform elegant experiments on fundamentally simplified gene systems without chromatin context and “prove” a hypothesis that defines particular genetic elements. An example of the importance of genome-level organization comes from the work of Peter Moens, who was the first to use fluorescence in situ hybridization (FISH) to study meiotic chromosomes (Moens and Pearlman, 1989). Using the approach of multicolor DNA–protein codetection (Heng et al., 1994; Heng et al., 1996), the structure of various DNA inserts within host meiotic chromosomes was compared, including phage DNA and human DNA inserts within mouse meiotic chromosomes. Meiotic chromosome structure is based on the synaptonemal complex (SC), which forms the backbone of the chromosome and anchors loops of DNA. It was shown that inserted phage DNA formed much bigger loops than somatic DNA. The drastic “looping out” morphology of phage DNA suggested that certain DNA sequences that serve as anchors are important for the formation of regular chromatin loop sizes. The same type of DNA insert at different positions along the meiotic chromosome was further compared using transgenic mice. Interestingly, the meiotic loop size was determined by the position of the insertion site along the chromosome. Specifically, the closer to the telomere, the smaller the chromatin loop size of the insert (Fig. 2.1). Similar cases can also be observed by comparing the loop size of telomeric sequences located at interstitial positions and at the ends of the chromosome, respectively. In both Chinese hamsters and golden hamsters, telomeric sequences can be detected at both telomeres and interstitial positions because of

FIGURE 2.1 Transgenic insertions on meiotic chromosome core. The diagram summarizes the loop sizes of various foreign DNA insertions along mouse meiotic chromosome cores (synaptonemal complex [SC]). Phage DNA insert (P) does not form normal size loops possibly because of a lack of anchor sequences. Human DNA insertions (red color) form loops on mouse chromosomes similar to mouse endogenous DNA loops (blue color, M); however, loop size is dependent on insertion sites. Loop size is smaller when insertion occurs near a telomere (green loop, endogenous mouse telomere), Ha, and Hb (peritelomeric human inserts). In contrast, loop size is larger when insertion occurs at non-peritelomeric regions (Hc).


chromosome fusions that happened during natural evolution. The telomeric loop is extremely small at the chromosomal end, while the interstitial loop of the telomeric sequences is large, similar to other interstitial sequences. Together, these observations demonstrated that the size of chromatin loops is determined by the position of the integration site rather than just by the DNA sequence itself (Heng et al., 1996). The conclusion that genomic topology constrains the loop size of DNA has gained further support from studies of human DNA insertions within yeast meiotic chromosomes. It is known that the loops of human chromosomes are about 20 times longer than those of yeast. However, when inserted into yeast, human DNA formed small loops similar to those of the host, suggesting that in yeast meiosis, human and yeast DNA adopt a similar organization within the chromatin along the pachytene chromosome cores. More interestingly, the recombination rates of the human segments also increased to match those of yeast, possibly also contributing to the loop size change in the host’s environment (Loidl et al., 1995).

2.5.2 Gene Expression and Chromatin Loops: Chromatin Loop Domains Constrain Gene Function

Gene expression control by promoters and enhancers has been shown in various in vitro systems and particularly in simple expression systems. These experiments have given the impression that only the genes and directly adjoining sequences themselves matter. On the other hand, it is well known that gene silencing can be achieved by positional effects, and there are many inconsistencies even within the same experimental systems, as attested by the generation of transgenic mice. To understand how higher levels of genetic systems operate, such as whether or not chromatin can constrain the function of genes, the chromatin loop behavior of various transgenes has been studied in transfected cells and transgenic mice. Specifically, the biological significance of nuclear scaffold/matrix-attachment regions (S/MARs) and the impact of loop anchors on gene expression were investigated. The concept of the nuclear matrix has been a highly debated subject over the past half century. Since Berezney and Coffey introduced the modern concept, showing that the nuclear matrix is a proteinaceous skeleton in the nucleus that is resistant to nuclease digestion (Berezney and Coffey, 1974), this field has made tremendous progress by linking the nuclear matrix to nuclear architecture and chromatin packaging as well as large-scale gene regulation. Because of its complexity, however, many questions remain with respect to this dynamic structure and its function. One criticism was based on the incorrect viewpoint that a nucleus is a bag of biochemical solutions where free diffusion occurs and no nonchromatin structure is required. Such a viewpoint also lacks an appreciation of bioheterogeneity. To determine the biological significance of S/MARs, S/MAR behavior and gene expression dynamics must be investigated. By introducing varying copies of S/MAR-containing constructs, various transgenic mice and transfected cell lines were established. As expected, the integrated S/MAR fragments were tightly anchored on the nuclear matrix, as illustrated by FISH detection following DNase I digestion to eliminate the chromatin loop portion not tightly associated with the nuclear matrix. Surprisingly, however, FISH analysis also showed that the transgenic somatic S/MARs were present in both the loop portion and the nuclear matrix regions when multiple copies of gene-S/MAR constructs were introduced. In other words, not every copy of the same S/MAR is used as an anchor. Of the 12 copies of tandemly arrayed transgenes of 40 kb human protamine genes, only 1 copy is expressed. In addition, only one copy is associated with the nuclear matrix, and the rest of the copies reside on the loop portion. The same phenomenon was confirmed with transfected DNA fragments encompassing the human interferon (IFN)-β (huIFNB1) gene that contained the endogenous 5′ S/MAR element. To further document such anchor dynamics, two adjacent bacterial artificial chromosome (BAC) probes were labeled with two distinct colors to “paint” portions of a given endogenous chromatin loop that contains many S/MARs. If the anchor of this loop is fixed, then the color configuration of the two probes should be fixed among cells. In contrast, if the anchor is dynamic, the configuration should vary. When the color configuration was compared among cell populations representing different stages of the cell cycle, the average pattern of color configurations was different (Fig. 2.2).
These observations demonstrate that a key feature of chromatin loop anchors is that they are dynamic and context-dependent. A dynamic chromatin domain model of transcription regulation (a pulling model) was proposed based on the following information and reasoning regarding the S/MAR-mediated mechanism (Heng et al., 2000, 2001a, 2004a). (1) The chromatin loop domain is an integral component of the transcriptional regulatory mechanism associated with the nuclear matrix. (2) The selective use of S/MARs appears to be directly linked to the movement of the loops, which represents a key component of the functional mechanism of chromatin packaging and gene regulation. (3) Chromatin loops can display specific yet flexible behavior, which correlates with the number of S/MAR anchors and gene expression status (Fig. 2.3). The concept of the dynamic anchor reconciles many seemingly contradictory attributes previously associated with S/MARs. For example,


FIGURE 2.2 Dynamic configurations of loops illustrated by two-color fluorescence in situ hybridization (FISH). The diagram summarizes the two-color FISH result of two adjacent bacterial artificial chromosome clones representing a 300 kb genomic region on a nuclear halo. The released chromatin loops form the halo (blue color) surrounding the nuclear matrix. In the V-shaped configuration, both the red and green probes are anchored on the nuclear matrix (center panel). In the linear configuration, only one probe (red or green) is anchored on the nuclear matrix, whereas the adjacent probe is not. The color configuration is not fixed, suggesting that the anchor site of the chromatin loops is not static on the nuclear matrix.

[Figure 2.3 panel labels: Forming Association (Initiation); Pulling (Elongation); Disassociation (Inactivation). Components shown: nuclear matrix, machinery, gene, unbounded MAR, bounded MAR.]

FIGURE 2.3 Model for the selective use of nuclear scaffold/matrix-attachment region (S/MAR) for transcription/replication regulation. The left side of the figure shows a gene located on the loop with an S/MAR in close proximity. When functional demands require the specific association of this gene with the transcriptional machinery located on the nuclear matrix, the S/MAR moves the gene to the nuclear matrix, thereby initiating gene expression (middle of figure). Following initiation, the gene is pulled in through the transcriptional machinery, allowing transcription to occur (right side of figure). There are two types of S/MARs: functional S/MARs serve as mediators to bring genes to the nuclear matrix to be transcribed and structural S/MARs serve as anchors, which are less dynamic compared with functional S/MARs. Reproduced/adapted with permission from citation Heng et al. (2004a).


now we know that S/MAR anchors are necessary but not sufficient for chromatin loops to form, that there is a direct link between S/MAR behavior/function and gene expression, and finally, that the function of genes is constrained by dynamic chromatin behavior. This message challenges the principal power of individual genes. This dynamic model has now received wide acceptance, but initial resistance came from two opposite viewpoints. Those who believed that cells were no more than biochemical test tubes did not accept the concept of a nuclear matrix, and those who believed the nuclear matrix to be static did not accept its dynamic features. From a genome perspective, this work also emphasizes the importance of chromatin topology and its dynamic relationship in constraining individual genes. Clearly, the reality is that the function of genes is defined by a higher level of genetic organization. In recent years, many novel features of chromosome architecture have been revealed using Hi-C technology (a chromosome conformation capture method that comprehensively detects chromatin interactions in the nucleus) for genome-wide mapping and analysis (Schmitt et al., 2016). Specifically, topologically associated domains (TADs) have been identified as an important feature for gene regulation. It is crucial to integrate the information from TADs with the nuclear matrix, as well as the chromosomal coding system (Chapter 4).

2.5.3 Loops/Chromosome Length and AT/GC Composition: Why Little Clarification Comes from the Highest Resolution

To study the process of meiotic chromosome pairing, a new method was introduced that combines multicolor FISH and protein co-detection using spectral karyotype (SKY) technology. SKY can trace each chromosome with a unique color, while protein detection can be used to study SCs (highlighted with antibodies specific for the SC) (Fig. 2.4). The successful application of SKY and SC–protein co-detection has led to a surprising discovery: the length of mouse meiotic chromosomes (SC) does not precisely correspond to the length of the corresponding mitotic chromosomes (Fig. 2.5)! For example, in mitosis, mouse chromosome 4 is longer than chromosome 5, and chromosome 14 is longer than chromosome 15. But when mouse meiotic chromosomes were analyzed by our SKY–SC co-detection, chromosomes 5 and 15 were, respectively, much longer than chromosomes 4 and 14. The length of the SC has been used for karyotyping in the past, assuming that there is a precise correlation between relative mitotic length and relative meiotic length. This assumption has not been challenged previously, as there was no effective method to identify each meiotic


FIGURE 2.4 Spectral karyotype (SKY)eprotein co-detection on mouse meiotic chromosome cores (synaptonemal complex [SC]). SC proteins were labeled with FITC (fluorescein isothiocyanate)-conjugated SC antibody (left panel), while each individual chromosome was labeled with a unique SKY paint for specific identification (right panel). The length of the SC can then be measured for each specific chromosome. Reproduced with permission from citation Ye et al. (2001).

[Figure 2.5 plot: Relative SC length of mouse chromosomes. Y-axis: relative SC length (0 to 8); x-axis: chromosome (1 to 19).]

FIGURE 2.5 The inconsistency between the length of mitotic and meiotic chromosomes. Relative average lengths of mouse meiotic chromosomes (measured by the length of the synaptonemal complexes) are shown. Chromosomes 5, 7, 11, 15, and 17 are longer than expected based on the length of mitotic chromosomes, suggesting a packaging disparity between mitotic and meiotic chromosomes.


chromosome for most species and to precisely compare the length of each SC (Ye et al., 2001). This finding challenges the “ABCs” of biology, as chromosome 4 must be longer than 5, and the length variation observed must be attributed to a slide preparation artifact. To ensure that the proper ABCs of biology were followed, SKY data were examined using FISH-specific probes and SC co-detection (Ye et al., 2006). These experiments were repeated again and again, and each time during meiosis, chromosome 5 was longer than 4 and 15 was longer than 14 (Fig. 2.6). This important observation was confirmed by others. By using the immunofluorescence method to examine exchanges in human spermatocytes, Terry Hassold’s group reported remarkable variation in the rate of recombination within and among individuals. They then confirmed that in humans and mice, this variation was linked to differences in the length of the SC. Therefore, the SC reflects genetic rather than physical distance (Lynn et al., 2002). Another 6 months later, Terry Ashley’s group showed that the mouse SC length is strongly correlated with crossover frequency and distribution. Although the length of most SCs

FIGURE 2.6 Synaptonemal complex (SC) length of chromosomes 14 and 15 in mouse meiotic cells. Confirmation of spectral karyotype data using protein–DNA co-detection. Chromosomes 14 and 15 were specifically labeled by rhodamine (red) and FITC (green). SC was detected with an FITC-conjugated antibody. The left image was captured using a dual-color filter specifically showing both the specific label of chromosome 14 and its SC. The right panel is the same image taken with an FITC filter (showing the chromosome 15 paint and its SC). The SC length of mouse chromosome 15 is obviously longer than that of chromosome 14. Reproduced with permission from citation Ye et al. (2006).


correspond to that predicted by their mitotic chromosome length-defined rank, several SCs are longer or shorter than expected, with corresponding increases and decreases in MLH1, a mismatch repair protein that colocalizes with recombination foci (Froenicke et al., 2002). But what is the mechanism behind the differential packaging of mitotic and meiotic chromosomes? Why is there disagreement between genetic distance (the length of meiotic chromosomes) and physical distance (the length of mitotic chromosomes)? The expected relative SC length of each chromosome (the base pairs of DNA on that mitotic chromosome as a fraction of the base pairs of all mouse autosomes) was compared with the actual relative length determined from our SKY and FISH experiments, and all mouse chromosomes were divided into three categories. Most of them had equivalent SC representation; that is, the expected relative length was equal to the actual relative length. Some were in the SC overrepresented group, including chromosomes 5 and 15, where the actual relative SC length is longer than the expected relative length. Some were in the SC underrepresented group, including chromosomes 4 and 14. What were the structural differences among these three groups of chromosomes, and in particular, what was special about the “outliers” that disrupted the expectations? It seemed likely that AT/GC content contributes to this phenomenon, as AT/GC content is known to be associated with the distribution of genes along the chromosome. Luckily, the mouse genome sequencing had just been completed, and AT/GC information for each chromosome was available. Once the AT/GC content of each mouse chromosome was compared, it became quite clear that there is a very nice correlation between SC representation and the GC content of the chromosome.
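The three-way grouping described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the chromosome names, base-pair counts, observed SC shares, and the 5% tolerance are all hypothetical values chosen only to show the comparison of expected and actual relative lengths.

```python
def expected_relative_length(bp_by_chrom):
    """Expected SC share of each chromosome: its base pairs over the autosome total."""
    total = sum(bp_by_chrom.values())
    return {chrom: bp / total for chrom, bp in bp_by_chrom.items()}


def classify_sc_representation(expected, observed, tolerance=0.05):
    """Group chromosomes by the ratio of observed to expected relative SC length.

    tolerance is an assumed cutoff; the source does not state one.
    """
    groups = {}
    for chrom, exp_share in expected.items():
        ratio = observed[chrom] / exp_share
        if ratio > 1 + tolerance:
            groups[chrom] = "overrepresented"
        elif ratio < 1 - tolerance:
            groups[chrom] = "underrepresented"
        else:
            groups[chrom] = "equivalent"
    return groups


# Hypothetical toy genome with three equally sized chromosomes.
toy_bp = {"A": 100_000_000, "B": 100_000_000, "C": 100_000_000}
# Hypothetical observed relative SC lengths (shares of total SC length).
toy_observed = {"A": 0.40, "B": 0.27, "C": 0.33}

toy_groups = classify_sc_representation(expected_relative_length(toy_bp), toy_observed)
print(toy_groups)
```

With real data, the expected shares would come from the sequenced length of each autosome and the observed shares from measured SC lengths; the resulting groups could then be compared against per-chromosome GC content.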
In chromosomes 4 and 14, which have a much shorter SC, the GC content was among the lowest. In contrast, in chromosomes 5 and 15, the GC content was among the highest. The majority of “normal” chromosomes have average GC content. This was an unbelievably simple and exciting correlation, but was it real? It turns out that many groups had already compared the relationship between genetic recombination rates and GC content. They all failed to establish such a highly significant correlation, despite the use of high-resolution whole genome scanning methods. It was true that by using cutting-edge technologies, it was possible to compare genetic recombination and GC content in observation windows of 5–10 kb. However, by ignoring the entire chromosome, the forest is lost for the trees. The key information is that the entire chromosome, and not simply isolated specific parts, functions as a unit during genetic recombination. The next question was why the GC content correlates with the differential length of the SC. Based on previous data that the anchors of


loops are highly dynamic, that loop size can change depending on the chromosome architecture, and that the spacing of loops seems fixed along the chromosome core, a model was proposed in which the overrepresented SCs form small chromatin loops. The same amount of DNA, when forming longer loops, might require shorter SC lengths. This is similar to building a house with a variable base: if the land area is large, building a ranch is not a bad idea. However, on a smaller site, to attain the same square footage, one has to build a house with two or even three stories. Using BAC clones that represented both AT-rich and GC-rich regions of the mouse genome, the chromatin loop size was examined using the DNA–protein co-detection method. It was observed that the GC-rich clones do form smaller loops! In addition, in the telomeric region, even GC-rich chromatin loops were small! A new meiotic chromosome model was finally built (Heng et al., 2004b). Interestingly, we later found that Nancy Kleckner and her colleagues had suggested a similar model to explain loop size and SC length (Kleckner et al., 2003). Previous efforts had failed to link the GC content to genetic maps by using large-scale high-resolution data with sophisticated mathematical analyses. What these analyses had missed was the basic biological concept that chromosomal pairing is based on the entire chromosome and not simply a small region of the DNA. This is an example of how basic Biology 101 can trump the most sophisticated analyses! It has now been realized that many factors, including the GC content, differential chromosomal locations, different sex, and different cell types (somatic cells, meiotic cells, or sperm/eggs), can impact the chromosomal package, reflecting the plasticity of the genome-defined package, but as different species display different boundaries, this plasticity is also constrained.
The key message here is as follows: choosing the correct level of genetic organization for biological studies is the most important precondition when searching for biological truths, as biological functions are very different depending on the particular level of organization. The use of higher resolution technologies can also lead to the wrong biological conclusion. This is the very rationale needed to search for genome-based biological concepts in a gene-dominated research world.
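The loop-size model described above (fixed loop spacing along the chromosome core, so that smaller loops require a longer SC for the same amount of DNA) can be written as a one-line relationship. The sketch below is only an illustration of that reasoning; the function name, the DNA amount, the loop sizes, and the spacing value are hypothetical and are not measurements from the cited studies.

```python
import math


def sc_length_um(dna_bp, loop_size_bp, loop_spacing_um):
    """SC length implied by packing dna_bp into loops of loop_size_bp,
    with loop bases spaced loop_spacing_um apart along the core."""
    number_of_loops = dna_bp / loop_size_bp
    return number_of_loops * loop_spacing_um


# Hypothetical values: the same amount of DNA packaged with large or small loops.
DNA_BP = 100_000_000
SPACING_UM = 0.05

large_loop_sc = sc_length_um(DNA_BP, loop_size_bp=100_000, loop_spacing_um=SPACING_UM)
small_loop_sc = sc_length_um(DNA_BP, loop_size_bp=50_000, loop_spacing_um=SPACING_UM)

# Halving the loop size doubles the number of loops and hence the SC length.
print(math.isclose(small_loop_sc, 2 * large_loop_sc))
```

Under these assumptions, a chromosome whose DNA forms smaller loops (as reported for GC-rich regions) would have a longer SC than an equally sized chromosome with larger loops.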

2.6 GENOME CONTEXT DETERMINES GENE FUNCTION

The above examples are important to understanding the relationship between DNA/gene and chromosome/genome. The genome-level function is not simply achieved by “adding up” the functions of individual genes, as there is no purely quantitative relationship between the gene and genome. This has ultimately been proved by the Human Genome Project: knowing the function of many individual genes has not yielded an understanding of the blueprint coded by system inheritance (see Chapter 4). The current cancer genome sequencing project has failed to detect the long expected common cancer drivers for most cancers. However, it has revealed that one universal feature of all cancers is elevated genome alterations (see Chapter 3). The most logical conclusion is that genome context defines gene function both in normal individuals and in cancer. This point will be systematically and continuously articulated throughout the entire book. The gene and genome dominate two different eras of inheritance studies. Because the gene and genome are different entities, understanding their own functions as well as the collaborative and conflicting relationship between the two holds the key for future genomics. What separates the gene and genome is the “parts” and “whole” relationship. While the interaction among genes can emerge into the genome’s function, the high level of genome context serves as a constraint for individual genes. Because each individual gene’s function needs to be fulfilled within the genome context, most gene functions are not self-determined but represent potential function as defined or selected by genome–environment interactions. Thus, the genome is the platform to organize and select gene response. As Barbara McClintock has insightfully pointed out, the genome (a highly sensitive organ of the cell) monitors genomic activities and corrects common errors, senses unusual and unexpected events, and responds to them, often by restructuring itself (for more, see Chapters 3 and 4).
Such relationships can be used to explain many puzzling phenomena, such as why many genes can be experimentally eliminated without obvious phenotype changes, why “normal” individuals can have many gene mutations, why similar genes can form different genome-defined species, and why genome-level alterations are crucial for macroevolution while gene-level alterations are needed for microcellular evolution.

2.7 ACTION NEEDED

Prior to finishing this chapter, some common feelings/responses need to be briefly addressed. Some readers might say that the differences between chromosomes and genes are obvious. “We appreciate the importance of the chromosome, of course. What you are doing is just beating a dead horse.” But in fact, the majority of researchers do not understand the importance of the genome. If they did, they would have already changed their research direction or priority. Just look around: how many scientists are still focused on gene-based research? Even today, with so many


whole genome profile platforms, the majority of researchers are still focusing on understanding individual domains: specific genes. We need to be honest: the much-needed framework to organize these genetic parts is still missing. Other readers might suggest that the limitations of the gene theory are obvious. “We know it has problems, but why not just gradually improve it like we are doing?” Well, this difference depends on our view of how science works. Clearly, we all share the goal of searching for the truth, but the approaches are drastically different. Many might want to improve science by gradual modification within the current gene-centric framework. We, in contrast, believe that a different framework works much better than the field’s current one, and that many paradoxes can only be solved by a new genome-based paradigm. The good news is that we understand why one may think this way. I used to be passionate about the gene as well. In fact, I was trained by one of the top gene scientists, Dr. Lap-Chee Tsui, the man who cloned the cystic fibrosis gene.

CHAPTER 3

Genome Chaos and Macrocellular Evolution: How Evolutionary Cytogenetics Unravels the Mystery of Cancer

3.1 SUMMARY

Following decades of extensive gene mutation-focused research, it is now clear that the success story of chronic myeloid leukemia (CML) represents an exception: the gene mutation theory of cancer does not apply to the majority of cancer types, where multiple levels of genomic heterogeneity, especially at the chromosomal level, dominate. This reality has been confirmed by the current Cancer Genome Project. In this chapter, after discussing various proposed alternative theories, the journey to search for the genome theory of cancer evolution is described. This journey links previously ignored stochastic genome alterations to genome instability, discovers the two phases of cancer evolution, and illustrates the mechanism behind and significance of genome chaos. The synthesis of these findings leads to the establishment of the evolutionary mechanism of cancer, which unifies the large number of diverse molecular mechanisms identified in cancer research. As a conclusion, a new model of cancer evolution is proposed, which is characterized by genome re-organization-mediated macroevolution to create new cancer genomes, followed by cancer gene-mediated microevolution to promote cancer population expansion, ultimately leading to clinically significant cancer.

Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00003-3


Copyright © 2019 Elsevier Inc. All rights reserved.


3.2 SOS: WE NEED A NEW CONCEPTUAL FRAMEWORK FOR CANCER RESEARCH

Nearly half a century has passed since the Nixon administration initiated the “war on cancer” with the National Cancer Act of 1971. During this period, the National Cancer Institute (NCI) alone spent over $120 billion on research. The price tag of cancer care is much higher (national estimates peg the annual cost of cancer care at about $125 billion in 2010 and a projected $173 billion in 2020) (Harmon, 2011; Mariotto et al., 2011; NCI budget). Hopes for a cure have been high, especially with promises from the cancer genome sequencing project. The dominant cancer gene mutation theory suggests that if we can identify and target common cancer driver genes, the war will be won. That was probably the reason the then-NCI Director, Andrew von Eschenbach, issued a challenge in 2003 “to eliminate the suffering and death from cancer, and to do so by 2015” (Kaiser, 2003). The Director’s declaration that victory was close at hand was supported by the American Association for Cancer Research and sustained by the confidence within the field in all the newly available cutting-edge molecular profiling methodologies, as well as data from the human genome projects. Now, 4 years have already passed since the 2015 deadline and the goal is still but a dream. Current strategies and the concept they are based on, the gene mutation-based cancer theory, are facing the ultimate challenge. In particular, the cancer genome sequencing project not only failed to identify the Achilles’ heel of cancer (the main promise of this project) but also generated much confusion in the entire field (Heng, 2015, 2017a). Rather than having a serious discussion about the current status and future direction of cancer research, new promises are continually made. In 2012, MD Anderson Cancer Center launched the “Moon Shots Program,” a new call to cure cancer in 10 years. In 2016, President Obama announced the National Cancer Moonshot Project. 
Similarly, with increased popularity, precision medicine has promised more hope beyond the cure of cancer (Collins and Varmus, 2015). Having watched so many bold promises come and go during the war on cancer, it is rather puzzling why the scientific community continues to set these unrealistic deadlines, which will certainly be missed. Surely, every 5-10 years, there will be a new trend focusing on different features of cancer. Each time, a newly developed technology or experimental system generates initial excitement coupled with feel-good stories to highlight a new paradigm. To date, most of these game-changing ideas have failed to keep their promises. Some work well in more linear models with defined experimental conditions, but the success fails to translate to the clinic; others work well in some exceptional cancer


types but fail in the majority of cancer types or cases. Life is, however, extremely complicated, and curing cancer is not as straightforward as an engineering project. We are now no longer certain about what the common cause of cancer is when analyzing the massive, diverse data generated from sequencing. It is interesting to point out that most of these new approaches were introduced in the past half-century, including large-scale sequencing, and are based on the same paradigm of the somatic mutation theory of cancer. With a quick check of the calendar, the overly optimistic view of many promises is obvious. The deeper questions we should ask include: “What went wrong with those powerful predictions?” “Did those predictors know something we don’t?” “Why has the impressive amount of -omics data failed to deliver?” “Could it be that the cancer gene mutation theory that supported these mispredictions was wrong in the first place?” “If so, should we search for a new conceptual framework that fits the reality of cancer?” In recent years, many more questions have been asked based on the results of various cancer genome sequencing projects (Heng, 2007a, 2009, 2015, 2017a). To briefly illustrate that the current dominant theory of cancer is problematic, we need to reexamine the current status of cancer research with two famous case studies. The following analyses will demonstrate that, although most cancer researchers are very familiar with these cases, few have truly appreciated their profound implications for current cancer theory and practice.

3.2.1 Exceptions Versus the General Rule: The Chronic Myeloid Leukemia Story

For most molecular cancer researchers, questioning the cancer gene mutation theory is out of the question. The gene-centric concept dominates in today’s biology, and there is no appetite for other compelling theories. Part of the reason is that most currently active researchers were trained in the school of molecular genetics, where the causative relationship between gene mutations and disease phenotypes represents one of the key principles. Few have the willingness or courage to question their own knowledge basis, which defines who they are. When the theory is occasionally challenged, the popular defense is the success story (perhaps the only powerful success story supporting the cancer gene mutation theory) of chronic myeloid leukemia (CML) (Rowley, 1998; Horne et al., 2013b). CML is a hematological disorder characterized by the increased and unregulated growth of predominantly myeloid cells in the bone marrow and the accumulation of these cells in the blood. CML can be described as


three successive phases. Chronic phase chronic myeloid leukemia (CML-CP) can last for years. CML-CP goes through an accelerated phase on the way to a blast crisis phase, resembling acute myeloid leukemia (AML) or lymphoid leukemia. Within the last phase of CML evolution, the survival time for patients is often under a year (Assouline and Lipton, 2011). CML was the first cancer that was linked to a specific chromosomal abnormality. A commonly altered chromosome was identified among CML patients, which was named the Philadelphia (Ph) chromosome by Peter Nowell and David Hungerford in 1960. This work represented a turning point in cancer biology: it pinpointed chromosome abnormalities as a cause of cancer. A detailed cytogenetic characterization of the translocation between chromosomes 9 and 22 was later achieved in 1973 by Janet Rowley using the chromosomal banding method (Nowell and Hungerford, 1960; Rowley, 1973). Further molecular analyses by several investigators in the early 1980s illustrated that the 9/22 translocation resulted in the formation of a new Bcr/Abl fusion gene (with the 3′ part of the ABL gene in the breakpoint on chromosome 9 and the 5′ part of the BCR gene in the breakpoint on chromosome 22). Bcr-Abl codes for a constitutively active tyrosine kinase, enhancing proliferation and growth factor independence while reducing apoptosis, which results in uncontrolled cell proliferation (Jabbour et al., 2010; Zhang and Rowley, 2011). Because BCR-ABL kinase is not shared with normal somatic cells, it was reasoned that if this specific cancer gene could be targeted, CML could be cured. A small-molecule compound now referred to as imatinib was developed to inhibit the activity of the BCR-ABL kinase (Druker et al., 1996). Imatinib blocks the ATP-binding site of the BCR-ABL protein, suppressing kinase signaling and inducing cell death. Imatinib therapy generates remarkable results for CML-CP patients, with a 7-year overall survival rate of 86% (Jabbour et al., 2010). 
Imatinib is currently the recommended first-line therapy for CML-CP patients. The overwhelmingly inspiring success story of CML, from the initial identification and characterization of the Ph chromosome to fusion gene analysis and then molecular targeting (with these milestones representing early successes of these approaches), has impacted the entire field of cancer research. Linking CML to a specific chromosome aberration confirmed that cancer is a genetic disease and that chromosomal changes are the causative factors. This achievement has greatly stimulated the field of cancer cytogenetics, resulting in the identification of a large number of chromosomal abnormalities that are associated with specific cancers. According to the online 2016 Mitelman Database, over 66,479 cases of chromosome aberrations in cancer have been cataloged (De Braekeleer et al., 2016).


The disease progression of CML has promoted the concept that cancer represents an evolutionary process where the accumulation of genetic change through waves of clonal expansions is the key driver (Nowell, 1976). Three decades later, a new wave of evolutionary studies of cancer finally pushed into the mainstream of cancer research. The characterization of the first fusion gene generated excitement for the molecular genetics of cancer. To date, over 10,277 gene fusions have been reported (Mitelman database of chromosome aberrations and gene fusion in cancer), and the total number will increase because of continuous sequencing efforts. As the Bcr-Abl gene was classified as an oncogene, this discovery naturally promoted the study of oncogenes in general. The first success story of applying molecular targeting against a fusion gene using imatinib represents the best achievement in cancer therapy so far. It changed CML from a disease with an average survival time of 3-5 years to one with an almost normal life expectancy (Rowley, 2013). Many pioneers in this field are regarded as heroes, and rightly so. All of these remarkable successes of CML research have made CML the poster child of how to win the war on cancer. It brings hope and a plan for researchers of other cancer types to follow: identify the commonly shared genetic defects (key cancer genes) and target them with drugs of molecular specificity. Following vast investments for decades, and in particular, immense efforts using recently developed large-scale genome sequencing and -omics technologies, massive amounts of gene mutation data are now available for different cancer types. Unfortunately, despite these efforts, the success of CML-CP research has failed to be repeated for most solid tumors. 
Not only is there a lack of high penetration of key targetable cancer genes among patients (such as the Bcr-Abl fusion gene in CML patients), but also the targeting effect is much more moderate compared with the high success of imatinib for CML-CP patients, as most of the gene targets in solid tumors coexist within highly altered and dynamic genomes. This troubling case has been raised previously:

Identifying recurrent chromosomal changes has proven to be extremely challenging in solid tumors due to the lack of recurrent patterns in most tumor types coupled with a high level of non-clonal chromosome aberrations (NCCAs) and karyotypic heterogeneity (Heppner and Miller, 1998; Albertson et al., 2003; Heng et al., 2004c, 2013a-b). The vast majority of gene mutations are not shared among patients, and overwhelming mutational heterogeneity can occur within a tumor (Bielas et al., 2006; Heng, 2007a; Ye et al., 2007; Navin et al., 2011). Furthermore, even when a recurrent mutation is present, as in the case of BRAF mutations in melanoma, the effect of a targeted drug such as vemurafenib is dramatic but transient, as tumors invariably become resistant to these agents (Wagle et al., 2011). Horne et al., 2013b (with permission from John Wiley & Sons).

Clearly, the CML story is an exception. But why?


1. The Unique Evolution of CML

The answer was surprisingly revealed through “watching evolution in action” experiments using an in vitro immortalization model (Heng et al., 2006a; Heng, 2015). By tracing karyotype evolution, the well-documented stepwise evolution observed in CML (reflected as a traceable lineage of marker chromosomes among cellular generations) was not observed among earlier passages of cell populations. Before immortalization (from passage 7 to approximately 30), the pattern of karyotype evolution was discontinuous, or punctuated. Only immediately following immortalization did the evolutionary pattern become stepwise, with shared clonal karyotypes observed among the majority of cells across generations (illustrated by the karyotypes of passages 54, 109, and over 250). Soon after, the two evolutionary phases were also observed within drug-resistant and mouse spontaneous transformation models (see Section 3.3.2). Altogether, for most solid tumor models examined, the evolutionary pattern is very different from CML-CP, and clonal expansion is highly dynamic and difficult to observe within the punctuated phase for all cancer types including the crisis phase of CML. Since most cancers involve complex interaction among clonal expansion, genetic diversification, clonal selection, and the emergence of new genome systems within the adaptive landscapes of tissue/organ/body ecosystems, one needs to compare these key features between CML-CP and other solid tumor types to illustrate the major mechanistic and phenotypic differences. Table 3.1 summarizes some of these comparisons. Among these differences, many unique aspects deserve additional discussion: CML-CP shares similarities to the benign solid tumor in terms of aggressiveness and drug response. 
When comparing the genome landscape profile and pattern of genome-level evolution, the high level of stochastic karyotype alterations observed within solid tumors (especially during metastatic stages) can be clearly detected during the blast crisis stage of CML (CML-BC). Furthermore, when a high level of chromosome aberrations is present in CML patients, typically within the blast crisis stage, the magic molecular targeting of imatinib is no longer effective. In other words, CML-BC shares more similarities with metastatic tumors. Additional details have been discussed in a previous publication, which compared disease progression and treatment response:

As CML patients progress from the chronic phase into the accelerated and blast crisis stages, imatinib efficacy plummets. Complete cytogenetic response in early chronic phase patients placed on imatinib is approximately 80%. This falls to 8% in blast crisis (Radich, 2007), where the median survival time is measured in months (Assouline and Lipton, 2011). This compares to the efficacy of EGFR targeting in prostate cancer, as monotherapy agents have failed to demonstrate high antitumor activity in clinical trials (Canile et al., 2005; Gravis et al., 2008; Guerin et al., 2010).


TABLE 3.1 Comparison of Hematological Malignancies (Using the Example of CML-CP) and Solid Cancers (modified from Horne et al., 2013b, Why imatinib remains an exception of cancer research. Journal of Cellular Physiology, 228(4), 665-670. https://doi.org/10.1002/jcp.24233).

Feature | Hematological Malignancies | Solid Cancers
Pattern of evolution | Gradual linear (stepwise) pattern (before crisis stage) | Multiple cycles of punctuated and stepwise evolution
 | Fusion gene dominance (BCR-ABL) | Fusion genes with much less dominance (if present)
 | Fusion gene detected in early stage of CML | Fusion genes are mostly negative for benign prostatic hyperplasia
 | Defined temporal order of karyotypic evolution | Much more diverse karyotype alterations
Cellular population size | Large population size | Small population size
Cell motility | Cells are free to migrate throughout blood environment | Cell populations are isolated and constrained by tissue geography
Genetic drift | Lower influence on large populations | Greater influence on small populations
Microenvironment | Blood stream is tightly regulated and relatively uniform (glucose and oxygen levels, pH, etc.) | Microenvironments vary widely within and between tissues
Cell metabolism | Influenced by normoxic conditions, regulated nutritional levels | Varies depending on normoxic/hypoxic conditions and nutritional gradients
Drug delivery/targeting efficiency | Free motility of cells allows for optimal drug targeting | Varying environments may affect drug chemistry, stationary tumor masses of cells potentially hinder drug targeting and penetration
Cell lineage of disease onset | Early lineage displays more defined differentiation and is characterized by orderly and more predictable stages | Late lineage displays less linear progression and is characterized by stochastic, unpredictable stages


The frequency of additional chromosomal abnormalities increases with progression in CML. This frequency is 7% in chronic phase patients and jumps to 40%-70% in the advanced stages (Skorski, 2011). These advanced stages of the disease resemble the majority of solid tumors, where the increase of genomic instability and accumulation of genetic changes are key features that are age-related and are responsible for a relatively longer time period for the cancer to develop and progress. The linkage between genomic instability and poor prognosis has been well documented in both hematologic and solid cancer patients. We then suggest that with imatinib, we are actually treating a stage of CML that is comparable to the benign phase of solid tumors. Horne et al., 2013b (with permission from John Wiley & Sons).

2. Population Genomic Structure and Microenvironments Influence the Pattern of Cancer Evolution and Their Responses to Treatment

Both population genomic structures and the cellular environment are very different between hematologic and solid cancers. Because the environment can serve as an important constraint for cellular evolution, especially when coupled with stress, its interaction with cellular populations can have a profound impact on the pattern and speed of cancer evolution (either slowing it down or speeding it up). Some examples have been discussed:

Population size plays an important role in shaping the evolutionary patterns. Cell populations of hematological malignancies occupy a large blood environment. Within this system, initially altered cells can freely move. Any dominant alteration, such as the appearance of fusion gene products, would have a significant impact on the entire system. According to population genetics, clonal events within a large population can be dominant over non-clonal events (Gerrish and Lenski, 1998). In contrast, altered cells in solid tissues are constrained by tissue geography and local micro-environments are different, unlike the tightly regulated, relatively uniform blood environment. These altered cells represent typical small, isolated populations. Small population size implies that genetic drift has a greater influence on evolution. Solid tumors, which represent isolated small populations, mediate their evolution through the NCCA/CCA cycle (Heng et al., 2006a). NCCAs develop into different CCAs in different tumors due to the influence of genetic drift. This principle has also been discussed in regard to the correlation between dominant mutation types, the size of a tissue within a cellular compartment, and the size of a stem cell pool (Frank and Nowak, 2004) ... As a result, the evolutionary process of these different isolated populations is diverse, requiring a longer time to evolve due to additional system constraint. Horne et al., 2013b (with permission from John Wiley & Sons).
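The population-genetic point in the passage above, that chance has a greater influence on small, isolated populations than on large ones, can be illustrated with a short simulation. What follows is only a minimal sketch under stated assumptions, not a model taken from the cited studies: it assumes a neutral Wright-Fisher resampling scheme with two cell types (altered and unaltered) and no selection, and the function names, population sizes, number of generations, and random seed are all hypothetical choices made for illustration.

```python
import random
import statistics


def drift_final_frequency(pop_size, generations, start_freq, rng):
    """Neutral Wright-Fisher resampling for asexually dividing cells.

    Each generation, every one of the pop_size cells is drawn at random
    from the previous generation, so the frequency of altered cells
    changes by chance alone (no selection is modeled).
    """
    freq = start_freq
    for _generation in range(generations):
        altered = sum(1 for _cell in range(pop_size) if rng.random() < freq)
        freq = altered / pop_size
    return freq


def spread_of_outcomes(pop_size, generations=20, start_freq=0.5,
                       replicates=100, seed=1):
    """Standard deviation of the final altered-cell frequency
    across independent replicate populations of the same size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    finals = [drift_final_frequency(pop_size, generations, start_freq, rng)
              for _replicate in range(replicates)]
    return statistics.pstdev(finals)


if __name__ == "__main__":
    # Hypothetical sizes: a small, isolated population versus a large one.
    small = spread_of_outcomes(pop_size=20)
    large = spread_of_outcomes(pop_size=1000)
    print(f"spread of final frequency, N=20:   {small:.3f}")
    print(f"spread of final frequency, N=1000: {large:.3f}")
```

In this toy setting, replicate populations of 20 cells end up at widely scattered frequencies after the same number of generations, whereas replicates of 1,000 cells stay close to their starting frequency. That is the sense in which different isolated small populations can drift toward different outcomes. Real tumors add selection, spatial structure, and karyotype change, none of which is represented here.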

3. Heterogeneity and Treatment Response

In general, solid tumors have higher degrees of heterogeneity than liquid cancer types because of the population structure and tissue constraint. For example, there is a much greater chance of detecting clonal chromosomal aberrations in liquid cancers (either shared among patients or within a given individual), as clonal chromosomal aberrations in fact represent stability (when compared with nonclonal


chromosome aberrations [NCCAs]) (Heng et al., 2006b, 2006c; Ye et al., 2007). Indeed, the majority of occurrences and types of new translocations are detected from hematologic cancers (75%) compared with solid cancers (mainly epithelial) (25%) (Rowley, 2013). One would imagine that there are many more translocations in solid tumors than liquid types because of the diverse environments and the large number of cases, but most of them are nonclonal types among patients and between different portions of the same tumor of a given individual, which often fail to get into the literature, as these nonclonal aberrations have been considered as insignificant noise (Heng et al., 2006b; 2006c; 2016a). Many other types of hematologic cancers display higher degrees of heterogeneity than CML, contributing to the challenges for cures. One exceptional case is treating PML-RARA-positive acute promyelocytic leukemia (APL) patients with a combination of arsenic trioxide and all-trans retinoic acid, which leads to a 5-year overall survival rate of 97.4% (Hu et al., 2009). There are remarkable parallels between APL and CML-CP. First, both are typically characterized by a highly penetrant, dominant fusion gene (PML-RARA is found in over 98% of APL cases) (Vitoux et al., 2007). In comparison, many fusion genes identified in solid tumors are found in only a minority of cases (often as low as a small percentage) (Shaw and Solomon, 2011). Second, like the success of BCR-ABL mouse models, PML-RARA expression yields APL in transgenic mice (de The and Chen, 2010), showing the direct contribution of the fusion gene to the onset of the disease. Third, both diseases are hematological malignancies with similar population structures and a strong influence of a more defined cellular lineage, ultimately allowing for a genomic alteration (e.g., translocation and its fusion gene) to become dominant in the entire system. 
The exceptional stories of CML-CP and APL explain why it is more challenging to treat cases with a high degree of genomic heterogeneity, even for liquid cancers. They also suggest that some subtypes of cancer could be better treated using target-specific or even less-specific therapy, during the stepwise phase of cancer evolution, when these subtypes of cancers display evolutionary patterns similar to those of CML and APL. Because molecular targeting and immune therapy can further reduce potential side effects otherwise associated with harsh, general cellular mechanism-focused treatment, with further improved supportive care, more patients will have better treatment options. It should be pointed out that even for the case of CML-CP, heterogeneity can still complicate treatment. For example, the signature translocation between chromosomes 9 and 22 and its variants are observed as the sole cytogenetic aberration in 80%-90% of CML patients diagnosed in the chronic phase. For the 7% of patients who display additional cytogenetic changes, a proportion of them are linked to lower or delayed molecular response and significantly poorer prognosis (Fabarius et al., 2011).


Moreover, following years of imatinib treatment, 50% of cases with drug resistance involved additional chromosomal changes rather than the original target. It should not be a surprise if some treated patients develop increased genome instability in the future.

4. Unforeseeable Negative Impact of CML on Research Community

The CML story represents one of the most remarkable achievements for cancer research. How can such a wonderful example bring any negative impact on our war on cancer? Obviously, it is not the CML research itself but the attitude such success brought to the research community. The fact that CML researchers succeeded has convinced the field that the same magnitude of success will be duplicated for other types of cancer. Researchers believe that even though most other cancer types will be somewhat different, the theoretical basis must be the same (featured by the accumulation of gene mutations/pathways). What researchers need to do is follow the example of CML. Many leading scientists and policy makers have used the CML story as the evidence and rationale to support the somatic gene mutation theory-framed status quo, to further shape the research landscape, to develop the general research strategy, and to prioritize research funding. For example, such high hope has provided the key argument in favor of cancer genome sequencing projects: “if we can cure CML, we surely can cure other cancer types, we just need to find the Achilles’ heel for each cancer by sequencing, and then apply molecular targeting.” With increased high-throughput technologies and bioinformatic platforms, we were told the analysis of massive numbers of samples will for sure deliver. The important fact that CML indeed represents an exception is gravely ignored. 
Such decades-long ignorance has resulted in a research focus mainly on the characterization of cancer gene mutations rather than the profiling of karyotype alterations; on recurrent genomic aberrations rather than nonclonal types, which are more common and define system instability; on linear types of evolution rather than punctuated cancer evolution; and on individual molecular mechanisms rather than the evolutionary mechanism. While the research priority on the gene and its defined function in various linear models has generated a large amount of data on “agents” of the complex system, it has delayed our understanding of the importance of karyotype-defined system inheritance and new system emergence and widened the gap between basic research and the clinic. Just over a decade ago, the effort to illustrate that CML-CP and the majority of solid cancers display different patterns of cellular evolution was considered controversial and not appreciated. When a manuscript named “Hematologic and solid tumors: differential patterns of cancer


evolution” was submitted in 2006, no journals were interested. Some reviewers insisted that there was no need to discuss this issue, as increased samples would narrow the gap between CML and other types of cancer. This situation has drastically changed, partially because of the availability of massive sequencing data that show the diverse genomic landscape for the majority of cancers. Ironically, it is still unpopular to discuss this issue, now for the opposite reason. “Everyone knows that CML is an exception,” some editors responded when declining to engage in this discussion, even though they seem to be unable to see the deeper implication of this issue for current cancer theory and strategies, which requires many serious discussions. After bouncing around for over 7 years, this manuscript was finally published (Horne et al., 2013b).

5. Exceptions of Model Systems and the Reality of Cancer

Obviously, the significance of the above discussion goes far beyond comparing the difference between CML-CP and the majority of cancer types, as there are many well-known success stories of basic research in the field, many of which represent exceptions when compared with the reality of cancer. For example, during the course of searching for causative cancer gene mutations, a large number of gene/pathway-specific linear models have been established. In these popular model systems, some very clever and crucial manipulations and/or selections are used to either artificially enhance one or a few elements or eliminate the heterogeneity. 
Experimental and analytical strategies include but are not limited to the following: artificially promoting a given gene’s function by overexpression at an unrealistic level; studying a few interactions while ignoring the majority of the interactions (such as selecting a few genes among the thousands of impacted genes); focusing on a specific time window for data collection; using molecular probes to select positive clones among tens of thousands of cells with similar phenotypes but hosting different gene mutations; monitoring the gene status only and ignoring the chromosomal context which defines the gene’s function; and reporting the average data by washing out outliers with the help of statistics (considering them as “noise” or unreal data, not realizing that these are crucial for evolutionary selection [e.g., these are responsible for new system emergence including in drug resistance]). As direct results of the above approaches, some key concepts have been established but clearly represent important exceptions associated with the original experimental systems. Example one: It has been shown that the combination of three key gene mutations (the ectopic expression of the telomerase catalytic subunit [hTERT] coupled with the simian virus 40 large T oncoprotein and an oncogenic allele of H-ras) is responsible for the cellular transformation of


3. GENOME CHAOS AND MACROCELLULAR EVOLUTION

human cells (Hahn et al., 1999). This experiment has been considered a milestone in cancer research that illustrates the key contribution of somatic gene mutation to cancer. It turns out that the power of three-gene mutations is experimental system-specific and chance-related, as often five or more introduced gene mutations are needed to achieve the same result. Weinberg has admitted that this classical three-gene combination is not sufficient for direct transformation of normal human fibroblast and epithelial cells (Weinberg, 2014), even though he still ignored the alternative and better explanation that cellular transformation is driven by karyotype changes. Nevertheless, the majority of cancer researchers still treat the idea that a minimal number of gene mutations causes cancer as dogma. They even dismiss Weinberg’s less optimistic tone about the future of cancer research, likely because they prefer to believe his original story. Now, we know that, in general, many different defined genomic alterations/environmental factors can promote cellular evolution leading to transformation (but in a stochastic fashion), and the key driver is stress-induced and CIN-mediated macrocellular evolution, which leads to the emergence of new systems. Although trigger factors can have drastically different molecular mechanisms, the common feature of different experimental systems is the increased chance that links cancer-related phenotype to the factors under examination. Thus, each well-defined model often represents one factor under an exceptional experimental condition, among nearly unlimited contributing factors (for the general patient populations). This further explains an open secret in the cancer field.
At the academic/research front, so much has been achieved: the identification of thousands of gene mutations and their defined pathways, the demonstration that these mutations generate tumors in cellular and mouse models, and the finding that many drugs can eliminate cancer cells under experimental conditions. However, the attempt to apply our knowledge in the clinic is as hard as crossing Death Valley (Butler 2008). Yet, new technologies and newly popular fields continue to generate more elegant scientific publications and leading experts. Clearly, success based on exceptional cases rewards researchers. Just briefly scan the publication records of some successful cancer researchers. Most of them are famous because of their contributions to one cancer gene or pathway. Over their entire careers, their research directions change constantly, reflecting the waves of “hot topics.” Their favorite genes/pathways have been linked to almost all biological processes, one paper at a time, through application of (almost always) newly developed technological platforms. Each individual story published is very convincing, but if all papers are put together, the picture is rather confusing, as often, in isolation, an exceptional case makes perfect sense based on the data. By taking all of the stories into account, however, these explanations often conflict with each other as

3.2 SOS


different papers are focused on different molecular links. After two to three decades, it seems that the same gene or pathway is linked to everything: a typical story of blind men and an elephant.

Example two: Judah Folkman once brought excitement to the field with his breakthrough angiogenesis research. By targeting the nutritional supply for cancer cells through the control of angiogenesis, he powerfully demonstrated his concept. The most impressive results were obtained in animal models, leading to the prediction by some leaders of molecular biology that NCI would soon be closed, as cancer would surely be cured within a few years. It turned out that such wonderful data in the animal model could not be duplicated in patients. The treatment even worsened conditions for some patients. Clearly, this specific and homogenous mouse model represents an exception compared with real cancer in patients, where the heterogeneity is overwhelming (for more information, see Heng, 2015). Unfortunately, this lesson has not yet been learned by the majority of researchers, as the overall practice of cancer research is still the same: identify a molecular target and target it in a linear model, including homogenous mice. As soon as the model generates good data, it becomes exciting news. Finally, it will likely fail in the clinic. Nature, but not cancer researchers, knows well the difference between exception and general cases.

Example three: The traditional understanding of cancer drug resistance is based on the gene mutation theory and clonal expansion model (a stepwise evolutionary model). When a given cancer cell population is treated with drugs, the majority of the population will be eliminated. With higher initial killing power, fewer cells will survive (leaving fewer opportunities for new mutations). Such reasoning has led to the general practice of using maximum dosage for the initial treatment, which is also supported by the current practice of reducing drug resistance to antibiotics.
Unfortunately, initial harsh treatment can also be associated with cancer’s rapid recovery. It turns out that harsh treatment can also induce genome chaos (see Sections 3.3.3 and 4.4.2.6) as a trade-off for the initial effective killing. Genome chaos triggers rapid and massive evolution, which creates new genomes that are drug-resistant (note that it is not just selection of preexisting gene mutations, but rather drug treatment-induced formation of new genomes). Again, in an ideal linear model, if there is no dynamic evolutionary selection involved, the number of surviving cells will be the key factor. However, in reality, there are so many factors involved in a dynamic selective process, and many of them can become dominant in a stochastic fashion, that the outcome is extremely difficult to predict. Together, the conflict between knowledge generated from exceptional models and clinical complexity has created challenges for applying basic research in the clinic. This situation has been summarized by the following statement:


In the past several decades of the War on Cancer, there have been many short-lived breakthrough stories sharing the following common pattern: A new approach targets a different molecular pathway; the concept seems to make sense as judged by the dominant gene-based cancer theory and most importantly, the animal studies are convincing. Unfortunately, the initial promise leads to disappointment as new therapies that rapidly and reproducibly eliminated tumors in animal models often cannot duplicate this success in the clinic. Heng, 2015.

Interestingly, similar confusion about the exception and the general rule has a long tradition in both genetics and evolutionary biology (from Mendel [Chapter 1] to Darwin [Chapter 6] to today’s cancer research). Further analyses have surprisingly revealed that there is a double standard in biological research when dealing with exceptions. On one hand, for the exceptions that researchers believe or wish to duplicate with their own experimental systems, they are firmly convinced, or hope, that the exception represents a general rule. They prefer to believe that the failure to apply these exceptions to general cases is due to current limitations of our understanding/methodologies (even though decades have been spent trying to duplicate them). On the other hand, during data analysis and interpretation of their daily research to identify patterns, they often consider outliers insignificant “noise” and ignore them. Clearly, such an attitude and scientific practice need to be challenged.

3.2.2 Cancer Genome Sequencing: The Results Challenge the Rationale

3.2.2.1 Initial Goal and Controversy

The current Cancer Genome Project was promised by many to be the ultimate guide to solving the mystery of cancer once and for all. Before the NIH’s official launch of the Cancer Genome Atlas project (TCGA) in 2005, there were conflicting opinions toward such an endeavor (Garber 2005). Supporters argued that sequencing the cancer genome is the logical next step following the successful Human Genome Project (Collins and Barker, 2007). Because cancer results from the stepwise accumulation of cellular abnormalities resulting in loss of self-control and is fundamentally a disease of genes, decoding the entire cancer genome (sequencing all genes from a large number of cancer samples) would identify all key gene mutations. Such a large-scale, multicenter project would achieve the goal more quickly and efficiently than the old “cottage industry scale” research carried out by individual laboratories. In addition, the whole genome approach is less biased, and the “leave no stone unturned” approach would certainly identify the Achilles’ heel of cancer. Moreover, the remarkable story of imatinib in CML supports such a “big science”


project. Finally, the success of the Cancer Genome Project would further illustrate the value of the original Human Genome Project, which had already created a reference genome that would serve as a normal control to decode the cancer genome. However, the fact that the “normal” genome is highly dynamic, later suggested by the Personal Genome Project, was not considered. The overall concepts and expectations were summarized as follows:

The main focus of this project is to catalogue genetic changes associated with cancer by sequencing the mutated DNA. The rationale for carrying out this ambitious project comes from the availability of technology and knowledge, as well as the urgency to develop new strategies to fight cancer. According to the current conceptual framework, cancer is caused by the fixed stepwise or sequential accumulation of commonly shared mutations of oncogenes and tumor suppressor genes. (Vogelstein and Kinzler, 2004; Sjoblom et al., 2006) Therefore, if we were to sequence such mutations from representative patient populations, we should have a blueprint of all the core genes that are responsible for cancer. Furthermore, if this sequencing were performed on a large sample of patients, it would also mitigate the “problem” of cancer heterogeneity by reducing the “background noise”. It is anticipated that the list of cancer genes identified at various clinical stages can further provide the sequential genetic order of cancer initiation and progression. The implication is that this information would lead to new diagnostic and therapeutic targets. Heng, 2007a (with permission from John Wiley & Sons).

Some objections were voiced, but only by a few scholars. Their main points included the fear that “big science” would take funding away from individual scientists with hypothesis-driven smaller projects (the traditional way of funding bio-research since the end of World War II); the concern that sequencing could not deal with tumor heterogeneity (as there is a lack of mutational overlap in most cancers); the disappointment that epigenetic alterations would be missed by DNA sequencing; and the realization that there is a key genomic difference between primary tumor and metastatic cancers. Moreover, based on the newly discovered dynamic pattern of cancer evolution where karyotype alteration plays a driving role, more critical questions were raised. These directly challenged the anticipated goal of cancer genome sequencing, as well as the theoretical framework behind current gene mutation-centered cancer research. Clearly, the success of the TCGA project depends largely on the validity of the current concept of cancer. Surprisingly, this central concept has not been vigorously tested in most cancer types. In fact, this decades-old concept is based on two generally accepted assumptions. First, the pattern of cancer cell evolution is a fixed stepwise process involving common shared genetic changes (implying that recurrent sequential genetic changes can be identified). (Vogelstein and Kinzler, 1993; Heng et al., 2006b) Second, a few events defined by mutations in oncogenes and tumor suppressor genes drive cancer progression (implying that the identification of gene mutations is


the key to pinpointing the cause of cancer). (Hahn and Weinberg, 2002) Even though epigenetic contributions, such as the methylation of gene promoters and global chromatin modifications, have gained increased attention in cancer research, its mechanism in cancer evolution is still explained within the framework of the gene mutation theory. (Esteller, 2006) A question of crucial importance to the success of the TCGA project is: are these assumptions correct? ... (Heng, 2007a, with permission from John Wiley & Sons)

A brief comparison of the differential patterns of cancer evolution between CML and solid tumors revealed that the evolution of most solid tumors is stochastic and unpredictable and that each run of cancer evolution often generates drastically different genomes reflected as various karyotypes. This questioned the significance of identifying common cancer mutation patterns by sequencing more samples. The gene mutation-centered concept was also questioned based on the latest concept of genome-level alteration-driven cancer, which contested the rationale of cancer genome sequencing in the first place. Finally, by discussing the issues of heterogeneity, multiple levels of system constraints, gaps between laboratory findings and clinical realities, nonlinear interactions between genetic and environmental insults, and hidden links between various molecular pathways and genome aberrations, it was predicted that the highly anticipated Cancer Genome Project would ultimately face conceptual challenges rather than technical ones and that understanding the evolutionary process is more important than characterizing specific genetic signatures. As cancer is a disease of probability influenced by genome and environmental interaction, it is highly challenging to predict the outcome just by profiling genetic landscape. Such analyses called for a new concept with which to approach future cancer research.

... a new conceptual framework is now emerging that changes the focus from gene mutation to genome aberration, from stepwise progression to stochastic evolution, and from the identification of individual pathways to monitoring overall instability and dynamics of a system. Heng, 2007a

Unfortunately, the call to incorporate genome-mediated evolutionary principles into cancer research and depart from the current gene mutation theory of cancer failed to generate the necessary support, as the field was convinced by a then-rosy picture of what cancer genome sequencing promised to achieve. Perhaps supporters held some hidden key rationales which were not discussed in public. Some cancer gene hunters believed the “leave no stone unturned” strategy was the best tactic to resolve the increasing confusion (e.g., why are common shared mutations difficult to identify, despite the high number of cancer genes that are being discovered?). They hoped that a reasonable “ceiling” (the maximal number of mutations for a given cancer type) would be found. And for many gene-centric thinkers, the


promised result of the Cancer Genome Project was the only hope of validating the cancer gene mutation theory, which had displayed the rapid accumulation of anomalies (in Thomas Kuhn’s term). Nevertheless, the pressure of “curing cancer now” was high, and the strategy of “sequencing them all” became the greatest hope for bringing forth a new era in cancer research. According to the NIH and NCI leaders at the time, TCGA would fundamentally change how we approach cancer. The rest is history.

3.2.2.2 Major Discoveries and Surprises

Fast forward: 14 years have passed since the announcement of the cancer genome sequencing project. The past decade has witnessed drastic developments in sequencing technologies. While several new sequencing platforms have been commercialized and implemented in laboratories worldwide (Reuter et al., 2015), the cost of sequencing has continued to drop. This has been praised as a major achievement when discussing the significance of sequencing projects. With these advanced sequencing tools, both TCGA and the International Cancer Genome Consortium (ICGC) have performed whole genome and whole exome sequencing on thousands of tumor-normal pairs, establishing mutational landscapes for over 20 cancer types (Lawrence et al., 2014; Heng, 2015, 2017a). At first glance, one would think the Cancer Genome Project achieved great success, judging by the many publications in top scientific journals as well as waves of excitement from the media. News articles are commonly titled “Scientist decode entire genetic code of cancer” or “Scientists unlock genetic code in major cancer breakthrough.” Here is a typical example: “The entire genetic codes of two common types of cancer have been cracked, according to scientists, who say the breakthrough could unlock a new era in the treatment of deadly diseases.” (Han, 2009) Furthermore, whole exome or whole genome sequencing has been offered to patients by an increasing number of institutions and hospitals.
Carefully reading these sequencing papers, however, leads to the realization that the real data are far from the rosy picture the authors have painted. In addition to the fact that these papers are hard to digest because of the involvement of large datasets and complicated bioinformatics platforms, the real findings often differ from the message carried by the title or abstract (this is especially true of newspaper headlines with quotes from authors or other scientists). For example, in many sequencing papers, the true findings are clearly just high levels of genomic heterogeneity and increased genomic and evolutionary complexity, not a new magic strategy for fighting cancer. A more accurate news story would sound like this: “A large number of gene mutations have been detected from hundreds of patient samples for a recent report. What surprised the scientists are the high levels of mutation heterogeneity, which challenge the original plan to sequence these samples in order to identify a handful of


common gene mutations. These long-expected key driver gene mutations are thought to be ideal targets for medical diagnosis and intervention. How to utilize the information represented by this high level of heterogeneous gene mutations represents an even bigger challenge.” What have we learned from this highly anticipated and expensive project? Have any major goals been achieved? Should more samples be analyzed? What are the significant contributions of sequencing data to the current cancer theory? How useful is it for patients to sequence their cancer genomes anyway? To answer these questions, one needs to evaluate the data in the context of the project’s original expectations, surprising results, and real challenges. Evaluation of these data will bring forth the following key points:

1. Validation of Known Cancer Gene Mutations

So far, the Cancer Genome Project has validated most previously known cancer gene mutations as “driver mutations,” including TP53, MYC, BRCA, and RAS. The estimated number of cancer driver gene mutations is about 200-550, or 1%-3% of all human genes. This figure is based on estimations by Bert Vogelstein and Michael Stratton (Vogelstein, 2011; Stratton, 2013; Heng, 2017a). Nevertheless, this figure is much higher than the initial estimation based on the concept that only a few gene mutation “hits” are necessary to change a normal cell to a cancerous one. According to Weinberg, there should be less than a “handful” of oncogenes for cancer (Weinberg, 1982, 2014). More detailed information can be found in Heng 2015, 2017a.

2. There are far Fewer Newly Identified Driver Genes Than Expected

Limited new driver gene mutations (20% of the detected missense mutation in the gene at a recurrent position. In contrast, a real tumor suppressor driver needs >20% of the recorded mutations in the gene to result in inactivation (Vogelstein, 2011). Compared with the standard used by other research groups, the 20% cutoff line seems too strict, which can explain the different estimations regarding the total number of cancer driver mutations. In addition, many genes can display both the oncogene and tumor suppressor function, depending on their genomic and environmental contexts. c. Mutational spectra Certain cancer types tend to display mutational patterns. Melanomas have a pattern suggesting the involvement of UV exposure (by displaying a preponderance of C to T mutations), while lung cancers display a spectrum which can be traced to cigarette smoke (showing a preponderance of C to A mutations). Although it is academically interesting to study the mutational spectra, the implications of such studies are limited because of the following considerations: (1) patterns are only obvious when a given environmental exposure has a dominant impact in tumor evolution (it is therefore challenging to study the pattern for most other factors) and (2) these patterns seem clinically useless, as the tumor genome is highly dynamic and more important than such patterns.
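The 20% cutoffs discussed above for calling a gene a driver can be sketched as a simple rule. The following is a minimal Python sketch, assuming the oncogene criterion is read as "more than 20% of a gene's recorded mutations are missense changes at recurrent positions" and the tumor suppressor criterion as "more than 20% of recorded mutations are inactivating." The function name, the mutation labels, the definition of a recurrent position (hit at least twice), and the toy inputs are all hypothetical and are not taken from the cited studies.

```python
from collections import Counter

# Mutation kinds treated as inactivating; this label set is an assumption.
INACTIVATING = {"nonsense", "frameshift", "splice_site"}

def classify_gene(mutations, cutoff=0.20):
    """Classify one gene from its recorded mutations.

    mutations: list of (kind, position) tuples, e.g. ("missense", 12).
    Returns "oncogene", "tumor_suppressor", "both", or "unclassified".
    """
    total = len(mutations)
    if total == 0:
        return "unclassified"

    # Count missense mutations per position; a position hit at least
    # twice is treated as recurrent (an illustrative definition).
    missense_at = Counter(pos for kind, pos in mutations if kind == "missense")
    recurrent_missense = sum(n for n in missense_at.values() if n >= 2)
    inactivating = sum(1 for kind, _ in mutations if kind in INACTIVATING)

    is_oncogene = recurrent_missense / total > cutoff
    is_suppressor = inactivating / total > cutoff
    if is_oncogene and is_suppressor:
        return "both"
    if is_oncogene:
        return "oncogene"
    if is_suppressor:
        return "tumor_suppressor"
    return "unclassified"

# Hypothetical toy genes, not real data.
hotspot_gene = [("missense", 12)] * 6 + [("missense", p) for p in (40, 41, 42, 43)]
truncated_gene = [("nonsense", p) for p in range(5)] + [("missense", p) for p in range(10, 15)]
scattered_gene = [("nonsense", 1)] + [("missense", p) for p in range(10, 19)]

for name, muts in [("hotspot", hotspot_gene),
                   ("truncated", truncated_gene),
                   ("scattered", scattered_gene)]:
    print(name, classify_gene(muts))
```

Lowering `cutoff` in this sketch reclassifies the scattered toy gene as a tumor suppressor, which illustrates how the strictness of the cutoff alone can change the estimated number of drivers.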


4. Chromosomal-level Alterations are Overwhelming

Chromosomal changes in cancer have traditionally been ignored by most gene-centered molecular biologists. The biased attitude is influenced by the following untested assumptions: (1) the majority of chromosomal translocations are “passengers” rather than “drivers”; (2) there are about 10 times fewer genes affected by chromosomal changes than by point mutations (Vogelstein et al., 2013) (this viewpoint is perhaps based only on the number of directly impacted fusion genes; in fact, most chromosomal changes can impact a large number of genes, far more than the fusion gene alone); (3) chromosomal variations are incidental (chromosomal aberrations result from cancer but do not result in cancer), as chromosomes serve as gene carriers (it is the gene that matters); and (4) the main rationale for studying chromosomal aberrations is to clone genes (the higher the resolution, the better), and studying chromosomal aberrations is descriptive rather than mechanistic. Surprisingly, the results of the Cancer Genome Project have begun to directly challenge these assumptions. First, it was confirmed that chromosomal changes are overwhelming in most cancer types (Baca et al., 2013), and the high degree of large-scale genomic rearrangements is the general rule, rather than the exception (Stephens et al., 2011; Heng, 2007c; Heng et al., 2016a). Second, chromosomal changes are essential for many transitions of cancer evolution, including transformation, metastasis, and drug resistance, as the dynamics of mutation and epigenetic landscapes alone are not sufficient to achieve these major transitions (Heng et al., 2013a; Gao et al., 2016; Bloomfield and Duesberg, 2016; Jamal-Hanjani et al., 2017; Davoli et al., 2017). Third, genome chaos has been “rediscovered” by sequencing. This has generated a wave of excitement (Baca et al., 2013; Horne and Heng, 2014; Heng, 2007c; 2015; Liu et al., 2014; Stephens et al., 2011a; 2011b; Ye et al., 2018a; 2018b).
Because the phenomena of chromothripsis and chromoplexy were observed in the majority of cancer types, it was suggested that cancer evolution can be achieved by “sudden” genome reorganization rather than the accumulation of a series of small genetic alterations (such as gene mutation or epigenetic changes). Fourth, often, many gene mutations and chromosomal aberrations coexist. As any chromosomal change can impact hundreds of genes, quantitatively speaking, it must be more important than the changes of individual genes themselves (Ye et al., 2009). Furthermore, chromosomal changes can serve as the genomic context for those individual genes (Heng, 2009).

5. Multiple Levels of Genetic/Genomic/Epigenomic Landscapes

The Cancer Genome Project has also collected information on the epigenetic landscape and the copy number variation (CNV) landscape.


Compared with the gene mutation landscape, the epigenomic landscape varies substantially from cell to cell and is much more sensitive to environmental influence. The epigenomic data are useful in explaining some cases without specific gene mutations where gene function is clearly compromised. Increased effort is being made to study the high-order structures of chromatin/chromosomes and their epigenetic impact. Currently, however, most epigenetic explanations are focused on specific genes. It would be more valuable if the epigenomic impact on global genome instability and adaptability were studied. As for CNV, one form of structural DNA variation that contributes to increased phenotypic diversity by causing gene dosage effects, cis-regulation, or rewiring of networks, it was reported that germline CNVs are associated with breast cancer risk and prognosis (Kumaran et al., 2017). Furthermore, the profiling of somatic CNV can play an important role in cancer prognosis and treatment improvement, as oncogene activation can be caused by copy number amplification, and the inactivation of tumor suppressor genes can result from either heterozygous deletion associated with mutation or homozygous deletion. Indeed, cancer genome sequencing has shown that somatic copy number alterations (SCNAs) are extremely common in cancer. Interestingly, over 70% of recurrent focal SCNAs detected in cancer are not associated with known cancer genes (Beroukhim et al., 2010; Zack et al., 2013). This finding suggests that SCNAs might function differently from individual cancer gene mutations. In addition, while the study of SCNAs has popularized the importance of aneuploidy in cancer, one should not confuse these two levels of CNV. In general, aneuploidy should have more impact on gene expression because of its alteration of karyotype coding, as well as the large number of genes involved.
Knowing the involvement of multiple genomic and epigenomic landscapes in cancer, one immediate task is to understand their interactions and their overall outcomes in cancer evolution. Because macrocellular evolution represents a driving force for cancer (see Section 3.3) and the determining factor is at the karyotypic level, a new method of studying cancer genomics needs to be established. The contribution of different types of genomic and epigenomic aberrations needs to be quantified and integrated into the equation. It is likely that different types of aberrations contribute differently within specific phases of cancer evolution.

6. The Landscape Dynamics During Cancer Progression and After Treatment

One major practical goal of the Cancer Genome Project is to generate a molecular targeting list based on the gene mutation landscape. This goal, however, faces yet another key challenge because of the genomic landscapes’ highly dynamic nature (as illustrated by sequencing data).


There are at least three facts (or features of cancer) that challenge our desire to cure cancer by applying molecular specific targeting:
1. For a given tumor, the high level of intratumor genomic heterogeneity makes drug design extremely difficult. Which population should we target first? How many targets do we need to include?
2. Unlike CML, the pattern of most cancer progression is nonlinear. There is often no stepwise evolution between earlier and later stages, from primary tumor to metastasis, or from before to after drug treatment. Punctuated cancer evolution has been observed in most cancer types (breast, prostate, pancreatic, liver, and lung). Many different terms have been used to describe this common phenomenon, including “macrocellular evolution” and “the Big Bang.” Together, all these sequencing data have forcefully confirmed the cytogenetic findings that suggest there are two phases of cancer evolution (see later sections).
3. As soon as drug treatment is involved, the gene mutation landscape drastically changes, reducing the specificity of the designed drugs. Perhaps most importantly, these drug-induced changes often involve many genes. The drug treatment-induced drastic genomic changes fundamentally reduce the value of profiling the landscape of previously treated tumors.
Obviously, these three facts are connected by heterogeneity and the evolution it mediates. All three features will also lead to chromosomal changes, which are perhaps much more powerful than gene mutations (this will be further discussed in later sections). The truth is that cancer researchers should know these key features regardless of the Cancer Genome Project. It is well established that resistance results with all drug treatment and that heterogeneity is the basis for drug resistance.
Such consequences have been clinically observed and explained/predicted by many evolutionary cancer researchers and computational models (Maley et al., 2004; Heng, 2007a; 2015; Pepper, 2012; Horne et al., submitted). Nevertheless, the impressive cancer genome sequencing data might finally convince many molecular biologists to accept this sad conclusion: most molecular targets are moving targets, which makes specific targeting extremely difficult. Despite cutting-edge sequencing technologies and our strong desire to fight cancer, scientists need to respect natural laws when conducting research.

3.2.2.3 The Ultimate Challenge to Current Cancer Theory

Before the cancer genome sequencing era, the cancer gene mutation theory began to display increasing anomalies (scientific facts that did not seem to fit the paradigm, according to Thomas Kuhn). One of the highest hopes for the Cancer Genome Project was to generate as much comprehensive genomic data as possible, with which those anomalies could be resolved. Unexpectedly for most, the Cancer Genome Project has revealed many more key anomalies, which will likely trigger the research crisis that is essential for a paradigm shift. The following examples support the above statement: these facts discovered by deep sequencing run against the predictions of the current cancer theory.

1. Too Many Mutations, not Enough Common Drivers

To make sense of the findings of too many gene mutations in cancer, the detected mutations have been classified as “driver” and “passenger.” Most gene mutations belong to the passenger category. According to current viewpoints, driver mutations have a fitness advantage which can result in clonal expansion. In contrast, passenger mutations are assumed to have no impact on fitness (Davis et al., 2017), implying that passenger mutations might be less important in cancer evolution and even that large quantities would not pose any problems. While it was unanticipated that so many passenger mutations would be detected, it was an even bigger surprise that only a small number of cancer gene mutations are common across patient populations, as illustrated by the long tail distribution pattern of gene mutations. For instance, among 276 analyzed colorectal cancers, only 24 gene mutations were detected at a significant frequency (Cancer Genome Atlas Network, 2012). Even more striking, there are far fewer driver mutations than expected in many sequenced solid tumors. This is rather disappointing as a central goal of cancer genome analysis is the identification of cancer driver gene mutations (Stratton et al., 2009). For many individual tumors, there is zero to one driver oncogene. In contrast, for established cancer cell lines, more oncogenes are observed, suggesting a possible selective condition through which cell lines can favor proliferation genes. The lack of key driver mutations also directly contradicts the cancer gene mutation theory, which predicts that five to eight “hits” are required for cancer formation. In fact, the “multiple hits” concept not only fit the age distributions of cancer patients (Armitage and Doll, 1954) but also served as the key conceptual basis for searching for these sequential cancer genes in the first place. 
More bizarrely, for some normal tissues, there are many typical cancer driver mutations (Martincorena et al., 2015).
Heng, 2017a (with permission from Elsevier)

A mini-driver model was proposed to explain the lack of driver mutations. It suggests that there are many mutations and that each makes a small contribution to overall cancer fitness (Castro-Giner et al., 2015). This model predicts that targeting these mini-drivers will be increasingly challenging. Alternatively, the overload of passenger mutations has been explained using the concept of neutral evolution (Ling et al., 2015; Williams et al., 2016), which suggests that there is no selection or fitness change during most of a tumor’s lifetime (Davis et al., 2017). Classifying cancer gene mutations as driver or passenger and focusing solely on driver mutations has its limitations. First, the role of driver

and passenger mutations can be switched at different phases of cancer evolution and under different environmental conditions (Heng, 2015, 2017a). In crisis conditions, for example, passenger mutations must take the position of the drivers to save the population (e.g., during the formation of drug resistance). It is sometimes hard to predict the time window in which most clonal expansions can be observed. This situation may help explain, to some degree, the heterogeneity of clonal expansion within an individual tumor. Second, the classification of drivers and passengers is a relative concept that reflects both the current evolutionary state and the potential evolutionary status. According to our study on NCCA/clonal chromosome aberration (CCA) relationships (see Section 3.3.1), selection forces can favor the dominance of either clonal or nonclonal phases (similar to the dominance of drivers or passengers at the gene level). Third, also based on NCCA/CCA research, passengers likely have the function of “survival” rather than the drivers’ function of increasing fitness (see Section 3.3.4). Fourth, it is not yet clear how competition and collaboration work between drivers and passengers. Is there a quantitative relationship? How does emergence work among passengers? And fifth, what role do karyotype changes play in selection at the gene level?

2. Disagreement With the Stepwise Cancer Model of Accumulating Gene Mutations

The growth process of human cancer, together with its genomic pattern, cannot be practically observed. Nevertheless, based on the Darwinian principles of natural selection and gradual change, and influenced by the linear progression model of CML-CP, the standard clonal evolutionary model was proposed (Nowell, 1976). As discussed in Section 3.2.1, while the clonal evolutionary model fits well with the CML story, it has struggled to explain most solid tumors. A genetic linear model of tumorigenesis was then introduced, based on the idea of stepwise accumulation of a few gene mutations and the importance of acquiring key mutations that drive the evolutionary process of tumors (Fearon and Vogelstein, 1990). This model soon gained popularity and was considered a new dogma that put gene mutation into an evolutionary and clinical context. Back in the early 2000s, the two phases of cancer evolution (a punctuated phase followed by a stepwise phase) were observed based on karyotype progression during in vitro cellular immortalization (see Section 3.3.2). Such observations questioned the linear clonal evolutionary model. However, this discovery was ignored by molecular biologists. It was believed that, with a large number of samples, the cancer genome sequencing project would confirm the clonal evolutionary model. Instead, recent cancer genome sequencing data, especially data from single-cell sequencing, have rejected the linear stepwise tumor evolution model.

Intratumor heterogeneity has led to various alternative models, including branching evolution (the tumor mass is generated from multiple clonal lineages in a parallel fashion, even though all lineages share a common ancestor), neutral evolution (the great majority of gene mutations are neutral), and punctuated evolution (in contrast to the gradual and sequential accumulation of gene mutations, tumors are thought to grow predominantly as a single expansion that produces massive numbers of intermixed subclones; after the burst of genomic diversity, one or a few dominant and stable subpopulations are selected and become responsible for the tumor mass). A recent examination of SCNAs in triple-negative breast cancer supports the model of punctuated evolution, with a short period of genomic crisis (massive stochastic changes) followed by a long period of genomic stasis (stepwise clonal expansion) (Gao et al., 2016). Significantly, the discovery of such patterns finally confirmed the previous cytogenetic findings (Heng et al., 2006a-c, 2011a-b, 2013a-b; Heng, 2007a, 2015; Horne et al., 2015a-c). Interestingly, different types of cancer, and different categories of genomic aberrations used to construct the evolutionary trees, are better explained by different models (this point was also discussed in the CML story). Branching evolution models are mainly characterized by point mutations, while punctuated evolution is characterized by DNA CNVs and chromosomal reorganization. More importantly, the contributions of genome-level and DNA-level aberrations to different types of cancer evolution must be clearly separated into distinct categories, namely micro- and macroevolution (one of the key concepts of the genome theory this book is promoting). Making this distinction clear will reconcile many cancer evolution models and reduce the rising confusion within the field.
Based on the two phases of cancer evolution, the punctuated phase is represented by a genome reorganization-mediated survival landscape, while the stepwise phase is represented by a gene mutation-mediated fitness landscape. The patterns of these two phases should be integrated into all models. For example, the punctuated pattern of CNVs reflects punctuated genome evolution (and is usually the final result). As for the neutral model, in which selection is not obvious at the gene mutation level, strong selection at the genome level should be obvious. Unfortunately, the importance of separating the gene level from the chromosome level continues to be ignored by gene-centric researchers.

3.2.3 The Somatic Gene Mutation Theory Is No Longer Relevant

The gene mutation theory of cancer, or the somatic mutation theory of cancer, has been the dominating theoretical framework for current cancer research. According to this theory, cancer is a disease of accumulated gene

mutations. It begins with a mutation in a single cell that passes it on to its progeny through clonal expansion. Additional gene mutations (in either oncogenes or tumor suppressor genes) accumulate to break down constraints at different levels, finally generating a mass of malignant cells with uncontrolled growth (Heng, 2015). This theory carries some key features, predictions, and assumptions, and even strong desires. Each of them has displayed significant anomalies, which unfortunately go unrecognized or are ignored by mainstream researchers, as these anomalies are apparently invisible to people who passionately believe in the gene-centric dogma. A brief discussion will focus on the main ones.

3.2.3.1 Challenging the Obvious: Can a Few Key Gene Mutations Be the Molecular Basis of Carcinogenesis?

While the concept that gene mutations cause cancer is considered a “fact” by the majority of cancer researchers, a growing body of data suggests otherwise. To highlight this important point, some conflicting observations, viewpoints, and historical facts will be briefly mentioned. On one hand, there are many well-known facts and historical viewpoints that support the current cancer theory:

a. The idea that gene mutations function as the main cancer drivers fits well with the current genetic framework. It is generally accepted that genes play an important role in determining phenotypes. It is thus logical to accept the relationship between cancer gene mutations and cancer.

b. The cancer gene mutation theory has deep historical roots. The idea that neoplasia results from altered genes was systematically suggested in the 1920s (Bauer, 1928). Further studies linked X-rays to DNA mutations and cancer. In the 1950s, a clear framework of the cancer theory was proposed (Nordling, 1953; Fisher, 1958).
In a new theory on the cancer-inducing mechanism, it was assumed that cancer formation requires the accumulation of six consecutive mutations, which can be used to explain the observation that the frequency of cancer seemed to increase according to the sixth power of age. First, there is an initial mutation, which results in clonal growth of the mutated cell. Then there is another mutation in the clone, leading to the growth of a new clone. Following multiple rounds of such mutation-clonal growth cycles, cancer occurs. Epidemiologists Armitage and Doll supported Nordling’s model and proposed the two-stage theory of carcinogenesis based on an analysis of age-specific incidence of common cancers (Armitage and Doll, 1954, 1957). Interestingly, in the 1970s, Bruce Ames developed a bacterial test system to measure various compounds’ mutagenic

potency. His group reported a correlation over six orders of magnitude between the mutagenic and carcinogenic potencies of these tested compounds (McCann et al., 1975), suggesting that carcinogens can cause cancer by inducing mutations. Moreover, a similar model was introduced by Knudson based on statistical analysis of retinoblastoma and became known as the “two-hit” theory of cancer (Knudson, 1971). Altogether, these analyses reinforced the desire to establish the gene mutation theory of cancer. The next task was to directly identify cancer genes.

c. Following the identification of the first oncogene (Src) by Duesberg and Vogt (1970) and the demonstration that activated oncogenes were derived from protooncogenes (which are essential genes involved in fundamental processes in normal cells), various cancer gene mutations have been identified, many of them isolated from patients. Well-known examples include Ras for different cancer types, RB for retinoblastoma, Bcr-Abl for CML, BRCA for familial breast cancer, p53 for Li-Fraumeni syndrome, and EGFR for lung cancer.

d. Hundreds of cancer genes had been identified even before the cancer genome sequencing era, including oncogenes, tumor suppressor genes, cell death genes, cell cycle regulation genes, metabolic genes, and immune genes. Most of these identified cancer genes satisfy common criteria (and thus can be published) and are further supported by various in vitro and mouse models that link the genes under investigation to cancer gene characteristics. These identified gene mutations can often be found in patient samples (even though many of them can only be detected in small patient populations). In fact, continuously identifying all cancer genes and designing molecular targets based on them has been the main research strategy of the past few decades.
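The age-incidence reasoning summarized in point b can be written out with the standard multistage approximation usually associated with Armitage and Doll. What follows is a sketch under stated assumptions, not a derivation taken from this chapter: the symbols $k$ (number of rate-limiting events), $u_i$ (a small, constant per-year probability of event $i$), and $I(t)$ (incidence at age $t$) are introduced here purely for illustration.

```latex
% Assumptions (illustrative): k events must occur in a fixed order,
% each at a small constant rate u_i, with u_i t << 1 over a lifetime.
\begin{align}
  % Probability that the first k-1 events have all occurred, in order, by age t
  P_{k-1}(t) &\approx \frac{(u_1 t)(u_2 t)\cdots(u_{k-1} t)}{(k-1)!} \\
  % Incidence: the k-th event strikes a cell that already carries the first k-1
  I(t) &\approx u_k \, P_{k-1}(t)
        = \frac{u_1 u_2 \cdots u_k}{(k-1)!}\; t^{\,k-1} \\
  % On a log-log plot the age dependence becomes a straight line
  \log I(t) &\approx (k-1)\,\log t + \text{const}
\end{align}
```

On a log-log plot of incidence against age the slope is therefore $k-1$, so under this approximation a sixth-power dependence corresponds to $k-1=6$. The slope of the age curve, rather than any directly observed sequence of mutations, is what the epidemiological data constrain.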
On the other hand, despite this generally accepted framework and the seemingly overwhelming evidence to support the cancer gene mutation theory, the following questions and paradoxes persist:

a. Exceptions or general rules? The strong connection between specific gene mutations and cancer is always observed in familial cancer types, which account for only a small portion of the patients with a given cancer type. For example, Lynch syndrome (hereditary nonpolyposis colorectal cancer), which is linked to mutations in genes responsible for DNA mismatch repair, accounts for 3% of colon cancer cases. A similar rate applies to patients with familial breast cancer (with BRCA mutation) among all breast cancer patients. For the majority of sporadic cancer cases, the mechanism may differ greatly and show more diversity, as suggested by the fact that these cases do not share gene mutation profiles.

High uncertainty exists even for familial cancer types. In carriers of a BRCA mutation, cancer evolution involves a large number of additional somatic gene mutations, making the outcome difficult to predict. A good example is that heterogeneity can often be observed even in identical twins who have the same BRCA mutation but completely different disease conditions. Furthermore, whole exome sequencing has failed to identify any known driver gene mutations in some hereditary cancer cases. In general, there is no simple relationship between gene mutations and most cancers.

b. Gene mutations or chromosomal reorganization? Various well-known experiments have demonstrated “causation” between gene mutations and key features of cancer, including transformation, tumor growth, metastasis, and drug resistance. However, most of these typical molecular characterizations have been based solely on genes rather than chromosomes, and the common underlying mechanism, which is genome instability-mediated macrocellular evolution, has been ignored. In recent years, the hidden link between cancer genes, CIN, and cancer evolution has been revealed. Among many molecular pathways, CIN serves as the common key driver of cancer evolution. Had the CIN status been examined while studying these cancer genes, the important contribution of CIN would have been obvious. Some of the most well-known experiments provide examples. Bob Weinberg’s group published famous results using a combination of three key cancer genes (SV40 large T antigen, telomerase catalytic subunit, and H-Ras) to transform normal human diploid fibroblasts (Hahn et al., 1999). When reexamined, the chromosomal changes were clearly essential features (Akagi et al., 2003).
Interestingly, when the same experimental platform (the combination of three genes) was used by Weinberg’s group to transform primary mammary epithelial cells (Elenbaas et al., 2001), the high level of chromosomal aberrations (NCCAs) was clearly observed but ignored, and only the possible involvement of the c-Myc gene was considered for a clonal chromosomal aberration or CCA. Additional genetic alterations were explained as the results of in vitro growth after transformation (rather than as likely playing a key role in transformation). As later illustrated in other systems, it is clear that additional oncogenes are required for human cell transformation beyond the three famous genes. For example, five or more introduced gene mutations are needed (Weinberg, 2014). Meanwhile, the ultimate importance of CIN has been realized (Heng, 2015). Nevertheless, a given gene mutation can often be linked to different molecular pathways in different models. That is

the reason one gene mutation can be linked to very different mechanisms. Such a realization can unify the diverse gene mutations that contribute to CIN-mediated cancer evolution.

c. While many gene mutations can be detected in cancer samples, they offer limited information about their contribution to the earlier stages of cancer evolution and do not necessarily support the cancer gene mutation theory. Only when cancer evolves through a typical stepwise process, in which all gene mutations accumulate during evolution, can these earlier genes be detected in clinical samples (as the end products). Knowing that (1) it is difficult to detect six drivers in most cancer samples (indicating that there is no accumulating process) and (2) punctuated cancer evolution through genome reorganization, rather than stepwise accumulation, dominates (suggesting that genome reorganization-mediated evolution does not require gene mutations), these detected cancer gene mutations might actually not be involved in the initial phases of cancer evolution at all. It is more likely that oncogenes are actually products of the very late stages of cancer evolution (see Section 3.4 for a new model that integrates cancer gene mutations).

d. Many experimental models artificially amplify the impact of gene mutations in cancer. To illustrate the function of a specific cancer gene mutation, many experimental methods are used to support the “causative” relationship between the manipulated gene and the cancer phenotype (see Section 3.2.1). But in the case of cancer, it is hard to illustrate this relationship when there are many other factors involved (see molecular medicine). In fact, many mouse models lack an immune system and environmental heterogeneity, which is far from reality. Unfortunately, most of the information we have about cancer is based on such experimental data. No wonder it has been so difficult to apply exciting new findings from basic research in the actual clinic.
The inability to translate the results of linear models to diverse patients has been referred to as crossing the “valley of death” (Butler, 2008). Furthermore, although gene mutations can contribute to microcellular evolution, microevolution should not be extrapolated to macroevolution (see Section 3.3.2 and Chapter 6). Understanding and regulating gene mutations and understanding and controlling chromosome aberration-mediated genome chaos are very different tasks.

e. There are many gene mutations, including driver mutations, in normal tissue. It is very surprising that some cancers lack driver mutations. It is even more surprising that many mutations, including many typical cancer drivers, have been detected in normal skin tissues (Martincorena et al., 2015; Brash, 2015). Based on further studies on other tissue types, abundant gene

mutations in normal tissue are common. According to cancer genetics 101, the main reason individual tumor cells acquire new mutations is to promote their growth and survival. However, clones carrying driver mutations are not much larger than those without drivers, which makes no sense to researchers. Additional analyses illustrated that the size distribution of mutant clones is consistent with clones growing by neutral drift, suggesting that most mutations in normal tissue are neutral (Simons, 2016). This explanation, however, did not solve the puzzle, as for many cancer types the gene mutation pattern is also consistent with neutral evolution (Ling et al., 2015; Williams et al., 2016; Davis et al., 2017). What, then, is the key contribution of gene mutations to cancer? Gene mutations are obviously not the key to this puzzle; an essential principle must be missing.

3.2.3.2 Challenging the Concept of Sequential Accumulation of Gene Mutations in Cancer

Sequential accumulation of genetic alterations has long been observed in CML. In 1990, Bert Vogelstein’s group published the influential model of colorectal cancer illustrating the accumulation of gene mutations (APC, TP53, and Ras) during cancer progression (Fearon and Vogelstein, 1990). This model claims that cancer is caused by sequential mutations of specific oncogenes and tumor suppressor genes. There are many reasons this model quickly became so popular in the field. It demonstrates the power of introducing a theoretical framework to understand cancer. First, by combining genetics (gene mutations) with evolution (stepwise accumulation) as well as cancer’s phenotypes, this linear model makes understanding cancer appear simple. Second, the simplicity of this linear model has motivated researchers to search for the earlier markers of this process and their molecular therapeutic targets.
Third, it fits well with many biological phenomena and beliefs, such as the orderly function of genes in developmental processes and the accumulation of small changes over time leading to big changes. Soon after, many publications reported gene mutations in different cancer types. As for the highly heterogeneous data seen in the clinic, it was anticipated that if more data were accumulated, the stepwise pattern would finally emerge. There are some fundamental limitations to the above viewpoints, however. First, evolution is never a linear pattern. Second, and equally important, the gene mutation data in patients are highly heterogeneous and unpredictable. The realistic nature of what we are dealing with has now been confirmed by the cancer genome sequencing projects. It turns out that despite the stepwise phenotypic changes (hyperproliferation of intestinal

crypts followed by formation of polyps and growth of polyps into invasive tumor), colorectal tumors display heterogeneity and do not support the Vogelgram. It should be mentioned that it was Vogelstein’s group who revealed the highly heterogeneous genomic landscape of the colorectal tumor, which questioned their own model. Clearly, the complexity of the cancer genome and his own first-hand experience of studying the cancer genome by large-scale sequencing have changed his perspective on cancer research (see Chapter 1).

3.2.3.3 The Limitations of Searching for Hallmarks of Cancer

Another popular topic in cancer research is the hallmarks of cancer. Despite its wide usage, the hallmarks of cancer concept is not a theory, but a proposal to summarize or categorize all identified cancer genes and their molecular mechanisms. Around the turn of the century, an increasing number of cancer genes had been identified, leading to increased complexity and confusion. Doug Hanahan and Bob Weinberg reasoned that a few biological traits or “hallmarks” that encompass almost all the biology of all types of human tumors should be proposed to explain the complex behaviors of cancer cells. Initially there were six hallmarks: growth stimulation, evasion of growth suppressors, resistance to apoptosis, replicative immortality, activation of angiogenesis, and induction of invasion and metastasis (Hanahan and Weinberg, 2000). Ten years later, abnormal metabolic profiles and the compromised immune system were added to the list, along with genome instability and tumor-promoting inflammation (named as enabling characteristics) (Hanahan and Weinberg, 2011). Both publications were historically successful, as evidenced by extremely high citation numbers. Interestingly, according to Weinberg’s recent writing, their initial expectations in writing the first publication were rather less ambitious.
We fully expected the review article that we cobbled together to disappear, sinking quickly like a stone thrown into a quiet pond. ... it was unlikely to resonate with the diverse community of cancer researchers, many of whom would dismiss it as simplistic. As it turned out, we were wrong. ... a tribute not to its writing, but instead to the profound need ... to find some unifying themes among the ever-growing mass of observations.
Weinberg, 2014

In addition to the fact that these hallmarks provide a unifying theme for many, what other factors contribute to such extreme enthusiasm? Subsequent analysis will shine some light on this topic. The research community swiftly embraced the hallmarks of cancer, as such synthesis has supported the notions that common cancer genes are responsible for the majority of cancers and the complexity of cancer can be dissected into simplified

molecular principles. The gene/pathway classification based on individual hallmarks provides explanation for the large number of diverse gene mutations, which is in contrast to the original estimation that only a handful of gene mutations would be discovered. Further, these hallmarks have been highly influential as they also provide the rationale and research direction for continued gene-based cancer research. Horne et al., 2015c (with permission from John Wiley & Sons).

Surely, this concept of the hallmarks of cancer has rationalized the effort to continue characterizing diverse molecular mechanisms of cancer. It boosts the confidence of many researchers even as the overwhelming genomic heterogeneity has reduced each individual gene’s clinical value, especially when these gene mutations have low penetrance in patient populations. It also reaffirms the significance of studying a given molecular aspect of cancer as long as it can be linked to these hallmarks. The hallmark diagram has become one of the most frequently used images to kick off a seminar that focuses on the characterization of a cancer gene. No matter how trivial the message is, speakers often feel relevant by linking a specific gene mutation or pathway to the big picture of cancer research. Despite its extreme popularity, more questions have begun to arise when the concept is viewed through the lens of complexity and cancer evolution (Lazebnik, 2010; Sonnenschein and Soto, 2013; Horne et al., 2015a-c; Fouad and Aanei, 2017). While the molecular knowledge of these hallmarks is drastically increasing, the clinical implication remains limited, as cancer dynamics cannot be summarized by a few isolated/fixed molecular principles. Furthermore, the highly heterogeneous genetic signature of cancers, including massive stochastic genome alterations, challenges the utility of continuously studying each individual gene mutation under the framework of these hallmarks. It is therefore necessary to re-evaluate the concept of cancer hallmarks through the lens of cancer evolution.
Horne et al., 2015c

Such examination has resulted in the following conclusions and thoughts:

1. There is a highly dynamic relationship between individual hallmarks, and it is nearly impossible to separate most of them. Current molecular experiments are based on more or less linear models, which allow researchers to focus on specific pathways or given cellular features. In reality, many genes function within a complex network and are often involved in many pathways simultaneously, at either the same or different stages of cancer evolution; pathway switching can occur either within the same cell or among different cells. Such interactions make it impossible to establish truly separate hallmarks, as they essentially overlap. For example, the same cancer gene can function as an oncogene, a tumor suppressor, or neither, and it can either promote or reduce genome instability. One can characterize a hallmark in isolation but cannot predict which hallmarks will become dominant in a specific clinical case. Such network dynamics make the effort of classifying

different gene mutations according to the hallmarks of cancer less meaningful in the clinic, especially when there are a large number of cancer genes and most of them have low penetrance in patients.

2. For most complex systems like cancer, there is a crucial difference between “parts characterization” and “system behavior prediction” because of the involvement of emergent properties. Such principles question our strategy of fighting cancer through the understanding or categorization of these cancer gene mutations. Following four decades of study of the p53 gene mutation, for example, there is extensive knowledge about how p53 mutation works and about the details of its interactive partners in each type of hallmark, and the list is still growing. However, so far it has not found the expected use in cancer treatment. Knowing that p53 can be linked to each hallmark of cancer has certainly failed to make a difference in the clinic. Certainly, we can continue to follow the example of the p53 gene, characterizing ever more cancer gene mutations and classifying them into different hallmarks. Such efforts will generate a large number of publications but will not deliver the cure (see Chapter 8). Furthermore, it is now challenging to classify a large portion of the gene mutations identified directly from cancer patients into any individual hallmark.

3. Among the 10 cancer hallmarks (the number of hallmarks will only increase), genome instability should be the hallmark of all hallmarks (if the definition of a hallmark of cancer is even important). The reasons CIN is most important are the following: CIN can be linked to each individual hallmark (as either cause or consequence) and to most gene-defined pathways. CIN can also be linked to levels of systems stress (internal and environmental). CIN thus plays a central role in connecting and regulating stress, molecular specificity, homeostasis, systems adaptation, and evolution.
Increased evidence has illustrated that many initial stress responses through specific molecular mechanisms can become general systems stress. Furthermore, when the stress is either high or persistent, it often leads to CIN and results in altered genomes (Heng, 2015; Horne et al., 2015b). Because of the stochasticity of genome reorganization, it will generate different genomic coding with less specificity (only survived by selection force). These CIN-mediated cancer genes should be highly diverse because of unlimited genomic coding. Accordingly, studying the CIN and genome-level aberrations should become the priority. First and foremost, these seemingly stochastic types of genome aberrations detected from cancer need to be reexamined and integrated into various evolutionary models. Measures of system instability by degree of CIN represent a better

FIGURE 3.1 The relationship between stress, hallmarks, and CIN. Stress can lead to hallmarks or CIN, hallmarks can lead to CIN (which can be traced by specific pathways), and CIN can lead to stress and hallmarks (in stochastic fashion). Understanding a gene’s isolated function (as one or a few categories of the hallmarks) differs from predicting the evolutionary outcomes, as two events are linked by the stochastic relationship (modified from Fig 1, Heng et al., 2013a, with permission from Springer).

prediction than monitoring various hallmarks. As illustrated in Fig. 3.1, the relationship between stress, hallmarks of cancer, and CIN-mediated macro/microevolution is a highly dynamic and stochastic one. One can understand each of the hallmarks and link a given gene mutation to it, but the challenge is the unpredictability of the “wheel of fortune” of hallmarks. Such reality leads to the gap between knowledge of the molecular mechanism of a specific gene mutation and the evolutionary selection in patients. That is the main rationale for promoting the understanding of the evolutionary mechanisms of cancer rather than a large number of molecular mechanisms (Ye et al., 2009; Heng et al., 2011a, 2013a).

4. Historically, the idea of cancer hallmarks has played an important role in reducing the confusion over why there are so many cancer gene mutations (which is against the concept that cancer is caused by a few common key gene mutations). However, the effort did not solve this issue; it only delayed addressing it. Now, after cancer genome sequencing, there are even more gene mutations that need to be explained; merely classifying them does nothing to support the cancer theory.

3.2.3.4 Clinical Facts Do Not Support the Gene Mutation Theory of Cancer

To judge a theory, one needs to ask how relevant or precise its predictions are. Previous discussions have illustrated that three key features or predictions of the cancer gene mutation theory are problematic. What about the clinical facts? It was also discussed that some huge clinical successes represent only exceptional cases such as CML.

The current cancer genome sequencing project has ultimately questioned the cancer gene mutation theory on the basis of a massive number of clinical samples. Three lines of evidence can be summarized: (1) for many sporadic cancers, it is difficult to identify the driver mutations, whereas there are many driver mutations in normal tissues, so depending only on the gene mutation theory to explain cancer is clearly limiting; (2) chromosomal aberrations are overwhelming and hard to ignore; and (3) the status of chromosomal aberrations, or even of CNVs, offers better clinical prediction than the gene mutation profile.

3.2.4 Increased Calls for New Cancer Theories

In light of all the doubts, confusion, and anomalies mentioned thus far, this crucial question is long overdue: is it time for a new cancer theory?

3.2.4.1 Noted Competing Theories/Concepts

Some scholars have voiced their opinions and replied “yes,” even introducing competing theories/concepts of their own. Some representative theories are briefly mentioned here. More information can be found in Debating Cancer (Heng, 2015).

1. Aneuploidy Theory

The aneuploidy theory insists that a chain reaction of aneuploidization leads to carcinogenesis (Rasnick, 2011). Initially proposed by Boveri over 100 years ago, aneuploidy has been systematically linked to cancer by Duesberg’s group since the late 1990s (before this topic became popular) (Gibbs, 2003). Moreover, using aneuploidy data, Duesberg et al. have firmly denied the key role of gene mutations in cancer, even though Duesberg himself studied the first oncogene, work that promoted the cancer gene mutation theory. In recent years, aneuploidy studies have become popular, partly because yeast has been used as a model system to study the molecular mechanism of aneuploidy (Sheltzer et al., 2011; Siegel and Amon, 2012; Zhu et al., 2012; Bonney et al., 2015; Gordon et al., 2012; Pavelka et al., 2010). In addition to the demonstration of a paradoxical role of aneuploidy (acting in both the promotion and suppression of cancer) (Weaver et al., 2007; Silk et al., 2013; Heng, 2015), it was recently illustrated that there is a complex relationship between aneuploidy, CIN, genome reorganization, and tumorigenicity (Ye et al., 2009), that aneuploidy can change the karyotype coding system (Rancati et al., 2008; Heng et al., 2011a), and that nonclonal aneuploidy can anticipate the emergent properties of cancer (Ye et al., 2018b). As aneuploidy represents one type of genome aberration, it can easily be integrated into the genome theory of cancer evolution (Heng et al., 2006a-c; Heng, 2009, 2015).


3. GENOME CHAOS AND MACROCELLULAR EVOLUTION

2. Tissue Organization Field Theory

Tissue organization field theory (TOFT) states that, like normal morphogenesis, carcinogenesis occurs at the tissue level rather than the cellular level. Cancer-causing agents affect the underlying mesenchyme/stroma rather than causing mutations in epithelial cells. The resulting disruption of cell-to-cell and cell-to-extracellular matrix communication leads to cancer evolution. Furthermore, normal tissue environments can normalize neoplastic tissues. It is therefore important to study normal and neoplastic tissue together in their natural architecture (Soto and Sonnenschein, 2011; 2013). Clearly, approaches that focus more on the tissue level, as well as on the regulation of the reversible process of carcinogenesis, deserve more attention. Further research is needed to illustrate how somatic evolution works in the context of tissue constraints and how both genomic and non-genetic landscapes change during cancer evolution.

3. Cancer Attractor Theory

The cancer attractor theory proposes that abnormal regulatory signaling and mutational rewiring of the network could disrupt cellular differentiation and result in tumorigenesis. By treating cell types as attractors, Huang et al. argue that cancer cells are trapped in abnormal attractor states (Huang et al., 2009; Huang, 2013). Huang’s group has also illustrated the importance of various nongenetic variations in cancer progression, as these nongenetic variations are intimately associated with cell statuses. Their efforts have generated high interest in the field by providing the necessary tools for understanding cancer stem cells and epigenetic landscapes, two popular topics in current cancer research. Similar frameworks have been applied to different models. For example, the intrinsically disordered proteins have been linked to network rewiring of protein interaction and cancer (Kulkarni et al., 2013; Tompa, 2012). Another example is to apply the cancer attractor idea and consider cancer cells as behaving like simple minimal replicators, which are able to operate in a robust manner under noisy conditions (Sole et al., 2014). Interestingly, Ao’s group proposed the endogenous network theory to combine molecular biology, mathematics, engineering, and physics under stochastic dynamics and evolutionary adaptation. They considered cancer as an intrinsic robust state of the endogenous network that is not optimized for the interest of the whole organism (Ao et al., 2010). For most of these models, despite the progress that has been made in understanding network structure and dynamics, state/phenotype switching (gene regulatory network controls gene activities), and landscape adaptation, they should perhaps focus on integrating separate gene/epigene-mediated network rewriting and chromosome/karyotype-mediated network reorganization (for overall boundary) into their models, knowing the ultimate importance of genome variation-mediated macrocellular evolution.

4. Theories Influenced by the Natural History of Evolution

There are a few theories/hypotheses in this category. The first type is that cancer represents a type of atavism. It was proposed that cancer is a preprogrammed and preloaded ancient survival response to a threatening cellular microenvironment. This mechanism is not active under normal situations; it can be activated under certain conditions and eventually lead to cancer (Davies and Lineweaver, 2011). Different ideas have been proposed, including that cancer shares its roots with incipient eukaryogenesis (Sterrer, 2016). Note that this is a rather provocative idea. Although cancer shares some features of ancient unicellular life forms (such as favoring fermentation in low-oxygen conditions), this phenotypic similarity could also be explained by selection of unique features that differ from those of normal cells. Cancer cells can quickly adapt to new environments by favoring high-oxygen conditions. Moreover, cancer cells almost always form new genomes during the survival process which are drastically different from ancient genome forms. The second type involves retrotransposon-mediated genomic changes. Adam Wilkins proposed that the stress-induced epigenetic effects of retroelements can change the activities of neighboring genes contributing to cancer initiation; furthermore, retroelements anticipate the chromosomal rearrangement in cancer progression (Wilkins, 2010; Campbell et al., 2014; Nazaryan-Petersen et al., 2016). The third type is that the human genome is mismatched with the rapidly changing environment and lifestyle. Gluckman et al. have suggested that a rapid environmental change can trigger a mismatch between organisms and the environment they are living in. Such a mismatch will lead to disease conditions including cancer (Gluckman et al., 2011, 2013). This idea fits well with how stress-induced CIN links to cancer (Heng et al., 2013a-c; Horen et al., 2014). The fourth type is the tumor society. In the early 1980s, Gloria Heppner and others proposed that tumors should be considered societies or ecosystems in which the various members (clones) interact to produce a dynamic group or emergent properties that define the overall behavior of a population (Fidler and Hart, 1981; Heppner, 1984; Woodruff, 1983; Heng, 2015). For example, it is not practical to predict a tumor’s response to drug treatment based solely on the sensitivity of its individual subclones (Heppner and Miller, 1989). Recently, the tumor society concept has received increased attention from evolutionary biologists (Cleary et al., 2014; Crespi et al., 2014).


5. Theories Influenced by Developmental Biology and Epigenetics

Because of the increasing difficulty of using gene mutations to explain cancer, the epigenetic viewpoint of cancer is becoming more popular. In fact, many other theories such as TOFT and cancer attractor models focus more on the developmental concept and epigenetic dynamics (Esteller, 2008; Feinberg et al., 2006; Feinberg, 2014; Jones and Baylin, 2002; Heng et al., 2009). Feinberg’s group proposed a unifying model of cancer to illustrate how epigenetic dysregulation allows for rapid selection leading to tumor cell survival at the expense of the host (Timp and Feinberg, 2013). This model is based on the observations that various epigenetic modifications drive tumor cell heterogeneity with phenotype plasticity and stochasticity. Another significant idea in this category is the cancer stem cell hypothesis. It states that the cancer stem cell differs from other somatic cells and only certain progenitor cells in the body can become cancer. These cancer stem cells display self-renewal properties and thus have the chance to accumulate various gene mutations through waves of clonal expansions. Cancer stem cells are also responsible for drug resistance. The concept of cancer stem cells faces some vigorous challenges. Because non-cancer stem cells can become cancer stem cells, the significance of focusing on cancer stem cells is drastically reduced. Moreover, even if the idea that cancer can only be caused by cancer stem cells is true, we would still need to understand how evolutionary mechanisms function in these cancer stem cells.

6. Theories Related to Genetic and Environmental Factors

Superficially, the following theories seem very different, yet they can all be linked by factors that contribute to cancer. The first one is the infection theory of cancer. Currently, at the global scale, close to 20% of all cancers are related to infectious diseases. It has been proposed that infection is the major cause of most common and complex diseases including cancer (Ewald, 2009). Ewald has even estimated that by the year 2050, more than 80% of all human cancers would be linked to infection. While infection is an obvious link for cancer, the challenge is to explain why only a small portion of infected patients display cancer (Heng, 2015). The second one is the Warburg effect and the mitochondria’s contribution to cancer. Most cancer cells rely on aerobic glycolysis to generate energy, differing from normal cells, which mainly rely on mitochondrial oxidative phosphorylation. This phenomenon was termed “the Warburg effect” (Warburg, 1956). In recent years, the metabolic link to cancer has become a new cancer hallmark. With increased knowledge of the mitochondria, it was promised that tumors “addicted” to aerobic glycolysis can be targeted as the new Achilles’ heel of cancer. Unfortunately, cancer is complicated, even when dealing with metabolic features. Now, even the Warburg effect itself is no longer that black and white. First, Warburg’s interpretation of mitochondrial dysfunction is inaccurate, as mitochondrial function is essential for cancer cell viability. Second, tumors do not consistently inhibit mitochondrial bioenergetics as Warburg suggested (Wallace, 2012). Furthermore, different metabolic types are highly dynamic and are constantly under evolutionary selection. The third one is the mutator phenotype. Long before the cancer genome sequence project, it was thought that the mutation rate of normal cells is insufficient to generate the large numbers of mutations in human cancers. Loeb et al. proposed that cancers exhibit a mutator phenotype. Specifically, the loss of function of one gene, such as a DNA repair gene, can accelerate the mutation rate at other loci of the genome, increasing the likelihood of a tumor acquiring advantageous mutations (Loeb et al., 1974). Based on cancer genome sequence data, it would be interesting to illustrate the relationship between mutator phenotype and driver gene mutations, as well as genome chaos.

3.2.4.2 The Search for New Framework

Although the proposed theories vary, they all oppose the current framework of the cancer gene mutation theory in one way or another. Attitudes toward replacing or modifying it, however, vary greatly among these thinkers. Some want to improve the old theory by including new elements (such as epigenetic contributions, stem cells, metabolic factors), while others want drastically different ideas (such as denying the importance of the gene altogether). So far, the tipping point for replacing the current dominant theory has not been reached. Nevertheless, more and more researchers have voiced their concern about the current framework of cancer research. Even leading supporters of the cancer gene mutation theory express their doubts, albeit in a more casual manner.
We lack the conceptual paradigms and computational strategies for dealing with this complexity. And equally painful, we don’t know how to integrate individual data sets, such as those deriving from cancer genome analyses,...

Bob Weinberg further expressed his frustration concerning the ultimate challenges of complexity: for example, despite the enormous quantity of observations about cancer made in the past 50 years, essentially no insights on its mechanisms (from its initiation and progression) were achieved. Even worse, we are currently unable to really assimilate and interpret most of the data we generate. He concluded that the field of cancer research has now come full circle: the reductionists’ initial success on molecular mechanisms has been replaced by our current dilemma. In fact, we cannot even really know what we are doing!


How will all this play out? I wouldn’t pretend to know. It’s a job, as one says on these occasions, for the next generation. Passing the buck like this is an enormously liberating experience, and so I’ll keep on doing it! Weinberg, 2014.

While such a candid public statement is not easily voiced from a leading scholar of the cancer gene mutation theory, very few, including the author himself, have changed their ineffective scientific methods or integrated new ideas that can be useful to future research proposals. Why not?

3.3 GENOME CHAOS: REDISCOVERY OF THE IMPORTANCE OF THE KARYOTYPE IN CANCER

As illustrated by the CML story, the excitement of identifying chromosomal aberrations (such as the Ph chromosome) has since been overshadowed by the promise of characterizing genes (such as the Bcr-Abl fusion gene). This general trend is driven by the strong desire in the molecular biology field to favor the highest resolution possible and to avoid dealing with stochasticity in the name of searching for patterns. Of perhaps equal importance, chasing popular subjects in the field to secure funding has also led to chromosomal studies in cancer becoming “out of fashion.” As newcomers to cancer research when our laboratory was established in 1999, the simple strategy was to follow the trend. With a newly equipped spectral karyotyping (SKY) system, as well as an array of cutting-edge molecular cytogenetic and cytogenomic platforms (many of which were developed by our own group), we were poised to identify key chromosomal regions hosting the key cancer genes. With the available genomic resources resulting from the soon-to-be-finished Human Genome Project (from bacterial artificial chromosome clones to detailed gene maps), our first goal was to identify the common abnormal karyotypes from cancer cell lines and/or patient samples using SKY. We would then narrow it down to a few candidate genes within the chromosomal regions and, finally, identify the cancer gene. This was before the cancer genome sequencing era. SKY technology, which paints each chromosome with a unique color, obviously represents a powerful tool with which to study cancer karyotypes (Schrock et al., 1996; Speicher et al., 1996; Heng et al., 2011b; 2003; Ye et al., 2001, 2016). Before the SKY era, it was a nightmare to karyotype these cell lines or patient samples in which there were many highly complex marker chromosomes involved because of the limitation of G-banding. It was promised that with the help of SKY, these long-unidentifiable clonal chromosomal patterns would emerge.

3.3 GENOME CHAOS


The SKY platform has unexpectedly directed our research into uncharted waters. The exciting journey of searching for new genomic and evolutionary frameworks started from the characterization of NCCAs, the seemingly insignificant “noise.”

3.3.1 Linking Incidental NCCAs to CIN and Evolutionary Potential

Rather than identifying some common translocations in different types of cancers as we had hoped, our extensive efforts with SKY analyses concluded that the vast majority of chromosomal changes are of the NCCA type; moreover, the signature abnormal karyotype, in the case of most solid tumors, is nowhere to be found! We must first ask, what do NCCAs and CCAs look like? The following panel illustrates some examples of structural (such as simple types of translocation) and numerical (such as aneuploidy) NCCAs and CCAs. For more complicated NCCA types, see Chapter 4.4. Second, what are their definitions and classifications?

Current cytogenetics defines a clonal chromosome aberration (CCA) as a given chromosome aberration which can be detected at least twice within 20 to 40 randomly examined mitotic figures. Based on this definition, the frequency of a CCA needs to be higher than 5%-10% in an examined cell population. In the literature, however, when a CCA is reported, researchers often refer to aberrations with frequencies that are over 30%. Using the cut-off line of CCAs, a non-clonal chromosome aberration (NCCA) should refer to aberrations observed at a frequency of less than 5%. According to our experience, we usually examine 50-100 mitotic figures when scoring NCCAs and CCAs, and therefore, 4% is used as the cut-off (i.e., less than 2 in 50 mitotic cells examined); this is done even though, theoretically, the cut-off line could be 1% or lower (i.e., if more than 100 mitotic figures are used). NCCAs can be classified into structural and numerical types (Ye et al., 2007; Horne et al., 2015; Bayani et al., 2007). An increasing number of structural types of NCCAs are being reported. Within the punctuated macro-cellular evolutionary phase, massive amounts of NCCAs can be detected, often coupled with complex chromosomal aberrations.

In addition to being classified by their structural and numerical differences, CCAs can be further classified into different types. In the “watching karyotype evolution in action” experiments, there are many short-lived transitional CCAs (i.e., those CCAs detectable before the establishment of a cell line), and late-stage, more stable CCAs (which serve as the featured aberrations for the cell line or the specific cancer sample). In the clinic, there are some signature CCAs which can be used as a common marker for a given disease, such as the Philadelphia chromosome for chronic myelogenous leukemia or CML, and an extra chromosome 21 for Down’s syndrome. In general, CCAs dominate in the stepwise microcellular phase of cancer evolution. Heng et al., 2016a
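The cut-off rule described above can be illustrated with a minimal sketch. It assumes each scored mitotic figure is recorded as a set of aberration labels; the labels, the function names, and the choice to report NCCA frequency as the fraction of cells carrying at least one NCCA are illustrative assumptions, not the authors' scoring protocol. The 4% default follows the cut-off stated in the text.

```python
from collections import Counter
from fractions import Fraction


def classify_aberrations(cells, cutoff=Fraction(4, 100)):
    """Label each aberration as 'CCA' or 'NCCA' by its frequency.

    cells: one set of aberration labels per examined mitotic figure.
    cutoff: frequencies below this value count as nonclonal (4% in the text,
            i.e., fewer than 2 in 50 mitotic figures).
    """
    total = len(cells)
    if total == 0:
        raise ValueError("at least one mitotic figure is required")
    # Count in how many cells each aberration is seen (once per cell).
    counts = Counter(label for cell in cells for label in set(cell))
    classes = {}
    for label, count in counts.items():
        freq = Fraction(count, total)  # exact arithmetic avoids float edge cases
        classes[label] = ("CCA" if freq >= cutoff else "NCCA", freq)
    return classes


def ncca_cell_frequency(cells, classes):
    """Fraction of cells carrying at least one aberration labeled NCCA (assumed index)."""
    ncca_labels = {label for label, (kind, _) in classes.items() if kind == "NCCA"}
    hits = sum(1 for cell in cells if ncca_labels & set(cell))
    return Fraction(hits, len(cells))


# Hypothetical sample of 50 mitotic figures (labels are made up for illustration).
cells = (
    [{"t(4;9)"}] * 18
    + [{"t(4;9)", "+2"}]
    + [{"t(4;9)", "t(20;22)"}]
    + [set()] * 30
)
classes = classify_aberrations(cells)
for label, (kind, freq) in sorted(classes.items()):
    print(label, kind, f"{float(freq):.0%}")
print("cells with at least one NCCA:", ncca_cell_frequency(cells, classes))
```

With 50 figures, an aberration seen once (2%) falls below the cut-off, while one seen twice (4%) meets it; examining more figures allows a lower cut-off, as the text notes.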

In cytogenetics, NCCAs have historically been considered to be insignificant genetic “noise” (Mitelman, 2000). The prize of cytogenetic studies is the identification of common karyotypes or chromosomal markers (CCAs) with clinical significance. The best example is the identification of the Ph chromosome for CML. The majority of NCCAs have been ignored and do not even appear on cytogenetic reports. In our experience, however, CCAs are exceptions and NCCAs represent the general rule when detecting chromosomal aberrations. With SKY technology specifically, in the case of most unstable cancer samples, it is hard to detect “identical” abnormal karyotypes! Yes, some common marker chromosomes can be detected, but so can the many other nonshared ones. The data from Fig. 3.2 can only be obtained from some very stable cell lines.

FIGURE 3.2 Example of nonclonal chromosome aberrations (NCCAs) and clonal chromosome aberrations (CCAs). Spectral karyotyping (SKY) karyotypes of four tumor cells isolated from a xenograft mouse model of human breast disease illustrating CCAs and NCCAs. Images (A) and (B) represent CCAs, as the two cells share identical sets of altered chromosomes (both structural and numerical alterations; a total of seven translocations and two aneuploidies), highlighted by light yellow coloration. These altered chromosomes clearly belong to the recurrent aberration category. Images (C) and (D) represent both CCAs and simple types of NCCAs. In addition to the CCAs identical to those in (A) and (B), there are additional chromosomal changes highlighted by the light blue coloration that are not shared by other cells. These nonclonal alterations belong to the nonrecurrent aberration category. Image (C) shows one numerical NCCA (this cell gained an additional copy of chromosome 2) and (D) shows one structural NCCA (this cell gained an additional translocation involving chromosomes 20 and 22). Both image and figure legend are modified from Heng et al., 2006b, with permission from John Wiley and Sons.


We soon discovered that not only can NCCAs be detected in almost all systems we examined but also that their frequencies are associated with many genetic and experimental conditions, such as gene mutations, oncoprotein expression, culture conditions, stage of the cell lines, response to drug treatment, etc. The list quickly becomes longer and longer. After years of research, the linkage between NCCAs and CIN has been clearly established (Heng et al., 2006a-c, 2013a-b), which is very useful for studying cancer evolution, as CIN functions as a common driver for cancer evolution. Taken together, NCCA frequencies have been identified as a key feature of system instability which is essential to the cancer phenotype; elevated NCCAs can be linked to almost all oncogenes, tumor suppressor genes, dysfunctions of important pathways and any experimental manipulation that involves genome instability and cancer. These phenotypes include onco-virus infection, aging, inflammation, mitochondrial dysfunction, radiation, epigenetic dysfunction, and others. Stochastic genome alterations are not “noise” but rather drivers of cancer evolution as well as an indicator of the status of a system’s stability. In addition, many of the NCCAs could be evolutionarily neutral at an earlier stage, and increased NCCA frequency results in increased evolutionary potential. Heng, 2015

Following the establishment that NCCAs increase cancer evolutionary potential by providing genomic variability and can thus be used as a reliable index for monitoring CIN, a theoretical framework was urgently needed to search for the mechanism of how NCCAs work. One breakthrough was the realization that chromosomes define inheritance, and altered chromosomes result in an altered genomic blueprint (Chapter 4). If these NCCAs can change the genomic information for host cells, then the traditional practice of ignoring them in order to simplify the analysis is problematic. We have thus dedicated much of our effort to synthesizing the genome-based inheritance theory. Meanwhile, the following conclusions have also been reached:

1. The NCCA/CCA cycle can be used to measure the dynamics and stability, as well as the growth and survival status, of a cellular population (Heng et al., 2006a-c; Ye et al., 2007). This relationship can be applied to many similar situations in which a biological system and its response to changes is the focus of study (e.g., heterogeneity/homogeneity; fuzziness/precision; outliers/average). The common basis for these paired relationships is evolutionary potential and system stability. Of equal importance, the NCCA/CCA relationship can bridge the gap between genotype and phenotype. Obviously, evolutionary potential is of ultimate importance when studying the cellular evolutionary process. For example, NCCA/CCA studies can serve as a good model for studying the relationship between passenger mutations and driver mutations. Clearly, we must not ignore these so-called passenger mutations.

2. The NCCA/CCA relationship can be used for studying the system stress response and in particular for identifying how the system changes under stress. For example, it has been noticed that many of the popular experimental manipulations can drastically change the system under investigation, as reflected by new emergent pathways and genomes (Stepanenko and Heng, 2017; also see Chapter 8). Such observations explain the key limitation of many experimental systems. In the future, monitoring the NCCA/CCA relationship needs to be carried out when genetic manipulation is applied.

3. The chaotic genome belongs to one complicated type of NCCAs. There are many more unreported NCCAs which need to be systematically analyzed (Heng et al., 2004c, 2006a-c; 2013b).

4. The overall degree of NCCAs, as well as some specific subtypes, can be linked to diseases. For example, elevated NCCAs have been linked to various diseases such as Gulf War illness. Elevated NCCAs can also be used for cancer diagnosis (Heng et al., 2011a-b, 2013a-c; 2018; Liu et al., 2018; for more, see Chapter 8). Recently, an increasing number of clinically related reports have involved the study of NCCAs (Chandrakasan et al., 2011; Niederwieser et al., 2016; Rangel et al., 2017; Vargas-Rondon et al., 2017; Ramos et al., 2018; Chin et al., 2018).

5. The universal detection of NCCAs, both in vitro and in vivo, including germlines, suggests the importance of fuzzy inheritance. The fact that each cell line and individual displays its own frequencies of NCCAs suggests that inheritance is likely defined by the range of changes, rather than specific changes (Heng, 2015; see Chapter 4).

6. Both the frequencies and complexity of NCCAs are linked to tumorigenicity, the aggressiveness of cancer, and drug resistance (see Section 3.3.3).

7. NCCA studies have led to the discovery of the evolutionary mechanism of cancer, which can unify diverse molecular pathways (see Section 3.3.4) (Ye et al., 2009; Heng et al., 2013a). Furthermore, the phase transition between NCCA and CCA holds the key to understanding the relationship between macro- and microevolution. NCCAs might serve as agents facilitating the emergence of a cellular population, and NCCA-defined heterogeneity likely plays an important role in this process, which explains why a small amount of NCCAs may be more significant in determining the direction of evolution during crises compared with normal stable conditions.

Another significant contribution of NCCA studies has been to solve the confusion of the clonal concept.

For a long time, most molecular geneticists considered cellular populations obtained from the same source to be clonal. Despite our continuous effort to highlight this confusion in the field, only a limited number of researchers have gotten the message. The key is the realization that NCCAs represent a common feature for most cell lines, and each occurrence of NCCAs represents a new genome system. The term “clonal” has two meanings: lineage and identity (or similarity). Although a clear lineage may be determined based on historical information and/or short sequences of DNA, it does not mean that the cells share the same genome. For example, parental and daughter cells are connected by lineage, but in most cancers, parental and daughter cells often display different genome identities. In contrast to the traditional assumption that genome-level change would be passed on to the daughter cells if the cell divides, for these cells with unstable genomes, parental cells cannot pass on the same genome (Heng et al., 2006a-c; 2010, 2011a; 2013a). This leads to the unique feature of the cancer cell population, where an entire cell population can display different types of genomes. Horne et al., 2015c (with permission from John Wiley and Sons).

3.3.2 Two Phases of Cancer Evolution

The two phases of cancer evolution (punctuated phase followed by the stepwise phase) were discovered by us when we performed the “watching karyotype evolution in action” experiment. Briefly, an in vitro immortalization model of Li-Fraumeni fibroblast cells was used to illustrate the relationship between genotype and phenotype during evolution. In this model (Bischoff et al., 1990), different passages of cultured cells could be traced, starting from normal fibroblast cells; early passage cells (population doublings (pd) < 15) were followed by cells in crisis (pd 25-30), then cells entering the postcrisis stage (pd 40-70), and finally cells becoming an immortalized cell line (pd > 200). Cells representing different stages were harvested to make chromosomal slides, and SKY analyses were performed on these cell populations. Karyotype evolution was compared among these different stages of cells, and patterns of NCCAs and CCAs were compared. Two types of information were expected: (1) The profile of karyotype changes would correspond to cellular phenotype, as well as the evolutionary process. In particular, we hoped to identify specific chromosome aberrations that cause immortalization and to illustrate detailed clonal expansions associated with the process of immortalization. (2) The pattern of multiple runs of parallel evolutions. The rationale was that comparison of these patterns would reveal a similar karyotype for the same phenotype of immortalization process (including early passage, crisis stage, postcrisis stage, and stable cell lines).

FIGURE 3.3 Examples of karyotypes from earlier, precrisis, and postcrisis stages. Three spectral karyotyping (SKY) images representing main karyotypes of three stages, which illustrate the discontinuous patterns of karyotypic evolution from early passages to postcrisis stage. At passage number 7 (pd7) (top panel), there were three clonal chromosomes with translocations (indicated by arrows). At pd19 (middle panel), these three rearranged chromosomes in pd7 were absent and were replaced by many new rearranged chromosomes (both clonal and nonclonal, indicated by arrows). At pd54 (lower panel), again, nearly all of the rearranged chromosomes from pd19 were absent and were replaced by new rearranged chromosomes (most of which were CCAs). Most of the CCAs detected from pd54 (indicated by arrows) were faithfully passed onto >pd200. Overall, there is no fixed clonal expansion of specific CCAs across the whole process of karyotypic evolution. Modified from Heng et al. (2006a). Stochastic cancer progression driven by nonclonal chromosome aberrations. Journal of Cellular Physiology, 208(2), 461-472. https://doi.org/10.1002/jcp.20685.

Most surprisingly, there was no persistent clonal expansion at the karyotype level during the earlier stages (from pd7 to pd30). As illustrated in Fig. 3.3, at pd7, there were already three clonal translocations, in addition to many instances of aneuploidy. Although these translocations were shared among pd7 cells (about 25% of cells), none of them passed into pd19 cells. For pd19, there was a newly formed translocation set (over 75% of the cells). However, at pd54, a few persistent marker chromosomes appeared, which replaced the clonal markers detected from pd19. Unlike all cells before crisis, all clonal translocations from pd54 were faithfully passed through many passages (up to pd > 300) (see Heng et al., 2006a). Putting all NCCAs and CCAs data together within the evolutionary process, the linkage between karyotype pattern, genome instability, and the resulting phenotype becomes obvious, resulting in a general pattern of NCCAs/CCAs frequency and types during the entire immortalization process (Fig. 3.4).
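The turnover of clonal markers between sampled passages described above can be expressed as a simple carry-over fraction. The sketch below is only an illustration: the marker labels, the `carry_over` and `label_transitions` helpers, and the 0.5 threshold are hypothetical, and the marker sets merely mimic the reported pattern (markers at pd7 absent at pd19, markers at pd19 replaced at pd54, markers at pd54 retained in later passages); they are not the published data.

```python
def carry_over(earlier, later):
    """Fraction of the earlier stage's clonal markers still present in the later stage."""
    if not earlier:
        raise ValueError("the earlier stage must have at least one clonal marker")
    return len(earlier & later) / len(earlier)


def label_transitions(stages, threshold=0.5):
    """Label each consecutive pair of stages by how many clonal markers persist.

    stages: list of (stage_name, set_of_clonal_marker_labels) in culture order.
    threshold: assumed cut-off; below it, inheritance of clonal markers is
               treated as discontinuous, otherwise as continuous.
    """
    rows = []
    for (name_a, markers_a), (name_b, markers_b) in zip(stages, stages[1:]):
        share = carry_over(markers_a, markers_b)
        pattern = "continuous" if share >= threshold else "discontinuous"
        rows.append((name_a, name_b, share, pattern))
    return rows


# Hypothetical marker sets that only mimic the turnover pattern in the text.
stages = [
    ("pd7", {"m1", "m2", "m3"}),
    ("pd19", {"m4", "m5"}),
    ("pd54", {"m6", "m7", "m8"}),
    ("pd200", {"m6", "m7", "m8"}),
]

for name_a, name_b, share, pattern in label_transitions(stages):
    print(f"{name_a} -> {name_b}: {share:.0%} of clonal markers carried over ({pattern})")
```

Under these assumptions, the precrisis transitions come out as discontinuous and the postcrisis transition as continuous, which is the contrast the karyotype data are used to draw.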

FIGURE 3.4 The two phases of cancer evolution illustrated by karyotypic pattern and phenotypes. A diagram of karyotypic patterns evolving during the immortalization process and its relationship with corresponding cellular phenotypes. Different colors within the pie diagram (green, purple, orange, blue, and red) represent different clonal patterns that were detected at each stage of the continuous cell culture. Before the crisis stage (from pd7 to pd24), any clonal chromosomal aberrations that occurred were short-lived and were replaced by new karyotypes with limited clonal chromosomal marks. From pd19 to pd24, there were very limited clonal aberrations in each cell, usually one to two, and the majority of chromosome aberrations belonged to the NCCAs. Cellular crisis led to massive cell death, and a surviving cell emerged. After the crisis, however, all clonal chromosomal aberrations were faithfully passed through to the later passages and displayed continuous patterns of evolution or stepwise clonal expansion (represented by the red pie). The colored dots and the area underneath the curve represent the level of NCCAs detected from each corresponding stage. The higher the peak of the curve, the higher the genome instability detected from the cell populations. High levels of NCCAs were linked to the discontinuous patterns of CCAs. Phenotypes (such as growth status and immortalization features) are placed in their corresponding stages of continuous culture. Modified from Heng et al., 2006a, Stochastic cancer progression driven by nonclonal chromosome aberrations. Journal of Cellular Physiology, 208(2), 461-472. https://doi.org/10.1002/jcp.20685.


3. GENOME CHAOS AND MACROCELLULAR EVOLUTION

These were truly eye-opening data! With these interesting patterns of karyotype evolution, we suddenly realized many hidden facts of somatic cellular evolution. This decade-long research effort has generated a great deal of interesting data and new ideas. The following brief discussions summarize some rationales, terminologies, and implications. For further details and additional discussions, see Heng, 2015.

1. Terminologies:
Punctuated and stepwise evolution initially referred to the karyotype pattern observed in an immortalization model where both nonclonal and clonal expansion were detected (Heng, 2006a). Now, these concepts also apply to the DNA level, as sequencing efforts have recently confirmed these evolutionary phases in cancer progression (Baca et al., 2013; Navin et al., 2011; Stephens et al., 2011). Different from clonal diversification, there are massive infrequent chromosomal aberrations within the punctuated phase leading to stochastic genome re-organization, interrupting the inheritance of karyotypes between mother and daughter cells. In the clonal phase, however, the majority of cells are clonal across generations with traceable karyotype diversification. Punctuated equilibrium was proposed to explain why most species exhibit minimal net evolutionary change (phenotype) for most of their geological history, and significant evolutionary changes occur rarely and rapidly (on a geologic time scale). We borrowed the term punctuated to describe the rapid and drastic genome-level changes in contrast to stepwise evolution where the same genome is maintained coupled with possible gene mutation accumulation. Horne et al., 2015c (with permission from John Wiley and Sons).

The same pattern was later observed in different systems (human and mouse) with various treatments, which illustrated that it is a general rule in cancer evolution. The punctuated phase is often observed during major transitions (immortalization, transformation, metastasis, and drug resistance), representing a common mechanism of macrocellular evolution. The concept of punctuated equilibrium was originally suggested by Niles Eldredge and Stephen Jay Gould to explain why most species exhibit minimal net phenotypic changes for the majority of their geological history and why key evolutionary changes happen rarely and rapidly on a geologic time scale (Eldredge and Gould, 1972). In contrast, our definition of punctuated evolution focuses on the pattern of genotypic (karyotypic) evolution, which serves as the basis for phenotypic changes. The two phases of karyotype evolution offer a better explanation for the genetic/genomic basis of punctuated phenotypic evolution, which drastically differs from Eldredge and Gould’s viewpoint (for more discussion, see Chapter 6). Moreover, the concurrent mapping of NCCA/CCA profiles with the two phases and the transition between them has identified system stress as a trigger for the phase transition. It was clearly difficult for Gould to explain, when focusing only on phenotype, how the accumulation of small changes leads to big change. That was perhaps one reason why he downplayed the great significance of

3.3 GENOME CHAOS


punctuated equilibrium later in his life. Unsurprisingly, when the terms “stochastic/unpredictable” or “punctuated” cancer evolution were initially introduced (Heng et al., 2006a-c, 2009; Heng, 2007b), it was challenging to get this message across. We have faced great difficulties in communicating with mainstream evolutionary biologists about the genomic punctuated pattern of evolution. Following the large-scale sequencing of cancer genomes, however, this terminology has finally become popular. Still, confusion persists in regard to the concepts of genotype versus phenotype, as well as gene mutation versus genome reorganization, despite many of our writings.

2. The Dynamics of the Two Phases of Evolution
a. These two phases are interchangeable, depending on system stability. In other words, system instability and evolutionary selection are the mechanism of transition between these two phases. It is likely that multiple cycles of this biphasic evolution are required for cancer evolution to occur.
b. Both the frequency and the degree of complexity of altered genomes are important in defining the punctuated phase. For example, the chaotic genome, one type of complex karyotype, can frequently be detected in the punctuated phase. Although chaotic genomes have been observed previously, no serious studies have been initiated to systematically characterize them. With their vivid color and overwhelming frequencies, it was no longer reasonable to continue ignoring them.
c. The two phases of evolution require different types of genetic/genomic mechanisms. The punctuated phase is coupled with new genomic information defined by chromosomal reorganization, whereas the stepwise phase is dominated by genetic information defined by gene mutation.

3. Implications
a. Perhaps one of the most important syntheses is to link the punctuated phase to macrocellular evolution and furthermore to organismal evolution (Chapter 6). For example, it explains why many intermediate karyotypes are not detected at late stages of evolution, as they existed only during short periods of time and within limited individuals.
b. The unexpectedly discontinuous pattern of genomic information also clarified for us the necessity of redefining system inheritance and fuzzy inheritance (Chapter 4).
c. This pattern suggests a new model of cancer, which focuses on the importance of creating the new genome at the punctuated phase and


subsequently duplicating more cells with the selected genome with the help of “oncogenes” (see Section 3.4).
d. It also explains the mechanism of genome reorganization-mediated drug resistance, as well as why cancer can be eliminated in some linear models while these models are hard to apply in the clinic. In contrast, the cycle of two phases and their stochastic response to treatment are much harder to predict.
e. The two phases of evolution also played an important role in encouraging us to reexamine the main function of sexual reproduction (Heng, 2007b; Gorelick and Heng, 2011) (for more information, see Chapter 5) and to develop the new model of how macroevolution followed by microevolution works for organismal evolution (Chapter 6).

3.3.3 Genome Chaos: Reorganizing the Genomic Landscape

Complex types of chromosomal translocations are commonly detected in cancer samples. However, the molecular mechanism of their formation is unclear. For example, how do these complex translocations occur? Are they formed at once or in a few separate steps? It has been a long and difficult journey to characterize these ignored structures and to understand both their considerable significance in genomic research and their implications for human diseases. We observed the drug-induced complex type of translocations back in the early 1980s during our studies of chromosomal condensation. Following treatment with inhibitors of topoisomerase II, many cell lines, as well as frog lymphocytes in culture, can display chaotic mitotic figures in which many chromosomes join together to form new chromosomes with strange morphologies. However, without SKY methods, this striking complexity was not appreciated by various reviewers. Lacking a mechanistic explanation for this phenomenon, we failed to convince even some of the leading cytogeneticists and cell biologists such as Joseph Gall and TC Hsu (years later, TC Hsu admitted that he should have paid more attention to these structures). Despite the fact that these highly chaotic karyotype images have been discussed with many scientists since the early 1980s, including in proposals for the NSF and NIH since the 1990s, few have realized their significance. Many of these original images were finally published only recently, more than three decades after they were first observed (Heng et al., 2004c, 2013b; Stevens et al., 2007). During our studies of cellular immortalization, we have observed a large number of highly complex karyotypes, in which some altered chromosomes consisted of more than six translocations and yet were still stable (Heng et al., 2006a).


Soon, we demonstrated that these drastically changed karyotypes are frequently observed in the punctuated phase of cancer evolution, and a small portion of them can be passed into the stepwise phase. They can also be induced with high efficiency by drug treatment with modified protocols (Heng, 2007c; Liu et al., 2014; Ye et al., 2018a). The following figure illustrates morphological changes before and following drug-induced genome chaos (Fig. 3.5). Before treatment, most cells display a normal chromosomal morphology, with each chromosome displaying one SKY color. After treatment, most cells display either chromosomal fragments or highly complex translocations, illustrated by the multiple colors on these newly formed chromosomes. Not only are there many chromosomes in one mitotic figure but also many of the newly formed chromosomes are much longer than normal chromosomes. Clearly, massive chromosomal reorganization has occurred. We refer to this process of complex, rapid genome reorganization, resulting in the formation of chaotic genomes (both structurally and numerically), as genome chaos or karyotype chaos (Heng, 2007c; Liu et al., 2014; Heng, 2015). In the past decade, following the establishment of an effective platform for inducing genome chaos, various types of chaotic karyotypes have been characterized. Furthermore, by connecting the dots between stress, C-Frag, CIN, chaotic karyotypes, the transcriptome, cellular survival, and the emergence of new and stable karyotypes, the molecular mechanisms and biological significance of genome chaos have been illustrated (Stevens et al., 2007; Heng, 2009). The following discussions represent some examples:

1. What Happened?
When drastically altered chromosomes were observed, most researchers thought that these “bizarre-looking” chromosomes would certainly lead to cell death and were thus meaningless. That was perhaps the main reason that some types of these structures have been ignored for decades. 
During studies of chromosome condensation, it was realized that a small proportion of them will be able to survive and become different classical types of chromosomal aberrations, such as simple translocations and aneuploidy (Heng et al., 1988). Most importantly, it was further realized that many of these highly disorganized chromosomes are survivable, as evidenced by the observation that they can form clonal karyotypes essential for cellular immortalization and drug resistance (Heng et al., 2006a, 2008; Stevens et al., 2007; Heng, 2009). Using the experimental induction method, chaotic genomes can be easily generated. The advantage of using this method is that the entire process of genome chaos can be monitored. Immediately after drug induction and rescuing of the treated cells, massive cell death occurs,


FIGURE 3.5 This figure illustrates the transition from a relatively normal genome to a chaotic genome. Both normal and chaotic mitotic figures were detected by SKY. In the before-treatment cell, individual chromosomes were observed featuring only one color. In contrast, in the posttreatment cell, there are massive chromosomal translocations (shown by multiple colors on the same chromosome) and chromosome fragments.

coupled with high levels of C-Frag. Then, the chaotic genomes appear. The entire process includes cell death with chromosomal fragments, the rejoining process of chaotic yet unstable genomes, and the emergence of more stable but simpler karyotypes. This process is illustrated in Fig. 3.6. Such experiments have been repeated multiple times to compare the patterns of chaotic genome evolution. Regardless of the types of drugs and cancer cell lines used, the pattern is highly conserved. The following conclusions were established after comparison of multiple runs of chaotic genome evolution:
a. The entire process can last from 3 weeks to 2-3 months. Different treatments prolong certain phases. For example, when mitotic inhibitors are used to disrupt microtubules, the chromosome fragmentation phase can be delayed because of an additional phase in which polyploid cells accumulate, followed by additional chromosome fragmentation caused by polyploidy, and finally by the chaotic genome phase and its selection.
b. Although many diverse factors can trigger the process when generating the maximal capacity for potential cell death, the overall instability of the cellular population is key. For normal cells with functional checkpoints, it is much harder to induce high levels of genome chaos.
c. Genome chaos generally only represents a transitional event in this process. Only scoring it from cancer samples, which are the end


FIGURE 3.6 Spectral karyotyping (SKY) images of the sequential process of drug-induced genome chaos (A to D) and diagrams to illustrate the mechanism of reorganizing gene order along chromosomes (E to G). For genome chaos to occur, a cell with a relatively normal karyotype, illustrated by SKY where each chromosome displays one color (A), responds to stress by undergoing chromosome fragmentation (B). Incomplete cell death by chromosome fragmentation results in rejoining of chromosomal fragments through nonhomologous end joining and other mechanisms to produce chaotic genomes (C). Examples of highly rearranged chromosomes are enlarged (bottom) to show detail. The massive amount of cellular heterogeneity that occurs as a result of genome chaos induction then undergoes selection, and a winning, typically less chaotic, karyotype eventually expands clonally (D). E to G illustrate the mechanism of destroying and recreating new chromosomal coding templates from the parental genome (E) to newly emerged genomes (G) through stress-induced chromosome fragmentation-mediated genome reorganization (F). E to G simplify the events taking place in A to D (reused from Liu et al., 2014, Genome chaos: Survival strategy during crisis. Cell Cycle, 13, 528-537; and Heng et al., (2011b). Evolutionary mechanisms and diversity in cancer. Advances in Cancer Research, 112, 217-253. https://doi.org/10.1016/b978-012-387688-1.00008-9).

products of the process, will certainly understate its important contribution to cancer evolution. Rather, one must trace the entire process of cancer evolution (this is similar to the missing links in organismal evolution). In other words, even though genome chaos does occur after all drug treatments, if we only examine the last stage, we will likely generate inaccurate conclusions based only on the resulting stable karyotypes. The final message is that for many cancer types, just because there are fewer chaotic genomes


observed at the late stages of clinical samples, it does not mean that less genome chaos has been involved initially. Now, it is clear that for prostate cancer and breast cancer, even at the late stages of clinical diagnosis, the volume of chaotic genomes is overwhelming.
d. Multiple runs of drug-induced evolution produce resistant cells with different karyotypes. No two similar karyotypes were selected from different runs of the experiment. Nevertheless, the final stages of these stable karyotypes are generally much simpler than the initial chaotic genome.
e. The gene expression profile is highly dynamic for cell populations within the phase of a chaotic genome (Fig. 3.7) (Stevens et al., 2013a-b, 2014; Heng, 2015). These dynamics explain the importance of outliers in macrocellular evolution. They also demonstrate the low potential for reproducing specific patterns of the transcriptome when the cell population comprises unstable genomes. When the overall transcriptome is drastically altered and multiple parallel experiments generate different pathway specificities, researchers need to report the full landscape of the transcriptome instead of cherry-picking a few pathways. Recently, single-cell transcriptomics has reinforced this point.

[Figure 3.7 plot: three-dimensional principal component plot; axes: Component 1, Component 2, Component 3; sample groups: Ambion, p109, p109 dox]

FIGURE 3.7 Principal component analysis of global gene expression data demonstrates that replicate samples from cellular populations with elevated levels of genome chaos (pd109 dox, brown spheres) are less similar than replicate samples from cellular populations with less genome chaos (pd109, blue spheres). Expression profiles from pd109 replicates differ more from each other than profiles derived from hybridization of Ambion reference RNAs to the same microarray chip. Similarity between samples is represented by the three-dimensional distance between the plotted spheres (reused from Liu et al. (2014). Genome chaos: Survival strategy during crisis. Cell Cycle, 13, 528-537).


f. The highest frequencies of chaotic genomes are often associated with major transitions of cancer evolution (e.g., immediately before immortalization, transformation, and metastasis and immediately after high-dosage drug treatment). Moreover, the appropriate treatment, adequate reculture conditions, and the size of the cell population are important for producing high frequencies of chaotic genomes and the emergence of new effective cell populations. In addition, the inhibition of nonhomologous end joining can reduce the frequency of genome reorganization, although it cannot completely prevent rejoining.

2. Causative Factors
Following the tradition of molecular genetics, efforts were made to search for key factors that cause genome chaos. However, based on the cellular immortalization model (Fig. 3.4), the culturing process and cellular crisis are clearly associated with genome chaos. When additional models were analyzed, the circumstances became more complicated. For example, different drugs that involved different molecular mechanisms induced the same phenomenon, suggesting that many different factors can be linked to genome chaos. Further experimentation and publications could potentially indicate a link between chaotic genomes and many additional diverse molecular events including viral infection, different molecular targets with small-molecule inhibitors, culture conditions, and the inactivation of many genes. Because diverse molecular pathways can likewise be linked to C-Frag, with the common linkage among these being general stress on the systems (Stevens et al., 2011b), and as C-Frag represents one stage of genome chaos, this common stress can be linked to the chaotic genome as well. Despite the above analyses, many researchers are still focused on single molecular mechanisms to explain genome reorganization. So far, the status of p53 and the formation of micronuclei have been used to illustrate the mechanism of chromothripsis. 
As illustrated in the evolutionary mechanism of cancer, the same phenotype of genome reorganization-mediated cellular survival can be achieved by diverse individual molecular mechanisms. The common mechanism of stress response genome context change should be the focus of current research (Ye et al., 2018a, b).

3. Mechanisms
Genome chaos obviously represents a powerful cellular survival strategy. Therefore, a common mechanism, rather than a specific molecular mechanism, should be established. This mechanism is explained with the following concepts:
a. The evolutionary mechanism represents the cell’s genomic response to a high level of stress which can eliminate the majority of cells. The ultimate result of this mechanism is cellular survival by the creation of new viable genomes.


b. The genomic information mechanism represents an effective way of creating a new genome by altering the genomic code via genome reshuffling. Some of the newly formed genomes can survive in very harsh conditions. Interestingly, during cellular recovery following drug induction of chaotic genomes, there is a massive amount of cell death as well as rapid rounds of cell fusion and division in search of functional genome types before the establishment of stable genomes (Fig. 3.8). Although most will not reach the final stage, many of these transitional genomes are still important for passing genomic information. This continuous process of genome reshuffling provides the best opportunity for generating potentially successful genomes via trial and error (with the necessary genomic information). This also explains why there are many clusters of nuclei, reflecting the fusion/fission process, observed during genome chaos. Further analyses based on the genome theory suggest that the dynamics of fusion/fission cycles are under selection for minimally survivable genomes. Nuclear clusters have been examined by SKY and other cytogenetic analyses. They show that many small nuclei contain only one to a few chromosomes, which is clearly less genomic material than is needed for a minimally survivable genome (Fig. 3.8C-D). The fusion process sometimes brings suitable parts of genetic material together, and some combinations can be good enough to form new genomes that will survive and achieve evolutionary selection despite the massive population death. Such processes are similar to mitochondrial fusion/fission cycles when searching for maximal membrane potential. Interestingly, in addition to standard genome reorganization, the integration of DNA fragments can simultaneously occur, as discussed in the section on fuzzy inheritance, as genome reshuffling and massive cell death can also produce more DNA fragments.
c. The outliers succeed under high-stress conditions: although somatic evolution favors small changes with increased fitness in the microcellular evolutionary phase, drastically altered outliers have an increased chance of survival under high-stress conditions (for more, see Chapter 4).
d. The chaotic genome provides an elevated level of transcriptome dynamics: transcriptome dynamics were compared following key stages of genome chaos, and the altered karyotypes were closely linked to transcriptome profiles. It is challenging to link these karyotype alterations to a specific profile other than an elevated level of dynamics (Stevens et al., 2013a, b, 2014; Heng, 2015). In a cover page feature article titled “Reshaping the Cancer Transcriptome,” Genetic Engineering & Biotechnology News (GEN) has highlighted our publication (Stein, 2014).
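The notion of an "elevated level of transcriptome dynamics" can be made concrete by measuring how far replicate expression profiles sit from each other. The sketch below is a deliberately simplified stand-in for the principal component analysis of Fig. 3.7: it computes the mean pairwise Euclidean distance between replicates in the full expression space rather than in the first three components. The function name and all expression values are hypothetical and serve only to illustrate the comparison.

```python
from itertools import combinations
from math import dist
from statistics import mean


def replicate_dispersion(profiles):
    """Mean pairwise Euclidean distance between replicate expression profiles.

    profiles: list of equal-length lists of expression values, one per replicate.
    A larger value means the replicates agree less with each other.
    """
    if len(profiles) < 2:
        raise ValueError("need at least two replicates")
    return mean(dist(a, b) for a, b in combinations(profiles, 2))


# Hypothetical log-expression values for four genes, three replicates each.
stable_replicates = [
    [5.0, 7.1, 3.2, 9.0],
    [5.1, 7.0, 3.1, 9.1],
    [4.9, 7.2, 3.3, 8.9],
]
chaotic_replicates = [
    [5.0, 7.1, 3.2, 9.0],
    [6.4, 5.2, 4.9, 7.5],
    [3.1, 8.8, 2.0, 10.6],
]

stable_score = replicate_dispersion(stable_replicates)
chaotic_score = replicate_dispersion(chaotic_replicates)
print(f"stable: {stable_score:.2f}, chaotic: {chaotic_score:.2f}")
```

A single dispersion score per condition reports the overall spread of the replicates rather than any one pathway, which is in keeping with the point made above that the dynamics, not a specific profile, are the reproducible feature.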


FIGURE 3.8 Examples of unusual rounds of cell division and spectral karyotyping (SKY)/reverse DAPI images of nuclear clusters. (A) An image of a giant nucleus (DAPI image) detected in HT-29 cells (a human colon cancer cell line) cultured in situ. Typical normal-sized nuclei surround the giant nucleus. (B) An image of a cluster of cells derived from one giant nucleus. Because many of these cells are stochastically generated and display different amounts of DNA, these cells represent nonclonal chromosome aberrations when they enter into metaphase. Live imaging shows that there are continuous division/fusion events for unstable cancer cells, suggesting a new means of generating fuzzy inheritance (Heng et al., 2016c; Heng et al. unpublished observations). (C) Multicolor SKY was used to profile the different sizes of micronuclei (MN). While the bigger nuclei display different colors, illustrating the multiple chromosomal compositions, the small nuclei often display one color, indicating that they formed from the material of one chromosome. According to the reverse DAPI image of the same SKY figure (D), the one-color MN are from one chromosome, as judged by the number of centromere signals within each MN. The bigger MN are linked to a number of chromosomes. A and B are adapted from Heng et al., 2016a; C and D are reused from Heng et al., 2013b, Karyotype heterogeneity and unclassified chromosomal abnormalities. Cytogenetic and Genome Research, 139(3), 144-157. https://doi.org/10.1159/000348682, with permission from Karger.


4. Implications
When this general mechanism of genome chaos is understood, it becomes clear why genome chaos can be detected in all major transitions in cancer, as the emergence of new systems with newly produced genomes represents the most effective way to achieve phase transition. It also explains why all of these transitions can be linked to very different karyotypes and even to diverse gene mutations. Furthermore, it explains why there are so many bizarre-looking structures during the punctuated phase of cellular evolution, as they all represent potential genomic packages essential for macrocellular evolution in which maximal heterogeneity is the key to success. Surely, because of the function of sex (see Chapter 5) and the separation of somatic cells and germline (see Chapter 6), the germline transmission of a drastically altered genome is highly unlikely in the human population, except in some cases featuring small localized reorganization which “cheats” or escapes a checkpoint phase. A germline chromothripsis event, for example, has been detected in 11 individuals through three generations. Although numerous recurrent miscarriages have been associated with this family, the specific chromothripsis appeared stable during transmission (Bertelsen et al., 2016). Interestingly, Barbara McClintock emphasized the importance of drastic genome changes (see Chapters 1 and 2), although she was unaware at the time of how genome reconstruction occurs. The “genetic earthquake” she spoke of, however, most likely signified genome chaos, and telomere dysfunction was linked to drastic genomic reorganization. That was the very reason she stated, during her Nobel Prize speech, that understanding genome reorganization is the future of genetics. 
During a workshop called “Requirements for the Cure of Cancer: Formulating a Plan of Action,” which was organized in 2007 by Arny Glazier and co-sponsored by the Van Andel Research Institute (Grand Rapids, Michigan), chaotic genomes and their linkage to cancer evolution were presented. The idea of the importance of genome chaos was instantly admired by many scholars including Dr. Don Coffey (who was on the National Cancer Advisory Board). He predicted that the understanding of how genome chaos occurs holds the key to current cancer research. He also advised against publishing the findings of genome chaos for now as such drastic findings may have unanticipated consequences on chemotherapy (treatment-induced drug resistance could change people’s attitudes toward treatment options). The formal publication of the potential effects of chemotherapy and risk of inducing chaotic genomes has thus been delayed for a number of years, even though it has been presented in a number of conferences (Heng, 2007c; Heng et al., 2008).


One of the likely reasons many are reluctant to accept our findings is that these chromosomal changes are too drastic and puzzling in light of conventional genetic and evolutionary concepts. Therefore, according to many, these structures must be artifacts of in vitro culture and must be disregarded. It is hard for them to acknowledge that these structures have any biological meaning. The chaotic genome was rediscovered after sequencing patient samples during the cancer genome projects. However, many different terms have been coined, likely because most researchers are unfamiliar with published molecular cytogenetic data. Many nonscientific publications including The New York Times have reported these discoveries and used descriptions such as “monster cancer chromosomes,” “Frankenstein DNA generated by shattered chromosomes,” and “massive genome havoc” (Slezak, 2014). The following are some new terms introduced in recent years:

Recently, the cancer genome sequencing project has generated large amounts of data, which inevitably confirmed the importance of studying genome chaos (Stephens et al., 2011; Horne and Heng, 2014; Heng 2017a, b). Various chaotic genomes were detected within nearly all types of cancers. In some cancer types such as prostate cancer, chaotic genomes were detected in a majority of cases. Interestingly, these fragmented and stitched chromosomes were given many different names by different investigators including “chromothripsis,” “chromoplexy,” “chromoanagenesis,” “chromoanasynthesis,” “chromosome catastrophes,” and “structural mutations” (Baca et al., 2013; Crasta et al., 2012; Forment et al., 2012; Holland and Cleveland, 2012; Inaki and Liu, 2012; Jones and Jallepalli, 2012; Liu et al., 2011; Malhotra et al., 2013; Righolt and Mai, 2012; Setlur and Lee, 2012; Tubio and Estivill, 2011). Based on their descriptions chromothripsis refers to the chaotic genome mainly involving local re-organization (within a single chromosome), while chromoplexy refers to a more whole genome re-organization involving many individual chromosomes. We prefer using “genome chaos,” “karyotype chaos,” or “chromosomal chaos” to refer these structures due to their broad coverage (from local to global re-organization, from structural to numerical changes) and the simplicity of terminology (Heng et al., 2006a; Heng 2007c; 2015).
Ye et al., 2018a

Because the majority of publications on genome chaos are based mainly on sequencing data, less effort has been devoted to studying the pattern of karyotype-mediated cancer evolution. For example, the structural and numerical chaotic genomes can be used to monitor the evolutionary phase. In Fig. 3.9, both human and mouse cell line models illustrate the correlation between the increase in structural and/or numerical chaotic genomes and the key transition of cancer evolution. In the human cell immortalization model (top and middle panels), from preimmortalization to immortalization, both structural and numerical chaotic genomes are drastically reduced. Drug treatment, however, can drastically elevate the level of chaotic genomes, transforming a relatively stable status into an extremely unstable status. For the spontaneous transformation of the mouse


ovarian surface epithelial model, from the earlier (M-E) to the intermediate (M-I) and to the later stage (M-L), cell populations become more aggressive with a rapid growth rate, an enhanced capacity for anchorage-independent growth, and rapid tumorigenic potential in vivo. Similar to the human model, the level of the numerical chaotic genome is reduced from the intermediate to the late stage. As soon as a drug is introduced, the chaotic genome is drastically increased (M-L-Dox). Although the absolute number of chaotic genomes varies within these different systems, the overall trend in system behavior is the same. The drastic pattern change suggests a phase transition between different stages of cancer progression (from before to after transformation; from before to after drug treatment). In addition, treatment can change the pattern of outliers (e.g., becoming more dominant). In other words, the system enters into the chaotic phase ready for macrocellular evolutionary selection. Such comparisons demonstrate the cycling behavior of evolutionary selection and the importance of outliers in major transitions. They also illustrate that many chaotic genomes are induced by drug treatment rather than just preexisting. This observation is of importance in rethinking the strategies of maximal killing in cancer treatment.

FIGURE 3.9 The profile of structural and/or numerical chaotic genomes in different stages of cancer evolution (prior to transition, during transition, immediately after transition, and drug treatment-induced chaos to create a new transition process). The top panel is the profile of structural chaotic genomes from the human immortalization model. The middle panel is the profile of the numerical chaotic genome from the same human cell immortalization model. The lower panel is the profile of the numerical chaotic genome from the spontaneous transformation of the mouse ovarian surface epithelial model. Each column (both blue and red) represents one cell. The Y-axis represents the total number of chromosomes or total chromosomal translocation events for each examined cell, and the X-axis represents different phases of cellular evolution. Blue represents the amount of nonchaotic chromosomes and red represents the amount of chaotic chromosomes. For each treatment group, such as p7, p19, M-E, and Dox (doxorubicin) treated, at least 50 randomly selected mitotic figures were scored for structural and numerical aberrations. The total number of translocations or total number of chromosomes for each cell is listed in each column. For p7 and p54, a portion of cells examined did not display newly obtained chromosomal aberrations.

3.3.4 The Evolutionary Mechanism of Cancer

3.3.4.1 Linking Genome Heterogeneity to Tumorigenesis, Metastasis, and Drug Resistance

One of the main messages from this chapter has been the ultimate importance of genome instability-caused genome heterogeneity in cancer evolution. Most of our efforts (from analyzing the limitations of cancer gene mutation theory to explaining the challenge of identifying common cancer gene mutations by sequencing cancer genomes; from reinterpreting the significance of NCCAs to discovering genome chaos; and from establishing the two phases of cancer evolution to linking macrocellular evolution to genome reorganization) have promoted the points that (1) it

156

3. GENOME CHAOS AND MACROCELLULAR EVOLUTION

is time to focus on the genome, rather than only on individual genes; (2) most stochastic genome alterations are not genetic noise but essential variants for cancer evolution; (3) genome instability and heterogeneity, rather than common gene mutations, comprise the ultimate driver for cancer evolution; (4) genome-level changes define the cellular system, while gene-level changes only modify the systems; (5) the mechanism of individual genes can be unified by genome-level selection, and many previously established gene mutation mechanisms can also be linked to genome-level changes (and thus need to be reinterpreted); (6) the domination of a new genome often represents an instance of punctuated evolution, while the domination of a gene mutation within a cellular population represents stepwise evolution; and (7) drastic genome alterations are essential for survival under crisis or for breaking down the system constraint for emergence into new systems, whereas gene alterations are needed for enlarging the population during the microevolutionary phase. With these concepts/observations at hand, cancer initiation and progression have been characterized as various cycles of punctuated and stepwise evolution (Heng et al., 2006a, 2009), each of which can break a specific constraint. Based on our experiments as well as synthesis of the literature, it is clear that all major transitions involved in cancer evolution can and should be explained by the two phases of evolution. Logically, tumorigenesis, metastasis, and drug resistance have been linked to the two phases of cancer evolution in which genome heterogeneity is the key (Heng, 2007c, 2009). In addition to several conference presentations and increased publications (Heng et al., 2007c, 2008a, b; 2009), the above message has also received local news coverage. 
In 2010, Detroit Business Journal published the following news:

Wayne State University researcher recommends dramatic shift in cancer research

For decades, cancer has been believed to be caused by a sequential accumulation of common gene mutations, with the identification, characterization and targeting of common genetic alterations and their defined pathways dominating the field. A Wayne State University researcher is challenging this notion, however, with evidence that the general mechanism of cancer occurs at the level of the genome, not the gene ...

“Considering cancer as an evolutionary process is vital to both basic research and clinical applications,” Heng said. “Unfortunately, most previous efforts have focused on individual cancer genes, which represent only a small part of the evolutionary story of cancer. It appears that finding a general mechanism will require us looking to the system as a whole: the genome.” ...

Heng’s group applied the newly established genome theory to describe how somatic cells evolve within individual patients. Using cell culture and animal models, they identified three key components of somatic cell evolution that are responsible for cancer formation: increased dynamics induced by stress, elevated genetic and epigenetic heterogeneity, and natural selection mediated by genome alteration.

Results of the study showed a correlation between cancer progression events (immortalization, transformation, metastasis and drug resistance) and changes at the genome level ...

“Decades of cancer research have been conducted by searching for ‘silver bullet’ genes that are common among cancer patients that can be targeted for treatment and prevention, yet no consistent pattern of gene mutations has been found,” Heng said. “Genetic changes at the genome level, however, exist universally in cancer ... the field of cancer research should begin to shift its search for evolutionary mechanisms to the genome level.”

“With the genome theory of cancer evolution, we hope to initiate a new direction in cancer research that focuses on the genome,” he said. “We believe this shift will yield a new platform to fight cancer” (with permission from Wayne State University).

Such a shift takes time. Nearly another decade has passed since our prediction. Now, there are finally some encouraging indications that the ultimate importance of chromosomal aberrations in cancer will be appreciated by an increasing number of molecular cancer researchers. A few good examples were mentioned in a recent review article:

There have been attitude changes towards the study of aneuploidy as well. When direct evidence simultaneously characterized gene mutation and chromosomal aberrations as drivers for the phenotypic implication of metastasis (Gao et al., 2016), the authors clearly emphasized CIN, and the potentially involved gene was not even mentioned in the title. This likely represents a new favored approach focusing on genome-level changes. There is also the realization that chromosomal aberrations contribute more significantly to metastasis than gene mutations do (Bloomfield and Duesberg, 2016) which supports the hypothesis that chromosomal aberration-mediated genome evolution is responsible for all major transitions in cancer evolution, including metastasis and drug resistance. Furthermore, and surprisingly to many molecular researchers, chromosome aberration profiles have been demonstrated to have a much stronger prediction value in the clinic compared to DNA sequencing profiles (Jamal-Hanjani et al., 2017). This conclusion has gained strong support from various cancer genome sequencing projects (Davoli et al., 2017; Zanetti 2017), which prompts an important question regarding the differential contribution of chromosome aberrations and gene mutations to the cancer genotype. All together, rapidly accumulated data has forcefully highlighted the importance of aneuploidy in current cancer research, and more detailed molecular information linking individual gene mutations or epigenetic events to aneuploidy will soon flourish. (Ye et al., 2018b)

It is also notable that some major research papers have now either directly linked genome chaos as a molecular cause to a specific cancer type (Anderson et al., 2018) or concluded that chromosomal instability is driving metastasis (Bakhoum et al., 2018), even though these researchers also used specific molecular mechanisms to explain the importance of the chromosomal alterations. For example, fusion genes and a cytosolic DNA response have been highlighted as consequences of the observed genome alterations, respectively.

The new phase of cancer research, however, needs to pay more attention to the genome level, rather than cherry-picking some impacted genes, as the altered genome codes for a new system (which is much more important than some directly impacted genes). In the case of one of the above examples, the majority of disrupted genes are not known oncogenes, and genome chaos-generated fusions appear to be associated with an aggressive form of Ewing sarcoma. The altered genome structures not directly related to these fusion genes must play an important role. It is known that genome instability functions as a key factor related to the aggressiveness of tumors. Furthermore, as has been demonstrated, almost every run of successful genome alterations can be linked to a specific gene (see Section 3.4 about the new cancer model). The continuous effort of identifying yet another gene mutation to explain how chromosomal alterations contribute to cancer will help researchers to illustrate a specific mechanism, but will not help patients, because there is so much mutation/genome heterogeneity and the mutation landscape is highly dynamic and unpredictable. In contrast, the established relationship between the degree of genome heterogeneity/complexity and system phenotypes (tumorigenesis, metastasis, and drug resistance) can be used to aid diagnosis and treatment, as it often has greater predictive power than an individual gene mutation profile. In particular, following the identification of the specific phase of evolution, the strategy of using system constraints to slow down cellular evolution can be used, which should be much better than the current method of pushing maximal cell killing (for more, see Chapter 8 and Heng, 2015). Recent experiments have demonstrated that, despite its initial power to reduce the number of cancer cells, the maximal dosage of chemo-reagents can paradoxically promote the swift development of drug resistance in the treated cellular population via the induction of genome chaos. Of equal importance, such phenomena can also be produced by other treatments, such as the molecular targeting approach. The bottom line is that, regardless of treatment type, if the massive killing of cancer cells occurs, a new genome system can often emerge (see Section 8.6, Myths in drug resistance, in Debating Cancer; Heng, 2015).

3.3.4.2 Focus on the Evolutionary Mechanism of Cancer Rather Than the Diverse Individual Mechanisms

To further reinforce the idea that researchers need to focus on the common mechanism of genome-mediated cancer evolution rather than on specific gene-mediated mechanisms, which are numerous and highly diverse, it is key to accept the importance of the evolutionary mechanism of cancer.

During comparative studies of tumorigenicity in which five different well-characterized model systems were used, the degree of karyotypic heterogeneity (population diversity) was directly linked to tumorigenicity. Because the tumorigenicity of each model has been linked to different and specific molecular pathways, and there is no common molecular mechanism shared among them, we realized that the common link of tumorigenicity between these diverse models is elevated genome diversity. Based on the concept that genome-level heterogeneity is a key to cancer evolution and that stress can induce increased system dynamics, reflected as increased NCCA frequencies, we proposed that the evolutionary mechanism of cancer is equal to the sum of all individual molecular mechanisms:

$$\text{Evolutionary Mechanism} = \sum \text{Individual Molecular Mechanisms}$$

While there are many individual molecular mechanisms (see Fig. 3.10), there are four key components for understanding how the evolutionary mechanism works: (1) stress-induced system dynamics (e.g., genomic, epigenomic, and increased stochastic changes); (2) population diversity (genome heterogeneity, which can be triggered by diverse gene mutations or molecular pathways when the stress is sufficiently high); (3) selection based on the genome package (macrocellular evolution); and (4) the capability of new genome systems to break down higher levels of constraints and become the dominant population. Significantly, the evolutionary mechanism of cancer can not only explain and unify diverse molecular mechanisms (also see Section 3.2.3.3 and Figs. 3.1 and 3.10) but also offer some new perspective on many important and difficult-to-address questions, such as “Why do we get cancer in the first place? Could we eliminate cancer all at once?
What is the relationship between different types of genetic/nongenetic variants within the framework of the evolutionary mechanism of cancer?” As various stresses are a constant feature of life and genomic/nongenomic variants are essential for cellular adaptation, these variants, which represent not only genomic errors but also useful agents, cannot be eliminated; such is also the case with cancer. In a sense, cancer is the evolutionary price we pay, and cancer incidence will likely increase in the future (Horne et al., 2014; see Section 7.7, Will the evolutionary process ever eliminate cancer?, in Debating Cancer; Heng, 2015). The complex relationship between gene, epigene, and genome can be explained either case by case in well-controlled experimental models (which, in fact, often requires ignoring a large amount of unexplainable data) or by simply focusing on phenotypic selection based on the stage of evolution using the multiple levels of genomic landscape model (Heng et al., 2011b). Most gene mutations and epigenetic alterations can change the local landscape (within the microevolutionary phase), whereas only genome-level changes can alter the global landscape (within the macroevolutionary phase). A newly formed system can break down system constraints (i.e., at the tissue level and in other systems above the cellular level). After new systems are born and survive, gene/epigene-mediated microevolution will ensure that the new genome systems become successful populations (for more, see the next section, A New Genomic Model for Cancer Evolution). This concept can apply to non-cancer systems, such as yeast, as well (Chang et al., 2013).

[Figure 3.10 diagram. Panel labels: Stresses (Internal; External/Environmental); Experimental Manipulation; Heterogeneity (Epigenetic; Genetic); Non-Genetic; Genome Based Somatic Cell Evolution; Gene Based Micro-Evolution; Local landscape / Global landscape / Local landscape; Microevolution / Macroevolution / Microevolution; Adaptation / Break constraint / Enlarge population.]

FIGURE 3.10 More details of the four key components (or stages involved) of the evolutionary mechanism of cancer. There are many types of internal and environmental stresses, all of which can independently or in combination lead to system heterogeneity (which can again be classified into different types). All of these changes will ultimately lead to genome system changes, which are a key for macroevolutionary selection. Following the local landscape to global landscape transition, microevolution will once again help the newly formed system to become dominant (modified from Heng et al., 2010a. The evolutionary mechanism of cancer. Journal of Cellular Biochemistry, 109(6), 1072-1084. https://doi.org/10.1002/jcb.22497, with permission from John Wiley & Sons).

3.4 A NEW GENOMIC MODEL FOR CANCER EVOLUTION

If the current cancer gene mutation theory is no longer relevant (when the data does not fit the major predictions of a theory), a better theory needs to be established to replace it, as a scientific community cannot carry on its meaningful practice without the proper conceptual framework. For any new cancer theory to replace the current one, it needs to have the following advantages:

First, it should have a better capability to explain the observations and facts, both from experiments and from the clinic. Specifically, why are there so many gene mutations and yet a limited number of driver gene mutations? Why are the involved cancer gene mutations highly diverse among patients? Why are chromosomal aberrations overwhelming in the majority of patients? Why are there so many gene mutations in normal tissue as well?

Second, it should have a solid basis in genomics and evolutionary theory. It needs to combine the contributions from the multiple levels of genetic/genomic and epigenetic landscapes and integrate the patterns of both macro- and microevolution in a stage-specific manner. Different types of genomic changes are dominant in different phases of cancer evolution. Furthermore, the system inheritance or blueprint encoded by the karyotype should not be ignored (see Chapter 4).

Third, it needs to explain the variable penetrance of both genomic and environmental factors that increase the opportunities of cancer success. Such knowledge could also explain why susceptibility gene mutations, as well as external environmental factors, can contribute to cancer, but only with uncertainty when applied to individuals. After all, patients always want to know why they have cancer and why their personal responses to the same treatment are often drastically different.

Fourth, it should have better predictive power in the clinic and offer better strategies for diagnosing and managing cancer.

Fortunately, the following concepts have gradually been developed and matured:

1. The Ultimate Cause of Cancer

The general cause of the increased incidence of cancer perhaps reflects the trade-off for normal cellular function in dynamic environments and a human’s living experience. First, genomic variation among individuals is essential for human survival (Heng, 2015; Heng et al., 2016a-c), which also provides some “cancer susceptibility genes” to some less fortunate carriers; this is one example of the trade-off. Remarkably, from a cellular function point of view, it is now clear that, in contrast to the traditional viewpoint, a large portion of the genomic and nongenomic variants observed in normal tissues should be considered not only the result of bio-errors but also the by-products of cellular processes, including energy production. Most initial materials for cancer evolution can in fact be linked to cellular adaptation as a trade-off. Second, infections, injury, inflammation, radiation/chemical/carcinogen exposure, lifestyle, and social behavior/stress can all be classified into the category of living with dynamic environments; this is commonly associated with cellular adaptation, either as a form of defense or as a means of restoring homeostasis. Third, the ability of humans to live longer also drastically favors cancer evolution. It is likely that the functional compensation of the aging process also needs genomic variations. Some of these recent realizations about the variants in cellular adaptation and its trade-off have been discussed as follows:

... In recent years, the biological significance of these seemingly random genetic “backgrounds” were studied, which has led to the appreciation of genomic heterogeneity in cancer evolution.
Further synthesis suggests a relationship between stress-induced NCCAs and the advantages offered by their presence for cellular adaptation, as well as the trade-offs caused by their presence in cancer evolution and possibly in other disease conditions (Heng, 2015; Horne et al., 2014). Moreover, many diseases are the results of genomic variants which do not fit the current environments. Due to the dynamics of environments and the nature of fuzzy inheritance, it is impossible to eliminate all of these variants. Paradoxically, these genomic variants might be necessary for the species’ long-term survival, and they should be considered as a life insurance policy despite their high costs. Such a concept of trade-off not only addresses the key evolutionary mechanism of many diseases including cancer, but may also provide some answers to patients who ask the “why me” question. In a sense, cancer as an evolutionary trade-off can be illustrated by different perspectives: at the mechanistic level, cancers are the by-products of evolution (that is, the same mechanisms which make us human also make cancer successful); at the species level, as population heterogeneity is important for species survival, an individual with high genome instability can be considered as paying the price for our species; and at the individual level, most bio-features, including lifestyle, could be beneficial in some aspects and yet harmful in other aspects. Even for non-clonal aneuploidy-mediated cellular heterogeneity, while this phenomenon can provide a potential advantage for cellular adaptation, it can also, paradoxically, generate non-specific system stress, which can further produce more genetic and non-genetic variants which favor the disease condition (Heng, 2015). (Ye et al., 2018b)

2. Cancer Evolution: The Game for Outliers

Despite the fact that current molecular genomic analysis is largely based on average cellular profiles, one needs to focus on outliers to understand cancer evolution. As illustrated in the two phases of cancer evolution, most of the homogeneous cells detected in the stepwise phase were produced by outlier cells that survived the crisis stage. The case is the same for the majority of drug-resistant cell populations, all of which are descendants of a few survivors with altered genomes. Since each independent run of cancer evolution often involves different molecular mechanisms (illustrated by the fact that most of the genomic profiles differ among patients), and since the evolutionary potential often will not become reality, there is always a great deal of uncertainty in cancer evolution, as reflected by the nearly unlimited amount and variety of NCCAs (most of them are outliers and are invisible to current molecular analyses). The resilience of NCCAs, despite their frequent death or massive turnover, allows some outliers to survive and then to become a dominant population with the help of growth-promoting genes. Unfortunately, the traditional molecular characterization of cancer pathways (and, moreover, the numerous statistical platforms used for this characterization) is based mainly on the average population within the stepwise phase of evolution. The genomic profiles of the detected tumors at this stage offer very limited information about the causative factors or drivers of cancer, as the genomic landscape of the sample collected reflects only one late-stage event. In other words, it only displays the profile of the stable population following microevolution. That is why one cannot fully understand the mechanism of cancer simply by studying the end products of evolution. Furthermore, even in the end products of tumors, NCCAs are present. For many unstable tumors, there are mixed macro- and microevolutionary phases at the time of examination. However, these outliers will escape detection by population-averaging methods. When drugs are used to eliminate the majority of the population based on the average profile, the treatment will bring about the dominant macroevolutionary phase, in which the hidden outliers or treatment-induced new outliers will once again become the survivors. This explains why understanding the tumor genomic landscape has limited clinical implications. The following discussion further supports the importance of outliers in cancer research:

As biologists, we should know better. Drastic evolution often occurs on the shoulders of outliers, while the average cell is usually left behind with no “surprise” story to tell. The average defines a given type, and maintaining the average cell profile is essential for normal physiological function. In contrast, outliers define the direction of macro-evolution. Conversely, the formation of survivable outliers detected in the somatic cell macro-evolution phase are not due to the accumulation of gene-level changes that are based on the average cell population, but rather large, drastic genome-based changes. Averages reduce variation. Averages leave no hope for drastic revolution. Averages make a complicated process look relatively simple, but the average is not representative of cancer. Based on the continuous NCCA/CCA cycles detected in cancer evolution, the transition from outlier-dominant populations to average-dominant populations, and again new outliers and new averages, it is the outliers that define the direction of cancer evolution. In a sense, cancer evolution can be considered as a game of outliers. (Heng, 2015; with permission from World Scientific)
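The point that population averages hide outliers can be made concrete with a small numerical sketch. The chromosome counts below are invented purely for illustration (they are not data from Fig. 3.9 or from any cited study), and the function name `flag_outliers` and the tolerance of five chromosomes are likewise assumptions chosen only for this example, not a published scoring criterion.

```python
from statistics import mean, median

# Hypothetical chromosome counts for 50 scored mitotic figures.
# Illustrative numbers only: 47 near-diploid cells plus 3 drastic outliers.
counts = [46] * 40 + [45] * 4 + [47] * 3 + [92, 113, 31]


def flag_outliers(values, tolerance=5):
    """Return the counts that differ from the population median by more
    than `tolerance` chromosomes (an arbitrary cutoff for this sketch)."""
    center = median(values)
    return [v for v in values if abs(v - center) > tolerance]


# Population-averaged view: a single number for the whole sample.
population_mean = mean(counts)

# Single-cell view: which individual cells deviate drastically?
flagged = flag_outliers(counts)
outlier_fraction = len(flagged) / len(counts)

print(f"population mean: {population_mean:.2f}")
print(f"outlier cells:   {flagged}")
print(f"outlier share:   {outlier_fraction:.0%}")
```

In this toy sample the mean stays within two chromosomes of the modal count of 46, so an averaged profile looks nearly normal, while the per-cell view flags three cells with drastically altered chromosome numbers. Under the argument above, those are the cells most likely to matter once selection pressure changes.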

3. Cancers Represent Emergent New Genome Systems

The key function of newly emergent karyotypes for the success of cancer is obvious. There are multiple levels of system constraints within the human body (from tissue structure to the immune system); only new karyotype-defined systems can break each of these major constraints, and an oncogene’s function of cellular proliferation alone is not sufficient to break down the system constraints. This is also the reason why it is difficult to understand the mechanism of cancer, even though (1) it is easier to demonstrate the oncogene’s function in vitro, where the key selection criteria are growth-related features; and (2) it is easier to link gene mutation to cancer in immuno-deficient mouse models. Increased evidence supports the key role of CIN in cancer (Hoeijmakers 2001; Heng et al., 2006a-c; 2011a-b, 2013a-c; Schukken and Foijer, 2018), and CIN-mediated karyotype evolution is responsible for creating a unique karyotype that is survivable in a given environment. When such a new genome system is created, a seed of cancer is planted. Although many different “seeds” can be created, only one or a few can survive during macrocellular evolution. Whether or not one of these seeds can become a successful cancer mass also depends on many factors, including the help of some oncogenes. Different karyotypes team up with different combinations of oncogenes, which is perhaps the reason why most cancer patients display different karyotype and oncogene profiles. Because the vast majority of cancers display altered karyotypes and the phase transition (from punctuated phase to stepwise phase) is always associated with a winning karyotype, an interesting follow-up question is whether these new genome systems represent new cellular species. As discussed in Chapter 4, the karyotype-encoded blueprint defines systems, and most organismal species have their unique karyotype. It is thus not unreasonable to consider cancer a new cellular species.

On the surface, calling cancer a new species is highly controversial. However, it is very useful to do so, as this could trigger the much-needed debate about not only the karyotype in cancer but also the karyotype in organismal evolution. In fact, there is a long history, dating back to 1897, of considering cancer as a new species (Hansemann, 1897; Boveri, 1914 (see Boveri 2008); Knauss and Klein, 2012). Supporters include some well-known evolutionary thinkers such as Julian Huxley in 1956 and Van Valen in 1991. In recent years, an increased number of cancer researchers have adopted and further developed this idea (Duesberg and Rasnick, 2000; Ye et al., 2007; Vincent, 2010; 2011). In particular, genome theory has promoted the idea of using genome identity and similarity to define species, and the idea that the main function of sexual reproduction is to maintain the genome-defined species (Heng, 2007b, 2009). In addition, cancer cells have similarities to bacterial species, many of which are also host-dependent. Furthermore, increased cases of contagious cancer have been reported, providing strong support for the species hypothesis. In fact, many cancer cell lines indeed represent artificial species. Taken together, this issue needs serious attention. More details about the viewpoints both for and against this can be found in Debating Cancer, Chapter 7: Do different cancers represent different species? (Heng, 2015).

Combining the above three concepts, a new genomic cancer model is proposed to reflect the key contribution of stress/cellular adaptation-produced genomic alterations (evolutionary materials), the CIN-mediated emergence of new systems with new karyotypes through genome chaos (macrocellular evolution), and the expansion of the successful genome through oncogene-mediated proliferation (microcellular evolution). There are four distinctive features of this model (Fig. 3.11): First, the major chromosomal changes occur before gene mutations during cancer evolution.
According to the traditional framework of gene mutation theory, chromosomal aberrations result from cancer gene mutations; thus, gene mutations are the “initial causes,” which has been the rationale for identifying gene mutations as biomarkers for diagnosis and treatment. While the traditional model is suitable for some familial cancers, it has been difficult to use for explaining most sporadic cancers. By switching the order of the involvement of chromosomes and cancer genes, this model not only emphasizes the ultimate importance of elevated CINmediated genome chaos in macrocellular evolution but also redefines the role of cancer genes, as most of these are “helpers” for establishing the dominance of new emergent systems with unique genomes. Of equal importance, the switched order fits well with the observation of various evolutionary models and clinical samples, especially when genome-based explanations are used. For example, distinctive karyotypes (as the end products of cancer evolution) have been commonly observed from the following systems:

166

3. GENOME CHAOS AND MACROCELLULAR EVOLUTION

FIGURE 3.11 The model of cancer evolution. This general model of cancer evolution illustrates the interactions among key players within crucial events of cancer formation. To simplify this complex process, only three stages are listed. The first stage focuses on genomic/epigenetic/environmental interaction. An individual’s genotype includes overall genome instability and immune system status. Internal environmental factors include tissues/organs constraints, stress response, the trade-off from cellular adaptation, and factor of time. For example, for a familial cancer type, internal CIN is much higher than normal individuals. It reaches the NCCA/CCA cycle and genome chaos stages much more quickly. For sporadic cancer types, the necessary CIN results from stress and increased genomic variants over time (the trade-off of cellular adaptation and cellular damage). External environmental factors include microorganismal infection, exposure to carcinogens, injury, and other factors that can unbalance cellular homeostasis. The second stage is the macrocellular evolution phase, in which a CIN-mediated NCCA/CCA cycle can trigger on genome chaos, leading to large number of newly created genomes (each shape represents a unique genome or karyotype). Most newly formed genomes will be eliminated, and very few can be selected within the macrocellular evolutionary phase. The third stage is the microcellular evolutionary phase, where one survived genome (circle) undergoes the proliferation process, which is mediated by clonal expansion. In this phase, cancer genes can play an important role in bringing the “seeds” of cancer into the clinical tumor. Different cycles of macro/ microevolution might be needed to break down different types of system constraints required for transformation, metastasis, and drug resistance, for example. Note that the probability of successfully moving to next phase is very low. 
If one run of evolution reaches a dead end (which happens at high frequencies), a new run of evolution will follow. The whole process is a continuous one, often with multiple runs occurring simultaneously, even though it is likely that the formed, dominant cancer population can reduce a newcomer’s chance of success.

multiple runs of immortalization of human cells (Heng et al., 2006a-c), multiple runs of drug resistance, especially when induced by genome chaos (Heng, 2007c; Stepanenko and Kavsan, 2014; Liu et al., 2014; Horne et al., submitted), the transformation of mouse cells (Roberts et al., 2005), SV40-infected human and rodent cells (Bloomfield and Duesberg, 2016), a rat carcinogenesis model (Bloomfield et al., 2014), and a hybridoma model (Bloomfield and Duesberg, 2018). In fact, if the majority of gene mutation-focused studies are reexamined using molecular cytogenetic/

3.4 A NEW GENOMIC MODEL FOR CANCER EVOLUTION


cytogenomic methods, a stronger link to distinctive karyotypes will certainly be detected. Indeed, a growing number of clinical analyses strongly support this viewpoint. As previously mentioned, a burst of genomic diversity has been observed in many cancer types, including colon cancer and breast cancer (Sottoriva et al., 2015; Gao et al., 2016), followed by one or a few dominant and stable subpopulations. Punctuated evolution results from mutational bursts and cataclysmic genomic alterations (or genome chaos) (Heng, 2015; Sun et al., 2018). If the karyotype information can be obtained, the two phases of karyotype evolution should be evidenced and more unique karyotypes will be reported. In fact, among all published tumor cases, over 80% display different clonal chromosomal aberrations (over 61,000 were reported in the early 2010s, according to the Mitelman Database). If one only considers solid tumors, the extent of karyotypic diversity will be even greater, which illustrates that most karyotypes are not shared among patients with the same types of solid tumors. Furthermore, chemotherapy-induced altered karyotypes can be observed in many patients following years of treatment (Ramos et al., 2018), including some chaotic genomes. Interestingly, in the case of some patients, the frequency of NCCAs is higher after treatment than during treatment. These elevated NCCAs have been suggested as contributing factors for secondary cancers (Ramos et al., 2018; Frias et al., 2019). Even in regard to one of the best examples linking cancer gene function to tumor growth by controlling MYC activity in an animal model (Shachaf et al., 2004), reinterpretation of the data could lead to the discovery of important information. In this linear model, MYC inactivation seems to be able to change liver tumors into normal liver tissue; however, the fundamental achievement was to reduce the size of the tumors (reverse the second phase of microevolution).
The seeds of tumors remained. Indeed, the authors point out that some of these apparently normal cells remain in a state of tumor dormancy. We suggest that these persistent cells likely display altered genomes, which are generated from the punctuated phases of evolution. Surely, each different tumor displays different genomic alterations, as reported. It is thus not unreasonable to think that the initial function of MYC in this model is to elevate CIN, which leads to the tumor’s genome, and the later function of MYC is to increase the mass of the already-formed tumor. In a sense, the dormancy of a tumor is a unique stage in which a limited number of tumor cells are constrained and exist beyond clinical detection. Interestingly, the logic of placing chromosomal changes ahead of gene mutations was already illustrated by the CML story. Moreover, nearly three decades ago, J. Michael Bishop, the Nobel laureate famous for his studies of oncogenes, had already stated the importance of the large-scale genome changes that initiate cancer. He wrote: “Cancer is largely caused


by genomic catastrophes that results in the activation of proto-oncogenes and/or inactivation of tumor-suppressor genes” (Bishop, 1991). Unfortunately, however, this idea has been ignored, perhaps even by the author himself, as few have studied the mechanism of how genomic catastrophes occur and lead to alteration of cancer genes. Now, knowing the importance of CIN-mediated genome chaos (the mechanism of how the genomic catastrophes occur) and the later-stage role of oncogenes (of increasing the tumor mass, rather than mainly accumulating gene mutations that will allow the tumor to form), it is time to systematically test the new model. Second, all cancer-promoting factors (genomic, epigenetic, and environmental alike) can be linked to the NCCA/CCA cycle followed by the CIN-mediated emergence of a new system, under the framework of stress, the trade-off of cellular adaptation, multiple levels of genomic variants, and evolutionary selection. Although the molecular mechanisms can be highly diverse, they can all function to destabilize the genome. In addition, heterogeneity itself (including that of altered cells and tissue, such as scars) might also function as a stress. With this simplification, different competing models of cancer can be reconciled. For example, normal tissue organization and immune function can reduce CIN. In contrast, microorganism infection, inflammation, environmental mismatch, and cancer susceptibility genes can increase CIN. The list goes on. Of course, for each main transition (from stress response to genomic/nongenomic changes, from NCCA/CCA cycle to chaotic genome, and then from selected genome to increased mass of tumor cells), the odds of success are extremely low. Thus, it often requires a perfect storm for cancer to beat the odds. But again, it happens, as in all other evolutionary stories.
Third, in the microcellular evolutionary phase, clonal expansion can often involve multiple clonal lineages promoted by different cancer gene mutations. It is possible that different clones are sometimes needed for the emergent cancer phenotype. Moreover, as cancer evolution is a continuous process and multiple runs of independent evolution can coexist even in a single tumor, instances of micro- and macroevolution can often be simultaneously detected. Fourth, this model suggests that by slowing down the microcellular evolutionary phase, it is possible to reduce the size of the tumor without triggering the macrocellular transition. This might be the basis for adaptive therapy (Gatenby et al., 2009). In addition, tumor dormancy might result from strong constraints that keep the tumor cell population extremely low to the extent that the cells cannot be clinically detected. In other words, the tumor genome generated from the macrocellular phase serves as “seeds” for a phase transition far before it becomes visible. This small number of seeds can be crucial for immortalization, transformation, tumor dominance, metastasis, and drug resistance. Recent findings which state that metastasis occurs when primary tumors are very small support this model.

CHAPTER 4

Chromosomal Coding and Fuzzy Inheritance: The Genomic Basis of Bio-information and Heterogeneity

4.1 SUMMARY

Current genomic research, including the observation of a significant degree of genome heterogeneity in normal tissues and cancer samples, has been “full of surprises.” As a result, there has been a strong demand to reexamine genomic conceptual frameworks. As genes code “parts inheritance,” there should be another type of genomic information that determines and organizes the interactive relationship among these parts. Logically, the genome, or karyotype, is proposed to encode such a novel type of genomic information. Named “system inheritance,” this genomic information is responsible for the emergent functions defined by the genes and their genomic topology. Interestingly, system inheritance can be understood as chromosomal coding, determined by the order of genes/other regulating sequences along and between chromosomes, which provides the 3D platform of gene interactions and defines the network structure. Furthermore, genetic information is not as precisely coded as we previously envisioned. In contrast, genetic information often defines a range of potential statuses rather than any specific status. In this chapter, following a brief historical review, the journey to search for system inheritance and fuzzy inheritance is described. These new types of inheritance can not only explain many surprises generated by current genomic discoveries but also provide a mechanistic understanding of heterogeneity, as well as explain the relationship between genotype, environment,

Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00004-5


Copyright © 2019 Elsevier Inc. All rights reserved.


and phenotype. Finally, even though fuzzy inheritance can be detected from different levels of genetic organization contributed to by diverse molecular mechanisms, some examples of fuzzy inheritance at the chromosomal level are described.

4.2 CHROMOSOMAL OR KARYOTYPE CODING

4.2.1 The Rationale of Searching for New Types of Inheritance

As discussed in previous chapters, increased evidence has illustrated that there is a key distinction between genes and chromosomes, as they represent different biological identities and follow different patterns of evolution. While a strong case has been made from both a historical and cancer dynamics point of view, the fact that only a limited literature has systematically addressed this issue clearly reflects the lack of needed understanding in the current research community. Having read so far, readers will likely ask the following questions: Why don’t I think of these questions in such a way? Am I missing some key points here? On one hand, these presentations/arguments seem to make sense. On the other hand, if they make sense, what about my previous gene-based knowledge? If these reinterpretations are truly correct, how can they be continuously ignored, especially by the fields of genomics and evolution, in which inheritance is the key component? Are these case studies only applicable to cancer, with limited impact on general concepts as well as other genetic diseases? In fact, an array of similar questions has repeatedly been asked by hundreds of scientists from various fields during our journey to discovering and promoting genome-based inheritance. In the past 25 years, we have pushed out over 60 publications on this topic. Because many of these papers have gone through multiple journals, hundreds of reviewers were involved. During earlier years, the reviewers’ questions and our rebuttals were often much longer than the publications themselves. I have also given more than 80 invited conference talks and seminars where I delivered our new concepts, reaching thousands with similar questions.
Furthermore, I have formally discussed our findings with hundreds of scientists in the field of cancer research and molecular genetics, including over five dozen well-known scholars (Nobel laureates, members of the National Academy of Sciences, and leading researchers). As most of these discussions were based on one-on-one conversations and PowerPoint slides, our ideas have been systematically presented. Because many of the conversations are private and the goal is to candidly exchange ideas, I have received many questions. I have also received diverse responses to my questions to them, such as: “Which


concepts do you like or dislike the most?” “Can any of our concepts be challenged or supported by your own experimental system?” and “Should the research community re-examine the most basic concepts in genetics and evolution based on many recent surprising discoveries?” Just a few years ago, disagreements were very common. People often passionately defended the powerful role of cancer genes, the current research strategies, and the genetic and evolutionary foundation of cancer research. Typical responses included the following:
If what you said is correct, everything we have learned needs to be thrown out of the window.
Gene-based cancer research has gone well so far, as illustrated by the successful story of using imatinib to treat chronic myeloid leukemia patients within the chronic phase.
What we need is to work harder to find the magic bullets for each cancer type. Every cancer has its own Achilles’ heel.
If we can sequence many cancer samples, the long-expected pattern will become obvious, and druggable targets will be available to cure cancer.
We should not be too critical of the limitation of gene-based research, as it is the best means we have. You need a more balanced view toward genes.
Discussing the limitation of gene research is like beating the straw man: who doesn’t know the importance of the chromosome?
I am a simple-minded person; I have to start on something simple.
Who doesn’t know the limitation of genes?

Ironically, moments later, these same individuals would say, “The chromosomal pattern does not offer mechanistic understanding, due to its low resolution. The gene mutation with defined molecular pathway is the only answer.” In the past few years, these conversations have seemed to become much easier. The failure of the cancer genome projects to deliver on their major promises, combined with the overwhelming degree of heterogeneity, especially chromosomal heterogeneity, which is observed in many types of cancer, has effectively downplayed the significance of key cancer gene mutations. More attention is now being focused on epigenetics, systems biology, gene-environment interaction, and the importance of metabolism and the microbiome, as if these new terms will somehow finally deliver what cancer gene-focused research has failed to deliver. Typical responses have also changed.
Could these chromosome-level changes be explained by epigenetics?
Now, I believe that the massive chromosomal aberrations you described are real. But as a Darwinian evolutionary biologist, I am still not sure what type of evolutionary role they will play.
Cancer is bizarre; other diseases should be very different in terms of genome instability.


In vitro observation must differ from in vivo, and it is hard to imagine a biosystem that is so dynamic.
Complex phenotypes can still be explained by genetic laws, although this requires a large sample size and some factors that we currently do not understand.
Gene theory certainly can include the data from chromosomes.
“The current confusion is due to the lack of data.” (People seem to have forgotten that we experienced less confusion when we had much less data.)
“It is not helpful just understanding this stochastic feature; we need solutions.” (This is stated as if the highly dynamic gene mutation profile is fixed.)

When told that genome chaos is frequently detected from in vivo systems and that most common and complex diseases can be linked to elevated genome instability, people often shrug their shoulders. Well then.

To study the impact of our communications to other scientists, I have checked many researchers’ new publications a few years after our conversations took place, to see if there is any trace of influence from our concept. The answer is rather disappointing. In the case of over 95% of them, they are business as usual, continuing to ignore the significance of genomic topology. In contrast, a small minority of people have started to apply the genome-based concept, albeit not citing any of our previous publications. I used to think, as soon as scientists see the evidence, they will change. Now, I understand that just illustrating the pattern of cancer is not enough to change cancer researchers’ scientific worldview; we need to change the concept of genomics and evolution first. Only when they have the new paradigm, can they “see” the evidence. Of course, earlier on, several well-known thinkers immediately realized the importance of our “different way of thinking,” as they could realize the paradigmatic difference based on our data and analyses. They highly praised our ideas and strongly encouraged us to keep pushing, no matter what. “These concepts are highly original and make sense, and they are worth fighting for.” Even in their position, I was told, it is hard to push new ideas. “I cannot even convince my students to follow your ideas,” they said. “You are playing a totally different game.” The highly contrasting response from the research community led to the realization that the science we are pushing may not be the routine science. Despite the difficulties of getting funding, publishing our papers, and convincing others, we are a few of the lucky ones who can have the opportunities to work on the necessary new paradigm, which can offer better concepts and unify different platforms. Clearly, just working on cancer theory is not enough; we should use the unique window offered by cancer research to think big and bring changes for century-old genetic theory. 
The first big question is: What defines bio-inheritance and what is


the role that genes/epigenes or genomes play? Only by establishing new genomic principles can we ultimately solve the mystery of cancer and convince others to change their worldview of cancer.

4.2.2 New Challenge: What Defines Inheritance?

Despite its widespread use in the information era, the definition of information itself is highly diverse because of its broad coverage and involvement in many disciplines. According to the Oxford Living dictionary, there are two main definitions: “Facts provided or learned about something or someone” and “What is conveyed or represented by a particular arrangement or sequence of things.” Clearly, the second definition is more applicable to the genetic concept of information. Interestingly, it has been suggested that information is something that allows its custodian to make predictions with accuracy better than chance (Adami, 2016). In addition, information is context-dependent, and different levels of systems display different types of information. Furthermore, in biological systems, information is stored diffusely at different levels, and the emergent properties are clearly involved (Heng, 2015). While the information from lower levels is easier to obtain, it has little to do with system control (Heng, 2013c). To understand the emergence of biocomplexity, bioinformation can further be characterized as specified information (Orgel, 1973) or functional information (as a measure of system complexity) (Szostak, 2003; Hazen et al., 2007). Such complexity also represents a big challenge for defining genetic information; that is, the heritable biological information coded in the nucleotide sequences of DNA or (in the case of some viruses) RNA, which is considered a blueprint for biosystems and information linkage among individuals of a given species. For example, what defines inheritance, really? Asking such a question would not have received any serious attention just over a decade ago. Who doesn’t know the answer? Genetic inheritance refers to the reception of genetic qualities by transmission between generations or, in other words, the process of transmission of genetic traits from parents to offspring.
The key mechanism is the passing of the gene-encoded information. According to the central dogma of molecular biology, the causative relationship between gene-protein-phenotype holds the key to understanding how inheritance works in biology, and that has been the rationale for gene-centric research for over half a century. Before the era of the Human Genome Project, there were successful examples of linking an individual gene to a phenotype by pinpointing the gene mutation and its specific alteration of a protein/pathway in a human disease, and the cloning of the gene mutations responsible for cystic


fibrosis represents one of the best examples (Rommens et al., 1989). To transfer such success to the study of common and complex diseases, it was believed that multiple genes must be involved, which can be identified using increased patient samples. Unfortunately, this approach has so far failed to identify these few common gene mutations for almost all major diseases (the story of cancer research, for example, has been extensively discussed in Chapter 3). Similarly, for many genome-wide association studies (GWAS), in contrast to their initial promise, most of the results are disappointing. As mentioned in Chapter 1, many phenotypes seem to involve a huge number of genetic loci across the genome, and most of them only contribute to a tiny portion of the phenotype, which significantly limits the practical application of GWAS. This has led to the realization that the issue of missing heritability is a serious one, which directly challenges the basis of gene-based inheritance (Heng, 2010). There are plenty of explanations and strong determination to solve this issue of missing heritability. Inheritance must be there, and we need to work harder to get the job done, or so people are convinced. One strongly voiced opinion from the GWAS community is that more samples are needed to achieve the analytical power; however, they seem to forget that some sample sizes are currently very large already. Efforts and promises have also been made to search for better computational algorithms to filter out the noise and enhance pattern identification. Furthermore, better population cohorts are selected, and increased efforts have been used to digest the genetic contribution to phenotype in the context of gene-environment interaction. Despite all of these efforts, if the concept of the polygenic model is no longer sufficient, future successes will be limited.
That is the reason why many new ideas, including the “omnigenic” model, have been proposed to advance the field (see Chapter 1, section 1.4.1). Of course, epigenetic mechanisms are also at the top of the list of explanations. “If it is not the gene, it must be the epigenetics or epigene,” many conclude. While increased studies have linked epigenetic mechanisms to different phenotypic features, most of these explanations are still in the context of the current gene function framework. For example, many explanations are based on the known gene’s function (as a means to understanding the epigenetic role through the gene). This is not very surprising, as epigenetic regulation functions as a fine-tuning mechanism in relation to a gene’s and the genome’s function. Knowledge of the epigenetic contribution will certainly improve our understanding of the genotype-environment-phenotype relationship, but will perhaps not fundamentally alter our way of thinking, especially if overall inheritance is mainly considered to be the key feature of genes. Perhaps more fundamental questions need to be asked. For example, why has the approach of using increased sample sizes failed to identify the


expected genetic causality in most cases? If we know that key genetic factors are real, how can we explain our inability to identify them experimentally? Why have there been so many surprises about the relationship between genes and phenotypes during the genomic era? When dealing with the unexpected function of a given gene, it is often mentioned that a gene’s function is context-dependent, but what exactly does the gene context mean? Should we reexamine the foundation of genetic theories to define these contexts, rather than continue this hand-waving? Clearly, a new framework beyond gene-mediated inheritance is necessary to answer these questions. First, if the gene and epigene cannot offer sufficient explanations, one needs to discuss the conceptual limitation of gene- and epigene-defined inheritance; second, one needs to search for the correct types of inheritance. It is likely that we might have missed some important types of inheritance, which should organize gene/epigene interaction. Such an effort represents the direct reexamination of the century-long gene-centric genetic theory.

4.2.3 Genes Code “Parts Inheritance”

The original usage of the word “gene” referred to the basic physical and functional units of inheritance (see Chapter 2). Based on this definition, the power of the gene was limited, as the unit is not the system. Although some individual traits have been linked to specific genes in the past, how genomic inheritance controls gene interaction within a genome is virtually unknown. Obviously, there is a big gap between knowing how an individual gene codes a specific protein and understanding how genetic networks work for entire systems (which is the rationale of pushing systems biology). The Human Genome Project promised to fill the gap, as it was believed that sequencing all genes would reveal the blueprints of life. Now, this promise has failed, as knowing all sequence details did not reveal how the system works. Furthermore, data from various genome projects have often forcefully conflicted with traditional genetic concepts! It is therefore time to seriously discuss the limitation of focusing on gene characterization (or parts description) to illustrate the genotype-phenotype relationship at higher levels of systems and to search for new concepts that can better explain the massive genomic data. Most genetic researchers who grew up in the gene era are very familiar with the reductionist’s dogma of molecular genetics: to dissect and characterize genetic parts and then to put all the parts together (based on the parts information) to understand the whole of inheritance and phenotypes. For genomic researchers, even though the analytical scale is much larger, often genome-wide, and efforts also include noncoding elements and genomic architecture, the conceptual basis is still the same. To advance the field, the research community needs to change the current


way of thinking and, specifically, to take the first crucial step of redefining gene-coded inheritance. The following syntheses were formed during our decades-long journey to reexamine gene-coded inheritance. Most of these conclusions/reasonings are based on the analysis of many surprising pieces of data, through the lens of the “parts-whole relationship,” information, and the genome theory of evolution.

1. The genome is not equal to the sum of all genes or its entire sequence, and in most cases, there is no one-to-one correlation between parts (individual genes) and the emergent properties at the higher system level. Furthermore, just studying the quantitative contribution of each agent will not enable us to predict system behavior, which also explains the missing heritability and the limitation of GWAS, for example. Many researchers are perhaps familiar with the statement, “the genome is not a bag of genes,” and they agree with it. Unfortunately, in practice, they often literally consider the genome as a bag of genes.

2. Different genetic/epigenetic and genomic codes are needed for constructing and controlling different levels of biosystems. Some levels of coding (such as gene-level coding) can be the same or similar among different systems (among different species, for example); other types can be highly species-specific. In addition, even though it might be easier to access the lower level of information, this type of information often contributes less to the overall control of the system. Moreover, distinctive biological mechanisms/principles might be required to understand different levels of biological systems. While the current concept of the genetic codon applies to all biosystems with limited diversity, it offers no explanation for the genetic basis of species specificity.
Perhaps even more surprisingly, little effort has been made to address this issue, as there are only some vague assumptions, such as the assumption that species specificity should be achieved by species-specific genes (both in number and type). It is straightforward to separate different viruses by gene specificity, but it is highly difficult to use specific genes to define mammalian species. Equally important, from a research strategy point of view, molecular biologists have a tradition of pushing for the highest molecular resolution, which often ignores an important aspect of research, namely, the context in which levels of the system should be examined. For example, a key question regarding inheritance needs to be addressed (both by individual researchers and the community at large): which genomic level should we monitor and for what types of information? Should priority be given to some of them or all


of them (from nucleotide, DNA motif, gene, copy number variation (CNV), gene cluster, individual chromosomal aberration, and genome chaos)? To fully understand the importance of these questions, one needs to appreciate the trade-off between resolution of methods and importance. As we pointed out:
“... while easier to study and comprehend, the lower levels that are illuminated by high-resolution studies have paradoxically reduced biological importance than higher levels of the system ... There is a balance or trade-off between resolution and research importance” (Heng, 2015)

3. The importance of parts, both individual genes and the total number of all genes and/or DNA content, can be significantly reduced within a complex genome containing a large number of genes. This ironic observation can be found in data generated from various genome projects:

a. Biocomplexity does not simply correlate with gene numbers: The increased number of genes can correlate with the complexity of the biosystems among selected species. Viruses have only a handful of genes, whereas Escherichia coli has over 4000. The fruit fly has nearly 15,000, and humans have over 22,000. Just based on these examples, increased gene number should contribute to biocomplexity. Until recently, many considered the increasing complexity in the animal kingdom to be due to an ever-increasing number of genes, for example (Science Daily, 2017). However, this is not the case. Not only do different animals have similar gene numbers and share many genes but also the eukaryotic genome size fails to correlate well with apparent complexity (the size varies wildly, over more than a 100,000-fold range) (Eddy, 2012). The total number of genes in humans is more than that of a chicken (nearly 17,000) but less than that of a mouse (25,000) and much less than that of a grape (over 30,000) (Guenet, 2005; Pertea and Salzberg, 2010). If judged by the DNA content, amoebae, with some of the largest genomes, have a genome size that is up to 100 times larger than a human’s. Clearly, the gene-centric view will humble us.

b. The total number of genomic parts (number of genes or DNA content) can vary significantly among closely related species and individuals of the same species: Even some related species in the same genus display haploid genome sizes that differ by three- to eightfold. This phenomenon is common in plants, such as in species of rice (Oryza), Sorghum, and onions (Allium) (Eddy, 2012). Furthermore, the genomic materials of maize (Zea mays) have expanded by about 50% since the divergence


of the species from Zea luxurians about 140,000 years ago (and not merely by polyploidization). In addition, different individuals have slightly different individual gene sets, which are likely contributed by CNVs (Sebat et al., 2004; Iafrate et al., 2004). It was later estimated that the gene number can vary by 73%-87% between any two individuals (Alkan et al., 2009). It was further suggested that a higher number of genes could be the result of individual variations (Li et al., 2010). Furthermore, each individual displays approximately 60 new mutations (Conrad et al., 2011), and healthy people, on average, have about 15 complete knockout events; this number is even higher in more homogenous populations (MacArthur et al., 2012; Alkuraya, 2015). Other studies have also identified complete gene knockouts in apparently healthy individuals (Narasimhan et al., 2016; Sulem et al., 2015). On the other hand, even for some individuals who harbor disease-causing mutations in eight different genes (which are linked to severe Mendelian childhood diseases), there is no reported clinical phenotype of the disease (Chen et al., 2016).

c. Many genes are not essential under experimental conditions: A huge effort has been made to examine the essential genes across different forms of life. Essential genes are those whose null mutation leads to lethality or sterility. Mycoplasma genitalium is a parasitic bacterium with the smallest known genome of any free-living bacterium (with a total of 580,070 base pairs [bp]). More than 100 of the 485 protein-coding genes of this bacterium are dispensable when disrupted one at a time. In other words, roughly 80% of all protein-coding genes are essential (Glass et al., 2006; 2009). However, for the budding yeast Saccharomyces cerevisiae, only approximately 20% of protein-coding genes are essential for growth in laboratory conditions (Giaever et al., 2002; Chen et al., 2012).
Similarly, for the fission yeast Schizosaccharomyces pombe, fewer than 20% of genes are essential under experimental conditions (Decottignies et al., 2003). The low proportion of essential genes downplays the importance of many genetic parts. Interestingly, a comparison of documented phenotypes of null mutations in humans and mice showed that >20% of human essential genes have nonessential mouse orthologs, despite the relatively close evolutionary relationship of the two species (Liao and Zhang, 2008). Furthermore, many specific genes can be experimentally deleted or inactivated without obvious phenotypes in given conditions. Multiple surprising results have come from experiments in which important, well-characterized genes failed to generate the expected phenotypes when knocked out. In addition to the large-scale data from yeast, ample examples of a lack of expected

phenotype from gene knockout can be found in Arabidopsis (among many Arabidopsis knockouts, few present informative phenotypes that provide a direct clue to gene function) (Bouche and Bouchez, 2001), zebrafish (there is a poor correlation between morpholino-induced gene knockdown and mutant phenotypes) (Kok et al., 2015), and mice (knockout mice have revealed new roles for many genes, and haploinsufficiency and pleiotropy are both surprisingly common) (White et al., 2013). Null mutations in different species often lead to different phenotypes (Liao and Zhang, 2008). It was also observed that many knockout (KO) mice display no phenotype, and the phenotype of some KO mice can disappear over subsequent generations. This puzzling phenomenon is often credited to redundancy in gene function, caused either by duplicated genes or by compensation for one gene by another with overlapping functions and expression patterns. However, a growing number of systematic analyses do not support this simple explanation. Alternative explanations include (1) technical artifacts (toxicity or off-target effects of the knockdown reagents) (Baek et al., 2014; Olejniczak et al., 2016); (2) the hypothesis that genes might evolve under very weak selection, which current laboratory experiments are unsuited to detect (Tautz, 2000); (3) a different form of genetic robustness, referred to as genetic compensation or transcriptional adaptation, which is triggered upstream of protein function (Rossi et al., 2015; El-Brolosy and Stainier, 2017); and (4) the genome theory, in which there is no simple linear relationship between most genes and phenotype because of fuzzy inheritance and emergent properties (Heng, 2015; see more in Section 4.3 and Chapters 7 and 8). Regardless of the explanation, this phenomenon undeniably points to the limited power to predict phenotype from parts (genes), especially when so many parts are involved in a dynamic environment.

4. Parts can be replaced among different systems. Many genes that encode parts/materials can be replaced across species within a given system. For example, despite the sequence divergence accumulated over the past 1 billion years of evolutionary separation, many human genes can rescue yeast cells whose own copy of the gene has been turned down, turned off, or removed by experimental manipulation. A recent study demonstrated that, among 414 selected genes essential for yeast life, almost 50% of the corresponding human genes enable the yeast to survive when inserted into yeast cells one at a time (Kachroo et al., 2015).

5. Just having genetic parts is not enough to produce the system’s function: the limitation of genes in evolution and cellular survival

a. A specific gene’s function is influenced or controlled by factors beyond the gene:

Example 1: Not all inserted copies of genes arranged as an array can be expressed, and only a small proportion of inserted genes can be expressed when associated with the nuclear matrix (Heng et al., 2004a, Chapter 2). It is also known from transgenic mouse experiments that the function of a specific transgene is often integration site-dependent.

Example 2: The same gene mutation can be detected both in patients with a disease and in healthy individuals. Even in twins carrying the same cancer gene mutations, the consequences can be drastically different. A growing number of reports state that the gene mutation profile has less predictive power than chromosome-based data (see Chapter 8).

Example 3: Why does a sponge have so many genes and yet such limited function? Perhaps one of the most bizarre and yet powerful stories illustrating the gap between genetic parts and the function of a system is that of the sponge, one of the earliest evolving metazoans (multicellular animals), which already existed some 635 million years ago. It has 18,000 genes (of which 70% are homologous to human genes), even though it has only a simple body plan that lacks organs, muscles, and nerve cells. Judged only by the types of genes present, a much more complicated phenotype would be expected, as the sponge genome clearly contains a diverse toolkit used by other, more advanced animals. For example, the genome includes analogues of genes that, in organisms with a neuromuscular system, code for muscle tissue and neurons (Mann, 2010).
Further analyses of eight transcriptomes from all poriferan classes revealed surprising genetic complexity in sponges, as there are representatives of most molecules involved in cell-cell communication, signaling, complex epithelia, immune recognition, and germ lineage/sex, with only a few absences (although these could be important). This study suggested that genetic complexity arose early in evolutionary history, as supported by the presence of these genes in most animal lineages (Riesgo et al., 2014). Moreover, other nongene types of parts, such as regulatory parts, have already evolved in sponges. By analyzing Amphimedon queenslandica, Gaiti et al. showed that the regulatory landscape used by complex bilaterians was already in place. The regulatory tools include distal enhancers, repressive chromatin, transcriptional units marked by H3K4me3 that vary with levels of developmental

regulation, and the genomic locational relationship of genes (for spatiotemporal gene expression) (Gaiti et al., 2017). Altogether, the genomic profile of the sponge clearly demonstrates that just having the parts (genes) is not sufficient. Other key elements needed to deliver complex phenotypes are missing. Although it is still possible that a small number of missing genes or regulatory elements (which can be detected in more advanced animals) are the key to distinguishing sponges from more complex animal species, the likelihood of this is low. This is especially clear when considering that most animals have a similar number of genes, a finding that has completely changed the viewpoint that increasing complexity in the animal kingdom was due to an ever-increasing number of genes responsible for animal development and growth.

b. The lost function of a specific gene, whether knocked out or inhibited, can be restored by mechanisms unrelated to the original gene. Interestingly, evidence of this often comes from “watching evolution in action” experiments, and this capability is essential for macrocellular evolution and cell survival. Such experiments also provide some clues as to “what other elements besides the genes” can determine the higher system’s function.

Example 1: Drug resistance of cancer cells can quickly be established by mechanisms other than gene mutation. Treatment resistance in cancer is a well-known phenomenon for chemo- and radiation therapy. It has been reasoned that target-specific therapy should reduce drug resistance. Unfortunately, a growing number of reports demonstrate that this is not the case (Heng et al., 2010b; Liu et al., 2014). In fact, many effective molecular targeting approaches, including antiangiogenesis-specific therapy, can quickly generate resistance. One interesting observation is that a significant portion of these resistant clones involve genomic changes irrelevant to the original target gene.
In other words, while dysfunction of a specific cancer gene can kill many cancer cells initially, many ultimate survivors can emerge by reorganizing the genome system such that it is resistant to the given treatment (Heng, 2015). The link between drug resistance and genome/karyotype alteration suggests that system reorganization can rescue cancer cells when specific parts are under attack. As genome chaos can be induced by most current drug treatments, it is now clear that drug resistance represents a universal outcome for all types of drug treatments. We further predict that most surviving clones will likely display altered karyotypes, as genome reorganization is the most powerful mechanism for rescuing

the cell population when their parts are under attack from harsh treatment. A growing body of data strongly supports this viewpoint (Duesberg et al., 2007; Stepanenko et al., 2016). Our recent data have illustrated that genome chaos-mediated cancer drug resistance represents the major system survival mechanism responsible for rapid drug resistance (Horne et al., submitted).

Example 2: The lost function resulting from knocking out a specific key gene can be recovered without restoring the original gene. The MYO1 gene, encoding the myosin II protein in yeast, is essential for cytokinesis. Haploid yeast has only one copy of this key gene; it is thus anticipated that the deletion of this gene will disrupt cell division and wipe out the targeted population (as cell division represents the fundamental mechanism for cellular survival). Surprisingly, however, despite massive death, some MYO1-deficient cells were able to restore cytokinesis through alternative mechanisms. Detailed studies illustrated that the newly surviving cells displayed different types of cytokinesis under evolutionary selection and, perhaps more importantly, that this regained phenotype was achieved not by restoration of the MYO1 gene but by extensive polyploidy and/or aneuploidy. In fact, when the MYO1 gene was reintroduced into these surviving cells, the originally very important gene was no longer useful within the new emergent systems (Rancati et al., 2008).
Through the lens of genome alteration-mediated macroevolution, this exceptional experiment demonstrated the following: (1) The important function of “essential” parts (specific key genes or pathways or other molecular parts) can be restored by reorganizing the genome system during drastic evolutionary selection; new systems with similar phenotypes can form without restoration of the original parts; (2) when a new system emerges, the function of parts can be redefined, as the chromosomal changes function as the context of these genes; (3) the emergence of successful new genomes is a rare event, as the majority of the targeted cells died. However, despite the very low chance, new systems can often find their way by riding out evolution. Many other stories carry a similar message. For example, despite the importance of the spindle microtubules in mitosis, in the fission yeast S. pombe the nuclear envelope does not break down during mitosis. Under certain experimental conditions, cell division can continue into the next cell cycle (following a brief delay) even without the mitotic spindle (an important part) (Castagnetti et al., 2010). The authors explained this as a possible primitive nuclear division process that is independent of spindle microtubules. However, even for many

advanced life forms, targeting some crucial parts could also lead to the drastic changes in system behavior needed for survival. There is ample evidence that survivors often emerge precisely when crucial pathways or machineries (including the DNA replication pathway, the metabolic pathways, etc.) are rendered dysfunctional in various cell lines or tumor tissues.

c. “Important” parts could be less important under experimental conditions.

Example 1: Rewiring the gene network by adding “parts” results in less drastic changes in the phenotype than expected. To examine gene regulatory interactions, promoter region-open reading frame (ORF) fusions were constructed and stably integrated into the E. coli genome, changing the network structure by adding new connections. However, the vast majority of added network connections (in a panel of 598 gene networks) did not produce new phenotypes, including those involving highly connected hub genes. This illustrated the importance of the plasticity of the E. coli genome and the lesser importance of altered genetic parts or even of additional altered network connections (Isalan et al., 2008).

Example 2: Some ultraconserved DNA elements can be deleted. It is generally accepted that most evolutionarily conserved DNA sequences are crucial for the organism. This idea was examined by removing four ultraconserved elements from the mouse genome. These four noncoding sequences (from 222 to 731 bp) are 100% identical among human, mouse, and rat. Surprisingly, as pointed out by the authors, deleting these ultraconserved noncoding sequences (parts) failed to generate the expected phenotypes:

“To maximize the likelihood of observing a phenotype, we chose to delete elements that function as enhancers in a mouse transgenic assay and that are near genes that exhibit marked phenotypes both when completely inactivated in the mouse and when their expression is altered due to other genomic modifications. Remarkably, all four resulting lines of mice lacking these ultraconserved elements were viable and fertile, and failed to reveal any critical abnormalities when assayed for a variety of phenotypes including growth, longevity, pathology, and metabolism. This completely unexpected finding indicates that extreme levels of DNA sequence conservation are not necessarily indicative of an indispensable functional nature.” (Ahituv et al., 2007)

Example 3: Dysfunction of some genes can enhance specific functions: the conflict between parts and the system. It is generally believed that bio-efficiency is at its highest level, following a billion years of evolutionary selection. Many impressive examples have been learned from Biochemistry 101. It is thus surprising to observe that experimentally changing some amino

acids of a protein can sometimes significantly increase the specific form of bio-efficiency under investigation (from the activity of a given enzyme to the overall metabolic and growth profile). The realization that there is a collaborative and yet conflicting relationship between “parts” and the “system” should explain this issue. Under evolution, the system’s overall benefits will likely overpower the parts’ selfishness, and so the status of many parts is not at the optimal level. It is the system’s requirements that matter the most. This is why well-selected interactions among parts strike a balance between specificity and plasticity, as the system’s needs from parts are highly dynamic within unpredictable environments. It also explains why fuzzy inheritance is fundamental for bioinformation (see later section). Various adaptive mutation experiments have demonstrated that mutations can deliver adaptive advantages by optimizing metabolic pathways and phenotypic performance under defined selective pressure (Hong, 2011; Cheng et al., 2014), illustrating that wild-type genes do not achieve the optimal potential of molecular pathways. Clearly, under highly selective experimental conditions, the optimization of parts’ efficiency can be achieved artificially. However, such optimization only further emphasizes the importance of system control corresponding to a specific environment and the constrained function of the parts.

In summary, despite their diverse nature and different experimental rationales, the above data collectively and forcefully illustrate that (1) genes are the parts of inheritance, with high flexibility; however, they differ from the genomic blueprint. (2) The relationship between gene and phenotype is rather complicated, as phenotypic penetrance is limited for the majority of genes, and their functions depend on the genomic and environmental context.
(3) “Parts” can be switched among different species and can be removed from and/or introduced into the same or different systems, which may or may not change some features but does not redefine the system. (4) The functions of genes are clearly limited or constrained by other nongene factors, and the power of the gene has decreased significantly in the postsequencing era. Based on the systems concept regarding the “parts and whole” relationship, as well as the system control principle for a multilevel genomic system, it is logical to search for the genomic context that organizes the parts (genes). This context-defined blueprint must be responsible for emergence at the higher level.
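The numerical side of items 3a and 3c can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the gene counts are the rounded figures quoted in the text, and the function names (`rank_by_gene_count`, `essential_fraction`) are hypothetical helpers introduced only for illustration.

```python
# Approximate gene counts as quoted in item 3a (rounded figures from the text,
# not an authoritative data set).
GENE_COUNTS = {
    "Escherichia coli": 4_000,
    "fruit fly": 15_000,
    "chicken": 17_000,
    "human": 22_000,
    "mouse": 25_000,
    "grape": 30_000,
}


def rank_by_gene_count(counts):
    """Return species names sorted from most to fewest genes."""
    return sorted(counts, key=counts.get, reverse=True)


def essential_fraction(total_genes, dispensable_genes):
    """Fraction of genes that are essential, given how many are dispensable."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return (total_genes - dispensable_genes) / total_genes


ranking = rank_by_gene_count(GENE_COUNTS)
human_rank = ranking.index("human") + 1  # 1-based position in the ranking

# Item 3c: about 100 of the 485 protein-coding genes of M. genitalium are
# dispensable, so roughly four in five are essential.
mycoplasma_essential = essential_fraction(485, 100)

if __name__ == "__main__":
    print("Ranking by gene count:", ranking)
    print("Human position:", human_rank)
    print("M. genitalium essential fraction:", round(mycoplasma_essential, 2))
```

Ranked by gene count alone, the grape and the mouse come ahead of humans, which is the mismatch between gene number and apparent complexity that item 3a describes. Because the text says "more than 100" genes are dispensable, the computed essential fraction is best read as an upper estimate.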

4.2.4 A Chromosomal Set Codes “System Inheritance”

In contrast to the gene-defined “parts inheritance,” the concept of “system inheritance” should be used to describe the true genomic blueprint, which codes for how parts interact within the biosystem in a highly dynamic environment. The challenge is this: what type of genomic organization defines system inheritance, if we know that it is not the gene or epigene? As the chromosomes represent the highest level of genomic organization and the genome functions as an emergent platform for all genes and environmental responses, they become an obvious choice for carrying system inheritance.

4.2.4.1 Background and Rationale

Many fundamental facts suggest that the distribution and configuration of the genetic materials along chromosomes are important for the system’s function.

First, a majority of different species display different karyotypes, and closely related species often display similar karyotypes (White, 1978; King, 1995; Ye et al., 2007). There are good examples demonstrating that chromosomal rearrangement is associated with species differences. The chromosomal relationship between chimpanzees and humans is one of the most discussed examples. Zoo-fluorescence in situ hybridization (FISH) (cross-species chromosome painting) has clearly illustrated the details of how different chromosomal blocks can be rearranged among different species (Wienberg et al., 1990; Yang et al., 1999; Graphodatsky et al., 2011). In other words, among these different species, and especially among most mammals, the gene content is rather similar, whereas the topological relationship of genes or genetic modules (genes plus regulatory and other DNA elements) differs among species, as reflected by their karyotypes.

Second, genome reorganization plays an important role in new genome emergence during cancer evolution (see Chapter 3).
Most of the transitional events of cancer (cancer formation, metastasis, and drug resistance) are achieved by the emergence of a new system in the form of a new genome, for which the generation and evolution of chaotic genomes (which change the previously existing genomic order within the chromosomes) is an essential condition. This also suggests that the rearranged chromosomes change the system’s information, which is the key to evolutionary success (Ye et al., 2018b). Although traditional molecular cancer research has focused on oncogene fusions, a growing number of studies have instead examined gene expression dynamics at the global level, as chromosomal translocations can have an impact on the genome-defined information package.

Third, one chromosomal function is to either promote or constrain the potential function of DNA sequences. Both the chromatin loop size and the expression of introduced genes are controlled by chromosomal position (Heng et al., 1996, 2004a), indicating a role for chromosomes in managing genes. The chromosome-related gene silencing effect is another well-known phenomenon.

Fourth, it has increasingly been realized that topological information is of ultimate importance in molecular biology (from molecular interaction sites to network structure and to genomic topology). In fact, albeit on a limited basis, the spatial organization of the eukaryotic genome was observed over a century ago (Rabl, 1885; Boveri, 1909, see Boveri, 2008). Before the large-scale sequencing era, the concept of chromosome/chromatin “territories” was introduced (Cremer et al., 2006; Heng et al., 2001a), and various molecular cytogenetic methods have been used to study chromatin/chromosome organization in the context of genomic topology (Heng et al., 1996; Heng et al., 2001b, 2004a, 2004b). Increasing evidence suggests that genes and chromosomes are nonrandomly localized within the nucleus (Misteli, 2005; Cavalli and Misteli, 2013; Brickner et al., 2016). For example, a number of gene loci have been shown to be positioned at the nuclear periphery when inactive and to move to the nuclear center upon developmentally regulated activation (Kosak and Groudine, 2004; Takizawa et al., 2008); gene-poor chromosomes tend to be positioned toward the nuclear periphery (Parada et al., 2004). In addition, many chromosomal features, such as gene density, size (base pair length), and co-regulated gene activity, have been linked to their organization in nuclei (Croft et al., 1999; Bolzer et al., 2005; Rajapakse et al., 2009).
Fifth, following various genome sequencing projects that supplied comparative information, increased attention has been paid to the pattern of gene distribution along chromosomes, the potential mechanism underlying this distribution, and its biological significance. The conserved gene order along chromosomes has been studied in both prokaryotic and eukaryotic organisms. In prokaryotic systems, some gene pairs are conserved among different bacterial and archaeal genomes. The proteins encoded by conserved gene pairs appear to interact physically (Dandekar et al., 1998). A phylogenetic analysis illustrated that a different arrangement of a cluster of genes involved in division and cell wall synthesis separates bacilli from other bacteria, suggesting that the relationships between these genes are not random and likely involve growth and division in bacteria (Tamames et al., 2001). In addition, significant patterns of gene order are observed within, as well as between, the genomes of Haemophilus influenzae and E. coli, and functionally related genes tend to be neighbors more frequently than do unrelated genes (Tamames et al., 1997).

Overall, it could be universal in prokaryotic genomes that gene order is extensively conserved between closely related species but rapidly becomes less conserved among more distantly related organisms; even between some very distant species, remnants of conserved gene order can be found. Furthermore, there are examples of especially well-conserved clusters of genes, including the genes for ribosomal proteins (Nikolaichik and Donachie, 2000) and the dcw cluster (a group of genes, e.g., 16 in E. coli, which is involved in the synthesis of peptidoglycan precursors and cell division) (Mingorance and Tamames, 2004). Thus, the degree of gene order conservation can be used for phylogenetic measurement. It was further concluded that the conservation of gene order between different organisms is emerging as an informative property of genomes (Tamames, 2001). However, the mechanisms of maintaining gene order are not well understood, as it is challenging to explain the conservation of gene order in prokaryotic organisms using only the concept of operons (the issue of gene order seems to rank higher than the concept of operons, as these clusters often span more than one operon) (Lathe et al., 2000) and lateral gene transfer.

In eukaryotic systems, the lack of operons, together with the general belief that chromosomes are simply the vehicle of the genes, has led to the general view that gene order is random, which reflects the misconception of the genome as a bag of genes. Furthermore, there has been a strong belief that different species should be defined by different genes, accumulated over long periods of time (the large number of genes in eukaryotes seemed to offer hope of identifying many species-specific genes). The comparative genome sequence data directly question these assumptions.
After all, the sequences are not randomly distributed along chromosomes, and despite the fact that most eukaryotic genomes lack operons, they display some gene clusters that are associated in function, even though their DNA sequences are unrelated (Dávila López et al., 2010). Such nonrandom gene order seems to have biological significance. Genes with similar expression patterns, and some functionally related genes (such as genes from the same metabolic pathway), tend to cluster more often than what would be expected by chance (Cho et al., 1998; Boutanaev et al., 2002; Cohen et al., 2000; Lercher et al., 2002; Lee and Sonnhammer, 2003; Schmid et al., 2005); additionally, a significant number of genes responsible for subunits of stable complexes are located within close proximity to each other (Kleinjan and Lettice, 2008; Trinklein et al., 2004; Yang et al., 2007). Of course, there are some classic examples of gene clusters in eukaryotic systems, as in the vertebrate β-globin locus, and the Hox and histone genes. There are some examples/explanations that illustrate the formation of gene clusters during evolution. One is the relocation of genes during

evolution. The DAL cluster is the largest metabolic gene cluster in yeast (with six adjacent genes encoding proteins that enable S. cerevisiae to use allantoin as a nitrogen source). These genes, which were previously scattered around the ancestral genome, became relocated to a single site through a set of simultaneous genomic rearrangements, leading to a biochemical reorganization of the purine degradation pathway, which switched to importing allantoin instead of urate (Wong and Wolfe, 2005). Another example involves a region of the genome containing the maltase and larval cuticle gene clusters, which appears to have been relocated between nonhomologous chromosomes in Drosophila (Vieira et al., 1997). A further example is that of paralogous genes that remain clustered following gene duplication. As for how the genome maintains these clusters, current explanations refer only to evolutionary selection, without clear details.

Sixth, systems biology has become increasingly important for understanding how individual genes function within the complex genetic network. Despite the large amount of research on network dynamics, it is not clear how genomic topology plays a role in defining network structure and influencing its behavior. Different species must have their own unique boundaries for their genetic network, and therefore, the species-specific genomic landscape must be considered in regard to the formation of the genetic network. Given that (1) the genome is made up, in part, of a series of ordered chromosomal blocks within the nucleus and (2) the location of specific genes in the genome is significant for the genes’ function, including the likelihood of specific translocations that contribute to cancer (Roix et al., 2003; Meaburn et al., 2007; Heng et al., 2006a-c), the order of genes or DNA sequences along a chromosome must be involved in the formation of the network.
Seventh, as mentioned in the previous section, “parts” inheritance has so far failed to provide a reasonable explanation for the phenotypic differences among species. For example, it is hard to explain the vast range of body forms seen in the animal kingdom (from sea anemones to humans) using only differences in genes and environments. If it is not the gene number, or the genes encoding “housekeeping” proteins (e.g., enzymes, histones), which are remarkably conserved among species, or the genes encoding “toolkit or regulation” proteins (e.g., transcription factors, molecules for cell signaling), which are even more conserved than housekeeping genes, or the environments (as many different species live in overlapping environments), then which genomic elements are mainly responsible for these drastically different phenotypes? Following years of examining the coding regions, a new trend (and a more challenging task indeed) is to examine the intergenic regions (including diversified enhancers).
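The fifth point above notes that the degree of gene order conservation can serve as a phylogenetic measure. As a rough illustration only, the sketch below scores conservation as the fraction of neighbouring gene pairs in one genome that are also neighbours in another. The scoring rule, the function names, and the toy gene orders are all assumptions made for this example; they are not the method used in the cited studies, and the chromosome is treated as linear rather than circular for simplicity.

```python
def adjacent_pairs(gene_order):
    """Return the set of unordered neighbouring gene pairs along one chromosome."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def gene_order_conservation(order_a, order_b):
    """Fraction of neighbouring pairs in order_a that are also neighbours in order_b.

    Only pairs whose two genes occur in both genomes are counted, so that
    differences in gene content do not masquerade as differences in gene order.
    """
    shared_genes = set(order_a) & set(order_b)
    comparable = {p for p in adjacent_pairs(order_a) if p <= shared_genes}
    if not comparable:
        return 0.0
    return len(comparable & adjacent_pairs(order_b)) / len(comparable)


# Hypothetical gene orders (single letters stand in for gene names).
reference = ["a", "b", "c", "d", "e", "f"]
close_relative = ["a", "b", "c", "e", "d", "f"]    # one local inversion
distant_relative = ["c", "a", "e", "f", "b", "d"]  # extensively shuffled

if __name__ == "__main__":
    print(gene_order_conservation(reference, close_relative))
    print(gene_order_conservation(reference, distant_relative))
```

In this toy case the closely related order keeps three of five neighbouring pairs, whereas the shuffled order keeps only one, a remnant of the kind the text describes for very distant species. All three genomes have identical gene content, so the score reflects order alone.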

Rather than continuing to study individual “parts,” another new approach is to search at the highest level of genomic organization for the answer (Heng, 2007a, 2009). As the order of genomic elements (genes, DNA sequences, or chromosomal blocks) along and between chromosomes is unique for most species, such important topological information should be carefully examined, as it likely provides the crucial contribution to the emergent properties of system inheritance.

4.2.4.2 The Model and Its Prediction

Based on the above facts/syntheses, we introduced the idea that the order of genes and/or other DNA sequences or blocks along and among chromosomes represents a new layer of information, and we thus proposed that the entire set of chromosomes of a given species is a new genomic coding, which defines the network structure (how individual genes interact) and serves as a blueprint or system inheritance (Fig. 4.1). Within this model, the genomic context of genes can be defined in terms of genomic topology (genomic context = gene content + genomic topology). Furthermore, the model suggests that the physical position of a gene (or the address of the gene in the nucleus) is important for network dynamics. In other words, the karyotype defines the boundaries of a network structure for a given species, which integrates the network into the genome-defined system (Heng, 2009, 2015; Heng and Regan, 2017; Heng et al., 2018). Moreover, this model integrates the concept of emergent properties with gene/environment interaction, in which the gene address within the nucleus functions as a structural element for network formation, and it considers genomic topology as an inheritable key initial condition.

FIGURE 4.1 This diagram illustrates how the genome context defines the network structure. Two chromosomes are drawn, representing all chromosomes within the cell and depicting the way in which genomic physical interactions affect networks. When the genome context changes (represented by translocation) during a “shattered genome” episode, the physical relationship between the same gene sets (genomic topology) changes and so does the pattern of the network (represented by the letters). Reused from Heng, H. H. (2009). The genome-centric concept: Resynthesis of evolutionary theory. Bioessays, 31(5), 512-525. https://doi.org/10.1002/bies.200800182.
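The relationship "genomic context = gene content + genomic topology" can be made concrete with a toy model in the spirit of Fig. 4.1. The sketch below is an illustration under strong simplifying assumptions, not the author's model: a karyotype is represented as a mapping from chromosome names to ordered gene lists, "topology" is reduced to the set of neighbouring gene pairs, and the gene letters, chromosome names, and function names are all hypothetical.

```python
from collections import Counter


def gene_content(karyotype):
    """Count every gene in the karyotype, ignoring where it sits."""
    return Counter(gene for genes in karyotype.values() for gene in genes)


def genomic_topology(karyotype):
    """Set of neighbouring gene pairs, a crude stand-in for genomic topology."""
    return {
        frozenset(pair)
        for genes in karyotype.values()
        for pair in zip(genes, genes[1:])
    }


def reciprocal_translocation(karyotype, chrom_x, break_x, chrom_y, break_y):
    """Swap the segments distal to the two breakpoints; return a new karyotype."""
    x, y = karyotype[chrom_x], karyotype[chrom_y]
    rearranged = dict(karyotype)
    rearranged[chrom_x] = x[:break_x] + y[break_y:]
    rearranged[chrom_y] = y[:break_y] + x[break_x:]
    return rearranged


# Two hypothetical chromosomes carrying eight hypothetical genes.
before = {"chr1": ["A", "B", "C", "D"], "chr2": ["E", "F", "G", "H"]}
after = reciprocal_translocation(before, "chr1", 2, "chr2", 2)

lost_neighbours = genomic_topology(before) - genomic_topology(after)
gained_neighbours = genomic_topology(after) - genomic_topology(before)

if __name__ == "__main__":
    print("Same gene content:", gene_content(before) == gene_content(after))
    print("Lost neighbour pairs:", sorted(sorted(p) for p in lost_neighbours))
    print("Gained neighbour pairs:", sorted(sorted(p) for p in gained_neighbours))
```

After the translocation the gene content is unchanged, yet two neighbour relationships are lost and two new ones appear. In the model's terms the parts are the same while the context differs; whether such a change alters the real interaction network depends on biology that this sketch does not capture.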

For the sake of simplicity, we have illustrated the relationship between gene order and network structure using two metaphase chromosomes. The real case should involve more chromosomes and exist at the level of interphase chromatin (Fig. 4.1). This model states and predicts the following:

1. It is not just the gene content but the genome context (i.e., genomic topological information) that defines the system inheritance. When there is a certain extent of gene number-mediated complexity (i.e., when a system has a sufficient number of genes), a slight increase or decrease in gene number will be tolerated by the system.

2. Karyotype changes (in the form of translocations, aneuploidy, sizable deletions or duplications, and/or the insertion of a large number of “foreign” genomic fragments by horizontal gene transfer, including transposable elements [TEs]) can alter the system inheritance. The majority of species should thus display distinctive karyotypes. Changing the order of genes (even without directly changing the genes) could change the function of a specific gene and/or the genome’s function, which is reflected by phenotypic changes. When crucial modules are involved, some changes could be fatal.

3. Even a single translocation can drastically change the transcriptome of the genome beyond the translocated region. Similarly, a single chromosome number change could have an impact on genes outside of this chromosome.

4. Under certain conditions, some instances of aneuploidy or translocation can be tolerated, if these changes do not drastically impact the essential functional modules. However, when the environments are drastically different, the phenotypic impact of these seemingly harmless or moderate changes could become clinically or evolutionarily significant.

5. The order of genes can be altered by other genomic events in addition to overall genome reorganization. It is possible that the small size of CNVs or TEs allows them to be tolerated and to evade monitoring mechanisms.

6. The topological distribution of lower-level agents can have an impact on the emergent properties of phenotypes. The molecular interactions within a cell are topology-specific, and the microsubcellular environment is not uniform, even within the nuclei and cell.

7. Karyotype alterations should be commonly observed in diverse complex diseases and illnesses. However, the seemingly stochastic nonclonal chromosome aberrations (NCCAs) are likely dominant. It is possible that NCCAs can also alter the function of gene mutations by changing the genomic context.

8. Some of these altered karyotypes will have adaptive advantages. It is possible that a certain degree of NCCAs can have an important function for cellular heterogeneity. Different individuals should have their own baseline of NCCAs. As a trade-off, increased NCCAs can often be linked to abnormal cellular conditions. Elevated NCCAs could be used as an index for measuring elevated system instability.
9. Unlike in germline cells, karyotype changes in somatic cells are frequently NCCAs. The quantitative value of NCCAs, rather than a specific clonal chromosomal aberration (CCA), is more useful for predicting evolutionary potential. In the case of cancer, however, some well-known CCAs can serve as a marker for diagnosis.
10. While this model focuses on genome- or chromosome-level reorganization, it still allows for some genes to play a role in further system modifications. It is possible that gene accumulation can occur following genome reorganization, increasing the quantity of copies of an individual gene within the cell population, as we have suggested in the cancer model (see Chapter 3).
11. System inheritance encodes a range of potential phenotypes, and what the real phenotype will be is defined by environmental interaction (see Section 4.3).

By accepting the karyotype as a new form of genomic coding, many major genomic confusions can suddenly be explained. Although these confusions are highly diverse, they all, more or less, involve the issue of system inheritance. For example, the story of knocking out the MYO1 gene in yeast (an example in which the loss of function due to deleting a specific key gene can be restored by a changing chromosome status, Rancati et al., 2008) can now be explained as the emergence of a new system (achieved by chromosomal number changes), which creates a new way of cell division without the original gene.
The power of drug-resistant clones of cancer cells is caused by the newly emergent systems resulting from rapid and massive genome reorganization following evolutionary selection. Even the story of the sponge, which has an abundance of genes and yet a simple ancient phenotype, and which is otherwise hard to understand, can be satisfyingly explained. The sponge has enough genes, but the relationship among these genes, or the genomic coding, determined the fate of this species. Sexual reproduction further “fixed” its fate, such that these animals cannot escape the circumstance of having many genes, but in a sponge-specific genomic topology. Perhaps the most interesting example is the genomic relationship between human and chimpanzee. With a 98% similarity in gene sequence, scientists have been puzzled by how we are so closely related genetically and yet behave so differently. However, with

the realization that the karyotype-defined system inheritance is different between human and chimp (the human has 46 chromosomes while the chimp has 48 chromosomes), the system behavior difference is no longer surprising. Equally interestingly, the remarkable similarity of the karyotypes between human and chimp (with one chromosomal fusion and a few inversions) clearly indicates the closest evolutionary relationship between them among all mammals, supporting the idea that chromosomal coding is closely related to speciation. Similarly, while the genetic (gene) similarity of human and mouse is 92%, the karyotypes of these species are separated by approximately 250 genome reorganization events.

Currently, there are many experimental data that support this model’s predictions, even though many of these experiments were originally designed for other purposes. Some examples are listed here. More detailed discussions can also be found in Debating Cancer, Chapter 5 (Heng, 2015).

• There are many gene clusters, such as the Hox gene cluster, the vertebrate β-globin locus, and the histone gene cluster.
• The synteny (the conservation of blocks of gene order) relationship is well-illustrated in animals and many plants, especially in mammals.
• The chromosome constrains a gene’s function (Heng et al. 1996; 2004).
• Transcriptome changes involve the entire genome, far beyond the specific genomic alterations introduced (Heng, 2015; Stevens et al., 2013a, b; Ye et al., 2018b).
• A novel trait can evolve through genomic rearrangement and gene amplification in bacteria (Blount et al., 2012).
• The formation of specific pathways can occur by the formation of new gene clusters (Wong and Wolfe, 2005).
• The link between genome alteration and diseases (as well as organismal macroevolution) is common (Heng et al., 2013c; Heng 2017b). Overwhelming chromosomal changes offer better clinical prediction power than gene mutation data (Ye et al., 2018b).

4.2.4.3 The Mechanism and Significance of Preserving Chromosomal Coding

Following the publication of this model, one frequently asked question about the mechanism of preserving and passing chromosomal coding is the following: if the order of DNA sequence along a chromosome is a coding, how does the genome system keep that order? It is rather straightforward to understand the mechanism of how an individual gene preserves its genetic code: through the precise process of AT/GC pairing during DNA replication (including the DNA repair

mechanisms). As for how such a process can occur at the entire chromosome level, in addition to AT/GC pairing at the DNA level, there must be a mechanism to maintain the overall sequence order along chromosomes. As will be discussed in Chapter 5, a number of “filters,” including meiotic pairing, can provide such a function (e.g., the Y chromosome unpaired region is often unstable and subject to massive DNA loss during evolution). To our great surprise, it became clear that the main function of sexual reproduction is to maintain system inheritance by preserving the karyotype (order of genes), rather than just increasing gene-level diversity (Heng, 2007b, 2015; Gorelick and Heng, 2011). Such an understanding can not only address a key question in evolution (the function of sex) but also emphasize the importance of chromosomal coding for macroevolution. On one hand, it preserves the genomic information for different species (by maintaining the system); on the other hand, when new chromosomal coding is emergent, it generates the potential for speciation (see Chapter 6). This realization makes sense of many observed phenomena which were previously hard to understand. For example, while a given genome can form new clusters during the evolutionary process (Wong and Wolfe, 2005), the main function of a genome system’s maintenance is to preserve the chromosomal coding. Despite the diverse mechanisms involved, common evolutionary mechanisms create new chromosomal coding (by macroevolution) and preserve it (by the function of sex) as soon as it is created. This message was amplified by the evidence that the correct genomic topological arrangement (or simply the “gene order”) is essential for constructing synthetic life forms according to the original sequencing information. Craig Venter’s group has created the world’s first synthetic life form by synthesizing DNA molecules and allowing them to function within a “borrowed” cellular environment.
Under such conditions, synthesized DNA can provide the genetic instructions for a “created system.” Despite the historical significance of this milestone experiment (which, by the way, has occupied 20 scientists for more than 10 years), one needs to realize that this is not about computer-designed genetic information, but rather that copied information from a natural organism can work under artificial experimental conditions. Interestingly, the maintenance of the precise order of genes is important for making this work, which illustrates the importance of system inheritance. A recent effort to design and synthesize a minimal bacterial genome has further illustrated the importance of system inheritance in comparison with “parts inheritance” (Hutchison et al., 2016). This is especially significant because, despite understanding the function of all major genes, scientists have failed to put a functional genetic blueprint

together for the minimal cell. According to Venter, “Not one design worked.” Clearly, simply putting important “parts” together is not enough. When asked “Does modern science have sufficient knowledge of basic biological principles to build a cell?” his answer was “a resounding no.” To build the minimal bacterial genome, his group then used a new approach of modifying the existing natural genome by trimming “extra” genetic materials by trial and error. This artificial evolutionary experiment worked, leading to a functional genome which is smaller than that of any independently replicating organism we know to date. Equally surprisingly, among the 473 genes of this minimal genome, the functions of 149 genes which are required by the system are unknown, which strongly suggests that our current knowledge has missed some important biological principles. This exciting experiment forcefully demonstrated that (1) the genome is not just a bag of essential genes, even in artificial experimental conditions, and (2) naturally existing genomes must contain some important, hidden genomic information, and the physical relationships among genes are important.

This message, combined with the importance of system inheritance, should be kept in mind by the many researchers who are keen on bringing back extinct species from the dead (called de-extinction) simply by manipulating the DNA or gene-related features. Currently, there are different platforms through which to achieve de-extinction. The first, traditional option is backbreeding (which is not really a true de-extinction). By selectively breeding living species that share traits with the extinct one, scientists would artificially amplify some traits that were shared by the extinct species. This approach is extremely limited, as a dog is a dog regardless of body size. In most cases, the extinct species are different species in comparison with currently living species.
Some have also suggested introducing genes from different species to establish some new features or amplify some traits. A second approach is animal cloning. With the maturation of animal cloning technology (from the success of Dolly the sheep to the recent monkey twins Zhong Zhong and Hua Hua) (Wilmut et al., 1997; Liu et al., 2018), animal cloning among closely related species has become an option. In fact, an extinct animal (bucardo or Pyrenean ibex, one of four subspecies of the Spanish ibex or the Iberian wild goat) has been resurrected by cloning for the first time (through the use of frozen skin). However, the clone died minutes after birth because of a lung defect (Folch et al., 2009). As more reliable cloning technology becomes available, the key challenge for de-extinction is to obtain fresh tissue, and thus access to intact nuclei in which the system inheritance or blueprint is stored. This is the key condition for nuclear transfer-based cloning. Unfortunately, this condition

is often ignored by some who want to bring back the ancient animals simply by sequencing their DNA. Without active chromosome coding information, the individual parts information alone is insufficient for bringing these extinct animals to life. The most popular option is genetic engineering using CRISPR technology, which allows scientists to edit genomes with extraordinary precision. Following sequencing comparison between living species and extinct animals, gene-editing tools such as CRISPR can swap the relevant genes of an extinct animal into the living species’ genome, whereupon the hybrid genome can be implanted into a surrogate. Again, it would be hard to get the chromosomal information for these extinct animals. Perhaps it will be necessary for more attention to be paid to computational biology to reconstruct the chromosome coding. More importantly, like most DNA manipulation methods, CRISPR technology can also trigger elevated genome-level alterations, as reflected by high NCCA rates (Stepanenko and Heng, 2017; Heng et al., unpublished observations). Recent publications have also reported the incidence of increased genomic region reorganization caused by CRISPR (Boroviak et al., 2017).

4.2.5 Why Has Chromosomal Coding Long Been Ignored (If It Is Indeed Important)?

In addition to the usual suspects (too many genes/pathways and too little time; no incentive to think differently, but a habit of going with the flow for the sake of funding and publications; confidence in the concept that the higher the molecular resolution, the better the technological platforms), answering this question requires some historical and philosophical discussions. Unlike the routine daily practice of science, in which the details of scientific methods matter the most, for establishing some fundamental concepts, philosophical reasoning, historical lessons, and scientific metaphors often play a decisive role.

4.2.5.1 Historical Lessons: Topology Is a Key Piece of Bioinformation

While “genetic information” might be one of the overused terms in current genetics, it was in 1953 that genes became “information” (Cobb, 2013). The establishment of the DNA model not only identified the material basis for the gene but also brought about the realization that DNA codes for bioinformation (Watson and Crick, 1953b) (note that this Nature paper differs from the most famous one, “Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid,” which was published 1 month earlier by the same authors). The lack of the information concept in genetics has perhaps delayed the acceptance of DNA, rather than protein, as the genetic material.

For example, after Oswald Avery identified DNA as the “transforming agent” in bacteria in 1944 (Avery et al., 1944), one obstacle to the immediate acceptance of DNA as the genetic material was this: since there are only four types of nucleotides (ATGC) in DNA, such a boring sequence of uniformly repeating arrangements, shared by all different species, was surely not able to provide the complexity needed for genetic materials. In contrast, protein was favored because there are 20 different amino acids in proteins. In addition, protein is abundant in cells and had been actively studied in genetic research. Accordingly, many researchers who were against the concept of DNA as the genetic material unfairly considered Avery’s experiment the result of protein contamination. Experiments and arguments then accumulated indicating that the arrangement of ATGC within DNA is not uniform and that DNA molecules might differ in the sequence of bases. These represented the necessary appreciation of how ATGC can contribute to the genetic information (by the arrangement of the nucleotides’ sequence). Furthermore, the idea of a genetic “code” was gradually formed, particularly under the influence (directly or indirectly) of the newly emergent concepts of control, feedback, and information. Following are some examples summarized by two historical reviews (Cobb, 2013, 2014). Masson Gulland wrote: “There is at present no indisputable evidence that any polynucleotide is composed largely, if at all, of uniform, structural tetranucleotides” (Gulland, 1947). Kurt Stern developed a model of helical nucleoprotein molecules in which the nucleic acid chains were modulated in combination with polypeptide chains, similar to “the modulations impressed on a smooth surface by the stylus of a sound recorder” (Stern, 1947).
Erwin Chargaff suggested that “differences in the proportions or in the sequence of the several nucleotides forming the nucleic acid chain also could be responsible for specific effects”. He further stated, “We must realize that minute changes in the nucleic acid, e.g., the disappearance of one guanine molecule out of a hundred, could produce far-reaching changes in the geometry of the conjugated nucleoprotein, and it is not impossible that rearrangements of this type are among the causes of the occurrence of mutations” (Chargaff, 1950). And finally, for many reasons Cobb has mentioned, in the 1953 paper, Watson and Crick stated that “it therefore seems likely that the precise sequence of the bases is the code which carries the genetical information.” The race to identify the genetic coding was “officially” on, and the rest is the history of identifying the genetic codon (for a more detailed story, please see Cobb, 2013). It should be pointed out that the new concepts about cybernetics, information, and coding must have had a profound influence on the understanding of genetic coding. For example, in 1948, Wiener published Cybernetics:

Or Control and Communication in the Animal and the Machine, which popularized his vision of messages, codes, and information. The same year, Shannon published A Mathematical Theory of Communication, which outlined a general mathematical framework for information theory. Also in 1948, at the Hixon Symposium held in California, a gene was described by some biologists as an “information tape” that could program the organism, like Alan Turing’s “universal Turing machine.” In 1950, Hans Kalmus published A Cybernetical Aspect of Genetics, stating that the gene is a “message” of a “chemical nature” (Kalmus, 1950). Clearly, these ideas have shaped our understanding of how gene-mediated inheritance works.

History often repeats itself. The realization that the order of ATGC represents a genetic coding has laid out the conceptual platform of the search for the genetic codon for gene-mediated parts inheritance, which forcefully illustrates that topology is information for genetic materials. Now, the same field is confused again about whether the order of genes on chromosomes (topological information) is a new type of genomic information, the system inheritance. On the surface, the reasons for the two confusions seem to be opposite: regarding the order of ATGC, researchers used to think that the pattern was repetitive and thus did not provide information; regarding gene order, researchers now think that it does not matter because genes are randomly distributed along chromosomes. Fundamentally, these explanations share the same conceptual flaw: ignoring the genetic or genomic topology that is key information. By briefly reviewing the history of the search for parts inheritance, we hope that the current genomic community can learn from history, and we encourage further research on system inheritance.
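The historical objection that a uniformly repeating arrangement of four nucleotides could not carry enough information can be made concrete with a simple count. The sketch below is only an illustration under stated assumptions: it models the "uniform tetranucleotide" idea as one fixed ordering of the four bases repeated end to end, and it measures information capacity as the base-2 logarithm of the number of distinguishable sequences. The function names are hypothetical and do not come from the cited works.

```python
import math
from itertools import permutations

BASES = "ATGC"

def bits_unconstrained(length):
    """Capacity in bits of a sequence in which every position may be any base."""
    # 4 choices per position gives 4**length sequences, i.e. 2 bits per base.
    return length * math.log2(len(BASES))

def bits_repeating_tetranucleotide():
    """Capacity in bits if the polymer is one four-base unit repeated throughout.

    Assumption: the unit contains each base exactly once, so the whole
    molecule is fixed by the choice of one ordering of A, T, G, C.
    """
    distinct_units = len(set(permutations(BASES)))  # 4! = 24 orderings
    return math.log2(distinct_units)

free_capacity = bits_unconstrained(100)
repeat_capacity = bits_repeating_tetranucleotide()
```

Under these assumptions, the repeating model has the same small capacity no matter how long the molecule is, whereas the capacity of an unconstrained sequence grows with its length. This is the sense in which the order of the bases, rather than their mere presence, is the information.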
4.2.5.2 Accepting System Inheritance Is Necessary in the Search for the Correct Context of Genomic Information

Studying the relationship between genotype, environment, and phenotype has been a main tradition of genetics. As discussed in previous chapters, during the long history of Mendelian genetics, most successful genetic studies have been based on a single gene’s function using various model systems. By reading all of these exciting stories in textbooks, students are convinced that gene-defined parts inheritance works. For cases that did not fit, the explanations provided were that these cases involve additional genes and need more samples for sufficient analyses or that environmental factors play a more influential role. It is perhaps not realized by students that most data which cannot be explained by parts inheritance are not even reported. That is why there are currently so many surprises from large-scale genomic studies using both normal and patient populations (which go beyond the use of only model systems and involve less bias of sample

selection). In fact, reexamining many classic stories can often illustrate that these represent exceptions, rather than general rules. For example, many of the experimental settings in these stories do not involve significant genome-level changes. Moreover, these linear systems featured reduced heterogeneity. Under these conditions, gene-defined parts inheritance can be linked to phenotypes without interference from genome-level alterations. However, when studying complex phenotypes within dynamic environments, and especially when dealing with disease conditions in which stochastic genomic alterations dominate, the game changes; this has led to confusion, as many genetic principles no longer seem to be working (e.g., the importance of individual genes is drastically reduced). As genetic/genomic information depends on its context, distinguishing parts inheritance and system inheritance is the crucial step for studying context-defined inheritance, which is a key for understanding how genetics and genomics have an impact on phenotypes in a context-dependent manner. There is a series of genomic and environmental contexts that need to be distinguished when designing experiments. Different contexts require different platforms focused on the gene, the genome, or both. Following are some examples of different contexts, their defined types of inheritance, and necessary specific considerations:

Individual genes have multiple layers of contexts: other genes, epigenetic status, karyotype, cellular populations, tissue organization, and even individual status, including social interaction. Among these many complex relationships, however, genome-level selection is particularly important, as somatic evolution requires inheritance, and the genome represents the highest level of genetic organization. For diseases that involve somatic cell evolution (such as cancer), genome-level dynamics is crucial.
It is easier to experimentally illustrate the genetic contributions of individual genes in some model systems with minimal heterogeneity (including environmental heterogeneity) than in systems with high heterogeneity. The influence of lacking a specific gene can become invisible (less important) in highly dynamic environments in which evolutionary selection is not focused on individual genes, but on the potential of the genome package. It is important to know the genomic context of the study, as normal biological processes and abnormal processes involve different contexts. For example, it is productive to study the gene’s function during developmental processes where the stepwise gene function dominates. In contrast, it is less useful to trace the stepwise accumulation of gene mutations in cancer research, in which punctuated genome-level alteration is a common driver, and the long-expected stepwise accumulation of gene mutations is nowhere to be found (see Chapter 3). Similarly,

distinguishing well-controlled normal processes and often chaotic pathological processes is of importance. Of equal significance is the fact that because genes (parts inheritance) are tightly associated with microcellular evolution and the chromosome set (system inheritance) is tightly associated with macrocellular evolution, such contexts must be integrated into somatic evolutionary research. Environmental constraints should also be considered in a dynamic fashion, as the status of the environments can clearly have an impact on the pattern of evolution by slowing down or speeding up this process, depending on the combinational conditions. Furthermore, the stability of the genome system (see Section 4.3) and the environments can change the contribution of average and outliers. As such, the survival landscape and the stepwise adaptive landscape could be very different. The acceptance of “system inheritance,” therefore, represents an important change from classical Mendelian genetics to 4-D genomics. More details can be found under “Why has system inheritance flown under the radar for so long?” in Debating Cancer (Heng, 2015).

4.2.5.3 The Limitations of Reductionist Tradition and the Power of Metaphor

Blind faith in scientific tradition has also played an important role. Because the reductionist approach worked well for molecular biology before the large-scale genomic era, few are interested in searching for new genomic frameworks above the gene or epigene, even though it has become obvious that parts inheritance is not the blueprint and that missing heritability is a serious issue. Instead, efforts have been focused on the issues of sample size, computational power, and statistical methods.
Although increased research has illustrated that biosystems are typical complex adaptive systems in which system behavior is defined by the emergent properties of its lower-level components, most researchers are still characterizing lower-level parts with the hope of understanding the systems. The publication of the evolutionary mechanism of cancer (Ye et al., 2009; Heng et al., 2010a, Chapter 3) has vigorously challenged this tradition by arguing that the number of combinational molecular mechanisms of cancer is simply too large to be handled by the current approach. Even if we can practically understand all of them after over a century of studies with unimaginable costs, the clinical value of the majority of them is still extremely limited for individual patients. Now, nearly 10 years have passed. The concept of the evolutionary mechanism of cancer did not slow down the trend of characterizing more molecular pathways. Clearly, this message has not gotten through, even to those who have read our papers. The challenge is to clearly demonstrate the fundamental limitation of parts characterization for future clinical usage, especially when they are so many and so diverse, and to forcefully

highlight the significance of system inheritance which should have better clinical implications. During conversations with many researchers, the metaphor of the relationship between building materials (parts) and the architecture of the building and its functions (system behavior) seems to be able to catch people’s imaginations. As illustrated in Fig. 4.2, while the building materials are important, they differ from the architecture which is a key to the function of distinguishing different buildings or structures. One can understand all of the details of how to make bricks, but to build different structures with different functions, only having the materials is not enough; new information is needed. Clearly, the blueprint consists not of how to make building materials but of how to put different materials into specific topological arrangements for function. With this simple metaphor, biologists can realize the limitation of focusing only on parts inheritance without knowing system inheritance.

FIGURE 4.2 Topology diversifies function: the importance of topology in achieving function. While the parts of a system (bricks) are the same or similar, it is the topology of those parts that diversifies function (brick house, path, or arch). Here the relationship between individual parts and the functional whole serves as a metaphor for understanding the relationship between parts (genes) and the whole (genome). The same parts can have different functions.

FIGURE 4.3 Topology unifies function: while the materials or parts of systems (plastic, paper, glass, or metal) can be different, these systems can serve similar functions (cup) by having similar topological features. Equally important, the same material (e.g. glass) can have different functions (skyscraper walls, eyeglasses, hourglass, or colorful beads). Together, topology defines function.

The function of parts or materials (e.g., glass, metal, plastic, or paper) and their relationships with topologically defined functions can also be used as a metaphor (Fig. 4.3). Similar to the relationship between bricks and building structures, glass can serve very different functions depending on how it is arranged in context (as colorful beads, as eyeglasses, as skyscraper walls, as an hourglass, and so on). Furthermore, different parts or materials can have the same function, as long as they have a similar topological design. For example, all cups can serve the same function, whether made from metal, plastic, paper, or glass. Here, the type of material used is less crucial under certain conditions. The analogy of these cups illustrates how topological features can unify function. Of course, the above two examples are much simpler than any biological system, which involves many different types of parts and requires the dynamic environmental interactions that comprise system-level regulations. Nevertheless, the importance of the topological information is the same, and distinguishing the parts and the system is fundamental, especially for biological systems. In other words, the same gene can have different functions within different genomes, and different genes can have similar functions within different genomes.
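The idea that identical parts can form different systems when their arrangement changes can be sketched with a toy example. The code below is an illustration only, not a model from the text: the gene labels A to H, the two-chromosome layout, and the rule that physically adjacent genes count as "interacting" are all assumptions made for the sketch.

```python
def gene_content(karyotype):
    """Sorted list of all genes, ignoring where they sit (the 'parts')."""
    return sorted(gene for chromosome in karyotype for gene in chromosome)

def neighbor_pairs(karyotype):
    """Set of unordered adjacent-gene pairs on every chromosome (the 'topology')."""
    pairs = set()
    for chromosome in karyotype:
        for left, right in zip(chromosome, chromosome[1:]):
            pairs.add(frozenset((left, right)))
    return pairs

# Hypothetical karyotypes: two chromosomes before and after a reciprocal
# translocation that swaps the distal halves (C, D) and (G, H).
original = [list("ABCD"), list("EFGH")]
translocated = [list("ABGH"), list("EFCD")]

same_parts = gene_content(original) == gene_content(translocated)
kept = neighbor_pairs(original) & neighbor_pairs(translocated)
rewired = neighbor_pairs(original) ^ neighbor_pairs(translocated)
```

In this toy case the gene content is unchanged, yet the adjacency relationships at the breakpoints are replaced by new ones. A parts list alone therefore cannot distinguish the two karyotypes, while the neighbor sets can.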

4.3 FUZZY INHERITANCE

While heritability is a concept that summarizes how much variation in a trait is due to variation in genetic factors versus environmental factors (Visscher et al., 2008), inheritance is a concept that focuses on how characteristics are passed from one generation to the next. The traditional view of genetic inheritance is rather simple: it is the process through which DNA (genetic materials that code all needed bioinformation) is passed from parents to their offspring, and there is high precision and stability in the number and constitution of chromosomes across generations. In recent decades, there have been efforts to redefine bio-inheritance (Heng, 2009, 2015; Prasad et al., 2015). While such efforts have focused on epigenetics, the concept of karyotype-defined system inheritance perhaps represents an even more profound endeavor. Equally importantly, current large-scale genomic research (including karyotype studies using various in vitro and in vivo systems) has revealed the highly dynamic nature of the genomic landscape in various tissues, resulting from different strategies of evolutionary selection based on the separation of germline and somatic cells. This signifies a new realization: it is ultimately important to focus on heterogeneous genome-environment interactions within the evolutionary process, where bio-inheritance needs to be less specific, or fuzzy.
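For readers who want the heritability concept mentioned at the start of this section in symbolic form, the standard textbook decomposition is given below. It is a simplified statement that assumes genetic and environmental effects are additive and independent (no gene-environment covariance or interaction); the symbols V_P, V_G, and V_E are the usual notation for phenotypic, genetic, and environmental variance and are not taken from this chapter.

```latex
% Phenotypic variance split into genetic and environmental components
% (assumes no gene-environment covariance or interaction)
V_P = V_G + V_E
% Broad-sense heritability: the fraction of phenotypic variance
% attributable to genetic variance
H^2 = \frac{V_G}{V_P} = \frac{V_G}{V_G + V_E}
```

Heritability in this sense is a population-level ratio, which is why the text contrasts it with inheritance, the process by which characteristics are passed between generations.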

4.3.1 Rationales for Searching for New Types of Inheritance

The realizations that (1) the karyotype represents system inheritance and (2) karyotype changes are mainly responsible for macrocellular cancer evolution immediately conjured some even more thought-provoking questions: What is the genomic mechanism and biological significance when daughter cells inherit different genomes (cellular genomic systems) from mother cells? Can such a mechanism be used to explain the basis of karyotype heterogeneity commonly observed in cancer? What is its biological significance for understanding inheritance in general if karyotype heterogeneity can be detected in other noncancer systems and/or if similar types of inheritance can also be detected at the gene and epigene level? Moreover, because it is challenging to explain karyotype-mediated punctuated macroevolution using the neo-Darwinian evolution theory (which fits well with gene-mediated stepwise microevolution), would a new form of inheritance be more relevant to understanding organismal evolution? In other words, can “fuzzy” inheritance, as discovered in cellular evolution, be essential to explain the main mechanism of speciation? Thinking through these questions has certainly prompted us to redefine the traditional principles of inheritance in context, especially

4.3 FUZZY INHERITANCE

203

when considering the many differences between gene-centric and genome-defined system points of view, among the organismal and cellular levels, and between macro- and microevolution. Our current understanding of how inherited traits are passed between different generations originated from Mendel’s principles of inheritance, which were illustrated by his experiments in pea plants. Mendel’s law of genetics can be briefly summarized as the following principles (when updated with current knowledge and terminologies): (1) inheritance involves the passing of discrete units of inheritance from parents to offspring (fundamental theory of heredity). Such units of inheritance are now called genes; (2) during reproduction, the inherited factors (now named alleles) that control traits are separated into gametes and randomly reunite during fertilization (principle of segregation); (3) different units (now called genes located on different chromosomes) will be inherited independently of each other which can explain more complicated traits involving more units of inheritance (principle of independent assortment). Many significant modifications have been integrated into modern genetic principles since Mendel’s time. For example, it is not only known that many traits are controlled by multiple genes, but that a majority of traits are polygenic; that environmental interaction plays an important role in a majority of traits; and that there is a rather complicated relationship between dominant and recessive traits because of different levels of expressivity or penetrance. Cases like incomplete dominance (where one allele is not completely dominant over another), codominance (where the phenotypes generated by both alleles are co-expressed), and multiple alleles (like the human gene for ABO blood type) are well known, demonstrating why Mendel’s law of dominance needs to be modified. 
However, despite the fact that increased genomic “surprises” have seriously challenged some key aspects of gene-based inheritance theory (Chapters 1 and 2), the traditional concepts of inheritance are still very popular, reflected by the following status quo beliefs: genes function as independent units of inheritance; there is a linear relationship between key genes and disease phenotypes; evolutionary dynamics should be studied at the gene level, which is the basis for mathematical analyses; during the process of passing genes between generations, there is a limited degree of genomic modifications because of germline mutations and new genetic recombination from meiosis through sexual reproduction; genomic profiles for a given individual are very stable and the phenotype is conceived of as a largely static entity, at least within the individual’s lifetime; phenotypes are either caused by a single gene or polygenic factors; and by using more samples and better computer programs, we will ultimately find the magic combination of key genes for most diseases.
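
To make the classical baseline concrete, the principles of segregation and independent assortment summarized above can be sketched as a short enumeration. This is a minimal illustration, not part of the original text: the allele symbols (Y/y, R/r), the function names, and the assumption of complete dominance with equally likely gametes are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Return every equally likely gamete for a genotype.

    A genotype is a list of allele pairs, one pair per gene.
    Segregation: each gamete receives exactly one allele of each gene.
    Independent assortment: the allele drawn for one gene does not
    influence the allele drawn for another (genes assumed unlinked).
    """
    return list(product(*genotype))


def phenotype(offspring):
    """Phenotype per gene under complete dominance (an assumption):
    an uppercase allele masks a lowercase one."""
    return tuple(
        "dominant" if any(allele.isupper() for allele in pair) else "recessive"
        for pair in offspring
    )


def cross(parent1, parent2):
    """Count offspring phenotypes over all equally likely gamete pairings."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        offspring = tuple(zip(g1, g2))  # one allele per gene from each parent
        counts[phenotype(offspring)] += 1
    return counts


# Hypothetical dihybrid parents: seed color (Y/y) and seed shape (R/r).
dihybrid = [("Y", "y"), ("R", "r")]
ratios = cross(dihybrid, dihybrid)
for combo, n in sorted(ratios.items()):
    print(combo, n)
```

The enumeration reproduces the textbook ratios only because every simplifying assumption holds: discrete units, complete dominance, no linkage, and no environmental input. The status quo beliefs listed above extend exactly these assumptions to situations where they may not apply.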


The gene-centric principles of inheritance have been logically applied to somatic cell genetics. Cell division is considered a precise mechanism, by which a daughter cell inherits an identical genomic profile from its mother cell, including its karyotype and gene mutations. Meanwhile, aberrations occur at low frequencies and require prolonged time windows to accumulate. As there is an undeniable reproductive relationship between mother and daughter cells, the passing of inheritance seems obvious. However, explaining how inheritance works when inherited karyotypes between mother and daughter cells are drastically different (as observed from the punctuated phase of cancer evolution) is challenging. Because no system inheritance (karyotype plus gene topology) is passed, what is the inheritance mechanism? Evidently, both the degree of individual genome change and the frequency of altered cells within the population are too high to be explained by the traditional concept of inheritance. True, it is acknowledged that there is a certain degree of variation passed during inheritance among individuals, but the force of “maintaining the same” is dominant (mainly at the species level) when compared with the “force of change” (mainly at individual level within species) (e.g., the low mutation rate and limited chromosomal aberrations). In addition, according to current evolutionary theory, it is assumed that only small variations are selected for increased fitness while many large genomic alterations are automatically eliminated, which is part of the rationale of ignoring chromosomal level of changes in current cancer research. Here is an interesting question: do the principles of inheritance differ between generations of an organism (such as Mendel’s peas) and generations of somatic cells (such as cancer cell populations)? If so, what type of concepts and methods should be used to study somatic cell inheritance? 
As discussed in a previous section, traditional genetics has focused on gene-defined parts inheritance. One unrealized fact is that in the many model systems used to study genetics, from peas to fruit flies to mice, genomes are faithfully inherited between generations, as achieved by the function of sexual reproduction (Heng, 2007b, 2009) (for more details see Chapter 5). As a result, the phenotypic impacts caused by genome-level alterations are invisible and ignored when studying gene-related genotype–phenotype relationships. In contrast, when studying somatic cells, genome-level alterations often represent a major player compared with individual genes (a good example is cancer evolution); thus, it is hard to completely ignore them. More detailed discussions can be found in Section 5.7.4 (Why has system inheritance flown under the radar for so long?) of Debating Cancer (Heng, 2015). In addition, organismal and cellular inheritance differ in other important features: for the many successful classical cases that support Mendelian inheritance, gene-related traits often exhibit narrow expressivity and/or high penetrance (within the population). In other words, many of these exceptional cases involve rather simple types of genotype–phenotype–environment interaction. In contrast, many traits associated with cancer are indicative of the behavior of adaptive systems during different phases of cancer evolution. These traits are complicated by nonlinear variants, which make it hard to establish a causative relationship between genotype and phenotype. Furthermore, the phenotypes are often not “black and white” but associated with a large number of “continuous variants.” Examples include the speed of cellular proliferation, drug sensitivity, and the degree of heterogeneity. All of these features are very fuzzy when compared with classic examples such as the “on”/“off” status of expression for a single gene or the “green”/“yellow” color of peas. Note that these “yes”/“no” features described in textbooks are in fact much fuzzier in reality. Such a realization is part of the rationale for rethinking the concept of inheritance (see Chapter 3 and Section 4.3.4).

Moreover, cellular populations under well-controlled experimental conditions can simplify the relationship between genotype and phenotype. These studies could be useful when illustrating the link between specific genetic profiles and a given phenotype. Surprisingly, however, such studies can also question the power of inheritance. For example, observed cell-to-cell differences have been explained as a “lack of persistent genetic control.” Accordingly, epigenetic control was suggested as the new mechanism to explain local influence on phenotypes (Locke, 1990). Equally important, the potentially large number of cells, as well as the much shorter time cycle required for producing new generations, also increases the odds of success for outliers, especially under highly stressful conditions. A similar phenomenon would be much more difficult to observe in organismal evolution (see Chapter 6).
In conclusion, as there are many significant differences between organismal and cellular populations, it is challenging to understand the genomic behavior of somatic cells using the traditional principles of inheritance. It is therefore necessary to search for a new type of inheritance which can not only solve the puzzles but unify both systems. It is also no longer acceptable to continually ignore chromosomal-level alterations which often dominate the genomic landscape of somatic cells.

4.3.2 A New Inheritance Needs to Explain Heterogeneity: A Key Genomic Feature of Cellular Population

The solution is to acknowledge that the inheritance of cellular populations is both real and different from organismal populations, and a new concept of inheritance is thus needed to explain the special characteristics of inheritance in cellular populations.


As discussed in Chapter 3, the frequencies of nonclonal chromosome aberrations (NCCAs) in a given cell population are directly linked to the population’s genome stability or instability (both internal and induced) (Heng et al., 2006a-c, 2016a; Heng, 2015). There are four typical representative situations: (1) population with normal karyotypes and very low frequencies of NCCAs (e.g., very early stage of cultured normal human fibroblast cells or short-term cultured lymphocytes from normal individuals); (2) population with clonal abnormal karyotypes and low frequencies of NCCAs (e.g., earlier stage of immortalized cells or many established stable cell lines); (3) population with clonal abnormal karyotypes and high frequencies of NCCAs (e.g., most unstable cancer cell lines); and (4) population with massive NCCAs (e.g., cells within the punctuated phase of cancer evolution, such as cells following the induction of chaotic genomes). These four situations can be further combined with each other to form more complicated cellular populations. By comparing the pattern of evolution, the degree of heterogeneity, the dynamic relationship between clonal and nonclonal populations, and environmental stresses, the following characteristics of inheritance have been studied/revealed in the past two decades using various model systems:

1. Inheritance of a given unstable cellular population can pass the degree of genomic changes, but not specific changes

Most stable cell lines display a rather stable frequency of NCCAs within a certain time window. Such phenomena can be observed from human and mouse systems with and without specific gene mutations or environmental challenges that produce high levels of cellular stress (Heng et al., 2006b, 2006c, 2016a). For those cellular populations, a mother cell passes the same karyotype to its daughter cell, a process that occurs in most cells, and a small minority of cells generates NCCAs.
When cells with NCCAs divide, they die off, produce new NCCAs, or generate new clonal chromosome aberrations (CCAs) (when environments occasionally select for them). In addition, which cell will generate NCCAs seems to be a stochastic phenomenon. Interestingly, although these low frequencies of NCCAs persist, there is no visible impact on population structure unless a high level of stress is introduced (which might push a specific NCCA to become dominant). In unstable cell populations (most cancer cell populations), a much larger proportion of cells are involved in NCCA production. Although various CCAs are often detected, these CCAs are constantly changing. When the majority of mother cells go through cell division, each passes an altered rather than identical karyotype to its daughter cell. In other words, fixed genomes or specific NCCAs are not inherited in unstable cell populations, and the only heritable feature is the proportion of cells with newly generated NCCAs.
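
The contrast between stable and unstable populations described above can be sketched as a toy simulation. This is an illustrative model under loudly stated assumptions, not the experimental method used in the cited studies: population size is held constant, there is no selection or cell death, karyotypes are reduced to integer labels, an "NCCA" is approximated as any karyotype found in exactly one cell, and the `instability` parameter (the per-division chance of producing a novel karyotype) is hypothetical.

```python
import random
from collections import Counter
from itertools import count


def next_generation(population, instability, new_ids, rng):
    """One round of division at constant population size (toy assumption).

    Each daughter copies the karyotype of a randomly chosen mother, except
    that with probability `instability` it receives a brand-new karyotype.
    """
    daughters = []
    for _ in population:
        mother = rng.choice(population)
        changed = rng.random() < instability
        daughters.append(next(new_ids) if changed else mother)
    return daughters


def ncca_frequency(population):
    """Fraction of cells whose karyotype occurs only once in the population."""
    counts = Counter(population)
    return sum(1 for k in population if counts[k] == 1) / len(population)


def simulate(instability, generations=20, size=500, seed=0):
    """Return one (NCCA frequency, carried-over fraction) pair per generation.

    The carried-over fraction is the share of cells whose karyotype was
    already present in the previous generation.
    """
    rng = random.Random(seed)
    new_ids = count(1)          # label 0 is the starting clonal karyotype
    population = [0] * size
    history = []
    for _ in range(generations):
        previous = set(population)
        population = next_generation(population, instability, new_ids, rng)
        carried = sum(1 for k in population if k in previous) / size
        history.append((ncca_frequency(population), carried))
    return history


stable = simulate(instability=0.02)
unstable = simulate(instability=0.8)
print("stable   (NCCA freq, carried over):", stable[-1])
print("unstable (NCCA freq, carried over):", unstable[-1])
```

In this sketch the unstable population keeps a similar NCCA frequency from one generation to the next even though few specific karyotypes are carried over, which mirrors the statement that the degree of change, rather than any specific change, is the heritable feature. Because the model omits selection, it says nothing about which NCCAs would survive or become clonal.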


Based on all cell lines analyzed, stable cells pass dominant karyotypes (including CCAs) plus a low frequency of NCCAs (indicating that a given system inheritance dominates while still maintaining a low range of altered system inheritance), whereas unstable cells pass high frequencies of NCCAs without dominant karyotypes (indicating that inheritance is at the high range of potential system inheritance). Clearly, stable and unstable cells display different types of inheritance. Moreover, for unstable cells, inheritance is about the fixed range of variants rather than a specific type of variant. Ideally, we would like to trace each cell division to make sure that most cells can generate NCCAs in a highly unstable population, but this is technically challenging because the spectral karyotyping (SKY) protocol requires cell harvesting, which kills the cells under examination. Nevertheless, when analyzing the genomic profiles of cell populations, there is no clonal karyotype but an overwhelming difference of karyotypes between close generations, which indicates that a lack of system inheritance is the most likely, or perhaps the only reasonable, explanation. Such an explanation is strongly supported by karyotype profiles of cell populations immediately following the induction of genome chaos, where the majority of cells display different karyotypes.

2. System inheritance is unstable for many cell lines and can be drastically altered during crises

Somatic cell division is considered asexual reproduction, which differs from sexual reproduction (see Chapter 5). Because of the highly dynamic nature of the genomic landscape in somatic cell populations (required by cellular adaptation and the result of lacking a sexual filter to maintain genome integrity) (Horne et al., 2013a; Ye et al., 2018b), genome profiles in somatic cells are constantly changing.
As a result, a given system inheritance is often short-lived and frequently replaced by other system inheritance (reflected as the NCCA/CCA dynamics). This is very different from system inheritance among generations of the same species with sexual reproduction, where the system inheritance remains the same/intact as long as the species survives. The limitations of cellular inheritance (especially for various in vitro systems) are as follows: there is no long-term consistency if cellular dynamics are high; system inheritance is more sensitive to experimental manipulations and environmental factors; the multiple levels of genomic interaction (especially the interaction between genome alteration and gene alteration) reduce the power of individual genes; evolutionary selection can promote outliers under crises, which reduces the ability to predict a specific genomic change or specific cellular inheritance based on the average profiles; and finally, it may be more challenging to replicate experimental results when the cell population is highly dynamic.


The short-lived system inheritance is particularly visible during the punctuated phase of cancer evolution. As illustrated by passages 7 to 54 in Fig. 3.4, the system inheritance for each stage was replaced, as reflected by different karyotypes. In contrast, passages 54 to 109 are within the stepwise evolutionary phase, and the system inheritance is thus maintained. Similar short-lived inheritance can also be observed from induced genome chaos. Although the molecular details regarding how genome reorganization occurs need further research, it is clear that new genome emergence in crisis situations is the driving force for the “phase switch” that drastically alters inheritance. This also underscores the importance of how harsh environments change system inheritance.

3. A single cell can pass heterogeneity to an entire population

The inherited degree of karyotype changes in a population is, in fact, its degree of genome-level heterogeneity. To illustrate that heterogeneity can be passed from generation to generation, it is necessary to demonstrate that the heterogeneity of a population can be passed by one randomly chosen cell. A panel of single cells was isolated from different cell populations representing different degrees of genome instability. By comparing the karyotype profiles of a parental cell population and a panel of single cell–derived subpopulations, we confirmed that for populations with stable genomes, new cell populations inherited identical karyotypes from the single isolated cell. In contrast, a single cell picked from an unstable population could not pass down the same karyotype. Rather, it generated a cell population with altered genomes. Furthermore, the level of heterogeneity in the daughter cell population mirrored that of the parent cell population. Thus, a single cell can restore the same degree of heterogeneity found in the parent population.
In other words, a single cell can pass a similar degree of heterogeneity to its daughter cell population (Heng, 2015; Abdallah et al., unpublished observations). Interestingly, under special circumstances, the behavior of some single cells is drastically different from others. These single cells will no longer produce a population like the parental population, but a new population with a clonal genome. These cells are responsible for “phase transitions” or macrocellular evolution. Examples include all major transitions in cancer evolution, all of which represent the success of emergence for outliers under cellular crises coupled with high evolutionary selection pressure.

4. Karyotype heterogeneity is associated with other cellular heterogeneities

At the cellular level, there are many heterogeneous features besides the karyotype, including cellular growth, differentiation, drug resistance, sensitivity to various stimuli, morphological transition, immortalization, migration, cloning formation, capability to generate tumors in animals, and death. Based on the immortalization model (Fig. 3.4) and genome chaos model (Fig. 3.5), there are nice correlations between stages of cancer evolution, patterns of karyotype change, and specific cellular features, such as immortalization and drug resistance, all of which can be achieved by the emergence of new genome systems. A growing body of data suggests that karyotype heterogeneity likely causes or promotes other types of heterogeneity, even though it is challenging to pinpoint causality within a complex and adaptive system. Nevertheless, a comparative analysis was performed using a series of single cells isolated from stable and unstable cell populations. Both the genome and growth profiles were compared for the cell populations, each of which was derived from a single cell. Such comparisons can determine whether or not there is a close link between karyotype heterogeneity and growth heterogeneity. To test if the growth profile (whether the population grows quickly or slowly) can be inherited, the proliferation of single cell–generated populations was scored. Our studies showed that when the genome is unstable, the traits of fast or slow growth cannot be passed through a single cell. Instead, the single cells isolated from fast-growing colonies always produced cells with a spectrum of proliferation speeds, including cells with very fast and very slow growth. Similarly, single cells isolated from slow-growing colonies also produced fast-, average-, and slow-growing cells, even though the distribution often did not follow a normal bell curve. The high-level heterogeneity of growth in both fast- and slow-growing cells was similar to the growth heterogeneity of the parent population. Thus, the growth heterogeneity is indeed passed on. In contrast, cells with stable genomes produce more homogeneous growth patterns that are similar to the parent population.
After multiple generations, selection of stable fast- and stable slow-growing cells did result in the selection of a specific karyotype and growth rate (unpublished observations). Clearly, the genome profiles determine the growth patterns. Further studies also linked genome heterogeneity to overall tumorigenicity of the cells, as well as potential drug resistance, all of which represent key features of cellular heterogeneity outside the domain of karyotype heterogeneity (Heng, 2015). Recently, cell death heterogeneity and transcriptome heterogeneity were illustrated as heritable and linked to genome heterogeneity (Abdallah and Heng, unpublished observations). We anticipate that most cellular heterogeneities must have an inheritable basis.

5. Inheritance of heterogeneity: the mechanism of heterogeneity

By summarizing the above key features specific to cellular inheritance, it becomes clear that these features are essential to understanding the mechanism of how cellular populations pass on karyotype heterogeneity.


The inherited degree of genomic changes, rather than specific changes between cellular populations, represents genome heterogeneity; a feature passed by a single cell is also the degree of the heterogeneity that is shared by both parental and offspring populations. As for the time limitation for cellular inheritance, it is known that the degree of heterogeneity can be altered during cancer evolution, in particular during phase transitions. Equally important, high levels of environmental stress can play an important role in changing the inheritance of heterogeneity and selecting outliers, which also explains many puzzling relationships between genotype and phenotype during macrocellular evolution. Finally, the close relationship between genome heterogeneity and other cellular heterogeneity further explains the multiple related factors that can contribute to cellular heterogeneity, which highlights the importance of environmental interactions, and why the genome has the ultimate impact on phenotypes. As we discussed in Chapters 1 and 2, the issue of genomic heterogeneity was ignored before recent large-scale genomic studies on various human diseases because of the incorrect assumption that heterogeneity is insignificant “noise.” We now know that (1) heterogeneity is the key feature of life, whereas any given state of homogeneity is temporary and conditional and should be considered a special case of heterogeneity, and (2) although most researchers prefer to deal with homogeneity (for essential similarity in their research), results achieved without integrating the reality of heterogeneity are often incorrect. Many biological concepts therefore need to be reexamined under the new framework.
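
The single-cell isolation experiments summarized in this section can also be sketched as a toy model. The code below is an illustration only and does not reproduce the authors' protocol: karyotypes are integer labels, every cell divides synchronously with no death or selection, the `instability` values are hypothetical, and the Shannon index is used here as one convenient (assumed) measure of karyotype heterogeneity.

```python
import math
import random
from collections import Counter
from itertools import count


def grow_clone(founder, instability, divisions, new_ids, rng):
    """Expand one founder cell by repeated synchronous doubling.

    Each daughter keeps its mother's karyotype or, with probability
    `instability`, acquires a novel one (toy assumption).
    """
    cells = [founder]
    for _ in range(divisions):
        daughters = []
        for mother in cells:
            for _twin in range(2):
                changed = rng.random() < instability
                daughters.append(next(new_ids) if changed else mother)
        cells = daughters
    return cells


def shannon_diversity(cells):
    """Shannon index of karyotype frequencies; 0 means a single clone."""
    total = len(cells)
    return sum((n / total) * math.log(total / n) for n in Counter(cells).values())


def single_cell_experiment(instability, divisions=9, seed=1):
    """Grow a parental population, isolate one random cell, regrow from it.

    Returns (parent diversity, subclone diversity, fraction of the
    subclone that still carries the isolated cell's karyotype).
    """
    rng = random.Random(seed)
    new_ids = count(1)                      # label 0 is the original karyotype
    parent = grow_clone(0, instability, divisions, new_ids, rng)
    founder = rng.choice(parent)            # the isolated single cell
    subclone = grow_clone(founder, instability, divisions, new_ids, rng)
    kept = subclone.count(founder) / len(subclone)
    return shannon_diversity(parent), shannon_diversity(subclone), kept


stable = single_cell_experiment(instability=0.01)
unstable = single_cell_experiment(instability=0.5)
print("stable   (parent H, subclone H, founder kept):", stable)
print("unstable (parent H, subclone H, founder kept):", unstable)
```

Under these assumptions, a cell isolated from the unstable population does not pass on its own karyotype, yet the population it founds reaches a diversity close to that of the parental population. That is the sense in which the degree of heterogeneity, rather than any specific genome, is inherited. The sketch does not capture the outlier cells that found a new clonal population during phase transitions.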

4.3.3 The Inheritance of Heterogeneity in Organismal Systems in Both Physiological and Pathological Conditions

As heterogeneity is the enduring principle of living organisms, it must be related to many biological aspects beyond cellular populations. Our immediate interests were investigating (1) how commonly karyotype heterogeneity can be observed in noncancer systems and (2) whether this type of less precise inheritance can be observed at the genetic and epigenetic levels (e.g., in gene mutations, CNVs, and mitochondrion-mediated inheritance). In other words, how universal is this type of inheritance?

1. Inheritance of heterogeneity can be universally observed in all types of organisms

It turns out that all types of inheritance examined display inheritable heterogeneity.


For example, karyotype heterogeneity can be observed from different normal and abnormal tissues during developmental stages and under various physiological conditions. The high level of karyotype heterogeneity has also been validated by different sequencing projects. One remarkable sign of progress is the acceptance that somatic mosaicism is a common feature in humans; previously, it was unthinkable. Based on the high association between karyotype heterogeneity and other diseases, it has been proposed that elevated genome instability (which is revealed by elevated karyotype heterogeneity) might serve as a general mechanism for many common and complex diseases or illness conditions (Heng, 2015; Heng et al., 2006a-b, 2016a-b; Ye et al., 2018b; also see Chapter 8). In addition, a low degree of karyotype heterogeneity has been confirmed in stable populations of budding yeast. As different aneuploid karyotypes can exhibit different degrees of CIN and the degree of inherited heterogeneity is much higher than the mitotic error rate (Zhu et al., 2012), this study represents an example of inherited karyotype heterogeneity in yeast. Moreover, recent large-scale genomics studies have discovered overwhelming somatic variations at the gene and epigene levels. For instance, the high level of de novo gene mutations has been documented by personal genome sequencing and especially by single cell sequencing (Conrad et al., 2011; Veltman and Brunner, 2012; Wang et al., 2014) (for more examples see Section 4.2.3). The high number of gene mutations has also been detected from normal tissues, and most individual cells do not share these gene mutations (Martincorena et al., 2015). The long tail distribution of the gene mutations is now observed in the yeast genome, which challenges previous viewpoints that such patterns were cancer-specific. 
Similarly, a high level of epigenetic stochasticity in methylation patterns has been demonstrated across multiple normal and diseased tissue types (Landan et al., 2012; Schultz et al., 2015). The highly complicated epigenetic landscape revealed from single cell analyses is mind-boggling. Even long before the genome era, the observation that cell populations can inherit a given degree of heterogeneity had been noticed in many experimental systems (including bacteria, stem cells, and various cancer cell lines), even though different terms/concepts have been used to describe such phenomena. Specifically, single and often small numbers of cells (including some isolated cells based on cell surface markers) can reconstitute the diverse phenotypic heterogeneity of the parental cell population (e.g., stem cell markers, drug resistance, morphological features). For example, although antibiotic treatment can kill more than 99.99% of a bacterial population, the surviving bacteria (approximately one per million) can regrow a population with the same phenotypic features as the parental population, a phenomenon known as “bacterial persistence” (Bigger, 1944). The re-establishment of similar population patterns can be observed from isolated stem cells that generate mixed stem cell and non–stem cell populations, epithelial/mesenchymal transition, as well as drug resistance in cancer cells (Heng, 2015; Jolly et al., 2015). Recently, there has been increased interest in studying this phenomenon, but the mechanism behind it remains unclear despite many different hypotheses (Balaban et al., 2004; Golding et al., 2005; Norman et al., 2015; Jolly et al., 2018). In the case of cancer cell drug resistance, explanations include (1) stochastic cell-to-cell variability, (2) switching between dormant and active states, (3) treatment-induced cellular reprogramming, (4) cell-to-cell variability generated by protein conformation dynamics, and (5) proliferation of successful outliers as promoted by genome chaos. We anticipate that many more specific mechanisms to explain how cell populations pass heterogeneity can be found in published literature.

Of course, the above examples of phenotypic heterogeneity only represent inherited features based on cellular or bacterial populations. Can a similar phenotypic heterogeneity be observed in more advanced organisms? The answer is a resounding yes. In fact, there are often multiple layers of heterogeneity involved for more complicated systems. Perhaps some of the most striking examples involve different individuals of the same species who inherit the same genome but display drastically different phenotypes. In ants, for example, although different castes have the same genome, each caste has remarkably different morphology, physiology, and behavior. As each female embryo has the potential to become a queen, major, or minor worker, environmental stimuli seem to be the key factor that sets the specific developmental trajectory toward its ultimate caste classification. These environmental factors include the chemical components and amount of larval nutrition, pheromones, and temperature (Hölldobler et al., 2009).
Recent studies have identified DNA methylation as a crucial factor, supporting the importance of epigenetic regulation in determining the caste of a developing individual (Chittka et al., 2012). Interestingly, it was also found that queen pheromones can impact the DNA methyltransferase activity of their daughters and keep them as sterile and industrious workers (Holman et al., 2016). In addition to ants, similar phenomena can be observed within the ranks of bee castes. The influence of environmental factors on morphology can also be found in reptiles and fish, where incubation temperatures change the sex ratios of offspring (Pieau et al., 1999). In addition to explaining the distinct kinked-tail phenotype in mice (methylation at a retrotransposon site within the Axin-fused allele) (Rakyan et al., 2003), increased studies have focused on heritable epigenetic modifications (Jablonka and Raz, 2009; Jablonka, 2012, 2013). Recently, chromatin states have been linked to epigenetic landscapes to explain epigenetic inheritance (Jung et al., 2017).


It is well known that plant species respond to various environmental conditions and display phenotypic plasticity. Classic examples include sun versus shade leaves, heterophylly, environmental control of cleistogamy, and the response to herbivory, mowing, and competition (Schlichting and Levin, 1986). For example, heterophylly is widespread among land plants (Nakayama et al., 2017). Different mechanisms, including phytohormones and specific gene-mediated regulation, have been used to explain heterophylly (Nakayama et al., 2014). An increasing number of reports explain phenotypic plasticity using the function of TEs and epigenetic impacts. For example, the phenotypic heterogeneity observed in the toadflax flower (Linaria vulgaris), which was first described by Linnaeus and puzzled him, has been explained as caused by the heritable methylation status of a single gene (Lcyc) (Theissen et al., 2000). Another example is the mechanistic study of phenotypic heterogeneity in the alligator weed (Alternanthera philoxeroides), an invasive weed that can colonize both aquatic and terrestrial habitats. Using a genome-wide methylation profiling method, epigenetic variation and its correlation with phenotypic variation were examined. This analysis illustrated the correlation between epigenetic reprogramming and the reversible phenotypic response of plants to specific environmental variants (Gao et al., 2010). Nevertheless, all the above variants, no matter which genomic and nongenomic mechanisms are involved, lie within certain species-specific ranges, which are defined by species-specific fuzzy inheritance within a dynamic environment. After studying the geographic ranges of Cretaceous mollusks, it was proposed that there is heritability at the species level (Jablonski, 1987; Zeliadt, 2013).
the species varied in their geographic range size, that their survival or duration in the geologic record varied with geographic range, and that this variation in geographic range was “heritable”: that is, closely related species were more similar in range size than expected by chance.

Clearly, the fact that every species has a geographic range can be explained by the fuzzy inheritance range selected by environmental dynamics. 2. Abnormal phenotype and the hidden inheritance of heterogeneity from normal genomes Molecular genetics has effectively established causative relationships between abnormal genotypes (such as mutations) and abnormal phenotypes (such as diseases). The successful identification of defective genes in many Mendelian diseases is thought of as a convincing example. It is also well known that genetic manipulation (inactivation or overexpression of some essential genes) can drastically alter developmental patterns. As for


4. CHROMOSOMAL CODING AND FUZZY INHERITANCE

the overwhelming phenotype heterogeneity from a defined genotype, some assumed that there are genetic modifiers of these “phenotype-specific genes” and then additional layers of “modifiers of modifiers” that are responsible. Others consider environmental impact as a main factor. It is well accepted that normal development can be altered by changing environmental factors (from classical transplantation experiments to current approaches to epigenetic change and bioelectric influence). But explanations often go case by case, and there is no general mechanistic understanding. It is thus challenging to know why a particular gene has more deterministic power than others. Moreover, most convincing experimental observations are based on single-factor analyses. For example, when a genetic factor is fixed, it is tested against environmental factors, or vice versa. It is much more complicated when multiple factors are involved, which is what occurs in real life. It is thus hard to quantitatively compare how much each of the different elements contributes to phenotypic heterogeneity (among genetic, epigenetic, cellular/individual/social environments, and chance). That is the reason why a large number of specific mechanisms can be linked to the same phenotypic heterogeneity. This opens a door for different investigators to emphasize different mechanisms using the same phenotypic data. Recently, bioelectric signals (along with other physiological inputs) have been illustrated as a novel epigenetic layer that regulates the pattern formation of developmental processes, a good example of phenotypic heterogeneity (Sullivan et al., 2016). Interestingly, the manipulation of bioelectric circuits can override default genome patterning outcomes, resulting in head shapes resembling those in other species of planaria and Xenopus. Because the animals used carry wild-type genomic sequences, these experiments suggest that bioelectric signals can alter the patterning of the wild-type phenotype.
Furthermore, under some specific conditions, such as spending 5 weeks aboard the International Space Station, one of the amputated fragments (1 animal out of 15) regenerated into a double-headed worm. More surprisingly, amputating this double-headed worm in plain water (in a Tufts University lab) again resulted in the same double-headed phenotype. In addition, when tested 20 months after the return to Earth, the space-exposed worms still showed significant quantitative differences in behavior and microbiome composition compared with the control group (Morokuma et al., 2017). These observations strongly suggest that some of the features gained during space travel are heritable even under conditions on Earth. At first glance, these cases might be explained as evidence that the genome is less important than bioelectric circuits in these species for patterning (at least under special situations during the regeneration process). However, further analyses can lead to opposite explanations, which support the importance of genome-defined phenotypic heterogeneity.

4.3 FUZZY INHERITANCE


The main reasoning and conclusions are as follows: 1) The planaria’s specific features are determined by their genomes in the first place (e.g., it exhibits superior capability for regeneration, as evidenced by the fact that over 20% of the cells in an adult planaria are stem cells). That is why a tiny piece of a planaria (even as little as 1/279th of the original organism from which it is cut) can regenerate into a full organism in only a few weeks (Handberg-Thorsager et al., 2008). This regeneration capability is highly exceptional; it would be challenging to perform similar experiments on many other animal species such as mammals. In fact, examples of heteromorphosis can be found among many organisms, from protozoans to chordates, but it is easier to find in lower forms of animals. 2) Flatworms (Dugesia japonica) are an asexual species. According to the new realization that, in contrast to traditional viewpoints, asexual reproduction often displays high levels of genome alterations (Heng, 2007b, Chapter 5), it might not be correct to claim that these animals all exhibit identical genomes. It is therefore premature to conclude that phenotypic heterogeneity is not caused by genomic alterations, but rather just an epigenetic response. In other words, considering a population of asexual species as isogenetic is not correct. It would be highly interesting to compare genome-level diversity for these creatures. 3) There are multiple genetic/epigenetic/environmental factors that can contribute to heteromorphosis. However, it is useful to distinguish different systems examined. For animals whose germline and somatic cells are separated with a high degree of cellular differentiation, it would be difficult to reproduce similar experiments done with planaria. 
In addition to space travel, two-headed planaria regeneration can be induced by treating amputated fragments with other agents that alter levels of calcium, cyclic AMP, and protein kinase C activity in cells (Chan et al., 2014) and by interfering with the canonical Wnt/β-catenin signaling pathway (Gurley et al., 2008). We anticipate that genome-level alterations would be involved here. 4) Regardless of the mechanism of phenotypic heterogeneity, the story of planaria supports the viewpoint that we should not consider most phenotypic heterogeneity as abnormal. As the most important element for evolution is the generation of variants, we should change our negative attitude toward them (Heng et al., 2016b). It is possible that these double-headed worms would represent a new species (if their genome context is changed). It is rather unreasonable to name some species as normal and others as abnormal.


The above discussions suggest to us that the phenomenon of phenotypic heterogeneity generated from normal genomes following other treatments should be considered as the specific feature of the genome. In other words, the genome codes for a large array of potential phenotypes most of which will only be visible under certain environmental conditions. These variable phenotypes, including some so-called “abnormal phenotypes,” are often associated with gene mutations or known environmental toxicity.

4.3.4 The Definition of Fuzzy Inheritance and Its Key Differences Compared to Traditional Inheritance

The original purpose of introducing “fuzzy inheritance” was to distinguish this new type of inheritance from traditional inheritance. As briefly discussed in Section 3.3.1, the initial interest in studying the inheritance responsible for karyotype heterogeneity was triggered by NCCA studies. As soon as the frequency of NCCAs was linked to CIN-mediated cancer evolution and the mechanism of NCCAs was linked to the basis of karyotypic heterogeneity, we introduced the term “fuzzy inheritance” to describe the inheritance responsible for passing a degree of karyotypic heterogeneity, not fixed abnormal karyotypes, between cell populations (Abdallah et al., 2013; Horne et al., 2015a, 2015b; Heng, 2015). The idea that the inheritance of heterogeneity might represent a common feature for organismal systems (beyond somatic systems) serves as a key rationale to redefine bio-inheritance. We pursued this idea even though, like most genomic researchers, we believed that the correlative relationship among single gene-mediated phenotypes must be precise. As discussed in Chapter 3, this is far from the truth. Following reanalyses of some of Mendel’s data, we realized that high levels of phenotypic heterogeneity also existed in his classical experiments. He, and a majority of later scientists (us included), simply did not recognize them. While the concept of fuzzy inheritance is still in its infancy (there is not sufficient research on this subject, especially from other groups, and our main research papers on this topic are still stuck in the process of review following years of communications within the research community), some key principles and features have emerged.

1. Definition: In contrast to the gene theory, which states that a gene codes for a specific, fixed phenotype, the genome theory suggests that most genes code for a range of potential phenotypes.
From this “fuzzy” range of phenotypes, the respective environment can then allow the best-suited status to be “chosen” [Heng, 2015, 2017a]. For example, the gene for pea color codes for an entire potential spectrum of colors, from yellow to intense green (including blends of yellow and green, or green with yellow spots),


not just two fixed, distinctive colors (yellow or intense green). In cancer, the emergence of “genomic context” adds yet another layer of complexity and instability that pushes fuzzy inheritance’s dynamics to a maximal status (Ye et al., 2018b).

Further explanations: a. A fuzzy range of phenotypes: (1) The relationship between genotype and phenotype is nonlinear. The range of potential is inherited, but a specific phenotype within this range is not. The prediction of a specific phenotype within this range of total possible phenotypes is thus not precisely but fuzzily determined. Take height as an example: one can predict that offspring of a tall family likely will be tall, but exactly how tall is a fuzzy estimate. Additionally, under certain conditions, the offspring can be short. (2) Most phenotypes cannot be classified using binary categories (dominant vs. recessive), even for some traditional examples of single gene-defined phenotypes. Again, refer to the phenotypic features of “tall” and “short.” How tall is tall, really? In Mendel’s pea experiment, a stem length of 6–7 feet was considered tall, while a stem length of 0.75–1.5 feet was labeled short. What about 5.5 feet or 3.0 feet? The correct way of categorizing phenotypic heterogeneity is to accept the fuzziness of phenotypes rather than artificially categorizing them into two groups. (3) Most importantly, inheritance is about passing a genomic package that can code an array of potential states, including the tallest heights, the shortest heights, and everything in between. Each heritable package is individual-specific in terms of determining ranges of potential heights. (4) The range of the phenotype heterogeneity of each individual is rather limited compared with the entire range of phenotype heterogeneity of a species. For example, in humans, height ranges from approximately 0.4 to 2.8 m (based on the record heights of the shortest and tallest men, Chandra Bahadur Dangi, 0.546 m, and Robert Pershing Wadlow, 2.72 m, respectively). It is likely that under some conditions, this range can be extended even wider (e.g., improved medical care can keep those tallest and shortest individuals alive; under no gravity, people can grow even taller).
Within each family, the range of variation should be much smaller. In a sense, system inheritance has defined the boundary of the species’ range of potential heights. The genomic profile also defines each individual’s height range (Fig. 4.4).


FIGURE 4.4 A diagram illustrating the relationship between phenotype (height) of a species and individuals’ genotypes (colored lines). Each line represents the potential range of height for each individual. Members of the same family share the same color. The key message is that the genotype represents a range of potential height.

(5) Phenotypic heterogeneity includes all variants, including many “abnormal” variants, as a normal genotype can also code for diseased phenotypes under certain conditions. In fact, many extreme heights are often associated with health issues. The opposite is also true: a typical Mendelian gene mutation will not lead to a disease phenotype under certain conditions. b. Environment factor: 1) The environment (both the genomic environment of specific genes and the living environment of the individual organism) plays a selective role for specific phenotypes within the potential range coded by the genotype. One can imagine that different environments will favor specific phenotypes among a range of potential ones (by shifting the peak of the genotype heterogeneity). Alternatively, multiple environmental factors can function as different agents which impact the emergent phenotypes. 2) In a similar and stable environment, parents and offspring will have a similar distribution pattern of phenotype heterogeneity, and the phenotypic similarity between parents and offspring will be high. Occasionally, tall (or short) parents can generate short (or tall) offspring, respectively. In a very unstable environment, the probability for the emergence of an outlier phenotype will increase. In addition, even under identical environmental conditions, these variants exist because of the stochasticity of the biological system (Fig. 4.5).


FIGURE 4.5 A diagram showing the relationship between genotype and phenotype between generations. The length of the arrow lines represents a range of potential phenotypes (e.g., from short to tall). The red boxes represent the real phenotype. Upper and lower panels represent two generations. Although the same degree of fuzzy inheritance is passed (the same length of the arrow lines), the real phenotype is different because of interaction with environments. Here, the environment functions as a phenotype selector (making selection based on the range defined by fuzzy inheritance).

3) For genotypes with low heterogeneity (and more penetration within the population), environmental impacts are less dominant, while environmental impacts can be more dominant for genotypes with high heterogeneity. 2. Some key differences between fuzzy inheritance and traditional inheritance 1) Conceptually, traditional inheritance puts more weight on the certainty of individual genes, which is best applied to some exceptional traits. The concept of fuzziness is based on complex and nonlinear genome/environmental interactions, which can define the range of heterogeneity but not with specific certainty; this fits well for the majority of traits. In the era of classical genetics, traditional inheritance was very useful. In the era of genomics, fuzzy inheritance should be more useful. 2) System inheritance determines the range of phenotypic heterogeneity of a species and the range of phenotypic heterogeneity of specific parental genetic influence. In contrast, traditional inheritance determines the inherited genes’ influence rather than heterogeneity. 3) The traditional viewpoint states that while the genotype codes with certainty, phenotype plasticity is because of external environmental influences. While fuzzy inheritance does appreciate the environments’ role of selection, it considers the genomic code itself fuzzy in the first place.


4) There are some key differences between somatic cell systems and organismal systems. Somatic cells can display both micro- and macroevolution involving both karyotype heterogeneity and gene/epigene heterogeneity. At the organismal level, the germline has by and large eliminated karyotype heterogeneity during the microevolutionary phase. However, as will be discussed in Chapter 6, during the speciation process, karyotype heterogeneity plays an essential role. Furthermore, despite the importance of the germline (to pass the genomic/epigenomic landscape to somatic cells), the majority of individual phenotypes are achieved by somatic cells. Therefore, understanding fuzzy inheritance within somatic cells is crucial to understanding various human diseases. Most of our research on fuzzy inheritance is based on somatic cell models such as cancer models. As many cancer models represent a system with the highest levels of genomic heterogeneity at multiple levels or, in other words, a system that displays the highest fuzziness of inheritance, it is more straightforward to illustrate how it works in a diagram (Fig. 4.6).

FIGURE 4.6 Two models representing precise (left panel) and fuzzy inheritance (right panel). Each differently colored circle represents a cell with a unique genome (and a specific karyotype), and purple-colored circles represent normal cells. Three generations are illustrated for each model (six cells in each panel). Under a precision inheritance model, normal cells produce normal cells (five purple), and the abnormal cell produces an identical abnormal offspring (one red). Because the aberration rate is very low, the parental population and offspring population display similar, precise karyotype profiles. In contrast, under the fuzzy inheritance model, the majority of cell karyotypes are continually changing, while the degree of heterogeneity is the same among all three generations. Because the degree of heterogeneity is inherited in an unstable cell population, each individual cell can pass on the same fuzzy inheritance by generating different cells with new karyotypes. In other words, a single cell isolated from an unstable cell population will not pass its own genome to its daughter cells, but a spectrum of heterogeneous genomes instead.
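The contrast drawn in Fig. 4.6 can be made concrete with a small simulation. The following is only a toy sketch under stated assumptions, not a model taken from the cited studies: karyotypes are reduced to integer labels, every cell divides into two daughters each generation, and a single hypothetical parameter, change_prob, gives the chance that a daughter receives a new karyotype instead of copying its parent's. The function names and the heterogeneity measure (fraction of distinct karyotypes) are likewise illustrative choices.

```python
import itertools
import random


def expand_clone(change_prob, generations, seed=0):
    """Grow a clone from one founder cell by repeated doubling.

    Karyotypes are integer labels (0 = the founder's karyotype). At each
    division, each daughter independently receives a brand-new label with
    probability change_prob; otherwise it copies its parent's label.
    change_prob is a hypothetical knob: near 0 mimics the precise model,
    higher values mimic the fuzzy model.
    """
    rng = random.Random(seed)
    fresh_labels = itertools.count(1)  # source of never-before-seen karyotypes
    cells = [0]
    for _ in range(generations):
        daughters = []
        for karyotype in cells:
            for _ in range(2):
                if rng.random() < change_prob:
                    daughters.append(next(fresh_labels))
                else:
                    daughters.append(karyotype)
        cells = daughters
    return cells


def heterogeneity(cells):
    """Fraction of distinct karyotypes in the population (from 1/N up to 1)."""
    return len(set(cells)) / len(cells)


if __name__ == "__main__":
    for label, prob in [("precise", 0.0), ("intermediate", 0.5), ("fuzzy", 1.0)]:
        population = expand_clone(prob, generations=6)
        print(label, len(population), round(heterogeneity(population), 3))
```

With change_prob set to 0, a single isolated cell yields a clone of identical genomes; with a high value, the same single cell yields a spectrum of karyotypes, which is the behavior the right panel of the figure describes. Real karyotype dynamics involve selection and cell death, which this sketch deliberately omits.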


Much more research is needed for understanding the pattern at the gene/epigene level. The fuzzy inheritance model can also apply to gene and epigene levels of inheritance. Note that although the precision model is generally accepted, it conflicts with the flood of research involving genomic data.
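The range-and-selector picture used throughout this section (Figs. 4.4 and 4.5) can also be written down as a minimal sketch. Everything here beyond the species height limits quoted earlier (approximately 0.4 to 2.8 m) is an assumption made for illustration: the HeightRange type, the family range of 1.5 to 2.0 m, and above all the linear mapping from a single environment score to a position inside the inherited range. The text stresses that real genotype-phenotype relationships are nonlinear and partly stochastic, so the linear rule only stands in for "the environment selects within the inherited range."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeightRange:
    """An inherited range of potential heights, in meters."""
    low: float
    high: float


# Species-level range quoted in the text (approximate).
SPECIES_RANGE = HeightRange(0.4, 2.8)


def contains(outer, inner):
    """True if the inner range lies entirely within the outer range."""
    return outer.low <= inner.low and inner.high <= outer.high


def express(inherited, environment):
    """Pick the realized height within an inherited range.

    environment is a hypothetical score in [0, 1]; 0 selects the low end
    of the range and 1 the high end. The linear rule is a simplification.
    """
    if not 0.0 <= environment <= 1.0:
        raise ValueError("environment must be between 0 and 1")
    return inherited.low + environment * (inherited.high - inherited.low)


if __name__ == "__main__":
    family = HeightRange(1.5, 2.0)  # hypothetical family-specific range
    print(contains(SPECIES_RANGE, family))
    # Same inherited range in two generations, different environments:
    print(express(family, 0.5), express(family, 1.0))
```

The point of the sketch is that what is passed between generations is the HeightRange object, not a single height; two generations holding the same range can still show different realized phenotypes when the environment score differs.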

4.3.5 The Mechanisms of Fuzzy Inheritance

It is rather straightforward to address the “why questions” about fuzzy inheritance. The multiple levels of fuzzy inheritance represent an effective genomic strategy for phenotypic plasticity, which is especially required when dealing with dynamic environmental changes within a short time frame. According to the current evolutionary theory, the accumulation of specific gene mutations in populations is thought of as the key genetic mechanism of adaptation for long-term, consistent environmental changes. For most short-term environmental changes (that are often drastic or alternate between opposite directions), different genetic mechanisms are needed to achieve quick adaptation, as there is no time for gene mutation-mediated genetic changes to catch up with the dynamics of these short-term environmental swings. A better strategy is to install a flexible genomic/epigenomic system that is responsible for phenotypic plasticity. This system would instantly be able to adapt to drastic environmental conditions (including extreme opposites) and all situations in between (average environmental conditions). Here, the advantage of multiple levels of fuzzy inheritance becomes apparent. Specific genes code for a range of potential phenotypes (fuzzy inheritance coded phenotype heterogeneity), rather than an unknown, potentially useful, or useless fixed phenotype, that more aptly fits any environmental dynamics that occur. As for the “how question” or the biological origin of fuzzy inheritance, it was initially thought that various bio-errors cause variants, such as glitches in a computer program, as no system is perfect. More specifically, errors arise during DNA replication/checkpoints/repair and other imperfect chromosomal machineries, for example.
While gene mutations and environmental stresses can contribute to increased genomic fuzziness, supporting the link between bio-errors and induced heterogeneity-mediated abnormal phenotypes, recent studies have shown that increased genomic heterogeneity can be detected from specific developmental stages (e.g., early development and tissue repair), and a fairly high baseline of genomic heterogeneity is commonly detected in most normal individuals, suggesting that this previously ignored genomic heterogeneity has a positive function. Further studies of cancer evolution (genome chaos in cancer genomes under high stress conditions)


(Liu et al., 2014), as well as the biological meanings of RNA polymerase switching in bacteria under stress (Ponder et al., 2005), illustrated that these “errors,” under these conditions, are no longer errors but new strategies for system survival. In other words, increasing system variants is a mechanism to improve the odds of survival and evolution. These types of genomic variants often display much higher frequencies and more extreme morphologies compared with variants that are occasionally observed under normal situations and that result from the “error process.” Some examples of well-known heterogeneity are listed in Table 4.1. The following are some specific examples that illustrate the diverse molecular mechanisms from different genomic/epigenomic levels that contribute to fuzzy inheritance. These cases will improve our understanding of how heterogeneity impacts genotype-to-phenotype mapping, although this topic commonly focuses on single-nucleotide polymorphisms (SNPs) and CNVs.

a) Mechanisms of fuzzy inheritance at the karyotype level

Although the core karyotype is stable for a given species, there is a low degree of variants within populations including structural and numerical alterations (e.g., aneuploidy, chromosome translocations, inversions, deletions, insertions, amplifications, ring chromosomes). Most genome-level abnormalities, which are present in every cell of the body, occur in the egg cell or sperm. It was assumed that they occur as an accident and that each type of them has its own molecular mechanism.

Aneuploidy: Aneuploidy results from chromosome missegregation during meiosis, representing a major cause of infertility and inherited birth defects. It is estimated that 10%–30% of all human fertilized eggs and about one-third of miscarriages are aneuploid (Hassold and Hunt, 2001).
0.3% of newborns display aneuploidy, even though there are limited cases where aneuploidy is survivable in humans: Down syndrome with trisomy chromosome 21, Edwards syndrome with trisomy 18, Patau syndrome with trisomy 13, Klinefelter syndrome with an extra X, and Turner’s syndrome with the absence of one X. Interestingly, trisomy 21 mosaicism has been observed even among normal individuals (Hulten et al., 2013).

Reciprocal translocation: Reciprocal translocations, in contrast, can involve all chromosomes, although most of the carriers display seemingly normal phenotypes. Despite their high incidence (from 1 in 500 to 1 in 625 human newborns), only about 6% of carriers exhibit symptoms involving intellectual disabilities or congenital anomalies. It is important to note that carriers of balanced reciprocal translocations display increased risks of generating gametes with unbalanced chromosomal translocations, an event associated with miscarriages or abnormal offspring. Balanced parental translocations may also be implicated in the pathogenesis of in vitro fertilization (IVF) implantation failure (Stern et al., 1999). In addition, it is possible that some


TABLE 4.1 Multiple Levels of Genetic/Epigenetic/Environmental Heterogeneity.

Multiple Genetic Levels
  Gene/Nucleotide Level
    Nucleotide polymorphism
    Various types of repeats (e.g., microsatellite shifts)
    Spectrum of mutations (including conditional mutations)
    Heterozygosity (allelic)
    Splice variants
    Gene family members (paralogs)
    Combinational effects of multiple genes and mutations
  Genome/Chromosome and Subchromosome Level
    Copy number variation, microdeletion/inversion
    Loss of heterozygosity (LOH)
    Chromosomal translocation/inversion/duplication/rings/isochromosomes
    Defective mitotic figures
    Chromosome fragmentation
    Aneuploidy/Polyploidy
    Genome chaos
    Mitochondrial genome alterations
Multiple Epigenetic Levels
    Chromatin folding and attachment to the nuclear matrix
    Packaging of nucleosomes
    Position of histone variants
    Covalent modification of histone tails
    DNA methylation
    Noncoding RNAs
Change of System Status Independent of Epigenetic Alteration
Environmental Influence on the Multiple Levels of Homeostasis
    Tissue specificity
    Physiological condition alteration (aging, immune, hormone, and metabolic levels)
    Brain-system interaction
    Nutrition status
    Different types of exposure stress
    Variety on dosage, duration of the exposure
    Differential impact on individual cell/organs
    Certain levels of stochastic response

Modified with permission from Heng, H. H., Bremer, S. W., Stevens, J. B., et al. (2009). Genetic and epigenetic heterogeneity in cancer: A genome-centric perspective. J Cell Physiol, 220(3), 538–547. https://doi.org/10.1002/jcp.21799.


reciprocal translocations contribute to, and are eliminated by, spontaneous abortion. As for the mechanism of chromosomal translocation, DNA double-strand breaks (DSBs) are thought to be prerequisites.

Chromosomal inversion: The functional consequences of human chromosomal polymorphic inversions have received increased attention (Feuk et al., 2005; Thomas et al., 2008). Different mechanisms can generate chromosomal inversions: nonallelic homologous recombination between inverted repeats, DSB repair (e.g., nonhomologous end joining), and replication-based fork stalling and template switching (Escaramís et al., 2015). Although inversions have been linked to specific phenotype changes in various species, as well as evolutionary dynamics and advantages, detailed human studies are still lacking. A recent review highlighted the progress on this subject (Puig et al., 2015). Although inversions can lead to simple mutations causing disease, polymorphic inversions have now been linked to specific phenotypes and complex diseases. Furthermore, inversions can cause a predisposition for generating new chromosomal rearrangements, functioning as an important link to CIN. Stefansson linked inversion with gene expression and discussed the example of inversion on natural selection (Stefansson et al., 2005).

Karyotype heterogeneity in somatic systems: The past decade has witnessed a drastic shift in the attitude of molecular geneticists toward karyotype heterogeneity. Before the genome era, the relationship between germline and somatic cells of the same individual was considered rather simple. Because of high genetic fidelity, somatic cells were assumed to share identical genomes, despite the existing data that illustrated detectable chromosomal alterations from a majority of normal tissue types and cancers (Heng et al., 1988; Biesterfeld et al., 1994; Heng et al., 2004c, 2016a; Ye et al., 2018b).
The idea that germline cells and somatic cells carry the same genetic information (which ignored the karyotype heterogeneity in somatic cells) was one of the reasons why researchers used DNA from blood cells for the mechanistic study of diseases involving different tissue types. Unlike inherited karyotype alterations, these genome alterations are generated after conception, often resulting in mosaicism. The most striking fact is that these somatic karyotypic variations can often be detected from normal tissues, sometimes with high frequencies. For example, ~80% of human embryos possess aneuploid blastomeres (Vanneste et al., 2009). It was suggested that massive genome reorganization (genome chaos) might arise more often than previously thought in human gametogenesis and early embryogenesis (Pellestor et al., 2014). Aneuploidy in the liver is overwhelming. About 60% of hepatocytes in mice and 30%–90% of hepatocytes in humans display aneuploidy. Furthermore, close to 90% of mouse hepatocytes and nearly 50% of human hepatocytes are polyploid (Duncan et al., 2010; Duncan, 2013). Recent studies suggest that stochastic hepatic aneuploidy promotes the


function of liver regeneration and postinjury restoration, illustrating the adaptive function of karyotype heterogeneity. Even in brain tissue, about 30% of mouse neuroblasts are aneuploid (Rehen et al., 2001), and an elevated rate of aneuploidy is observed in neuronal and nonneuronal cells in human brains (Rehen et al., 2005; Iourov et al., 2008a, 2008b). Importantly, the aneuploid neurons seem functional (Kingsbury et al., 2005), even though some disease conditions can further increase the frequencies of aneuploidy (e.g., ataxia-telangiectasia brain displays 20%–50% aneuploidy compared with 10% in normal brain) (Iourov et al., 2009). The most striking contribution of karyotype heterogeneity to pathological processes is genome alteration-mediated macroevolution of cancer. Not only are 90% of solid tumors aneuploid (Weaver and Cleveland, 2006; Compton, 2011), but various other structural and numerical alterations are overwhelming, especially within the punctuated cancer evolutionary phase when the evolutionary selection is extremely high (Abdallah et al., 2013, 2014; Heng, 2015; Heng et al., 2004c, 2006a, 2013b, 2016; Ye et al., 2018a, b). For example, despite their different mechanisms of formation, defective mitotic figures (DMFs), sticky chromosomes, chromosome fragmentations (C-Frags), and chaotic genomes can all be linked to system stress response. In addition, these alterations all bring a high level of uncertainty to somatic evolution (for more details, see sections concerning NCCAs, genome chaos, and unclassified types of karyotype aberrations). Clearly, cancer has pushed karyotype heterogeneity to its maximum level to favor the emergence of new genomes. This point has been extensively illustrated within the current cancer genome sequence project (Heng, 2015, 2017a). It should be noted that there is a close relationship among different types of chromosomal aberrations, as well as other levels of genomic aberrations.
For example, under the same general mechanism of genome instability, chromosomal structural and numerical changes are closely linked. This is the reason why measuring the level of NCCAs is more useful than focusing on any specific type of chromosome change (see section on NCCAs) (Heng et al., 1988, 2013b). In addition, many gene mutations can lead to chromosomal changes, and CIN can promote gene mutations and epigenetic alterations (Shen et al., 2005; Heng et al., 2006a; Ye et al., 2018b). It is known that patients with chromosomal instability syndromes display elevated levels of chromosomal aberrations, including complex chromosome rearrangements. These genome-level aberrations are mainly caused by impaired DNA DSB responses during meiosis. The dysfunction of many genes, including ATM, BLM, WRN, ATR, MRE11, NBS1, RAD51, DNA methyltransferase gene, and others, has been linked to chromosomal instability syndromes (Xu et al., 1999; Poot and Haaf, 2015).


Recently, the distribution of oncogenic and tumor-suppressing loci in the human genome was examined. The conclusion was that the human genome may be susceptible to CIN hitchhiking. In other words, genomic clustering of fitness-affecting mutations favors the evolution of chromosomal instability (Raynes and Weinreith, 2018). This observation illustrates the close relationship between gene and genome in terms of mechanisms of promoting fuzzy inheritance. Interestingly, as pointed out by the authors, “it is surprising that the human genome may be organized in a way that promotes CIN evolution,” as CIN is clearly linked to cancer evolution. However, knowing the importance of CIN (both reflecting and generating fuzzy inheritance) and cancer as a trade-off of the human dynamic genome, this is not surprising at all. As insightfully pointed out by Ferlini and Fini in their book review for genetic heterogeneity and human diseases (Heng, 2013a), “The human genome: better be dynamic” (Ferlini and Fini, 2015). b) Mechanisms of fuzzy inheritance at CNV level CNVs represent an important type of genomic variation which has attracted a great deal of interest since its discovery (Sebat et al., 2004; Iafrate et al., 2004; Bruder et al., 2008). Initial expectations were high; there were high hopes to use CNVs to fill the gap between gene mutations and disease phenotypes. Following a decade of extensive studies, a CNV map of the human genome has been established with high-quality data on healthy individuals of various ethnicities (Zarrei et al., 2015). One important message is that there is a high level of uncertainty in the majority of CNVs. In other words, the overall impact of CNVs is fuzzy (it is all about potential), and a specific CNV’s function is context-dependent. Interestingly, genes that are associated with disease are the least affected by CNVs. 
Furthermore, common CNVs cannot account for the vast majority of the missing heritability revealed by GWAS because, collectively, CNVs could explain less than 5% of previously reported GWAS hits (Conrad et al., 2010). Similarly, during the Cancer Genome Project, somatic copy number alterations (SCNAs, which are different from germline CNVs) have been profiled for a large number of patient samples. In one example, nearly 5000 samples were analyzed. Among 140 recurrent focal SCNAs, 70% could not be linked to any known cancer genes (Zack et al., 2013). Finally, the idea of cytogenetically (microscopically) visible harmless CNVs (CG-CNVs) has been discussed (Liehr, 2016). CG-CNVs can be present as heterochromatic or even as euchromatic variants in healthy individuals. These “harmless” variants need to be studied further, as they may be advantageous or disadvantageous in certain combinations or under certain environmental conditions. Nevertheless, all these different types of CNVs contribute to the fuzziness of the genomic package.

c) Mechanisms of fuzzy inheritance at gene level

There is a large array of molecular mechanisms responsible for fuzzy inheritance at the gene level. Contrary to the initial rationale of sequencing all genes to understand their defined functions, costly -omics approaches have only further reduced the certainty of gene functions. Any mechanism that can promote gene mutation, elevate gene regulation dynamics, or enlarge the scale of network interaction, to name a few, should serve as a specific mechanism for fuzzy inheritance at the gene level. One general link is that system dynamics induced by high stress, which favor pathway switching, result in gene-level variants (Stevens et al., 2011b; Horne et al., 2014; Heng, 2015; Heng et al., 2016a, b). Of course, well-known nucleotide instability, SNPs, and microsatellite instability can both reflect and contribute to gene-level fuzzy inheritance, as the result of defects in the nucleotide and base excision repair pathways (Pikor et al., 2013) and defects in the mismatch repair system (Vilar and Gruber, 2010).

Too many gene mutations in normal tissues: Using methods of direct measurement, the human somatic mutation rate has been shown to be almost two orders of magnitude higher than the germline mutation rate (Milholland et al., 2017). Based on the estimate that each human baby has six new deleterious mutations, each human somatic cell could have up to hundreds of deleterious mutations. Such estimations agree with the surprising observation that a high burden of mutations, higher than that of many tumors, can already be detected in the physiologically normal skin of four individuals (Martincorena et al., 2015). Furthermore, many of the mutations are typical driver mutations, which are already under strong positive selection in normal tissue. Together, these findings have raised some very bizarre but interesting questions: Why are there so many mutations in normal tissue (even more than in some cancer samples)? 
What is the positive contribution from these mutations for normal cellular function? The high rate of somatic mutations clearly will generate a high level of gene mutation heterogeneity, which likely will reduce germline-defined inheritance. In addition, gene mutation heterogeneity is high in germline cells. A recent single sperm sequencing study illustrated that there were 25–36 candidate point mutations in each sperm cell (Wang et al., 2012). The total gene mutation pool of an individual’s sperm is enormous. Moreover, knowing that different species have different mutation rates (Pennisi, 2018) and that different species have different genome contexts (karyotypes), some questions emerge: Should we consider the mutation rate, a genomic characteristic, as a key feature of the genome? Would the mutator phenotype proposed by Larry Loeb also be considered a feature of the genome (Loeb et al., 1974, 2008; Heng, 2015)?

Finally, based on the accepted importance of stochasticity in genomics, it is interesting to ask whether there is any link between fuzzy inheritance, stochastic gene expression, and stochastic epigenetic regulation (Elowitz et al., 2002; Feinberg, 2014).

Increased layer of complexity and uncertainty between gene and phenotype: With new advancements in genetics/genomics and systems biology, multiple layers of regulation systems between gene and phenotype have been discovered. Yes, an individual gene can code, but its potential coded message carries a certain degree of uncertainty, as many other factors can alter the resultant phenotype. For example, a gene’s fate can be influenced by genome topology-mediated transcriptional regulation, RNA splicing, noncoding RNA-mediated transcription/translation modification, protein modification and degradation, and protein cellular distribution. Furthermore, no gene is an island: there are a gene’s modifier genes, protein–protein interactions, functional module interactions, and whole network interactions, all within a dynamic environment. Finally, random chance is also a factor: how lucky is that one winning sperm that fertilizes the egg? Importantly, from DNA sequences to transcriptome, to proteome, to metabolome, and finally to phenotype, there are at least four information transitions. Although the correlation across each transition is good, the final information correlation between genes and phenotypes is drastically reduced because of the accumulated uncertainty from each transition. This reduced correlation can quickly worsen if high levels of environmental stress interfere with the normal developmental process or with the normal physiological conditions under which well-controlled gene regulation works. 
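The accumulation of uncertainty across the four information transitions can be illustrated with a small numerical sketch. It is only a toy model, not a measurement: it assumes that each layer depends solely on the preceding one and that the relationships are linear with Gaussian noise, in which case the end-to-end correlation is the product of the step-by-step correlations. The value 0.8 per transition and the function name `end_to_end_correlation` are hypothetical choices made for illustration.

```python
def end_to_end_correlation(step_correlations):
    """Correlation between the first and last layer of a chain.

    Assumption (not from the text): a linear Gaussian Markov chain, where
    the overall correlation is the product of the per-step correlations.
    """
    overall = 1.0
    for r in step_correlations:
        overall *= r
    return overall


# Four transitions: DNA -> transcriptome -> proteome -> metabolome -> phenotype.
# 0.8 is a hypothetical "good" correlation for each single transition.
normal_conditions = [0.8, 0.8, 0.8, 0.8]
# Hypothetical weaker coupling at every step under environmental stress.
stress_conditions = [0.6, 0.6, 0.6, 0.6]

print(round(end_to_end_correlation(normal_conditions), 3))  # 0.41
print(round(end_to_end_correlation(stress_conditions), 3))  # 0.13
```

Under these assumptions, four individually good transitions already leave less than half of the original correlation, and a modest weakening of each step under stress reduces it much further.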
Under severe inflammatory conditions (caused by infection, cellular injury, metabolic imbalance, cellular death, and various forms of abnormal cellular growth), the cellular context will change the phenotypic landscape, altering the functions of the genes involved. For example, pathways that are supposed to fight cancer cells under normal conditions will be hijacked by cancer cells to work on their behalf, reflecting massive pathway switching. The heterogeneity within a given gene, or the spectrum of mutations, also contributes to such interactive heterogeneity (in terms of interaction with other targeting proteins). The p53 mutome, with an extraordinary variety of more than 2000 different mutant p53 proteins, represents a typical example. It was recently proposed that different p53 mutants can be categorized as continuous variables (even though they may not be independent of each other) (Stiewe and Haran, 2018). Of course, epigenetic marks can affect the heritable variation of genes. For example, heritable cytosine methylation increases the rate of C→T point
mutations and alters genomic instability, which impacts heterogeneity at both the gene and genome levels. No wonder it is much easier to link a gene to a phenotype when the emphasis is placed on a single layer of variation, within one stage of analysis, within a defined subclass of molecules, with focused pathways, within a short time frame, and under a linear experimental condition.

Increased appreciation of decreased molecular specificity: A high degree of molecular specificity has been the conceptual basis for understanding how biology works, from enzyme action to DNA replication, transcription, and translation. When discussing bio-interactions, specificity is the key. Surprisingly, however, large-scale studies have revealed the importance of “less specificity” in biology. For example, many enzymes can have hundreds to thousands of substrates, and the affinity for most of them is moderate. To achieve specificity, cellular topology as well as the time of interaction becomes the key. In other words, structural specificity can be reduced so as to work with more potential partners, while the system’s specificity is achieved with help from space specificity and time-mediated specificity. For example, although the affinity of two molecules may not be the best match, the interaction between them will occur if they are in the right place at the right time. In a sense, time and space become preconditions for the heterogeneity of interaction (for more discussion, see Heng, 2015, “It is not noise, stupid! It is bio-complexity”). Equally importantly, such types of heterogeneity can increase from simpler life forms (e.g., viruses) to more complex ones (e.g., humans). In viruses, genetic certainty is much higher, and molecular modeling has better predictive capacity. 
In contrast, in multicellular organisms, layers of cellular differentiation and communication are introduced, and genome/gene/epigenetic heterogeneity becomes dominant, resulting in a reduced capacity to predict phenotypes from the genotype. In humans, many other non-bio-factors such as culture can impact behavior. Under certain conditions, some higher levels of system constraint can overpower even biological influence. Clearly, the more system heterogeneity is involved, the more complexity researchers face. It is thus much harder to model and predict based on molecular specificity. “It is difficult to depict clear mechanisms underlying biological emergence and to give explanations based on causal mechanisms” (Kolodkin et al., 2013), especially when emergence becomes stronger because of increased heterogeneity (Heng et al., 2019).

Every gene links to every trait? Traditional genetic research has focused on a gene’s function in a few isolated traits. In fact, even when naming genes, researchers often pick names according to the gene’s first characterized features (e.g., protein size) or its initially described function or traits. It is rather common that the same gene has different names
reflecting the different features it is associated with. It is no surprise that the same gene or its products can be associated with more functions when different questions are asked about other parts of the same system (more information on the p53 story can be found in Chapter 8). One strong piece of evidence that the certainty of a given gene’s function is drastically decreasing comes from the discovery that most genes have an array of possible functions, and that even for a given function, there is a range of potential phenotypes. Across this book, the same storyline has been repeated from different perspectives. Essentially, there is increased uncertainty for many individual genes, and the evolutionary process cares less about individual genes and more about overall system complexity and dynamics. The common strategy is to increase the layers of heterogeneity-mediated complexity through different agents, where the individual agents are not that important but the emergent system under selection is. In a way, the conclusion that every gene can be linked to every trait is no longer a shock. In addition to the discussion in other chapters, the fact that a large number of genes can be linked to metabolic functions is a strong supportive example. In fact, some students had long thought of this type of question before the -omics era. During the early 1990s, a classmate asked a major professor, a leading researcher of transcription factors, a simple question regarding transcription factor interaction. If, this student asked, proteins interact with many other proteins, could we say that every protein can interact with one another and that all proteins of the entire cell are connected? The professor and other students laughed at this seemingly naïve question. But if this is true, what is the meaning behind studying a small group of protein interactions in our lab? 
It seems that the time is coming for the entire field to ask a similar question: if it is true that every gene is involved in every trait, then what?

d) Mechanisms of fuzzy inheritance at epigenetic level

As epigenetics is closely related to genetics, all biological aspects previously studied by genetics can now be reanalyzed by epigenetic platforms, which makes epigenomics very popular in current research, especially when the gene mutation theory has so far failed to explain the basis for many cases of phenotypic plasticity and cannot identify key gene mutations for most common and complex diseases (Heng, 2017b). One of the key functions of epigenetics is to add a new layer of genomic regulation (rather than to replace the existing genomic mechanisms of gene regulation). For example, DNA methylation on specific sequences can irreversibly silence transcription, even in the presence of all of the factors required for their expression (Bestor et al., 2015). It is known that DNA methylation plays an essential role in irreversible promoter
silencing (e.g., for monoallelic expression of imprinted genes, for silencing of transposons, and for X chromosome inactivation). The new technical platforms and efforts from some large-scale epigenetics programs have greatly contributed to epigenetic research (Birney et al., the ENCODE Project Consortium, 2007; Kundaje et al., 2015; Cheow et al., 2016; Uszczynska-Ratajczak et al., 2018). Interestingly, the topic of epigenome heterogeneity is now under extensive discussion, especially in the stem cell and cancer research fields (Hoey, 2010; Easwaran et al., 2014; Mazor et al., 2016). There are diverse molecular mechanisms contributing to epigenetic heterogeneity. In addition to chromatin compaction and remodeling, histone modifications, and RNA splicing, more mechanisms of epigenetic heterogeneity have been used to study the activity of endogenous retroviruses, the function of noncoding RNA, and the high-order structure of the chromatin/chromosome. As we mentioned previously, many classical cases of phenotypic plasticity in plants and animals are now explained by the mechanisms of epigenetic heterogeneity. On the other hand, epigenetic heterogeneity also contributes to cancer evolution (Heng, 2015, 2017a). One related issue is the observation that DNA methylation in pluripotent stem cells is much more dynamic and error-prone than in differentiated cells (Ooi et al., 2010). Such epigenetic heterogeneity has potential negative implications for applying stem cell technology in the clinic. One debated issue concerns the function of noncoding RNA. It was claimed that the majority of these novel transcripts are functional (e.g., 80% exhibit biochemical functions) (The ENCODE Project Consortium, 2012); others disagree, given that the vast majority of them are present at very low levels, and demand clear functional validation (Palazzo and Lee, 2015). 
While an increasing number of studies have illustrated that at least a substantial fraction of transcripts functionally enhance organismal fitness (Uszczynska-Ratajczak et al., 2018), the function of the larger portion of lncRNAs remains elusive. Some important questions emerge: what is the overall function of the massive level of increased “noise” for a higher level of the system? How important is such lncRNA heterogeneity for evolution when compared with gene-level heterogeneity? Continued emphasis on individual lncRNAs will not address these questions. Furthermore, studying each lncRNA in a linear isolated experimental system (following the traditional approach of characterizing each individual gene) will generate many publications that will likely have limited applications, as most lncRNAs (like most genes) do not function in isolation. Interestingly, there are studies that search for the relationship between individual cells and a population. It was proposed, for example, that some features of DNA methylation reflect the emergent property of a large number of
individual cells whose respective features of DNA methylation can be highly variable (Flunkert et al., 2018).

e) Mechanisms of fuzzy inheritance for mitochondrion

Mitochondria display high levels of fuzzy inheritance because variable factors are involved at multiple levels/stages between gene mutations and phenotype. Furthermore, altered mitochondria can contribute to almost all complex diseases because of their key function of providing cellular energy, one of the essential features of a biosystem. A few points will be mentioned to outline the complicated relationship between the mt-genome and its diverse phenotypes and to briefly illustrate the dynamic feature of mt-genomic systems. More detailed information is described elsewhere (Wallace, 2005; Wallace and Chalkia, 2013). mtDNA displays an exceptionally high mutation rate (100–1000-fold higher than nuclear DNA), which leads to a very high gene sequence evolution rate (10–20 times higher than comparable nuclear DNA genes). In addition, different classes of variants (recent deleterious mutations, ancient adaptive mtDNA mutations, and somatic mtDNA mutations) are clinically (phenotype) relevant. The vast majority of mitochondrial genomes in all adults are mutated. There is a high level of mtDNA reorganization reflected in fusion–fission cycles. In contrast, a similar phenomenon (fusion–fission cycle following genome chaos) in nuclear somatic genomes is less frequently observed (Kowald and Kirkwood, 2011). Interestingly, the phenotype of fusion–fission cycles can be both beneficial and harmful depending on its context. For example, a model simulation showed that a higher frequency of mitochondrial fusion–fission can either provide a faster clearance of mutant mtDNA or quicken the accumulation of mutant mtDNA (Tam et al., 2015). Because both altered nuclear and mitochondrial genes can contribute to mitochondrial diseases, a large number of nDNA-coded mitochondrial genes (>200) can be associated with diseases when mutated. 
The interaction between mitochondrial DNA and nuclear DNA can further complicate such a relationship. For example, mild nuclear mitochondrial gene variants can become a clinical issue when combined with an incompatible mtDNA. Highly diverse mt-mutations can be found in one individual (each mutation at the rate of 1%–2% of all mitochondrial genomes) (Smigrodzki and Khan, 2005). When a specific mtDNA mutation reaches a certain level within a cell population, the percentage of mutant mtDNAs can drift by a mechanism called replicative segregation. Importantly, an intracellular mixture of mutant and normal mtDNAs, called heteroplasmy, can have dramatic impacts on a patient’s phenotype, leading to different types of diseases. As different organs display different sensitivities to partial
bioenergetic defects (in order: the brain, heart, muscle, kidney, and endocrine systems) (Wallace, 2005), the same mitochondrial deficiencies can result in organ-specific symptoms. Recently discovered mitochondrial nanotunnels, thin double-membrane protrusions that connect the matrices of nonadjacent mitochondria, are highly interesting. Such direct communication among mitochondria through exchanging proteins, mRNAs, and fusion–fission promotion can further increase mitochondrial heterogeneity (Vincent et al., 2017).

f) Mechanisms of fuzzy inheritance of other interesting observations

In this category, observations that involve heterogeneity at different levels of the genomic/epigenetic organization, and some new types of fuzzy inheritance, are grouped together. The main purpose of this section is to emphasize that many surprising biological phenomena can be explained by fuzzy inheritance. By briefly mentioning some of these unlikely sources, which can potentially influence genomic heterogeneity-mediated phenotypic plasticity, we hope that readers can start searching for their own newly explainable lists.

1. Increased fuzziness involves the maintenance/modification of genomic materials, the process of decoding genomic information, and the complicated genotype–phenotype relationship

TEs can alter chromosomal coding, leading to the reconfiguration of gene expression networks in both germline and somatic cells (Heng, 2015). They can also generate chromosomal instability and increase the level of genomic heterogeneity. Traditionally, studies of the genetic effect of TEs have focused on the interruption of specific genes at the insertion site. According to system inheritance, however, altering the chromosomal code and generating further heterogeneity are perhaps more important, which might explain some TE-caused phenotypes that cannot be explained by specific integration sites. 
In addition to altering genome coding and specific gene coding, TEs also contribute to genome size, gene content, gene order, centromere function, and epigenetic regulation (Bennetzen and Wang, 2014). Interestingly, by comparing sequences, it is clear that TEs can move in and out of a given genome during a million-year time frame, suggesting their heterogeneous nature. An increasing number of studies have illustrated that TEs can be reactivated under certain environmental conditions, such as stress. Stress has been shown to induce TE transcription or integration or to redirect TE integration to alternative genome sites (Levin and Moran, 2011). The dynamic nature of TEs is now linked to human physiological and disease conditions. It has
been proposed that the activation of TEs because of genome chaos serves as a new mechanism of cancer evolution (Wilkins, 2010). Significantly, TEs are expressed and active in the brain, illustrating the highly dynamic nature of neuronal genomes. These new findings on TE expression and function in the central nervous system could have major implications for understanding the brain’s neuroplasticity (Reilly et al., 2013). Indeed, it has been suggested that retrotransposable elements may disrupt the brain’s genome (Poot, 2017).

Recently, a new type of mobile genetic element has been proposed to describe circulating nucleic acids or degraded cell-free chromatin derived from the cells that die in the body every day (Mittra et al., 2015; Mittra et al., 2017). These DNA/chromatin fragments can freely enter healthy cells, integrate into their genomes, and induce DNA DSBs, apoptosis, and inflammation within them. These genomic fragments are proposed to be linked to drug resistance, aging, and cancer evolution. Interestingly, neutralizing this cell-free chromatin can reduce chemotherapy toxicity. While further research is needed to characterize these fragments (e.g., to study their integration pattern and stability and their relationship with other genome alterations such as genome chaos), they nevertheless represent yet another form of fuzzy inheritance, which will contribute to the genomic heterogeneity of other cells (healthy or cancerous alike). In fact, mobile chromatin fragments can also be generated from live cells. During our genome chaos research, it became clear that, in addition to the rejoining of large fragments, DNA transfer during crises or under specific conditions is rather common. The key again is genome instability. A few cell lines have been investigated. One cell line displays the sticky chromosome phenotype, where all chromosomes are connected by chromatin fibers. 
It is known that various conditions, including telomere shortening, chromosomal condensation defects, and possibly methylation status, can lead to sticky chromosomes. When live cells are examined under a fluorescence microscope following staining with a low concentration of DAPI, the level of DNA transfer among cells is very high in this cell population. This observation also links fuzzy inheritance to genome instability (Heng et al., 2013a, b; Ye et al., unpublished observations).

A genome-wide survey of the contribution of short tandem repeats (STRs) to gene expression in humans was recently completed (Gymrek et al., 2016). It was found that expression STRs (eSTRs) contribute 10%–15% of the cis heritability mediated by all common variants. Note that STRs represent one of the most polymorphic and abundant repeat classes. Tandem repeats are highly unstable as they
frequently induce DNA replication slippage or intragenic recombination, resulting in the addition or removal of repeated units. Such instability contributes fuzziness within regulatory sequences that is linked to functional benefits. For example, in yeast, these variable elements in promoters contribute to the evolutionary tuning of gene expression by modifying local chromatin structure (Vinces et al., 2009). In general, up to 10%–20% of eukaryotic genes and promoters contain an unstable repeat tract; the heterogeneity in repeats often has interesting phenotypic consequences. In addition to their role in Huntington’s disease, variable repeats also confer useful phenotypic variability, which facilitates organismal evolvability, including cell surface variability, plasticity in skeletal morphology, and circadian rhythm tuning (Gemayel et al., 2010). A recent exciting example is that a gene’s function can be defined by the heterogeneity of repeats. The yeast polyubiquitin gene is a unique ORF composed solely of tandem repeats. It has been shown that natural variation in repeat numbers may optimize organismal survival, as the optimal number of repeats varies under different types of stress. Moreover, the number of repeats is evolutionarily unstable within and between yeast species, which links to the cell’s capacity to respond to sudden environmental perturbations (Gemayel et al., 2017).

Transcriptional regulation impacts the phenotype. Traditionally, the specificity of DNA-binding sites has been regarded as key to how transcription factors work. -Omics studies, however, revealed that binding specificity is much lower than previously believed. In fact, transcription factors’ consensus binding sites and genomic occupancy appear rather heterogeneous in many cases. An increasing number of studies have revealed that many features beyond the DNA sequence, such as DNA shape, methylation, and posttranslational modifications, can increase specificity. 
Furthermore, partnering between/among transcription factors and changing chromatin accessibility can enhance specificity (Todeschini et al., 2014). Together, these non-DNA factors can contribute to the heterogeneity of expression. Recently, a bivalent master switch model was proposed to support and extend the bivalency model posited in mammalian cells (Kang et al., 2017). In this model, local competition between acetylation and deacetylation may play a critical role in switching between the active and silent states of bivalent protein complexes during development. In the future, many more environmental factors will be linked to the capability to shift the balance between the “active” and “silent” states created by many different gene regulatory mechanisms.

Exosomes, vesicles of endocytic origin, are released by many cells and found in all body fluids. They have recently become a hot topic in the study of cellular communication and regulation. The demonstration that the RNAs of many exosomes, both mRNA and microRNA, are functional is of importance (transferred exosomal mRNA can be translated after entering another cell) (Valadi et al., 2007). New research has linked exosomes to the maintenance of cellular homeostasis by eliminating harmful cytoplasmic DNA from cells (Takahashi et al., 2017) and to the transfer of genetic material, such as the full mitochondrial genome, to recipient cells (Sansone et al., 2017). As it is hard to quantitatively control this transfer process, it will increase genomic heterogeneity among recipient cells. A related example is that exosomes in human semen carry a distinctive collection of small noncoding RNAs (Vojtech et al., 2014). It is reasonable to link these RNAs to functional heterogeneity. Furthermore, sperm carry a large RNA load (Miller et al., 2005), and different types of potential functions have been discussed (Hosken and Hodgson, 2014). We propose that highly dynamic RNA profiles, including massive RNA fragments, likely involve fuzzy inheritance, simply by increasing the overall dynamics. Interestingly, increased heterogeneity potential can also be observed in oocyte differentiation, albeit through a different mechanism. Recent studies suggested that the formation of functional oocytes requires a “nursing” process in which 80% of fetal germ cells sacrifice themselves by donating their cytoplasmic contents through germline cysts (Ikami et al., 2017). Again, further studies likely will reveal that the organelle transport process during oocyte differentiation also transports other types of genomic materials.

Perhaps one of the best examples of the mechanism of fuzzy inheritance is the Hsp90 story. 
Hsp90 is a highly conserved ATP-dependent molecular chaperone, which is required for the activation and stabilization of more than 200 client proteins and is essential for many cell signaling pathways (Kravats et al., 2018). The most interesting observations came from phenotypic studies. In the case of flies (Drosophila), Hsp90 buffers genetic variation in morphogenetic pathways. In the case of plants (Arabidopsis), reducing Hsp90 function generates a spectrum of morphological phenotypes (Queitsch et al., 2002). Because of such important functions, Hsp90 was considered a capacitor of phenotypic variation. Different mechanisms have been proposed to explain how Hsp90 mutations induce a wide range of phenotypic abnormalities. One interpretation is that Hsp90 can increase the sensitivity of
different developmental pathways to hidden genetic variability, and chaperone machinery may be a buffering mechanism of phenotypic plasticity; another interpretation highlights that Hsp90 can either prevent phenotypic variation by suppressing the mutagenic activity of transposons (Specchia et al., 2010) or increase phenotypic variation through epigenetic mechanisms, such as altering the chromatin state (Sollars et al., 2003). As increasingly complicated functions of Hsp90 are revealed (it promotes both folding and degradation, in addition to regulating the expression of other quality control components and interacting with steroid receptors) (Grad and Picard, 2007; Theodoraki and Caplan, 2012), more specific mechanisms have been proposed. An alternative explanation is offered by the understanding of fuzzy inheritance. Despite its diverse mechanisms, the common evolutionary function of buffering genomic variants is to reduce the impact of increased system heterogeneity, whether from internal sources or from external environmental stress. Such a function can be well understood in the context of how fuzzy inheritance works. Specifically, there are many layers of information transfer between the genotype and phenotype. Altered genomic codes can produce altered proteins. If, however, these altered proteins can either be modified to a “normal” state or be degraded by Hsp90, the altered genomic information will be hidden (it still exists within the genome but cannot be distinguished by phenotype under the current conditions). When the function of Hsp90 is compromised, however, especially under stress, some altered proteins will have a chance to contribute to the phenotype, and the overall system becomes less stable. These usually “hidden” phenotypes will become visible. In more general terms, Hsp90 can buffer the effect of fuzzy inheritance (from individual genes and genomic interactions). 
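The buffering logic described above can be sketched as a toy simulation. This is an illustrative assumption, not a model taken from the Hsp90 literature: each individual carries a hidden variant effect, a `buffering` fraction between 0 and 1 (a hypothetical parameter standing in for chaperone capacity) masks part of that effect, and a small amount of unrelated noise is added. The function name, the Gaussian distributions, and all numerical values are invented for illustration.

```python
import random
import statistics


def phenotypic_variance(buffering, n_individuals=5000, seed=0):
    """Variance of a toy phenotype across a simulated population.

    `buffering` (0..1) is a hypothetical stand-in for chaperone capacity:
    it masks that fraction of each individual's hidden variant effect.
    """
    rng = random.Random(seed)
    phenotypes = []
    for _ in range(n_individuals):
        hidden_effect = rng.gauss(0.0, 1.0)  # effect of cryptic genomic variants
        noise = rng.gauss(0.0, 0.1)          # unrelated developmental noise
        phenotypes.append((1.0 - buffering) * hidden_effect + noise)
    return statistics.pvariance(phenotypes)


intact = phenotypic_variance(buffering=0.9)       # buffer mostly functional
compromised = phenotypic_variance(buffering=0.2)  # buffer impaired, e.g., under stress

print(f"variance with intact buffering:      {intact:.3f}")
print(f"variance with compromised buffering: {compromised:.3f}")
```

In this sketch the underlying variants are identical in both runs; only the buffering fraction differs. The hidden variation therefore exists in both populations but becomes visible as phenotypic variance only when buffering is reduced.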
By increasing the tolerance for many genomic variants, by reducing the high level of molecular specificity required by specific mechanisms, or by applying various epigenetic regulations, the system actually increases its capability to handle genomic heterogeneity.

2. Evolutionary processes of adaptation and survival under crisis

Among all these listed individual mechanisms, the common link is that the success of evolutionary processes requires increased heterogeneity from diverse genomic components, the agents of emergent properties. Specifically, either for the case of system adaptation under a normal range of stress or for system survival by forming new genomes in crises, heterogeneity ensures the success of micro- and macroevolution.


4. CHROMOSOMAL CODING AND FUZZY INHERITANCE

For example, to repair an altered genomic system under normal physiological conditions, a method of precise repair is ideal. However, under high levels of stress where system adaptation is necessary, less precise repair systems will be favored. Imprecise repair systems are not a good choice under normal situations, but the increased number of variants they produce offers a better chance of adaptation in crisis situations. Under crisis conditions, when gene- and epigene-based routine mechanisms are no longer sufficient to rescue the system, as evidenced by the fact that the majority of individuals will be eliminated, genome chaos can quickly create new genome systems by reorganizing the old ones. Only such new systems have the potential to survive and succeed under crises. The best example is the emergence of drug-resistant cancer cell populations following high-dosage drug treatment (for more, see the genome chaos section).
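The trade-off between precise and imprecise repair described above can be illustrated with a toy calculation. The following is only a minimal sketch under stated assumptions, not a model taken from the cited literature: the outcome of repair is represented as a single trait value drawn from a normal distribution centered on the original state, a variant survives only if its value falls within a tolerance window around the environmental optimum, and all names and numbers (`surviving_fraction`, the spreads, the optima, the tolerance) are hypothetical choices made purely for illustration.

```python
from statistics import NormalDist


def surviving_fraction(repair_sd: float, optimum: float, tolerance: float = 0.5) -> float:
    """Fraction of repaired variants that fall within `tolerance` of the optimum.

    Assumption (illustrative only): trait values after repair follow
    Normal(0, repair_sd). A precise repair system has a small spread,
    an imprecise one a large spread.
    """
    outcomes = NormalDist(mu=0.0, sigma=repair_sd)
    return outcomes.cdf(optimum + tolerance) - outcomes.cdf(optimum - tolerance)


# Hypothetical parameters, chosen only to make the contrast visible.
PRECISE_SD = 0.1      # precise repair: variants stay close to the original state
IMPRECISE_SD = 1.0    # imprecise repair: variants spread widely
NORMAL_OPTIMUM = 0.0  # normal conditions: the original state is still optimal
CRISIS_OPTIMUM = 3.0  # crisis: the optimum lies far from the original state

for label, optimum in [("normal", NORMAL_OPTIMUM), ("crisis", CRISIS_OPTIMUM)]:
    precise = surviving_fraction(PRECISE_SD, optimum)
    imprecise = surviving_fraction(IMPRECISE_SD, optimum)
    print(f"{label:>6}: precise={precise:.4f}  imprecise={imprecise:.4f}")
```

Under these assumptions, precise repair keeps nearly all variants viable when the optimum has not moved, whereas in the crisis setting only the imprecise system produces any survivors at all, and even then most individuals are eliminated. The sketch captures only the heterogeneity argument; it does not model genome chaos or genome reorganization.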

4.3.6 Potential Significance and Implications of Fuzzy Inheritance

Although the concept of fuzzy inheritance is still in its infancy and it is too early to discuss it fully, its potential implications are briefly mentioned here to stimulate readers’ interest in further exploration.
1. Current genomics needs a solid conceptual basis of inheritance
The establishment of a solid conceptual basis for current genomics is of ultimate importance, especially when there are overwhelming observations that challenge the traditional definition of inheritance. A new and better conceptual framework will be essential for future genomics following over a century of gene-centric genetics, in which the gene functions as an independent information unit. According to many scientists and science historians, one of the major contributions of Mendel’s work was to introduce the precision of mathematics into genetic analysis. It turns out that such precision does not reflect the common biological reality (see Chapters 1 and 2). Now, the predictive power of precise parts based on fixed mathematical combinations is limited. Obviously, many mathematical models of genetics need to be adjusted accordingly. Fuzzy inheritance can also better explain how “phenotype = genotype + environments.” We used to think that the genotype codes with certainty, but the environment can reduce its impact (both genetic penetrance and expressivity). Now we know that the genotype itself is highly variable for a majority of traits, and environments mainly function as a phenotypic selector, even though they actively participate in interactions with the genotype in an adaptive manner. Nevertheless,

4.3 FUZZY INHERITANCE


environments can only select phenotype plasticity within the range defined by the genotype. If the environmental stress is too high to be tolerated by a system (individual cell or organism), death will occur (or new systems could emerge, albeit at extremely low frequencies). In other words, high stress will lead to either death or the emergence of a new system with a new phenotype (which is located outside of the previous system’s encoded potential phenotype range) (see Chapter 6 for more discussion).
2. The definition of inheritance impacts evolutionary theory
“How inheritance works” can greatly impact evolutionary theory, as the concept of discretely inheritable variants represents a key feature of Darwinian evolution. For example, Mendelian genetics functions as a keystone for the modern synthesis of natural selection. It also divides scholars under the same umbrella of genetics and evolutionary theory: e.g., the viewpoint of how individual alleles affect the phenotype of organisms separated Sewall Wright from R.A. Fisher. In addition to the genetic drift model, Wright was convinced that most evolutionarily noteworthy phenotypic traits were the result of multiple gene interactions, rather than of single genes rising and falling in frequency as Fisher’s models of evolution illustrated (MacNeill, 2011). Now, knowing that inheritance is fuzzy and that required population variants can be achieved without altering the same gene set by accumulating mutations within populations, current evolutionary theories will likely be impacted.
3. It is important to understand the mechanism of genomic heterogeneity
Genomic heterogeneity represents a major challenge for current disease research. The illustration of how multiple layers of genomic/epigenomic fuzzy inheritance contribute to heterogeneity can improve different strategies of fighting cancer.
In addition, it can explain many phenomena such as phenotype plasticity, the relationship between layers of different types of heterogeneity, and different patterns of cancer evolution. A growing number of studies have provided supportive evidence of the contributions of multiple types of fuzzy inheritance to heterogeneity at different levels and processes, including the gene, the epigene, the transcriptome, the karyotype, cellular growth, and death, even though the term “fuzzy inheritance” itself was not used (Cannella et al., 2009; Lawrenson, 2010; Creekmore et al., 2011; Duesberg and McCormack, 2013; Stevens et al., 2013a, b, 2014; Lodato et al., 2015; Zhu et al., 2015; Bakker et al., 2016; Cerulus et al., 2016). In fact, a large amount of literature belonging to this category was examined using the lens of bio-heterogeneity.


It should be pointed out that even though increased fuzzy inheritance contributes to many disease processes, the overall impact of multiple types of fuzzy inheritance is beneficial to life, as the same gene can have different functions in different genomes, different tissues, or different developmental stages.
4. Fuzzy inheritance favors system robustness/resilience and new system emergence in crisis conditions
The high degree of system diversity contributed by fuzzy inheritance at multiple levels is important both for cellular adaptation and for new system emergence during crisis. For example, all layers of fuzziness contribute to cellular adaptation, and karyotype fuzziness is essential for the initial stage of speciation (see Chapters 5 and 6). Interestingly, lower levels of fuzziness can provide alternative options for ensuring higher-level function in highly dynamic environments, a strategy of “imperfect parts for the best possible survival and adaptation.” The interaction between fuzzy inheritance and environments also explains the mechanism of the emergence of outliers under high stress conditions (often within the macrocellular evolutionary phase) (Heng, 2015). It is necessary to mention that fuzzy inheritance might have played a more crucial role in the biological world than we have ever realized, from individual to population survival, from one species’ fate to the entire collection of all living beings, and from the past to the future. The following examples/questions deserve much more attention in future research:
a) Fuzzy inheritance ensures cancer cell survival: The fusion of cancer cells with macrophages contributes to increased fuzziness, leading to cells with increased metastatic behavior, which correlates with disease stage and predicts overall survival (Gast et al., 2018).
b) Each individual cancer cell may be eliminated (because of individual death), but the population connected by fuzzy inheritance can be immortal. Life continues on with different cells.
c) One of the most interesting ideas is that different species are in fact connected by fuzzy inheritance. This is especially true when considering massive genome reorganization during mass extinction periods: when only new species can survive, the emergence of new systems is more important than any previously existing systems. Life goes on, just with different types.
5. Fuzzy inheritance can provide insight for many other important issues
Equally important, fuzzy inheritance can offer explanations for many hotly debated topics, including the missing heritability


(Chapters 1 and 2; Heng, 2009, 2010, 2015; Heng et al., 2011a). Because there are so many genomic factors, including genes, which can contribute to genomic uncertainty, it will be challenging to identify isolated elements, especially when genome-level variants can be dominant during macrocellular evolution. In addition, fuzzy inheritance questions the practice of using the isogenic concept for investigating many biological systems. Often, a cellular population is assumed to be an isogenic population when targeting a specific molecular element, so that the gained phenotype can be linked to the function of the specific element under investigation. Now, knowing that a population is hardly ever isogenic, our attitude toward experimental design needs to change. Similarly, well-known cancer gene mutations are often selected from an array of lesser-known mutations for personalized treatment, based on the assumption that the lesser-known gene mutations are less important and that the cell population can be treated as isogenic. Another explanation is that targeting well-known cancer gene mutations is what we know we can do right now. We cannot wait, as we are racing against time. Fuzzy inheritance differs from “soft inheritance” (acquired phenotypic changes that can be passed on to offspring) and “low heritability” (as both high and low heritability can be fuzzy). Further research is needed to illustrate their relationships, as there are certain overlapping features among them.
6. Re-interpretation of some classical experiments using the concept of fuzzy inheritance
Now, with the appreciation of fuzzy inheritance, one cannot help but question Mendel’s practices in the 1860s, as his data had amply demonstrated such fuzziness (see Chapter 1). Interestingly, Mendel is not the only one who observed a spectrum of phenotypic variants related to a specific fixed genotype.
Using homozygous princess beans where two members of each pair of genes are identical, Johannsen did observe bean size variations (Johannsen, 1909).

The beans produced by each plant are somewhat variable in size, and when arranged according to sizes they give the normal curve of probability. All the beans from any one plant and all of the descendants of this plant have the same distribution, no matter whether large beans are continually selected, or small beans are picked out in each generation. The offspring always give the same groups of beans. Johannsen detected nine races of beans in those he examined. He interpreted his results to mean that the differences in size of the beans from a given plant are due to its environment in the widest sense.
Morgan, 1926


It is important to point out that, first, inheritance is obvious as the range of the size and its distribution patterns was inherited and, second, within the inherited range, there were size variations. These criteria are what we used to describe fuzzy inheritance. Interestingly, when discussing the meaning of bio-information, the feature of fuzziness of bio-information is noticed in DNA sequence analyses. For example, many DNA sequences display fuzzy meaning, and a large portion of genomes (up to 90% in some plants) belong to this category (Koonin, 2016). Importantly, the sequences with fuzzy meaning can be recruited for various functions to achieve more specific outcomes. Such a transition from “potential” to “reality” is achieved by environmental selection, according to the concept of fuzzy inheritance. Of course, the fuzzy information domains include, but are not limited to, DNA sequences.
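The two criteria drawn from Johannsen's pure-line experiment (the range and distribution are inherited, and variation persists within that range) can be made concrete with a small simulation. This is a hedged sketch rather than a reconstruction of Johannsen's data: bean weight is assumed to follow a normal distribution whose mean and spread are fixed by the shared genotype, and the function name `pure_line_selection`, the weights in milligrams, the population size, and the selection intensity are all hypothetical values chosen for illustration.

```python
import random
from statistics import mean


def pure_line_selection(generations, pick_largest, seed=0, n=5000,
                        genotype_mean=450.0, spread=50.0, keep=0.1):
    """Simulate repeated selection within one pure (homozygous) line.

    Every plant shares the same genotype, so each generation's bean weights
    are drawn from the same Normal(genotype_mean, spread), whichever beans
    were chosen as parents. Returns (parent_mean, offspring_mean) pairs.
    """
    rng = random.Random(seed)
    beans = [rng.gauss(genotype_mean, spread) for _ in range(n)]
    history = []
    for _ in range(generations):
        # Select the most extreme beans (largest or smallest) as parents.
        beans.sort(reverse=pick_largest)
        parents = beans[: int(n * keep)]
        # Parents differ in size but not in genotype, so the offspring
        # distribution is set by the genotype alone (the inherited range).
        beans = [rng.gauss(genotype_mean, spread) for _ in range(n)]
        history.append((mean(parents), mean(beans)))
    return history


for label, largest in [("select large", True), ("select small", False)]:
    for parent_mean, offspring_mean in pure_line_selection(5, largest):
        print(f"{label}: parents {parent_mean:6.1f} mg -> offspring {offspring_mean:6.1f} mg")
```

In this sketch the selected parents differ strongly in mean weight, yet the offspring mean stays close to the value set by the genotype in every generation, which mirrors the observation that selection within a pure line does not shift the distribution. The assumption that offspring are entirely independent of parental size is the simplest possible reading of the quoted passage, not a claim about the underlying biology.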

4.4 OVERLOOKED GENOME VARIATIONS

Given the importance of system inheritance and fuzzy inheritance, the next task is to systematically study and record the diverse types of genome variations. Although they have been overlooked, they reflect the heterogeneity of system inheritance. Moreover, it is necessary to effectively evaluate their value in both basic and translational research. Only when the basic characterization of various chromosomal/nuclear aberrations is achieved can the fields of cytogenetics and cytogenomics move forward to address additional questions, such as: What is the relationship between these nonclonal chromosomal/nuclear aberrations and classical chromosomal aberrations? How can we predict phenotype based on the degree of NCCA mosaicism? How can we develop a combinational biomarker based on different types of genome-level aberrations?

4.4.1 Generally Accepted Chromosomal Variations

While many numerical and structural chromosomal variations have been observed in plants and animals for over a century, most human chromosomal variations were identified and linked to human diseases during the golden age of cytogenetics (from the 1950s to the 1980s). Table 4.2 lists some of the most common types of chromosomal variations frequently studied in human cytogenetics. As discussed in Chapter 3, one key limitation of current cytogenetics is that the primary focus for identification of chromosomal aberrations has been on the clonal aberrations. The frequencies of chromosome


TABLE 4.2 Examples of Commonly Studied Chromosome Variations in Human.

Numerical Chromosome Abnormalities
    Sex chromosome abnormalities
    Autosomal abnormalities (e.g., trisomy, the most common aneuploid)
    Uniparental disomy
    Polyploid

Structural Chromosome Abnormalities
    Deletions (both interstitial and terminal); duplications; fission; insertions
    Dicentric chromosomes
    Homogeneously staining regions
    Inversion
    Isochromosomes
    Marker chromosomes
    Quadruplications
    Ring chromosomes
    Telomeric associations
    Translocations (reciprocal and nonreciprocal; Robertsonian translocations; jumping translocations)
    Chromosome breakage
    Sister chromatid exchanges (SCE)

Normal Chromosome Variations (Chromosome Polymorphism)
    Variation in heterochromatic regions, satellite (variation in length, in number and in position)
    Fragile sites
    Small supernumerary marker chromosomes (sSMCs)

aberrations of NCCAs (such as breakage) are only used for some chromosome instability syndromes. Recently, this situation has started to change (Heng et al., 2006b, 2013b; Niederwieser et al., 2016; Rangel et al., 2017; Ramos et al., 2018; Frias et al., 2019).

4.4.2 Ignored and Unclassified Chromosomal/Nuclear Aberrations

It is rather surprising that there are many unclassified types of chromosomal or nuclear variations, which can be observed in most research and clinical samples. However, they have been largely ignored because they are poorly understood and are not accepted under current clinical regulations. Over the past few decades, we have continuously identified new types of chromosomal aberrations and studied their mechanisms as well as their biological significance. The following sections highlight some examples.


4.4.2.1 Free Chromatin

Cytogenetic slides prepared through a standard protocol (chromosome harvesting by hypotonic treatment followed by fixation and air-drying) display standard mitotic figures and interphase nuclei along with spindle- or rope-shaped structures (Fig. 4.7), which are seemingly “invisible” to most researchers. When these spindle- or rope-like structures are pointed out to researchers, they are mostly considered nonchromatin contamination or artifacts, and certainly not useful at all. Feulgen staining, 3H-labeling, and DAPI staining were used to illustrate that these structures are DNA-containing fibers rather than nonchromatin materials generated during slide preparation. NOR (nucleolar organizer region) staining and electron microscopy analysis showed that these structures were chromatin and that there is no nuclear envelope surrounding these 30 nm chromatin fibers. “Free chromatin” was thus used to describe these released chromatin fibers (Heng and Chen, 1985). Furthermore, free chromatin can be generated from different plant and animal cells, and certain reagents can significantly increase its frequency (Heng et al., 1988). Interestingly, it was realized that free chromatin could be used for high-resolution gene mapping (Heng and Zhao, 1987). This idea was finally put into practice in 1990 in Dr. Lap-Chee Tsui’s laboratory in Toronto. The rationale was as follows: when three-dimensional interphase chromatin is released into a two-dimensional linear fiber, the physical positional information among genes is preserved (at least within a short distance along the linear chromatin fiber). If the released chromatin fiber can be used as the hybridization target for FISH

FIGURE 4.7 Low power view of mitotic figures, interphase nuclei, and free chromatin (indicated by arrows) from a cancer patient’s chromosome slide prepared by conventional cytogenetic methods from short-term blood culture.


detection of different DNA markers, the physical mapping information can be “read out” directly from the FISH signals along the chromatin fiber. Indeed, after 1 year’s worth of hard work, it was shown that good-quality chromatin fibers can be released and can hybridize with FISH probes. The released chromatin fibers and the results of FISH mapping are illustrated in Figs. 4.8 and 4.9. We presented our chromatin fiber FISH results in 1991 at the 8th International Conference of Human Genetics, which generated a wave of excitement and initiated a new experimental system of high-resolution fiber FISH (Heng et al., 1991, 1992). Further modifications immediately followed to simplify the procedure for preparing released fibers and to improve the resolution, including the use of DNA “halo” preparations (Wiegant et al., 1992). Additional straightforward releasing methods were soon introduced by different groups, based on the use of alkaline solution, nonionic detergents, or SDS, with or without mechanical force to stretch chromatin or DNA on glass slides (Parra and Windle, 1993; Fidlerova et al., 1994; Haaf and Ward, 1994; Heiskanen et al., 1994; Bensimon et al., 1994). With its power of direct visualization, fiber FISH has contributed to the characterization of telomere and centromere structure (Heng et al., 1996),

FIGURE 4.8 Detection of free chromatin structures by DNA-binding dyes (x900). (A) Two interphase nuclei and two spindle-like free chromatins after staining with DAPI (without any treatment to release chromatin). (B) Elongated free chromatins (released by a special buffer) stained with Giemsa. Reused from Heng et al. (1992). High-resolution mapping of mammalian genes by in situ hybridization to free chromatin. Proc Natl Acad Sci U S A, 89(20), 9509–9513. https://doi.org/10.1073/pnas.89.20.9509.


FIGURE 4.9 Visualization of free chromatin by fluorescence in situ hybridization (FISH) (x700). (A) The chromosome 7-specific somatic hybrid cell line 4AF/102 metaphase spread stained with DAPI to show the hamster and human chromosomes. (B) DAPI staining of free chromatin from the same cell line to show the total hamster and human DNA content. (C) The same metaphase preparation as in A with FISH detection showing hybridization of human chromosome 7. (D) Human chromosome 7 visualized as a long fiber in the free chromatin after FISH and FITC detection. Reused from Heng et al. (1992). High-resolution mapping of mammalian genes by in situ hybridization to free chromatin. Proc Natl Acad Sci U S A, 89(20), 9509–9513. https://doi.org/10.1073/pnas.89.20.9509.

meiotic chromosome loops, and has validated the concept of the CNV (Iafrate et al., 2004). Even now, fiber FISH plays an effective role in revealing complex genomic rearrangements (Ye and Heng, 2017). Despite its important contribution to physical mapping and the Human Genome Project (Heng et al., 1992), the biological significance of free chromatin itself is not clear even today because of the lack of follow-up studies. These structures can be related to hypotonic conditions, cell


cycle stages, and drug treatment. For example, some cell lines display higher free chromatin frequencies, and there is a clear dose-response relationship between free chromatin and many chemotherapeutics. It is possible that the elevated frequencies are related to nuclear envelope instability and overall genome instability. It has also been suggested that free chromatin could be used to monitor toxicity (Heng et al., 1988; Heng and Zhao, 1987; Heng and Shi, 1997).

4.4.2.2 Defective Mitotic Figures

DMFs (Fig. 4.10) were accidentally discovered during the development of high-resolution banding methods for frog chromosomes (Heng et al., 1987a). DMFs were initially named “uncompleted-packing-mitotic figures,” based on the coexistence of condensed chromosomes and undercondensed chromatin fibers within one mitotic figure (Heng and Chen, 1985; Heng et al., 1987b, 1988). DMFs represent an ideal window to study the pattern of high-order chromosomal packaging because of the transitional structures that connect the condensed chromosomal regions and undercondensed regions. Unfortunately, there have been very few publications on this issue since our initial report over 30 years ago. A major challenge was the difficulty of reconciling the generally accepted idea that there is a scaffold within metaphase chromosomes with the formation of DMFs (Laemmli UK, pers. commun.; Gall J, pers. commun.). Another challenge was the inability to induce DMFs at high frequencies, which delayed the effort to study the mechanism of DMF formation. Nevertheless, it was demonstrated that the possible molecular mechanism of DMFs is a combination of a condensation defect and a G2-M checkpoint defect (which solved an early mystery of why the use of topoisomerase II inhibitors to induce DMFs only works on certain cell lines).
Moreover, DMFs are commonly detected in various cancer cell lines and patient samples, strongly suggesting that defects in the chromosomal condensation process contribute to cancer evolution (Heng, unpublished data). The issue of condensation defects has been addressed by two additional groups. For example, treating cells with 5-Aza-dC can induce heterochromatin undercondensation (Haaf and Schmid, 1989). A phenomenon similar to DMFs has also been reported in replication delay and condensation delay (Smith et al., 2001). Depending on the specific phase of the cell cycle, many factors and mechanisms, including replication errors, can contribute to DMF formation. DMFs also serve as a platform to study higher-order chromosome structure. Transition regions between a folded chromosome and unfolded chromatin fiber are keys to understanding the details of final folding (Fig. 4.15). DMFs also represent a good phenotype for studying the


FIGURE 4.10 Examples of various types of defective mitotic figures (DMFs). (A–E) Typical DMFs detected from treated blood cultures of human (A–D) and frog (E), respectively. (A, C) Spectral karyotyping (SKY) images. (B, D) Corresponding reverse DAPI images. The arrows indicate the uncondensed chromatin regions. (E) DAPI image of a frog DMF, where both the condensed chromosomes and uncondensed chromatin fiber are clearly illustrated. In these DMFs, the condensed mitotic chromosomes are distributed at one end, which is the main form of DMF. (F) DMF with an atypical pattern of distribution but the differential condensation among different chromosomes is evident. One normal, condensed chromosome is indicated by an arrow. (G) New type of DMF displaying diffused chromosomes detected in a patient with chronic fatigue syndrome. As indicated by the arrows, some chromosomes seem to be decondensed. For all types of DMFs, the common key feature is the differential condensation among chromosomes. Reused from Heng, H.H., Liu, G., Stevens, J.B., et al. (2013b). Karyotype heterogeneity and unclassified chromosomal abnormalities. Cytogenetic and Genome Research, 139(3), 144–157. https://doi.org/10.1159/000348682.

condensation process and can serve as an index to monitor errors of final condensation in human diseases. Accordingly, the following interesting questions can now be addressed by studying DMFs. Is there a condensation order among chromosomes (similar to a replication order)? Using specific cancer cell lines, it seems that some chromosomes have higher than expected frequencies of DMFs, indicating that these chromosomes may condense later than others. In a tested prostate cancer cell line, a single copy of chromosome 1 (uncondensed) is often detected among a few condensed chromosomes. FISH painting


confirmed that the single copy of chromosome 1 for this particular cell line is among the last to condense (unpublished data). What is the relationship between DMFs and other types of chromosomal aberrations? DMFs can be co-detected with C-Frag-mediated mitotic death, sticky chromosomes, and aneuploidy. Can the DMF phenotype be linked to different diseases? In addition to being linked to various cancers (for example, elevated frequencies of DMFs can be induced by topoisomerase II inhibitors in cell lines with G2 checkpoint deficiency), elevated frequencies of DMFs can also be found in some Gulf War illness (GWI) patients (Liu et al., 2018). Further studies are needed to determine whether DMFs are linked with other diseases.

4.4.2.3 Chromosome Fragmentations

C-Frag represents the major form of mitotic cell death where condensed chromosomes are progressively degraded (Heng et al., 2004c; Stevens et al., 2007, 2011a, 2011b, 2013b; Stevens and Heng, 2013). C-Frag has been observed for several decades, but it is often confused with chromosome pulverization (also known as premature chromosome condensation) (Stevens et al., 2010). Unlike apoptosis and mitotic catastrophe, C-Frag is linked to diverse types of cellular stress (e.g., gene mutations, endoplasmic reticulum stress, infection, drug treatment, and centrosome dysfunction). C-Frag therefore represents a general response to system stress (Stevens et al., 2010, 2011a-b, 2013b, 2013c). C-Frag can be classified into different types: early-stage C-Frag, late-stage C-Frag, or a mixture (Fig. 4.11). C-Frags can be co-detected with and linked to various chromosomal aberrations. As illustrated in Fig. 4.12, sometimes only one or a few chromosomes will be eliminated through C-Frag, which will likely lead to the generation of aneuploidy. Moreover, the fragmentation seems tightly linked to the formation (triggering?) of genome chaos (Stevens et al., 2007, 2013b; Heng, 2015).
An important message is that cell death is an unpredictable factor in cancer evolution (Stevens et al., 2013b). There is a complicated relationship between cell death heterogeneity and macrocellular evolution. In general, triggering the death of cancer cells is good for the host, but C-Frag-mediated genome reorganization can in fact speed up cancer cell evolution, leading to the development of drug resistance. Further discussion can be found in Heng (2015).

4.4.2.4 Unit Fibers

Bak et al. (1979) described “unit fibers” as substructures of metaphase chromosomes. These “unit fibers” display a constant diameter of about 0.4 μm, which is approximately fivefold less than the final condensed chromatids in metaphase chromosomes. Using measurements by Bak


FIGURE 4.11 Examples of chromosome fragmentations (C-Frags). (A) In early-stage C-Frag, many individual chromosomes are normally condensed. (B) In late-stage C-Frag, most of the chromosomes are fragmented. In addition, C-Frag can occur in both early (C) and late mitotic figures (D). (Reused from Heng et al. (2013b). Karyotype heterogeneity and unclassified chromosomal abnormalities. Cytogenetic and Genome Research, 139(3), 144–157. https://doi.org/10.1159/000348682).

FIGURE 4.12 Spectral karyotyping images (right) and inverted DAPI (left) show that single chromosomes can be eliminated via chromosome fragmentation (C-Frag) (arrow), indicating a potential relationship between C-Frag of individual chromosomes and aneuploidy. (Reused from Stevens et al. (2007). Mitotic cell death by chromosome fragmentation. Cancer Research, 67(16), 7686–7694. https://doi.org/10.1158/0008-5472.can-07-0472).


FIGURE 4.13 Example of unit fibers. These Giemsa-stained unit fibers were prepared from a frog chromosome culture treated with drug (topoisomerase II inhibitor) (2 h) before slide preparation. There are a few interphase nuclei. The bundle of all unit fibers comes from one cell. Note that sister unit fibers exist in parallel. The diameter of these unit fibers is approximately 0.2 μm. (Reused from Heng et al. (1987b). Structure of the chromosome and its formation. II. Studies on the sister unit fibers. The Nucleus, 30, 2–9).

et al., with one additional run of helix, the unit fibers would form a metaphase chromosome. However, the unit fiber model lost its influence, partially because of the popularity of the scaffold model. Surprisingly, similar structures were induced by topoisomerase II inhibition in short-term lymphocyte cultures from various species (Heng and Chen, 1985). Frog chromosomes in particular produce unit fibers with clear morphology (Fig. 4.13). Furthermore, drug-induced unit fibers differ from Bak’s unit fibers isolated from metaphase chromosomes in that the former produce two parallel fibers named “sister unit fibers” and the latter display a single unit fiber (Fig. 4.14). The explanation is that the isolation procedure separates the sister chromatids of the metaphase chromosomes before decondensation into unit fibers. The existence of the sister unit fibers strongly suggests that the metaphase chromosomes are packaged by multiple levels of coiling organization, in which the unit fiber is the substructure. The relationship between unit fibers and fully condensed chromosomes can be clearly observed in Fig. 4.15, where the transitional regions can be used to illustrate how the last step of chromosome condensation is achieved (by another run of helix or fold without coil). Nevertheless, it is about time to reinvestigate this model, especially using high-resolution electron microscopy.

4.4.2.5 Sticky Chromosomes

Sticky chromosome refers to a phenotype where chromosomes stick to each other and are entangled with chromatin fibers following a standard protocol of chromosome preparation. When chromosomes are sticky,


FIGURE 4.14 Detailed view of the sister unit fibers. There are a total of four sister unit fibers tangled with each other, one of which displays a higher degree of condensation. There is an interphase nucleus in the top right corner. The chromatin and interphase nucleus were stained with Giemsa. The sister unit fibers are indicated by arrows. Reused from Heng et al. (1987b). Structure of the chromosome and its formation. II. Studies on the sister unit fibers. The Nucleus, 30, 2–9.

FIGURE 4.15 An image of a defective mitotic figure (DMF) induced from cultured frog cells. Condensed chromosomes are seen in the top left corner. Other chromosomes have not properly condensed. Some chromosomes display physical connections between condensed chromosomes and unit fibers. The left corner multiple-coiled model was adapted to explain the multiple-coiled feature illustrated by this DMF. Two interphase nuclei are in the middle of the figure. Reused from Heng et al. (1988). Structure of the chromosome and its formation. II. Studies on the sister unit fibers. The Nucleus, 30, 2–9.


FIGURE 4.16 Images of sticky chromosomes. (A) A portion of the mitotic figure displays sticky chromosomes where multiple sticky chromosomes form a cluster (as indicated by the arrows) (Giemsa staining); (B) a comparison between nonsticky chromosomes (top right) and sticky chromosomes (indicated by an arrow), which is different from 2A as the sticky chromosome cluster likely belongs to a different mitotic figure; (C, D) sticky chromosomes are detected across the entire mitotic figure. In picture D, many sticky chromosomes have fused together. Reused from Liu et al. (2018). Detecting chromosome condensation defects in Gulf War illness patients. Current Genomics, 19, 200–206.

chromatin fibers are often visible among chromosomes. Sticky chromosomes also display a fuzzy morphology (Fig. 4.16). The mechanism of this phenotype is less clear. It can be observed at high frequencies after treatment with drugs such as ethidium bromide (EB). Some cancer samples that display high frequencies of DMFs also display an increased frequency of sticky chromosomes. Prolonged hypotonic treatment or overdenaturation of the chromosome slide during FISH detection can promote the sticky phenomenon. There also seems to be a correlation between sticky chromosomes and difficulties in preparing suitable mitotic figure spreads. While sticky chromosomes have traditionally been considered products of failed chromosomal slide preparation, they have gradually gained acceptance as a genuine phenotype associated with defects in DNA replication, chromosomal condensation, and methylation. For example, alteration of DNA methylation can prevent the synchronization of chromatin compaction, leading to improper condensation (Flagiello et al., 2002). DMFs have also been detected with sticky chromosomes (Heng et al., 2013b; Liu et al., 2018). Interestingly, the chromosomes of some normal individuals seem to be more sensitive than those of others, showing an increased frequency of sticky chromosomes when exposed to prolonged hypotonic treatment. Similarly, some cancer cell lines display sticky chromosomes extensively under normal conditions (for example, without excessive hypotonic treatment) (Anja Weise and Thomas Liehr, personal communication). Live cell imaging has revealed that these cells have high


plasticity with respect to the gain or loss of genetic material during cell division. This important observation highlights the common phenomenon of fuzzy inheritance. In addition, a specific chromosome surface protein has recently been identified, which may lessen the stickiness of chromatin to prevent chromosomes from tangling together (Cuylen et al., 2016). Specifically, Ki-67 protein forms a steric and electrostatic charge barrier that keeps chromatin on different chromosomes from interweaving. Furthermore, induced genome chaos is also associated with increased frequencies of sticky nuclei (Heng et al., 2013b; Ye et al., unpublished observations). Clearly, many molecular mechanisms can lead to the same phenotype of sticky chromosomes. Equally important, the same phenomenon can be involved in different types of disease or illness. For example, elevated frequencies of sticky chromosomes have also been reported in GWI patients (Liu et al., 2018). Sticky chromosomes are a common phenomenon in studies of plant hybrids. It is known that chromosome stickiness impairs meiosis (e.g., sticky chromosomes lead to C-Frag, resulting in sterility) and influences reproductive success (Sosnikhina et al., 2003; Pessim et al., 2015). As discussed in Section 4.3 and Chapter 6, Section 6.6.3, it is also linked to chromosomal variation as a potential for speciation. Publications on this subject in human cells often involve irradiation or chemical treatment (Al Achkar et al., 1989; Redpath et al., 2003). There seems to be a link between sticky chromosomes and condensation defects, as well as chromosome instability in general (Heng et al., 1988, 2013a, b).

4.4.2.6 Genome Chaos

As mentioned in Chapter 3, massive chaotic chromosomes can be induced by various drug treatments. In the 1980s, topoisomerase II inhibitors were used to treat a human–mouse hybrid cell line, which resulted in very long chromosomes (Fig. 4.17).
These long chromosomes likely resulted from multiple chromosome fusions. However, at that time, without the framework of chromosomal coding, few appreciated these types of findings. The establishment of a new conceptual framework and of protocols to induce chaotic genomes has opened the door to the systematic characterization of chaotic genomes. A few considerations need to be kept in mind when designing new experiments:
1. It would be challenging to classify each subtype of chaotic genomes, as one can find almost anything under a chaotic process. In different mitotic figures, there are high degrees of chromatin sticking together (Fig. 4.18); hundreds of chromosomes within one cell (Fig. 4.20); and even an entire karyotype consisting of a single giant chromosome (Fig. 4.19)!


FIGURE 4.17 Image of drug-induced long chromosomes (indicated by an arrow), one example of a chaotic genome observed in the 1980s.

FIGURE 4.18 Image of metaphase chromosomes connected by sticky chromatin fibers (Reused from Liu et al. (2014). Genome chaos: Survival strategy during crisis. Cell Cycle, 13, 528–537).

2. Many of these highly complex chaotic genomes represent transitional karyotypes, which will likely disappear in stable populations.
3. Many chaotic genomes represent both the “consequence” of genome instability and the “cause” of further instability.
4. It would be interesting to study the relationship between genome chaos and early embryonic development.
5. It is anticipated that a certain degree of genome chaos can be detected in normal individuals as well. Recent reports suggest that


FIGURE 4.19 Images of abnormal separation and condensation of sister chromatids (left) and, in extreme cases, organization of the entire genome within one giant chromosome (right). The scale is 10 μm. (Reused from Liu et al. (2014). Genome chaos: Survival strategy during crisis. Cell Cycle, 13, 528–537).

FIGURE 4.20 Nonclassified spectral karyotype (left) and DAPI (right) images of cells containing over 700 chromosomes from human (A) and over 400 chromosomes from mice (B). (Reused from Liu et al. (2014). Genome chaos: Survival strategy during crisis. Cell Cycle, 13, 528–537).


chromothripsis in healthy individuals may contribute to genome abnormalities in offspring (DePagter et al., 2015; Pellestor and Gatinois, 2018). Future research is needed on this topic.

4.4.2.7 Micronuclei Cluster

Micronuclei (MN) are the small nuclei that form when chromosomes or chromosome fragments separate from the daughter nuclei during cell division. Because MN are linked to DNA damage, MN assays have become a popular approach for studying toxicological effects (Fenech et al., 2003). Interestingly, MN induced in peripheral lymphocyte cultures are somewhat different from MN found in various cancer cells. In cancer cells, MN tend to exhibit highly diverse morphology, with a range of sizes, and often form MN clusters, especially when the cellular population is highly unstable. Fig. 4.21 shows some examples from unstable cancer cells. It is likely that some of these MN retain the capability to form chromosomes. In cancer cytogenetic analyses, researchers can observe clusters of mitotic figures. By comparison with MN clusters, it is likely that these mitotic figure clusters are derived from MN clusters (Fig. 4.22). Given that, during the genome chaos process, many MN clusters form as the result of fusion/division cycles, these MN must be functional in terms of anticipating emergence or serving as transitional structures (Ye et al., 2019). Interestingly, judged by chromosome morphology, many of

FIGURE 4.21 Images of chromosomes and micronuclei (MN) cluster. Left panel: Five different cells are shown. A mitotic figure can be seen in the right corner, evidenced by condensed chromosomes. Directly to the left of the mitotic cell is an interphase nucleus. An additional interphase cell can be seen in the top left corner. Two clusters of MN are shown in the middle. Right panel: A large cluster of MN. Within this cluster are dozens of MN, ranging in size from small to large, exhibiting the different morphologies that can exist.


FIGURE 4.22 The mitotic figure/interphase nuclei cluster. The circles are drawn to help the reader separate chromosomes from different “microcells.”

these MN do not appear to be in the same cell cycle phase. In addition to the interphase nuclei, some metaphase chromosomes enter the condensation stage a bit earlier than others.

4.4.2.8 Unclassified Chromosomal or Nuclear Abnormalities/Variations

There are many more unclassified chromosomal or nuclear variations (Fig. 4.23). As they all reflect cell population instability, characterization of these structures is needed. More details can be found in previous publications (Heng et al., 2013b; Heng, 2015).

4.4.2.9 Unification of the Different Types of Chromosomal Aberrations

Despite their large morphological differences, different chromosomal and nuclear variations can be unified under the following principles. First, many different types of chromosomal aberrations can be directly linked to different stages of the cell cycle (Fig. 4.24). These links can explain why many different types of chromosomal aberrations are associated with each other. For example, nearly all phases of the cell cycle (DNA replication, condensation, and segregation) can be linked to DMFs, C-Frag, and aneuploidy. Disruption of each stage can induce genome chaos. Because each type of chromosomal aberration can be linked to a large number of genes and environmental stresses, trying to link specific genes to specific types of chromosomal aberrations has limited clinical value (see the evolutionary mechanism of cancer). Second, most of these ignored and unclassified chromosomal/nuclear aberrations are categorized as NCCAs. Because genome instability unifies


FIGURE 4.23 Unclassified chromosomal or nuclear abnormalities/variations. (A) DAPI image of a mouse nucleus with “holes” following BrdU treatment. The holes are indicated by the two arrows. The slide was made by routine cytogenetic preparation of mouse bone marrow following BrdU treatment. The bright spots are heterochromatin. (B–E) Irregular shapes of nuclei observed from cancer lines (both spectral karyotyping and reverse DAPI images). (F) Huge interphase nuclei in a human blood cell culture following doxorubicin or ethidium bromide treatment. The large nuclei (indicated by an arrow) are clearly bigger than average size nuclei (the rest of the nuclei). (G–H) Special morphology of nuclei following doxorubicin treatment. Some nuclei show morphology similar to late-stage chromosome fragmentation. (Reused from Heng et al. (2013b). Karyotype heterogeneity and unclassified chromosomal abnormalities. Cytogenetic and Genome Research, 139(3), 144–157. https://doi.org/10.1159/000348682).

FIGURE 4.24 All stages of the chromosomal cycle can be linked to different types of chromosomal aberrations. The key is the formation of variants with different genomes. The repairing process and partial cell death also can form new genomes.


all types of NCCAs, it may be more effective to monitor genome instability through the quantification of NCCAs. Third, the issue of somatic mosaicism plays an increasingly important role in human diseases (Heng, 2013b). NCCAs certainly make this issue more complicated. Mosaicism likely involves emergent properties, and the degree of mosaicism is important. One key question that needs to be addressed is why somatic mosaicism can be tolerated during early development but becomes problematic during cancer evolution. Equally important is how to use somatic mosaicism data to predict disease conditions. Fourth, although there are many different types of chromosomal/nuclear abnormalities (even the same type of abnormality, such as micronuclei clusters, can be generated by very different mechanisms), the biological function of these abnormalities is fundamentally the same: altering the chromosomal coding by changing the functional relationship among genes. Recent micronuclei case studies have led to a model of micronuclei fusion/fission, which forms different chromosomal combinations (Ye et al., 2019). Similar to the model illustrated in Fig. 3.6, all different types of micronuclei clusters can contribute to the alteration of chromosomal coding, including the level of somatic mosaicism. Finally, the systematic characterization of various types of chromosomal/nuclear abnormalities can be useful for future chromosome therapy to correct certain types of abnormalities (Kim, 2014, 2017).

CHAPTER 5

Why Sex? Genome Reinterpretation Dethrones the Queen

5.1 SUMMARY

A widely held belief in biology is that asexual reproduction produces identical genetic clones while genetic recombination during sexual reproduction promotes genetic diversity. Our cancer evolution studies unexpectedly challenge this century-long concept: further comparisons and analyses of data from organismal evolution suggest that sex functions as a constraint rather than a diversifying factor at the genome level and that sexual and asexual species display drastically different patterns of evolution. In this chapter, the question of “why sex?” is briefly examined using pertinent current theories. Surprisingly, the main function of sexual reproduction is to maintain the chromosomal coding or karyotype to preserve a species’ identity. Interestingly, there is a conflicting relationship between gene dynamics and genome constraints, as is illustrated. Moreover, a computer simulation is performed to support this main function of sex. This journey demonstrates the importance of the new framework of evolution, the crucial contributions of data reinterpretation, the value of scientific collaboration and communication, the importance of addressing rather than ignoring obvious paradoxes, and the necessity of challenging some fundamental assumptions in biology.

Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00005-7


Copyright © 2019 Elsevier Inc. All rights reserved.


5.2 WHAT IS THE PURPOSE OF SEX? THE ANSWER IS NOT OBVIOUS

What is the purpose of sex? When this question is posed, most people laugh. “The answer is obvious!” This includes evolutionary biologists, despite modes of thought that differ drastically from those of the general public. Biologists care about the biological and evolutionary meaning of sex, and this includes explaining why sex is so overridingly important that it can even cost the lives of the participants, such as the male praying mantis, who sometimes literally loses his head over sex, and the male black widow spider (Latrodectus), on whom the female may dine after mating. For nearly a century, sex has been biology’s biggest mystery (Lane, 2009). For many decades, this question has been referred to as the “queen problem” within the evolutionary field. According to Graham Bell:

Sex is the queen of problems in evolutionary biology. Perhaps no other natural phenomenon has aroused so much interest; certainly none has sowed as much confusion. The insights of Darwin and Mendel, which have illuminated so many mysteries, have so far failed to shed more than a dim and wavering light on the central mystery of sexuality, emphasizing its obscurity by its very isolation.
Bell, 1982

Science writer Carl Zimmer stated: “Sex is not only unnecessary, but it ought to be a recipe for evolutionary disaster ... And yet sex reigns ... Why is sex a success, despite all its disadvantages?” (Zimmer, 2010). Nick Lane wrote: “... Just think about the biology. Sex is nuts. Cloning makes much more sense.” (Lane, 2009). Mark Ridley said: “Evolutionary biologists are much teased for their obsession with why sex exists. People like to ask, in an amused way, ‘isn’t it obvious?’ Joking apart, it is far from obvious ... Sex is a puzzle that has not yet been solved; no one knows why it exists” (Ridley, 2001). Michael Brooks listed sex as one of the 13 things that do not make sense, one of the most baffling scientific mysteries of our time (Brooks, 2010). These accounts do not exaggerate. These well-known science writers and editors are cited because they have not been personally responsible for any specific hypothesis and thus can provide a less-biased overall picture than evolutionary scientists, who are often passionate about a specific hypothesis they developed. Ever since the Swiss naturalist Charles Bonnet’s 18th century discovery that females can produce offspring without being fertilized by males (parthenogenesis), the debate on the function of sex in biology has continued to rage (Fournier et al., 2005; Watts et al., 2006). Charles


Darwin’s grandfather, Erasmus Darwin, one of the leading intellectuals of his time, proclaimed that “Sexual reproduction is the chef d’oeuvre, the master-piece of nature” (Darwin, 1800). However, according to Darwin, “We do not even in the least know the final cause of sexuality; why new beings should be produced by the union of the two sexual elements, instead of by a process of parthenogenesis. The whole subject is as yet hidden in darkness” (Darwin, 1862). August Weismann greatly influenced this subject as he believed that the function of sexual reproduction was to mix the genotypes of two individuals. Accordingly, he linked sex to genetic diversity. Interestingly, while he assumed that the production of diverse offspring contributes to evolutionary novelty, he offered a very insightful argument in 1886 that sex decreases large-scale genetic variance, yet increases some other small genetic changes. Unfortunately, by 1891, he had abandoned this brilliant idea and emphasized only the aspect of increasing genetic diversity (Gorelick and Heng, 2011). His subsequent opinions became the basis of many hypotheses on the function of sex, as exemplified by viewpoints/ models discussed by Guenther (1906) and Fisher (1930) and later by Muller (Muller, 1932, 1958), Crow and Kimura (1965), and Maynard Smith (1968, 1978). For over a century, scientists have tried to come to grips with this important question. According to Ernst Mayr, “Since 1880 the evolutionists have argued over the selective advantage of sexual reproduction” but “so far, no clear-cut winner has emerged from this controversy” (Mayr, 2001). For the past three decades, inspired by a number of popular hypotheses, this “queen” question has remained one of the biggest mysteries in biology. No hypothesis can reconcile the facts and truly explain why sex exists. 
That is the reason why every few years the headlines in the top scientific journals declare that the “why sex?” mystery has been solved once and for all, again! Some believe the “why sex?” question has been solved, despite differences in detail between current models. Others think that researchers still have no clue regarding this important issue. Even leading evolutionary scholars (and most textbooks) who consider this question basically settled are not so sure, deep down. For example, Ridley once wrote: “I asked John Maynard Smith, one of the first people to pose the question ‘Why sex?’, whether he still thought some new explanation was needed. ‘No. We have the answers. We cannot agree on them, that is all’” (Ridley, 1994). However, his intellectual conscience left him worried. In his book, Maynard Smith concluded that “One is left with the feeling that some essential feature of the situation is being overlooked”. It would take yet another 30 years to finally identify the essential but overlooked feature of “why sex?”. First, there are some obvious disadvantages of sex that together explain why sex seems to be counterproductive for evolution:


1. Cloning (asexual reproduction) produces far more offspring than sexual reproduction. This has been referred to as the “twofold cost of sex.”
2. Cloning preserves the beneficial gene combinations that have already been tested against the environment. Sex through meiosis, by contrast, randomly mixes genes into new and untested combinations (the cost of recombination).
3. Cloning maintains all parental genes. There is no drastic diluting effect across generations. In sexual reproduction, however, a grandchild inherits only one-fourth of its grandmother’s genes. This represents a big disappointment for gene-centric concepts.
4. Finding a mate and fighting off competitors is inconvenient, costly in terms of time and energy (ornaments, courtship, and mating behavior) and sometimes can be an issue of probability. Of course, the possibility of sacrificing your life to mate would also seem to be an extreme disadvantage.
5. There is also a risk of contracting infectious diseases.

As sex is a common and predominant phenomenon, those who consider sexual reproduction important point out that the advantages of sex must obviously outweigh these disadvantages. There have been extensive efforts to search for advantages for sex at multiple levels, including at the population, individual, and gene levels. Most of these arguments are based on neo-Darwinism and, in particular, on “good” or “bad” gene (genetic parts) effects that determine genotype. Maynard Smith hypothesized in 1978 that at the population level, sexual populations can evolve more rapidly to create novel genome types by combining mutations from two different individuals in one organism.
This hypothesis was made despite the fact that sexual populations typically grow slower than asexual populations because of the “twofold cost of sex.” For example, when two advantageous alleles A and B occur at random within a population, sexual reproduction would permit the combination of these two alleles in the same individual in a much shorter time than if the reproduction were asexual since the two alleles would have to arise independently because of clonal interference (Maynard Smith, 1978). This model represents one of the most accepted explanations of the advantages of sex within the population point of view. In addition to creating novel genotypes, sex also should reduce the frequency of less-fit genotypes within a population as most mutations are believed to be harmful. In an asexual population, because alleles are passed to all descendants, populations are unable to purge an accumulation of mutations. This process is referred to as Muller’s ratchet. In contrast, in sexual populations, mutation-free genotypes can be restored by recombination of genotypes containing deleterious mutations.
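
The ratchet described above can be illustrated with a small simulation. The following is a minimal sketch, not the simulation referred to in this chapter: it assumes a Wright-Fisher-style asexual population of fixed size in which each offspring copies its parent's deleterious mutations and adds a Poisson-distributed number of new ones. The function names (`poisson`, `mullers_ratchet`) and all parameter values (population size, mutation rate, selection coefficient) are illustrative assumptions, not values from the text.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mullers_ratchet(pop_size=200, generations=300, mutation_rate=0.3,
                    selection=0.02, seed=1):
    """Return the minimum mutation count in the population, per generation.

    Assumptions (hypothetical, for illustration only): asexual reproduction,
    no back mutation, no recombination, multiplicative fitness (1 - s)^k.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size  # every individual starts mutation-free
    least_loaded = []
    for _ in range(generations):
        # Fitter (less loaded) individuals are more likely to be parents.
        weights = [(1.0 - selection) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Offspring inherit every parental mutation and may gain new ones.
        loads = [k + poisson(rng, mutation_rate) for k in parents]
        least_loaded.append(min(loads))
    return least_loaded


if __name__ == "__main__":
    history = mullers_ratchet()
    print("least-loaded class after first generation:", history[0])
    print("least-loaded class after last generation: ", history[-1])
```

Because this sketch has neither recombination nor back mutation, the least-loaded class can never decrease from one generation to the next; it can only stay the same or click upward, which is the ratchet. Modeling the sexual case would require adding recombination between genotypes, which is deliberately left out here.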


Alexey Kondrashov’s deterministic mutation hypothesis suggested another way of removing deleterious genes (Kondrashov, 1988). As most mutations are only slightly deleterious and the population will tend to be composed of individuals with a small number of mutations, sex would recombine these genotypes, generating some individuals with fewer deleterious mutations and others with more. Because there is a major selective disadvantage to individuals with more mutations, these individuals should die out, expunging the mutations from the population as a result. At the individual level, the Red Queen hypothesis is one of the most widely accepted propositions that tries to explain the existence of sex in that it helps organisms protect against parasites (Van Valen, 1973). As parasites evolve quickly, the host must also evolve to maintain an edge to fight disease. This idea became popular during the Cold War when the arms race between the United States and the Soviet Union was at the forefront of daily life. Furthermore, the arms race metaphor is easy to understand and can even be tested in artificial linear conditions. The tangled bank hypothesis suggested by Graham Bell emphasizes that sexual reproduction generates diversity (Bell, 1982). These slightly different diversities are essential for evolution for the organism to exist in an unstable environment with many physical niches. Specific clones are suitable for a given niche. As Zimmer described it, “a clone specialized for one niche can give birth only to offspring that can also handle the same niche. But sex shuffles the genetic deck and deals the offspring different hands. It’s basically spreading out progeny so that they’re using different resources” (Zimmer, 2000). George C. Williams’ lottery principle presents a similar idea of sexual reproduction, where sex produces increased diversity despite its twofold cost (Williams, 1975). 
Williams thought that sexual reproduction produces genetic diversity, which allows genes to survive in changing or new environments. He used the lottery analogy to explain his idea: essentially, breeding asexually would be like buying a large number of lottery tickets with the same number printed on each ticket. Sexual reproduction, in contrast, would be equivalent to buying a small number of tickets, but each with a different number. In a sense, even though sexual reproduction produces fewer individuals (tickets), collectively these different individuals would have much better odds of winning the game of evolution. There are many other hypotheses, including Bill Shields’s ecological viewpoint that sex reduces genetic variation (Shield, 1982, 1988) and meiosis maintains ploidy and corrects DNA damage (Cavalier-Smith, 2002; Gorelick and Heng, 2011). Among them, the DNA repair hypothesis proposed by Bernstein, Hopf, and Michod (1989) has received a great deal of attention (Bernstein et al., 1989). They argued that “the lack of ageing of the germ line results mainly from repair of the genetic material by meiotic


recombination during the formation of germ cells. Thus our basic hypothesis is that the primary function of sex is to repair the genetic material of the germ line.” Putting it all together, scientists have proposed about two dozen different hypotheses to explain the “paradox of sex.” All hypotheses can be divided into two categories (good or bad): those in which sex increases the beneficial effects of evolution and those in which it reduces the detrimental effects. The beneficial effects give the organism the ability to adapt faster and have greater diversity in a changing environment. The detrimental effects include slightly harmful mutations or poorly performing genetic combinations, as well as parasites. What do these many hypotheses have in common? All can explain some cases, but none is universally applicable to the majority of cases in nature. Most are based on the gene-centric notion that mixing genes gives rise to genetic diversity, the key for evolutionary success either through the production of diverse offspring directly favored by selection or through the rapid creation of novel genotypes. Their assumption is that asexual reproduction results in genetically identical clones while sexual species are highly diverse because of gene mixing. However, they all fail to explain some important paradoxes within the field. For example, if sex increases genetic diversity, sexual species should possess high adaptability under extremely changeable natural conditions. The logical prediction is that sexual species should be found in harsher environments, the opposite of what is found in the real world. In yet another example of theory that does not transpose to nature, why is there a prevalence of asexual reproduction in harsh, unstable environments? Most asexual species are near the top of the evolutionary tree while sexual species tend to populate the tree branches.
If asexual species display identical genomes as previously assumed, these unaltered species should be detected in the main branches because many of the asexual species appeared much earlier than most sexual species. As pointed out by Otto (2008):

Even though asexual lineages do arise, they rarely persist for long periods of evolutionary time. Among flowering plants, for example, predominantly asexual lineages have arisen over 300 times, yet none of these lineages is very old. Furthermore, many species can reproduce both sexually and asexually, without the frequency of asexuality increasing and eliminating sexual reproduction altogether.

If a higher level of genetic diversity is important to evolution, why do sexual populations display slower evolution (sexual populations do not adapt, or do so at a much slower rate than expected) (Futuyma, 2010; Gorelick and Heng, 2011)?


If the main function of meiosis is to mix genes, then what is the purpose of sex in species capable of self-sex, where genetic mixing would not change the degree of genetic variation (Gorelick and Carpinone, 2009)? With so many factors that can be linked to sex, why is it so difficult to identify a central common mechanism? To be worthwhile, the advantages of sexual reproduction must outweigh its many disadvantages. Yet when most hypotheses are tested experimentally or by mathematical modeling, the benefits of sex are often much smaller than expected, and the results frequently conflict. In particular, a review of the history of evolutionary studies on sex suggests that every hypothesis has some explanatory power but also obvious limitations, and works best under very specific conditions that are hard to find in nature. Models become popular for a while and then lose their attraction. The ways in which accepted theories illustrate the advantages are often extremely complicated, counter to the general belief that the important principles of nature are often very simple. Occam’s razor, if you will. This situation is comparable to current cancer research, where so many cancer genes have been identified as causative factors, yet no common molecular mechanism has emerged. Clearly, something big must have been missed, and Maynard Smith’s skepticism is well founded. But which essential features have been overlooked?

5.3 SURPRISE! ASEXUAL REPRODUCTION DOES NOT GENERATE CLONAL PROGENITORS!

The common key assumption for almost all hypotheses regarding “why sex?” is that asexual reproduction produces identical clones, virtual xerox copies with identical genomes, as described in many textbooks. If this assumption is not accurate, then all time-honored hypotheses based on it need to be reconsidered. Unfortunately, no one questions this “fact” or bothers to examine the validity of this key assumption, because it seems obvious that genotypes can only be mixed through meiosis and that asexual species must therefore have identical genomes. Our cancer research unintentionally and surprisingly indicated that this century-long assumption is incorrect (Heng et al., 2004c; Heng, 2007b). Asexual reproduction can produce both clonal and drastically different nonclonal progenies, depending on the phase of evolution, which is influenced by the environment! Extended analyses of the literature have further revealed that, contrary to what was expected, sexual reproduction actually (and typically) produces similar genomes! This exciting discovery became apparent from the in vitro immortalization model previously described in Chapter 3. As illustrated by


Fig. 3.4, within the punctuated evolution phase, it is hard to identify common clones because most cells display different karyotypes, even though somatic cells are an asexual reproductive system. This experiment, meant to “watch somatic cell evolution in action,” is particularly interesting here, as the initial cell population was clonal with normal karyotypes. The significantly altered karyotypes must have been generated from the normal clonal cells, demonstrating that drastically nonclonal products can be generated through an asexual process much more rapidly than through sexual reproduction. Later, we traced different clones generated by single-cell cultures. The cloning procedure cannot generate pure clones at all when the system is unstable. Clearly, the generation of drastically altered genomes by asexual reproduction is a common phenomenon in cancer transformation in patients, which almost always depends on karyotype changes through a series of nonclonal events. In fact, genome alteration driven by transient nonclonal events is the most dominant feature of cancer. Although it is now obvious, this important message has not been seriously considered by those who claim that asexual reproduction produces clonal progenitors (it is possible that evolutionary biologists do not consider the cancer process a good model for understanding the mechanisms of evolution because of a lack of understanding of cancer evolution). Recent findings and later analyses have demonstrated compellingly that somatic cell reproduction involves large-scale genetic diversity and that asexual organismal reproduction in general is probably not clonal. Furthermore, they have also suggested that the concept of clonality should be considered from a different genomic point of view (see Chapter 3), as clonality in fact differs drastically between a gene-based and a genome-based point of view (Heng et al., 2006c; Horne et al., 2015).
For example, the clonality of a given gene can be found not only in cells with the same karyotype (clonal) but also in nonclonal cells with altered karyotypes. On the other hand, cells with the same karyotype (clonal architecture) can share the same gene mutation (clonal parts) or carry different gene mutations (nonclonal parts). Knowing that somatic cell reproduction is not a simple clonal process, the next question to consider is clonality in asexual and sexual organisms. In light of the gene-centric (parts-oriented) view and the genome-centric (system-oriented) view, the situation in these systems could be drastically different as well. The two-phase model of cancer evolution has been used as a framework to connect the dots (Fig. 3.4). One of the key messages from Fig. 3.4 is that the pattern of punctuated evolution can switch into gradual/stepwise evolution and that the switching force is selection based on system survivability and stability. As the evolutionary pattern of bacteria and viruses (asexual species) is known to represent punctuated evolution

5.3 SURPRISE! ASEXUAL REPRODUCTION


(Nichol et al., 1993; Heng, 2007b), while the evolutionary pattern among mammals (sexual species) is largely traceable in a stepwise fashion by following chromosomal rearrangements, the relationship between asexual- and sexual-mediated evolutionary patterns might be explained by this two-phase model. Specifically, system stability might separate asexual reproduction from sexual reproduction. Because asexual and sexual reproduction display different evolutionary patterns, they are placed into the punctuated and stepwise phases, respectively, to parallel organismal evolution with cellular evolution (Fig. 5.1). The relationship between asexual and sexual reproduction can now be referred to as a phase

[Figure 5.1 panel labels. Cancer evolution: unstable genome (high level of NCCAs, high genome diversity; punctuated pattern), separated by system stability from stable genome (high level of CCAs, low genome diversity; stepwise pattern). Organismal evolution: asexual reproduction (high genome diversity), separated by the sexual filter from sexual reproduction (low genome diversity). Assumptions questioned: Asexual = identical genome? Sexual = diverse genome?]

FIGURE 5.1 Comparison of somatic and organismal evolution. Unstable cancer genomes and asexual reproduction display parallel punctuated patterns of evolution. In contrast, stable genomes and sexual reproduction demonstrate comparable stepwise evolutionary patterns. Because the key separation between unstable genomes (preimmortalized cells) and stable genomes (immortalized cells) is system stability attained following the cellular crisis stage (see Fig. 3.4), the key to separating sexual and asexual organisms (the main function of sex) could also be system stability. To support this inference, however, the assumption that asexual organisms contain identical genomes (while sexual organisms are associated with diverse genomes) must be revisited. With this change, it is apparent that sexual reproduction conserves the system rather than contributing to its diversity. Reproduced/modified with permission from Heng, H. H. (2007). Elimination of altered karyotypes by sexual reproduction preserves species identity. Genome, 50(5), 517–524. https://doi.org/10.1139/g07-039.


transition between unstable (or less stable) systems and stable systems, which sheds new light on the “why sex?” issue. If our two-phase model of cellular evolution is correct and is generally applicable to organismal evolution, it is very tempting to conclude that the function of sex in organismal evolution is to provide genome-level stability (reducing diversity), as predicted by our somatic cell evolution model. There is one big challenge, however. If sexual reproduction functions as a “filter” to maintain system stability, then, logically, sexual species should display less genomic heterogeneity than asexual ones. This prediction sharply contrasts with what has long been believed about sexual reproduction. Can we really simply switch the status of sexual and asexual reproduction in terms of their associations with diverse or identical genomes? To many, this idea was unthinkable, even laughable. The notion of genetic diversity by means of sex was well established. It was as if the two-phase model of evolution could not be correct because it would deem the status quo belief that sex promotes genetic diversity dead wrong. How could it be that the function of sex is connected to system flux or stability? And how could the century-long concept of sexual and asexual reproduction be wrong? The new two-phase model, however, has great predictive power regarding cancer evolution, as it explains most features of cancer extremely well and has provided a great deal of insight into the evolutionary dynamics of somatic cells. The following additional reasons further support this new concept: (1) somatic cells and other asexual reproductive lineages, as well as sexual reproductive lineages, are all biological lineages. 
They all should follow the rules of evolution; (2) we are not aware of any important common link between asexual and sexual reproduction except the patterns of evolution established by the two-phase model; (3) we already know that somatic cells can form drastically different nonclonal patterns, which challenges the traditional belief that asexual reproduction produces clonal progeny. To test our model, the only thing left to do was to examine real data to see if, as we predicted, asexual species displayed highly diverse genomes and sexual species displayed identical genomes. Simply put, if this new model is correct, we need to switch the key assumptions about which mode of reproduction reduces and which increases genetic diversity. Ideally, we would like to examine genome-level diversity. Because most available data for asexual species concern gene-level changes, we decided to use gene data instead. The question then became, “Are asexual organisms really clonal with low degrees of genetic diversity?”


Thanks to various genome sequencing projects, the data needed to directly test this assumption have become available. We quickly arrived at our answers simply by studying the current literature. It is now clear that most free-living bacterial species undergo frequent rearrangements of their genomes, leading to a great deal of genetic variation between and within species (Ochman and Davalos, 2006). For example, when nine strains of Escherichia coli were sequenced, they shared only a 55%–60% conserved gene core, illustrating a high level of intraspecies genetic diversity (Konstantinidis and Tiedje, 2005). This suggests that E. coli has a much less precise reproduction mechanism than was previously expected. Some obligate intracellular symbionts such as Buchnera aphidicola are exceptions, as they have very small genomes and high degrees of stability (Tamas et al., 2002). Recent single-cell sequencing analysis also confirmed that high levels of genome dynamics are detectable in bacteria (Ottesen et al., 2006). Interestingly, a well-known challenge in defining bacterial species has also illustrated the high degree of intraspecies genetic diversity (Konstantinidis et al., 2005). Given that a bacterial species is defined as a collection of strains characterized by DNA with at least 70% cross-hybridization (Wayne et al., 1987; 1988), it is obvious that asexual reproduction produces a high level of intraspecies genetic diversity, which explains why the classification of species is so loose. Amazingly, as if this knowledge were not enough, there is also a flood of data from Craig Venter’s global ocean sampling expedition project, which assessed the genetic diversity in marine microbial communities. As mentioned in Chapter 4, the observed genetic diversity is so high that even assembling a “standard” genome for many microorganisms collected in the ocean is difficult. 
Again, this powerfully demonstrates that these asexual species do not simply reproduce themselves in a clonal manner. It should be pointed out that bacterial conjugation and horizontal gene transfer (HGT) between microorganisms should not be considered equivalent to sexual reproduction, as sex in eukaryotes is generally defined by meiosis plus syngamy. Without typical chromosomes, prokaryotes are drastically different from eukaryotes, and the eukaryotic viewpoint should not be imposed on prokaryotes. In addition, considering HGT as a sexual process within asexual microorganisms is confusing. The second question was “Are sexual species much more genetically diverse than asexual species?” The answer was also a resounding “No!” Sequencing data demonstrates that there is drastically less diversity among genomes of individuals in higher sexually reproducing animals (including humans) than in asexual organisms. For example, the sequence similarity among human individuals is 99.9% (Venter et al., 2001). More


importantly, there is a high degree of genetic homogeneity among individuals of every species of higher animals at the karyotype level, as “normal” individuals of a given species share a signature karyotype. Such genetic similarities can be found among closely related species as well, with much higher similarities than in bacteria. The sequence similarity between human and chimpanzee sequences is over 98%; with the exception of one major chromosomal fusion and a few chromosomal inversion events, the karyotypes of both species are very similar (Waterston et al., The Chimpanzee Sequencing and Analysis Consortium, 2005). Humans and mice also share the same essential genes; the proportion of mouse genes with a human homolog is higher than 99% (Waterston et al., Mouse Genome Sequencing Consortium, 2002). Certainly, there are copy number variations within each genome, but these have less impact on the overall genome topology than genome reorganization does. Even in many species that are not closely related, both the chromosomal syntenic relationships and the sequence homology are evident. Thus, it is much easier to establish a phylogenetic tree based on karyotype evolution in mammals. When these data were initially discussed, many were not impressed by the comparison of genetic diversity between humans and E. coli. It was reasoned that comparison with bacteria is improper because of their very short generation times. In addition, there are just too many differences between bacteria and humans. To be more convincing, we sought to compare more similar species with distinctive sexual and asexual capabilities. The rotifer (a microscopic freshwater invertebrate) was selected, as it provides direct and convincing evidence to validate the hypothesis. Known to biologists as an “evolutionary scandal,” the class Bdelloidea procreates entirely by asexual reproduction (females reproduce by apomixis). 
Interestingly, their closest relatives are in the class Monogononta, which can reproduce either sexually or asexually. Comparing species with only asexual capability and species with both sexual and asexual capability should be informative. As expected, the sequence diversity of the four bdelloid species (asexual) showed that for each tested gene, the most similar copies differed at synonymous sites by 36%–73%, while in monogononts (with sexual capability), the difference between comparable alleles was only between 0% and 2.4% (Welch and Meselson, 2000; Welch et al., 2004; Birky, 2004). Clearly, the significantly higher level of genetic variation at both the gene and genome levels is associated with asexual and not sexual organisms. Finally, with the yeast Saccharomyces cerevisiae, the story became clear. Yeast can reproduce both sexually (through meiosis) and asexually (both haploid and diploid yeast cells reproduce asexually by mitosis when daughter cells bud off from the mother cells). During stressful conditions


like starvation, yeast will go through the sexual cycle to produce four haploid spores with two mating types. Conventional wisdom has been that the gene mixing performed by meiosis provides diverse genetic backgrounds so that progeny can better fit a new harsh environment. The asexual–sexual transition in yeast has been considered a good example to explain why sex is needed to increase genetic diversity in the meiotic process. But this reasoning is very problematic, as evolution has no eye for the future. Who knows how long the starvation will last or whether there will be any starvation at all in the future? The better answer runs counter to conventional, gene diversity–promoting wisdom: the asexual–sexual transition provides genome stability when the system is under stress that could potentially alter the genome (an altered genome leads either to death or, with extremely low odds, to the formation of a new species). In other words, yeast enters the sexual cycle to preserve its genome and ensure that each new progeny cell formed will still be the same type of yeast, regardless of the environment. It is known that in yeast, asexual lineages typically develop aneuploidy over time and that the chromosomal changes frequently lead to the loss of the capability to enter meiosis. Eventually, following limited generations, these asexual lineages die off.

5.4 THE SEARCH FOR THE MAIN FUNCTION AND COMMON MECHANISM OF SEX

So what is the underlying basis for the claim that the main function of sex is the maintenance of species stability? Based on the somatic cell evolutionary concept that genome alterations are the driving force in macroevolution, and the genome theory that genome-level alterations are much more dominant than gene-level alterations, it was hypothesized that sexual reproduction provides the necessary system stability by preserving the genome framework (or system coding). To achieve this, genome alterations would have to be prevented or altered genomes would have to be eliminated. Because many gene-based models such as the “DNA repair” and “purifying gene mutation” models have limited power to explain the main function of sex, it is time to look to the genome level for answers. The following is an evaluation of the main steps of sexual reproduction to see what happens to altered genomes at each step. The first step is meiosis. It is known that the pairing of homologous chromosomes is the mechanism that reduces diversity at the chromosome level. When there are significant chromosomal alterations, this pairing process is jeopardized, leading to meiotic failure. Altered genomes are eliminated and they are out of the game. Even within the genome, the chromosomal pairing mechanism in meiosis reduces genetic diversity. It


was suggested that the X chromosome has maintained most ancestral genes during evolution, whereas the Y chromosome has degenerated because of its lack of recombination (Bachtrog, 2003). In fact, the Y chromosome appears to be characterized by the interplay of gain and loss of genes, large genomic regions, and entire chromosomes (Bachtrog, 2006). It is likely that the increased level of diversity in the Y chromosome resulted from escaping pairing regulation and is analogous to asexual reproduction. In addition, the asexual bdelloid chromosomes are heteromorphic, providing further support for the link between genome diversity and reproductive strategy, as heteromorphic chromosomes serve as a hallmark of asexuality (Birky, 2004). Furthermore, even at the gene level, the DNA repair process also reduces diversity by reversing some gene alterations, and mutations in many meiosis-specific genes can be filtered out by meiosis itself (Bernstein et al., 1989; Gorelick and Heng, 2011). The second step is fertilization. It is likely that competition among sperm and the sperm–egg interaction represent another layer of filters that reduce or eliminate altered genomes. During capacitation, one of a large number of sperm will be successful in penetrating the ovum. If the sexual filter exists, one should find an increase in genome alterations in unsuccessful sperm. This is indeed what the literature reports. For example, the number of chromosomal abnormalities is increased in sperm from infertile patients (Calogero et al., 2001), and there is a high rate of aneuploidy detected in miscarriages following in vitro fertilization in general (41%) and intracytoplasmic sperm injection in particular (76%) (Lathi and Milki, 2004). More examples of genome-level fuzzy inheritance detected from sperm and egg can be found in Chapter 4. The third step is development. Drastically altered genomes will often be aborted. 
The clinical finding that a majority of spontaneously aborted early human embryos display chromosomal abnormalities supports this viewpoint. Approximately 50% of early ( corn > wine grapes. However, in reality, based on DNA sequencing data, the order was reversed: wine grapes > corn > humans. Indeed, the level of genetic diversity in wine grapes is an order of magnitude greater than that in humans (Myles et al., 2011). The problem is that people refuse to acknowledge this fact and, more importantly, its implications. Although an overwhelming amount of genomic diversity has been demonstrated in most asexual species by current sequencing efforts, today’s textbooks still firmly claim that asexual species are genetically identical clones without genetic variation, while the main function of sexual reproduction is to increase genetic diversity. It is interesting to quote the statement of Sarah Otto, a leading expert in the field:

Most biologists are comfortable with the idea that sex evolved to provide variability, but mathematical models have proved that this comfort is unwarranted: sex need not increase variability, variability need not be beneficial and evolution need not favour sex, even when it does increase variability and variability is beneficial. Nevertheless, models have shown that there are certain conditions under which higher rates of sex and recombination should evolve. . . . However, the conditions on fitness are rather restrictive and it is unsettling to require that selection fortuitously meets them in the vast majority of eukaryotes in order to explain the ubiquity of sex.

Otto and Lenormand, 2002


Maybe it was still acceptable for textbook authors and editors to ignore the above message in 2002. But sequencing data have forcefully contradicted the traditional assumption that asexual species are genomic clones, and it was published over a decade ago that asexual species display a high level of genomic diversity due to the lack of sex-mediated genome constraint (Heng, 2007b; Gorelick and Heng, 2011), which means it is now educators’ duty to alter textbook information regarding this topic. Perhaps we should not be too critical of textbook authors, as many presumed “facts” continue to shape many researchers’ thinking. In various simulation studies, sex is always assumed to increase genomic diversity; the same assumption underlies many computer and ecological models as well as new “why sex?” ideas. For example, Victor Shcherbakov has promoted the idea that the main advantage of sex is the ability to counteract evolution (Shcherbakov, 2010). Based on the consideration of entropy and discrete entities within the universe, he proposed that biological species as defined by Ernst Mayr are the only possible forms of existence for higher organisms. He would likely appreciate the genome perspective, as it was not clear to him that asexual reproduction is more closely linked to higher levels of diversity than sexual reproduction. Another example is the rationale to test the Red Queen hypothesis based on nongenetic variants from asexual species:

One hypothesis is that parasites keep asexual organisms from getting too plentiful. When an asexual creature reproduces, it makes clones, exact genetic copies of itself. Since each clone has the same genes, each has the same genetic vulnerabilities to parasites. If a parasite emerges that can exploit those vulnerabilities, it can wipe out the whole population. On the other hand, sexual offspring are genetically unique, often with different parasite vulnerabilities.

Parasites may have had role in evolution of sex, ScienceDaily, 2009

These researchers stated that, in theory, when facing parasites, asexual populations will go extinct because they have little or no genetic diversity. Their experiments also support their conclusion. However, if the most basic assumptions used are incorrect, then conclusions based on these assumptions, regardless of how beautiful the experimental system is, will likely be wrong. Meanwhile, today’s researchers will continue to make headlines by “proving” again and again that some individual factors can cause sex under experimental or selected natural conditions and thus validate the existing hypotheses. As discussed in the previous chapters, using a simple linear model that focuses on a specific factor can often artificially illustrate a causative relationship that does not exist in the complex real world. Furthermore, some specific features can be traced in defined experimental conditions (Heng, 2015). However, evolutionary selection

5.5 THE BATTLE IS ON: CHANGING CONCEPTS


is based on the dynamic package rather than any specific feature separated by an isolated test because there are so many variables that it is impossible to link a species’ survival to any specific factor. Particularly in large populations, the sheer number of small local groups, each with their own unique genetic profiles at the gene or epigenetic level as well as distinctive “microenvironments,” makes it impossible for a single factor (such as a specific type of parasite) to be the sole determining factor. As long as some “local group” survives, even if by chance, the surviving few will repopulate themselves using the same genome. Such fuzzy inheritance-contributed dynamics make individual factors insignificant in the long run. To explain this further, one can analyze the current status of searching for genetic factors that cause common genetic conditions using genomewide scanning of large patient populations. Although hundreds and even thousands of genetic markers can be identified based on statistical significance, it was a challenge to identify a common mechanism from the point of view of a specific molecular mechanism. That was the rationale to suggest the use of the evolutionary mechanism of cancer to understand the common mechanism of cancer, rather than study each independent molecular mechanism (Chapter 3; Ye et al., 2009; Heng et al., 2010a). An analogy applies between evolutionary studies and the function of sex. Under specific conditions, many factors including population density, nutrition conditions, dynamic environmental heterogeneity, parasites, seasonal changes, overproduced reactive oxygen species (ROS), increased competition, and short-term growth can all induce or be linked to sex even though they often are illustrated only in specific species. 
The problem with focusing on each specific mechanism is that no common mechanism emerges to answer the question of “why sex?” The emergent current view is that perhaps only a combination of all these factors can explain the existence of sex. Interestingly, however, underneath these diverse individual factors, there is a major evolutionary mechanism that can unify all of them: system stress triggers sex to preserve the identity of the genome under unstable conditions. As initially proposed (Heng, 2007b), many factors that have been linked to sex can and should be considered to be system stress and can likely induce genome instability (it has recently been illustrated that various types of stress can be linked to system instability) (Stevens et al., 2011a; Horne et al., 2014; Heng, 2015). System stress is indeed the only common link in the myriad of recent publications that have associated sex with a specific factor. Such analysis once again resonates with the evolutionary mechanism of cancer that was proposed in Chapter 3, as the cancer question and the sex question are both in fact questions of evolution.


With the acceptance that sex serves as a major constraint to preserve the genome so that a species can exist for a longer period of time, readers can now appreciate more the limitations of most linear models of sex that focus on a single factor. With regard to wave after wave of “convincing” publications that have “proven” the function of sex is linked to a specific factor such as parasites or heterogeneous environments, readers should reexamine each elegantly constructed story, using the following considerations/facts:

• The question of how sex is involved is different from the question of how sex switches within a specific species that can reproduce both sexually and asexually. It also differs from the relationship between various types of sex (both self-sex and biparental sex). The question of more or less sex does not equate to the question of sex or no sex.
• Short-term benefits of sex cannot predict the long-term benefits of sex as the long-term benefits are not equal to the accumulation of short-term benefits.
• A designed linear experimental or mathematical model is limited when trying to explain the natural setting where multiple factors are involved.
• Many diverse factors can be linked to sex, but the primary function of sex is not these diverse factors.

When one examines the myriad of stories conveyed in many publications, their key internal conflicts can be seen. Even papers published by the same group at various time periods or different experimental systems focusing on alternative factors will many times be contradictory.

5.6 SIMULATION: ASK THE SIMPLEST QUESTION ABOUT THE FUNCTION OF SEX

In cooperation with Dr. Hao Ying and others, a simulation study was performed to examine the concept that genome constraint is a main function of sexual reproduction and to further address the issue of whether the genome constraint reduces or promotes biodiversity. Traditional assumptions concerning the function of sex were not used; these include increasing genetic diversity through gene mixing by meiosis, accumulating genetic variations over time to produce biodiversity, and geographic isolation as a key condition of speciation that acts by reducing gene flow. Instead, three basic assumptions were chosen as key for this simulation: (1) sex requires partners with the same or similar genomes, whereas asexual reproduction does not require partners; (2) genome variations can be simply achieved by changing the size (increasing or reducing) (Fig. 5.2A) and topology of the genome (to mimic genome size changes and genome reorganization) (Fig. 5.2B); and (3) the


FIGURE 5.2A Parental genome size (600 units), where each round increases or decreases the size by 20 units.

chance of individuals with “new” genomes mating with individuals of identical genomes is low when the majority of encountered individuals have incompatible “parental genomes,” especially when individuals do not have a large or unlimited number of mating opportunities.

FIGURE 5.2B Topological changes. Translocation occurs between the black and red letters.


The simulation study was based on four distinct stochastic models of fluctuations and alterations in populations through time, measured in generations: (1) asexual reproduction with a consistent rate of genome size changes (both increasing and decreasing), (2) sexual reproduction with a consistent rate of genome size changes, (3) asexual reproduction with chromosomal inversions to represent topological genome changes, and (4) sexual reproduction with the same chromosomal inversions. MATLAB, a widely used engineering software package, was used to simulate a total of 5000 generations. This simulation yields three major conclusions: (1) asexual and sexual reproduction are linked to the promotion and inhibition of genome-level variations, respectively; (2) the rate of genome alteration contributes to genome-level diversity, especially in asexual reproduction; and (3) increasing the chance of specific mating among individuals with the same altered genome can promote biodiversity of a sexual species. Briefly, the simulation results support the concept that sex reduces karyotype (genome-level) diversity. Based on the analyses, it is less likely that the accumulation of gene-level diversity by meiosis will lead to macroevolution. This conclusion is at odds with the popular viewpoint that genetic recombination increases gene and population diversity and that population diversity is a necessary contributing factor to speciation. Although meiosis can increase genetic diversity at the gene level under certain conditions, the key and common role of meiosis is actually genomic preservation via elimination of altered genomes and reduction of the chance of successful mating between individuals sharing altered genomes. These ideas have not previously been rigorously simulated. A correct understanding of the main function of sex requires separating the diversity that occurs at the gene level from the diversity that occurs at the genome level. 
Gene-level diversity contributes to microevolution, whereas genome-level diversity contributes to macroevolution and biodiversity above the species level. Once the common role of the system constraint is acknowledged, the many other secondary functions of sex need to be systematically studied. Perhaps one of the most surprising observations is that the number of new speciation events is remarkably low under the sexual reproduction models. Before the simulation, we predicted that new genomes (species) would form much more often in asexual than in sexual models. However, it was a complete surprise to us that, under sexual reproduction, no new genome became dominant. Because there was no selective pressure in our simulation, such high constraint cannot be explained by differential selective advantages. Based on the random mating model we used and the low rate of genome alteration in a large population, we reasoned that newly formed sexual species may not be able to find a mating partner with the same altered


genome. As a result, these newly introduced genomes will be diluted out of the population, and the original species will remain dominant. We immediately hypothesized that increasing the chance of specific mating among individuals with the same types of altered genomes should significantly increase biodiversity. To test this hypothesis, we introduced a condition to mimic geographical isolation that promotes specific mating among individuals with changed genomes. This was accomplished by allowing all individuals with altered genomes to separate themselves from the parental population (much like moving them to a nearby island). Through this setup, we could essentially create conditions in which every new, emerging species (with its specific altered genome) is able to find its own distinct niche and isolate itself from all other preexisting (parental) species. The simulation data strongly supported our hypothesis, as many new species became visible within the total population (Ying et al., 2018). Such analyses not only resolved the initial difficulty of observing new species from sexual reproduction but also shed important light on speciation. Thus, this simulated situation further suggests that the chance for mating partners with new matching genomes to meet represents another key rate-limiting step for speciation (see Chapter 6).
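The logic of the genome-size models and of the isolation condition can be illustrated with a small toy program. The following Python sketch is not the MATLAB code of the published study; it is a minimal illustration under stated assumptions. The parental size (600 units) and step (20 units) follow Fig. 5.2A, while the population size, alteration rate, number of mating attempts, number of generations, and the function names are hypothetical choices made only to keep the example short and fast. Topological changes (Fig. 5.2B) are not modeled, and “isolation” is approximated by letting an individual mate only within its own genome class.

```python
import random
from collections import Counter

PARENTAL_SIZE = 600  # parental genome size in arbitrary units, as in Fig. 5.2A
STEP = 20            # size gained or lost per alteration, as in Fig. 5.2A


def mutate(size, rate, rng):
    """With probability `rate`, grow or shrink the genome by STEP units."""
    if rng.random() < rate:
        altered = size + rng.choice((-STEP, STEP))
        if altered > 0:  # a genome cannot shrink to nothing
            return altered
    return size


def next_generation(pop, mode, rate, rng, tries):
    """Build one generation of the same size under the given mating rule."""
    n = len(pop)
    counts = Counter(pop)  # how many individuals carry each genome size
    offspring = []
    while len(offspring) < n:
        i = rng.randrange(n)
        genome = pop[i]
        if mode == "asexual":
            # No partner is needed (assumption 1 in the text).
            reproduces = True
        elif mode == "sexual_isolated":
            # Assumed island model: partners are sought only among carriers
            # of the same genome, so one other carrier is enough.
            reproduces = counts[genome] >= 2
        elif mode == "sexual_random":
            # Limited random encounters; mating needs an identical genome.
            reproduces = False
            for _ in range(tries):
                j = rng.randrange(n - 1)
                if j >= i:  # skip self so the partner is another individual
                    j += 1
                if pop[j] == genome:
                    reproduces = True
                    break
        else:
            raise ValueError(f"unknown mode: {mode}")
        if reproduces:
            offspring.append(mutate(genome, rate, rng))
    return offspring


def simulate(mode, pop_size=200, generations=300, rate=0.05, tries=3, seed=1):
    """Return the number of distinct genome sizes in the final generation."""
    rng = random.Random(seed)
    pop = [PARENTAL_SIZE] * pop_size
    for _ in range(generations):
        pop = next_generation(pop, mode, rate, rng, tries)
    return len(set(pop))


if __name__ == "__main__":
    for mode in ("asexual", "sexual_random", "sexual_isolated"):
        print(mode, simulate(mode))
```

In this toy version there is no selection: every difference between the three modes comes from the mating rule alone. Under random mating, an altered genome rarely meets a matching partner and is diluted out, whereas the asexual and isolated modes allow altered genomes to persist, which is the qualitative pattern described above.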

5.7 CASE STUDIES: REINTERPRETATION USING NEW FRAMEWORK

After describing the main genome-based function of sex, it is time to comparatively analyze some cases to illustrate the importance of using this different framework to explain and understand the same data. As described in Chapter 3, the gene mutation theory and the genome theory of cancer lead to totally different conclusions, even when based on the same datasets. The issue is not so much one of generating experiments or data, as a great deal of data already exists, but rather one of explaining the existing data with a new approach grounded in a theoretical framework. For example, massive amounts of gene mutation data have been generated by whole genome sequencing. People who believe that common gene mutations cause cancer are interested only in highlighting the patterns while ignoring the great importance of the massive stochastic mutations. Most published papers identify key gene mutations, which seem very convincing, but when all these proposed mutations are considered together, they make much less sense. If the genome theory is used as the explanation instead, the conclusion is much different and the explanations become more reasonable. However, many researchers who work on cancer genome sequencing just cannot or will not see and acknowledge the pivotal role of the genome. The same practice is common in the literature on the function of sex. That is, different groups focus on their own specific factors without considering the whole picture.

Example 1: Sexual reproduction is more likely to be elevated in heterogeneous environments. This finding can be explained either as “heterogeneous environments cause the evolution of sex” or as “sexual reproduction preserves the genome during stressful conditions.” Monogonont rotifers can reproduce both sexually and asexually. They reproduce asexually until populations reach a certain density, at which point females signal to one another to produce males for sexual reproduction. When heterogeneous and homogeneous experimental conditions were compared, the incidence of sexual reproduction was greater with spatial heterogeneity (Becks and Agrawal, 2012). This interesting experimental result has been publicized as “Variety sparks sexual evolution” and “Why sex evolved” (Milton, 2010; Akst, 2010). However, how does heterogeneity trigger sex, and how is this linked to known triggering factors, such as population density, that can produce a similar switch? This experiment did not directly address these questions, as the regulation of a physical response in a switching species and the origin of sex are very different. Similarly, starvation triggers sex-switching in the yeast S. cerevisiae, but this tells little about the origin of sex. Thankfully, we have not seen any new publications titled “Starvation sparks sexual evolution” or “Starvation is why sex evolved.” Furthermore, the data can easily be explained by an alternate framework such as the genome theory, which leads to the opposite conclusion: increasing numbers of sexual individuals do not mean increased genetic diversity but instead imply that sexual reproduction is a means to preserve the genome during stressful conditions. Stress often triggers genome instability, and cycles of sexual reproduction are an effective way to maintain the normal genome.
A factor that promotes sexual-switching does not necessarily cause sex evolution. There are many additional factors influencing a switch to sexuality such as seasonal changes, ROS, and even behavioral cues. But it should not be claimed that these factors initiated the origin of sex. Interestingly, these diverse factors are all environmental stresses which can cause genome instability (Heng, 2007b). When genomes are unstable, sexual reproduction is necessary to preserve the genome. This may be true even in bdelloid rotifers that are capable of alternating meiosis with endomitosis (premeiotic doubling) and thereby use meiosis to correct environmentally induced damage. The genome theory predicts that by introducing environmental stress (including parasites), the ratio of sexual individuals will increase in facultative sexual species.

Example 2: It has been shown that, under experimental conditions, host-parasite coevolution favors biparental sex among individual nematodes that can reproduce by self-sex or outcrossing (Morran et al., 2011). This finding can either be praised as “the most definitive support yet” for the Red Queen hypothesis (Brockhurst, 2011) and as evidence that “sex evolved to prevent parasite infections,” or be considered an interesting experiment that illustrates the advantage of having a choice between different types of sex but has limited implications for the Red Queen theory. According to an author of that publication, the researchers tested a key component of the Red Queen hypothesis: that coevolving parasites favor the maintenance of sex. But the real message is that coevolving parasites favor the physical switch between outcrossing and self-fertilization within the same species. The Red Queen hypothesis is intended to explain the main function of sex, asserting that parasitically induced sex can facilitate adaptation by generating diversity, which provides novel genotypes capable of escaping infection by parasites. There are four key components of this hypothesis: (1) parasites induce sexual reproduction of the hosts; (2) sex increases genetic diversity in progenitors; (3) parasite-resistant progenitors display altered genotypes; and (4) there is a resultant arms race between the parasite and host. This paper did not provide the information needed to validate these four components of the Red Queen. First, self-sex is fundamentally different from asexual reproduction because only the former involves both meiosis and karyogamy (Gorelick and Heng, 2011). This paper addressed only switching between two types of sex, outcrossing and self-sex, rather than switching between sexual and asexual reproduction.
The question of “why sex?” differs from “why different types of sex?” In addition, the very existence of self-sex challenges the foundation of sex for gene mixing because nothing changes by mixing two identical copies of the same gene. Second, this paper assumed that the increased ratio of biparental sex increased genetic diversity. However, new genomic data overwhelmingly indicate that asexual reproduction results in higher levels of diversity than sexual reproduction. Third, the observation that the obligate self-sex population quickly died off was used to imply that diversity can prevent infection. However, the diversity level was unknown in both self-sexual and outcrossing populations. The survival difference between selfing and outcrossing populations seems to be too drastic in this experiment; further research is needed to investigate whether lab settings artificially contribute to such a difference. A long line of evidence indicates that asexual rather than self-sexual populations die off (Sonneborn, 1954; Bell, 1988). Fourth, coevolution of parasites promoting outcrossing seems to suggest an evolutionary arms race. However, it is not clear whether the genotypes actually changed in the nematodes and their parasites during the observation period of this study. Taken together, the seemingly strong evidence provided by this paper to support the Red Queen hypothesis is actually limited. Interestingly, this paper might have illustrated a secondary benefit of outcrossing under specific conditions. When the genome has been preserved by sex, gene mixing can then have certain benefits under the extreme conditions illustrated. However, unlike laboratory experiments, natural conditions are always changing, so one will never observe a constant universal benefit.

In summary, to reconcile the importance of genes and the genome in sex and the impact on evolution, the concept of a gene and genome evolutionary tango can be proposed, which resolves the conflict between short-term gene dynamics and long-term genome conservation. More importantly, when discussing questions with high degrees of complexity, such as the function of sex, big picture synthesis is essential. It is possible that different specific models can be made to illustrate certain linkages with different features, but how universal are they? Our new interpretation of the principal function of sex has far-reaching implications. “Why sex?” represents a key evolutionary question, as sex is one of evolution’s major transitions, and a new way of thinking about evolution is required to understand our genome-based explanations. This new sexual reproduction concept will force people to reexamine the framework of current evolutionary theory. Gene mutations, new gene combinations, epigenetic alterations, and genome-facilitated plasticity all contribute to short-term evolution. However, the main constraint comes from genome conservation.
Of course, in addition to the system constraint imposed by overall genome integrity, and by sex in particular, there also exist parts constraints such as DNA duplication mechanisms, checkpoints, and DNA repair, as well as environmental and ecological constraints. However, the genome constraint is key. Moreover, the genome is responsible for constraints both within the microevolutionary phase and on the potential jumping that occurs at the macroevolutionary phase. Additional details are presented in the next chapter. Rethinking the function of sex has been a liberating experience for us. If one of the most important questions in evolution can be reinterpreted, how about evolutionary theory itself?

5.8 LESSONS LEARNED

The new model of the main function of sex and the journey to search for it has also taught us how to approach some of the most complex questions in the field.

1. Focus on the first principle

What is the most fundamental feature that separates sexual and asexual reproduction? The answer is that typical sexual reproduction requires a mating partner with a compatible genome. Thus, all conclusions, including those derived from our thinking and simulation results, must make sense based on this first principle. Both our effort to search for the main function of sex and our simulation studies are based on this key fact. That was the rationale behind our choice to ignore the many other popular features of sex. Such focus is necessary when so many factors are involved. Equally important, if another theory does not match predictions set forth from the first principle, it must be incorrect in a general sense (even if it may be useful to explain certain data from limited models), no matter how popular it is. Similarly, if some sex-related factors are closely associated with the first principle, they should be given priority when planning experiments. Such analyses will also help identify the common linkages among different, seemingly unrelated factors. For example, many features of sex are related to stress. By focusing on the first principle, it becomes obvious that genome compatibility is key, which immediately leads to the issue of compatible genome maintenance. Fig. 5.1 and its rationale make perfect sense, as do our simulation data. As expected, asexual populations displayed high genome-level diversity, whereas sexual populations evidenced low genome-level diversity. On the surface, this result seems foreseeable. However, this straightforward simulation also revealed a big surprise/paradox: if this simplest assumption, that sex requires partnership and thereby reduces genome-level diversity, is indeed true as our simulation illustrated, why does the current paradigm hold that sex promotes biodiversity? Should we just challenge the paradigm based on this foreseeable concept/result? Can more sophisticated simulations incorporating many other factors (such as selection and recombination rate) produce results that challenge the first principle? Clearly, many researchers are used to forgetting the big picture of science. They just cannot see past parts characterization.

2. Follow the paradoxes

According to current concepts, sexual reproduction generates genetic diversity through meiosis. Here is one paradox: why are good combinations of genes broken up by sex every generation? This paradox played an important role in our departure from relying on the gene and its combinations as the main function of sex. It makes sense from the genome package point of view.

The main function of meiosis is to ensure the integrity of the genome (the initial function of meiosis). The breakdown of gene combinations will not impact the genome framework, even if certain combinations contribute to overall genome instability and subsequent genome-level changes, because most altered genomes will be eliminated by sex. In addition, “harmful” combinations will soon be eliminated by recombination. It is likely that the constant creation of different gene combinations represents a secondary function of meiosis (Wilkins and Holliday, 2009), which likely has much less impact on genetics if most of its genetic information is fuzzy (Chapter 4). The integrity of the genome in the long run is not dependent on certain gene combinations. The evolutionary function of creating and breaking down gene combinations contributes to the evolvability and the robustness of the entire species. There are at least two obvious advantages to constantly creating new gene combinations: even though the breakdown of the selected successful combinations in a given environment will likely slow down the proliferation of the population (most of the new combinations are less fit than the selected, most fit combinations), it presents a major advantage in a crisis, when environmental conditions change so drastically that the currently advantageous combinations no longer fit. Some unique combinations could be crucial to a species’ survival even though their generation seems wasteful under normal conditions (much like the presence of a large military during peaceful times). Interestingly, these “heroic” combinations will not last long, as no fixed combination will retain its advantage over the long term because evolutionary selection forces are constantly changing in different environments. Gene-level changes will continue; this is also the key reason why there is no genetic accumulation during the long evolutionary process.
One can imagine that without a large pool of diversity, currently successful species might easily be eliminated under crises. Therefore, the slow proliferation and energy demands associated with diversity are a price paid to remain ready for potential crisis conditions. For asexual species, the conflict between growth under normal conditions and survival under crisis conditions is solved by the balance between clonal and nonclonal populations (as we illustrated in the cancer model in Chapter 3 with the relationship between clonal and nonclonal chromosome aberrations). In sexual species, this conflict is beautifully solved by gene combination-mediated dynamics and sex-mediated genome constraint. At this point, a new tango starts, providing change for short-term survival, especially under crisis conditions, and constraint for the long-term perpetuation of the species. Before the existence of complex genomes and sex, the efficiency of gene duplication determined the evolutionary winners. However, when the sex-genome relationship became established, the game changed: efficiency not only determines the winner but also the long-term survival of the species. If you have enough capital, you can afford to buy more insurance. When you do not use the insurance, it may seem like a waste to buy it, but in those crisis situations, it matters, even though you might never want to use it. This explains the investment in diversity by organisms. Efficiency becomes less important because, despite having a high level of efficiency, a species without the insurance of diversity can be totally wiped out. Diversity is costly, but organisms will pay for it.

Imagine that there are 100 ponds, each of which has one dominant gene combination that can neutralize a different factor, such as parasites, high temperature, dry conditions, or cold weather. Usually, those diverse ponds are not equally robust, as many of them will grow more slowly. The battle, however, is not just over who grows the fastest, but over who will survive and last as a species. Under dramatically changed environments (crises), the populations of most ponds will be eliminated, but as long as individuals in one pond with a specific gene combination survive, the genome survives. Soon, the surviving individuals, all with identical genome frameworks, will repopulate the 100 ponds. Then, as recombination breaks down gene combinations over time, the 100 ponds will again develop a variety of gene combinations in preparation for the next crisis. Although they are important over the short term under crises, gene mutations and gene combinations come and go. The genome of a given species lasts. Individuals with different gene combinations are important for the species’ survival under crisis, but no specific genes or combinations are important to the long-term sustainability of the species. That is achieved by the maintenance of system inheritance, by sustaining the genome through sex. This is similar to the importance of a university versus any specific faculty member.
Faculty members are important for a university to flourish but individuals will come and go over hundreds of years of history. The university survives, whereas individual faculty members do not. In a sense, for most sexually reproducing eukaryotes, it is the selfish genome that matters, not the selfish gene. If every small change is accumulated as predicted by neo-Darwinism, then the wasteful aspects of this process could be a major problem. However, if these small changes are not preserved in the long term, then the breakdown of any beneficial combinations does not matter. The fact that the breakdown of gene combinations exists widely also indicates that the main function of meiosis is not to preserve good gene mutations or their combinations. Clearly, the concepts of evolution need to be changed. Unexpectedly, the gene and genome play drastically different roles in evolution and conservation functions than previously imagined.
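The 100-pond thought experiment can be walked through step by step in a few lines of code. The sketch below is only an illustration of the argument, not a model taken from any published study: the four factor names follow the examples in the text, while the genome label, the function names, and the rule that a pond survives only if its dominant combination matches the crisis factor are simplifying assumptions.

```python
import random

# Stress factors named in the text; each pond's dominant gene combination
# is assumed to neutralize exactly one of them.
FACTORS = ["parasites", "high temperature", "dry conditions", "cold weather"]
GENOME = "species genome"  # hypothetical label for the shared genome framework


def make_ponds(n, rng):
    """n ponds, same genome framework, one dominant gene combination each."""
    return [{"genome": GENOME, "combo": rng.choice(FACTORS)} for _ in range(n)]


def crisis(ponds, factor):
    """Only ponds whose dominant combination neutralizes the factor survive."""
    return [p for p in ponds if p["combo"] == factor]


def repopulate(survivors, n):
    """Survivors recolonize all n ponds, carrying genome and combination."""
    if not survivors:
        return []  # the species is lost
    return [dict(survivors[i % len(survivors)]) for i in range(n)]


def rediversify(ponds, rng):
    """Recombination reshuffles gene combinations; the genome is untouched."""
    return [{"genome": p["genome"], "combo": rng.choice(FACTORS)} for p in ponds]


if __name__ == "__main__":
    rng = random.Random(0)
    ponds = make_ponds(100, rng)
    survivors = crisis(ponds, "high temperature")
    ponds = rediversify(repopulate(survivors, 100), rng)
    print(len(survivors), "ponds survived;",
          len({p["combo"] for p in ponds}), "combinations after recovery;",
          len({p["genome"] for p in ponds}), "genome framework")
```

In this sketch the gene combinations turn over completely, being narrowed to one by the crisis and then diversified again, while the genome label never changes, which is the distinction the passage draws between short-lived combinations and the lasting genome.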

3. Respect the facts

First, when there is a fundamental conflict between our favorite concepts and key facts, the facts must always win. Furthermore, when facts disagree with the key predictions of a theory, we must hold the theory responsible. Second, facts are not only those that most researchers are familiar with but also those that most people have ignored. For example, if inheritance is fuzzy, then any phenotype within the full spectrum can be realized, and it is difficult to define which end of the spectrum is positive or negative when we collect the data. Often, statistical analyses can give us the comfort to be intellectually lazy. We can easily say that “it is statistically correct to ignore those data that do not make sense,” not realizing that it is these “outliers” that matter the most in biological evolution. Clearly, we need to change our practice when collecting data. For cancer research, collecting heterogeneous data is the key, and for evolutionary studies, collecting data on both dynamics and constraint is the key. As for studying the function of sex, collecting data both for and against increased genetic diversity is long overdue. From time to time, there have been solid reports that challenged the current concept that sex promotes genetic diversity, far earlier than our papers (Heng, 2007b; Gorelick and Heng, 2011). Recently, for example, Root Gorelick has studied several century-old publications showing that asexual populations/individuals have at least as much heritable variation as sexual populations/individuals, if not more. Strangely, although these papers were published by top-notch authors in top-notch journals (such as Science and American Naturalist), they seem to have received no citations over the last century (Gorelick, personal communication). These important data have obviously been considered to be negative data by most.
For us, knowing these data existed certainly would have increased our confidence when first presenting our concepts. Of course, the key limitation that blocks people from “seeing” is their own paradigm. Carefully analyzing data invisible to most perhaps is an effective way to identify key paradoxes and define questions related to the first principle.

4. Fill in knowledge gaps: Dare to think big

Before the genome constraint concept of the main function of sex was established, sex had been considered the “queen problem.” Traditionally, sex has been considered the main source of genetic variation, as meiosis can amplify variations from gene mutations by creating unlimited combinations. Now, with our drastically new interpretation of the main function of sex, the evolutionary importance of sex becomes remarkably obvious. It holds the key to understanding the relationship between micro- and macroevolution and between the gene and the genome. It can contribute to the unified theory of evolution. It also illustrates the basis of the biological concept of a species and solves the paradox of short-term change but long-term preservation. Obviously, it plays an important role when rethinking evolutionary theory. When we started our research on cancer evolution nearly 20 years ago, we never anticipated that we would discuss the function of sex and the genome-based macroevolutionary concept. But we dare to think big. We must think big, for it is our duty.

CHAPTER 6

Breaking the Genome Constraint: The Mechanism of Macroevolution

Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00006-9
Copyright © 2019 Elsevier Inc. All rights reserved.

6.1 SUMMARY

Current evolutionary theories have largely ignored the vital role of genome changes in macroevolution. Historically, macroevolution has been viewed as the result of microevolution accumulated over a long period of time; thus, the only difference between micro- and macroevolution is the time frame of the process. The discovery of the two phases of somatic evolution and the newly realized main function of sex, however, have forcefully challenged the long-assumed relationship between micro- and macroevolution. In this chapter, by comparing gene dynamics and genome constraints, somatic dynamics and germline constraints, individual trait dynamics and system identity constraints, and in particular, natural selection and artificial selection, the limitations of natural selection in evolution are discussed. Such limitations portray one of the greatest quandaries in evolutionary biology: a conflict between the short-term dynamics of adaptation and the long-term constraints of system stasis. Moreover, different genomic mechanisms (genome reorganization and gene changes) are responsible for macro- and microevolution, respectively. This realization drastically decreases the power of using natural selection to explain macroevolution, especially speciation. In other words, microevolution-mediated adaptation fundamentally differs from macroevolution-mediated speciation, and these two types of evolution (within and above the species level, respectively) are only explainable by a new genomic mechanism, not any allotted quantity of time. Based on the above syntheses, a new macroevolutionary model for speciation has been proposed, departing from the Darwinian evolutionary model. Diverse evidence is presented to support this new model of evolution, which involves initial speciation by rapid genome reorganization followed by long-term microevolution for population growth and maintenance.

6.2 PATTERN OF CELLULAR EVOLUTION CHALLENGES CURRENT EVOLUTIONARY THEORY

6.2.1 Simple Evolutionary Principles Are No Longer Simple

The dimension of time in evolution is a key aspect of 4D genomics that brings the genome into the context of real life. The depiction of the genome in real time not only provides an opportunity to critically examine our scientific concepts but also guides us in designing better experiments. Although it has long been appreciated that time is a key factor in evolution, genome theory-based 4D genomics represents a new perspective for studying the gene-genome relationship, both within the individual and between generations. Unfortunately, time has also often been used to explain gaps in our reasoning. For example, time has been used as a convenient way to justify some unrealistic inferences in evolutionary thinking in order to bridge gaps between micro- and macroevolution. Because large-scale evolutionary events occur over a much longer period of time than a researcher’s life, we were told that inference needs to be used frequently to explain knowledge disparities in the field of evolution. The common mistake in such inference is generalizing artificial or exceptional experimental findings and directly applying them to nature, without considering the possibility that there is a fundamental difference between nature and our experimental settings. Furthermore, many important paradoxes have been ignored based on the power of time, as if, given enough time for the effect of accumulation, isolated cases will become universal, small quantitative changes will alter the entire system, gene-level alterations will become genome-level innovation, and artificial possibility will certainly become the reality of nature. As a result, despite overwhelming acceptance within the scientific community, the correctness of current evolutionary theory is still dependent on the correctness of time-related assumptions. For example, one core assumption concerns the relationship between micro- and macroevolution. To many evolutionary biologists, this relationship can simply be expressed as

Macroevolution = Microevolution + Time

According to the common viewpoint, macroevolution (the origin and evolution of higher taxa) proceeds via essentially the same mechanisms as microevolution (that is, natural selection, sexual selection, and genetic drift) (MacNeill, 2011), and time is the key that connects adaptation within species to the origin of species, or speciation. Time carries the power to let inference fill gaps in logic or a lack of direct observations. Who can argue with time? To the nonscientific community, however, such an assumption clearly cannot compete with people’s beliefs. Although microevolution has gained acceptance by different groups, including the Roman Catholic Church, the gap between micro- and macroevolution is huge. For example, many questions remain the same 160 years after Darwin’s publication. Why is a dog considered a dog despite the existence of many different breeds? Why is there no evidence of a half human-half monkey? On the surface, it is hard to comprehend why all the molecular DNA data have failed to convince the general public. Deep down, the scientific community has actually not addressed these questions in a satisfactory way. Sure, ample DNA evidence has demonstrated that all species are linked, but by what mechanism? Why is it that the success of using natural selection to explain microevolution cannot be convincingly extended to macroevolution? Is the concept that continuous and stepwise evolution leads to speciation falsifiable? How can the model of punctuated equilibrium be incorporated into phyletic gradualism? What if micro- and macroevolution are not simply linked by time but are driven by different mechanisms? Is there any biosystem that can be used to directly examine the proposed relationship between micro- and macroevolution? Clearly, much more convincing frameworks are needed to further advance evolutionary theory in the 21st century, which will possibly also resolve the controversy between neo-Darwinism and alternative versions of Darwinism.
First, we need to reexamine the rationale and arguments Darwin used when he originally formulated the idea of natural selection. When applying Thomas Malthus’s population theory to explain how living things evolve from common ancestors, Darwin stood on four key cornerstones: (1) all individuals in a species vary slightly, (2) assuming overpopulation, only the fittest of these can survive to reproduce, (3) just like artificial selection, the natural selection process should create and maintain variety in the wild, and (4) given enough time and a changing environment, selected varieties would gradually become separate species (Larson, 2002). Among these, time is the key factor for solving his mystery of mysteries. Darwin treated the conflict between his prediction and the geological record very seriously, realizing that these unexplainable missing intermediate links would represent the biggest challenge to his idea of natural selection. He wrote:

Why then is not every geological formation and every stratum full of such intermediate links? Geology assuredly does not reveal any such finely graduated organic chain; and this, perhaps, is the most obvious and gravest objection which can be urged against my theory. The explanation lies, as I believe, in the extreme imperfection of the geological record (1902, p. 251).

Most evolutionists agree with Darwin’s explanation. After all, many missing links have been found. However, the key is not whether there are identifiable missing links. To truly support Darwin’s concept of stepwise natural selection, the geological record must be “full of such intermediate links,” according to his prediction. Furthermore, it does not make sense that the extreme imperfection of the geological record would disfavor only these intermediate links, unless the number of intermediate links was very low (which is inconsistent with the original Darwinian concept or the neo-Darwinian population concept). Interestingly, however, this is the exact prediction of our punctuated model (see Section 6.7.1 and Fig. 6.2). A simple and better alternative explanation is that there is no such series of slightly altered intermediate links between two species. In other words, the mechanism of natural selection Darwin proposed may not be able to explain speciation. Most biologists ignore the obvious problems with Darwin’s explanations. They are convinced that it is just a matter of time before these intermediate forms are identified. During an earlier stage of evolutionary studies, it was easy to imagine observing these intermediates within genera and families, but transmutation between higher categories was harder to envision. It turns out that many transitions between higher categories have been found, but large numbers of interspecies transitions are still hard to demonstrate. Different opinions have been offered as explanations. According to Ernst Mayr,

As we now know, there never was a ‘taxic discontinuity,’ because the two species were connected with their common ancestor by a continuous series of intermediate populations. (Mayr, 2001)

While Mayr’s viewpoint seems to explain the difficulty Darwin faced, it also challenges Darwin’s key prediction. Furthermore, this explanation has important conceptual limitations. How can we explain that all species are linked despite the obvious discontinuity? Where should we draw lines between species among all of these continuous series of intermediate populations? Why is classification of most different extant species of animals and plants more obvious? Where is the point of no return for establishing the reproductive barrier among a continuous series of intermediate populations? Clearly, when the many explanations that support natural selection are put together, something important is missing. While it is a beautiful idea that a continuous series of intermediate populations can become different species given enough time, such a natural process must be validated by facts. The journey in search of answers to these contradictions began unintentionally during cancer research started in the late 1990s. As described in Chapter 3, the initial effort used existing evolutionary theory to explain the cancer data. It was soon realized that the experimental findings in cancer evolution actually challenged the current thinking on the mechanism of evolution. Many popular concepts in cancer research, such as the sequential accumulation of gene mutations leading to cancer, are in fact examples of interpretations influenced by the traditional idea that microevolution over time leads to macroevolution. However, both experiments and clinical observations show that this is not the case in reality. At first it was felt that there had to be something wrong with the in vitro system, in which no constant clonal expansion (the accumulation of small changes over time) was detected across the entire process of cellular evolution. After different in vitro and in vivo models were shown to display a similar pattern of punctuated evolution, the concern became that cellular evolution is not relevant to organismal evolution; this concern is reflected in the general comment from evolutionary biologists that cancer is weird. Furthermore, it is not trivial at all to question our own evolutionary knowledge. Later, after the successful reexamination of the main function of sexual reproduction, our confidence in using genome theory to reexamine paradoxes in biology soared.
Now, if the century-long “queen question” could be answered by this new genome-centered conceptual framework, why couldn’t this framework also explain the general mechanism of evolution? In particular, it is obvious to us that cancer research can provide key information not available from other biosystems studied so far. Realizing that cellular evolution data challenged current cancer theories was one thing; the further realization that current mainstream evolutionary thinking regarding micro/macroevolution was heading in the wrong direction was quite another, and it was both shocking and frightening. Nevertheless, the search for a new mechanism of organismal macroevolution has been our key interest for the past two decades, although most of our publications have focused on cancer evolution.

6.2.2 Why the Cancer Model Is an Excellent Platform for Studying Evolution in General

As illustrated in Chapters 3 and 4, “watching evolution in action” experiments have provided a unique window for studying the relationship among various genomic elements, phenotypes, a large number of diverse environmental factors, the pattern of evolution, and time. It has been challenging to study these relationships in organisms, especially when it comes to macroevolution. The following examples illustrate the advantages of using cancer evolutionary models to study key features of cellular evolution, many of which are applicable to understanding organismal evolution.

1. Correlating phenotype and genome type: linking immortalization, metastasis, and drug resistance to karyotype changes (time scale: from a few weeks to several years, using in vitro and in vivo models); linking cellular proliferation to gene mutations; and linking differentiation to epigenetic changes (time scale: from hours to weeks). These studies provide a unique window of observation into the workings of evolution in bio-lineages. For example, how do biosystems achieve long-term and large-scale changes? Does this occur by an accumulation of small genetic changes over time, by punctuated mechanisms with sudden massive changes, or perhaps by a combination of both?

2. Linking internal genome instability and environmental stresses to changes in the speed of cancer evolution. The quantitative contribution of key factors can be compared using the timeline of cellular transformation and the size of tumors or the status of metastasis. More importantly, extremely high stress conditions can be applied in various cancer models, which makes it possible to separate natural selection (for slow adaptation over time) from instant survival by genome reorganization (for immediate survival under crisis). Such studies have illustrated new mechanisms underlying key transitions of cancer evolution, including genome chaos-led drug resistance (environmental factors induce genome chaos; genome chaos creates new systems that can survive in such environments). When combined with the concepts of adaptive landscape and survival landscape (Heng et al., 2011b; Heng, 2015), such studies will likely support the idea that extreme environmental conditions can play a major role in organismal macroevolution (e.g., mass extinction), which can also offer support to the idea that “... the history of life has been more affected by catastrophe than the sum of other factors, including the slow, gradual evolution first recognized by Charles Darwin, based on his training by the dominant teachers of uniformitarianism” (Ward and Kirschvink, 2015).

3. Comparing multiple runs of independent cancer evolution (immortalization or drug resistance) and each end product (immortalized cell population or drug-resistant clones) can be achieved by observing different karyotypes followed by combinations of different genes/pathways. The initial genomic and environmental conditions (i.e., the constraints that can either accelerate or slow down the evolutionary process) can be compared in parallel experiments. In contrast, organismal evolution, particularly the series of macroevolutionary events that has shaped the “tree or network of life on earth,” probably comprises only one run of evolution.

4. Linking high-stress conditions and high genome instability to genome chaos-mediated new system emergence. Monitoring the dynamics of outliers can provide insight into how overall heterogeneity and individual genome complexity influence the chance of survival. It is possible that cancer cells with a large number of chromosomes will have more materials for regenerating survivable genomes. Massive chromosomal reorganization in cancer cells, rather than the gene transfer observed among bacteria, will likely be much more effective in increasing the likelihood of survival (i.e., through increased heterogeneity) under crisis.

5. Separating microcellular evolution and macrocellular evolution by introducing distinctive mechanisms. Specifically, linking karyotype alteration to macrocellular evolution and gene mutations to microcellular evolution. The relationship between these two types of evolution is thus not simply connected by time, but should be explained by the relationship between local (adaptive) and global (survival) landscapes.

6. Illustrating the importance of distinguishing the clonality of DNA markers (genomic parts) from genome system formation (different karyotypes can be formed using similar DNA parts). The “parts inheritance” (inheritance through which DNA materials are passed) differs from the “system inheritance” (inheritance that passes the blueprint). Two cells with similar gene profiles but different karyotypes represent different systems.

7. Illustrating the complex and highly dynamic patterns of coevolution. As cancer evolution involves different types of cells, evolutionary competition and collaboration are common and dynamic. For example, tumor cells can stimulate nearby normal cells to produce angiogenesis signaling molecules that promote the supply of blood to tumor cells; cancer cells can change the behavior of immune cells (which are supposed to kill cancer cells) such that they become helpers for cancer cells. Cancer cells can even fuse with immune cells and form new cancer cells. Such capability for collaboration, coupled with high genotypic/phenotypic plasticity, allows cancer cells to become highly robust and resilient. This also favors the chance of success for some outliers.

8. Displaying the capability of cancer cells to manipulate and break multiple levels of constraints. There are many environmental constraints, including tissue organization, immune cells, and the entire body-mind system. For cancer evolution to be successful, cancer cells must not only break down each of these by forming new genome systems but also use the “constraint factors” in their favor to further promote cancer evolution. For example, the strategy of targeting the blood supply to constrain cancer with antiangiogenesis therapeutics can often promote the aggressive growth of cancer cells as a result; using the maximum tolerated dosage of chemotherapeutic treatment to achieve high initial tumor cell killing often also promotes rapid drug resistance. Both examples also illustrate that under crisis conditions, when the environmental stress is so high that the vast majority of cells will be eliminated, such killing power, paradoxically, will promote the formation of new emergent genome systems following the induction of genome chaos. This phenomenon is highly relevant to understanding organismal evolution under mass extinction conditions.

9. Correlating different cancer types (liquid or solid, hereditary or sporadic) with different patterns of cancer evolution. Unlike blood cancers, most solid tumors are generated from small isolated cellular populations. This difference impacts the population structure, the behaviors of the average versus outliers, and clinical prediction (including diagnostic strategy and treatment response).

10. Offering opportunities to investigate the interaction between multiple levels of genomic and nongenomic contributions and multiple stages of cancer evolution with traceable phenotypes and population structure. Such analyses will not only illustrate the relationship among chromosome alterations, gene mutations, and epigenetic functions, but also validate the genomic mechanism of micro- and macroevolution.

11. Providing good models for studying the interaction of individual cells with the same genome and with different genomes. Because a tumor often represents a collection of multiple types of cells, including different genomes, the same genome with different mutation profiles, and different epigenetic landscapes, studying how such interactions impact each group of cells and overall tumor evolution can provide information regarding species interaction (an important environmental factor in organismal evolution), the status of specific species, and that of the larger ecosystem.

12. Offering a platform of both in vitro and in vivo experiments for comparing the pattern of experimentally manipulated artificial evolution with evolution in patients’ samples over similar time scales.

6.2.3 Similarities and Differences Between Somatic Cell Evolution and Natural Evolution

Given the many obvious advantages that cancer research offers for studies of cellular evolution, the many evolutionary insights it has revealed that are relevant to organismal evolution, and the absence of other accessible model systems for addressing these difficult questions, one would think that cancer models must be frequently used to study organismal evolutionary theory. The situation is quite the opposite, however. Although an increasing number of research papers have applied Darwinian principles to explain the mutational landscape of cancers and how cancer evolution progresses (especially in recent years, following the cancer genome sequencing project), only a very limited number of publications have studied evolutionary concepts using cancer as a model system (Heng, 2007, 2009; Vincent, 2010; Gorelick and Heng, 2011). To many, somatic cell evolution in cancer seems to have little to do with organismal evolution. Traditionalists firmly believe that cancer is very different from general organismal evolution. They are neither sure about applying these surprising findings from cancer models to their evolutionary concepts nor willing to do so.

The reason most often given for why the majority of evolutionary biologists are not interested in cancer evolution is that cancer is weird. It is said to differ fundamentally from other “normal” organisms, as illustrated by the following arguments:

(1) The success of cancer evolution leads to the death of the host and thus the end of cancer evolution, whereas organismal evolution can continue through multiple generations.
(2) Cancer evolution occurs within a limited time and space scale (each run of cancer evolution is within the life span of its host, it cannot be transmitted among different hosts, and there are neither large-scale natural surroundings, e.g., mountains and rivers, nor competition from other species).
(3) Somatic cell evolution is asexual, while many eukaryotic organisms are sexual.
(4) Organismal evolution might involve the “hopeful monster” concept, whereas cancer evolution illustrates the “hopeless monster.”

Now, it appears that such disregard comes from a lack of knowledge and imagination.

First, while it is very rare, there are different types of well-known contagious cancers detected in dogs, wolves, coyotes, and jackals (canine transmissible venereal tumor, or CTVT), Tasmanian devils (devil facial tumor disease, or DFTD), Syrian hamsters, clams, and mussels (Metzger et al., 2016). More strikingly, some cancer types can jump across different species! There are also reports of cancer transmission among people. It has been proposed that some human cancers might, in the future, be able to jump among individuals (Heng, 2015).

Second, there are many artificial cellular species created by laboratory research. Many cancer cell lines have been passed around the world for decades, long after their original hosts died. Because artificial selection has played a very important role in establishing evolutionary theory, the importance of artificial cellular species should not be ignored. Nevertheless, cancer is real and extends far beyond the cell culture model.

Third, the field of evolution has traditionally ignored microorganisms. For example, Ernst Mayr’s biological concept of species is not applicable to species without sexual reproduction. Now, with the realization that the main function of sexual reproduction is to maintain the identity of the genome, the suggestion of using genome similarity to define species warrants more attention. This requires including the cancer genome, an unavoidable biological phenomenon.

Fourth, some well-known evolutionary thinkers, including Julian Huxley and Leigh Van Valen, have proposed the similar idea of considering cancer as a new species. Huxley was one of the key architects of the modern evolutionary synthesis. In his article “Cancer biology: comparative and genetic” (Huxley, 1956), he stated that “all autonomous neoplasms can be regarded as the equivalent of new biological species.” Leigh Van Valen, the originator of the Red Queen hypothesis, went even further by suggesting that HeLa cells (one of the best-known cancer cell lines, derived from cervical cancer cells) be redefined as a new species called Helacyton gartleri (Van Valen and Maiorana, 1991).
For more detailed discussions, see Debating Cancer, Chapter 7: Do different cancers represent different species? (Heng, 2015).

But perhaps the most important fact is that no matter how bizarre cancers are, they are definitely biological systems and should follow the laws of evolution. When the patterns of cancer evolution challenge current evolutionary concepts, we must not simply disqualify data from cancer evolution because of our ignorance and insensibility, especially when the new facts conflict with some generally accepted assumptions. Equally important, studies of somatic cell evolution and organismal evolution in fact share the same goal: to illustrate the evolving patterns and mechanisms of biological lineages, and to understand the way in which systems evolve (even though many evolutionary biologists are interested in the heritability of traits of individuals across populations, while cancer biologists focus on the genetic traits within somatic cell populations). At a fundamental level, the key elements of bio-evolution are the same: competition for fitness, inheritable variation, and system evolution. If one accepts a system viewpoint and considers each individual in organismal evolution and each individual cell in cancer evolution as a unique system, then common evolutionary patterns and mechanisms clearly emerge. They all display evolutionary dynamics and constraint, even though the specific trigger factors and molecular mechanisms differ.

It is rather surprising that, throughout the history of contemporary evolutionary research, one big wish of many researchers has been to directly observe the micro/macroevolutionary transition, in order to confirm the seemingly obvious assumption that the accumulation of microevolution over time will lead to macroevolution. Now such a system is in front of us, but few are interested in using it to examine these assumptions, because it generates data that do not support current concepts, even though these surprises will likely reveal the mystery of mysteries of evolution.

It is thus interesting to ponder what the reactions of Huxley and Van Valen would be to the newly described punctuated genome replacement (macroevolution) as a key event in cancer evolution. Huxley’s response would be particularly intriguing, as his belief that evolution occurred by small steps and not by saltation was well known. Incidentally, like many evolutionists prior to the gene era, he believed that what really mattered in evolutionary studies were genetic changes at the chromosome level. If Van Valen had known that HeLa cultures are a mixture of cells with different karyotypes, and that different cultures might display drastically different clonal karyotypes under different selection conditions, would he still have used one name for all HeLa cells (Heng, 2013b)? Furthermore, imagine if the cancer evolution pattern fully supported natural selection. Would the majority of researchers still ignore the cancer model for evolutionary study?

6.2.4 The Conflict Between Observations From Somatic Cell Evolution and Neo-Darwinian Concepts

If somatic cell evolution is comparable with organismal evolution, then these new data from cancer research will not only challenge traditional clonal evolutionary concepts of somatic cell evolution but also question the basic framework of current evolutionary thinking, as somatic cell evolutionary concepts are mainly derived from neo-Darwinian theories. Clearly, the patterns of somatic cell evolution that have been observed cannot be explained by neo-Darwinian theory. As mentioned in Chapter 3 and illustrated in Fig. 6.1, the coexistence of two phases of evolution really challenges key assumptions regarding the mechanism of evolution and in particular the relationship between macro- and microevolution (Heng et al., 2011a).

FIGURE 6.1 Description of the two phases (macro and micro) of evolution. Within the punctuated phase (macro), genome constraint is low, and there are high levels of genome replacement. No system inheritance occurs across the different genomes (represented by different shapes). Within the stepwise phase, the genome is highly constrained, and stable genomes are maintained (represented by the same shape). Although the two-phase cycle is easy to observe in somatic cell evolution, it is a rare event among sexual eukaryotes because of the function of the sexual filter. It is likely that the punctuated phase is frequently associated with crisis stages during the history of evolution. Please note that the time duration of the stepwise phase is usually much longer than the punctuated phase. Reproduced from Heng et al., 2011a (with permission from Elsevier).
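
The two-phase cycle described in the caption can be sketched as a toy bookkeeping model. This is only an illustration under stated assumptions, not the authors' method: the stress schedule, the `threshold` parameter, the names `Snapshot`, `simulate_two_phases`, and `distinct_karyotypes`, and the rule that every high-stress step replaces the karyotype are all hypothetical simplifications introduced here. The sketch only shows the qualitative contrast: many different genomes in the punctuated phase, one stable genome with accumulating small changes in the stepwise phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    step: int
    phase: str         # "punctuated" (macro) or "stepwise" (micro)
    karyotype_id: int  # stands in for a genome system (a "shape" in Fig. 6.1)
    gene_changes: int  # small changes counted since the last genome replacement


def simulate_two_phases(stress, threshold=0.5):
    """Toy trajectory of a single lineage under a per-step stress schedule.

    Assumption (hypothetical): stress >= threshold means low genome constraint,
    so the karyotype is replaced at that step; otherwise the karyotype is kept
    and one small gene-level change is added.
    """
    history = []
    karyotype_id = 0
    gene_changes = 0
    for step, level in enumerate(stress):
        if level >= threshold:
            phase = "punctuated"
            karyotype_id += 1   # genome replacement: a new system
            gene_changes = 0    # bookkeeping restarts for the new system
        else:
            phase = "stepwise"
            gene_changes += 1   # small change within a fixed karyotype
        history.append(Snapshot(step, phase, karyotype_id, gene_changes))
    return history


def distinct_karyotypes(history, phase):
    """Number of different karyotypes seen during the given phase."""
    return len({s.karyotype_id for s in history if s.phase == phase})


# A short crisis (5 high-stress steps) followed by a longer stable period (20 steps).
stress = [0.9] * 5 + [0.1] * 20
history = simulate_two_phases(stress)
print(distinct_karyotypes(history, "punctuated"),
      distinct_karyotypes(history, "stepwise"),
      history[-1].gene_changes)
```

In this sketch the stepwise phase is deliberately given more steps than the punctuated phase, mirroring the caption's note that the stepwise phase usually lasts much longer.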

The rationale for using the terms macrocellular and microcellular evolution is to follow the terminology used in organismal evolutionary studies, where macroevolution refers to major evolutionary changes, such as the evolution of groups larger than an individual species. In the clear majority of animals and plants, each type of organism often displays its own unique karyotype. It therefore seems appropriate to name the cellular evolutionary phase in which the karyotype is constantly changing “macro.” Furthermore, the phenotype changes associated with these phase transitions are drastic, from normal cells to immortalized cells and from noninvasive cells to highly invasive cells. In contrast, the microcellular phase refers to lower levels of genomic change within a fixed karyotype (e.g., gene and epigene), and the degree of phenotypic change is much more moderate. Interestingly, following years of synthesis, it is logical to suggest that the relationship between macro- and microevolution in somatic cells should apply to organismal evolution as well.

Initial efforts to push the concept of macro/microevolutionary transition in cancer proved to be difficult. Most molecular researchers who study specific pathways or cancer genome sequences have no appreciation of chromosomal-level alterations. As a result, few have realized the value of karyotype evolution. Following over 5 years of efforts to introduce the concept of system inheritance, to link nonclonal chromosome aberrations (NCCAs) to the transcriptome and evolutionary potential, to separate the gene-mediated adaptive landscape from the genome-mediated survival landscape, and to discuss the importance of the evolutionary mechanism of cancer, the tide has finally started to turn. Recent cancer genome sequencing efforts, and especially single-cell sequencing, have forcefully supported the concept of the two phases of cancer evolution. It was demonstrated that even at the DNA level, there is a punctuated phase of cancer evolution (Navin et al., 2011). Such a phase has also been observed in different cancer types and has been referred to as a “big bang” (Sottoriva et al., 2015). Moreover, when different parts of the same tumor are sequenced, the clonality is generally limited for most solid tumors. As a result, the concept of “macroevolution in cancer” is gaining acceptance (Gerlinger et al., 2014; Klein, 2013), despite the initial resistance that arose when we illustrated it (Heng et al., 2006).
Heng, 2017a, with permission from Elsevier

Now, despite the fact that much more attention needs to be paid to genome alteration rather than individual gene mutation, genome chaos-mediated cancer macroevolution has become a common feature of most cancer types, and news articles such as “Researchers reveal massive genome chaos in breast cancer,” “Sarcoma-associated gene fusions a result of ‘genome chaos’,” and “Giant leaps of evolution make cancer cells deadly” are frequently published. With the success of gaining support from the cancer research community to accept macrocellular evolution as the common driver in cancer, the time seems ripe to ask the important question: given the fact that cancer macroevolution displays no stepwise accumulation of small changes over time, is cancer evolution Darwinian? This sensitive question was raised as soon as the two phases of cancer evolution were introduced (Heng et al., 2006a-c). There has since been no good response from the research community, even though this question has been repeatedly asked over these many years in conference talks and publications (Heng, 2007a, 2009; Heng et al., 2011a,b; 2013a,b). Again, when massive genome reorganization, which is difficult to explain on the basis of Darwinian selection, was confirmed by cancer genome sequencing, the attitude started to change. Recently, non-Darwinian evolutionary dynamics were used to explain therapy-induced cancer drug resistance (Pisco et al., 2013; Heng, 2007c, 2016a). More directly, by analyzing hundreds of samples from the same hepatocellular carcinoma tumor, it was concluded that clonal diversity agreed well with the non-Darwinian model, with no evidence of positive Darwinian selection. In contrast, genetic diversity under a Darwinian model would generally be orders of magnitude smaller (Ling et al., 2015). Given the fact that some of the authors are well-known evolutionary biologists, their conclusion is highly significant. Knowing the punctuated phase in cancer, the crucial involvement of fuzzy inheritance in genomics, and the difference between macro- and microevolution (Heng, 2016a), especially with the accumulating sequencing data that support the unique pattern of cancer evolution (despite the fact that different terms have been used, e.g., big bang, punctuated, and major shifts in evolutionary trajectories, the essential message of macroevolution is the same), this conclusion is not surprising at all. Immediate research is urgently needed to examine this issue.
Heng, 2017a, with permission from Elsevier

A decade later, more publications have pointed out the inconsistency between cancer genomic data and current evolutionary theory. Examples include the findings that cancer evolution is mainly driven by neutral evolution, with both positive and negative selection missing (Williams et al., 2016; Bakhoum and Landau, 2017); that copy number variations arise in punctuated bursts in the earliest stage of tumor evolution rather than by gradual accumulation (Gao et al., 2016); and that mutations are orders of magnitude more numerous than predicted by a Darwinian model (Ling et al., 2015). Some non-Darwinian explanations were offered, including saltationist theory, which is directly opposed to the continuous and gradual Darwinian model (Markowetz, 2016). Meanwhile, cancer researchers are more comfortable discussing the controversial issues involving evolutionary theory, partially because an increasing number of evolutionary biologists have joined cancer research teams. They have stated that punctuated bursts of gene mutations or copy number variations challenge the Darwinian principles of both “gradual accumulation” and “continuous selection” (Markowetz, 2016). Similarly, different patterns of tumor evolution (linear, branching, neutral, punctuated, or a combination of them) have been discussed (Davis et al., 2017), and punctuated equilibrium in cancer has been declared a new paradigm in clonal evolution (Cross et al., 2016). Compared with the responses to the initial reports on the two phases of cancer evolution and the follow-up discussions (Heng et al., 2006a-c, 2011a-b, 2013a-b; Heng, 2007a, 2009), the attitude of these top journals has certainly changed.

Unfortunately, however, the much-needed appreciation of the importance of karyotypic dynamics in cancer evolution is still missing. Despite the overwhelming karyotype changes detected, and although it is now known that chromosomal coding is important for cancer evolution, most of these evolutionary models are based only on gene profiles. Such practice has generated considerable confusion, which can be resolved by using the genome theory. For example, based on the genome theory, the gene mutation-mediated microevolution phase differs from karyotype alteration-mediated macroevolution. Within the punctuated phase, selection acts on different karyotypes (systems) rather than genetic parts (genes or copy numbers). While neutral evolution can be detected at the gene mutation level, evolution should not be neutral at the karyotype level. It is obvious that the same gene mutation or copy number profiles can form different genomes. Monitoring only gene-level profiles misses the whole point of studying cancer macroevolution! Slowly but surely, more researchers have now come to appreciate the difference between the two types of genomic landscapes (local and global) and their roles in cellular adaptation and survival (Heng et al., 2011; Huang, 2013). Fortunately, a group of cancer researchers have kept chromosome-based strategies going for decades, and recently, such research has become more popular with the help of yeast molecular geneticists (see Chapters 3 and 4). But such slow change is not good enough given the current confusion arising from the cancer genome project. Until there is a general acceptance of the genome theory, cancer research will not move into a new era.

6.2.5 Time to Compare/Reexamine Evolutionary Theories

There are six components to the current evolutionary framework: evolution (species undergo genetic changes over time), gradualism, speciation, common ancestry, natural selection, and processes other than natural selection that cause evolutionary changes (Coyne, 2009) (Table 6.1). But fundamentally, neo-Darwinian evolution can be summarized by two essential hypotheses: common descent and gradual improvement. The correctness of these two assumptions defines the significance of current evolutionary concepts. Now, let us examine these two primary assumptions through the lens of the genome theory of somatic cell evolution.

Common descent is an important component, but also one that is not unique to neo-Darwinian theory, as many theories outside typical neo-Darwinism (punctuated equilibrium, neutral or near-neutral theory, self-organization/complexity theory, and endosymbiosis) and even intelligent design feature the same element. The features of DNA shared among diverse species (such as a common coding system and the conservation of genes and larger genomic fragments) strongly support common descent. The observation of a punctuated phase in cancer further demonstrates that drastically different genomes indeed share a common ancestry of normal cells with standard karyotypes. This observation is important because only cells with normal karyotypes are present within the experimental system before somatic cell evolution. Hence, we know that these different genomes are the result of genome evolution stemming from a common ancestor. This provides direct evidence that drastic genome evolution, or macroevolution, does happen.

As to the point of gradualism, our somatic evolution data suggest a very different pattern (Chapters 3 and 4; Fig. 6.1). Neo-Darwinism insists that micro- and macroevolution are the same process, as macroevolution is nothing more than microevolution occurring over a long period of time. In sharp contrast, during somatic cell evolution in cancer, the macro- and microevolutionary phases are clearly different, and macroevolution cannot be achieved through accumulated microevolution because the two involve different genomic mechanisms. This difference is highly significant. According to Darwin, his main credit was not in proposing the idea of evolution, but in proposing natural selection as the key mechanism of how evolution works, and the soul of natural selection is the stepwise accumulation of useful variations. As he precisely stated:

Natural selection acts solely by accumulating slight, successive, favorable variations, it can produce no great or sudden modification; it can act only by very short steps.

TABLE 6.1 Summary of the Primary Differences Between Current Darwinian Evolutionary Theory and Genome-Based Evolutionary Theory. In Addition to the Six Components of the Current Evolutionary Framework (Coyne, 2009), the Main Function of Sex and the Limitations of Some Specific Mechanisms of Evolution Have Been Compared.

Key Component of Evolution | Current Evolution Theory | Genome Theory
Evolution | Species undergo genetic (DNA) change over time. New species evolve into different types over many generations (adaptation by selection leading to origin of species). | Two-stage process each with distinct mechanisms, e.g., mechanisms of microadaptation and speciation are different. New species rapidly emerge by genome reorganization/successful mating followed by long-term stasis of microevolution.
Gradualism | Essential, as the evolution of new features occurs gradually by the accumulation of small changes. | No need for genome reorganization-mediated macroevolution, but useful for those organisms that display directional selection. Microadaptation might gradually accumulate over a short period of time, but those accumulated changes are often canceled over a subsequent period of time.
Speciation | A result of the accumulation of small genetic changes over time. Initiated by speciation genes. Reproductive isolation interrupts gene flow. Geographic isolation is often required. | Results from the formation of individuals with similarly altered genomes. This alteration can produce a new genome and lead to genome change-mediated reproductive isolation. Geographic isolation increases the probability of success of the new species by increasing the chance for mating and reducing competition from parental species.
Common ancestry | Species split from parental species through gene change. | Sexually reproducing eukaryotic species are primarily split by genome reorganization, which establishes key reproductive isolation.

6.2 PATTERN OF CELLULAR EVOLUTION CHALLENGES

315

TABLE 6.1 Summary of the Primary Differences Between Current Darwinian Evolutionary Theory and Genome-Based Evolutionary Theory. In Addition to the Six Components of the Current Evolutionary Framework (Coyne, 2009), the Main Function of Sex and the Limitations of Some Specific Mechanisms of Evolution Have Been Compared.dcont’d Key Component of Evolution

Current Evolution Theory

Genome Theory

Natural selection (NS)

Primary mechanism of adaptive evolution.

Primarily involved in microevolution, but not necessarily by the accumulation of change over long periods. Not the mechanism of macroevolution.

Evolutionary processes other than NS

Plays a minor role.

Punctuated genome reorganization plays a major role in macroevolution

Main function of sex

Provides genetic diversity.

Preserves the genome in sexually reproducing eukaryotes by eliminating large alterations while providing diversity at the gene level.

Allows sexual populations to evolve more rapidly.

Slows macroevolution.

Different types of evolutionary selection forces throughout history

Natural selection plays the major role.

Different types dominate at different historical stages (gene accumulation, endosymbiosis, horizontal gene transfer, genome shattering). The selection force in nature might be different within species and among species.

Limitations of evolutionary mechanisms

Not clearly defined.

Other mechanisms are of importance (gene mutations come and go; fuzzy inheritance generates variants without changing the genomic landscape).

There is no real difference in the mechanism underlying artificial and natural selection

NS does not fully apply to other levels with new constraints or laws. Artificial selection is fundamentally different from NS.

The solid observation of two phases of cancer evolution suggests that it is necessary to reexamine natural selection. As mentioned at the beginning of this chapter, Darwin proposed natural selection based on four key elements (Larson, 2002). Elements 1 and 2 (individual variation and selection based on fitness) are rather solid. Let us reexamine elements 3 (natural selection acts similarly to artificial selection) and 4 (the accumulation of selected, gradual changes over time leads to speciation).

6.3 ARTIFICIAL SELECTION AND NATURAL SELECTION ARE FUNDAMENTALLY DIFFERENT

What is the difference between artificial selection and natural selection? When this question was posed to evolutionary biologists, many stated that natural selection is more complicated, with multiple dimensions, and is much slower and much more difficult to study. A further question was then asked: if the data on artificial selection served as the basis for Darwin’s scientific inference, and ultimately serve as the basis for the entire evolutionary framework, does this reduce confidence regarding the mechanism of evolution, including how macroevolution works? This question is often met with silence. Some seem surer than others.

There’s really only one difference between artificial and natural selection. In artificial selection, it is the breeder rather than nature who sorts out which variants are ‘good’ and ‘bad’ (Coyne, 2009)

It was brilliant. He (Darwin) took something very familiar and comfortable, for example, animal breeding, and explained that the same sort of thing was going on in nature, just at a little bit different pace and with no human guide. (Carroll, 2009)

However, as will be illustrated, artificial selection and natural selection differ greatly in what they sort and in the relative meaning of “good” and “bad.” In fact, the confusion between artificial selection and natural selection, and the mixed usage of the two, represents one key flaw in traditional evolutionary thinking. Natural selection does not have an “eye” or a “desire,” but a breeder does. This distinction changes the whole game. First, artificial selection represents a selection force with a defined direction of selection. For example, in dog breeding, the selection criteria have been constant for many generations. As a result, the cumulative effect is often quite obvious. In natural selection, however, the direction of selection is ever changing over long periods of time, swinging in multiple directions in response to changing environments. A few decades of specific conditions can be followed by different and often opposite

conditions for an equal length of time. In nature, there is usually no fixed cumulative effect in one direction over the long term (that might be the reason why, among 250 carefully analyzed fossil sequences, only 5% showed directional evolution, while the remaining 95% showed random walk or stasis) (Hunt, 2007). In contrast, the majority of artificial selection displays directional evolution. In addition, artificial selection is often carried out within controlled environments that differ from conditions in the real world, with its constantly changing genome–environment dynamics. Such variables also interfere with the results even when the selection force is constant. Thus, using observations from artificial selection to infer the cumulative effects of natural selection can be misleading. Yet, the accumulation of changes observed during artificial selection has been the key assumption extrapolated to explain how natural selection works and in particular has been used to explain how microevolution leads to macroevolution. Second, artificial selection deals with specific features or traits (body weight, size of egg, etc.) by changing the natural environment. In contrast, natural selection focuses on the overall survival and reproduction of individuals with their genomic package, rather than on their isolated features in unreal selective conditions. By providing sufficient food and eliminating real competition, researchers can artificially make some feature or trait dominant, but the results are not representative, as most features are not separated in natural selection. Fundamentally, by changing the very nature of competition, artificial selection already alters the key assumption of how natural selection works. Therefore, the two types of selection will lead to very different patterns of phenotypic change.
For example, domestic dogs can reach 180 pounds compared with a maximum of 60 pounds in wild dogs, and even in the largest extant member of the Canidae, the gray wolf, males average only 100 pounds. There are reasons that evolution did not push animals to run ever faster or to grow ever bigger or taller. The “good” features in artificial conditions might actually be “bad” in nature, as natural selection produces an all-around healthy species, while artificial selection often can lead to unexpected side effects in real environments. In addition, artificial selection creates a more linear model system that differs greatly from the complexity of nature. Predictability is very high in artificial selection, while in nature predictability is very low. Much has been learned from artificial selection experiments in which using pure strains to push for maximal features can lead to undesirable consequences. The domestic dog represents one of the most dramatic long-term (over 15,000 years) artificial evolutionary experiments. Descended from a single species, the gray wolf, dogs have evolved through an ancient lineage that is now extinct (Ostrander et al., 2017). Many breeds display remarkably exaggerated features following

continuous selection and accumulation. However, the price is high, as many breeds are linked with high rates of cancer and other diseases. For Dalmatians, perfect spots (a desirable feature from human artificial selection) can be associated with many health problems, including deafness. In recent years, the consequences of overbreeding thoroughbreds for horse racing have been referred to as “a moral crisis,” following a series of incidents in which animals suffered catastrophic injuries on live television at major racing events (Jenkins, 2008). It is now known that to produce the perfect racehorse (both fast and light), breeding has focused on thoroughbreds with huge muscle concentrations but light bones. These purebreds have become faster over the years, but they have also become more fragile. Clearly, many features are packaged together for a given species and cannot be separated in nature. The success of the human species is one such example. It is the package rather than any single dominant trait that makes us successful. Simply exaggerating some features, a strategy that works well in artificial selection, does not work in nature, as such individuals would not survive there. This is simply not how natural evolution works. Again, if it has not happened in nature, it should not be used to support natural selection. Of course, natural selection also can work on some specific features, but often in a much less dominating way. For individual animals in natural settings, each feature is heterogeneously distributed among individuals, which provides the population heterogeneity. Importantly, each feature is not expressed at its maximal level because of the dynamic environments, and for a given species in a given region, the overall gene landscape is stable.
For example, when malaria became a serious problem, deleterious mutations of specific genes involved in red blood cell function (responsible for sickle cell anemia) suddenly gained a selective advantage. In regions with high malaria transmission, this gene variant thrived because the benefit of malaria resistance outweighed the negative impact of sickle cell disease. However, selective forces will surely change during long periods of evolution. The frequencies of specific gene mutations can rise and fall depending on the current environment (individuals can also migrate), and other challenges will also arise, such as HIV, which could favor the increase of other genetic variants. From a long-term evolutionary perspective, as long as the genome survives, most combinations of gene mutations will come and go. This concept can also explain why the same type of fitness can be achieved by very different gene combinations in different populations. For example, Tibetans in Asia, Andeans in the Americas, and Ethiopians in Africa have all acquired the ability to live at extremely high altitudes. Interestingly, the patterns of genetic adaptation among the Andeans are

largely distinct from those of the Tibetans (Beall, 2000). This information challenges the strategy of dissecting out common genes that are supposed to be responsible for a common fitness in nature. Such dissection might be more successful in artificial experimental settings, where many variables are eliminated, but the findings may not be transferable to nature, as the genome–environment interaction package represents an emergent feature that often cannot be dissected into separate parts. Third, in artificial selection, breeders can match ideal pairs based on desired features. In general, the artificial selection process deals with much smaller and purer populations. In natural selection, the mating process is much more random, and it is impossible to maintain such pure breeding. More importantly, artificial selection, like natural selection in very isolated environments, deals with a limited population, which quickly establishes common features among individuals but does not reflect most natural conditions, which normally involve open populations. In large populations, it is difficult for a specific change to become dominant. Mayr cautioned that evolutionary mathematical models could not completely capture the natural situation where most populations are open. Another important reality is that, unlike in artificial selective conditions and extremely isolated, exceptional natural settings (like small isolated islands), species in general can migrate to avoid specific selection pressure. If animals can leave certain areas, they will do so rather than evolve or die off under stress in an environment, as might be dictated by a certain gene status. Moreover, in artificial selection, the outliers can be selected for mating. Under a highly specific selective environment, the correlation between genotype and phenotype can be quickly established based on the continuous cross-mating among these outliers.
In fact, using outliers was what Mendel did when he linked genetic factors to tall and short pea plants (see Chapter 1). Obviously, it is rather difficult to replicate this in a natural population where the majority of individuals are not outliers. Interestingly, artificial selection based on cross-mating outliers with the same phenotype could reduce the side effect of sexual reproduction (when, by chance, these outliers share a similar genotype). It is known that artificial selection primarily works by gene recombination rather than mutation, and the process of sexual reproduction can break down the selected “good combinations” of genes responsible for some features. The thoroughbred breeding of outliers can lead to fast evolution when there is not too much disruption of the selected gene combinations (by meiosis). Fourth, most artificial selection deals with microevolution. Regardless of the efforts of breeders, highly diverse dogs are still dogs despite all their drastically dissimilar features. Natural selection, in contrast, must operate within and among all the different species (according to Darwin’s

proposal). Traditional explanations have stated that given enough time, artificial selection will generate new species. This point was illustrated by continuous cultures of bacterial cells in experimental conditions. As has been illustrated previously, however, the evolution of bacteria and that of eukaryotic cells with sexual reproduction are fundamentally different because of sex-mediated genome constraints, making the bacterial comparison invalid (see Chapter 5). In fact, both micro- and macroevolution can be observed in asexual species such as bacteria and cancer cells. Fifth, the power of artificial selection is constrained by the genome. Despite the rapid initial response of many artificially selected traits, many of them reach a plateau after a number of generations. After that, the response to artificial selection becomes much slower or barely noticeable. Such a trend suggests that there is a system constraint that likely prevents or drastically slows the accumulation of certain features enhanced by artificial selection. In other words, there seems to be some unknown factor that prevents the assumed transition from micro- to macroevolution during long-term experiments. The likely explanation is that the genome defines the range of phenotypes. Selection only works within such a range, which is directly influenced by fuzzy inheritance. Such constraints of artificial selection are clearly observed in some of the longest artificial selection experiments humans have ever performed, including dog breeding and the selection of modern corn. Despite their highly diverse phenotypes, different dogs maintain the same karyotypes. The story of how corn was artificially selected from teosinte (a Mexican grass) is an amazing example, illustrating that there always is a limit to how far artificial selection can go.
From a phenotypic point of view, there have been drastic, almost unrecognizable morphological changes between teosinte and modern corn, which demonstrates the power of artificial selection to alter features over long periods. Even more surprisingly, their karyotypes are still the same, or similar enough, for the two to cross-mate with each other (despite a gene profile difference involving a few genes) and generate offspring with typical features of teosinte and corn. This key observation allowed George Beadle to study teosinte–maize hybrids, which solved the mystery of how modern corn evolved. The fact that they still have the same number of chromosomes and a remarkably similar arrangement of genes along the chromosomes demonstrates that (1) nearly 10,000 years of artificial selection has brought about huge morphological change but has failed to generate a new species and (2) the stable karyotype seems not to have been impacted by artificial selection. Based on the newly realized function of sexual reproduction (Chapter 5) and the realization that teosinte and corn belong to the same species,

the conclusion is that genome constraint is the most important factor in preventing new species formation, and it is more important than any morphological feature. Again, gene changes can come and go, but the boundary of the system has been maintained by the karyotype, which codes the system. The above reasoning makes it obvious that artificial selection mechanisms should not simply be applied to natural selection, particularly when such inferences are applied over long periods of time. Interestingly, despite the difference between artificial and natural selection, such inferences usually work only when looking at short-term adaptation, especially when focusing on some features. The key problem comes when trying to bridge short-term adaptation to long-term evolution for speciation. However, this is exactly what Darwin did. As direct evidence for evolutionary changes at the species level was lacking, Darwin turned to artificial selection for help. Based on the observation that breeders can achieve cumulative selection, he believed that the same could be true in nature, particularly when there is sufficient time. The key differences between artificial and natural selection challenge Darwin’s premise that the accumulation of small changes leads to the greater change of speciation. Although many scholars now realize the limitations of using artificial selection to explain natural selection, few question Darwin’s proposed mechanism of natural selection, which is primarily based on the workings of artificial selection. Interestingly, before the establishment of neo-Darwinism, Darwin was criticized for his generalization from artificial selection to natural selection because the traits he discussed (body size, coat color) are hardly the characteristics that define new families and genera.
In addition, the fact that evolution might have causes other than natural selection was one reason why biologists accepted the concept of evolution many decades before they accepted natural selection (Coyne, 2009; MacNeill, 2011). Since the New Synthesis (1910–1960), and particularly in the molecular evolutionary era, evolutionary studies have been dominated by population genetic analyses that trace and compare genes within and among species. However, establishing the heritability and spread of relatively minor genetic mutations in the laboratory, or tracing the spread of minor changes in wild populations, is an easier but different question. It does not address how macroevolution happens, which is the same question that was not addressed in Darwin’s book. Remarkably, there have been far fewer critical evaluations of neo-Darwinism, probably because most biologists now study genes rather than the larger picture of the interplay of components in nature. Many neo-Darwinists are proficient at studying gene frequencies thanks to the tools provided by molecular biology that facilitate analysis of gene families and gene trees of evolution. This focus on the gene means that the idea that evidence for

microevolution is proof of macroevolution now dominates. In addition, the confusion over parts inheritance and system inheritance (Chapter 4) has further convinced many researchers that the mechanism of macroevolution can be understood by identifying speciation genes, as gene changes are the driving force of evolution. Perhaps today’s reluctance to criticize any aspect of Darwin’s theory is also based on its having been formed in the mid-1800s, when there was no theory of genetics. It is thus not quite fair to judge him using today’s knowledge. It should be pointed out, however, that in this case it is not the difference in genetic knowledge that matters but the overall framework of thinking. The misstep that Darwin made regarding the mechanism of natural selection arose not from a lack of knowledge of genetics but from the assumption and inference he used. It is interesting to imagine Darwin reading the letter and paper sent to him by Gregor Mendel, but it would not have made sense to him, as his evolutionary observations cannot be explained by the parts inheritance approach. The idea that small changes accumulate into large ones seems obvious and logical and in fact has convinced many scientists, including T. H. Morgan, the father of the gene theory. Morgan started his genetic research as a saltationist, as he believed that saltational mutations and not selection created novelty. He was determined to demonstrate that mutations could produce new species. His experiments with fruit flies, however, showed that rather than creating new species in a single step, mutations increased the genetic variation in a population. Although he never demonstrated that mutations causing wing size change could lead to new species, he extrapolated that if small mutations could be accumulated in a stepwise fashion, new species would form over periods of time long enough to allow the accumulation of small changes.
Similarly, in Sewall Wright’s adaptive landscape evolutionary model, such extrapolation was also used to link small adaptations within species to speciation. For example, Wright studied the effects of inbreeding on guinea pigs and domesticated shorthorn cattle. He demonstrated that new variations can be established in small isolated subpopulations, and that this fixed variation can be introduced into the main population. Based on this, he believed that this was how new species often arise in nature. Unfortunately, there is no such dominant linear mechanism of selection in the natural setting by which gene accumulation leads to new species. With the advent of the gene theory, neo-Darwinists missed the same point Darwin initially missed. The sophisticated knowledge of modern genetics has further entrenched the confusion between macro- and microevolution by assuming that the proposed general mechanism of evolution, which works by accumulating gene mutations and/or changing allelic frequencies in a population, explains both types of evolution.

Similar to Darwin, neo-Darwinists have ignored the conservative aspects of evolution (through genome constraint) and as a result have limited their theory, as evolution and system constraint or conservation define each other: without conservation, there would be no evolution. In other words, with only conservation, species will not evolve, while with only evolution, there will be no species. Perhaps it is not quite fair to ask Darwin and neo-Darwinists to separate the mechanisms of short-term adaptation and long-term evolution above the level of species. If Darwin had been aware of the genome theory, he might have modified his thinking, as short-term and long-term evolution are clearly different. A change to neo-Darwinism to acknowledge the obvious differences between “parts inheritance” and “system inheritance” is therefore proposed (Chapter 4). With the genome theory reconciling the evolving and conserving aspects of evolution by illustrating the relationship between gene dynamics and genome constraint, the central importance of the distinction between micro- and macroevolution should now be realized.

6.4 BOTH ISOLATED CASES AND ISOLATED NATURAL ENVIRONMENTS REPRESENT EXCEPTIONS THAT FAIL TO DEMONSTRATE THE RELATIONSHIP BETWEEN MICRO- AND MACROEVOLUTION

One might argue that in addition to artificial selection, there is a great deal of direct evidence from the natural environment. This would include the decades-long follow-up on Darwin’s finches from their “natural laboratory” and the rise and fall of the peppered moth. Although these observed microevolution cases are impressive and suggestive, they do not address the issue of macroevolution. It should be pointed out that many of these famous examples are indeed exceptions among the majority of evolutionary case studies. It is thus important to further analyze the uniqueness of these cases and their limitations when applied generally. First, why does an isolated environment that has led to the beautiful observation of rapid Darwinian adaptation represent an exception? Generally speaking, natural populations are very large, and adaptation effects are small and hard to observe. Neo-Darwinian thinking suggests that long periods of time are required to accumulate specific genes within these large populations. Now, if we add back the reality that the environment is continually changing and creating different selective pressures, an accumulated dominant gene effect is quite unlikely to emerge in a large population over time. This might explain why neutral alterations can be accumulated through drift mechanisms. In contrast, in isolated environments like small oceanic islands, rapid microevolution

can be observed relatively more easily, as species can evolve more rapidly in isolation and, according to neo-Darwinian theory, a specific allele can spread quickly. Darwin and others have observed such cases. However, observing rapid microevolution (short-term adaptation) on islands is one thing. Inferring that this type of fast-paced adaptation in isolated areas can lead to speciation (macroevolution) in large general populations is a totally different matter! Yet, the observation of drastic variation in mockingbirds or Darwin’s finches among the various islands was crucial to Darwin’s development of his theory of natural selection to explain evolution, as he speculated that the distribution of mockingbirds and tortoises might "undermine the stability of Species" (Darwin and Keynes, 2000). Clearly, this observation served as a key reason behind the linking of microadaptation to the macroevolution of speciation. Why should this inference be challenged? To illustrate this point, we need to focus on the data of Peter and Rosemary Grant, who followed in Darwin’s footsteps and studied Darwin’s finches. This husband and wife team has spent part of each year since 1973 on a tiny volcanic island in the Galapagos. What is amazing is that, for over 20 years, they patiently measured hundreds of birds and recorded their diet of seeds in the “natural laboratory.” They studied the relationships among the types of seeds available (hard or soft) as influenced by weather, the beak changes in Darwin’s finches in the context of the food supply, the differential survival of these birds, and adaptation across generations (Weiner, 1994). What they found is fascinating and critical for understanding both the power and limitations of evolutionary adaptation: 1. The types of seeds (food supply) serve as a powerful selection force. During a severe drought in 1977, there were fewer seeds available.
The small, soft seeds were quickly eaten by the birds, and only the large, tough seeds, which were normally ignored by the finches, remained. Under such selection conditions, smaller finches with less-powerful beaks perished as they could not open the hard seeds. 2. The large-beaked survivors had more offspring. After only one generation, natural selection had increased the beak size by 10%, a rate of evolutionary change far greater than any known from the fossil record. Clearly, the evolutionary adaptation response to the environment led to a larger-beaked finch population and is direct evidence of microevolution. Alternatively, such beak size changes can be simply explained by shifting the spectrum of fuzzy inheritance. 3. However, there was no long-term accumulation of these rapid adaptations, creating a paradox between short-term dynamics and long-term stasis. More interestingly, as expected by the genome theory, in such an extreme environment, the swing back could be

equally abrupt, wiping out the previous effect of evolution and again reshaping the population. Over time, the boundary of a phenotype does not change for a given species. There is a great deal of discussion about the significance of the Grants’ data. A number of points must be emphasized: First, under extremely isolated natural conditions, like the 100-acre island called Daphne Major where the Grants collected their data, evolution can progress very quickly, so quickly in fact that it would have stunned Darwin, who thought natural selection required long periods of time and could not be directly observed. However, the rapid evolution observed by the Grants represents an exception, as natural evolution in general is usually much slower, as Darwin expected. Although exceptions can be effective for testing a hypothesis, they remain exceptions to the rule. Second, 20 years of observing evolution on the tiny island have compellingly demonstrated that artificial selection and natural selection are fundamentally different. The effects of natural selection do not accumulate, as adaptation can proceed in multiple directions over a long period. As the Grants noted, seven years after the severe drought in 1977, unusually rainy weather quickly reversed the direction of adaptation, and the birds with smaller beaks were the ones that survived and produced the most offspring. Again, only in extreme cases can rapid fluctuation of evolution be observed. In conclusion, despite the potential rapid speed of microevolution, there is no constant additive or cumulative trend or direction that natural selection takes over an extended period of time (the fluctuation of evolutionary selection forces can effectively wipe out the increased beak size that resulted from a drought, and it is no wonder that no such drastic rate of evolution has been recorded in the fossil record). This observation does not fit the hypothesis that small accumulations eventually lead to major evolutionary changes over long time periods.
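
The contrast between a constant and a fluctuating direction of selection can be illustrated with a minimal numerical sketch. It is not a model fitted to the Grants’ data. It simply assumes, hypothetically, that every selection episode shifts the mean trait by the 10% reported above for the 1977 drought, and that an episode in the opposite direction exactly reverses that shift. The function name, the symmetric reversal, and the ten-episode horizon are illustrative choices, not values from the source.

```python
def mean_trait_trajectory(directions, response=0.10, start=1.0):
    """Return the mean trait value after each selection episode.

    directions: sequence of +1 (larger trait favored) or -1 (smaller favored).
    response:   proportional shift per episode (assumed; 10% echoes the
                one-generation beak change cited in the text).
    start:      initial mean trait value in arbitrary units.
    """
    value = start
    trajectory = [value]
    for direction in directions:
        if direction > 0:
            value *= 1.0 + response   # episode favoring larger values
        else:
            value /= 1.0 + response   # assumed exact reversal of one episode
        trajectory.append(value)
    return trajectory


episodes = 10  # hypothetical number of selection episodes

# Breeder-like regime: the direction of selection never changes.
constant = mean_trait_trajectory([+1] * episodes)

# Fluctuating regime: drought-like and wet-like episodes alternate.
fluctuating = mean_trait_trajectory([+1, -1] * (episodes // 2))

print(f"constant direction:    {constant[-1]:.2f} x starting mean")
print(f"alternating direction: {fluctuating[-1]:.2f} x starting mean")
```

Under these assumptions, the constant regime compounds to roughly 2.6 times the starting mean after ten episodes, whereas the alternating regime ends where it started even though each single episode is just as rapid. Real selection episodes are neither equal in size nor perfectly symmetric, so the sketch shows only why rapid short-term change need not accumulate.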
Third, although the features of the population undergo constant change, rapid adaptation is not macroevolution, as the species does not change permanently. Thus, one must not use short-term rapid microevolution to infer long-term macroevolution, as they are very different. This point is well stated by Lynn Margulis: “No changes of this magnitude, correlated with other traits that would produce a newly named species of Galapagos finch, were seen by the Grants, or anyone else. The Darwinian paradigm is operating exactly as it should: Different traits (whether within species or among different species) are varying in prevalence according to the demands of the environment. Obviously, the genes that produce these traits are varying in like fashion. But there is no evidence whatsoever that this process is leading to speciation.” (Margulis and Sagan, 2003)


6. BREAKING THE GENOME CONSTRAINT

Finally, the true general evolutionary story in nature is much more complicated than a single identifiable trait such as beak size. It is the genome-defined overall system package that matters. For example, in addition to the food supply, disease, predators, extreme environments, and many other factors can shape a population in the evolutionary long run, making it difficult to study these multiple aspects simultaneously. However, that is how evolution works at the genome package level in nature. Again, this is another example of attempting to simplify a complex system in order to understand it (Heng, 2015). It turns out that in complex systems, the traits that can be easily traced are often trivial, while the traits that are hard or impossible to trace are always important. The above analyses were based on the results of the first 20 years of observations made by the Grants. More recently, the Grants have published a new book summarizing the data collected during the past 40 years (Grant and Grant, 2014). It is thus very interesting to see the evolution of their evolutionary studies in the second half of the 40-year window and, in particular, whether there is any new insight that was not available from the first 20 years. Indeed, there are some exciting discoveries. (1) Although the selection dynamics were already known from the first 20 years of observation, the 40 years of data make this point much stronger. They forcefully demonstrate that selection (its type, direction, and speed) can change over time. (2) Perhaps the most important observation is the initiation and gradual establishment of a new lineage, which now behaves as a new species. On the surface, the Grants’ “watching evolution in action” experiments have strongly supported natural selection by providing direct evidence of how evolution occurs on a tiny island. 
Following more careful analyses, however, and especially through the lens of the genome theory (Chapter 7), many phenomena and conclusions need to be reinterpreted to reveal important hidden information, some of which is indeed not consistent with current evolutionary concepts. First, regarding the speed of natural selection, many have claimed that the speed of evolution is much faster than Darwin’s prediction. “It does not take millions of years; these processes can be seen in as little as 2 years” (Zimmer and Emlen, 2012). However, Darwin is likely correct about the slow speed of natural selection under normal natural conditions. The tiny island with extremely limited space (Daphne Major is less than half a square kilometer in size) creates exceptional natural conditions, similar to artificial selection conditions, which surely differ from selection conditions in huge landscapes or even on large islands. In particular, under the drastic environmental conditions of the Galapagos Islands, where up to 80%–90% of birds on these small islands die during drought years, such highly selective conditions will

6.4 BOTH ISOLATED CASES AND ISOLATED NATURAL ENVIRONMENTS


likely push the speed of evolution to its maximum. Moreover, without predators and other types of competitors, the relationship between beak size and survival advantage can be observed in a nearly linear model system (nature has created a system unlike others). Yes, according to Rosemary Grant, “those extreme examples would give us the opportunity to measure the climate variations that occurred and the evolutionary responses to those changes” (Singer, 2016), but it must be pointed out that extreme cases often illustrate the exception rather than the rule. The Grants’ data showed how fast natural selection can go under extreme (thus nonrepresentative) conditions, but not how fast natural selection is in general. In conclusion, the “evolution’s astonishing speed” observed in one generation likely represents an exceptional case. One can imagine that for birds that live in a huge forest, the environmental change would not be that drastic, and birds could move around to escape some local harsh conditions. Of equal importance, unlike in artificial selection, the direction of natural selection is not fixed over long periods of time. As a result, the swinging of the environment can cancel out the short-term changes produced by natural selection. The Grants’ data have beautifully illustrated this point. The larger beaks gained quickly under one environmental condition (severe drought) can be quickly reduced to smaller beaks when the environmental condition changes to the opposite one (bountiful rain). This is a very important observation for explaining why long-term evolution is slow, and it also supports the viewpoint that artificial selection conceptually differs from natural selection. In conclusion, because the changes observed in the short term cannot accumulate, environmental swinging plus time will lead to slow evolution in the long run. 
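The contrast between a fixed and a swinging direction of selection can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions, not a model fitted to the Grants' data: the step size of 0.05 mm per generation, the 7-generation swing period, and the function names are all hypothetical choices made for the example.

```python
def net_change(generations, direction_of, step_mm=0.05):
    """Net shift in mean beak depth (mm) after summing per-generation shifts.

    direction_of(g) returns +1 (selection for larger beaks) or -1 (smaller).
    step_mm is a hypothetical per-generation shift, not a measured value.
    """
    steps = sum(direction_of(g) for g in range(generations))
    return steps * step_mm


def constant_direction(g):
    # Direction never changes, as in artificial selection.
    return 1


def swinging_direction(g, period=7):
    # Direction flips every `period` generations (drought vs. wet phases).
    return 1 if (g // period) % 2 == 0 else -1


directional = net_change(42, constant_direction)
fluctuating = net_change(42, swinging_direction)
print("constant direction:", directional, "mm")
print("swinging direction:", fluctuating, "mm")
```

With an equal number of drought-like and wet-like phases, the per-generation shifts in this toy model cancel exactly, while the same shifts applied in one direction add up. Real environments will not alternate this regularly, so the cancellation would be approximate rather than exact.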
The majority of species will not drastically change or change stochastically in terms of direction, as the fossil record has shown. Thus, it is unlikely that the accumulation of microevolution will lead to macroevolution in the long run. Second, regarding new species formation, it is a very exciting observation that the big birds represent a newly established lineage (species), but the mechanistic explanations could be very different if the discussion focused on the key prediction of neo-Darwinian theory rather than on one or two isolated features. Moreover, there are some much better explanations when the correct level of genomic information is considered. The question of how new species are formed represents one of the three important questions the Grants had when they initiated their journey (Singer, 2016). Forty years later, despite the fact that fast phenotype evolution coupled with environmental changes has been observed, there is no evidence that the accumulation of small changes over time can lead to speciation. In contrast, by “accident,” a few big birds from other islands entered their theater in 1982 during the heavy rains of El Niño, and


one of them (a hybrid that had backcrossed with one of the parent species) crossed with local birds, which initiated a new lineage. They slowly increased their numbers; when the drought came 11 years later, this population could in fact influence the food supply on this tiny island. However, when the island had a serious drought (2003–2005), the entire big bird lineage was wiped out except for a brother and sister. When the rain came, mating between the only two survivors resulted in 26 offspring, among which only 9 survived to breed. Life goes on, an inbred lineage is still there, and whether or not this lineage will become a sustained species is yet to be seen. Nevertheless, this story represents a highly significant event. According to Rosemary Grant: “We had often argued that if birds that had genes from other species flew to another island with different ecological conditions, then natural selection would shape them into a new species. We never thought we’d see it happen, but we did.” (Singer, 2016). Peter Grant further explained that, based on natural selection, there are three key steps in the traditional model of speciation: 1) the colonization of a new area with a distinctive ecological environment (promoting changes); 2) species evolution under natural selection (accumulation of changes); and 3) repetition of this process (going to new areas, accumulating additional changes) until sufficient changes are accumulated between the parental and new populations so that once they come in contact again, they are separate species. The Grants concluded that their work has supported the traditional model of speciation. Furthermore, they have shown other routes to speciation, like the model of gene flow (Singer, 2016). The Grants linked the rare speciation event observed to new genes (gene flow) and natural selection. However, there are many more important messages. 1. 
The speciation event seems to have little to do with the accumulation of selected beak size, the key evolutionary feature under their investigation for 40 years. The new species does not come from natural selection working on the existing species in the theater (even though it is believed that beak size is clearly related to feeding strategies and reproduction, and together, these factors can promote the development of new species). Rather, it came from a rapid hybridization event, which most likely involved chromosomal-level changes rather than individual gene changes. Such a macroevolutionary event seems to have occurred before traditional natural selection. It is anticipated by the genome theory of evolution that the chromosomal coding was altered when the big bird crossed with local birds and produced birds with altered but stable genomes.


2. The fact that the majority of the big bird lineage died during the drought, despite the advantage of their big size, further suggests that factors other than size alone are being selected on. If survival were based only on the advantage of beak size, the major discovery of the study, such a lineage should have experienced less death. 3. The dynamics of the big bird lineage illustrate the importance of luck: what if the drought had left only one bird to survive instead of a brother and a sister? As the new model of speciation states (in a later section, see Fig. 6.2), the luck of meeting a reproductive partner with a similar or compatible genome, and the luck of being in environments with less competition (allowing the new species to grow), are of ultimate importance for many new species to be successful. 4. As pointed out in the Grants’ latest publication (Lamichhaney et al., 2018), the big bird lineage shows that “reproductive isolation, which typically develops over hundreds of generations, can be established in only three.” It also needs to be pointed out that genome alteration-mediated reproductive isolation is often achieved within one to a few generations (Fig. 6.2), whereas specific gene-mediated reproductive isolation, if it occurs, might take hundreds or thousands of generations. Moreover, despite the fact that genome reorganization can be rapidly achieved, and under rare conditions a new species is born, many more generations are still needed to make a specific species visible (with a certain population size). The big bird story fully supports such an idea. Another example is the evolution of mouse body size. Based on the evolutionary rates of a mouse colonization study by Philip Gingerich, in just 10,000 years, the mouse should be as large as an elephant if the small changes are additive and accumulate in the same direction (Coyne, 2009). 
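To see how small the per-generation change in the mouse example needs to be, the compounding can be worked through directly. The following is a rough sketch under loudly stated assumptions: the body masses (20 g for a mouse, 5,000 kg for an elephant) and the figure of one generation per year are hypothetical round numbers chosen for illustration, and are not taken from Gingerich or Coyne.

```python
def per_generation_rate(start_mass_g, end_mass_g, generations):
    """Constant proportional change per generation that turns start into end."""
    return (end_mass_g / start_mass_g) ** (1.0 / generations) - 1.0


# Hypothetical round numbers, for illustration only.
MOUSE_G = 20.0             # assumed mouse body mass in grams
ELEPHANT_G = 5_000_000.0   # assumed elephant body mass in grams
GENERATIONS = 10_000       # assumes one generation per year for 10,000 years

rate = per_generation_rate(MOUSE_G, ELEPHANT_G, GENERATIONS)
print(f"required change per generation: {rate:.4%}")

# Check: compounding that rate over all generations recovers the target mass.
final_mass = MOUSE_G * (1.0 + rate) ** GENERATIONS
print(f"mass after {GENERATIONS} generations: {final_mass:,.0f} g")
```

Under these assumptions the required change is on the order of a tenth of a percent per generation, which is why a strictly additive, same-direction accumulation would produce such an absurd outcome over 10,000 years.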
The fact that a mouse is still a mouse and still quite a bit smaller than an elephant emphasizes the following: (1) The long-term evolutionary rate is much slower in nature, and estimations based on artificial selection and even natural colonization are off target. Note that colonization studies are 500 times faster than rates of fossil change, while laboratory selection is a million times faster! But the key difference is that the natural setting in general and especially over the long term fundamentally differs from both artificial laboratory selection and isolated special environments in the short term. (2) The real lesson is that we should not use artificial selection data or exceptional observations in nature to explain the fossil record or natural evolution in general. (3) More importantly, genome constraints rather than gene-mediated adaptation define species and prevent us from confusing a mouse with an elephant. Again, body size is just one trait we


measure, but it is the genome package that defines a species (including its maximum size which cannot be superseded by natural adaptation). This analysis nicely illustrates the relationship between individual trait dynamics and system identity (package) constraints. Perhaps the best supporting evidence that challenges the assumed accumulative relationship between microevolution and macroevolution comes from cavefish research. Using the “natural laboratory” of the cave environment to study evolution has its advantages of examining the accumulative selection on phenotypic features and the species, as it could mimic (to a certain degree) artificial selection. Among a diverse group of cave animals (flatworms, mollusks, arthropods, and vertebrates), a strong convergent evolution has resulted in a connection of phenotypic changes including loss of eyes and pigment, enhanced tactile sensitivity, lower metabolic rates, and increased longevity (Culver and Culver, 1982; Jeffery, 2008). Therefore, cave life can serve as good material to study the genomic mechanism of such strong selection. By briefly summarizing recent cave animal studies (especially cavefish), the following interesting points stand out. 1. The power of constant directional selection has been illustrated by cave life studies. Regardless of what the mechanisms are for the evolution of the cave phenotype, a strong convergent phenotype has emerged. 
Many mechanisms have been proposed, including (1) direct selection, such as selection on the visual system, which is relaxed in darkness, allowing for the accumulation of gene mutations affecting eye function, or eye loss, which can save energy; (2) indirect selection based on antagonistic pleiotropy between regressive and beneficial traits, such that functional eyes were sacrificed to allow the development of more “useful” characters (like more taste buds and an enhanced lateral line system); and (3) the accumulation of neutral mutations in the absence of natural selection (Yamamoto, 2004; Jeffery, 2008). 2. Diverse genetic and epigenetic alterations are associated with (in a dynamic relationship of both contributing to and resulting from) the same phenotype. The story of the teleost Astyanax mexicanus, with both cave and surface-dwelling forms, illustrates this point well (Borowsky, 2008). Based on the observation that many mutations at different loci have accumulated in different populations (often within different caves), each cave-adapted population clearly displays a different trajectory of genetic evolution that impacts eye development. Genetic complementation experiments, to restore visual function between independent populations, have demonstrated that different gene mutations have been involved in different populations, as when closely related but


geographically separated blind cavefish populations were crossbred, a large portion of their hybrid offspring had vision. Furthermore, different selective factors specific to cave life, such as scarcity of food, have been examined in Mexican cavefish to explain some adaptive behaviors, including starvation resistance and binge eating when food becomes available. As a result, mutations of the melanocortin 4 receptor (MC4R) gene have been identified, which may contribute to the insatiable appetite found in some populations of cavefish (Aspiras et al., 2015). Of course, many more genomic and epigenomic mechanisms can be linked to cavefish phenotypes, and the list of gene mutations will grow fast. Recently, epigenetic profiles have been examined for Pachón cavefish, in which there are no inactivating null mutations in essential eye development genes. This study showed that changes in DNA methylation-based gene repression can serve as an important mechanism leading to phenotypic diversity during development and evolution (Gore et al., 2018). 3. In most cases, cave animals have survived long after the extinction of surface-dwelling ancestors, suggesting that the constant cave environment, albeit representing an extreme habitat, can be good for the propagation of these species, as unexpected environmental changes always challenge the survival of living systems. It can also be explained by the idea that environmental constraint contributes to the preservation of species. 4. Despite the long term of evolution with drastic morphological changes, these cavefish maintain their species identity. Specifically, their core karyotypes have likely been maintained during nearly a million years of natural selection (directional selection under highly stressful conditions), as they can cross with fish near the surface. Clearly, the million-year experiment of evolution in the cave failed to push cavefish into new species. 
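The complementation result mentioned in point 2 (crossing independently derived blind populations yields sighted hybrids) can be illustrated with a toy two-locus model. This is a minimal sketch under strong simplifying assumptions: blindness is treated as fully recessive at each of two hypothetical loci, each cave population is fixed for a loss-of-function allele at a different locus, and the locus and allele names are invented for the example. The real cavefish genetics involves more loci, which is consistent with only a portion of hybrids regaining vision.

```python
from itertools import product


def gametes(genotype):
    """All distinct gametes from a genotype given as one allele pair per locus."""
    return set(product(*genotype))


def is_sighted(genotype):
    # Assumption: one functional (uppercase) allele per locus is enough,
    # and every locus must be functional for vision.
    return all(any(allele.isupper() for allele in locus) for locus in genotype)


def cross(parent1, parent2):
    """All offspring genotypes from combining one gamete of each parent."""
    return [
        tuple((a, b) for a, b in zip(g1, g2))
        for g1 in gametes(parent1)
        for g2 in gametes(parent2)
    ]


# Hypothetical populations: each is blind because of a different locus.
cave_pop_1 = (("a", "a"), ("B", "B"))  # loss of function at locus A
cave_pop_2 = (("A", "A"), ("b", "b"))  # loss of function at locus B

hybrids = cross(cave_pop_1, cave_pop_2)
within_pop = cross(cave_pop_1, cave_pop_1)

print("parents sighted:", is_sighted(cave_pop_1), is_sighted(cave_pop_2))
print("hybrids sighted:", [is_sighted(h) for h in hybrids])
print("within-population offspring sighted:", [is_sighted(o) for o in within_pop])
```

In this toy model each parent supplies the functional allele the other lacks, so the hybrids regain vision while offspring from within a single population do not, which is the logic behind concluding that different mutations are involved in different caves.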
Similar to the corn story, this is a very important demonstration that the genome constraint is not changed after a million years of directional microevolution. Together, all cases discussed so far support the statement that, despite its general acceptance, there is no solid evidence to support the assumed relationship between accumulated microevolution and macroevolution. In light of the above analyses, it is useful to briefly mention the evolutionary story of three-spined sticklebacks. Sticklebacks represent an ideal system for studying the molecular basis of adaptive evolution, especially for natural selection-mediated speciation. Marine sticklebacks have successfully colonized and adapted to a large number of streams and lakes formed since the last ice age (between 10,000 and 20,000 years ago). Many consider the repeated phenotypic


changes and distinct gene pools between marine and freshwater stickleback forms highly significant, representing a good example of ongoing macroevolution. As marine-form and freshwater-form sticklebacks can cross with each other and generate fertile offspring, they clearly belong to the same species. The mechanism behind why different ecotypes do not merge into a single homogeneous population might be that there is a minimal degree of reproductive barrier, which will not be favored outside the “hybridization area” (in the lower reaches of rivers and streams in the northern hemisphere). This strong genomic barrier or constraint will likely involve chromosomal-level elements rather than gene-level elements to maintain these morphological, physiological, behavioral, and genetic differences among ecotypes when they come in contact with each other during the breeding season (when hybridization can take place) (Miescher, 2015). It is not known if the hybridized offspring display any disadvantages in natural conditions. Interestingly, the genomes of marine-form and freshwater-form sticklebacks have been sequenced; the reuse of globally shared standing genetic variation, specifically of chromosomal inversions (the complete inversion of three large gene regions), has been linked to the repeated evolution of distinct marine and freshwater sticklebacks and to the maintenance of various ecotypes during early stages of reproductive isolation (Jones et al., 2012). Again, the limited scale of chromosomal changes plays an important role in separating ecotypes within the species, but further chromosomal alterations rather than gene mutations are needed if new species are to emerge. In conclusion, neither artificial selection nor isolated cases of natural selection have provided solid evidence that clearly links the accumulation of microevolution to macroevolution. In fact, there are no reliable conceptual frameworks on this issue. 
Gene-mediated adaptation is very different from genome-mediated system constraint and genome alteration-mediated macroevolution. Thus, our somatic cell evolution model can actually provide insight into the relationship between gene-mediated microevolution and genome-mediated macroevolution in an evolving lineage, which parallels lineages of natural evolution. To illustrate this point, readers will need to appreciate the issue of evolutionary constraint and its implications for the new evolutionary thinking.

6.5 MAINTAINING GENOME INTEGRITY: THE MAJOR EVOLUTIONARY CONSTRAINT

The appreciation of evolutionary constraints represents one of the biggest advances in the past half century (Futuyma, 2010). Thus, realization of the discontinuity between micro- and macroevolution is


essential to explain how evolutionary constraints work and to establish the correct framework of evolution. Specifically, genome constraint provides the explanation for sluggish evolution (Gorelick and Heng, 2011).

6.5.1 Why Are Evolutionary Constraints Important?

One of the key challenges of the current evolutionary theories is solving the puzzle regarding how genetic organization can both promote and constrain organismal evolution. While it is well known that abundant genetic alterations can be observed in both experimental and natural conditions coupled with rapid local adaptation, the actual overall evolutionary speed seems to be slow, with a difficulty to adapt, and has recently been referred to as “sluggish evolution” (Futuyma, 2010). Based on fossil studies, for example, species were capable of displaying abundant change over decades to centuries, yet over millennia they were basically static (Eldredge et al., 2005). A recent large-scale statistical survey of evolutionary models in fossil lineages revealed that directional evolution is rarely observed and accounts for only 5% of cases (13 of 251) (Hunt, 2007). Note that many textbooks, however, highlight the exceptions, giving students the wrong impression that most fossil lineages display directional evolution. This is yet another example of using exceptions to describe a general rule. An even more fundamental issue is that evolutionary failure is commonplace (Bradshaw, 1991). It is difficult to explain the extinction of a vast majority of species as well as all kinds of evolutionary limitations (Futuyma, 2010) if evolution always results in successful long-term adaptation. Ironically, Darwin’s theory of natural selection has faced opposite challenges on the same issue of dynamics and constraint of genetic variants, albeit at different historical stages. Before the new synthesis, Darwinism faced the challenge of finding the needed variants for evolution. By 1900, “classic Darwinism, which envisioned the natural selection of minute, random, inborn variations of an essentially continuous nature, was widely dismissed as leading nowhere” (Larson, 2002). 
At that time, blending inheritance was dominant, and “even if an individual with a beneficial variation was more likely to survive, it would likely breed with a ‘normal’ individual, and their offspring would regress toward the species norm. Over time, continuous variations would be ‘swamped’” (Larson, 2002). This challenge has been addressed by neo-Darwinian theory, mainly for short-term microevolution, and mainly through theoretical and modeling-based analyses. Now, a new challenge is to explain the constraints of evolution and the mechanism of breaking them to form new species. This clearly represents another side of the evolutionary story: why short-term rapid adaptation often fails to lead to new species over time and how


nature solves the paradox of “changes, yet does not change (still the same species), but finally changes (into a new species).” Various factors have been analyzed to address the issue of limited evolution (or limited response to selection). Bradshaw suggests that the controlling agent limiting evolution is the supply of variation. As inheritable variation is a key component of evolution, examining genetic variation is a logical approach to understand the mechanism. Although there is evidence that genetic variation can limit evolution, it is insufficient to solve this issue. Many other mechanisms have been proposed including developmental constraint, stabilizing selection, population structure, ephemeral divergence, and system internal homeostasis (Bradshaw, 1991). One interesting trend lies in the viewpoint that the key evolutionary constraint might not be located at the genetic level but rather at the ecological level, as most genetic variations analyzed to date do not seem to account for evolutionary constraint.

6.5.2 Genome Integrity Represents the Major Evolutionary Constraint

The analyses in previous chapters will hopefully lead readers to appreciate that even though the main evolutionary constraints have not been identified at the gene level, one must not rule out the idea that genetic contribution at the genome level could be the most important evolutionary constraint. According to the genome theory, it is the genome rather than individual genes that is the unit of evolutionary selection (Heng, 2007b, 2009). Thus, the search for the main genomic contribution of evolutionary constraint should focus on the genome rather than genes. Based on this viewpoint, particularly after realizing that genome alterations drive macroevolution, it has been hypothesized that the long-ignored genome-level constraints will provide the key genetic mechanism for organismal stasis.

6.5.2.1 Why Is It Essential to Discuss Genome-Level Constraint?

Given the complexity of both the biological system and the evolutionary process, it is necessary to examine the issue of multiple levels of constraint and their dynamic interactions, including genes/epigenes, proteins, genetic and protein networks, the genome, the individual, populations, society, and ecology (Heng, 2009). As Futuyma provided a comprehensive summary and insightful analysis for many of these levels (Futuyma, 2010), the level of the genome will be the focus in this section, not only because it has been more or less ignored in traditional evolutionary studies but also because it represents the main level of constraint/control and defines potential interactions that may occur at


various lower and higher levels. Comparatively speaking, other types of evolutionary constraints are important but are not major determining factors when compared with genome constraints. Reducing the constraints at the ecological level, for example, can increase system instability. Under such instability, there are increased levels of short-term adaptation and genome alterations, which provide increased potential to form new species. However, until a new genome cluster is formed, there is no opportunity to pass on these altered genomes. A genome cluster refers to multiple individuals that share a similar genome and can mate. The appearance of genome clusters usually occurs during a transitional time when new species are appearing and becoming established. As the importance of the genome has been discussed in previous chapters, some of the conclusions that are directly related to this topic will be briefly summarized.

a. The genome codes for the package of an entire system.

Traditional genetic coding specifically refers to combinations of three nucleotides that determine the amino acids in proteins or RNA. 4D genomics, in contrast, emphasizes comprehensive coding as a system inheritance resulting from the genomic relationship between all genes and other structural/regulatory elements of the same system. Significantly, the genome-level information controls might not be directly stored within an individual DNA molecule but instead might exist within the topological relationship between genetic loci within the nuclei. This explanation also makes sense and agrees with various biological observations. The basic coding to make materials is universal among species that share a common ancestor. However, similar genes (materials or tools) can build a variety of genomic architectures with specific sets of chromosomes that define different species. Each genome represents a unique interaction or assembly code that is not shared by other species except for similarities among species that belong to the same genus and/or family. “Material” coding by DNA is essentially conserved among species because of its basic level of genetic organization. In contrast, architectural coding is highly dynamic and less predictable, as this new type of information is not coded within the DNA sequence but exists at the chromosomal level and is achieved through the self-organization principle based on the genes and genomic topological relationship. Such genome-level properties are inherited for each species and ensured by genome conservation.

b. Sex safeguards genomic integrity.

Maintaining genome integrity has been a hot topic in molecular biology. Proposed mechanisms include but are not limited to various DNA repair systems, cell cycle checkpoints, programmed cell death,


robustness of network, the separation between germline cells and somatic cells, and epigenetic resetting. Recently, sexual reproduction has been recognized as serving a key function in maintaining genome integrity in most sexually reproducing eukaryotes (Chapter 5). The genome context can be preserved by maintaining both the composition of the chromosomes and the gross order of genetic loci along each chromosome. As such, sex serves as a filter to eliminate most significant alterations at the chromosome level while allowing for gene-level changes. Thus, meiosis serves two main functions: it promotes gene changes but limits genome alterations.

c. Sex-mediated genome integrity ensures long-term evolutionary stability.

Because sex can preserve the boundary of the system, it provides an ideal balance between genomic dynamics and constraint in sexually reproducing organisms. In the short term, the dynamics of genes occurring through mutation, genetic recombination, and splicing can effectively enable evolutionary adaptation by passing on the genome while ensuring over the long term that the genome maintains its framework. During each normal reproductive cycle, most accumulated genetic alterations at the genome level will be eliminated, resetting the system back to the original status. Despite some small-scale copy number variation and some retrovirus integrations that can be accumulated, for most sexually reproducing species, the genome framework is basically the same generation after generation. In contrast, asexual organisms constantly go through both micro- and macroevolution, as there are no effective mechanisms like sex to purify their genomes and preserve clear-cut genome-defined boundaries.

d. The genome, not the gene, is the macroevolutionary selection unit.

Extensive discussion has occurred on why genes or individuals should or should not be the primary object of natural selection (Mayr, 1997). It is proposed that the genome is the primary object of selection, especially at the macroevolutionary phase (Heng, 2009, 2015). The following points further support this concept. (1) The genome and gene represent different levels of genetic organization, where the genome determines the system inheritance, the essential component of macroevolution (see Chapter 4). (2) There is a separation between individuals and genomes: some individuals with altered genomes (e.g., XXX, XYY, and individuals with Down syndrome) can pass a normal genome on to the next generation with higher frequency because of the sexual filter. (3) There is a distinction between germline cells (genome to be passed on) and the final function of somatic cells. Note that this difference can contribute to the individual variations that occur outside the boundary of the genome.

6.5 MAINTAINING GENOME INTEGRITY


Thus, there is a difference between the success of a phenotype and the genome types that get propagated (Heng, 2010). There are two points worth discussing here. In microevolution, selective contributions from individual genes can be measured if they display relatively simple traits. However, most genes are not independent information units but rather depend on a genome-defined network within specific environmental conditions. Selection acts on the genome package rather than on specific genes. Of course, the genome defines the boundary of the genetic potential of a species, which includes large numbers of combinations of genes that are displayed by different individuals. In macroevolution, selection at the genome level is critical during speciation as the newly formed genome must survive against competition. (The traditional concept insists that speciation is often caused by polyploidy or geographic isolation that results in large genetic drift, neither of which requires any selection. However, genome-level selection is obvious in cancer evolution.) Once a species is stably established, the genome serves as a constraint to maintain the system: sexual reproduction prevents drastic genome-level changes in the germline, while gene/genome dynamics at the somatic cell level promote individual survival. In short-term adaptation, gene and epigene dynamics are high, whereas in long-term selection, where the selective forces are constantly changing, gene mutations/epigene alterations come and go. As long as the population size is healthy and different individuals display variable features (the interactive results of environment and different genetic profiles defined by the genome), a given species can persist by passing on the core genome, which also defines various potential phenotypes under different environments.
No wonder Mayr has stated:

“When re-reading my analysis, I was quite surprised how rarely I had to refer to the genetic aspects responsible for the phenotype. Apparently, it does not matter very much how the genes are combined or how much the genotype has to be modified, provided the resulting phenotype is favored by selection. What counts is the adaptedness of the end product.” (Mayr, 1997)

Mayr is partially correct because all phenotypes are products of the genotype/environment interaction. All selected individuals will also pass the core genome (the representative karyotype), which contains constantly changed gene and epigene landscapes.

6.5.2.2 Different Factors Contribute to Genome Constraint

The realization that most eukaryotes achieved evolutionary constraint by genome integrity preserved by sexual reproduction is of importance. It


6. BREAKING THE GENOME CONSTRAINT

addresses the relationship between conservation of the genome and the dynamics of gene mutations and epigenetic alterations. It should be noted that many other factors, such as environmental and even social interactions, can either promote or reduce genome constraint. The balance between different levels’ dynamics and constraints is important. For example, increased genetic diversity at the gene level is useful for individual survival, but the conservation of the genome ensures that the surviving system can be sustained over the long run without changing into a novel system. Interestingly, the higher level of system constraint will diminish the impact of lower level dynamics over the long run despite its importance to organismal adaptation in the short run. This can be explained as a new version of the Red Queen story: try everything to maintain life including gene mutations, yet do not alter your genome to change who you are. Only the existence of the genome matters in the long run. Furthermore, how the system constrains its genomic system also defines the pattern of evolution. In eukaryotes, which all contain chromosomes, sexual filters make altering the genetic identity of a given species difficult as most of the chromosomal modifications will be eliminated by sex. However, when speciation occurs, the reorganization of the genome can produce very different genomes, and some of them may have the potential to make big leaps in evolutionary terms. In contrast, while prokaryotes can evolve both by micro- and macroevolution, their macroevolution is often less drastic compared with organisms with the capability of chromosome-based genome reorganization (e.g., the amount and degree of creating new and complex genomic information). The formation of typical chromosomes and the establishment of sex thus represent key evolutionary events that separate these two major groups. Accordingly, the evolutionary patterns are different among them. 
It is likely that in prokaryotes, the ecological constraint is of equivalent importance to the powerful sex-mediated genome constraint.

6.6 IMPLICATIONS OF GENOME THEORY TO EVOLUTIONARY CONCEPTS

Understanding the essential role of genomic constraint in evolution is critical. Before the Darwinian era, species were generally thought to be a fixed entity created by God. Then, Darwin’s evolutionary ideas raised acceptance that species are created through descent with modification: they are entities in long and constant flux with many intermediate forms. Despite the dominance of Darwinian views of evolution in modern science, the mechanism of species formation is largely unknown. Many interesting inferences based on gene-centric thinking sound very convincing. “Mendelian genetics enabled Darwin’s idea of natural selection to


be accepted and developed by providing a mode of inheritance in which selection can operate” (Hurst, 2009). However, if increased genomic knowledge challenges traditional Mendelian genetics, and especially if Mendelian inheritance only explains exceptional cases (Chapters 1 and 2), we have to search for a new evolutionary paradigm. A growing number of voices support a viewpoint that challenges gradualism. As Fodor and Piattelli-Palmarini put it, “the textbook cases of Mendelian inheritance, in spite of their great historical and didactic importance, are more the exception than the rule” (2010). The new paradigm must be able to explain general rather than exceptional cases (even though a good theory should explain both general and exceptional cases, which is not the case for many current biological theories). In particular, it needs to be based on the correct inheritance theory. It should be able to explain many common phenomena, such as different evolutionary patterns between sexual and asexual species, different genomic mechanisms between micro- and macroevolution, different environmental impacts on evolution (in normal and crisis conditions), different selective landscapes (slow adaptation or fast survival), and different types of cellular evolution, including cancer and organismal evolution. Clearly, the genome theory and 4D genomics will have a major role in establishing new evolutionary concepts. First and foremost, 4D genomics brings a better concept of system inheritance and fuzzy inheritance into evolutionary thinking, allowing us to consider the entire genome system rather than only the genes, which are parts of the genetic organization. This change can provide a basis to distinguish between gene-based microevolution and genome-based macroevolution. It also explains the limitation of using a 1D gene approach to study evolutionary traits and to illustrate macroevolution in particular.
Many traits cannot be separated because the genome represents a package of macroevolutionary selection. It would be extremely challenging to use individual genes or specific traits to understand the reality of the 3D genome context. This leads to an important point: the crucial additional dimension of time. In contrast to the previous practice where time was used as an “imagined bridge” to fill the gap between micro- and macroevolution, time here explains the unpredictability of genome evolution. These new concepts compellingly challenge many generally accepted viewpoints. The following are a few examples.

6.6.1 The Concept of Species

Defining a species is one of the most highly debated topics in biology and philosophy (also see Chapter 2). Under the influence of the stepwise and accumulative evolutionary model, it has been proposed that a species


cannot be clearly delineated as it is a dynamic concept. Where do you draw the line if a species represents a continuously changing entity? And yet this idea directly contrasts with the observation that most different sexually reproducing species are identifiable. As stated by Coyne, “the discontinuities of nature are not arbitrary, but an objective fact ... Although there is variation among individuals within a cluster, the clusters nevertheless remain discrete in ‘organism space’. We see clusters in all organisms that reproduce sexually.” (Coyne, 2009). It needs to be pointed out that “Darwin apparently didn’t see the discontinuities of nature as a problem to be solved, or thought that these discontinuities would somehow be favored by natural selection. Either way, he failed to explain nature’s clusters in a coherent way” (Coyne, 2009). The generally accepted definition of a species is Mayr’s biological species concept, which fits especially well with a majority of animal species. His concept focuses on the reproductive boundary. Now, realizing that there are no significant continuous changes within a given species at the genome level because of the sexual filter, despite the possibility of continuous change at the gene frequency level within a population, we should have confidence in defining a species as a fixed entity, at least for sexually reproducing organisms. From the genome theory point of view, individuals of the same species share a genome that defines species identity. Furthermore, the integrity of the genome is ensured by the function of sexual reproduction, the mechanism whereby the barriers of reproductive isolation are enforced. Thus, this viewpoint is in synchrony with Mayr’s definition of a species. One key limitation of Mayr’s biological species concept is that it does not apply to asexual species. As the genome defines biological systems, the degree of genome similarity should be used for defining asexual species.
In fact, successful sexual reproduction can be measured through the similarities of the genome, as only the same or a very similar genome can serve as the basis for sexual reproduction. In this way, reproductive compatibility becomes a practical and ultimate measurement of genome similarity (without calculation by an artificial standard). To provide a quantitative standard to measure similarities between asexual genomes for the purpose of classifying species, the use of the core genome concept has been suggested (Heng, 2007b, 2015). In species that undergo sexual reproduction, the core genome is fixed, reflected as a low degree of karyotype variation. It should be pointed out that a portion of individuals of any species will display altered genomes. Such variations will not have a huge impact on the population, as these individuals will have no chance to pass on their altered genomes under normal conditions (although under crisis conditions, altered genomes might have a better chance to survive and pass on). In fact, most altered genomes are produced de novo in each generation. They come and go,


with little chance of becoming dominant within the population. Therefore, the existence of altered genomes among individuals should not be used as evidence against the core genome concept. It would be interesting to investigate whether these altered genomes can be passed on for more generations in populations where altered genomes become more common. In humans, the core genome is 46 chromosomes, but we do have individuals with XXY, XYY, trisomy 21, and other chromosomal alterations. Despite the fact that some altered genomes can sometimes pass through the reproductive filter (e.g., a small portion of children born to mothers with Down syndrome are likely to have trisomy 21), altered genomes do not consistently pass through generations (as prevented by the mechanism of sex). In species without sexual reproduction, the core genome should be less cohesive. Traditionally, the similarity of DNA sequences has been used as a measurement for asexual species. For example, a bacterial species is defined as a collection of strains characterized by DNA with at least 70% cross-hybridization (Wayne et al., 1987). Because bacterial evolution often involves mixtures of micro- and macroevolution, a new set of standards needs to be established. This standard must include information regarding the topological relationship among genes, especially because the differences among microorganisms’ genomes are much greater than among mammals, as Venter’s group reported (Nealson and Venter, 2007). Even in some closely related microorganisms, it is difficult to identify a defined genome based on sequence diversity, possibly because of a lack of sexual filters that maintain a static, clonal species. This point is supported by computer simulation studies (Heng 2015; Ying et al., 2018).
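The contrast drawn above between a filtered, stable core genome and an unfiltered, drifting one can be illustrated with a deliberately simple toy model. The sketch below is not the simulation of Heng (2015) or Ying et al. (2018); it is a minimal illustration under stated assumptions. A karyotype is reduced to a single integer (its deviation from the core genome, 0), each offspring acquires a de novo alteration with probability `alter_rate`, and altered offspring pass the reproductive filter with probability `pass_rate`. The function name, parameter names, and all numerical values are hypothetical and chosen only for illustration.

```python
import random


def simulate(generations, pop_size, alter_rate, pass_rate, seed=0):
    """Return the mean absolute deviation from the core karyotype.

    Toy assumptions (not taken from the source text):
    - a karyotype is one integer; 0 stands for the core genome
    - alter_rate: chance of a de novo alteration (+1 or -1) per offspring
    - pass_rate: chance that an altered offspring passes the filter;
      1.0 means no filter at all (the asexual case)
    """
    rng = random.Random(seed)
    population = [0] * pop_size  # everyone starts with the core genome
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            child = rng.choice(population)       # inherit a parent's karyotype
            if rng.random() < alter_rate:        # de novo alteration
                child += rng.choice((-1, 1))
            # Core genomes always pass; altered ones pass only sometimes.
            if child == 0 or rng.random() < pass_rate:
                offspring.append(child)
        population = offspring
    return sum(abs(k) for k in population) / pop_size


if __name__ == "__main__":
    filtered = simulate(200, 200, alter_rate=0.05, pass_rate=0.02)
    unfiltered = simulate(200, 200, alter_rate=0.05, pass_rate=1.0)
    print("with filter:   ", filtered)
    print("without filter:", unfiltered)
```

In this toy setting, altered karyotypes keep arising de novo in the filtered population but rarely persist, whereas the unfiltered population tends to wander away from its starting karyotype. The model ignores phenotype, environment, and crisis conditions, so it only illustrates the logic of the argument and does not test it.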

6.6.2 The Origin of Adaptation Differs From Speciation

Despite the importance of macroevolution, current evolutionary theory fails to illustrate how it actually works. This fact was recently pointed out by Coyne (Coyne, 2009):

“A better title for The Origin of Species, then would have been The Origin of Adaptations: While Darwin did figure out how and why a single species changes over time (largely by natural selection), he never explained how one species splits into two. Yet in many ways this problem of splitting is just as important as understanding how a single species evolves ... For if speciation didn’t occur, there would be no biodiversity at all, only a single, long evolved descendant of that very first species.”

This brings us to a question that has troubled many evolutionary biologists ever since Darwin first published The Origin of Species. How can Darwinians address the key question of how a continuous evolutionary process leads to discrete species?


Mayr introduced the biological concept of a species by acknowledging that discrete biological groups are real, as reflected by the isolation of reproductive barriers (which in fact contradicts his evolutionary viewpoint, see Mayr, 2001). A similar contradiction can be found in Stephen Jay Gould’s evolutionary viewpoints, when he explained how punctuated equilibrium works in speciation. In fact, this contradiction is deeply rooted within the neo-Darwinian synthesis, which describes a species as a group that shares a common gene pool. While this synthesis has finally established that gene mutations can spread within populations and that gene frequency changes in populations represent a key mechanism for short-term adaptation, it does not address how accumulations of small gene-mediated adaptations lead to new species. It just promises that speciation will occur, given enough time. That is why there is still a crucial gap in understanding how macroevolution works in general and how speciation works in particular. We can answer very few questions about this process, despite many claims. How does evolution transform one gene pool into another, for example? How do gene pool changes lead to reproductive isolation? Is the process of establishing the reproductive barrier gradual/stepwise or sudden (in one to a few generations)? Which genomic levels (gene, epigene, or chromosomes) function as the key mechanism for reproductive isolation? If both genes and genomes are involved in speciation, which one is more important? Which stage occurs first? And why is it so difficult to identify genes responsible for speciation while most species display clearly distinctive karyotypes? Another important question concerns the function of geographic isolation in speciation. Sure, it seems obvious that geographic isolation is a precondition to initiate speciation as it can effectively stop gene flow (note that this model might be more relevant to asexual species).
But how applicable is it to sexual species, knowing the power of genome-level constraints, and knowing that the majority of animal and plant species are distributed without typical geographic constraints? Moreover, could distribution patterns related to geographic features be formed after the speciation rather than before it? Of course, the neo-Darwinian-based concept of speciation employs the assumption that in most species, speciation occurs through the accumulation of small gene changes. Here is the real problem: if this assumption cannot apply to the majority of speciation cases, then neo-Darwinian explanations must be revisited or even reinterpreted. As discussed in previous chapters, cellular adaptation within the gradual phase is fundamentally different from within the phase transition. Such a concept should be examined for speciation. Interestingly, the cancer evolutionary model (where a karyotype is formed during the macroevolutionary phase before a large cellular population is formed


during the microevolutionary phase) not only mimics cases of organismal speciation but can also be used to explain many puzzles in speciation, including the relationship between continuous adaptation and discontinuous speciation, the pattern of fossil records, and speciation after mass extinctions.

6.6.3 Genome Theory: Defining the Concept of Chromosome-Mediated Speciation

Claiming that the chromosome is the key instrument of speciation is not a new idea at all. As presented in the discussions in Chapter 2, debates regarding the pattern of speciation occurred even before Darwin published The Origin of Species, which insists on gradualism in evolution as in geology. In fact, before Darwin, biologists had generally accepted saltationism, which favors immediate speciation. As early as 1822, Étienne Geoffroy Saint-Hilaire argued that species could be formed through sudden transformations, for example. Nevertheless, the importance of chromosome-mediated speciation has failed to capture most evolutionary biologists’ imaginations so far, even though the vast majority of species can be differentiated by karyotypes. The concept of chromosomal alterations or genome changes leading to speciation has been promoted by Richard Goldschmidt, Barbara McClintock, Michael White, Max King, and Gabriel Dover, just to name a few, and is strongly supported by directly observed cases in plants and animals. Mainstream reluctance to accept chromosome-mediated speciation is largely because of gene-centric thinking that considers chromosomes simply as vehicles for genes: novel species are completely defined by new genes and have no relation to the chromosome. This attitude of favoring genes at all costs has prevented many genetic researchers from seeing the big picture: genomic identity.
For example, as mentioned in Section 2.4.2.2, Goldschmidt had proposed two hypotheses: chromosome-based “systemic mutations” and gene-based “developmental macro mutations”; his latter gene-based “hopeful monster” became popular. McClintock called for focusing future biology on the genome; most researchers have latched onto her work on jumping genes instead. Dover’s ideas on molecular drive suggested that genomes are riddled with ubiquitous genomic mechanisms of turnover; these genome instability-mediated chromosomal mechanisms are overlooked in favor of biased gene conversion (Dover, 1982; Fodor and Piattelli-Palmarini, 2011). By the way, if chromosomal aberrations are involved, they should not be spreading across populations but rather form reproductive isolation and spread in new populations. In the current cancer genome project, even though it is known that high genomic heterogeneity is the key behind most cancers, people ignore overwhelming genome chaos in favor


of focusing on common driver gene mutations. More examples can be found in Chapters 1 and 2. As the very fact that the majority of animal and plant species display unique karyotypes has failed to convince most researchers to appreciate the importance of chromosome-mediated speciation, we must change the conversation. Clearly, the issue is not whether or not evidence exists, but how to change attitudes to see what obviously exists. According to Thomas Kuhn, for people to see fact, they must first accept a new conceptual framework. Scientists with certain worldviews can only see facts that fit their own paradigm. Introducing the genome theory will provide such a new paradigm. In the past two decades, the genome theory has established the following concepts to define chromosome-mediated speciation, which, hopefully, can bring about readiness for its own acceptance:

1. Karyotype-defined system inheritance represents the most important genomic information. Changing such information impacts speciation. For decades, some gene-centric evolutionary biologists have argued that there is no evidence in genetics that major chromosomal rearrangements outweigh gene-level aberrations in producing morphological effects. With the ample discussions on this issue in previous chapters, and in particular, with the new genomic concept where genomic topology defines the platform of network interactions, it becomes obvious that genome reorganization is often more influential than the changes of individual genes. Such a framework highlights the importance of chromosomal aberrations in biological functions including speciation (see Chapter 4).

2. The main function of sexual reproduction is to maintain the chromosomal coding by providing the key constraint for a given species. Such a concept also explains the chromosomal alteration-mediated mechanism of reproductive isolation and speciation. Traditionally, some neo-Darwinians consider chromosomal alterations as incidental to speciation.
On the surface, it may be easier to explain how two isolated populations accumulating differences between their gene pools can gradually become new species over time. Deep down, however, such a concept is problematic. First, it depends entirely on a key untested assumption (which, considering genome-based analyses, is highly likely to be incorrect). Second, even if gene pool differences can form new species over long periods of time, chromosomal alterations are still the most common features that separate different species in animals and plants. Thus, a key mechanism of introducing chromosome-mediated reproductive isolation is still required at a certain point during speciation. Meanwhile, as of yet, there is no certainty that genes cause speciation, despite extensive research (Chapters 1 and 2).

3. Genome reorganization under stress represents a major driver for speciation. The understanding of genome chaos not only explains massive genome reorganization under crises but also links stress to nearly all types of genome alteration-mediated speciation. There are many examples of chromosome-mediated reproductive isolation and speciation. However, there is no commonly accepted mechanism or framework to unify these diverse examples. The “why chromosome” question has been answered by the previous two points, as the altered chromosome can effectively lead to reproductive isolation and potentially lead to new phenotypes (with new genomic coding). But why do chromosome changes occur in the first place? The answers come from cellular evolutionary studies. As discussed in Chapter 3, the analyses of two extreme cases have shed light on this issue. The first case is the common cause of genome chaos. High levels of system stress, either internally or from the environment, trigger survival mechanisms during cellular crises, leading to massive genome reorganization which can create any potential genome imaginable. The second case is the common cause of low NCCA frequencies in normal tissues from healthy individuals. Researchers were perplexed: why, under normal physical conditions, does this low background of seemingly random chromosomal variations exist? It is straightforward to understand the linkage between induced NCCAs and both internal and environmental stresses. Years of research pinpointed the background of NCCAs as one type of fuzzy inheritance (Chapter 4). Under stress, cellular adaptation requires this baseline of genome variations as a spectrum of potential.
As soon as the relationship between stress and genome variations is accepted, many reported phenomena of chromosome-mediated speciation will make sense. The same mechanism that generates variants for adaptation and survival at the somatic level provides the future potential of speciation. Here are some cases, each with a brief examination.

1. Spontaneous chromosome alteration-mediated speciation

There are many examples that illustrate how numerical and structural variations of chromosomes are involved in speciation. These cases show that in many closely related species or subspecies, the chromosomal difference, including chromosomal translocation, inversion, duplication,


fusion (especially Robertsonian translocations), represents the most dominant distinguishing genotypic feature. Classic examples include karyotype evolution between humans (46 chromosomes) and chimpanzees (48 chromosomes) through chromosomal fusion and between Indian muntjacs (6/7 chromosomes for female/male) and Chinese muntjacs (with 46 ancestral chromosomes) through multiple chromosomal fusions. In fact, for mammals, Robertsonian translocation represents the most effective process in chromosome-mediated speciation (Garagna et al., 2001). It has been extensively studied for black rats (Yosida et al., 1974), house mice, mole rats (Nevo et al., 1994), and rams, to name a few. Furthermore, different types of chromosomal variations are preponderant among different groups (suborder, family, and genus) of primates. For example, “Robertsonian translocations are preponderant among the Lemuridae (44/57), but are nonexistent among the Pongidae. Chromosome fissions are very frequent among the Cercopithecidae (10/23), but were not found elsewhere, and pericentric inversions are preponderant in the evolution of Pongidae and man (17/28)” (Dutrillaux, 1979). Interestingly, different types of chromosome alterations occur spontaneously during human reproduction, although frequencies are low and a clear majority is eliminated by sexual filters.

“... Altered sperm and eggs will usually not be fertilized. Even if they are, paternally and maternally transmitted chromosomal aberrations may lead to pregnancy loss, developmental defects, infant mortality, infertility, and genetic diseases in offspring (Marchetti and Wyrobek, 2005). Such events will further preclude the transmission of an altered genome, reducing the overall probability of genome diversity or evolvability.” (Heng, 2007)

Clinical observations fully support this statement. A high percentage of spontaneous abortions in humans display chromosomal abnormality (Warburton et al., 1986). In a recent 872-case study of Robertsonian translocations identified from a single laboratory, 93% of the balanced Robertsonian translocations observed were from adults with infertility, miscarriage, or offspring with known chromosomal abnormalities (Zhao et al., 2015). “In a sense, sexual reproduction provides multiple barriers that maintain order at the genome or chromosomal level” (Heng, 2007b). Interestingly, despite all these barriers and filters at multiple stages of reproduction, individuals or small groups of individuals can very occasionally be observed displaying the transmission of altered karyotypes, often under some special circumstances. There are stunning examples of some individuals displaying 44 chromosomes as contributed by homozygosity for a Robertsonian translocation (in contrast to the normal number of 46). Martinez-Castro et al. reported a family with three homozygous carriers for a Robertsonian (13q14q) translocation (44, XX or XY, t(13q14q),


t(13q14q)) (1984). The same type of Robertsonian (13q14q) homozygosity was also observed in a different family (Eklund et al., 1988). Homozygous individuals with (14; 21) and (14; 15) translocations were also reported (Dallapiccola et al., 1989; Song et al., 2016). It is quite possible that there are more cases out there which have gone unnoticed. These very special individuals with new karyotypes (e.g., homozygotes with 44 chromosomes) usually do not display drastically different phenotypes. Under radically new environments, however, they likely will display new features which evolutionary selection can then work on. The fact that new stable karyotypes do form within the human population from time to time forcefully illustrates the importance of spontaneous chromosomal variations in speciation. While these spontaneous chromosomal variations are seemingly only linked to phenotypes that are abnormal or moderately disadvantageous in the current environment, they can function as key materials for macroevolution and can potentially display different phenotypes under different conditions, which explains why so many are present within a population. Traditionally, spontaneous chromosomal aberrations are considered products of bio-errors, caused by stress and/or accidents. Such an explanation does not make sense now that the high level of fuzzy inheritance has become obvious. There are conflicting phenomena: on one hand, there are high frequencies of chromosomal variations in both the sperm and egg; on the other hand, multiple reproductive and developmental filters wash out most of them. On one hand, many of these “escaped” new variants are linked to diseased phenotypes; on the other hand, very few of them can lead to new species.
Connecting the dots, these conflicting phenomena reflect not simply a strategy of wasting energy but a costly insurance policy to maintain fuzzy inheritance at the chromosome level: while preserving the core genome and a certain degree of heterogeneity of an existing species, it also searches for potential new system emergence, just in case. The relationships among NCCAs, clonal chromosome aberrations (CCAs), and key evolutionary phase transitions observed in cancer evolution fully support the above analyses (Chapters 3 and 4). Theoretically speaking, the births of these individuals with homozygous stable karyotypes (such as individuals with 44 chromosomes) are highly significant for speciation. According to our new speciation model (see Section 6.7.1), individuals with new stable karyotypes have achieved the initial success of speciation, even though they most likely will be eliminated before becoming a stable and long-lasting species. Nevertheless, as humans have clearly benefited from such a “genome package-based selection strategy” (involving a chromosomal number transition from 48 to 46), we should not downplay the significance of these 44-chromosome individuals. They are examples of constantly present


altered genomes, outliers from the core genome, which can function as potential materials for speciation. Another potential advantage of speciation through limited chromosomal changes (e.g., Robertsonian translocations) is the open option of swinging back to the core parental genome. In many species, individuals with Robertsonian translocations still have certain compatibility to their parental populations in terms of reproduction, although the compatibility is not ideal. Interestingly, the reconstruction experiments of the evolutionary histories of chromosomal inversions in Drosophila persimilis and Drosophila pseudoobscura show that “contrary to widely accepted ideas, these inversions existed as polymorphisms in the ancestor of both species before their initial split” (Fuller et al., 2018). Clearly, chromosomal alterations are the preconditions for speciation.

2. Hybrid speciation: chromosomes play an important role

Since the 1990s, as various genomic methods have become more successful, hybrid speciation has received increased appreciation. This is especially true in plants, as hybridization is more common in plants than in animals. Many commercial fruits, vegetables, flowers, and garden herbs are hybrids. In recent years, cell fusion (somatic hybrids) has been under investigation to better understand the relationship between polyploidy and chromosomal instability, as cell fusion is a common phenomenon in cancer evolution which can trigger polyploidy, aneuploidy, translocation, and massive genome reorganization (Duelli et al., 2007; Zhang et al., 2014; Heng et al., 2016a). It is anticipated that the dynamic relationship between polyploidy and structural genome reorganization observed from cell fusion can contribute to organismal evolution through hybrid speciation.
The general idea is that when the genome is unbalanced because of hybridization, which often involves different chromosome sets, cellular stress can further destabilize the hybrid genome and promote genome reorganization. During this process, environments can select winners with both phenotypic advantages and stable genomes. There is now evidence to support this idea. First, hybridization can lead to chromosomal variations. As illustrated by a recent publication, mating between captive night monkeys Aotus azarae boliviensis (2n = 50) and Aotus lemurinus griseimembra (2n = 53) has produced hybrids. Among four analyzed hybrid individuals, two display de novo genomic and karyotypic alterations, including trisomy, translocation, and mosaicism (Hirai et al., 2017). In fact, it has long been known that hybridization in plants can generate chromosomal aberrations (possibly through cellular stress), including multipolar meiosis, chromosomal breakage, and sticky chromosomes, which result in asynchronous anaphase I disjunction or nondisjunction
(Beadle, 1933; Pessim et al., 2015; Klasterska and Natarajan, 1975). Barbara McClintock’s genetic earthquake experiment represents a classical example. Second, chromosomal changes clearly play an important role in speciation. For example, by studying the two occasionally hybridizing North American species D. pseudoobscura and D. persimilis, it was concluded that chromosomal inversions may contribute to the speciation process, which explains the abundance of chromosome arrangement differences between closely related species in the same geographies (Noor et al., 2001). Further analyses suggested that chromosomal rearrangements may facilitate species persistence despite hybridization (Brown et al., 2004). This agrees with the genomic finding about the chromosomal relationship between marine and freshwater sticklebacks, where chromosomal inversions are responsible for the maintenance of various ecotypes during the early stages of reproductive isolation (Jones et al., 2012). Hybrid speciation in animals is rare and mainly belongs to homoploid hybrid speciation (where both parents must have the same number of homologous chromosomes). Based on the above cases, attention is needed to examine subchromosomal variations such as smaller inversions and deletions/duplications. Third, hybrid-generated polyploidy can lead to diverse karyotypes: hybrid species are more common in plants than in animals (especially in flowering plants), as plants seem to be more tolerant of polyploidy. All flowering plants have undergone at least one whole genome duplication episode in their evolutionary history (Blanc and Wolfe, 2004; Jiao et al., 2011). Furthermore, it is likely that many plants with diverse karyotypes might be directly derived from polyploid plants.
This process is called diploidization following genome duplication, in which large-scale genome reorganization occurs, involving loss of repetitive DNA, chromosomal rearrangements (e.g., fusion and fission), and gene loss (Dodsworth et al., 2016). Interestingly, in cancer evolution, such diploidization occurs rather frequently, especially during the major evolutionary transition when genome chaos occurs. It was recently suggested that chromosomal rearrangement following genome duplication, not duplication itself, is responsible for the macroevolution of angiosperms.

“Polyploidy is important for the generation of genetic and genomic novelty, but it also requires extensive genome reorganization in order for this evolutionary potential to be fully realized (i.e., ‘diploidization’).” “Diploidization is necessary for evolutionary persistence and diversification. Diploidization of the genome postpolyploidization is associated with neofunctionalization, subfunctionalization and genome downsizing.” Dodsworth et al., 2016

Further studies on this issue are urgently needed.

3. Genome chaos: massive speciation during crisis
Genome chaos-mediated speciation has the power to directly and quickly change the chromosomal coding which defines a species. This is why it is potentially the most important mechanism of speciation. In Earth’s evolutionary history, enormous numbers of new species emerged after each massive extinction. Such observation suggests that there is an important transition between species before and after massive extinctions, but what is the mechanism for such a transition? Interestingly, a similar transition can be observed in cellular evolutionary models. As illustrated in Chapters 3 and 4, the key transition is achieved by massive and rapid genome reorganization, which generates large numbers of cells with altered genomes. The cellular evolutionary transition has thus provided a model with which to examine the mechanism of speciation following mass extinctions. The key elements/stages of genome chaos-mediated macrocellular evolution are as follows:
a. High stress generated by crises (internal and environmental);
b. The available karyotype can no longer survive under crisis conditions, triggering massive cell death;
c. Massive cell death initiates survival mechanisms (achieved by massive genome reorganization);
d. New mixed cell populations are formed with heterogeneous karyotypes. There are high numbers of transitional karyotypes. As a result, the diverse cell population is less stable and evolves continuously by genome reorganization (through fusion/fission cycle, cellular collaboration), accompanied by cell death.
e. Stable genomes are selected. In this stage, genome-based macroevolutionary selection favors survivable genomes and cares less about small advantages such as growth rate.
f. One or more cellular populations become dominant following clonal expansion through microevolutionary selection such as the oncogene-promoted growth advantage.
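
The sequence of stages a to f can be made concrete with a toy stochastic simulation. The following is only an illustrative sketch, not a model taken from the experiments described in this book: the function name and every parameter value (survival probability, per-round death and stabilization probabilities, growth factors) are hypothetical placeholders chosen for illustration.

```python
import random
from collections import Counter


def simulate_crisis(n_cells=1000, survival_prob=0.02, death_prob=0.3,
                    stabilize_prob=0.2, max_rounds=100,
                    expansion_rounds=10, seed=0):
    """Toy walk through stages a-f; all parameters are hypothetical."""
    rng = random.Random(seed)

    # Stages a-b: the crisis kills most cells carrying the original karyotype.
    n_survivors = sum(rng.random() < survival_prob for _ in range(n_cells))

    # Stages c-d: every survivor starts with a reorganized, unstable genome.
    unstable = n_survivors
    stable_karyotypes = []  # labels of karyotypes that reached stability

    # Stages d-e: each round, an unstable lineage dies, stabilizes,
    # or goes through another round of genome reorganization.
    for _ in range(max_rounds):
        if unstable == 0:
            break
        still_unstable = 0
        for _ in range(unstable):
            draw = rng.random()
            if draw < death_prob:
                continue  # lineage eliminated
            if draw < death_prob + stabilize_prob:
                stable_karyotypes.append(len(stable_karyotypes))
            else:
                still_unstable += 1
        unstable = still_unstable

    # Stage f: clonal expansion; each stable karyotype receives a
    # random (hypothetical) per-round growth factor.
    sizes = Counter()
    for label in stable_karyotypes:
        growth = rng.uniform(1.1, 2.0)
        sizes[label] = round(growth ** expansion_rounds)
    return n_survivors, sizes


survivors, sizes = simulate_crisis(seed=1)
print("survivors of the crisis:", survivors)
print("stable karyotypes selected:", len(sizes))
print("largest clonal populations:", sizes.most_common(3))
```

Under these assumptions, only a subset of the surviving lineages ever reaches a stable karyotype, and the final population sizes are decided by growth differences that play no role in the earlier, survival-based stages.
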
In the stage of macroevolutionary selection, “environmental isolation” can be introduced by subculturing surviving cells into different flasks rather than keeping them in one flask. Many more stable populations emerge when these different flasks are used. In a sense, different flasks provide opportunities for different clonal populations to become dominant, which hints at the role of geographic isolation in organismal selection. If similar principles and overall patterns (system behavior) can be used to understand the macroevolution of organisms during massive
extinctions, the cellular macroevolutionary model predicts the following order of events for speciation following a mass extinction:
a. Massive extinction (which is a well-known fact);
b. A huge wave of new species emerges. Most have rapidly evolved from previous species via genome reorganization (this event can be illustrated by comparing the synteny relationship among different species);
c. As many newly formed species display less stable genomes, there is continuous genome reshuffling until a stable genome is achieved. During this period, some transitional species may be eliminated. Some old and new species, as well as intermediary species, might hybridize because the reproductive isolating condition is less stringent at this stage. This stage thus favors species radiation;
d. During the crisis stages (stages 1-3) (see Figs. 3.11 and 6.2), the main selection is macroevolutionary selection based on the survival of genomes;
e. Each new species with a new genome has its own respective speed of evolution, which may be fast, slow, or at any tempo in between. Meanwhile, microevolution starts to work to increase the population size for each species;
f. The majority of species formed with stable genomes will last very long, usually until the next crisis.
Of course, because this predicted model is based on the understanding of cellular evolution, the main purpose of introducing it at this stage is to initiate its discussion in the evolution research community. Nevertheless, many organismal speciation case studies support this model. For example, while the fossil record shows that mass extinctions are consistently followed by episodes of rapid diversification (Erwin, 2008), recent analyses have revealed some surprises. Notably, mass extinctions also affect the pace of evolution, not just in the immediate chaotic aftermath but for millions of years to follow (Krug and Jablonski, 2012).
Such reset long-term evolution rates can be explained by the behavior of the new genomes. First, these newly formed genomes are different from their ancestors before the mass extinction. Although they inherited the genomic materials and module information (synteny) from their ancestors, they are different from those ancestors. Why? It is because they can survive in the new world with their new genomes. Different genomes code different systems and display different patterns of evolution. Second, in the long run of microevolution, strong genome constraints by sexual reproduction will prevent drastic karyotype changes. Most species will remain the same despite some short-term modifications by microevolution, resulting in stasis survival until the arrival of a new round of extinctions (as historically evidenced in fossil records).

Compared with spontaneous chromosomal aberration and hybridization, genome chaos is the phenomenon most related to the emergence of massive new species under crisis conditions. However, even though we have used mass extinctions to make a point, genome chaos can also be triggered by local environments, as well as other genome alteration events, on a limited scale. For example, the speciation of gibbons most likely involved genome chaos. As a part of the same superfamily as humans and great apes (Hominoidea), the gibbons’ karyotype displays massive chromosomal changes from the common hominoid ancestor. In total, there are 24 major chromosome rearrangements involved between the karyotypes of the presumed gibbon ancestor and the hominoid ancestor. In addition, 28 more rearrangements differentiate the various currently living gibbon species from that presumed gibbon ancestor (Carbone et al., 2006). Interestingly, no specific common sequence elements are shared among independent rearrangements, suggesting the likelihood that genome chaos is involved where nonhomologous end joining represents one key mechanism (Liu et al., 2014; Ye et al., 2018a, 2018b). Moreover, individual hybridization events can also lead to genome chaos, albeit to a much smaller degree compared with genome chaos during mass extinctions. One example is the exceptional speed and diversity of speciation of the haplochromine cichlid fishes of Africa’s Lake Victoria. In the last 150,000 years, over 700 diverse cichlid species have evolved. It was recently demonstrated that hybridization between two divergent lineages facilitated this process (Meier et al., 2017). It would be interesting to examine the karyotypic relationship with fast speciation where genome chaos is involved.

4. The core genome concept: limited fuzzy inheritance within populations
A critical view toward using chromosomal coding to define species argues that for some monkey species, different individuals can display slightly variable chromosome numbers. It is thus unreliable, some claim, to define a species using chromosomal coding when the karyotype is not a fixed identity. Knowing that such a situation also applies to many species, including humans, it is helpful to discuss the concept of the “core genome” for a given species. The core genome is a common karyotype with defined genomic features (number of chromosomes, order of genes along each chromosome). It is shared by the majority of individuals in a given species, and it is protected by the function of sex to eliminate most drastic variants. Occasionally, however, some minor variations can escape the filters or constraints. Most of these variants are unable to pass themselves on to future generations, as they are often associated with reduced
reproductive potential. Infertility in humans, for example, is surprisingly common, occurring in approximately 15% of the population, and research clearly links chromosomal variations with infertility (Harton and Tempest, 2012). Because of the mechanism of fuzzy inheritance at the chromosome level, however, there are always some individuals with altered chromosomes, most of which occur de novo. The fact that the human population includes individuals with Down syndrome, as well as individuals who carry various chromosomal inversions and translocations, does not challenge the usage of 46(XX) and 46(XY) to describe the core genome for Homo sapiens. While a limited number of individuals can occasionally find mates with similar genomes and produce offspring, it is extremely difficult for them to become a successful subpopulation under normal natural selection conditions (see Section 6.7.1). Note that many factors influence the core genome’s fuzziness, including environmental dynamics (highly unstable environments can increase the portion of altered genomes within a population), differences among different species (each of which has different tolerances of fuzziness), different stages of speciation (newly formed species are often less stable), and the presence of artificial selective conditions (under artificial conditions, some hybrids can mate and their offspring can survive; this does not occur in natural conditions).

5. Altered karyotypes and evolutionary certainty: understanding the genotype-phenotype relationship
Molecular researchers influenced by gene-centric strategies often have a hard time understanding the karyotype-phenotype relationship because of a lack of conceptual and experimental platforms. The convenient way to study specific gene function is by inactivating the targeted gene to study gained or lost functions, assuming there is a direct causative relationship between the gene and the phenotype.
In spite of these serious caveats, a similar approach has been used in cancer research to link specific genes to genome instability-mediated karyotype alterations (Ye et al., 2018b). But it is a real challenge to study how altered genomes impact phenotypes as large numbers of genes/pathways can be involved. One can only imagine how much harder it would be to study the altered genomes’ impact on a species’ specific phenotypes. Furthermore, it is less meaningful to dissect a genome into genomic parts, as the genome’s emergent properties are defined by the interaction of environments within the context of evolutionary selection. Nevertheless, if researchers know how to collect and analyze data through the lens of evolution, then overwhelming evidence of altered genomes’ impact on phenotypes can be found. These phenotypes include providing the potential to break down the genome constraint and form
new species, establishing partial or full reproductive isolation, generating overall phenotypic changes including diseases and moderate phenotypes, changing the range of fuzzy inheritance for many features (providing potential phenotypes), and generating specific features unique to new species. First, most of the chromosomal alterations generated from the germline are deadly. Most are detected from spontaneous abortions. Only a few types of trisomy in humans can survive, albeit with diseased phenotypes. These examples support the importance of correct karyotypic coding on normal biological function. Significantly, the capability of producing various chromosomal aberrations can also provide macroevolutionary potential. Second, for some chromosomal aberrations such as chromosomal inversions and Robertsonian translocations, although morphological differences between carriers and normal individuals can be moderate, the phenotype itself is obvious: it contributes to reproductive isolation by a different degree. It is a well-documented fact that the capability to reproduce is affected in individuals with chromosomal inversions, from Drosophila to sticklebacks to humans (Noor et al., 2001; Jones et al., 2012). Third, the new phenotypes are not only environment-dependent but also highly dynamic (ranging from moderate to drastic). The involvement of fuzzy inheritance will likely allow individuals with new genomes to reset the boundary of the phenotype (for both maximal and minimal expression of some features within different potential environments). The variable phenotype thus becomes important for survival and success in dynamic environments. Although a newly formed species may not immediately display significant advantages or disadvantages compared with its parental population, these distinctions could reveal themselves later. 
Then, the new species may get eliminated (become extinct), coexist with the parental species, or replace the parental species altogether (thus becoming a stable species). Equally importantly, even if there is no huge phenotypic difference between newly formed and parental species, the slightest differences can be amplified by evolutionary selection. The above principles have been beautifully demonstrated by recent karyotype engineering experiments in yeast. Two groups of investigators created artificial yeast species by fusing the 16 independent yeast chromosomes into two giant chromosomes and one giant chromosome, respectively, using CRISPR-Cas9-mediated genome editing (Luo et al., 2018; Shao et al., 2018). Clearly, by reorganizing the genome (rather than focusing on gene-level manipulations), these investigators have effectively created a series of new artificial yeast species under laboratory conditions.

The fusion of sixteen native linear chromosomes into a single chromosome results in marked changes to the global three-dimensional structure of the chromosome due to the loss of all centromere-associated inter-chromosomal interactions, most
telomere-associated inter-chromosomal interactions and 67.4% of intra-chromosomal interactions. However, the single-chromosome and wild-type yeast cells have nearly identical transcriptome and similar phenome profiles. The giant single chromosome can support cell life, although this strain shows reduced growth across environments, competitiveness, gamete production and viability. Shao et al., 2018

Just by reading this article, one might get the impression that the order of genes on the chromosomes (the chromosomal coding) is not as important as previous chapters have suggested. Changing the genomic topology here did not have a big impact, as the new yeast strain seems to be similar to the wild type in terms of transcriptome and phenotype profiles. However, an opposite conclusion can be reached if the experiment is reconsidered and reexplained from an evolutionary point of view. Luckily, using the same approach, Luo et al. have paid attention to the issue of how chromosome fusion leads to reproductive isolation, a key evolutionary question. Their results are highlighted here:

When we crossed a sixteen-chromosome strain with strains with fewer chromosomes, we noted two trends. As the number of chromosomes dropped below sixteen, spore viability decreased markedly, reaching less than 10% for twelve chromosomes. As the number of chromosomes decreased further, yeast sporulation was arrested: a cross between a sixteen-chromosome strain and an eight-chromosome strain showed greatly reduced full tetrad formation and less than 1% sporulation, from which no viable spores could be recovered. However, homotypic crosses between pairs of strains with eight, four or two chromosomes produced excellent sporulation and spore viability. These results indicate that eight chromosome-chromosome fusion events suffice to isolate strains reproductively. Luo et al., 2018
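
The arithmetic behind the phrase “eight chromosome-chromosome fusion events” can be spelled out in a few lines. This is a minimal sketch: the helper name `fusion_events` is hypothetical, it assumes that each fusion joins two whole chromosomes and therefore lowers the chromosome count by exactly one, and the outcomes listed are paraphrased from the quoted passage only.

```python
# Hypothetical helper; the only figures used are those in the Luo et al. quote.
WILD_TYPE_COUNT = 16  # chromosome number of the unfused strain, per the quote


def fusion_events(chromosome_count, start_count=WILD_TYPE_COUNT):
    """Number of fusions separating a fused strain from the starting strain.

    Assumes each event joins two whole chromosomes end to end,
    so every fusion lowers the chromosome count by one.
    """
    if not 1 <= chromosome_count <= start_count:
        raise ValueError("chromosome_count must be between 1 and start_count")
    return start_count - chromosome_count


# Crosses between the sixteen-chromosome strain and fused strains,
# paraphrased from the quoted passage (nothing beyond the quote is added).
reported_crosses = {
    12: "spore viability below 10%",
    8: "sporulation below 1%, no viable spores recovered",
}

for count, outcome in reported_crosses.items():
    print(f"16 x {count}: {fusion_events(count)} fusions apart -> {outcome}")
```

By the same counting, a two-chromosome strain is 14 fusions away from the wild type and a single-chromosome strain is 15 fusions away.
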

Four major conclusions from these two important publications can be summarized as follows:

(1) Genomic arrangement is important for a species’ survival, even under artificial conditions. The fact that one research group could fuse all 16 chromosomes into one, while another research group using the same technology could not, strongly suggests that the arrangement of the fusion order matters the most, as the arrangement of individual chromosomes differed between these two research groups. Such an issue can be addressed by creating new arrangements of the fused chromosome. Furthermore, the drastic decrease in spore viability caused by chromosome fusion indicates that when genomic topology is altered, the survival potential of these newly created strains is altered as well. It is also possible that fusion between entire chromosomes will have a better chance of generating survivable genomes compared with breaking each chromosome into smaller pieces and then randomly fusing them into a giant chromosome. Such a
process can more easily break up the synteny blocks which are important for biological modules. In fact, this formation of one giant single chromosome by chromosome fusion has been observed during the induction of genome chaos in a cancer cell (see Fig. 4.19 in Chapter 4). This experiment indicated that these giant chromosomes, often observed during transitional periods, are unable to become the winner in somatic evolution, possibly because the giant chromosome is unstable. The “winning” karyotypes of successful populations usually have chromosomes in the normal size range (Liu et al., 2014).

(2) Fast reproductive isolation can be achieved through genome reorganization in limited steps. Altered karyotypes represent a very important phenotype: full or partial reproductive isolation (between species) and altered reproduction capability (within species). This important point has long been ignored by many when discussing a lack of obvious phenotypes from altered karyotypes.

(3) Most reorganized genomes are not going to survive. A majority will lose the competition in natural conditions. Despite similar growth phenotypes, when cultured together, the new strain loses the competition with its parental strain. This is the most important phenotype evolution cares about. The lower reproductive capability will lead to the same result. This is why the odds of producing a new species that will last in natural conditions are extremely low.

(4) Chance is an important factor when evaluating the success of macroevolution. Although both the genomic arrangement and the chromosomal number are important to provide workable and stable genomes, they also have certain flexibility as different combinations can work, albeit to different degrees. One key factor the yeast chromosomal fusion experiment did not address is the chance that individuals with newly formed genomes will meet, mate, and produce fertile offspring.
For yeast, achieving such reproduction is much easier because of the capability of switching between sexual and asexual reproduction. When sexual reproduction is required, mating with an individual with the same altered genome could be the most challenging step for speciation. Another chance-related issue could be the culture conditions. Under some extreme conditions, parental strains can no longer survive, and again by chance, if some individuals of the newly formed strains can survive, divergent evolutionary fates will be decided. All of a sudden, the new species will have an opportunity to seize. For more discussion on this area, see the next section on environmental factors in speciation.

Now, with all these new understandings, it is easier to accept the evolutionary significance of genome alterations. The most important phenotype concerns evolutionary certainty. Others, characterized by diverse molecular phenotypes, are relatively unimportant. The key is infusing new genomic packages with higher evolvability, reflected by increased genomic and phenotypic plasticity.

6. New species formation occurs much more frequently than suggested by natural selection, but the formation of lasting species is extremely unlikely and time-consuming
The somatic evolutionary model of cancer suggests that even though a large quantity of new, different genomes are created within the punctuated phase followed by macroevolutionary selection, it takes a much longer time for the winning genome to produce large numbers of offspring within the stepwise clonal expansion phase. This demonstrates the importance of microevolution: without it, no cellular population will become visible. Using such a model to review organismal speciation, it is logical to separate speciation into two phases as well. The first is the initiation phase (to become a new species) and the second is the domination phase (to become a lasting species). While extremely rare, some individuals are homozygous for altered chromosomes (like the cases of 44 chromosomes). It is not unreasonable to suggest that these individuals have already established reproductive isolation (at least partially). The challenge is to meet reproductive partners who have the same or similar genomes. Without such luck, the opportunities of forming new species will be lost. This is in fact the case. Here is a new hypothesis: speciation is a constant event, but the chances of forming a stable species are extremely low. Becoming a stable species requires a perfect storm.
Going through all the major stages and different environmental conditions, from geographic isolation to crisis conditions, helps to greatly increase the odds.

7. What is the role of geographic isolation in speciation?
Many consider allopatric speciation (speciation by geographic isolation) the primary type of speciation. According to neo-Darwinian thinking, geographic isolation is a precondition to speciation, as physical isolation can interfere with gene flow. Preventing two or more groups from co-mating results in specific lineages with separate gene pools. This can occur either by physical distance or specific physical barriers such as rivers, oceans, or deserts. In addition to the overwhelming fact that most species display a unique karyotype and there are very limited cases of speciation caused by speciation genes, there are still unsolved questions regarding geographic
isolation. For example, how can the interrupted fossil record be reconciled with geographic isolation? Should geographic isolation be linked to historical periods of punctuated fossils? How can huge extinction events with subsequent rapid emergence of wide-ranging and divergent species be explained? How can the existence of closely related species living in close proximity without geographic isolation be rationalized? If speciation is a process by which small genetic changes are accumulated, then geographic isolation makes sense as a precondition for speciation. If, in contrast, speciation is mainly a process of genome reorganization and finding a sexual partner with a similarly altered genome, then geographic isolation is not a precondition. In this case, geographic isolation is irrelevant to the initial process of speciation (primarily genome alteration). However, it is likely that geographic isolation might aid speciation. Isolation could increase mating opportunities between individuals with altered genomes (e.g., meeting reproductive partners with the same compatible altered genome would be more likely on the same small island) and reduce initial competition from the parental species to allow the newly formed species to grow (e.g., the fledgling population would successfully proliferate and thrive on a nearby new island). Clearly, genome reorganization-mediated speciation reinterprets the role of geographic isolation in speciation. The environment is a key piece of the speciation puzzle, but its exact mechanisms are likely very different from those of prevalent neo-Darwinian assumptions.
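
The suggestion that isolation raises the odds of meeting a compatible partner can be illustrated with a simple probability calculation. The sketch below is a deliberately crude assumption, not a model from the literature cited here: it treats each mating encounter as an independent random draw from the local population, so that with carrier fraction f and k encounters the chance of meeting at least one compatible partner is 1 - (1 - f)^k. The function name, population sizes, carrier count, and number of encounters are all hypothetical.

```python
def prob_compatible_mate(carrier_fraction, encounters):
    """P(at least one compatible partner) = 1 - (1 - f)**k.

    Assumes independent random encounters; f is the fraction of the
    local population carrying a compatible altered genome.
    """
    if not 0.0 <= carrier_fraction <= 1.0:
        raise ValueError("carrier_fraction must lie in [0, 1]")
    if encounters < 0:
        raise ValueError("encounters must be non-negative")
    return 1.0 - (1.0 - carrier_fraction) ** encounters


# Hypothetical scenario: the same four carriers and twenty encounters,
# placed either in a large mainland population or on a small island.
other_carriers = 4
for label, population_size in [("mainland", 100_000), ("small island", 50)]:
    p = prob_compatible_mate(other_carriers / population_size, encounters=20)
    print(f"{label}: probability of a compatible partner = {p:.4f}")
```

Under these assumptions the only thing that changes between the two scenarios is the carrier fraction, which is enough to move the probability from negligible to substantial.
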

6.7 EVOLUTION IS TRUE BUT ITS MECHANISM MUST BE REEXAMINED

By redefining the current mechanisms underlying natural selection (which mainly involve the microevolutionary phase) and integrating them into a new evolutionary model considering the genome-based relationships between micro- and macroevolution, it can be stated that while evolution is indeed true, its detailed mechanisms may differ from current predominant ideas. First and foremost, Darwin discovered, and perhaps more importantly, brilliantly popularized, the law of the common ancestor and how natural selection functions by pointing out adaptation within species. Yet, he missed some key mechanisms on how evolutionary selection functions throughout long periods of time and specifically how speciation occurs. In particular, the discontinuity of evolution has not been adequately explained and the entire evolutionary process has been simplified into an accumulation of microevolutionary phases. Experimental genetics, particularly population genetics, has served as the backbone of neo-Darwinism and has promoted discussion and provided ample
evidence of relatively minor mutations spreading both in the laboratory and in nature. However, most studies focusing on gene frequencies deal with adaptation within the same species, without crossing the boundary from one species to another. Population genetic models have attempted to explain the mechanism of a species’ genetic structure and how it changes over time but do not address the mechanism leading to an increase in the number of species. The evidence fails to validate that an accumulation of these minor mutations in nature could actually lead to speciation. Thus, we are confronted with a key evolutionary question: how does speciation occur? In other words, how do genomic changes such as chromosome variations and gene mutations, environmental conditions such as geographic isolation, evolutionary selection (both micro- and macroevolution), and random chance play their interactive roles in the speciation process? The answer lies within the introduction of the genome theory.

6.7.1 The Integrated Model of Speciation: How Micro- and Macroevolution Create and Maintain Species For further discussion, a model of genome/gene alterationemediated speciation is illustrated in Fig. 6.2. The key feature of this model is the combination of micro- and macroevolution and the important role of chromosomal coding along with chromosomal reproductive barriers to separate and create species. Based on the genome theory, the processes of micro- and macroevolution differ greatly and occupy different time windows and mechanisms based on distinctive levels of genetic changes. They are still, however, connected to the entire process of evolution. Note that this model refers to the speciation of sexually reproducing eukaryotes. There are four necessary conditions/phases to achieve successful speciation. 1. Macroevolutionary phase 1: Initiation of speciation by breaking the genome constraints of the parental species through genome reorganization. This process results in various individuals with altered genomes (which differ from the core genome of the parental species). As previously discussed, genome reorganization can be achieved through spontaneous chromosome variations, hybrids (including polyploidy), and genome chaos. Environmental stress can promote genome reorganization. Most individuals with altered genomes, however, will be eliminated. There is a strong selection in this phase. 2. Macroevolutionary phase 2: Although only a small portion of individuals with altered genomes are able to survive, ultimate success is passing their altered genomes onto fertile offspring. To achieve this step, which is essential for speciation, individuals with


6. BREAKING THE GENOME CONSTRAINT

FIGURE 6.2 Diagram of genome alteration-mediated speciation. There are four different phases. Three initial phases involve macroevolutionary selection and one phase involves microevolutionary selection. The initial phases are responsible for breaking down the genome constraints and creating individuals with altered genomes (phase 1), mating with individuals with similar genomes to form new species by producing fertile offspring (phase 2; the success of this phase requires at least one pair of opposite sex that can produce fertile offspring), and selecting a stable genome (phase 3). The last phase is the natural selection phase, which produces a large population of the selected species. The different yellow shapes represent different karyotypes generated from genome reorganization. The yellow circle and yellow oval represent similar karyotypes. The circles of various red shades represent different gene profiles of the same karyotype. The number of circles represents different population sizes.

new genomes must find a mating partner that has a similarly altered genome or can tolerate their altered genome (so that fertile offspring can be produced). This step has an extremely low and unpredictable probability of success, which explains why the formation of new species is relatively rare. If the individual with the altered genome fails to find an appropriate partner, the new genome will be lost regardless of its potential adaptive benefits. If the individual with the new genome does find an appropriate partner and its progeny are able to survive, it will have the chance to produce a stable new genome (even if not ideal) and potentially become a new species. It is safe to say that many successful and perpetuated species evolve through pure luck (not including the many species formed after mass extinctions, when the opportunity for speciation is drastically increased). That being the case, being in the right place at the right time to find the right partner is crucial. A few reported cases of human individuals with 44-chromosome homozygote karyotypes resulting from Robertsonian translocations fully support this point. They were nearly all formed by mating between first cousins who display 45

6.7 EVOLUTION IS TRUE BUT ITS MECHANISM MUST BE REEXAMINED


chromosome heterozygote karyotypes (Song et al., 2016). The chance of two random individuals with 45-chromosome heterozygote karyotypes meeting is approximately 1 in a few million. With geographic and social constraints, the odds are much smaller still! In addition, extreme conditions such as mass extinctions may also disturb the stringency of the sexual filter, which normally promotes reproduction between individuals with similar genomes. Such a disturbance would be very unlikely under normal conditions.
3. Macroevolutionary phase 3: Selection of winning genomes among transitional genomes. The pattern in this phase depends on the outcome of macro-E phase 2. For example, if two individuals (male and female) have perfectly matching altered genomes, their offspring will likely have a stable genome rather quickly. In such perfect conditions, it is possible for a species to form with a stable genome within one to two generations, passing only briefly through macro-E phase 3 and moving directly to the micro-E phase with its compatible genomes. In contrast, if mating partners display slightly different genomes, even though they can produce offspring, these offspring will display unstable transitional genomes, which can either be eliminated or further diversified. This process can be highly variable for new and different species. For some species, selection of a stable genome can continue for hundreds or thousands of generations. This stage also opens opportunities for adaptive radiation. In general, based on the high extinction rate of all species, it is highly likely that many new species will not be able to survive unless they reach a critical mass population almost immediately. Environmental opportunities, such as geographic isolation, will help new species become more successful. Survival and dominance will occur much more quickly where there is little direct competition from the “parental” species.
4. Microevolutionary phase: Generation and maintenance of a large population with sufficient diversity.
A hallmark of a successful species is long-term stability, diversity, and dominance. Emergent species with stable core genomes must achieve viable population sizes of a diverse makeup, as different environments often eliminate many individuals and population recovery requires survivors to gradually establish new genetic landscapes, which is achieved through gene- and epigene-mediated microevolution. Large numbers of individuals ensure a diverse gene pool, which will provide a higher probability of survival in adverse conditions or competition. In contrast, new species with small population sizes can easily be wiped out once and for all.
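Returning to the Robertsonian translocation example used for phase 2, the mating requirement can be made concrete with a little Mendelian bookkeeping. The minimal Python sketch below enumerates offspring chromosome counts for different parental pairings. It rests on a simplifying assumption that is not taken from the text: each parent passes on only balanced gametes, each with probability 1/2, so unbalanced gametes and any reduced fertility of carriers are ignored. The names `offspring_distribution`, `HET_45`, `HOM_44`, and `NORMAL_46` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Each parent is written as the two copies it carries of the chromosome set
# involved in the translocation: "N" = normal, "R" = Robertsonian fusion.
# Chromosome count = 46 minus the number of "R" copies.
# Assumption (illustrative only): balanced gametes, each with probability 1/2.
NORMAL_46 = ("N", "N")   # standard 46-chromosome karyotype
HET_45 = ("N", "R")      # 45-chromosome heterozygous carrier
HOM_44 = ("R", "R")      # 44-chromosome homozygote


def offspring_distribution(parent_a, parent_b):
    """Return {chromosome_count: probability} for a cross of two parents."""
    dist = {}
    for gamete_a, gamete_b in product(parent_a, parent_b):
        count = 46 - (gamete_a == "R") - (gamete_b == "R")
        dist[count] = dist.get(count, Fraction(0)) + Fraction(1, 4)
    return dist


for label, (a, b) in {
    "45 x 45": (HET_45, HET_45),
    "44 x 46": (HOM_44, NORMAL_46),
    "44 x 44": (HOM_44, HOM_44),
}.items():
    print(label, offspring_distribution(a, b))
```

Under these assumptions, two 45-chromosome carriers produce a 44-chromosome child with probability 1/4, a 44 x 46 pairing yields only 45-chromosome offspring, and only a 44 x 44 pairing passes the new karyotype on intact. This is why the availability of a partner with the same altered genome is the limiting step.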


This stage represents the stasis phase of speciation and takes much longer than the three initial stages (see Fig. 6.3). During this long process, microevolutionary adaptation can modify the new species to better fit its ever-changing environment, which is reflected by fluctuating population sizes. A successful population often has a better chance of leaving behind detectable fossil records. Until that happens, a specific species will likely become one of the many undetected species in the fossil record because of small numbers. Once successful species are established, genome constraints again ensure long-term stability. Under extreme conditions, however, when competition or conditions become so difficult for a species that they exceed the boundary of genome viability, there are only two options left: extinction or emergence of a new species (which could trigger another round of speciation). Because of the very low probability of new species emergence, most species under extreme stress become extinct. The fact that the vast majority of all species become extinct illustrates the importance of the genome package and the limited power of microevolution. This speciation model clearly differs from the neo-Darwinian model. The key is that genome-defined

FIGURE 6.3 Comparison between the Darwinian microaccumulative model and the genome reorganization-based macroevolutionary model of speciation. The left panel represents the Darwinian microaccumulative model, where speciation (proceeding from A to B, B to C, B to D, or D to E) is achieved by the accumulation of a series of microevolutionary events (small red arrows) across many generations. The right panel is the model of genome reorganization-based speciation, where speciation is not achieved by the accumulation of microevolutionary events but occurs independently through macroevolution itself. Macroevolution is formed in one or a few generations, followed by microevolution which can last a long period of time. While microevolution is often detectable within a short time window, its impact can often be canceled over a longer term. Considering the overall panevolutionary perspective, the direction of microevolution constantly changes (illustrated by the direction of small arrows), separating the mechanisms of micro- and macroevolution.


macroevolution occurs first (to introduce isolation within a few generations) before microevolution follows (to generate and maintain large populations over time). In contrast, the neo-Darwinian model stresses that the accumulation of microevolutionary events leads to speciation over time. As illustrated in Fig. 6.3, the pattern of speciation and its relationship with the time and direction of microevolution differ greatly between the two models. If, according to this new model, accumulated microevolution does not lead to speciation, what is the overall contribution of natural selection to speciation other than maximizing gene-mediated adaptation to increase the population size? First, it is possible in some cases that speciation can be driven by gene mutations or other mechanisms above or below the genome level. But gene-mediated speciation should not be considered the general rule for speciation, as the formation of a majority of species involves altered karyotypes. Furthermore, the idea that mass extinctions are associated with mass speciation is well accepted. When genome alteration occurs, gene flow immediately ceases because an altered genome cannot pass through the reproduction filters. In contrast, even if a gene initiates speciation in some cases, the new species must still face the issue of genome alterations arising from that mutation, such as finding mating partners with similar alterations to become a new viable species. Second, species-specific genes can also be formed during microevolution (after the initial phase of speciation). Some new genes can function as drivers of phenotypic evolution (Chen et al., 2013), which allows new species to further depart from parental species. Third, it is possible that modified phenotypes (e.g., brain size) can be accumulated during microevolution.
It is important to note that the branching-tree type of evolutionary relationship among species only illustrates the relationships among successful species that historically reached a critical mass and can be observed by us. A more accurate representation of Fig. 6.3 would use a 3D diagram or a model similar to the multiple levels of landscapes model (Heng et al., 2011b; Heng, 2015) rather than a simple 2D branch-type diagram. Interestingly, the adaptive model illustrates evolutionary potential, and the branch model (or the tree of life on earth) only illustrates the dominant successful types that have occurred on earth through only one run or limited runs of evolution. If natural evolution were run an infinite number of times, the relationship between the branching and adaptive landscape models would be the same.
The above model of speciation focuses on species with typical eukaryotic genomes and sexual reproduction; it does not address prokaryotes. As suggested, the similarities among prokaryotic genomes, from their number of genes to their genomic arrangements, need to be considered. With regard to endosymbiosis, Lynn Margulis promoted the notion that it certainly plays a fundamental role in the transitional process between prokaryotic and eukaryotic systems; in a similar way, the self-organization principle contributes to the initiation of life forms. However, it is very likely that different stages of natural evolution involve different mechanisms of selection. For example, as soon as endosymbiosis is achieved, further macroevolutionary events in subsequently derived species occur through genome reorganization rather than a continuous acquisition of totally new genomes. Realizing that the different stages of evolution might require a variety of mechanisms is essential. As an example, the multiple-level interaction model of selection and its relationship to different types of self-organization has been described (Fig. 7.3; Heng, 2009).
The above model is based on the synthesis of many genomic and evolutionary facts in addition to the genome theory. More extensive arguments can be found in Chapters 1-4. For the sake of brevity and convenience, a few additional points are listed in Table 6.2. It is important to mention that this model has gained support from both computer simulations and clinical observations. In addition, a growing number of case reports of rapid speciation in nature have started to challenge the traditional view of speciation.
As part of our analysis of how genome alterations contribute to biodiversity by forming different species, a computer simulation was performed to compare the patterns of sexual and asexual reproduction. To our surprise, the expected populations of new sexual species were not visible among the parental populations. To illustrate the importance of mating opportunities for speciation, the simulation program was designed to relocate individuals with the same altered genome type onto the same island (to increase their chance of mating by mimicking the conditions of geographic isolation from parental populations). As soon as this condition was applied, a large number of visible populations of new species emerged.
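The logic of the island experiment described above can be sketched with a toy model. The Python code below is not the authors' simulation program, whose details are not given here; it is a minimal illustration under stated assumptions. Individuals are reduced to a genome type, a small fraction of parental individuals acquire an altered genome each generation, pairs form at random, and only pairs sharing the same genome type leave offspring. All names and parameter values (`simulate`, `mate`, `alteration_rate`, `island_capacity`, the litter size of 3) are hypothetical.

```python
import random
from collections import defaultdict

PARENTAL = 0  # genome type of the parental species; 1..n_types are altered genomes


def mate(pool, capacity, rng, litter=3):
    """Pair individuals at random; only same-genome pairs leave offspring."""
    rng.shuffle(pool)
    offspring = []
    for a, b in zip(pool[0::2], pool[1::2]):
        if a == b:  # reproductive barrier between different genome types
            offspring.extend([a] * litter)
    if len(offspring) > capacity:  # crude carrying capacity
        offspring = rng.sample(offspring, capacity)
    return offspring


def simulate(relocate, generations=30, pop_size=500, alteration_rate=0.02,
             n_types=5, island_capacity=200, seed=0):
    """Return the number of altered-genome individuals after the last generation."""
    rng = random.Random(seed)
    population = [PARENTAL] * pop_size
    for _ in range(generations):
        # Genome reorganization: some parental individuals acquire a new genome type.
        population = [rng.randint(1, n_types)
                      if g == PARENTAL and rng.random() < alteration_rate else g
                      for g in population]
        if relocate:
            # Move each genome type to its own "island" before mating.
            islands = defaultdict(list)
            for g in population:
                islands[g].append(g)
            population = []
            for genome_type, pool in islands.items():
                cap = pop_size if genome_type == PARENTAL else island_capacity
                population.extend(mate(pool, cap, rng))
        else:
            # Everyone mates at random within the single parental population.
            population = mate(population, pop_size, rng)
    return sum(1 for g in population if g != PARENTAL)


print("altered individuals without relocation:", simulate(relocate=False))
print("altered individuals with relocation:   ", simulate(relocate=True))
```

In this sketch, altered genomes keep arising in both settings, but without relocation an altered individual almost never meets a partner of the same type and leaves no offspring, whereas on an island the same types pair up and their numbers grow toward the carrying capacity. The qualitative contrast, rather than any particular count, is the point of the example.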
This simulation experiment indicates that both the mechanisms of generating new genomes and the creation of mating conditions for newcomers are essential for speciation. It fits well with our model (Heng, 2015; Ying et al., 2018) (also see Chapter 5). Although the conclusion of the simulation experiment makes sense, can it be applied to the real world? As it turns out, there are reported cases that fully support the simulation data. In a Chinese family, there was a Robertsonian translocation involving chromosomes 14 and 15, generating a karyotype of 45 chromosomes. Because of mating between first cousins (which increases mating opportunities among individuals with similar genome alterations), a 44-chromosome homozygote karyotype was produced (potentially the initiation of a new species; think of the relationship between humans and chimpanzees). When this 44-chromosome male mated with a normal 46-chromosome female (as he was not able to meet a female with 44 chromosomes and the same 14 and 15 Robertsonian translocation), it resulted in the death of an offspring with 45

TABLE 6.2 Examples of Concepts/Evidence That Support the Ultimate Importance of Genome Reorganization in Eukaryotic Evolution.

Key Message

References

Concepts and ideas

Nature works constantly with the same materials. She is ingenious and varies only the forms.

Geoffroy Saint-Hilaire

The greatest masterpiece in literature is only a dictionary out of order.

Jean Cocteau

The karyotype defines genetic networks and preserves system inheritance.

Heng 2009

Any chromosomal change impacts a large number of genes. The changes at lower genetic levels (gene and epigenetic levels) only modify the system; they do not create new systems.

Facts of organismal evolution

Telomere-telomere fusion is the most frequent mechanism for chromosomal number changes among different yeast species.

Gordon et al., 2011

The majority of eukaryotic species display different karyotypes.

Ye et al., 2007; White 1978; King 1995

The karyotypic relationship between organisms can be effectively used to construct an evolutionary tree as many evolutionary events involve genome-level changes.

Ferguson-Smith and Trifonov 2007;

A few inversions and one chromosomal fusion separate humans from chimpanzees. An additional 5:17 translocation is found in the gorilla. Numerous translocations have led to massive reorganization, which separates gibbons from humans. The complex karyotypic relationship between gibbons and humans can be illustrated by 100 synteny breakpoints (at a resolution of about 200 kb).

Carbone et al., 2014, 2006

Genome sequencing projects have revealed a shortage of species-specific genes among various species, which indicates that stochastic genome reorganization could play a central role.

Ranz et al., 2007; Navarro and Barton 2003;

Graphodatsky et al., 2011; Gordon et al., 2011; Rogers et al., 2011; Goldschmidt 1940


Murphy et al., 2005; Kohn et al., 2006 Dujon 2006


Categories


Genome duplication and chromosomal rearrangements are key features among species separated by various distances within a single evolutionary phylum (e.g., hemiascomycetous yeasts).

Wolfe 2006; Gordon et al., 2011

Recent hemiascomycete evolution suggests an evolutionary mechanism by relocating a set of genes. Chromosome maps show great similarities within the same Saccharomyces sensu stricto clade, with limited chromosomal rearrangements; in contrast, between clades there are only short syntenic blocks observed, indicating that numerous chromosomal rearrangements have occurred.

The syntenic block size and average number of genes per block vary across the phylogeny of 12 Drosophila species.

Clark et al., Drosophila 12 genomes consortium 2007

Salmon that colonized a river and a lake beach evolved partial reproductive isolation in fewer than 13 generations.

Hendry et al., 2000; Wong 2000

Super-fast evolving fish split into two species in the same lake in 150 years.

Le Page 2016; Marques et al., 2016

Observations from evolutionary experiments and clinical samples

McClintock’s “genetic earthquake” experiment

Jones 2005

Novel traits can evolve through rearrangement and amplification of preexisting genes in bacteria.

Blount et al., 2012

All key transitions in cancer evolution (immortalization, transformation, metastasis, and drug resistance) are associated with and likely achieved by genome alterations. Most cancer types display genome-level alterations.

Heng et al., 2009, 2011b;

Both aneuploidy and translocations change the entire genome network. Chromosomal translocations not only directly impact genes at their breakpoints but also change the relationship of the overall genome topology and transcriptome.

Pavelka et al., 2010

Ye et al., 2009

Stevens et al., 2013a Heng 2015, 2017


chromosomes. The process of establishing a 44-chromosome population thus ended. This case illustrates that despite the ability to achieve a homozygote karyotype of 44 chromosomes in two to three generations, mating must occur between individuals with the same karyotype to form a cluster of individuals with identical new karyotypes. Otherwise, if the altered-genome individual mates with the original population, the altered karyotype will not be passed on (Song et al., 2016). Interestingly, as mentioned earlier, there are multiple similar cases involving different occurrences of human karyotype evolution; they all likely ended up being eliminated. There are certainly cases that come and go without even being noticed, as the exhibited phenotypic changes could be relatively unfit or rather moderate under current environmental conditions. Under extreme environments, the viability of an altered human genome will become obvious. Only during that time will a new run of macroevolution select the winner, either “them” or “us.” Perhaps current prenatal screening of cytogenetic anomalies unconsciously aims to eliminate altered genomes, thus protecting our core genome. One can speculate that humans are consciously eliminating aberrant chromosomes to reduce genetic diseases while unconsciously eliminating potential competitors or saviors of the species. Interestingly, the key message from our model, that chromosomal variations occur before gene involvement in speciation, is supported by a recent analysis of the role of ancestral polymorphisms of chromosomal inversions in speciation (Fuller et al., 2018). It turns out that some signature chromosomal changes that define species were already present long before the species split.
Our results suggest that patterns of higher genomic divergence and an association of reproductive isolation genes with chromosomal inversions may be a direct consequence of incomplete lineage sorting of ancestral polymorphisms. These findings force a reconsideration of the role of chromosomal inversions in speciation, not as protectors of existing hybrid incompatibilities, but as fertile grounds for their formation.
Fuller et al., 2018

By summarizing the new model of speciation, as well as synthesizing supporting evidence, one stunning conclusion emerges: speciation is a common and frequent event, though the majority of new species are constantly being eliminated at earlier stages of speciation and are thus invisible to us. If this prediction is correct, we should be able to detect waves of speciation events from time to time, especially in dynamic yet isolated environments, and discover many new species with limited population sizes. It is also possible that some of the new nearly extinct species are in fact individuals within the initial stages of speciation and will be eliminated by parental populations anyways. Creating a special space for these


“breakthrough” species by separating them from their parental populations would certainly increase their chances of survival. Slowly but surely, despite the traditional belief that directly observing speciation in nature is extremely rare, the status quo is changing. In recent decades, not only has the evidence for sympatric speciation grown but many exciting observations of rapid speciation have also been reported. This of course captured the interest of popular science magazines, leading to headlines like “High-speed speciation” and “Super-fast evolving fish splitting into two species in same lake.” The common theme throughout these headlines was that “the impossible” (at least based on current evolutionary theory) is happening in front of our eyes! These were not isolated cases, either: a growing number of cases involve various species (Hendry et al., 2000, 2007; Wong 2000; Marques et al., 2016; Le Page, 2016). Although these reports currently focus on ecological speciation (as ecological windows allow us to discover the ongoing speciation process), the ecological explanation clearly agrees with our model. First, ecological conditions create opportunities in which speciation can proceed (at least for the initial macro-E phases); second, ecological theories only function when altered genomes are present in the first place, which ecological selection can then work with; lastly, whether or not the initial species can last depends on many interactive factors. Clearly, our model needs to be used to assess many questions, including the following:
Salmon adapting to divergent breeding environments can show restricted gene flow within at least 14 generations. Birds evolving different migratory routes can mate assortatively within at least 10-20 generations. Hybrid sculpins can become isolated from their ancestral species within at least 20-200 generations ... Ecological speciation can commence within dozens of generations. How far it goes is an important question for future research.
Hendry et al., 2007

According to our model, this is the answer: these salmon, like any newly formed species, have a long way to go. They can easily be wiped out at different phases of speciation. There are many examples of isolation barriers that can be established in limited generations, but whether they become a lasting species is hard to predict. They represent new potential nevertheless. While lasting species with large, diverse populations and broad geographic distributions survive for long periods of time (no matter how the environment changes, the population will bounce back, repopulated by some individuals, except during mass extinctions), many transitional species come and go, reflected by the high dynamics of speciation. This prediction can be confirmed by the case studies of the speciation of Lake Victoria cichlid fish. In addition to the fact that over 500 endemic species evolved from a few species (at least 3) in 15,000 years, both speciation and extinction still continuously occur, reflected by the observations that there are still many very rare species for which the


population size is unknown (Joana Meier, personal communications; Meier et al., 2017; McGee et al., 2015).

6.7.2 Time for Reinterpretations

The integrated model of speciation offers new explanations for many long-debated evolutionary issues, which are closely related to the dynamics between gene-mediated microevolution and genome-mediated macroevolution. Here are a few examples.
a. Genome-based alternative mechanisms for evolution
Here is a key issue: all current species have evolved from ancestral species, yet they do not seem to be evolving (it is difficult to observe current evolution). To explain this, Darwin introduced the concept of natural selection, which focused on the accumulation of small variations over time. Now, it is obvious that the idea of “small changes over time” can neither explain the increasing number of cases of rapid evolution, including fast speciation, nor clarify why short-term adaptations fail to lead to long-term speciation. Furthermore, it is difficult to use natural selection to explain why most species were formed following mass extinctions and maintained their stasis afterward. Another related challenge is the obvious conflict between concepts and reality. If every organism continues to change, how do we classify them? Where do we draw the line? The fact that we can nicely classify species suggests a need to modify our concept. If the reproductive barriers are real and effective, there must be gaps produced between species. Any responsible evolutionary theory should not avoid this issue. The key is to accept both sides of the same coin: change, without completely changing. One side represents the microevolutionary phase, where small change occurs continuously. The other side is the boundary of genome-defined species, where big genomic change is eliminated. Species can change during the macroevolutionary phase, but it involves a different “coin,” again with features of “change but not change” or continuity and discontinuity. Within the boundary of a species, there is continuity of change. Outside these boundaries, there is discontinuity.
Further questions need to be addressed. Following each mass extinction, bio-complexity seems to increase. Is there a direction to genome complexity during evolution? Does this direction occur within the same species where the genome framework is fixed? Based on the concept that the main function of sex is to eliminate significantly altered genomes, there is certain tolerance for subchromosomal alterations within the same genome framework, including copy number variation, retrovirus integration, and additional epigenetic evolution.


Knowing that genome-level constraints exist, it will be very interesting to investigate whether system complexity still increases within the human genome because of subchromosomal alterations, particularly epigenetic alterations. This analysis might answer questions such as whether we are getting smarter compared with our ancestors.
b. Fast or sluggish evolution?
The speed of evolution has been an important and confusing issue. The original viewpoint maintained that evolution is rather slow. Then, many experimental evolutionary studies revealed cases of fast evolution. Right now, different cases support different evolutionary speeds, and there is no generally accepted explanation, even though the recognition of constraints on evolution has been regarded as a major advance (Futuyma, 2010). A comparative study based on the morphological features of fossils and genetic profiles has revealed a paradox: evolution seems much faster in the short term than in the long term. For example, when the shapes of horse bones separated by a few generations were compared, small but significant morphological variations were observed. In contrast, when horse species separated by millions of years were compared, far fewer differences were observed (Kurten, 1959). A similar observation has been made across species in the fossil record (species displayed ample change over decades to centuries but were basically static over millennia) (Eldredge et al., 2005). Interestingly, DNA mutations appear to accumulate much more quickly in birds and primates when measured over a window of a few thousand years than over a window of millions of years (Ho et al., 2005). In addition to supporting the view that small changes do not simply accumulate into big changes, these observations have led to an appreciation of the time-dependent rate phenomenon, which has been explained by mutation saturation. According to the genome alteration-mediated speciation model,
1. In the short term, microevolution is faster than we thought. However, because the direction of selection swings back and forth, changes can cancel each other; moreover, the genome constraint defines the range of the change, while gene mutations come and go. Thus, in the long term, we are given the impression that evolution is slow.
2. Macroevolution is fast in the initial stages of speciation, but the microevolutionary phase can last a very long time. There are no major changes until extinction or speciation.
In conclusion, whether evolution is fast or sluggish depends on the evolutionary phase we are observing.
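One way to picture the argument in point 1 is a random walk confined between two bounds. The Python sketch below is only an illustration of that reasoning, not an analysis from the cited studies: a trait value moves up or down by one unit per generation (the swinging direction of selection), and a hard limit stands in for the genome constraint. The function names `bounded_walk` and `mean_rate`, the unit step size, and the bound of 10 are all assumptions made for this example.

```python
import random


def bounded_walk(n_generations, bound, seed=0):
    """Trait value per generation: +/-1 steps, clipped to [-bound, bound]."""
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(n_generations):
        # The direction of change flips at random; the clip models the constraint.
        x = max(-bound, min(bound, x + rng.choice((-1, 1))))
        path.append(x)
    return path


def mean_rate(path, interval):
    """Average |net change| per generation over non-overlapping windows."""
    changes = [abs(path[i + interval] - path[i])
               for i in range(0, len(path) - interval, interval)]
    return sum(changes) / (len(changes) * interval)


path = bounded_walk(10_000, bound=10)
for interval in (1, 10, 100, 1000):
    print(f"window of {interval:>4} generations: "
          f"rate = {mean_rate(path, interval):.4f} per generation")
```

Because the net change over any window can never exceed the width of the allowed range, the measured rate over a window of 1000 generations is at most 20/1000 = 0.02 per generation in this setup, however active the generation-to-generation change is. The apparent rate therefore falls as the observation window grows, which is the pattern the model attributes to cancelling changes within a fixed genome framework.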


c. The irrelevance of the neutral theory
The neutral theory has been considered another major advancement in evolutionary theory over the past several decades (Futuyma, 2010). It states that genetic drift, the main mechanism spreading genes and neutral mutations, serves as the driving force of evolution (Kimura, 1983). However, considering the genome theory, the irrelevance of the neutral theory to natural evolution is obvious. If the importance of an individual gene or protein is limited in evolution, then smaller portions of genes or proteins can hardly be significant to the overall picture of evolution. There are many supporting pieces of evidence. First, the fact that molecular-level alterations can be neutral or nearly neutral reflects the fact that the molecular level may be the wrong level at which to study evolution (even though it is very convenient for population evolutionists to study the genetic parts). Mayr once wrote,
Kimura is correct in pointing out that much of the molecular variations of the genotype is due to neutral mutations. Having no effect on the phenotype, they are immune to selection.
Mayr, 2001

The main reason why many molecular variations fail to impact selection is that the evolutionary selective focus point is above these molecular levels, at the genome level. Mayr considered these molecular variations evolutionary “noise,” not evolution itself. It can be argued that the lower level dynamics are still important to evolution (in fuzzy inheritance, many types of biological “noise” should be considered as heterogeneity). However, this importance should not be attributed to isolated individual molecules, because it is the overall emergent properties, reflected at the level of genomic dynamics, that are essential to evolution. To further support the point that most individual molecular variations are less significant to evolution (macroevolution in particular), it can be argued that the molecular level of study is too low or too far removed from the evolutionary action. Imagine if we dropped to an even lower level of study, such as the atomic level, where all elements would be completely neutral with respect to evolutionary traits. This argument might seem extreme to some, but why would one consider the molecular level more appropriate than the atomic level? It is likely that both the atomic and molecular levels are too remote for evolutionary studies. This argument is supported by the quantitative differences between atoms (completely neutral) and molecules (nearly neutral) with respect to biological evolution. Second, our ability to observe nearly neutral effects often depends on specific experimental or modeling systems. In these well-defined conditions and models, the slightly “advantageous” and “disadvantageous” traits become visible. However, such “benefits” might not matter to the


genome package in natural systems, where many other factors are much more critical than these isolated, overly emphasized small changes. This point has important implications for the study of disease genes (Chapters 3 and 8). For example, it was a bit surprising that most cancer genes are evolutionarily neutral. As discussed in Chapter 3, genome alteration-mediated macrocellular evolution is the driver of cancer evolution. Focusing on the neutral selection of genes would be the wrong level of analysis. Third, the genome package in selection predicts that nearly neutral effects that might be visible in the short term can disappear in the long term. A “bad” gene can be passed on not only because good and bad are context-dependent but also because the package arrangement might prevent selection of individual genes. It’s the “in-law package,” if you will: the in-laws must be accepted as part of the matrimonial package with all the accompanying advantages and disadvantages (Heng, 2009). In addition, system heterogeneity is the key to long-term success; the number of existing beneficial alleles will come and go over generations. Fourth, a growing body of research is focusing on the gap between selection at the molecular level (parts selection) and its impact on shaping the genome. As Hurst pointed out:
Perhaps the most profound conceptual issues have come from recent genomic discoveries. Genomic features either suggest that genomes are not greatly shaped by selection or, as in the case of selection on synonymous mutations, they have shown that there are underexplored complexities of how genomes work.
Hurst, 2009

Clearly, the evolutionary reality of genome constraint and the differences at each level of organization downplay the importance of focusing solely on the molecular level.

d. The genome basis of punctuated equilibrium

Long-term stasis interspersed with rapid bursts, which exists in the fossil record, has led to the theory of punctuated equilibrium. Based on this theory, species are considered to be stable entities resistant to change rather than infinitely dynamic entities affected by continuous small alterations. Since its publication in 1972, punctuated equilibrium has been accepted as an alternative to phyletic gradualism (Eldredge and Gould, 1972). As illustrated in Chapter 3, despite the fact that the concept of punctuated equilibrium has helped in the study of somatic cell evolution from the very beginning, there are some important issues that need to be addressed or reinterpreted. In particular, what mechanism can explain both rapid bursts of species and the periods of long-term stasis that follow? Fortunately, with the unique advantages of linking genotype and phenotype to punctuated cancer evolution, we have now realized that
rapid bursts of species should be explained by massive genome reorganization, and long-term stasis is likely because of the genome constraint achieved by the function of sexual reproduction. With this new knowledge, it is the right time to reexamine this important concept. Punctuated equilibrium has been an important observation, but alone, it fails to establish the correct relationship that exists between micro- and macroevolution. One of the main reasons for this is that the traditional evolutionary thinking of microevolution leading to macroevolution has overshadowed punctuated equilibrium, making it difficult to introduce a new paradigm. First, the concept of punctuated equilibrium originated as an extension of Ernst Mayr’s idea of genetic revolution. It specifically intended to apply allopatric and particularly peripatric speciation to the fossil record. According to the genome-mediated speciation model, there is no need to artificially assume that new species can only arise from the periphery of the parent species before coming back to occupy the overlapping area with the parental population, as most species can be formed by sympatric speciation. Furthermore, the sequential order and the main function of macro- and microevolution need to be changed. Second, without the new mechanism of genome-mediated speciation, it has been difficult to explain the pattern of rapid changes in fossils with gene-based genetics, even though the evo-devo concept was (incorrectly) used to bridge small changes at the gene level to drastic morphological changes. As a result, some neo-Darwinian proponents claim that punctuated equilibrium is just an “interesting but minor wrinkle on the surface of neo-Darwinian theory,” which “lies firmly within the neo-Darwinian synthesis” and is actually a form of gradualism (Dawkins, 1996). Even Gould missed the key point. 
His struggles between continuous and discontinuous evolution eventually succumbed to his conventional evolutionary school training and prevented him from appreciating the importance of the mechanism of punctuated equilibrium. In the end, Gould downplayed this important concept, simply offering it as just a new way of looking at the tempo of evolution, which relies on the same old Darwinian mechanism of natural selection operating on random mutations. He stated:

Punctuated equilibrium emerges as the expected scaling of ordinary allopatric speciation into geological time, and does not suggest or imply radically different evolutionary mechanisms at the level of origin of species. The theory only refers to the origin and development of species in geological time, and must not be misconstrued (as so often done) as a claim for true saltation at a lower organismal level, or for catastrophic mass extinction at a higher faunal level.

Gould, 2002

This is indeed an unfortunate case, as Gould should have been the one person to defend punctuated evolution rather than retreat from it, particularly because he valued Richard Goldschmidt’s thinking. Unfortunately, Gould did not pay enough attention to Goldschmidt’s key concept of system mutation (chromosome) but instead favored the hopeful monster concept of evo-devo genes. If he had only known about the somatic cell evolutionary data and the genome theory as a new mechanism for punctuated equilibrium, he might have completely departed from the traditional thinking that states that the mechanism of micro- and macroevolution is the same except on the geological time scale. Although micro- and macroevolution phases are linked by time, the accumulation of microevolution does not lead to macroevolution: the driving forces of the two types of evolution are different (gene and genome alterations represent different types of inheritance), leading to variant behaviors (dynamics and constraint). Unexpectedly, natural selection contributes to speciation by increasing the population size of a given species. Active genome alterations will likely provide an answer to the question of rapid speciation as well. Saltational speciation fits well with the pattern of punctuated equilibrium, although it takes time for a new species to become visible. On the question of long-term stasis, Gould and Eldredge have focused on development and population constraints without knowing about the key role of genome constraint via sex. Clearly, micro- and macroevolution do not just represent different time windows but entirely different evolutionary mechanisms. 
The fact that the various phases of evolution display specific types of genetic changes further suggests that there is a key limitation to the relationship between quantitative change and qualitative changes or the idea of “transformation of quantity into quality.” It is apparent that this type of transformation has to occur within the same system. If speciation involves system changes by forming new genomes and long-term stasis is maintained by genome constraints, then the accumulation (quantitative) of periodic fluctuating erratic gene alterations cannot eventually cause quality (speciation) changes. The pattern of punctuated equilibrium cannot be used to explain the transformation of quantity to quality. Clearly, the original concept of punctuated equilibrium needs to be reconsidered in light of these new propositions.

e. The invisible missing link in macroevolution

Our in vitro cellular evolution model revealed a surprising observation: it is difficult to detect macroevolutionary links when evolution is most active. In contrast, only when evolution is slower do evolutionary links become visible. These observations are highly counterintuitive on the surface but make complete sense and can also answer one of the most
challenging questions of evolution: why is it so difficult to find missing links if evolution is true? This question has sparked heated debate for over a century. Many who believe in macroevolution and follow neo-Darwinian logic see fossil links everywhere. They are convinced that most links have been identified and that the missing links are not an issue but an incomplete puzzle, as those links will eventually be found. Those who do not believe in macroevolution argue otherwise. Their arguments are based on the lack of a mechanism of speciation in the current neo-Darwinian approach. The fact is that the answers fall somewhere between these two approaches. Yes, there are many identified links; overall, various intermediate forms can be found in virtually every proposed lineage. However, there are some significant gaps that still remain. In particular, specimens from the orderly gradual transitions between species are relatively rare. The pattern of karyotype evolution we observed explains why missing links exist but are not easily detected: their transient nature. Let us review what has been observed (Fig. 3.4). In the two phases of karyotype evolution, the punctuated phase represents a rapid evolutionary phase, which occupies a relatively short time window. The gradual stepwise phase represents the slow evolutionary phase, which lasts a much longer time. The pattern of karyotype evolution is very different between these two phases. In the punctuated phase, it is difficult to identify any missing links when every cell displays drastically different karyotypes, as there are no intermediate clonal populations, only a nonclonal population invisible to population analysis. In contrast, during the stepwise Darwinian phase, the karyotype links are easily identified as they are detectable in the clonal population form. Additional experiments have traced the karyotype evolution of single cells. 
When the genome is unstable, the daughter cells can display radically different karyotypes compared with parental cells. Examples of abrupt major changes in karyotypes without stepwise accumulation are very common during drug treatments, where genome chaos can be induced within one or just a few generations. As has been illustrated, each chaotic genome is fundamentally different. More interestingly, following the evolutionary process, the karyotype of final survivors seems to have little resemblance to the chaotic genomes observed during the process, even though they are the progeny of these chaotic genomes. In summary, in the somatic cell model, there are two situations that can explain the “missing links.” First, the abrupt radical change between parental and daughter genomes does not produce gradual links between them. Only gradual changes will likely leave a series of links. Thus, abrupt changes will produce no transient links and therefore no such links can be detected. Second, the missing links are nonclonal cells which are difficult to detect within a population that obscures their visibility where
only their subsequent clonal progeny can be detected with any ease. These nonclonal cells are often eventually eliminated. In this manner, the somatic cell evolutionary model offers a logical explanation as to why missing links are extremely difficult to find in nature. Increased cases of rapid speciation discussed previously (Section 6.7.1) support the predictions of the above cellular models.

f. Multilevel evolution and constraint

Knowing that multiple types of system constraint are mainly involved in micro- and macroevolution also sheds new light on the issue of multilevel evolutionary selection. The various levels of selection play different roles in evolution, and some levels are much more dominant than others depending on the phase of the evolutionary process. For example, in bacterial systems with limited genes and, more importantly, without the typical chromosome-mediated complexity and topological significance, the gene level of selection might be dominant. The genome constraint becomes the main controlling feature in eukaryotic systems with sexual reproduction. The genome level of constraint might be less important for intelligent social interaction. Thus, it is very important to define the context of selection and identify the level that is most influential. This is the primary reason the genome theory is important: the genome is the key to understanding cellular and organismal macroevolution. Another important issue is the collaborative and conflicting relationship among different genetic organizational levels. Traditionally, more emphasis is placed on the synergistic accumulative effect between different levels. Genes and genomes have been thought to simply occupy two unique levels of genetic organization, each with its own sphere of influence where their cumulative effects are important. This relationship requires studies at each level. 
However, the emergent properties of higher levels might not be obvious if one, using a reductionist approach, only studies the lower level properties, as there is a disconnection between levels. Moreover, the conflict between levels is also significant. The conflicting relationship between genes and genomes in the context of sex has been discussed (Chapter 5). Another example of conflicts between levels is the individual versus society. The “success” of the genome is to duplicate itself in any way possible. However, through societal laws, constraints can be applied to individuals regardless of the biological basis of certain behaviors. For instance, an individual with highly aggressive behavior may have served as a great fighter or leader thousands of years ago. Today, such behavior could be considered highly improper. In fact, individuals behave differently communally than they do when isolated. This is also true of genes, genomes, cells, organisms, and societies in both normal and conflict situations.

Finally, at some levels, it is more difficult to study a causative relationship when a chain of events is involved in which each step is contingent on the probability of the previous step. For example, it is easier to calculate the gene mutation rate than to predict the pattern of genome alteration. So far, there are only correlations between unique karyotypes among species without direct causative evidence. However, the same is also true for gene-mediated speciation. More importantly, based on current scientific reasoning with regard to complex systems, causative relationships might be difficult to establish in the first place (Heng, 2015; Heng et al., 2019).

g. Somatic cell dynamics and germline constraint

Upon being shown the high levels of genome heterogeneity present in cancer evolution for the first time, many ask if such high levels of genome alteration represent experimental artifacts specific to cancer and particularly to in vitro systems. When it is pointed out that over half of the mature hepatocytes in normal mice and humans display aneuploidy/polyploidy and that karyotype variations can be detected in many normal human tissues, there is total disbelief. This information always seems to be a big shock. Biology has taught that there is a high level of fidelity in biological systems such as DNA repair, checkpoints at multiple genetic levels, and the mechanism of cell death, all of which work together to eliminate any altered genomes. Why do high levels of genome variation exist in the first place and how do biological systems display this heterogeneity and yet retain all biofunction? Furthermore, how do such somatic cell dynamics impact the survival of the species? On the surface, the easier answer is related to the fact that there are large numbers of cell division events and a seemingly unlimited source of bio-stress. After all, there is no such thing as a perfect world where all outside influences and internal errors can be avoided or corrected. 
The existence of many diseases illustrates this point. However, in normal situations, system homeostasis can tolerate certain levels of heterogeneity and the key might be in quantitative levels of heterogeneity that relate to the increased probability of changing the overall system status. Deeper analysis can reveal the fascinating evolutionary relationship between the somatic and germline genomes. One important function as a by-product of karyotype heterogeneity at the somatic cell level is to provide increased adaptability by increasing system dynamics, especially under stressful conditions. The heterogeneous system will host diverse genomes or subsystems which can handle the stress. Of course, the price to pay is the possibility that disease conditions may occur when the karyotype variations surpass a certain threshold. Now, the sex-mediated genome constraints come into play. Despite all the karyotype heterogeneity somatic cells gained as a result of stress, the
species will not be impacted because system inheritance is ensured by the genome constraint through sexual reproduction. In other words, somatic cell dynamics and germline constraint ensure maximal performance of the individual, yet the framework of the species remains the same.

h. Extinction issues

The fact that most species do not survive forever despite unlimited adaptations can also now be explained. Each type of genome has its limitations despite its flexibility (inherent short-term adaptation capability by gene and epigene alterations). If microevolution cannot fundamentally change the genome to alter these potential limitations and the overall genome can no longer survive under certain conditions, that genome will eventually become extinct. Unlike delineated laboratory settings where one key gene mutation or knockout will kill all animals with a specific genetic background and experimental setting, the statuses of genes in natural settings are much more diverse. They have complicated interactions with different genetic backgrounds and environmental interactions, and even a mutation that initially impacts population size may not necessarily lead to extinction. The population can also potentially recover because of gene-mediated adaptation. In the long run, an individual gene’s status is less important, especially in large populations where enough varied mutations will ensure the robustness of the genome. A major cataclysmic event can wipe out a species, but at the same time, such a highly stressful selection pressure would favor the formation of new genomes and new species. This explains the bursts of many new, different species following mass extinctions. Only a drastically altered system can fit a drastically different environment. 
The view that a genome defines the boundary of adaptation also effectively explains other interesting phenomena such as the very singular types of species that appear in different historical periods, as each of these periods represents drastically changed environments that are suitable for particular species with specific genomes. In a short period of time, adaptation can occur without drastically altering the genome, but adaptations can only go so far. In contrast, some species like bacteria can undergo both microevolution and macroevolution and can survive across long periods of time where they occupy totally different environments. It would be interesting to investigate whether sequences or gene topological relationships are very different from ancient bacteria compared with current bacteria and among the same bacteria that occupy completely different niches. Some other related phenomena are worth mentioning. More complex systems have emerged following massive extinctions on earth. It appears that there is a general trend of increasing the system complexity with
time. The idea of multiple walls of complexity has been suggested to expand Gould’s “minimal wall of complexity” idea (Heng, 2009). For example, when eukaryotic genomes formed, it became difficult to revert back to a nonchromosomal-based prokaryotic system because the sexual filter and the genome-mediated modularity served as a one-way filter, allowing the system to evolve through to a more complex level of selection. During genome alteration-mediated macroevolution, genome reorganization and constraints became dominant. The trend of increasing simplicity detected in parasites has often been used to refute the general trend of increasing complexity in biology. However, if parasites and hosts are considered together as an inclusive comprehensive system, and complexity is not measured on the basis of the complexity within its individual parts (such as parasites, bacteria, or viruses), then overall increasing complexity becomes apparent. In this way, the specialization of parasites by simplification should not be used as justification to contradict the trend of increasing complexity during evolution. Similarly, we should not measure only the simplicity of the mitochondrial genome but should consider it together with the nuclear genome. This issue, of course, needs much more discussion. We might want to separate increasing complexity from adaptability, as complexity is not always good for adaptability. Sometimes, complexity just happens as a result of constraint. As mentioned earlier, mass extinction events are also closely associated with the birth of new species, possibly because of the induced genome chaos that kills the parental species (or perhaps we should say transforms the old species into new ones). The massive new genomes have been observed from various cancer models (for cellular macroevolution) and can be reflected by the fossil record (for organismal macroevolution). 
Interestingly, it was recently suggested that the history of life on earth has been more affected by a series of environmental catastrophes than natural selection (Ward and Kirschvink, 2015). As these environmental catastrophes can induce genome chaos to break up old species and produce new species, an integrated evolutionary theory should be developed, one that includes the key contribution of genome reorganization-mediated macroevolution, as well as natural selection.

i. The unified evolutionary theory?

In his introduction to the reissued Goldschmidt book (Goldschmidt, 1982), Gould wrote:

I believe that we may achieve a unified and more general evolutionary theory by combining parts of both versions: the acknowledgement of levels that Goldschmidt demanded, with the Darwinian belief in a unity of processes across these levels. The notion of hierarchy does not demand separate causes, for the same set of causes may produce different results in action upon the disparate phenomena of different
levels. Moreover, the levels are not separated by impenetrable barriers, but by interacting boundaries that permit extensive leakage and feedbacks. Speciation doesn’t need a distinct genetics to be meaningfully different from microevolution. If microevolution is fundamentally a process of adaptation, and if reproductive isolation (speciation) is the mere by-product of divergent selection upon two isolated populations, then we have smooth continuity and the modern synthesis is vindicated. But if reproductive isolation often arises first, then speciation merely provides an opportunity for subsequent divergent adaptation, and speciation is not microevolution extended. Speciation becomes a recognizable level of evolution, without requiring a distinct set of genetic causes.

Gould, 1982

From Gould’s analysis, one can understand more of his rationale for establishing a unified evolutionary theory and the challenge he faced at a time when genetics was described mainly in terms of genes. Without the realization that genomes and genes represent distinct genomic/genetic causes of macro- and microevolution, respectively, it has been very challenging to establish any unified theory. This may explain why Gould failed to appreciate that micro- and macroevolution have different genetic mechanisms, and why he ultimately downplayed the importance of punctuated equilibrium in understanding speciation. On the surface, a unified evolutionary theory should be established very easily: by combining stepwise adaptation and punctuated speciation. The challenge is to develop a new evolutionary philosophy that acknowledges the different mechanisms required for short-term adaptation, long-term stasis, and the breaking of stasis to form a new species. First, one needs to appreciate both the evolving and conservative aspects of evolution as discussed and then recognize that the mechanism of genome alteration-mediated macroevolution is fundamentally different from neo-Darwinian gene and population genetics-based concepts. Second, even in the microevolutionary phase, the power of natural selection could be limited, as the separation of germline constraint and somatic dynamics (phenotype changes can be contributed by somatic variants which often do not involve the germline) as well as fuzzy inheritance (sufficient variations can be generated without changing the genomic landscape of the individual and population) can contribute to adaptation without natural selection. Such an understanding will define the limitation of the current evolutionary theory. Clearly, to establish a unified theory of evolution, we must recognize the importance of one crucial step: validating, embracing, and further developing the genome theory. 
The new genome theory will have a strong genomic basis (genome-defined system inheritance), as well as proper appreciation of both gene- and genome-mediated micro- and macroevolution. Certainly, it also needs to address the relationship between the genome theory and other extended evolutionary synthesis, including the issues of epigenetic inheritance, evo-devo, and evolvability (Wagner and Zhang, 2004; Müller, 2007; Danchin et al., 2011).

6.8 IMPLICATIONS: CREATING ARTIFICIAL SPECIES BY SHATTERING THE GENOME FOLLOWED BY ARTIFICIAL MATING/GENOME SELECTION

The understanding of how evolution works also provides us with new strategies to artificially modify and even create new species with desired characteristics. Specifically, the genome-mediated speciation model can guide new efforts to create and select artificial species.

a. Creating new cell lines

As the karyotype is the signature of a given cell line and different cell lines should be considered as artificial species (Heng, 2015), more cell lines can be quickly created by shattering the genomes of existing cell lines, followed by selection under different culture conditions. Systematically generating new lines and selecting them by karyotype will produce some unique lines with desired designed features. Creating special cell lines can also be used for disease studies and biotechnological innovations.

b. Creating artificial laboratory species

Recently, using chromosomal engineering technology, yeast chromosomes have been fused (from a strain with 16 individual chromosomes per cell to 1 or 2 giant chromosomes per cell) (Luo et al., 2018; Shao et al., 2018). As expected, the strains with a reduced number of chromosomes became new species, as they formed reproductive barriers. An approach of genome chaos induction will likely also work. Chromosomal shattering can produce drastically different strains, which can generate further interesting artificial species. A similar strategy can be applied to bacteria. By drastically altering their genome topologies, some unique strains could be generated.

c. Creating artificial animals/plants

Perhaps one of the most radical ideas is to create artificial animals by reshattering and forming new genomes. In plant research, creating hybrids is a common approach and wide hybridization is one of the important methods in plant breeding (Singh and Nelson, 2015). It is likely that the induction of genome chaos can be used more effectively when
creating new plant species. Creating artificial plants (transgenomic or transgenome plants) by combining desired agricultural values from different species (consider a rice tree, for example) is potentially important to feed an increasing human population. In animal breeding, however, the focus has been on selecting phenotypes by breeding within the same species (with a few well-known cross-species exceptions such as the mule). Inducing genome reshattering and matching mating pairs based on karyotype similarities could increase the success rate of producing fertile offspring. Altered genome-based animal breeding could be revolutionary. By first creating a new system using macroartificial selection and then selecting specific features based on microartificial evolution, better artificial species can be produced. Of course, any research of this type requires very careful regulation. The general population should engage in serious discussions surrounding these techniques as well.

CHAPTER 7

The Genome Theory: A New Framework

7.1 SUMMARY

The genome theory, alongside its rationale and key theoretical assumptions, is presented. This includes concepts regarding the genome’s function as an information package, an evolutionary selection unit, and a key platform in displaying emergent properties. In this chapter, the genome theory is cohesively outlined in 12 main principles. Together, these principles not only summarize the key ideas discussed across this book but also illustrate the logical relationships among them. Moreover, the genome theory’s key predictions, limitations, falsifiability, and future challenges are discussed, all of which should be systematically evaluated. The departure from gene-centric genetics to genome-mediated genomics ought to hasten the acknowledgment of new conceptual frameworks for future research, including somatic and organismal evolutionary studies as well as studies in basic genomics and molecular medicine.

7.2 THE RATIONALE FOR ESTABLISHING A GENOME-BASED GENOMIC THEORY

Significant progress has been achieved in all biological fields including genomics and evolution. So, why do we need a new genomic theory? For readers who have come this far, the value of posing such a question is obvious. But for many researchers in the pregenome era, including some noted evolutionists, there is no reason to rethink: they are highly confident that we have already solved major problems.

Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00007-0

Copyright © 2019 Elsevier Inc. All rights reserved.

Most evolutionary geneticists would agree that the major problems of the field have been solved. We understand both the nature of the mutational processes that generate novel genetic variants and the populational processes which cause them to change in frequency over time: most importantly, natural selection and random genetic drift, respectively. ... we will never again come up with concepts as fundamental as those formulated by the ‘founding fathers’ of population genetics (Fisher, Wright and Haldane), or do experiments as path-breaking as Dobzhansky’s demonstration of natural selection acting on polymorphic chromosome inversions.

Charlesworth, 1996

As illustrated in previous chapters, various large-scale -omics studies have generated surprise after surprise in the field of genomics and evolution. These unexpected findings include the function of sex (to reduce genomic variations); increased cases of faster speciation; the separation of accumulated gene alterations from genome changes in somatic models; the inheritance of acquired characteristics by epigenetic mechanisms; and many more (Heng, 2007b; Heng et al., 2006a; Noble, 2013). A few years ago, Nature published an interesting comment, “Does evolutionary theory need a rethink?” (Laland et al., 2014). It explored the viewpoints of two opposing groups of evolutionists. The group representing extended evolutionary synthesis argued “YES, URGENTLY.” Without an extended evolutionary framework, the evolutionary theory neglects key processes (such as important inheritances beyond genes, the involvement of evo-devo, and the role of the environment). In contrast, the group representing traditional new synthesis rebutted “NO, ALL IS WELL.” To them, none of the issues raised by the “YES, URGENTLY” group are neglected in current evolutionary biology. In fact, they say the divisions between the two groups in the comment are figments of the imagination. It’s obviously challenging to be a part of the extended evolutionary synthesis group. In addition to pressure from mainstream researchers, “YES, URGENTLY” evolutionists are also often left conflicted. They recognize the differences between their ideas and those of the traditional new synthesis group, but they are also constrained by their own belief in natural selection, which prevents them from thinking outside the box of stepwise evolution. For example, in the article “Do we need an extended evolutionary synthesis?”,
Massimo Pigliucci declared: No Paradigm Shift: Let me again be clear on a fundamental point underlying this whole discussion: one can reasonably argue that none of this contradicts any tenet of the MS (Modern Synthesis), although it seems to me at least reasonable to concede that the new concepts and empirical findings I have briefly outlined above may eventually force a shift of emphasis away from the population genetic-centered view of evolution that characterizes the MS. On the other hand, to attempt to go further and state that there is not much new here and that all of this is already part of the MS, implicitly or not, would be intellectually disingenuous and historically inaccurate. Pigliucci, 2007

7.2 THE RATIONALE FOR ESTABLISHING A GENOME

By admitting that they envision “no paradigm shift,” “YES, URGENTLY” evolutionists will certainly receive an unconcerned answer: after all, if there’s no big deal, why bother? This was exactly what Gould did, and this is exactly how the exciting discovery of punctuated equilibrium lost its deserved significance in current evolutionary theory. Imagine if Gould had realized that speciation can occur quickly under the genome alteration-mediated model and that microevolution differs fundamentally from macroevolution. What would the impact of punctuated equilibrium on current evolutionary theory be? As mentioned earlier (Sections 2.4.2.2; 6.6.3; and 6.7.2), Gould was familiar with Richard Goldschmidt’s viewpoint that micro- and macroevolution differ, but he ignored Goldschmidt’s thinking to avoid conflicts with stepwise evolutionary thinking. Gould’s choice of gene-based “developmental macromutations” over chromosome-based “systemic mutations” (both promoted by Goldschmidt) ultimately constrained his imagination. As a result, he himself diminished the value of punctuated equilibrium. While this is a pity, as a case study it reflects the general fact that “we can’t solve problems by using the same kind of thinking we used when we created them” (Albert Einstein). To truly appreciate the power of punctuated evolution, one needs to question the currently accepted relationship between micro- and macroevolution, the neo-Darwinian mechanism of natural selection.

The discovery of two-phased cancer evolution has sped up the search for a new genomic theory. Our first step in developing this new theory relied on systematic analyses to reveal key limitations of the current gene-based theory by debating its rationale, facts, paradoxes, and mistakes. Typical responses to our efforts were skeptical:

All approaches are limited, and some might be wrong, but we can’t afford to do nothing. Science always can correct itself, so just keep on going, keep on doing. Don’t tell us what is not working; we all know that already. Just tell us what will work. Don’t continue beating the dead horse! Who doesn’t know the limitations of the current theory?

Well, looking at the “ALL IS WELL” conclusion, it becomes clear that all of these seemingly conflicting comments are just fighting strategies to keep the status quo. If “ALL IS WELL” evolutionists truly know that the current theory is not working and yet are still using it, then isn’t that the definition of insanity commonly attributed to Einstein? The truth is that debating the paradoxes in the field is highly significant. It is scientists’ duty to question popular frameworks when they are not working and to challenge the status quo when it is no longer relevant.

7. THE GENOME THEORY: A NEW FRAMEWORK

Although asking the right questions is often the first step to solving the problem, a powerful falsification is the essential step to establishing a new paradigm. Moreover, vigorously challenging the dominant theory is also very beneficial for the formation of new ones. After nearly 20 years of challenging the gene theory, the second step in developing the genome theory has arrived: it is time to propose new theoretical frameworks and technical platforms to solve key paradoxes, to validate facts and concepts, and to advance the field. Indeed, one key Western scientific tradition is to propose better models and theories to replace current ones that are irrelevant, less useful, or outdated.

Many of us discussed the use of the term “genome theory” before its debut. Some suggested that the term might turn some people off, as a “theory” is a very serious designation in science: one must have strong concepts and overwhelming evidence to claim it. Others liked the term and strongly encouraged its use. Historical examples favor the approach of introducing a new theory at an early stage. Thomas Morgan first published the paper “The theory of the gene” in 1917, almost half a century before the structure of genes was known or any knowledge of how molecular mechanisms work became available (Morgan, 1917). He explained his rationale for introducing the theory in his famous 1926 book “The theory of the gene”:

We are now in a position to formulate the theory of the gene. The theory states that the characters of the individual are referable to paired elements (genes) in the germinal material that are held together in a definite number of linkage groups; it states that the members of each pair of genes separate when the germ-cells mature in accordance with Mendel’s first law, and in consequence each germ-cell comes to contain one set only; it states that the members belonging to different linkage groups assort independently in accordance with Mendel’s second law; it states that an orderly interchange - crossing-over - also takes place, at times, between the elements in corresponding linkage groups; and it states that the frequency of crossing-over furnishes evidence of the linear order of the elements in each linkage group and of the relative position of the elements with respect to each other. These principles, which, taken together, I have ventured to call the theory of the gene ... Morgan, 1926

Although Morgan did not extensively discuss the molecular concept of the gene (no one knew its physical basis, origin, or exact function/behavior at the time) and focused only on a critical review of past genetic principles, his remarkable synthesis became the foundation that unified the basic principles of the gene theory. This power of theoretical synthesis is what current genomics urgently needs. Similarly, current knowledge of the genome is sufficient to provide the platform for a new theory, even though more details will be needed in the future. According to Thomas Kuhn, the introduction of a competing paradigm is essential before an old one can be replaced. We thus hope that the genome theory will provide this much-needed new paradigm to eventually unify the field, a process that will likely take decades or even longer.

Finally, introducing the genome theory provides a good frame of reference. Having the genome theory can encourage researchers, and make it simpler for them, to directly compare the differences and implications between gene- and genome-oriented conceptual frameworks in both genomics and evolution. Despite the increased realization that the concept of the genome needs to be redefined, most current efforts have been limited: they focus only on using genes to explain the genome. Among many proposals, none have focused on the importance of genomic topology and chromosome coding (Keller, 2011).

7.3 UNIQUE CONSIDERATIONS FOR GENOME THEORY

7.3.1 The Genome Is an Integrated Information Unit That Defines the Boundary of the System

As discussed in Chapters 1, 2, and 4, the genome serves as a most important information unit for biosystems, in which different types of inheritance and their potential responses to the environment are nicely integrated. Germline transmission passes on both the blueprints to build cells, and eventually the entire individual, and the instructions to make parts. Through the developmental process, the genomic information becomes the biological reality of structure, function, and behavior. Considering the interactions of genomic coding, self-organization, developmental field, energy, and timing, there must be mechanisms that coordinate such extraordinary processes. One can imagine that the genome-defined genetic network serves as a key initial condition for the entire process and that the ongoing developmental process can simply follow the self-organization principle based on local interactions.

Current systems biology can be summarized into the following frontiers: (1) collect and analyze as many biological parts in a cellular process as possible; (2) study the interactions between parts and reconstruct the genetic circuits/network; (3) use a mathematical format to characterize the reconstructed network and create computer models to analyze, interpret, and predict biological functions using the reconstructed network; and (4) generate hypotheses based on the model prediction and test them experimentally to further modify the model (Palsson, 2015; Heng et al., 2019). Such broad coverage promised to solve many biological problems. Despite the initial excitement systems biology brought to genomics and evolutionary research, it has so far delivered much less than expected. There are many reasons, some of which are discussed in the following:

Systems biology does not simply mean more data collection, and more systematic analyses of the parts, but a different way of doing science, using holistic thinking and analysis rather than reductionist characterization. New systems cytogenetic research should not only link specific genes or pathways to a given genetic region, but link the overall degree of alterations to the emergent properties such as evolutionary potential, success of powerful outliers, and to what extent small changes can be tolerated by the system without significant harm or advantages. For example, current bioinformatic approaches can effectively uncover action of specific pathways. However, as there are many pathways, and as pathway switching happens so frequently, both within and between cells, only understanding individual pathways will not solve the issue of how evolutionary selection works (as this is based on the genome and populations, which are above the pathway level). Among the many levels of genetic organization that exist (epigene, gene, pathway, and karyotype defined genetic network), cellular evolution is mainly based on the genome package. This is the rationale to use karyotype to bridge systems biology, molecular genetics, epigenetics and cytogenetics. Heng and Regan, 2017 (with permission from Bentham Science Publishers).

Obviously, the genome-based genomic theory should provide a new avenue for systems biology to overcome some of the key limitations. First, the genome encodes the boundary of the entire network structure. In other words, each different species will have its own network structure, which defines the range of its inheritance and pattern of environmental response. Multiple levels of various inheritance interactions, including epigenetic regulation, are also defined by the genome, although a certain degree of plasticity exists. Second, despite a certain degree of certainty, such as the facts that genetic information can be transmitted from the gene to the protein and that protein interaction networks can be established based on chemical laws and experimentation, most genotypes cannot be dissected into genes from the holistic genome (only a small portion of genes display high penetrance when associated with phenotypes). Therefore, a new strategy should focus on the characterization and prediction of system behavior at a higher level (at the chromosome level, for example), rather than on parts characterization at lower levels (the gene level). Third, based on new information about genome-defined system inheritance and fuzzy inheritance, we need to develop experimental and computational platforms that deal with the information of genomic topology/heterogeneity/fuzziness.

7.3.2 Emergent Properties in Biological Systems

Perhaps the difficulty of predicting emergent properties based on parts information represents the greatest challenge for bio-researchers, who are keen to illustrate causative relationships between individual parts (such as gene mutations) and phenotypes of the system (such as common and complex diseases). However, even more unfortunate is the fact that a majority of current bio-researchers have ignored the topic of emergent properties altogether. This is one of the biggest flaws in current biology. Reductionist belief dictates that any phenotype can be dissected back to individual genes. Such beliefs are now facing increased challenges. The genome theory needs to integrate the idea of emergence and to acknowledge that, in most cases, there is no one-to-one causation between parts and system behavior. Yes, researchers can artificially create many linear models by isolating the object and eliminating heterogeneity to “illustrate” causation, but these beautiful models cannot be applied to real-life systems. This point has been discussed extensively in cancer research. By creating models that lack the key heterogeneity of system constraints and stages of selection, we can cure cancer in dishes and extremely limited animal models. But such success cannot be transferred to the clinic, where heterogeneity-mediated emergence defines the cancer (Heng, 2015). Given the new realization of the relationships between the gene and genome, genomic heterogeneity, and fuzzy inheritance, how heterogeneity impacts emergent properties is now also under investigation. Emergence can be classified as weak and strong emergence: the stronger the emergence, the more challenging it is to predict. Unlike non-biological systems with weak emergence, the task of studying emergence in biological systems is daunting. Because the component properties of biological systems are highly state dependent, biological systems belong to the very strong emergence type. The reconstruction of emergent properties from lower levels requires a vast amount of information regarding the state-dependency of its component properties (Kolodkin et al., 2013). Clearly, one of the key component properties (as well as higher levels’ system behavior) is bio-heterogeneity. Heng et al., 2019

Several examples of how different types of genome heterogeneity can impact emergence are listed in Table 7.1. Furthermore, Fig. 7.1 illustrates the relationship between genome heterogeneity and the emergent properties of cellular populations. Clearly, unlike the current gene theory, the genome theory must address the reality of unavoidable emergent properties in complex biological systems.

TABLE 7.1 Heterogeneity Alters Emergent Properties.
• The topological arrangement of agents (genes) changes the properties of the genome
• Quantitative heterogeneity leads to different emergent properties
• The same mutation (mtDNA transfer RNA mutation at nucleotide 3243A>G) can cause different types of diseases depending on the degree of heteroplasmy (Wallace and Chalkia, 2013)
• Cell density can influence the switching of pathways
• Selection pressure swings the patterns of emergent properties: the advantages of average versus outliers within physiological versus pathological condition
• The dynamics of emergent properties: transitional populations are important for macrocellular evolution
• Emergence based on collaborative agents
• Multiple levels of heterogeneity
• Heterogeneity is the key feature for most biosystems with advantages in evolutionary selection
• The genomic basis of heterogeneity has been identified as fuzzy inheritance
• Successful cellular adaptation requires increased heterogeneity to deal with dynamic environmental changes
• Environments can function as dynamic agents
• Altered karyotypes impact gene function
• Homogeneous or heterogeneous genome environments lead to different patterns of emergence (Fig. 7.1)

FIGURE 7.1 An illustration of how genome heterogeneity impacts the emergent properties of cellular populations. Because there is no direct correlation from individual agents to the emergent properties, the final properties are based on the collective emergence of all agents. Circles represent cells with normal karyotypes, triangles represent cells with altered chromosomes or karyotypes, and arrows represent pathways among agents (e.g. different cells). These variable properties are the potential basis for cancer evolution. Modified from Ye et al., 2018.

7.3.3 Understanding Genomic Principles Through the Lens of Evolution

Although Mendelian genetics, through population genetic analyses, has greatly contributed to the Modern Synthesis, evolutionary theory itself seems to have had less of an impact on genetic mechanisms, even though selection can influence gene frequency. This is very puzzling, given the fact that nearly all the organisms on earth are products of evolution. Evidence that evolution impacts how genetic mechanisms work was observed in experiments that watched cancer evolution in action (Chapter 3) (Heng et al., 2006a-c, 2011a,b; 2013a-c). When selection acts on the karyotype rather than the gene level during the highly dynamic macroevolutionary phase, the function of individual genes becomes less important. Even some well-known, powerful “cancer genes” become invisible to evolution (thus genes that are actually expressed display no function). In other words, the evolutionary context decides whether a gene’s function can be realized or not. Thus, Mendel’s ability to arrive at his laws of genetics depended largely on the artificial nature of his evolutionary selection experiments (Chapter 2). The artificial evolutionary experiments clearly created the unique 3:1 and 9:3:3:1 patterns, which differ drastically from the pattern found in nature, where environmental interaction is an unavoidable part of the phenotype and mating is highly random. These artificial evolutionary experiments selected a specific range on the spectrum coded by fuzzy inheritance, leading to increased repeatability. In contrast, in natural conditions, the spectrum of expression of fuzzy inheritance is stretched further and the repeatability is drastically reduced, as there are more phenotypic options that can be selected by environments. Now, it is much easier to understand the importance of the genome-defined system inheritance in biology, which is selected by the evolutionary process. For example, breaking the genome constraint is the precondition for speciation; rapid reorganization into a new genome is the most effective method to ensure survival and system creation; mating with an individual of the same karyotype is the most effective way to establish the reproductive barrier.
These are all related to evolution. The same explanation applies to why fuzzy inheritance is a widespread phenomenon. Biological information can be much more precise if the environment is more stable. This has been amply demonstrated by genetic manipulation experiments in which bio-efficacy can often be improved by rewiring genetic networks (Isalan et al., 2008; Heng, 2015). However, for most long-lasting species, the key for evolutionary selection might not be finding the most efficient or energy-saving phenotype, but the greatest dynamic potential for high adaptability to many different environments, which may be less efficient or more energy-wasting. (By the way, according to traditional natural selection, small differences, such as a different efficacy, are the selective targets. The new genome theory proposes, conversely, that the selection target is related to short-term survival.) In other words, evolution has historically selected species with a compromise between survival under crisis (through existing variants and/or the capability to quickly produce them) and fast but not maximal growth under normal conditions. Of course, the compromise favored by evolution is also reflected in the multiple mechanisms in the DNA repair toolbox. Under high stress, only the less precise repair mechanisms can save the species, as only changed genomes can survive, despite the possibility of system changes or speciation.
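
The 3:1 and 9:3:3:1 patterns mentioned in this section follow from simple enumeration under the idealized conditions of a controlled cross. The minimal Python sketch below assumes complete dominance, independent assortment, and equally likely gametes; the function names `gametes` and `cross` are illustrative choices, not taken from the text. It reproduces only the idealized arithmetic and says nothing about the environmental interaction that, as argued above, changes the pattern in nature.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Return all equally likely gametes for a genotype.

    A genotype is a list of two-letter strings, one per locus, e.g. ["Aa", "Bb"].
    Each gamete receives one allele from every locus.
    """
    return list(product(*genotype))


def cross(parent1, parent2):
    """Count offspring phenotype classes for a cross between two parents.

    Assumes complete dominance: a locus shows the dominant phenotype
    if at least one of the two inherited alleles is uppercase.
    """
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        phenotype = tuple(
            "dominant" if (a.isupper() or b.isupper()) else "recessive"
            for a, b in zip(g1, g2)
        )
        counts[phenotype] += 1
    return counts


# Monohybrid cross Aa x Aa: 4 equally likely combinations.
mono = cross(["Aa"], ["Aa"])

# Dihybrid cross AaBb x AaBb: 16 equally likely combinations.
di = cross(["Aa", "Bb"], ["Aa", "Bb"])

for label, result in (("monohybrid", mono), ("dihybrid", di)):
    print(label, dict(result))
```

The monohybrid counts come out as 3 dominant to 1 recessive, and the dihybrid counts as 9:3:3:1 across the four phenotype classes. These ratios depend entirely on the stated assumptions, which is consistent with the point that they describe a controlled experimental setting rather than natural populations.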

7.4 OUTLINE OF THE GENOME THEORY

Twelve principles summarizing the genome theory are outlined below, many of which have been mentioned or briefly discussed in previous chapters. Some of them overlap to a certain extent to provide a bigger picture. Listing all of them in the same place, and in particular briefly recapping the rationale behind searching for and maturing these principles, will help readers to appreciate and further improve the genome theory. The order does not reflect their importance, and the list will surely grow with the theory’s maturation.

1. The genome is the highest level of genetic organization and is not simply equal to the sum of all its genes or sequences. The genome functions as a gene organizer at a higher genomic level. Despite the general understanding that the whole is not equal to the sum of its parts and that the genome is not just a bag of genes, the importance of “genomic topology” was not highly appreciated before the large-scale sequencing era. The gene theory predicted that key genetic differences among species must involve new gene content. Following the unexpected finding that many species, in fact, share similar genes, attention then shifted to the regulation of these similar genes and the importance of genetic networks. The same idea of systematically characterizing noncoding sequences has become popular in evolutionary studies and human disease studies. The ENCODE project (Encyclopedia of DNA Elements) has generated a high level of excitement, for example. Unfortunately, mainly focusing on the parts (genetic elements, either the gene itself or other DNA elements that regulate genes) will not solve the mysteries of life in a fundamental way. The key missing element is the primary level of genomic organization. Such organization provides a physical platform on which genetic networks can emerge. It has long been known that the architecture of chromosomes is important to many biological functions (Chapters 1, 2, 4). However, chromosomes’ ultimate importance has long been ignored in favor of the power of genes. The real breakthrough came from a synthesis based on a number of lines of observation and thinking: first, watching cancer evolution in action reveals the chromosomes’ dominant role in cancer evolution; second, it becomes clear that the genomic topology is new genomic information and that the physical relationship of DNA fragments along and between chromosomes can provide such topological information; third, while many key genes can be deleted without fundamentally changing the biosystem itself (some obvious features can be recovered through rewiring), karyotype changes, in contrast, are profoundly linked to system evolution including speciation; fourth, in most complex biological systems, there is no obvious causative relationship between parts and their emergence at higher organization levels. Furthermore, the physical arrangement of the parts contributes to its emergence. The fixed genomic topology of a given species can play the role of “network organizer” by providing the same initial condition for a developmental process, for example. Then, the function of sex can preserve such an initial condition among individuals of the same species. From these realizations, a hidden connection became clear: the karyotype ensures a topological relationship (in 3D nuclei) among all genes and their regulatory elements, and this emergent relationship determines the network structure and dynamics. This could be the reason why (1) the vast majority of different species display different karyotypes, as they are indeed different genome-defined systems; (2) large-scale analyses of genetic parts reveal only limited knowledge regarding how genes work at the system level; (3) it is so challenging to identify common genes or regulatory elements in common diseases; and (4) sponges have so many different types of genes but no corresponding functions. It is likely that the specific physical relationship among these genes on their chromosomes is to blame.
Especially when sexual reproduction does not allow these genes to change the genomic topology, they are stuck with the deal of being a sponge, which cannot be broken by gene mutation. These same genes could have very different phenotypes if they were located within different genomes and thus belonged to different species frameworks. At the species level, the genome contains all the genetic and epigenetic information for a given species. There are some disagreements when discussing somatic evolution, though. As cellular populations, tissues, and organs are involved, some argue that the research focus should be on levels above the genome. However, if the issue here involves somatic cell evolution, then genome-defined inheritance must be involved, as evolution simply will not work without inheritance. This fact illustrates the importance of focusing on the genome when studying somatic cell evolution within the context of tissues and organs, as all levels above the cell simply function as evolutionary constraints, which can fundamentally change the cellular evolutionary process.

2. The genome and the gene represent distinctly different types of inheritance: “system inheritance” and “parts inheritance,” respectively. For a given species, system inheritance maintains the content of the genes and determines the inherited genomic topological relationship among genes and other DNA sequences. If genes and chromosomes represent different levels of genomic organization, they likely will have different functions. To fully appreciate this distinction, one must understand the various types of genomic coding systems in biology. Everyone is familiar with “parts coding” or gene coding, but unfortunately not with chromosome-coded “system inheritance.” “Parts inheritance” is easier to understand as it codes for proteins, the research focus of molecular genetics for over half a century. “System inheritance,” in contrast, is difficult to decipher as it involves all the emergent properties, and it is difficult to predict from individual “parts coding.” Systems biologists were among the first to realize that a list of genetic parts differs from a blueprint (the instructions required to put those parts together to create a functional system). Again, however, few have realized that chromosome coding functions as this important blueprint simply by controlling the genomic topology. Because the rewiring of the network within a defined genome and the reshuffling of the genome to create different networks are distinct processes, an analysis of the network dynamics must focus on the multiple-level landscape (Heng et al., 2011; Heng, 2015, 2017).
In addition to maintaining system inheritance (both when regulating the entire genome and when preserving the reproductive barrier), the integrity of the chromosome also ensures the integrity of the gene content. Without the function of sexual reproduction, gene content is highly dynamic (Chapter 5). In bacteria, for example, gene transfer is at its highest rate. There are different means to create new karyotypes, and inducing genome chaos is one of the most effective. However, not all drastically altered karyotypes are viable. It was hypothesized that the rearranged karyotypes should not destroy or break up important biological modules. This hypothesis agrees with the importance of the synteny relationship among different species. It is possible that these syntenies are responsible for the important biological modules.
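
The distinction between a parts list and a blueprint can be illustrated with a deliberately simplified toy. The Python sketch below assumes that a karyotype is a list of chromosomes, that each chromosome is an ordered list of hypothetical gene labels (`g1` to `g6`), and that linear adjacency on a chromosome is a crude stand-in for genomic topology. None of these names or simplifications come from the text, and real topology involves 3D nuclear organization that this sketch does not model.

```python
def gene_content(karyotype):
    """Return the sorted list of all gene labels: the 'parts list'."""
    return sorted(gene for chromosome in karyotype for gene in chromosome)


def adjacent_pairs(karyotype):
    """Return the set of unordered gene pairs that are neighbors on a chromosome.

    This is a toy proxy for topology: it records which parts sit next to
    which, ignoring everything else about chromosome structure.
    """
    pairs = set()
    for chromosome in karyotype:
        for left, right in zip(chromosome, chromosome[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


# Two hypothetical karyotypes carrying exactly the same six genes.
karyotype_a = [["g1", "g2", "g3"], ["g4", "g5", "g6"]]
karyotype_b = [["g1", "g5", "g3"], ["g4", "g2", "g6"]]  # g2 and g5 exchanged

same_parts = gene_content(karyotype_a) == gene_content(karyotype_b)
shared_neighbors = adjacent_pairs(karyotype_a) & adjacent_pairs(karyotype_b)

print("same gene content:", same_parts)
print("shared adjacent pairs:", len(shared_neighbors))
```

In this toy the two karyotypes have identical gene content yet share no adjacent pairs, so a comparison based on the parts list alone would report no difference while a comparison based on arrangement reports a complete one. This is only meant to make the parts-versus-arrangement distinction concrete; it is not a model of how system inheritance operates.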

3. The genome represents a collective information package. In contrast, most individual genes are not independent informational units as their functions are genome-context dependent (Chapter 4). In the context of 4D genomics, genes are linked and most have no isolated function. Defining a gene’s function is similar to defining the function of a brick. It is a building material, of course, but depending on its context it can also function as a door stopper, a piece of art, an exercise tool, or even a murder weapon, none of which have anything to do with building. Researchers are familiar with similar phenomena in biology. For example, the same gene can perform a variety of functions across species. The same protein can have different functions in different cell locations (proteins with such varied functions have been referred to as “moonlighting” proteins). It is believed that protein moonlighting may occur widely in nature. It is important to realize that the initial identification of a given protein’s function is dependent on the experimental system and that new functions can be revealed by changing the detection system to one that focuses on a different genome context. Thus, genes can “moonlight” depending on the genome context. For example, a review of the literature reveals that the p53 gene has been linked to seemingly countless pathways and functions. It should be argued that this information is both true and not true. As p53 is a popular research target, researchers in different fields have tried to link p53 to their own favorite pathways using diverse systems (different cell lines and patient samples are different genome systems as the karyotypes are unique). Within these large numbers of distinctive genome systems, one gene can be linked to almost anything if the conditions permit.
This illustrates that the function of p53 is not fixed and is truly context dependent: although p53 can have an array of functions within different systems, its function within a given system is actually limited. Because the function of each gene depends on the context that is determined at the genome level, it would be difficult to judge “good” or “bad” effects based on individual gene mutations. For cells with or without various mutations or genome alterations, the only advantage or disadvantage is evolutionary survival. A “good” or “bad” designation from an individual gene point of view is irrelevant (when there are so many). For example, the p53 mutation is well known for its harmful effects, as it causes genome instability and is frequently detected in many types of cancer samples. However, the benefit of the p53 mutation is also obvious under certain conditions. Because the deletion of the BRCA gene is embryonically lethal, the p53 mutation is essential to rescue mice with the BRCA mutation. Indeed, with the p53 mutation, mice will have a higher probability of getting cancer, but the p53 mutation in this specific instance has played an important role in allowing mice to survive and pass on their genomes, the ultimate goal of evolution. Similarly, mice with the p53 mutation also display survival advantages when irradiated in utero (Heng, 2007a). These two examples of the p53 mutation’s “good” and “bad” features illustrate gene combinational effects and how a status can change from “bad” to “good” or vice versa when environments are altered. These examples are all part of the limited independent functions of individual genes (Heng, 2009). Additional supportive evidence comes from the observation that a classical “tumor suppressor” can switch to an “oncogene” simply through a new combination of gene mutations or a different splicing form. Moreover, it is commonly known that the same gene mutation can be good for the health of one organ, but bad for another. Just imagine a gene-centric physician asking patients: Do you want a perfect liver at the expense of your heart, or vice versa? These examples question the strategy of screening individual gene mutations for common and complex diseases: what is the point when there are so many gene mutations (or genetic loci) involved, and each contributes only a tiny “causative” portion, which differs significantly from the highly penetrant gene mutations of familial types of diseases? Even for gene mutations with high penetrance in some patient populations, environmental dynamics teaming up with fuzzy inheritance can make gene-based predictions less meaningful. It should be pointed out that studying “gene-environment interactions” is difficult if the individual gene is not the independent information unit.
It will be difficult to track down individual genes for a majority of phenotypes, as different individuals have different gene mutation profiles, which respond differently to the same environmental stress. In contrast, the genome is a collective information package within which different gene-defined molecular pathways can be used to respond to the same environmental stress. For example, the induced genome instability level can be used to measure the environmental stress without pinpointing a specific gene. Of course, under some more extreme and constant conditions, such as living at extremely high altitudes for many generations, the patterns of gene-mediated adaptation can become obvious (Beall, 2000). The success of genetic studies of this kind, however, will not guarantee the success of other studies that deal with highly dynamic populations under multiple confounding environmental factors, especially knowing that the genotype is also fuzzy in the first place.
4. The principal object of selection is the genome, not the gene, although genes are essential for achieving the genome’s potential and maintaining its longevity and status. The above principles indicate that the genome system is not controlled by individual genes but rather by the collective action of all genes acting within the boundary defined by the genome. Thus, it is genome-level changes that are most profound, leading to system changes, while gene-level changes modify only features of a given system. Importantly, the genome’s higher level of control permits dynamics to occur at the lower gene level, which can alter the function of individual genes by eliminating or reducing some genes’ functions and increasing or creating new functions for other genes. The overall beneficial results often come from genes collaborating with or canceling out each other’s functions at the lower level. Yes, this costs energy, but it is essential for survival under changing environments. This is particularly true when the genome is unstable because under such conditions, individual gene functions become less predictable. This instability can lead to combinational responses, pathway switching, moonlighting, and finally, overall emergent properties. This point has been amply illustrated in somatic cell models. The genome’s control and information system capability ensures its ultimate role in evolution: serving as the principal package of biological selection. It is important to point out that it is the genome, not genes or pathways or even the individual, that is the main object of selection. Mayr listed five different levels of selection: the gene, the gamete, the individual organism, the group, and higher levels (species and above).
Among all these levels, the individual organism has been considered by most evolutionists as the principal object of selection because “the result of continuously ongoing selection is the adaptation of organisms”, and “when an evolutionist says that the ‘genome is a program that directs development,’ it would be wrong to think of it in a deterministic way. The development of the phenotype involves many stochastic processes which preclude a one-to-one relation between the genotype and phenotype. This is, of course, precisely the reason why we must accept the phenotype as the object of selection rather than the genotype” (Mayr 1997). There are some major conceptual challenges to this popular view, however. First, the success of an individual’s phenotype can hardly be passed on if that individual’s genome cannot be passed on. Second, the phenotype is also contributed to by many somatic cell variations, which cannot even be passed on. Thus, these adaptive advantages cannot be accumulated within the population. In addition, it is neither the exact genome of the individual that is passed on nor the combinations of the genes, but the core genome of the species with a limited personal touch. For example, the distinctive features of individuals with altered chromosomes such as trisomy 21, XXY, or other inversions cannot be passed to the next generation because of the sexual filter. They pass on 46,XX or 46,XY genomes, which differ from their own. Furthermore, because of the genetic recombination process during sexual reproduction, many “good gene” combinations will be lost to the next generation in most individuals. As such, the core genome is the common linkage between individuals of the same species. It is true that there is no strict deterministic relationship between phenotypes and genotypes. However, the individual genomes determine the boundary of possibilities that can interact with the environment; by extension, these genomes determine certain phenotypes. In conclusion, the key is passing on the core genome, as the new environments will select a specific phenotype from the genome-coded fuzzy inheritance spectrum. Individuals come and go, but the core genome survives as long as the species still exists. In a sense, genomes (not genes or pathways or individuals) are the main objects of selection. Evolutionary success of an individual is determined by the passage of its genome, and an individual is not equal to a (core) genome. Superficially, natural selection selects the individual, but at a more profound level, it is the genome package that matters (Heng, 2009). “Without the genome, the organism would not exist.” (Keller and Harel, 2007).
Let us look at the key role of the genome during different phases of evolutionary selection. During the macroevolution phase, its main function is to initiate the speciation process by creating a new genome from the old and improving the new genome before it becomes the core genome of a lasting species (see Chapter 6). During the microevolution (adaptation) phase, the genome functions as a platform or organizer for massive lower level stochastic genetic and epigenetic alterations. The emergent properties of such a platform are also influenced by environmental interactions. On the surface, there are individual winners and losers during selection. However, these winners may not necessarily pass on their winning profile of genes as their specific combination of genes cannot be faithfully transmitted because of meiosis. In addition, the environment is constantly changing so that today’s winner might become tomorrow’s loser. The key is passing on more copies of the core genome (the system framework) regardless of individual gene combinations. Additionally, because of fuzzy inheritance and the power of gene regulation, in many situations, the system’s adaptation can be achieved without changing the gene status. That is the reason why a clear-cut relationship between allele frequency changes and specific phenotypes can be observed only in exceptional environments. The next key question is how to integrate the concept of “allele frequency changes in a population” into the equation of the genome theory. If, as argued, natural selection cannot directly lead to speciation by accumulating smaller changes over time, then how can gene pool changes be linked to the macroevolutionary process? As illustrated in Fig. 6.2, in the microevolution phase (phase 4), gene-mediated selection is hypothesized to produce a large population of the selected species. Here are some proposed details: To become a long-lasting species, it is crucial to form a large and diverse population as soon as possible. There are three stages to effectively achieve this goal: (1) quickly identify the new niche and further develop specific features to take advantage of the niche; (2) generate a sufficient degree of population heterogeneity; and (3) occupy as many diverse ecological niches as possible to make the species widespread. Obviously, natural selection can be most helpful in developing a healthy population by any or all three of these means. Because of this, gene- and epigene-mediated selection becomes a necessary component of the macro- and microevolutionary equation. Although speciation is not mainly achieved by the accumulation of microevolutionary changes, microevolution is still necessary for speciation.
More specifically, despite the fact that environments always swing back and forth, the periods in between are long enough for gene/epigene-mediated selection to operate over many generations. Within these timeframes, microevolutionary selection can change the gene pool to favor specific selected features to increase or maintain the population size. When a new environmental swing occurs, a new round of microselection will begin, focusing on different features with different alleles. As a result, genes favored by microevolution come and go, but the core genome survives. Nevertheless, gene/epigene-mediated microevolution can push the boundary of evolutionary potential defined by the genome and turn some outliers (with specific alleles) into a population’s average.
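The idea that favored alleles "come and go" as environments swing can be illustrated with a minimal sketch. It uses the textbook one-locus haploid selection recursion, not a model taken from this chapter, and every number in it (starting frequency, fitness ratio, phase length) is a hypothetical value chosen only for illustration. The exactly symmetric reversal of selection is also an assumption made to keep the example simple.

```python
def next_frequency(p, relative_fitness):
    """One generation of haploid selection on allele A.

    p is the current frequency of A; relative_fitness is w_A / w_a.
    Standard recursion: p' = p*w / (p*w + (1 - p)).
    """
    return p * relative_fitness / (p * relative_fitness + (1.0 - p))


def simulate_swings(p0, relative_fitness, generations_per_phase, cycles):
    """Alternate phases that favor A (ratio r) and disfavor A (ratio 1/r).

    Returns the full trajectory of allele frequencies, including p0.
    """
    p = p0
    trajectory = [p]
    for _ in range(cycles):
        # First the environment favors A, then it swings back.
        for ratio in (relative_fitness, 1.0 / relative_fitness):
            for _ in range(generations_per_phase):
                p = next_frequency(p, ratio)
                trajectory.append(p)
    return trajectory


# Hypothetical parameters, chosen only for illustration.
trajectory = simulate_swings(p0=0.1, relative_fitness=1.05,
                             generations_per_phase=50, cycles=2)
peak = max(trajectory)
final = trajectory[-1]
print(f"peak frequency: {peak:.3f}, final frequency: {final:.3f}")
```

Under these assumptions the allele rises from a minority to a majority while it is favored and then falls back once the environment reverses, so no directional change accumulates over a full cycle. Real environments are neither periodic nor symmetric, so this shows only the qualitative point.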


Such explanations/concepts can also be used to further understand the following issues:
a. Gene-defined specific features can effectively help a given species become widespread. Many species are widespread not because of their capability to tolerate diverse environments but because they are specialized for habitats that are widespread (Jablonski et al., 2013). A young species can quickly specialize for habitats that are widespread and then become widespread with the help of environmental distribution. For young species that fail to achieve such microevolution-based specialization early on, the chance of becoming a dominant and widespread species is reduced.
b. If there are many different niches, different subpopulations can be formed with slightly different gene frequencies to better fit different niches. In each episode of big environmental change, some populations survive better compared with others because of differences in the gene pool. The strategy of “don’t put all your eggs in one basket” works: it can increase the chance of survival of some individuals, even under harsh conditions. Such unequal survival of different populations is important for the survival of a given species overall. As the core genome is the same among different populations, they can readily repopulate one another when needed.
c. While most species cannot accumulate their directional changes because of environmental dynamics, some species can experience long periods of stable environmental selection, which can accumulate certain genetic and phenotypic features. Higher altitudes, further latitudes, and isolated islands, for example, can offer relatively stable and constant selective conditions, which can lead to some exceptional genotype-phenotype correlations. Nevertheless, such selection will likely accumulate only a few unique features and will not lead to speciation.
d. Microevolution also can indirectly increase the opportunities for speciation. As microevolution can increase the population size, it also increases opportunities for spontaneous chromosomal alterations, as well as the chance of hybridization with other species.
Clearly, the relationship between micro- and macroevolution is a rather complicated one.
5. Genome-mediated macroevolution and gene-mediated microevolution are distinct processes involving unique genomic mechanisms. In organismal evolution, the genomes and genes are responsible for system constraint followed by speciation and genetic dynamics, while in somatic cell evolution, genome reorganization creates the cancer genome and gene mutations contribute to cancer genome expansion, resulting in clinically observed cancer. Because of the different types of inheritance provided by genes and genomes, the evolutionary functions of these two levels of genetic organization are distinctly different. The beautiful balance between constraint and necessary evolutionary dynamics has been achieved by the following gene/genome relationship: In organismal evolution, the genome plays a unique role in species stabilization (constraint), yet paradoxically contributes immensely to speciation by altering a species’ genomic topology (the major innovation above species which leads to biodiversity). In contrast, the gene increases the diversity within a species and is important for population adaptation and survival, which is important for the maintenance of the species. Regardless of what type of mutations dominate different populations or different stages of evolution, it is the integrity of the genome that is preserved, and it is the emergence of new genomes that leads to these endless forms most beautiful and most wonderful. The separation of germline and somatic cells also contributes to an increase in the dynamics necessary for successful adaptation without jeopardizing genome constraint. In somatic cells, increased genome alterations can provide adaptive potential. As long as the germline is separate from somatic cells and further protected by the sex filter, somatic cell dynamics are favored, particularly under stress, despite the potential for creating disease conditions (see Sections 7.4 and 7.5.2 for more discussion). Interestingly, somatic cell evolution follows different rules. There is no significant advantage for would-be cancer cells to maintain the same genome. In fact, genome replacement is the most effective way to achieve macrocellular evolution.
More importantly, as there is no sexual filter for somatic cells, genome alterations become the most dominant form of change or dynamics. Such genome flux is much more powerful than individual gene mutations because any significant genome alteration involves new genetic networks and large numbers of genes. That being said, gene mutations do play an important role in the proliferation of cancer cells to the point of clinical significance (see cancer model, Fig. 3.11). Evolutionary dynamics and constraint represent two sides of the same coin. This solves a key paradox of why evolution is marked by short-term genetic changes for adaptation and long-term stasis dominated by “fixed genomes.” Some additional differences that separate macro- and microevolution are listed in Table 7.2.


TABLE 7.2 The Major Differences Between Macro- and Microevolution.

Feature | Macroevolution | Microevolution
Genetic level involved | Genome | Gene, epigenetic, or lower level
Mechanism | Stochastic genome reshuffling to form a lasting stable genome | Limited accumulation of gene mutations and epigenetic alterations
Pattern of changes | Quick, sudden, punctuated | Slow, longer-lasting, stepwise, short-term accumulation
Biological consequences | Speciation, changes system inheritance | Species improvement, adaptation, changes parts inheritance
Predictability | Very low, for sexual eukaryotes involves genome reorganization and finding a mate with similar genome alterations | Relatively higher, short term
Response to environmental changes | Stable, but when changed, more dramatic | Less stable, also lower impact
Constrained by sexual filter | Constrained in sexual eukaryotes | Minimally constrained

6. Prokaryotes and eukaryotes display different patterns of evolution. The evolutionary patterns of prokaryotes and most sexually reproducing eukaryotes are drastically different (see Chapter 5). Because of the weaker efficacy of constraint in the prokaryote genome, which lacks the powerful sex filter, prokaryotes are constantly engaged in both macro- and microevolutionary processes, which display punctuated evolution at both the genome and gene level. As a result, it has been difficult to identify species-specific core sequences in prokaryotes. In contrast, sexual reproduction in eukaryotes becomes a strong evolutionary constraint. During the speciation process, initial changes are genome alterations. Within this stage, the punctuated phase of evolution dominates. As soon as the species is established as a stable entity, the sex filter preserves the system stasis over the long term and microevolution then takes over. Thus, the two evolutionary phases are clearly separated in sexually reproducing eukaryotes. It can be predicted that because of the two phases of evolution, periodic stress will push existing successful genomes into extinction and/or produce new species with new genomes, starting yet another cycle. Somatic cell evolution displays similar patterns to organismal asexual evolution but with higher dynamics because of the involvement of chromosomal reshaping (number change, translocations, and especially genome chaos). Genome reorganization is the most effective way to create new genomes and is much more powerful than individual gene transfer because the new genomic topology defines the new network structure. This principle has important implications. Traditionally, the evolutionary pattern of bacteria or viruses has been used to explain/support the organismal evolution of eukaryotes. Knowing the key differences between bacteria and sexual eukaryotes, and by extension between asexual organisms and sexual eukaryotes, the evolutionary patterns of bacteria and sexual eukaryotes should no longer be considered synonymous. For example, bacterial artificial evolutionary experiments have often been used to explain the relationship between macro- and microevolution in organismal evolution, as have bacterial drug-resistance experiments. These explanations need careful reexamination. Similarly, we should not apply the stepwise evolution pattern within the microevolutionary phase that is typically observed in sexual eukaryotes to somatic cell evolution, where macroevolution dominates. Interestingly, however, the eukaryotic somatic cell evolutionary pattern is very similar to the natural evolution of sexual organisms, despite obvious time scale differences. Both display cycles of long-term microevolution and short-term macroevolution. As there is no sexual filter involved in somatic cell evolution, macroevolution occurs much more frequently compared with organismal evolution in eukaryotes.
Significantly, the cellular evolution model has provided interesting information regarding phase transitions between micro- and macroevolution. When the overall system is extremely unstable, the microevolutionary phase will be replaced by the macroevolutionary phase. Note that it is not the accumulation of microevolution but rather overall system instability that triggers such changes. In fact, lessons learned from cancer evolution served as the key hints for the development of the genome theory.
7. The genome or karyotype (not specific genes) defines a species. The genome package also determines the boundary of epigenetic changes and potential responses to environmental stress. This issue has been extensively discussed in Chapters 4 and 6. For species with sexual reproduction, the core molecular karyotype (with high resolution at the subchromosome level) can be used to define and distinguish different species. For asexual species, the similarity of the genome can be used instead. The rationale of using the molecular karyotype to distinguish species is based on the same principle of judging species using genome similarity. For species without a strong genome constraint, the ecological concept of a species might be useful. A quantitative model to measure the similarity of genomes (both their gene content and their genomic topologies) is needed for this purpose, which can be further integrated with other data based on alternative species concepts. Using the karyotype to distinguish species has its practical value. To generate artificial species in the future, monitoring the karyotypic changes and matching potential mating partners will be essential. Another often ignored meaning of “genome defines species” is that the genome also defines the boundary for many different genetic/epigenetic landscapes and species-specific features, as well as the overall environment response, as many features or phenotypes are fundamentally influenced by both species-specific fuzzy inheritance and programmed environment response. For example, the maximal and minimal features and phenotypic plasticity include height, weight, speed, life span, environment tolerance, and response. The different genomic and epigenomic landscapes must also be under the constraint of the genome, given that the genome is the evolutionary selection unit and the organizer of the network. Thus, the viewpoint that epigenetics represents a level above the genome seems incorrect. The genome context includes epigenetic mechanisms, as diverse genomes display their own epigenetic patterns and dynamics.
For example, many biological responses such as DNA repair (Weis et al., 2008), retroviral interaction (flies and others), toxicity response (between humans and mice), and epigenetic signatures (among species) are different among species. Similarly, most regulatory mechanisms such as alternative splicing, posttranslational modifications, protein degradation, and moonlighting, as well as stochastic regulation among available parts, all belong to the potential functions of a given genome. Although environmental interactions can generate different results, these all fall within the range coded by the fuzzy inheritance. In addition, the genome defines the initial conditions for development and the phenotype’s final potential. In other words, genomics outlines the phenotypic potential, and environments make it a reality. The multiple layers of the genomic/epigenomic interaction have been discussed as follows:


The relationship among gene mutations, epigenetic changes and genome changes can be illustrated by the multiple level landscape model, where local landscape represents gene/epigene status and global landscape represents the status of genome replacement. Fundamentally, the impact of lower levels of alteration (which modify the system) needs to reach higher-level change in order to create new systems. In addition, there is no accumulative relationship between gene mutation and genome alterations. Since the genome represents the highest level of genetic organization, its alteration often has a much larger effect than individual gene alterations. Chromosomal changes can impact on hundreds or thousands of individual genes. Furthermore, rearrangement of the genome can change overall genomic information patterns without generating aberrations in specific cancer genes. Heng, 2015 (with permission from World Scientific)
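Point 7 above calls for a quantitative model of genome similarity that covers both gene content and genomic topology. The chapter does not specify such a model, so the following is only a toy sketch under stated assumptions: a genome is represented as a list of chromosomes, each an ordered list of gene names; content similarity is the Jaccard index of the gene sets; topology similarity is the Jaccard index of neighboring gene pairs (adjacencies); and the two are mixed with a hypothetical `topology_weight`. All function names, gene names, and the weight are illustrative only.

```python
def gene_set(genome):
    """All genes present in the genome, ignoring their order."""
    return {gene for chromosome in genome for gene in chromosome}


def adjacencies(genome):
    """Unordered pairs of genes that sit next to each other on a chromosome."""
    return {
        frozenset(pair)
        for chromosome in genome
        for pair in zip(chromosome, chromosome[1:])
    }


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def genome_similarity(g1, g2, topology_weight=0.5):
    """Weighted mix of gene-content and gene-order similarity.

    topology_weight is a hypothetical tuning parameter in [0, 1].
    """
    content = jaccard(gene_set(g1), gene_set(g2))
    topology = jaccard(adjacencies(g1), adjacencies(g2))
    return (1.0 - topology_weight) * content + topology_weight * topology


# Two toy karyotypes with identical genes but a reciprocal exchange
# of the terminal genes between the two chromosomes.
ancestor = [["a", "b", "c"], ["d", "e", "f"]]
translocated = [["a", "b", "f"], ["d", "e", "c"]]

content_only = genome_similarity(ancestor, translocated, topology_weight=0.0)
combined = genome_similarity(ancestor, translocated)
print(f"content only: {content_only:.3f}, content + topology: {combined:.3f}")
```

In this toy case the two genomes are indistinguishable by gene content alone, yet the combined score drops once gene order is taken into account. That is the distinction the text draws between gene-level and genome-level information. A realistic measure would need far richer data, such as orientation, copy number, and three-dimensional organization.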

8. Fuzzy inheritance: how evolutionary selection alters genomic mechanisms. One of the biggest surprises from current large-scale genomic profiles of both patients and normal individuals is that the multiple levels of genomic heterogeneity are overwhelming and that the biosystem is resilient enough to tolerate high levels of genomic variation. These observations have forcefully challenged the viewpoint of genomic precision. In fact, as described in Chapters 3 and 4, a high level of altered chromosomes has long been observed (Heng et al., 1987b, 2013b; Heng and Chen, 1985). Influenced by the precise nature of genetics, however, few have appreciated these discoveries. While watching cancer evolution in action in experiments, there was no way to continue ignoring the presence of these seemingly random but persistent altered chromosomes, especially when they clearly drive cancer evolution (Heng et al., 2006a-c). Soon, the fuzziness of genotype between mother and daughter cells was confirmed at the gene and epigene levels, and the new types of inheritance were observed in nearly all biological processes, from earlier development stages through the aging process and everything in between. Finally, following a reexamination of Mendel’s original publication, as well as other publications from the same era, it was realized that genetic data should be (and would have been) fuzzy in the first place had Mendel not used highly artificial experimental conditions to collect his data. Moreover, even a single gene codes for a spectrum of fuzzy inheritance. Under extremely artificial selected systems, however, the majority of this fuzzy spectrum can be eliminated, resulting in Mendel’s classical pattern of inheritance (which is clearly not the case in nature). And so, through synthesis based on the above information, we proposed the new concept of fuzzy inheritance.
Introducing the concept is a timely effort to promote the genome-based genomic theory built on solid genomic facts rather than wishful thinking or cherry-picked observations.


Further research has suggested that fuzzy inheritance serves as the mechanism of genomic heterogeneity and has been shaped by evolution. With the attitude toward fuzzy inheritance changing, it becomes obvious that fuzzy inheritance is dominant in all asexual organisms, with high levels of genomic alterations (gene content changes, genomic topological changes, and gene transfers across individuals within and outside the same species). In most sexually reproducing eukaryotic species, even though their core genomes are stable because of the function of sexual reproduction, high levels of fuzziness are achieved by the separation of the germline and somatic cells, which pushes somatic inheritance to high levels. Recently, multiple types of genomic fuzziness have been observed at the somatic cell level. In particular, the highest level is often observed in cancer evolution. The linkage of increased fuzzy inheritance with somatic evolution suggests that these variations are not just generated by mistake, but rather promoted or guaranteed by a genomic mechanism (fuzzy inheritance), as certain levels of variation are programmed for cellular adaptation. To increase its capability of adaptation, a somatic cell can be made highly plastic through the utilization of fuzzy inheritance, as long as the germline displays minimal changes, and the reproductive process further eliminates the vast majority of genome-level changes to retain the identity of the species. The conclusion is that a high level of fuzzy inheritance is favored by evolution. That is why so many mechanisms that generate fuzzy inheritance have been maintained not for precise outcomes but for the generation of heterogeneity (see Chapter 4). Such packages have a better chance of surviving, even though they often do not match their environments exactly. Of course, such a heterogeneity-favoring strategy has its own trade-offs.
While heterogeneity is useful for adaptation and survival under stress, it also contributes to various diseases. This seems to go against the wish that evolution should ultimately wipe out cancer and other diseases. In fact, it is quite the opposite: one by-product of the evolution-favored package, with its high level of heterogeneity, is that many individuals could display variable phenotypes, and some of them will unfortunately become sick under certain conditions (Heng, 2015).
9. Genome’s response to stress: the key platform for adaptation and new system emergence. Barbara McClintock insightfully considered the genome as a highly sensitive organ of the cell responsible for monitoring genomic activity and environmental response (McClintock, 1984).


In contrast, current research on cellular response to environmental stress is focused on different pathways and diverse cellular organelles and molecular complexes including the mitochondria, endoplasmic reticulum, and cellular machineries for transcriptional regulation, protein modification, checkpoints, DNA repair, and protein degradation. Although interesting, studying the individual isolated molecular mechanisms does not make sense when applied to a clinical setting. When monitored under a holistic lens, their reactions are often stochastic (difficult to predict among many agents) and frequently conflict with each other. It is thus essential to focus on the genome’s behavior through the evolutionary process. The following three-stage model is proposed to illustrate how a system responds to stress and its potential relationship with system adaptation (microevolution) and new system emergence (macroevolution). Depending on the level of stress, system recovery can be achieved by (1) expending energy without any genetic/epigenetic alterations, (2) causing genes to change their status or epigenetic status, leading to gene mutation or epigenetic deregulation, and finally (3) causing the genome to change, producing new systems. When a novel genome emerges, the evolutionary competition ascends to a new level (for more detail see Fig. 7.2). Of course, the quantitative feature of cellular populations is of importance (Fig. 7.1).

FIGURE 7.2 The relationships between stress, genetic/epigenetic change, and patterns of evolution. This three-stage model illustrates how system response to stress results in system adaptation and new system emergence. The stress level is indicated by color: blue indicates low, green indicates high, and red indicates extremely high. Genome A represents a genomic system without stress-induced gene/epigene and genome changes; genome A′ represents a system with stress-induced gene/epigene level changes, and genomes B-D represent different stress-induced emergent systems with altered genomes. The change from genome A to B-D covers three stages: regulatory alteration, gene/epigene alteration (microevolution), and genome replacement (macroevolution).
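The three-stage model can be restated as a simple lookup from stress level to the dominant type of response. The sketch below is a deliberate simplification: it assumes stress can be placed on an arbitrary 0 to 1 scale, and the two cutoffs are hypothetical placeholders, not values from the text. It also treats the response as deterministic, whereas the chapter stresses that real responses are stochastic.

```python
from enum import Enum


class Response(Enum):
    """The three stages of system response named in the model."""
    REGULATORY = "energy expenditure only, no genetic/epigenetic alteration"
    GENE_EPIGENE = "gene/epigene alteration (microevolution)"
    GENOME = "genome replacement (macroevolution)"


# Hypothetical cutoffs on an arbitrary 0-1 stress scale (illustration only).
LOW_TO_HIGH = 0.3
HIGH_TO_EXTREME = 0.7


def classify_response(stress, low_to_high=LOW_TO_HIGH,
                      high_to_extreme=HIGH_TO_EXTREME):
    """Map a stress level in [0, 1] to the dominant stage of response."""
    if not 0.0 <= stress <= 1.0:
        raise ValueError("stress must lie between 0 and 1")
    if stress < low_to_high:
        return Response.REGULATORY
    if stress < high_to_extreme:
        return Response.GENE_EPIGENE
    return Response.GENOME


for level in (0.1, 0.5, 0.9):
    print(level, classify_response(level).value)
```

The point of the sketch is only the ordering: as stress rises, recovery moves from regulation, to gene/epigene change, to genome change. In practice the boundaries would be probabilistic and would depend on the system and the cell population.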


As mentioned previously, the multiple levels of genetic/epigenetic alterations are a result of stress-induced dynamics, which represents an effective adaptation method. However, the associated disadvantage is the risk of creating disease conditions. It is likely that during the aging or tissue repair process, the lost functions of tissues can be partially compensated by increased genome diversity (such as increasing the degree of aneuploidy), which in turn might accelerate cancer evolution. Clearly, more effort should be devoted to studying the balance between genetic variation-mediated adaptation and its potential disease risks (Heng et al., 2013a,b; Heng, 2017a,b). The above analyses are mainly based on somatic evolution. A similar model can be applied to organismal evolution. In addition, many molecular mechanisms can be employed by the genome to gain adaptations. Similarly, there are different mechanisms to achieve macroevolution by forming new genome systems: the genome can be changed either gradually (similar to spontaneous chromosome changes) or drastically through genome chaos (see Chapter 6). Another important aspect of macroevolution is the transition of the outliers into the population average, which also involves the pattern of emergence within a heterogeneous population. Under cellular evolutionary conditions, nonclonal chromosome aberrations (NCCAs) become clonal chromosome aberrations; in organismal evolution, newly formed species become stable and lasting species. Of course, the multiple levels of genomic alteration-mediated evolution in cells are constantly under the constraint of higher systems, including tissues, organs, and the individual’s overall homeostasis. Most runs of somatic cell evolution will likely fail to become a disease phenotype. Therefore, despite detectable genetic variations in various normal tissues, there is no absolute relationship between genetic variation and diseases.
One thing that is definite, however, is that an elevated rate of genetic variation increases the potential for disease. In organismal evolution, for a given species, the core genome is rather stable over long periods of time. As for speciation, geographic constraints, survivability, and evolvability defined by genome-environment interactions play an important role.

10. Somatic genome alterations represent a major source of human disease. As mentioned in point #9, while genomic heterogeneity is essential for cellular adaptation, it can contribute to many diseases, such as cancer, as a trade-off. Furthermore, under environmental stress, the rate of bio-errors is usually elevated. Under a drastically changed lifestyle, more common and complex diseases will be diagnosed in the general population, especially when life span is significantly increasing. Although genomic and nongenomic heterogeneity can be detected at the gene/epigene and chromosomal levels (even in the mitochondrial genome), given the limited appreciation of chromosomal research in most common and complex diseases, it is necessary to emphasize the link between genome alterations and diseases. In fact, after many years of argument, the ultimate importance of chromosome alteration-mediated cancer evolution has started to emerge and become appreciated, especially after the cancer genome project delivered an unexpected finding that supports the importance of genome alterations. With the gradual acceptance of the genome theory, more research will be performed using chromosome-based platforms. The following prediction was made a few years ago:

It is crucial to realize that genetic heterogeneity and particularly karyotype heterogeneity is the driving force behind many diseases. ... In general, evolution is not dependent on a specific pathway (not only because of the plethora of pathways in the first place, but also because so many factors can alter pathway function during evolution, rendering it an unpredictable process), but rather depends on the presence of genetic heterogeneity, which is the rationale for focusing on heterogeneity rather than on specific pathways for most common and complex diseases. This approach is strongly supported by the results of the current cancer genome sequencing project which demonstrates that there is overwhelming genetic heterogeneity at the gene mutation level, copy number variation level and gross karyotype level in all the major types of cancer. In cancer, heterogeneity is the rule, and high penetration of any single specific genetic alteration is the exception. We predicted these results when the cancer genome project first began (Heng, 2007a). This prediction was based on an appreciation of the multiple levels of the genetic/epigenetic heterogeneity that exists in most cancers (Heng, 2007a). It is now time for the research community to take action and reevaluate the overall concepts and strategies applied to cancer and other common diseases. Heng et al., 2013b (with permission from Karger publisher)

More discussions can be found in Chapter 8.

11. No genome is an island: the genome only works in the life environment and the fate of genome alterations is dependent on environmental selection. Despite its importance, the genome represents only one of the many key components of life forms. The genome must be within the cellular organization to be functional; it requires energy and depends on cellular environmental response, homeostasis, growth and development, reproduction, adaptation, and evolution, even though it also contributes to nearly all of these processes. For example, not only is genomic function essential for these bioprocesses, but the integrity of the genome is also protected by them. Most importantly, the genome’s function is influenced by all of these components when the emergence of phenotype occurs, which makes studying the genome in isolation difficult, especially when trying to predict an exact phenotype under dynamic bioenvironments. Ultimately, whether a given genome can survive or not depends entirely on environmental selection at that moment, which reflects the fact that, under high stress, genome chaos can generate huge numbers of variants, but only a tiny portion of them will be selected. Such selection shakes up the pattern of genome shuffling. The selection of genome reshuffling contributes to the maintenance and evolution of modularity (Heng, 2007b; 2009), which is important for robustness and evolvability (Hintze and Adami, 2008). Newly formed genomes without essential modules (either old or new ones) will be eliminated. That is the reason many gene clusters such as Hox can be preserved among species, while new clusters are continuously forming and exceptions still exist (Clark et al. (Drosophila 12 Genomes Consortium), 2007; Putnam et al., 2007; Lemons and McGinnis, 2006). The capability of creating better modules can increase system complexity (see Chapter 4). Now, how do we integrate these different components? That is one of our biggest challenges. Given the high degree of complexity, where should we start? The gene, the protein, the genome, or a larger environment? What type of approach should we pursue (bottom-up or top-down)? How much data do we need (“more is better” or “less is more”)? So far, there is no common conceptual framework for such integration. Yes, much molecular knowledge has been accumulated on chromosome, gene, and epigene function, as well as on proteomics, metabolomics, lipidomics, and glycomics. But there is no accepted approach for synthesizing all this information. Researchers all claim their own disciplines are important, and the reductionist approach is now at its most extreme and popular stage. The genome theory, which aims to bring about a holistic understanding of genomics, will certainly promote the needed integration. One possible approach is to monitor the system behavior (stable or unstable) and predict the patterns belonging to the average or the outliers of a given population. Despite the complicated interactive relationships among different bio-components, the picture will become much simpler through the lens of evolution: regardless of how complex the pathways are, or how dynamic the genomic landscape is, evolution only works on the variants and selects the winner from among them.


12. Collaborative and conflicting relationships between genes/epigenes and the genome: the evolutionary selector is the judge. The obvious collaborative and yet conflicting relationship between the gene and genome has been illustrated by the function of sex (the genome reduces variation while the gene increases it) and the model of speciation (the genome creates new systems while the gene tinkers with them). In the past, influenced by the selfish gene concept, different layers of vehicles, including the genome, were considered simply as the tools of the powerful genes. Now, with the genome theory, it is clear that the genome is the genomic information package and organizer of genes. But how does the genome overpower these selfish genes? It all results from evolutionary selection. Perhaps in the initial life forms, when the number of genes was much lower, dominant selfish genes functioned as the selection units. Once the number of genes increased, each individual gene’s power was reduced, and the genomic topology’s power increased. As soon as the first eukaryotic genome was formed, the game changed, as the selection unit became the genome rather than the individual gene. Such genome-based selection speeds up evolution above the species level and separates macro- and microevolution in animals and plants. While evolution in nature has reduced the importance of individual genes, gene researchers, in contrast, have artificially increased the importance of the gene in the laboratory setting. By using artificial selection systems, researchers have selected the winner (research priority) based on the status of the gene. For example, the growth advantage has been linked to a so-called cancer gene (thus demonstrating the importance of said cancer gene). In the real world, however, where selection forces are constantly changing, the entire genome package is under selection, including but not limited to the advantage of growth. The successes of a few given genes with growth advantages are simply not sufficient to lead to clinical cancer (although this case may be demonstrable in the laboratory). Interestingly, it is also evolutionary selection that possibly promotes epigenomic complexity. Because of the genome constraint, the boundary of the genome is limited. The epigenetic function can be further increased, especially in cellular species which already display cell type complexity. To tinker with these complex systems, epigenomic complexity becomes very important and thus will be selected. The initial means to manage the cell type complexity, however, is defined by the genome.


That being said, it is still possible that both genes and epigenes could create a new system. There might be a quantitative threshold of instability necessary for systems to depart from previous systems, regardless of where such dynamics arise. Nevertheless, even if such events occur, they must arise at low frequencies. Perhaps a key lesson from examining gene/genome interactions within an evolutionary context is understanding multiple relationships across the entire scale of evolution (see Fig. 7.3). Similar to the relationship between the genome and gene, relationships like those between the individual and organ or the society and individual will likely display conflict and be resolved by the selection force. Often, the overarching higher level of the system is more influential (for more discussion, see Section 7.5.2). This offers the possibility that new science-/technology-/society-based artificial selection can overpower some biological principles/functions, which allows science to push human evolution further by breaking certain biological constraints coded by our genome. In a sense, this would be artificial evolution beyond genomics.

FIGURE 7.3 The multilayer sandwich model of selection and self-organization. Selection types and self-organization mechanisms have been classified into different types. Certain selection types and self-organization mechanisms correspond to each given level of complexity. Modified from Heng 2009 with permission from John Wiley & Sons.


7.5 THE PREDICTIONS, IMPLICATIONS, LIMITATIONS, AND FALSIFIABILITY OF THE GENOME THEORY

Although the genome theory can explain many phenomena and paradoxes better than the current gene theory, it still raises questions about its own credentials and limitations, like any other new theory. First and foremost, can this new theory effectively advance current genomics and evolutionary research? To help readers reach their own conclusions, a few examples will be briefly mentioned in the following sections.

7.5.1 Predictions

1. With the introduction of the genome theory, important evolutionary questions will need to be reexamined, including the main function of sexual reproduction, the definition of a species, the common mechanism of speciation, the relationship between micro- and macroevolution (evolutionary tinkering vs. system creating), the limitations of microevolution, the role of sexual selection in evolution, the relationship between gene and genome trees, and the relationship between artificial and natural selection.

2. The focus of genomic research will divert from the highly diverse gene/epigene level to the genome level. New integrated frameworks/approaches will include the concepts of emergence and evolutionary selection, and more research will be done to further characterize system inheritance, fuzzy inheritance, and their relationships with parts inheritance and epigenetic mechanisms. Furthermore, additional studies will be encouraged to understand the relationship between evolutionary theory and genomic theory.

3. Molecular biologists will reexamine their reductionist tradition of creating linear models to test specific hypotheses under isolated conditions. Currently, drastic reductions of genomic and environmental heterogeneity, including within extensively used animal models, make it easier to “prove” some hypotheses, but much harder to apply these conclusions in clinical settings where heterogeneity is a key feature. Systems biologists need to incorporate the idea that the genome defines the boundary of the system and to establish models with a strong evolutionary basis. Novel statistical methods are also needed to deal with the importance of heterogeneity.

4. In disease research, increased attention will be paid to genome variations. New biomarkers will be developed around genome heterogeneity and the integration of genome/gene/epigene profiles.


5. Many new and exciting conclusions will be reached or confirmed. New species formation, for example, will be recognized as a frequent phenomenon, even though most new species do not have the chance to become lasting species; genome alteration-mediated speciation represents the majority of speciation cases; artificial animal species can be generated through genome manipulation and selective breeding; altering the genomic topology can change the relationship among gene interactions and can even change the main function of some individual genes; and many common and complex diseases can be linked to genome variations.

6. The genome-based genomic theory will explain many paradoxes in the field. Currently, the gene-centric theory states that (1) common diseases are caused by common factors (Heng, 2009; 2010; McClellan and King, 2010) and any meaningful genetic or environmental factor should be shared by patients with the same disease; (2) gene mutations, which should be readily identifiable, are the key players that cause human disease; and (3) heterogeneity is “noise” that can be eliminated by using large numbers of samples from diverse populations: increasing the sample size is the key to identifying common genetic patterns (Heng, 2007, 2008a, 2013c; Davidoff 2009). Based on these general beliefs, the following paradoxes exist, which can be better explained by the genome theory:

Paradox 1: Major common genetic factors are rarely found in common and complex diseases with obvious heritable patterns. Obesity displays heritable patterns but genetic association studies can only account for a small percentage of inherited variation in body mass index (Bochukova et al., 2010). Similar difficulties have been encountered in autism, diabetes, heart disease, and especially in cancer (Estivill and Armengol, 2007; Heng, 2007a, 2010, 2015, Chapters 3 and 8). Even when found, these hard-to-identify genetic loci are of low clinical value. A genetic risk score using over 100 single nucleotide polymorphisms identified as “significantly contributing to cardiovascular diseases” ended up clinically insignificant (Paynter et al., 2010). The term “dark matter” has even been used to describe the puzzling, missing heritability (Manolio et al., 2009). As its influence can be observed, this heritability surely exists, even if it has not yet been identified. But what is it? And where is it?

Paradox 2: Increasing the sample size diminishes common patterns. In an attempt to explain Paradox 1, it was suggested that many diseases are caused by multiple genes where each gene has a minimal contribution. To pinpoint these arrays of contributing mutations, a large sample size would be needed to “wash out” the “noise,” which would allow common mutations to “stand out.” The larger the sample size, the more diverse the population and the better the power to identify true genetic patterns.


Unexpectedly, however, large-scale genome association studies have been more or less disappointing. Surprisingly, when the sample size is smaller and less diverse, some interesting regions can be identified with significant scores. That significance is often eliminated by validation in larger and more diverse samples. Studies have demonstrated that increased sample size is associated with an increase in detectable variants (Zeggini et al., 2008; Kathiresan et al., 2009).

Paradox 3: Uncommon causes contribute to common diseases. Various cutting-edge genomic technologies have rendered many previously simple genetic cases increasingly complex. For example, seemingly unlimited genetic and epigenetic factors have been linked to diseases of interest, identified at a variety of levels of analysis from nucleotide polymorphisms to noncoding RNA and NCCAs (Chapter 4). The deeper one digs, the more different causative mechanisms will be found. This situation has been illustrated in the common evolutionary mechanism of cancer (Chapter 3).

Paradox 4: While a defined factor can often be linked to diverse diseases, it is hard to establish the definitive link in a majority of cases. Researchers often focus on their own areas of expertise. Mitochondrial dysfunction can be linked to many common diseases. Obesity is associated with heart disease, type 2 diabetes, osteoarthritis, and cancer. The same copy number variant may be associated with different mental illnesses (Insel and Wang, 2010). In each case, a disease is convincingly linked to a specific mechanism, except when the cases are compared en masse. Patients usually do not share the same molecular basis, indicating that all these diverse molecular causes contribute to the same disease and that each mechanism only applies to a limited number of cases. Furthermore, each specific link is statistically meaningful but not clinically useful. This paradox is closely related to Paradox 3.

Paradox 5: Common diseases can have uncommon phenotypes. Evidence continually emerges linking certain diseases with even more diverse phenotypes, to the point where the definition of the disease is challenged. One example is Gulf War illness (GWI) (Wessely and Freedman, 2006; Heng, 2013a,b; Liu et al., 2018). For years, the medical community denied the existence of GWI because both causative factors and disease symptoms were highly diverse. The only clear link among patients was experience in the Gulf War. According to conventional medical wisdom, a specific disease should display similar symptoms. However, one could not discount the high proportion of Gulf War veterans who were plagued by an array of symptoms.
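
The dilution described in Paradox 2 can be illustrated with a toy calculation. The sketch below is not taken from any of the cited studies; it simply assumes, for illustration, that a pooled cohort is made of equally sized subgroups, that each subgroup has its own causal variant, and that every variant is carried at the same frequency in all subgroups. The function name and all parameter values (`base_risk`, `carrier_risk`, `carrier_freq`) are hypothetical. Under these assumptions, the disease-risk difference between carriers and noncarriers of any one variant shrinks in proportion to the number of subgroups pooled.

```python
def pooled_risk_difference(n_subgroups, base_risk=0.05,
                           carrier_risk=0.20, carrier_freq=0.10):
    """Risk difference (carriers minus noncarriers) for ONE variant
    in a cohort pooled from n_subgroups equally sized subgroups.

    Toy assumptions (hypothetical, for illustration only):
    - the variant is causal in exactly one subgroup, where carriers
      have risk `carrier_risk` and noncarriers have risk `base_risk`;
    - every other subgroup has its own, different causal variant,
      carried independently at frequency `carrier_freq`.
    """
    # Average disease risk in a subgroup where our variant is NOT causal:
    # risk there depends only on that subgroup's own causal variant.
    background = carrier_freq * carrier_risk + (1 - carrier_freq) * base_risk
    others = (n_subgroups - 1) * background

    # Pooled risk for carriers and noncarriers of our variant.
    risk_carriers = (carrier_risk + others) / n_subgroups
    risk_noncarriers = (base_risk + others) / n_subgroups
    return risk_carriers - risk_noncarriers


# The signal for any single variant weakens as more heterogeneous
# subgroups are pooled into the cohort.
for k in (1, 2, 5, 20):
    print(k, pooled_risk_difference(k))
```

In this simplified setting the risk difference equals (carrier_risk - base_risk) divided by the number of subgroups, so a signal that is clear in one homogeneous group fades when diverse groups are combined, even though the variant remains fully causal within its own subgroup. Real association studies involve many further factors that this sketch ignores.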

7.5.2 Implications

The genome theory has many implications. This section will focus on how evolution and self-organization work among different levels of systems and their selections.


The relationship between self-organization and selection in the genome theory can be extended to other levels in nature. This is illustrated in the multilayer sandwich model, proposed in 2009 (Fig. 7.3). Self-organization refers to a process in which a higher-level pattern emerges spontaneously from the assembly of lower level components of the system (Camazine, 2003). Diverse biological phenomena have been described as self-organizing (from the spontaneous folding of biomacromolecules to morphogenesis to the formation of ecosystems). Establishing the logical relationship between natural selection and self-organization presents a challenge for evolutionary theory (Kauffman and Macready, 1995; Sole et al., 1999; Hoelzer et al., 2006). The genome-centric concept nicely incorporates these two concepts. Since self-organization is a diverse term spanning the fields of physics to social science, it is necessary to divide content of nature into distinct levels in order to understand multiple levels of interactive relationships between self-organization and selection. Heng, 2009 (with permission from John Wiley & Sons).

Four representative levels are illustrated in this multilayer sandwich model. The key takeaway is this: each level of complexity is governed by its own laws of self-organization and selection. For example, “chemical reactions occur at the molecular level (not at a more basic level such as the atomic level). Below the cellular level, there are no independent life forms. In social insects, natural selection focuses on the level above the individual, and in its most extreme form, individuals are dispensable ... the ‘laws’ are different at each level of nature and can only be fully applied at that level. Using laws that apply to one level to evaluate issues related to other levels of nature and society creates confusion.” (Heng, 2009). Additionally, “[s]elf-organization and natural selection are two very powerful forces that function at each level of the system and are responsible for pattern formation and evolutionary dynamics (Kauffman and Macready 1995; Hoelzer et al., 2006).” (Heng, 2009). Detailed explanations of the multilayer sandwich model (Fig. 7.3) are summarized in Table 7.3. One of the most important applications of the genome theory is its ability to integrate evolutionary theory. Since the fundamental assumptions and major predictions of current evolutionary theory are problematic in terms of explaining the mechanism of evolution and reconciling genomic and biological facts, we cannot simply use the same framework to solve many major paradoxes. On the surface, it is reasonable to reconcile natural selection as part of the new two-phase model we proposed. Knowing that natural selection is limited not only in the macroevolution phase but also in the microevolution phase, more debate is needed to reexamine evolutionary mechanisms in light of system inheritance and fuzzy inheritance.


TABLE 7.3 Key Features of the Multilayer Sandwich Model of Selection and Self-Organization

1. For a given level, self-organization and selection represent two different categories, although both are essential to evolution. Self-organization is not an alternative to evolutionary selection; it is, rather, the mechanism that generates various packages of a system on which evolutionary selection can act. In a sense, self-organization is responsible for creating or assembling packages; evolutionary selection chooses from these packages.

2. Types of selection are distinctively different among levels. Darwinian natural selection differs significantly from other types of selection (such as genome-mediated macroevolution) because of inheritance and heterogeneity. It is useful to separate simpler (prokaryotes) and more complicated (multicellular) forms of life because the pattern and platforms of natural selection differ.

3. Self-organization mechanisms vary among different levels (from the formation of snowflakes to the more complicated development of the human body). The mechanism of snowflake formation is not the same as that of human development. Chemical laws at work at the molecular level apply only in a limited way to the cellular level because of the unique characteristics of biological heterogeneity and inheritance. Social phenomena can only be partially explained by biological concepts because the rules at these two levels are very different.

4. Different levels may display dissimilar patterns of evolutionary dynamics. Under biological selection, the evolutionary pattern of a stable genome (sexual eukaryotes) differs drastically from that of an unstable genome (prokaryotes and nonsexual eukaryotes).

5. There is continuous interaction between self-organization and natural selection at multiple levels. The role of self-organization is greater than that of natural selection in nonliving systems. In a broader sense, nonbiological selection might involve the selection of a certain level of complexity. In living systems, bio-selection selects inherited packages in heterogeneous populations, and self-organization then fulfills the potential of each selected system.

6. Across all levels, increased heterogeneity may promote increased complexity and increase opportunities for historical “accidents.” From nonliving forms to human society, increased heterogeneity can be observed. The model also seems to suggest a trend of increasing complexity similar to the “minimal wall of complexity” (Gould, 1999). There are multiple “walls” for each level or layer; each wall functions as a one-way filter which allows the system to evolve to the next level of selection but prevents reduction of complexity by regression to a lower level. The sexual filter and genome-mediated modularity represent such filters at the macroevolutionary level of complexity. Furthermore, historical “accidents,” similar to outliers, can be selected.

7. There are many levels in nature that follow this model, including those below the molecular level and above the social level, as well as different scales of ecosystems. Although most selection occurs at the same level among systems, there is also cross-level selection. Models must be built to accommodate them.

Modified from Heng 2009 with permission from John Wiley & Sons.


7.5.3 Limitations

So far, the biggest limitation, or challenge, to be more precise, of the genome theory is its acceptability. Most genomic researchers do not think of the genome in the context of genome topology-defined function. Although Hi-C technology has turned attention to 3D chromatin spatial organization, it still focuses on specific gene domains and their regulators. Most researchers, who were trained under a gene-centric school, work hard without considering a new framework (essentially, Hi-C simply adds some chromatin flavors to traditional gene-based research). Cytogenetics and cytogenomics, on the other hand, have been struggling for decades. Funding issues and the emerging dominance of molecular genetics are causing this important field to disappear. The field’s dominant researchers are retiring, and the subject is unfortunately not “fashionable” enough to attract new generations of talented brain power. Even the clinical cytogenetic community is constantly under threat of being replaced by sequencing or other molecular technologies (Liehr, 2017). Additionally, the hope to sequence every gene is now gaining momentum. In short, it is not the best time to introduce the genome theory, which means swimming against the tide. The only advantage of introducing the genome theory now is its timely relevance. The field, armed with vast amounts of collected data, is becoming increasingly confused, and the current gene theory has lost its capability to solve many key paradoxes (Chapters 1 and 2). Genomic science has not lived up to its promises. Thus, the new genome theory must be developed to challenge and to replace the gene theory. Most current limitations of the genome theory itself are related to its infancy. Nevertheless, the following issues require further efforts to illustrate, validate, and strengthen the theory:

First, although many key statements of the genome theory (which are formed from synthesizing the available data, concepts, and paradoxes) make sense, direct experiments are needed to systematically examine them.

Second, some gaps need to be filled with molecular details. For example, how does emergence work when diverse agents interact with each other? Is there any practical index that can be used to predict emergent properties in biological systems? What is the relationship between chromosome coding and protein synthesis machineries? Does the genomic topology require specificity of protein-specific machineries? Is the pattern of protein distribution within the cell also topology-specific?


Third, mathematical models have played an important role in the neo-Darwinian synthesis. The current genome-mediated macroevolution model, in contrast, lacks mathematical support. Integrating mathematics could definitely help strengthen its concepts. Of course, unavoidable fuzziness makes a mathematically elegant synthesis challenging. As discussed in previous chapters, many of these mathematical models need to be reexamined in the light of fuzzy inheritance.

Fourth, the genome theory needs to constantly redefine many previous concepts and terminologies, especially those which seem familiar to many researchers but are presented with new understanding. The term “genome” has been with us for over a century, carrying historical connotations (collection of genes) that ignore the true significance of the genome (system coder and gene organizer). Many researchers are not yet ready for change. In this case, one would wish to introduce new terminology to describe the new genome concept. For example, the concept of genome chaos failed to generate deserved attention when it was introduced in 2006 (Heng et al., 2006a-c), mainly because most people thought they knew what the genome is. In contrast, many years later, the same phenomenon of massive and rapid genome reorganization was confirmed by genome sequencing, and new names were introduced, such as chromothripsis and chromoplexy. All of a sudden, people became very interested in this “new” phenomenon, and a plethora of different names have been introduced for the same “new” phenomenon (Chapters 3 and 4). This suggests that much effort is needed to explain and clarify some key terms in the genome theory, especially when they incorporate previously “well-known” concepts but redefine their meanings based on the new genome perspective.

7.5.4 Falsifications

We see increasing evidence supporting the genome theory by validating its predictions and expect future data to accumulate and further confirm it. One important remaining issue is its falsification. There are a few key predictions from the genome theory which, if proved to be wrong (judged by potential observational or experimental evidence), will falsify the genome theory. The key considerations are the following: if genomic topology or chromosomal coding is not important for biological function and the genome is just a vehicle of all genes and other DNA sequences after all; if the majority of speciation (in animals and plants) is initiated and achieved by small accumulations of microevolution over time; or if fuzzy inheritance is not true for the vast majority of cases, or elevated nonclonal genome variants just represent noise without biological significance, then the genome theory will be falsified.


There are two important points which need further clarification: First, many statements or predictions of the genome theory concern the majority of cases. This is ultimately important to consider when falsifying or confirming any biological theories. Unlike physics theories, which can be destroyed with one exceptional case, many good bio-theories have exceptions. Biology is a science full of exceptions. Most of its conclusions are not black and white. Contributing factors include (1) the nature of biology as a study of living things, as “[o]n the average, however, organic systems are more complex by several orders of magnitude than those of inanimate objects” (Mayr, 1988), and more is different; (2) the multiple levels of genomic/epigenomic heterogeneity encoded by fuzzy inheritance; (3) the involvement of the historical accidents and evolutionary-selected initial conditions; and (4) emergence based on large numbers of diverse agents, including the dynamic environments themselves. As a result, the emergent properties of biosystems could be heterogeneous as well. Furthermore, repeating the exact same process can even produce opposite results, and both could be correct! Thus, falsification in biology needs to focus on the majority of cases within the context of new statistical analysis. This will highlight the importance of outliers in crisis situations. In fact, even in physics, “scientists now recognize that most physical laws are not universal but are rather statistical in nature, and that prediction therefore can only be probabilistic in most cases” (Mayr, 1988). Nevertheless, short-term falsification needs to focus on the new cancer model (Fig. 3.11) and the organismal speciation model (Fig. 6.2). If these models provide better explanations and predictions compared to gene theory of evolution, they should serve as new frameworks for cancer research. 
Current strategies are based solely on understanding additional individual molecular mechanisms that are too numerous to characterize and too trivial to be applied clinically. Second, the approach to validating a scientific theory is still a topic of debate (Ellis and Silk, 2014). There are two typical yet different approaches. One favors the “positive” validation of the theory under examination, which focuses on the gradual accumulation of supporting evidence; the more confirmations, the stronger the theory. The other favors “negative” falsification, which assumes that correctness increases only as a result of the failure to disprove the theory in question. To validate the genome theory, both approaches will be used. As mentioned in the first point, by combining these two approaches, quantitative data can be obtained to systematically validate the genome theory. It is possible, for example, that the genome theory will cover over 90%–95% of cases and that the remaining cases will need a combination of theories.


Furthermore, the two different approaches might fit different research stages. The gradual accumulation of information better suits a theory that has already gone through the falsification process and whose key framework is already established (within the stable stage of normal science). In contrast, active falsification better suits the purpose of validation before the establishment of a new paradigm, and then again when the old paradigm is no longer relevant (within the crisis stage). Clearly, the stages of theory formation prioritize different validation methods.
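As an illustration only, the "majority of cases" criterion discussed above could be made quantitative along the following lines. This is a minimal sketch, not a method given in the text: the function names, the 0.5 threshold, and the choice of the Wilson score interval are all assumptions, and any counts passed in would have to come from real case-by-case evaluations of a prediction.

```python
import math
from typing import Tuple


def wilson_interval(consistent: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the fraction of cases consistent with a prediction.

    `consistent` and `total` are hypothetical counts; z = 1.96 gives an
    approximate 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= consistent <= total:
        raise ValueError("consistent must lie between 0 and total")
    p_hat = consistent / total
    denom = 1 + z ** 2 / total
    centre = (p_hat + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


def majority_claim_status(consistent: int, total: int, threshold: float = 0.5) -> str:
    """Classify a 'holds in the majority of cases' claim (assumed threshold: 0.5)."""
    low, high = wilson_interval(consistent, total)
    if low > threshold:
        return "supported"      # the whole interval lies above the threshold
    if high < threshold:
        return "contradicted"   # the whole interval lies below the threshold
    return "undecided"          # the interval straddles the threshold


if __name__ == "__main__":
    # Hypothetical example: 90 of 100 evaluated cases agree with the prediction.
    print(wilson_interval(90, 100), majority_claim_status(90, 100))
```

Framed this way, a single exception does not falsify a statement about the majority of cases, while a sufficiently large body of contrary cases does.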

7.6 CHALLENGES AHEAD

So far, many limitations of the gene theory, particularly concerning its 1D genetics perspective, have been discussed. The genome theory has been proposed to advance the field. It should be pointed out that genome-related theories, including Richard Grantham’s “genome hypothesis” and Bernardi’s neoselectionist theory of genome evolution (Grantham et al., 1980; Bernardi, 2007), have been suggested in the past. However, most of these previous theories did not address the key issue of overall genomic topology, as it was not known that system coding mainly comes from the topological relationships between genes and other sequences within the 3D nuclei. The importance of the genome in system constraint, as well as the ultimate importance of genome alterations in somatic cell evolution, was also not known. Therefore, these previous theories focused on subgenomic or DNA features like isochore patterns or types of DNA coding strategy rather than the pattern of genome reorganization. Many of these previously proposed ideas are still within the framework of traditional molecular genetics, which searches for answers by focusing on the parts. In addition, there has been no systematic effort to consider the genome theory by synthesizing evolutionary concepts.

Any new theory always faces many challenges, especially during the initial stages, when there are not only insufficient data to answer many questions but also audiences who are unfamiliar with the new concept and often approach it with a hostile attitude.

First, the greatest challenge is to recognize that the current gene theory is limited in its realistic applications to future genomics and that just accumulating greater quantities of data will not solve the problem. It is not about giving up or being negative, nor is it a debate between half-full and half-empty. It is about searching for the correct direction if the current one is no longer working.
The field has to accept the fact that decoding the genome is fundamentally different from sequencing DNA and the genomic landscape will not be revealed simply by sequencing more DNA samples. Essentially, the accumulation of data at the parts


level will not guarantee that a new and correct theory will emerge, particularly when there is such a large knowledge gap. Second, we need a new framework of thinking. We cannot just replace a molecule with a new favorite within the same 1D-gene framework. We are all too familiar with the waves of fashionable research under the umbrella of the same gene theory. According to current thinking, if it is not at the gene level, it must be epigenetics; if it is not within coding regions, it must be noncoding RNA; if a few genes are not enough, let us study hundreds of them; if monitoring a few genomic loci is not enough, let us sequence them all... Unfortunately, these obvious progressions on the same gene-based idea are not correct: what matters is choosing the correct level. For example, efforts have been made to illustrate the importance of noncoding RNA studies by linking it to known important genes. Without a new conceptual framework, it will soon run out of gas. If genes cannot offer an answer, then simply establishing associations with genes will not work either. The 4D genomics approach requires a holistic concept based on the genome theory and not a modification of the gene theory. Third, genomic research must integrate the evolutionary perspective. Evolution is a key component for the genome theory. There are a number of issues to consider. a. The time factor. To fully understand the effect of genomics in disease conditions, time cannot be ignored. Even in twins with nearly identical genetic profiles, there are often drastically different timelines for the appearance of diseases. In most genomics experiments, however, the time factor is nonexistent. New research needs to incorporate time issues during genetic analyses, and special attention is also needed to decide the proper time windows of observation and intervention. The time factor is also important to reexamine the relationship between micro- and macroevolution in natural organisms. b. 
To watch evolution in action in genomics studies, both individual cells and population behavior must be monitored simultaneously. In particular, the behavior of averages and that of outliers need to be closely compared. Recent studies suggest that the average profile is more relevant in normal physiological conditions, while outliers are more significant under extreme conditions, which are the conditions most conducive to disease (Abdallah et al., 2013; Heng, 2015). Similarly, these interesting phenomena observed in cellular evolution should be carefully considered when studying organismal evolution. c. To study genomic dynamics, one must be aware of the differences in genome alterations between different tissues at different times. Realizing that humans are somatic chimeric beings is extremely


important. One needs to pay attention to somatic cell evolution as the common basis for many common and complex diseases. Common system features such as stress-induced system instability and the evolutionary adaptive principle should be the same for most diseases, as most common diseases represent an evolutionary process within our bodies, in which the end result is the loss of normal tissue/organ/system function. The unique feature of cancer, tumor cell overgrowth, can easily be explained as the result of a loss of the tissue constraint function, similar to altered genomes in diabetes, where cells gradually lose the ability to produce enough insulin or to respond to it properly. Based on this concept, more research is needed to study the overall effect of genome instability in many other diseases (Horne et al., 2014; Heng et al., 2016a,b; Liu et al., 2018). d. One must appreciate the fundamental differences between normal physiological (developmental) conditions and pathological conditions. Such a realization will challenge the conventional strategies of extrapolating information from normal physiological conditions to pathological situations. The latter are often unpredictable because of the stochastic processes involved. Therefore, their overall predictability is not translatable from normal physiological conditions (Heng, 2015).

Fourth, the new technical platforms must monitor system behavior, predicting the patterns of evolution and linking the system dynamics to the outcome of the end products rather than individually illustrating any of a large number of diverse molecular mechanisms under linear experimental conditions. If each factor actually has a low penetration frequency within patient populations and is difficult to predict clinically, then efforts should not be placed on these potentially academically interesting but clinically ineffective mechanisms. The task of establishing a platform to study holistic systems is thus not easy to achieve.
Modern science is traditionally based largely on reductionist approaches; it has been very good at dissecting systems into parts to categorize them and to look for patterns among them. However, studying an entire system without breaking it down has been more difficult. The process of breaking systems down often alters the nature of the system, as emergent properties are features of the intact system just as much as its base components. The system is truly more than the sum of its parts. Unfortunately, many cutting-edge “-omics” or technical platforms have generated huge amounts of information without considering the genome topological information. Current efforts to use various Hi-C analyses to map and sequence interaction sites represent a very promising approach, but the analysis still focuses on interactions among parts, without considering the holistic system as a whole.


Fifth, more effort is needed to directly illustrate how the genome works and how genome alteration occurs, as well as the relationship between the gross genome level (cytogenetic level) and other genetic and epigenetic levels (molecular levels) (Heng, 2008, 2013c; Heng and Regan, 2017). The following issues need immediate attention: a. Both experimental approaches and modeling are needed to test the smashing genome model hypothesis that illustrates the relationship between genome reorganization and network structure (Heng 2009). The validation of this model will change people’s views regarding the importance of genomic topology. Increasing research has linked overall expression to specific karyotypes and more systematic analysis of this kind is essential. b. Understanding genome chaos in disease conditions is important. Genome chaos represents an extreme situation of genome replacement caused by genome instability. More research needs to study this issue, focusing in particular on how lower levels of genetic/epigenetic alteration or subsystems (like mitochondria function) and higher levels of constraint (like tissue homeostasis) contribute to genome instability. Such information is crucial when applying genomics to clinical situations. After all, the new goal in the fight against cancer may not be to just hit hard but rather to bear in mind the overall benefits to patients. The co-existence of certain disease conditions might be more beneficial to patients than some drastic treatments, which may harm the system over a longer term. In addition, this new way of thinking may also promote greater emphasis on prevention because, in reality, there is no magic bullet for patients to fall back on. The best strategy is to improve lifestyle choices to reduce conditions optimal to disease progression. c. Large-scale methodologies are needed to measure stochastic genetic and epigenetic alterations. 
These important yet long-ignored alterations are not “noise” but rather a feature of system dynamics. The idea of using nonclonal chromosome aberrations to study cancer evolutionary potential needs to be further tested. Recently, seemingly random genetic profiles have been linked to disease conditions, which supports the importance of using heterogeneity as an index to study disease progression and outcomes. d. Another important issue is identifying the primary level of system organization rather than focusing equally on different levels. This issue is becoming increasingly urgent, as a great deal of current research resources is invested in sequencing the cancer genome and profiling various molecular signatures. If these lower level signatures prove to be too diverse to be clinically useful, a rapid transition toward focusing on the cancer cell landscape located at the


karyotype level must occur. To be considered, lower level dynamics need to be influential enough to impact the macroevolution phase of cancer progression. In addition, other types of molecular events (like nonspecific stress) can often interfere with the tracing of specific gene functions, as they all can lead to genome chaos that promotes genome reorganization and macroevolution. Similarly, the evolutionary studies of natural organisms must use genomic approaches. When dealing with evolutionary changes above the species level, the focus should be on a genome tree rather than a gene tree, as genome-level reorganization differs from the issue of gene mutation rates, which have a direct impact on the population structure within a species. Last but not least, concepts and methodologies are needed to translate the overall degree of genome heterogeneity into a tissue or organ status because, after all, many diseases are diseases of tissues, organs, or the system. e. New cytogenetic methodologies need to be developed to achieve the goal of establishing systems-integrated cytogenetics (Heng and Regan, 2017; Heng et al., 2018). These methods include the following: i. Technologies to integrate multiple levels of genomic/epigenomic alterations when measuring system instability. Specifically, current -omics studies must include karyotype information in the context of disease progression or evolutionary processes. ii. Methodologies to measure the multiple types of heterogeneity, including many new types of chromosomal/nuclear aberrations (Heng et al., 2013b, 2016a,b; Ye et al., 2018b), many of which represent new mechanisms of generating fuzzy inheritance at the genome level (Heng, 2015).
There is a common link between karyotype complexity, disease condition, cellular adaptation, and speciation (Heng, 2010; Telerman and Amson, 2009; Van Echten-Arends et al., 2011; Nguyen et al., 2013; Pellestor et al., 2014; Zhang et al., 2014; Bloomfield and Duesberg, 2015; Andriani et al., 2016; Niederwieser et al., 2016). Valuable cytogenetic/cytogenomic research needs to be combined with, not replaced by, new sequencing technology (Heng et al., 2016; Bakker et al., 2016). Equally importantly, comparing karyotypes represents an effective way to illustrate organismal evolution, as the core karyotype guarantees the identity of a species (Ye et al., 2007; Heng, 2007b; Gorelick and Heng, 2011; Horne et al., 2013). In addition, Hi-C technology, which comprehensively detects chromatin interactions in the nucleus (Belton et al., 2012), needs to be integrated with karyotype information.


iii. Quantitative methods to monitor combined heterogeneity and complexity (Heng, 2017b; Heng et al., 2019). iv. Methods to measure fuzzy inheritance (from gene/epigene to genome). There is increased interest in studying how stress-induced heterogeneity leads to diseases through evolution (Horne et al., 2014, 2015; Stepanenko and Dmitrenko, 2015; Valind and Gisselsson, 2014). As gene/epigene and genome alterations mainly contribute to the different phases of cellular evolution, measuring fuzzy inheritance at different levels should play an important role in understanding the evolution of disease (Heng, 2014, 2015), as well as organismal evolution (McClintock, 1984).

Paradoxically, although the holistic approach is a new challenge, it might turn studying complexity into monitoring simplicity (Heng, 2013c). For example, if a system is treated as a black box where consideration is only given to the input and output rather than the details of the pathways within the box, then the extraneous effort of trying to understand countless details is avoided. On the surface, the lack of molecular details seems to represent a big disadvantage. However, if these detailed mechanisms are not visible to evolutionary selection anyway (in other words, if the selection focuses on the level above these highly dynamic and frequently replaced mechanisms), it would be much better not to monitor them. This idea is now being actively tested in cancer research by focusing on overall evolutionary potential rather than cataloging huge numbers of genetic mutations. Similarly, this idea should also apply to other common and complex diseases (Ye et al., 2007; Heng, 2013a,c; Heng, 2010).

Of course, with the success of 4D genomics, more challenges will appear. But this new science will make great progress. The genome system-based evolutionary theory and holistic approaches represent the future of genomics and likely the future of biology.
Some who are now convinced that the gene theory is fundamentally limited have voiced concerns that until a new theory replaces it, “we will have to stick with what we have.” The answer to this is rather simple: it is ultimately crucial to accept the truth that the earth is not the center of the universe, even if we are still not sure where the center of the universe is. Remember, this is not just a matter of pure theory: its practical implications are enormous.

CHAPTER 8

The Rationale and Challenges of Molecular Medicine

8.1 SUMMARY

An important application of establishing appropriate genomic and evolutionary theories is to bridge the translational gap between basic research and molecular medicine. Following the success of the Human Genome Project, various large-scale -omics projects have promised to revolutionize medicine. In particular, “precision medicine” has become a buzzword. In this chapter, a brief history of molecular medicine, as well as its challenges and prospects, is reviewed. Following the case study of p53 research, the relationship between stress-induced variations and cellular adaptation and its trade-offs is summarized in the context of disease formation. Moreover, the Future Direction section discusses crucial upcoming issues in molecular medicine: increased bio-uncertainty, relationships between big data and theories, biomarker development, and educational improvements in biomedical science, including scientific policy along with essential knowledge structures, scientific culture, and professionalism.

8.2 A BRIEF HISTORY: THE PROMISES OF MOLECULAR MEDICINE

According to the National Cancer Institute (NCI), molecular medicine is defined as “A branch of medicine that develops ways to diagnose and treat disease by understanding the way genes, proteins, and other cellular molecules work. Molecular medicine is based on research that shows how certain genes, molecules, and cellular functions may become abnormal in

Genome Chaos https://doi.org/10.1016/B978-0-12-813635-5.00008-2


Copyright © 2019 Elsevier Inc. All rights reserved.


diseases such as cancer” (NCI). Currently, studies on the genetic basis of disease represent the mainstream in the field, even though research subjects also include enzymes, antibiotics, hormones, carbohydrates, lipids, metals, and vitamins, as well as synthetic, organic, and inorganic polymers. It is thus no surprise that the textbook Molecular Medicine states that “Molecular medicine is the application of gene or DNA based knowledge to the modern practice of medicine” (Trent, 2005).

The year 1949 marked the birth of molecular medicine, when Linus Pauling and Harvey Itano, together with their colleagues, published the landmark paper “Sickle cell anemia, a molecular disease,” which reported a significant difference between the electrophoretic mobilities of hemoglobin obtained from erythrocytes of normal individuals and from those of individuals with sickle cell anemia (Pauling et al., 1949). This publication successfully linked a specific disease (sickle cell anemia) to a particular molecular variant (a different form of the metalloprotein hemoglobin in patients’ blood) and established sickle cell anemia as a genetic disease, linking a gene to the specific structure of protein molecules. Pauling was one of the most influential scientists of our time and one of the founders of molecular biology. As such, it seems logical that he introduced the molecular medicine concept by illustrating the relationship between gene, protein, and disease phenotype.

The journey of searching for the molecular mechanism (e.g., genetic basis) of various human diseases, however, started far earlier. In addition to the establishment of the theory-based science of genetics in 1865 by Mendel, Sir Archibald Edward Garrod linked alkaptonuria to inborn errors of metabolism (Garrod, 1902).
He correctly assumed that an enzymatic defect in a metabolic pathway was responsible for this phenotype, which became a classical example of using genetics to explain errors of metabolism.

As molecular medicine emerged from the application of molecular genetics/genomics to medicine, it is strongly influenced by the concepts and methodologies of genomics and other -omics. Its overall success and limitations are reflected by those of genomics, as was extensively discussed in Chapters 1 and 2. Some current and future technologies should be mentioned, as they are either commonly used or will likely play an important role in molecular medicine. These platforms include the industrial capability to produce therapeutic proteins; in vitro fertilization methods for reproduction; gene therapy; stem cell therapy; target-specific therapy; prenatal diagnosis; gene mutation screening; immune therapy; organ culture; and CRISPR/Cas9 and targeted genome editing.

Like genomics, perhaps the biggest source of excitement for molecular medicine is the continuous promises that came first from the Human


Genome Project and then from many other large-scale -omics projects. Even before the completion of the Human Genome Project, the field of molecular medicine was convinced that a new era was coming, which would impact every aspect of the field. The following two publications are examples.

The near-completion of the Human Genome Project, which identifies the 3.2 billion base pairs that comprise the human genome (the so-called ‘Book of Life’), has exponentially heightened the focus on the importance of molecular studies and how such studies will impact on various aspects of medicine in the 21st century.
Semsarian and Seidman, 2001

The landmark event since the second edition of Molecular Medicine: An Introductory Text has been the completion of the Human Genome Project, which is already living up to the promise that it will provide the framework for important new medical discoveries in the twenty-first century.
Trent, 2005

The above claims actually represent some of the more modest ones, compared with many more exciting predictions mentioned in previous chapters, most of which have not been fulfilled. Nevertheless, a wave of renewed promises continues to come. One of the most popular ones is the Precision Medicine Initiative launched by then-President Obama in 2015. According to NIH leadership, their understanding of a new initiative on precision medicine was the following:

The concept of precision medicine – prevention and treatment strategies that take individual variability into account – is not new; ... But the prospect of applying this concept broadly has been dramatically improved by the recent development of large-scale biologic databases, powerful methods for characterizing patients, and computational tools for analyzing large sets of data. What is needed now is a broad research program to encourage creative approaches to precision medicine, test them rigorously, and ultimately use them to build the evidence base needed to guide clinical practice. ...
Collins and Varmus, 2015

Precision medicine soon became a buzzword to replace “personalized medicine” which was popular a few years before and largely overlaps with precision medicine. Also, there are many definitions of precision medicine, but the term broadly refers to the use of molecular diagnostic tools and targeted treatments for individual patients based on genomic (and other -omics), biomarker, or psychosocial characteristics (Ramaswami et al., 2018). Despite its popularity, there have been increasing calls to challenge precision medicine based on the complex reality of medicine and the limitations of current molecular profiling methods (Khoury and Galea, 2016; Joyner et al., 2016). Recently, after examining issues of diagnostic methods, novel therapies, and public health integration, both from


individual patient and general population points of view, some important statements have been made regarding the health policy:

Over the past decade, precision medicine (PM) approaches have received significant investment to create new therapies, learn more about disease processes, and potentially prevent diseases before they arise. However, in many ways, PM investments may come at the expense of existing public health measures that could have a greater impact on population health ... We cannot ignore the potential that PM holds for medical progress. However, PM has been disproportionately focused on drug development and strategies for those who have a disease with an intention to improve outcomes, leaving behind the concerns of whole population health.
Ramaswami et al., 2018

Putting the policy issue aside for a moment (while acknowledging that it is a crucial issue), it is important to examine whether the “disproportionately focused strategies” themselves will work when viewed through the lens of genome-mediated disease evolution. Knowing the high pressure that the genomic research community has faced to fulfill its promises, it is not surprising to see new initiatives to push translational genomic research and utilize the massive sequencing information for patient care. It is surprising, however, to see the “business as usual” attitude. Besides getting more molecular data and increasing computational power, there has been no call for reexamining the gene-centric genomic framework or how somatic evolution plays a key role in disease initiation and progression. The way of thinking, rather than simply accumulating data, might be the biggest challenge for the precision medicine initiative.

8.3 THE CHALLENGES AND OPPORTUNITIES FOR PRECISION MEDICINE

The success of precision medicine is largely dependent on the precise prediction of phenotype from the genomic profile or other -omics profiles. It is already known that most common diseases cannot be explained by a few key gene mutations. The confidence in achieving precision in medicine is based on a new assumption: that by sequencing more samples, and by doing so with increased computational power, disease-specific genomic patterns will be identified that are valuable for future medical diagnosis and treatment. The same assumption is also used to rationalize the Cancer Genome Atlas Project, as well as the study of other types of common and complex diseases (see Chapters 1–2). It has been pointed out that current precision medicine evolved directly from the promises of the Human Genome Project.


Collins (1999) envisioned a genetic revolution in medicine facilitated by the Human Genome Project and described 6 major themes: (1) common diseases will be explained largely by a few DNA variants with strong associations to disease, (2) this knowledge will lead to improved diagnosis; (3) such knowledge will also drive preventive medicine; (4) pharmacogenomics will improve therapeutic decision making; (5) gene therapy will treat multiple diseases; and (6) a substantial increase in novel targets for drug development and therapy will ensue. These 6 ideas have more recently been branded as personalized or precision medicine.
Joyner et al., 2016

The question “What are the key challenges of precision medicine?” indeed belongs to a more general question: “What are the limitations of current genomics, and specifically, the various genome sequencing projects?” As we have extensively discussed in previous chapters, the success of precision medicine is predicted to be low if the current gene-centric framework remains the key principle. From these discussions, there are two key messages regarding the realities we must face. First, if the genomic landscape itself for an individual patient is highly dynamic and cannot be precisely profiled (because of fuzzy inheritance, somatic evolution, and environment-influenced emergence), how can precision medicine work? Secondly, we must search for a new path and attempt to answer the question, how can the genome theory better position precision medicine? In this section, rather than repeating many previously made arguments, a number of case studies will be used to make our point.

8.3.1 The 40-Year Journey of Studying p53, From Certainty to Increased Uncertainty

In discussing efforts to identify and characterize individual molecules, genes, or proteins for use in molecular medicine, one cannot forget the famous p53, which is undoubtedly one of the most extensively studied genes and proteins in the history of molecular biology! When it was initially described by different groups in 1979, few could have foreseen how important it would be for basic cancer research, how expensive it would be to understand its functions, how confusing it would be to deal with this multifaceted molecule, and how challenging it would be to use it for cancer treatment. An excellent review article published a decade ago by one of the original researchers very nicely summarized the history of p53 research, outlining the most important functions of p53 among many.

Thirty years ago, p53 was discovered as a cellular partner of SV40 Large Tumor Antigen, the oncoprotein of this tumor virus. The first decade of p53 research saw the cloning of p53 DNA and the realization that p53 is not an oncogene but a tumor suppressor .... In the second decade, the function of p53, a transcription factor induced


by stress, resulting in cell cycle arrest, apoptosis and senescence, was uncovered. In its third decade new functions were revealed, including regulation of metabolic pathways and cytokines required for embryo implantation. The fourth decade may see new p53-based drugs to treat cancer. What is next is anybody’s guess.
Levine and Oren, 2009

One might wonder how the 40-year review will look. Knowing that the fourth decade has failed to deliver p53-based drugs to treat cancer, and that the interactions among p53 and other molecules/pathways have become mind-bogglingly complicated, one thing that is certain is that the p53 story can only become even more uncertain. However, if one examines the status of p53 research through the evolutionary lens, the increased confusion can be reduced. The key is to focus not solely on p53 itself and its immediate up- and downstream partners, but on the real world of stochastic interaction within the context of emergence and selection. The following research experiments on p53 illustrate the importance of using the correct framework to understand p53’s contribution during cancer evolution, a phenomenon which plays a crucial role in establishing the genome theory of somatic cell evolution.

1. p53 is ultimately linked to evolutionary potential through genome instability. To compare tumor cells with normal and defective p53, the human lung cancer cell line H460 and ovarian carcinoma cell line PA-1 were modified by HPV E6 transfection to generate two pairs of cell lines with p53+/+ and p53-/-. Following cell harvesting, cytogenetic slide preparation, and spectral karyotyping, the frequencies of nonclonal chromosome aberrations (NCCAs) were scored for each sample. Cells with p53-/- displayed significantly increased frequencies of NCCAs compared with their wild-type p53 counterparts, confirming the importance of p53 to genome instability and evolutionary potential, regardless of the diverse p53-related molecular pathways that may have been involved (Heng et al., 2006a; Heng, 2015) (see Chapters 3 and 4). With further culturing of these cells featuring genome instability, many of them formed clonal populations displaying clonal chromosome aberrations, which are likely associated with gene mutations that promote proliferation.
This simple experiment illustrated that system instability is more important than a given cancer gene or pathway, as the stochastic process will select one or another cancer gene(s), as long as instability is there. 2. Many factors (genomic and environmental alike) can contribute to cancer evolution.

8.3 THE CHALLENGES AND OPPORTUNITIES FOR PRECISION


Although the p53 mutation can drastically increase the frequencies of NCCAs, so can many other gene/epigene/chromosomal alterations (like ATM-/-, DMF, aneuploidy) and environmental factors (like different drug treatments, viral infection, inflammation, culture conditions, and more) (Heng et al., 2006a; Ye et al., 2009). These different factors can work independently or in combination (in either a collaborative or a conflicting manner). Different factors can replace each other when opportunities appear. This agrees with the observation that a large number of gene mutations and other factors can be linked to cancer, but in a complicated fashion. Interestingly, all of these factors can be linked to genome instability, which in turn links them to the stochastic evolutionary process, exemplifying “the transition from certainty of parts to uncertainty of the process selecting parts.” This analysis, coupled with additional syntheses, has promoted the evolutionary mechanism of cancer, which unifies the diverse molecular mechanisms under system instability-mediated evolutionary selection (Ye et al., 2009; Heng et al., 2009, 2011a-b, 2013a-b).

3. The two phases of cancer evolution were initially observed in a cellular model of immortalization in which p53-/- was involved. They were confirmed with a mouse model system, which lacked initial p53-/- mutations (Lawrenson, 2010; Abdallah et al., 2014). Later, the increased capability to induce genome chaos was also linked to p53-/- status (Liu et al., 2014), which led to the realization that reduced genome constraint is important for the generation of new systems through rapid genome reorganization (genome chaos). During the phase of macrocellular evolution, transcriptome dynamics are very high, and an individual gene mutation is much less important than genome reorganization in terms of timely survival. During the microevolutionary phase, however, the gene’s power becomes more visible in the context of clonal expansion.
This experiment supports the viewpoint that different genomic mechanisms are dominant in different phases of cancer evolution, further suggesting that macroevolution does not seem to be caused by the accumulation of microevolution over time. Together, the above experiments, and more importantly, the syntheses, demonstrated that a new strategy is needed to understand the mechanism of p53 or other molecules in the context of cancer evolution. Current molecular characterization of p53, in contrast, is mainly in the context of different pathways which often generate large amounts of conflicting data, as most of these data are from a “parts point of view.” In reality, many changes at the gene


8. THE RATIONALE AND CHALLENGES OF MOLECULAR MEDICINE

level might be irrelevant for cancer evolutionary mechanisms, at least during the rapid evolutionary phase. With this understanding, it is no longer puzzling why it is so challenging to define p53’s functions and to apply them for the treatment of cancer. Namely, there is no defined function of this molecule in evolution (even though scientists can define its function in isolation); instead, all of p53’s functions are context-dependent. It is no wonder that there is increased concern about the status of p53 research, particularly in regard to its implications for medicine. Indeed, the issue of “the paradox of p53” has entered into discussions:

As p53 biology continues to surprise, the question of how to efficiently harness the modulation of p53 activities for therapeutic benefit remains tantalizingly unanswered. Kruiswijk et al., 2015

Unlike the rather stereotypic image by which it was portrayed ..., p53 is now increasingly emerging as a multifaceted transcription factor that can sometimes exert opposing effects on biological processes. This includes pro-survival activities that seem to contradict p53’s canonical proapoptotic features, as well as opposing effects on cell migration, metabolism, and differentiation. ... Deciphering the mechanisms by which p53 determines which hubs to engage, ... remains a major challenge ... the “paradox” of p53 is still far from being resolved. Can we develop the computational, technological, and biological tools to tackle this “super hub” challenge? ... Only the next 35 years will tell. But be ready for new surprises! Aylon and Oren, 2016

Interestingly, even though the issue of evolutionary selection is mentioned in some pieces that search for new frontiers in p53 research, most of the questions still focus on how to handle the molecular characterization of p53, albeit from a large data perspective. The critical framework of rethinking genomics and evolution is not on the table. The following analyses/statements/questions are thus crucial for addressing the paradoxes of p53, in the light of the genome theory of cellular evolution. 1. The “parts” (p53) characterization is extremely limited when the genome context that defines the function of parts is highly dynamic and less predictable. The function of p53 within the context of the genome is analogous to the function of a brick within the context of a building. The function of a brick is dynamic depending on the context of the bricks around it which make up various types of structures. Depending on its context, p53 can have so many functions. The key is to study the context of the building, rather than a specific brick. Unfortunately, so far, in studies of p53’s function, the


issue of different karyotypes has never appeared in the work of most molecular researchers.

2. Studies of “part”-specific interactions (such as p53 DNA-binding specificity) should provide clarity in regard to how p53 and its partners bind. However, because a large number of substrates can be involved, the nonspecificity that p53 faces is overwhelming. There are nearly unlimited possibilities involving the interactions of p53 with others, making its predictability very low. For example, there are more than 2000 different mutant p53 proteins, known as the p53 mutome, which affect the interaction of p53 with DNA (Stiewe and Haran, 2018). Nevertheless, such stochasticity works well for a biological system undergoing evolution, which is unfortunate for researchers and their preference for certainty.

3. Emergence is the key. One can characterize thousands of agents in artificially created linear models; different initial conditions, as well as the evolutionary context, ultimately define the limitation of the basic research. The key is that we cannot take heterogeneity out of the system for the sake of research, and we cannot continue translational research without the evolutionary context. Furthermore, many key molecular methodologies are problematic, including the various mouse models (Heng, 2015). These methods focus on the average profiles by ignoring the outliers, focus on the short-term molecular response by ignoring the long-term phenotypic consequences, and assume that what is “good” or “bad” in the short term will lead to long-term “benefit” or “harmfulness,” respectively.

4. Everything is linked. This reality also applies to p53. In the past 40 years, p53 studies have touched on many aspects of biological systems, as reflected by nearly 100,000 publications. Despite all of the money and research effort invested, how to apply the molecular knowledge of p53 to the clinic is still a big unknown.
The last decade has brought further increased uncertainty to p53 research through various -omics. Now that p53 is linked to the active fields of metabolics, stem cell research, and epigenetics, this complexity will only continue to increase. This leads to a practical policy question: when do we say that enough is enough, in terms of continuing to characterize the potential linkage between p53 and other molecules? This is an especially relevant question when considering that most of the cellular systems used (including animal models and clinical samples) involve karyotype dynamics, which means that the genomic context in these systems is constantly changing. It will likely take much longer than 40 years just to know how complicated this issue will be. An even more profound question for the research community is what should be done about the over 20,000 other genes? Should we duplicate the 40-year story of p53 for most of them as well? p53 alone costs tens of billions of


dollars; imagine how many resources will be needed to build up all of the molecular knowledge of these other genes (with limited clinical implications). This question is important, and every serious scientist should think about it and act upon it. In conclusion, we can continue the effort of chasing the dynamics of parts interactions, get more information at the parts level, and publish another 100,000 papers, while establishing little that is useful for the clinic. Alternatively, we can start a serious discussion about the direction of molecular medicine in general, and at the same time, vigorously search for better frameworks that can guide future research.

8.3.2 The Relationship Between Stress, Variation, Adaptation and Trade-Off, and Disease

To understand why the genomic and environmental context is so important for defining the significance of individual molecules in molecular medicine, it is necessary to define what disease is in genomic and evolutionary terms. It was recently stated:

The term disease broadly refers to an impairment of the normal physiological function of a tissue/organ, organ system, or of the body and mind, in the context of genetic or developmental errors and unfavorable environmental factors (i.e., infection, poisons, and nutritional deficiency or imbalance). Disease is often associated with specific physiological responses and/or pathological changes caused by stress. (Heng, 2008, 2013c, 2015; Heng et al., 2016a). Based on the appreciation of the ultimate importance of genetic heterogeneity to human species’ adaptation and survival, we would like to define disease as genetic-environment interaction-generated variable phenotypes, which display functional disadvantages, discomfort, and/or harmfulness, when they are less fit in the current environment (note, nevertheless, that some potential benefits could be achieved in very different environments). Heng et al., 2016b (with permission from Wolters Kluwer-Medknow).

From the above definitions of disease, a few key concepts need to be highlighted based on the viewpoints of genome theory. The meanings of some frequently used terms, such as “stress,” “variants,” “environment,” and “interaction,” will be discussed in the context of cellular evolution.

1. Stress is not only a double-edged sword for health but also an essential condition for any living system to exist and evolve. Stress can be linked to diverse molecular mechanisms. Although the general public often associates the word “stress” with poor health, it is increasingly appreciated that stress can be bad or good, as many important bioprocesses, such as


development and evolution, depend on it (Horne et al., 2014; Heng, 2017b). Therefore, upcoming research should be focused on how to maintain system balance rather than on how to avoid all stress (Zhang et al., 2005; Horne et al., 2014). The following points are related to how to understand stress in molecular medicine:

(a) Any “good” molecular treatment can become a stress to the system. Under certain conditions, it can become deadly. The fact that adverse drug reactions remain a major cause of death (Bonn, 1998) reminds us of the challenges that will almost certainly arise when many more molecular interventions are introduced in the future. In addition, changes at lower levels will generate stress at higher levels, and vice versa. However, molecular medicine has traditionally paid more attention to successes at the molecular level. For example, in cancer treatment, some costly molecular treatments can achieve a good response at the molecular or tumor levels but have failed to improve quality of life or prolong life.

(b) Different levels of the biosystem (such as cell, tissue, or individual) may respond differently to a given treatment/stress, which demands that attention be given not only to what occurs at the molecular level. Pushing the maximal dosage for chemotherapy during treatment, for example, could have a very negative impact on some patients, as high-dosage treatment can induce genome chaos. Many molecular treatments can have an impact on the patient’s mental health, which can in turn affect the stability of lower levels of the biosystem.

(c) One of the rationales for identifying a molecular magic bullet (to target a key gene/protein or pathway) is the assumption that stress and the stress response constitute a specific event and that there is a linear, causative relationship between this molecular target and the disease phenotype. As has been discussed, this is often not the case in cancer and most common diseases.
Significantly, the cellular stress response is by and large less specific, as many reported specific responses among molecules can only be observed for a very short period of time and, furthermore, are limited to only some linear models. When the entire genomic network is under investigation, this specificity is unlikely to be observed again (Stevens et al., 2013a-b, 2014). This point has been addressed recently: The cellular stress response is a reaction to any form of macromolecular damage that exceeds a set threshold, independent of the underlying cause, and the fragmented knowledge of the stress response needs to be unified at the conceptual level to explain its universality for many different species and types of stress (Kultz, 2003). In fact,


many aspects of the cellular stress response are not stressor-specific, because cells monitor stress based on macromolecular damage without regard to the type of stress that causes such damage (Kultz, 2005). There is also limited pathway specificity for stress response during somatic cell evolution, especially under pathological conditions where stochastic genetic alteration plays an important role Horne et al., 2014.

To wit, while the wild-type p53 gene can restore some features in p53 knockout cells in some linear models (illustrating the specific function of p53), this strategy of restoring p53 in cancer patients (a setting in which nonlinear systems are involved) has so far failed. The idea of simply putting the wild-type p53 gene back into the unstable system (which was caused by the p53-/- in the first place) will not work in highly dynamic evolutionary systems. Instead, wild-type p53 can generate further stress for the already unstable system. This gap between research (focused on parts) and the reality of the clinic (determined by systems) once again demonstrates a simple truth: an adaptive system is not like a clock in which specific parts can be replaced without changing the overall system.

2. Multiple levels of genomic/epigenetic variations: There are more important genomic elements than genes. When talking about the genomic/nongenetic variable elements or variations, one is mostly referring to gene mutations/splicing, copy number variations, and epigenetic variations (Feuk et al., 2006; Kundaje et al., Roadmap epigenomics consortium, 2015; Heng, 2017a). However, the most important genomic variants, karyotypic variations, often do not receive the study they deserve. As stated, the short-term focus of the Precision Medicine Initiative is, using cancer as an example, to translate knowledge into clinical practice (Collins and Varmus, 2015). As there are two phases of cancer evolution, and gene mutations play a limited role in the macrocellular phase of cancer evolution, genome-level alterations must be carefully studied. Unfortunately, current genomic landscape profiling still focuses mainly on gene mutation and copy number variation profiles, with a renewed interest in epigenetics. This situation must change for molecular medicine to include the genomic context, with the new knowledge of chromosomal coding and the genome as a gene organizer.
More importantly, chromosomal aberrations display a much higher prediction value than gene mutations (Jamal-Hanjani et al., 2017; Davoli et al., 2017; Ye et al., 2018b). Based on more detailed discussions in Chapters 3 and 4, a new attitude toward variations is needed: not all variants are equal in different evolutionary contexts. When studying each individual type of variable, one needs to understand the types of disease (is it a


gene-based, a genome-based, or an epigenetic disease?). Which types of cellular evolution are involved (microevolution, macroevolution, or mixture of both)?

3. Understanding different genotype and phenotype relationships. The disease phenotype (using cancer as an example) can be classified into the following relationships of environments and genotypes:

\begin{align}
\text{Disease phenotype} &= \text{Disease genotype} + \text{environments} \tag{1} \\
\text{or} \quad &= \text{Predisposition genotype} + \text{environments} \tag{2} \\
\text{or} \quad &= \text{Normal genotype} + \text{environments} \tag{3} \\
\text{Normal phenotype} &= \text{Normal genotype} + \text{environments} \tag{4} \\
\text{or} \quad &= \text{Predisposition genotype} + \text{environments} \tag{5} \\
\text{or} \quad &= \text{Disease genotype} + \text{environments} \tag{6}
\end{align}

The relationship between genotype, environment, and phenotype can be explained as the collection of the above six categories. Category (1) suits typical Mendelian single-gene diseases, in which the environment mainly affects the severity of disease phenotype. Categories (2) and (3) suit most common and complex diseases. There are many examples of category (2) in hereditary and familial cancers. Category (3) likely represents most cases of human diseases, in which environmental conditions seem to play a dominant role. Many sporadic cases of various chronic diseases belong to this category. Category (4) is suitable for most healthy individuals. Categories (5) and (6) are suitable for those lucky individuals who have familial predisposition, or driver cancer gene mutations or chromosomal aberrations, and yet are cancer free.

It should be pointed out that the concept of “normal genotype” is now under questioning. What is the normal genotype anyway? ... When combined with copy number variations, genome alterations, and somatic mosaicism, and especially when single cell profiling is included, the degree of genetic and nongenetic changes is beyond our imagination (Feuk et al., 2006; Iourov et al., 2008; Heng et al., 2016a).
Based on this evidence, it is starting to make sense why the environment plays a major role in human diseases, as most of the evolutionary potential will be fulfilled by environmental interaction-mediated evolution. Heng et al., 2016b
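
The six genotype/phenotype categories listed above can be read as a simple lookup from an observed (phenotype, genotype) pair to a category number. The following is a minimal illustrative sketch, not part of the original analysis; the names `Genotype`, `Phenotype`, `CATEGORY`, and `category` are hypothetical and introduced here only for illustration. The environment term appears in all six relationships and is therefore not used as a lookup key.

```python
from enum import Enum


class Genotype(Enum):
    DISEASE = "disease"
    PREDISPOSITION = "predisposition"
    NORMAL = "normal"


class Phenotype(Enum):
    DISEASE = "disease"
    NORMAL = "normal"


# Hypothetical lookup table mirroring relationships (1)-(6) in the text.
# Every relationship also includes "+ environments", so the environment
# is implicit here rather than part of the key.
CATEGORY = {
    (Phenotype.DISEASE, Genotype.DISEASE): 1,         # typical Mendelian single-gene diseases
    (Phenotype.DISEASE, Genotype.PREDISPOSITION): 2,  # e.g., hereditary and familial cancers
    (Phenotype.DISEASE, Genotype.NORMAL): 3,          # environment plays a dominant role
    (Phenotype.NORMAL, Genotype.NORMAL): 4,           # most healthy individuals
    (Phenotype.NORMAL, Genotype.PREDISPOSITION): 5,   # predisposed, yet disease free
    (Phenotype.NORMAL, Genotype.DISEASE): 6,          # driver alterations, yet disease free
}


def category(phenotype: Phenotype, genotype: Genotype) -> int:
    """Return the category number (1-6) for a phenotype/genotype pair."""
    return CATEGORY[(phenotype, genotype)]
```

Note that the same genotype maps to two different categories depending on the phenotype (for example, a disease genotype appears in both categories 1 and 6), which is the point the text makes about the role of the environment.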

4. Environments deserve more attention in molecular medicine: why environmental dynamics is one major challenge for precision medicine. In the field of molecular medicine, understanding the genomic contribution to human diseases has been a priority. In recent


years, due in part to the limited success of identifying common gene mutations responsible for most common and complex diseases, increased attention is now being paid to gene-environment interaction. Because the environment has broad coverage, its medicine-related functions can be discussed as follows:

(a) For a given genotype, environment determines or “chooses” the specific phenotype within the potential phenotypic range coded by fuzzy inheritance. Different environments will lead to different phenotypes.

(b) Different types of environments influence medical strategies. In the case of infectious diseases, for example, because the causative agent is commonly shared among patients for a given disease, there is a more or less linear relationship between the agent and the phenotype. It is logical to perform diagnosis and treatment based on the infectious agents. In contrast, in the case of many common and complex diseases, such as cancer, there are so many contributing factors and dynamics of the evolutionary context that it is challenging to identify any key molecule(s) as a magic bullet.

(c) Somatic genomes, including altered genomes, are the genomic environment of individual human genes. In recent years, studies on the microbiome have led to the further inclusion in this equation of the influence of approximately 100 trillion bacteria and other microbes hosted by the human body. It was even suggested that the human microbiome should be considered part of human hologenomes (Bordenstein and Theis, 2015). While microbiome research represents a long overdue and important frontier (Ursell et al., 2012), data are needed to illustrate its quantitative contribution in the real world rather than in experimental models (as it is currently easier to demonstrate its impact in some linear model systems than in patients). Yes, the microbiome is highly dynamic, but how can we use this feature for medical intervention?
Some interesting questions have been asked:

It is important to investigate, for example, to what extent specific micro-organisms contribute to the evolutionary selection of hosts. Who controls who and in what degree? (Specifically, is the human genome selecting the microbiome, ... or is selection based on the interaction package? ...). Does the host-microbiome interaction-mediated degree of heterogeneity matter the most, rather than any specific interaction? Should the microbiome be considered as an environmental component, no matter how many types and numbers of them there are? (Similarly, should the number/type of animals/plants surrounding humans be included as part of the hologenome?) When compared to the multiple types and levels of genome heterogeneity (which so far have been largely ignored), which type of impact is more significant? Heng et al., 2016b (with permission from Wolters Kluwer-Medknow).


(d) Based on the “phenotype = genome type + environments” relationship, if it is difficult to change the genotype for some diseases, we should focus on environmental changes to achieve the phenotype correction. Therefore, in the future, more effort should be focused on environmental therapy (to reduce the disease phenotype by altering its environmental context). By creating a certain environment, it is possible to eliminate or reduce the disease phenotype for a specific patient group. The strategy of changing the environment will be useful for controlling many common diseases. Furthermore, creating conditions in a medical context that promote self-healing is an important avenue. For example, pain relief medication may be administered to encourage exercise; the medication is not used purely for continuous relief but to promote conditions for direct self-healing.

(e) Different types of environmental stress can be measured by considering them as a general stress. At the genome level, it can be measured using the frequencies of NCCAs. At the gene level, it can be measured using the increased gene mutation rate across the genome. Standardized methods are needed for this purpose, however.

5. Interaction is defined by emergence within an evolutionary context: Traditional strategies for studying molecular interaction often focus on the physical interaction among molecular partners, as well as up- and downstream relationships in specific pathways. To illustrate the interaction among agents in a complex system, however, the concept of emergence within an evolutionary context is the key. Of equal importance, there are multiple levels of interactions. Many higher-level interactions can constrain molecular interactions. Future medicine needs to give more consideration to the individual’s overall health conditions (including mental health and nutrition), as well as familial and societal interactions.
It is likely that, for the general population, focusing on improving lifestyle for prevention is more effective than applying molecular intervention when people are sick. In addition, the conflicting relationship among different organs needs more attention. It is known that some drugs are good for one type of organ but bad for another. It is best to have a balanced view based on the individual, rather than just on a specific organ. Similarly, a balanced approach is needed when there is a conflict between short- and long-term benefits.
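
Point 4(e) above proposes the frequency of NCCAs as a genome-level measure of general stress while noting that standardized methods are still needed. The sketch below shows one way such a score could be computed from per-cell karyotype calls. It is only an illustration under stated assumptions: the function name `ncca_frequency`, the input format (one set of aberration labels per scored cell), and the default 4% cutoff separating clonal from nonclonal aberrations are all assumptions made here, not a standardized protocol described in this section.

```python
from collections import Counter


def ncca_frequency(cells, clonal_threshold=0.04):
    """Fraction of scored cells carrying at least one nonclonal aberration.

    cells: list of sets, one set of aberration labels per scored cell
           (an empty set means no aberration was detected in that cell).
    clonal_threshold: ASSUMED cutoff; an aberration found in fewer than this
           fraction of scored cells is treated as nonclonal.
    """
    n = len(cells)
    if n == 0:
        raise ValueError("at least one scored cell is required")

    # Count in how many cells each aberration label occurs.
    counts = Counter(label for cell in cells for label in cell)

    # Aberrations below the cutoff are treated as nonclonal (NCCAs).
    nonclonal = {label for label, c in counts.items() if c / n < clonal_threshold}

    cells_with_ncca = sum(1 for cell in cells if cell & nonclonal)
    return cells_with_ncca / n


# Hypothetical illustrative data (not experimental results): 50 scored cells.
example_cells = (
    [set() for _ in range(40)]               # 40 cells without aberrations
    + [{"t(1;2)"} for _ in range(5)]         # one aberration shared by 5 cells (clonal)
    + [{f"unique_{i}"} for i in range(5)]    # 5 cells, each with a one-off aberration
)
example_score = ncca_frequency(example_cells)
```

In this made-up example, only the five cells with one-off aberrations count toward the score, so two samples (for instance, treated and untreated) could be compared on the same scale regardless of which specific aberrations each one carries.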


The synthesis of these five key points has led to the following conclusion: A specific concept of “stress promoted, genetic variation mediated cellular evolution” has been proposed, using the genome theory to synthesize the current status of genomic medicine (Horne et al., 2014; Heng, 2016a, 2016b). In brief, diseases are defined as genotype/environment-induced variants that are not compatible with a selected environment. Since different types of variants are necessary for cellular adaptation, different stresses can promote variants with benefit but also can eventually contribute to disease conditions. Various trigger factors (genetic or environmental alike) can speed up cellular evolution; there is often no stepwise relationship between initial causative factors to the molecular profiles of diseases following years of cellular evolution, where the genome instability can be stochastically linked to different molecular pathways. Though causative factors may be diverse, they can all be considered system stress. Therefore, stress, system response, and cellular evolution are the general bases for most common and complex diseases. Accordingly, monitoring diseases should focus on genome defined system stability and evolutionary potential, rather than specific gene’s functional status that is in fact constantly changing. The rationale of bringing some key factors together under the evolutionary adaptive process is to search for the common mechanisms in diseases, and to unify such diverse molecular mechanisms (Horne et al., 2015a, 2015b, 2015c) Heng, 2017b (with permission from John Wiley and Sons).

With the above understanding, some big-picture questions should also be asked, which are important for the future of molecular medicine:

First, how can molecular medicine and traditional medicine be balanced? With the increased use of whole genome sequencing data and other molecular profiles, diagnosis will soon detect molecular indications of diseases long before clinical symptoms become detectable. Although this seems to be a dream come true for molecular medicine, it can also lead to serious confusion, as many molecular indications will not lead to a given disease’s phenotype or rapid progression. How “potential patients” deal with this situation represents a big challenge. Even based on current diagnostic platforms, overdiagnosis, referring to when a disease condition is diagnosed that would otherwise not go on to cause symptoms or death (Welch and Black, 2010), is a serious problem in cancer clinics. Indeed, this phenomenon is applicable in the case of approximately 25% of mammographically detected breast cancers, 50% of chest X-ray and/or sputum-detected lung cancers, and 60% of prostate-specific antigen-detected prostate cancers (molecular medicine method) (Heng, 2015). The issue of molecular overdiagnosis will likely become worse.

Second, can we cure all diseases? If we can, should we? In recent years, there have been many headlines declaring that with advanced -omics technology and powerful artificial intelligence, molecular


medicine will completely alter human fate by bypassing biological evolutionary selection. As a result, we will cure nearly all diseases, including aging. To support these claims, people have cited genome editing technology such as CRISPR/Cas9, as well as organ culture and the use of stem cell technology. We have heard similar promises before (see Chapters 1-3). In fact, mankind has a long history of dreaming of becoming immortal and free of suffering, across all different cultures. But this time is different, we are told. This time we have artificial intelligence, which truly knows how to solve the mystery of life, as if we are God-like. In fact, many funding organizations have clearly settled on the goal of curing all diseases in the not-so-distant future. Can we do it? The answer is no. Based on the gene-centric viewpoint, new gene editing methods can precisely replace individual genes, but from a genome-based evolutionary point of view, the system’s evolution will soon make target-specific molecular manipulation off-target. This point has been discussed in regard to how DNA transfer methods have an impact on the genome.

We conceptualize that the diverse experimental manipulations (e.g., transgene overexpression, gene knock out/down, chemical treatments, acute changes in culture conditions, etc.) may act as a system stress, promoting intensive genome-level alterations (chromosomal instability, CIN), epigenetic and phenotypic alterations, which are beyond the function of manipulated genes. Such analysis calls for more attention on the reduced specificities of gene-focused methodologies. Stepanenko and Heng, 2017.

Further experiments have examined the majority of current DNA manipulating methodologies, including transgene, RNAi, small molecule targeting, and CRISPR/Cas9; all of these have altered the genome, and each run of experimentation led to distinctive karyotypes after evolutionary selection (Heng, unpublished observations). Currently, the precision aspect of CRISPR/Cas9 has been focused on targeted genes or genomic regions. The potential problem is the impact on the entire genome. Hidden genomic rearrangements generated by Cas9 have been reported by another group as well (Boroviak et al., 2017). These results should alarm many. Interestingly, there seems to be a general trend in which types of diseases have been dominant across human history, as defined by the environments and influenced by the advance of medicine. Infectious diseases were dominant before the antibiotics era; now, those dominating are cancer and metabolic diseases, and soon, mental disease will be more dominant. Of course, many future


diseases will appear as well. Humans will control many diseases, and new forms of current diseases and new types of diseases will come. The only difference is that mankind will use technologies to create artificial selection environments, thereby bypassing some biological evolutionary constraints. The question remains: should we do it (that is, cure all human diseases)? Knowing that genomic heterogeneity is the very reason for many common and complex diseases, and at the same time, that it is essential for the human species to exist and evolve, the answer to this question becomes rather complicated. We probably will try to wipe out many infectious agents (even though the superbugs will challenge back), change our lifestyles to reduce and manage metabolic diseases, live longer, and live in different environments with the help of artificial technologies; however, we will still live with diseases because of the unfit variables. Medicine should do as much as possible to reduce the individual’s suffering, but at the same time, it should take care of our own species from a long-term point of view.

It is difficult to even realize that, while representative of unfortunate circumstances for some individual patients, having many genetic variations in the human population is essential to ensuring the necessary degree of heterogeneity or robustness (regardless of the good or harm to some individuals in current conditions), which could ultimately contribute to the existence of humanity. Heng et al., 2016b

8.3.3 Genome Alterations and Common/Complex Diseases

The long-term goal of the precision medicine initiative is to establish a platform for successful diagnosis and treatment of other noncancer, common, and complex diseases. Based on the extensive analysis of cancer genomics, it is crucial to appreciate the concept that system instability is a key feature of many diseases. Numerous diseases whose phenotypes and molecular mechanisms are different may have the same underlying cause: genome instability. According to traditional thinking, cancer appears as a unique problem because such cells have an apparent growth advantage over their normal counterparts. But the reason altered cancer cells can outcompete normal tissue is that they have lost the homeostasis mechanisms (or system constraint) of normal tissues. Despite the very different features of cancer and other common diseases, including differing degrees of genome alteration, they are all in fact system diseases where system deregulation is the key evolutionary process during disease progression. Rearranged genomes represent altered systems, and the resulting imbalance of system homeostasis is an important defect that favors the evolution of disease.

8.3 THE CHALLENGES AND OPPORTUNITIES FOR PRECISION


8.3.3.1 Key Features and Types of Common and Complex Diseases

The search for genetic causes of common and complex diseases is a major challenge for molecular medicine. As listed in Table 8.1, many key features of common and complex diseases have hampered the traditional genetic strategies that are useful for studying Mendelian diseases. While it is difficult to identify commonly shared genomic alterations with high or even modest effects in patient populations, increasing evidence has linked large numbers of rare genetic loci with severe effects on individual patients. Significantly, many detected genetic changes involve genome alterations rather than gene mutations alone. To address this important issue, the genome theory will be applied to illustrate why 4D genomics rather than the 1D gene view can provide answers.

Based on the diverse genetic patterns of various human genetic diseases, it is necessary to classify human inherited diseases into four subtypes (Table 8.2) (Heng, 2010). The first type is classified as those produced by genomes that feature commonly shared genetic alterations within a population (gene mutations or chromosomal or subchromosomal alterations such as copy number variations). This type of genome can be further classified into two subtypes. (1) Common genetic/epigenetic loci

TABLE 8.1 The General Features of Common and Complex Diseases.

* High incidence within populations
* Clear genetic influence, family clusters (tends to aggregate in families)
* Failure to identify common genes responsible for the majority of cases after several decades of searching
    - Many genetic loci are indicated, but the penetration is low among patient populations
    - Fewer collective effects are observed when multiple loci are used for population screening
    - Most genetic loci are stochastically involved
    - Diverse genetic/epigenetic heterogeneity among somatic cells
    - Common diseases without common molecular causes and most genetic changes that lead to diseases are rare within populations but carry serious consequences
    - Large numbers of factors (genetic and nongenetic) are involved
    - Many experimental models mimic the phenotype under specific conditions
    - Some diseases are closely associated or overlapped and share some common genomic regions: cancer and obesity, cancer and aging
    - Many diseases share some key molecular pathways, such as the dysfunction of mitochondria, linkage with stress pathways, metabolic pathways
    - Many diseases share similar genetic networks
* Longer periods of time are required for a disease to become clinically dominant (possibly represents an evolutionary process: time + probability)
* Clearly related to lifestyle
* Systems diseases involving somatic evolution


TABLE 8.2 Genetic Disease Classification.

Disease Type | Genetic Factors in Patient Populations | Disease Type Prevalence | Relative Genome Stability | Examples
A | Commonly shared | Infrequent | Stable | Typical Mendelian diseases (for example, sickle cell anemia)
B | Commonly shared | Infrequent | Unstable | Familial cancer syndromes
C | Rare | Infrequent | Stable | Charcot-Marie-Tooth neuropathy
D | Rare | Prevailing | Unstable | Sporadic cancers and neurological and/or behavioral disorders

Reproduced from Heng, H. H. (2010). Missing heritability and stochastic genome alterations. Nat Rev Genet, 11(11), 813. https://doi.org/10.1038/nrg2809-c3.

that have been identified within a relatively stable genome. Well-known examples of this category include cystic fibrosis, sickle cell anemia, Down syndrome with an extra chromosome, fragile X syndrome with expansion of trinucleotide repeats, and diseases that share copy number variations or even single-nucleotide polymorphisms. (2) Common genetic/epigenetic loci that have been identified with unstable genomes. Typical examples are the p53 mutations detected in Li-Fraumeni syndrome and BRCA mutations in familial types of breast cancer. This first type of genetic aberration, with higher penetration levels, has been the primary research focus of inherited diseases, where whole genome association studies and patient validation work well. In fact, the concepts and experimental approaches of medical genetics have so far been based on the understanding of this type of disease.

The second disease type features genomes that have rare genetic alterations. In contrast to the first type, they are represented by a large number of rare genetic alterations in individuals or families and are not highly represented within the population. Here, there are also two subtypes: type c, rare loci within relatively stable genomes, and type d, rare loci within unstable genomes. As genome instability-mediated stochastic genome evolution is the driving force of cancer formation, it is likely that most sporadic cancers are caused by rare genetic or epigenetic alterations within or leading to unstable genomes, the type d disease. The vast majority of common and complex diseases belong to types c and d, in which the traditional approach of identifying common patterns of genetic/epigenetic alterations has failed. The rationale for such a classification is to illustrate the distinctive patterns among heritable diseases.
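The four-way classification rests on two yes/no questions: are the genetic factors commonly shared among patients, and is the genome relatively stable? A minimal Python sketch of that decision logic follows. The names `DiseaseProfile` and `classify` are hypothetical and introduced only for illustration; the two boolean inputs are a simplification, since the text notes that rarity and instability still need to be defined in more detail for each disease.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiseaseProfile:
    """Hypothetical two-axis summary of a disease (an illustrative assumption)."""
    shared_loci: bool    # True if genetic factors are commonly shared among patients
    stable_genome: bool  # True if the genome is relatively stable


def classify(profile: DiseaseProfile) -> str:
    """Return the disease type (a, b, c, or d) following the two axes of Table 8.2."""
    if profile.shared_loci:
        return "a" if profile.stable_genome else "b"
    return "c" if profile.stable_genome else "d"


# Examples taken from Table 8.2.
examples = {
    "sickle cell anemia": DiseaseProfile(shared_loci=True, stable_genome=True),
    "familial cancer syndromes": DiseaseProfile(shared_loci=True, stable_genome=False),
    "Charcot-Marie-Tooth neuropathy": DiseaseProfile(shared_loci=False, stable_genome=True),
    "sporadic cancers": DiseaseProfile(shared_loci=False, stable_genome=False),
}
types = {name: classify(profile) for name, profile in examples.items()}
```

In this sketch, only the combination of rare loci and an unstable genome yields type d, the category to which most sporadic cancers are assigned in the text.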


Sometimes both common and rare genetic alterations may lead to the same disorder, as in CML patients with and without the Ph chromosome, as well as in inherited hearing loss (Dror and Avraham, 2009). More detailed analysis is needed to define the rarity and instability of the genome for various diseases. Such a classification can resolve many confusing issues. For example, although many rare genetic loci have been found across the human genome within populations when compared with the few commonly shared loci that are responsible for a common disease’s phenotype, the validation concept and currently used methods downplay the important relationship of these rare loci to diseases. Furthermore, even though stochastic genome alterations are much more frequent and dominant than gene mutations, these genome-level alterations are often ignored unless they can be directly linked to specific disease genes. Such genetic-mediated phenotypic plasticity has often been confused with nongenetic effects because genome-level alterations provide additional arrays of phenotypic plasticity that reduce predictability.

8.3.3.2 Stochastic Genomic Alterations Contribute to Most Common Diseases

By linking stochastic genome changes to various common diseases, it is hypothesized that the missing heritability of common and complex diseases is the result of stochastic genome alteration during disease evolution. Thus, for most common diseases, more attention should be focused on the heterogeneity and system dynamics defined by the genome package rather than on common gene mutations. To connect the dots, the above classification has been further synthesized based on the genome theory.

1. Changes at the genome level can have an impact on large numbers of individual gene functions.

2. Any significant stress such as specific gene mutations, epigenetic abnormalities, or environmental stresses will cause an increase in genome dynamics, leading to a less stable population with increased NCCAs.

3. The evolutionary mechanism of a given common complex disease is equal to or larger than the sum of all molecular mechanisms. If a specific altered genetic locus can be linked to a particular molecular mechanism in an individual case, then the entire collection of different genetic loci is necessary to explain large numbers of diverse individuals.

4. There are many cases where the properties of genome-level alterations cannot be explained by individual gene functions. Thus, stochastic genome alterations are the common reason behind the diverse molecular pathways in patient populations.


5. In addition to passing on genes and mutations, it is also possible for individuals to pass on the degree of genome instability (or fuzzy inheritance) without direct impact on specific genes or pathways. In this case, the specific genome is less stable under certain environments, reducing predictability in terms of phenotypes.

6. In the same individual, there is a key difference between the genome of germ cells and the genome of somatic cells. Increasing data indicate that somatic genome variations may contribute to disease conditions (Heng, 2010, 2016a-c; Iourov et al., 2008; Vorsanova et al., 2010; Sgaramella and Astolfi, 2010; Ye et al., 2018b). This level of variation typically occurs in only a proportion of somatic cells (Bruder et al., 2008) and is significantly more abundant in adults than in newborn individuals (Flores et al., 2007). Epigenetic differences also arise during the aging process, as illustrated by differences arising during the lifetime of monozygotic twins (Fraga et al., 2005). Collectively, stochastic alterations should be observed in many diseases, especially in aging tissue (Geigl et al., 2004; Vijg and Dollé, 2002).

7. Increasing reports link various common diseases to genome-level alterations, including hypertension, neuropathy, infertility, schizophrenia, autism, aging, and recently, obesity and Gulf War illness (GWI) (Heng et al., 2013b, 2017b, 2018; Iourov et al., 2008; Liu et al., 2018; Vorsanova et al., 2010; Sgaramella and Astolfi, 2010; Bucan et al., 2009), and diverse karyotypes have been linked to the vast majority of cancer cases. In addition, de novo somatic L1 insertion occurs at higher frequencies in the human lung cancer genome (Iskow et al., 2010). Together, this strongly supports the importance of genome variation during the transition from physiological to pathological conditions.

Based on the above synthesis, the following general model is proposed for consideration.

A. In response to system stress (internal and environmental), the majority of genetic/epigenetic alterations are stochastically distributed among patients’ genomes with low penetration within a population. Only a small portion of genetic/epigenetic changes will display higher penetration in a population.

B. Some of these rare genetic alterations can be linked to particular molecular mechanisms and disease phenotypes by being associated with specific gene functions. However, many of these genome-level alterations can impact genome topology rather than only directly affect specific gene loci, especially when different loci function simultaneously within the genome package based on the self-organization principle. A large portion of genomic disorders in this category cannot be satisfactorily explained by an individual gene or even the cumulative effects of


multiple loci. This agrees with the finding that there is a major gap between the majority of variations detected by genome-wide association studies (GWAS) and their biological significance based on the knowledge of specific gene functions. It is thus likely that many emergent properties at the level of the genome and its environmental interaction (such as heritability) cannot be dissected into individual genetic elements.

C. Because of the nature of heterogeneity, a specific genetic alteration that is crucial for one individual patient may or may not be significant for another. The difference between individuals and populations must not be ignored when screening methods are designed.

D. There is a difference between inherited “germline genomes” and “somatic cell genomes” of the same individual. Germline genomes are much more stable than somatic genomes. Somatic genome variation, which can occur during developmental and physiological processes such as tissue renewal and aging and particularly during pathological processes, is essential to disease phenotypes.

E. Some diseases can share certain genetic alterations, and all diseases represent different phenotypes of an altered system.

F. The genome-environment-time interaction plays an important role in common diseases. The general cause of many common diseases is system instability, which can be achieved by combinations of an array of molecular mechanisms and environmental stress. Furthermore, the “window of opportunity” between the genome of germline and somatic cells, and in particular between somatic cells, provides yet another layer of complexity on which environmental insults can act. The initiating factors are often untraceable in fully developed complex diseases because their formation is a time-dependent, nonlinear evolutionary process. It is thus likely that identifying “initial” errors (genetic or otherwise) may have minimal benefit to the diagnosis and treatment of most cases of common diseases. In these cases, it is the 4D genomic dynamic interactions that matter.

G. How genetic variations become associated with disease depends on the environment. With drastic environmental changes, it is anticipated that some “future diseases” will become clinically significant.

The proposed model is illustrated in Fig. 8.1. Of course, the main purpose of using this simple model is to link genome change to specific gene function, which will be more easily accepted by gene-centric researchers. Note that even though many


FIGURE 8.1 The stochastic genome alteration model. Stochastic changes can potentially “hit” anywhere within unstable genomes. However, most of the alterations are not directly associated with disease (not shown). Four chromosomes and nine loci (A-I) are illustrated and represent the entire genome. Some of these diverse genome alterations can be linked to specific genes or pathways, but many of them contribute to altering the genome context or dynamics without being linked to specific genes. Multiple loci can contribute to different diseases, but individual patients may display variable “hit list” profiles of potential shared loci (in disease type a, loci A, D, E, G, and I are involved in different patients, whereas in disease type b, loci B, D, E, F, and H are involved). Some loci are actually shared among diseases (e.g., D and E for diseases a and b). There are also some genetic alterations that can potentially generate disease conditions if there is exposure to a particular environment (C) (potential or future disease).
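The variable “hit list” idea in Fig. 8.1 can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions: the locus sets for diseases a and b come from the figure legend, but the number of hits per patient, the uniform sampling, the population size, and the function names `simulate_hit_lists` and `locus_frequency` are all hypothetical choices and not part of the model.

```python
import random
from collections import Counter

# Locus sets per disease, taken from the Fig. 8.1 legend.
DISEASE_LOCI = {
    "a": ["A", "D", "E", "G", "I"],
    "b": ["B", "D", "E", "F", "H"],
}


def simulate_hit_lists(disease, n_patients, hits_per_patient=2, seed=0):
    """Give each patient a random subset of the disease's loci.

    Assumption: every patient carries exactly `hits_per_patient` hits,
    drawn uniformly at random; real hit lists need not follow either rule.
    """
    rng = random.Random(seed)
    loci = DISEASE_LOCI[disease]
    return [frozenset(rng.sample(loci, hits_per_patient)) for _ in range(n_patients)]


def locus_frequency(hit_lists):
    """Fraction of patients carrying each locus."""
    counts = Counter(locus for hits in hit_lists for locus in hits)
    n = len(hit_lists)
    return {locus: counts[locus] / n for locus in sorted(counts)}


patients_a = simulate_hit_lists("a", n_patients=200)
freq_a = locus_frequency(patients_a)

# Loci shared between the two diseases (D and E in the figure legend).
shared = set(DISEASE_LOCI["a"]) & set(DISEASE_LOCI["b"])
```

Every simulated patient has the disease, yet no single locus is carried by all of them, which is the low-penetration pattern the model describes for screening at the population level.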

chromosomal translocations can be linked to specific fusion genes, the genomic codes can be changed even without those specific fusion genes. More discussion can be found in Chapter 4.

8.3.3.3 The Search for the General Model for Common and Complex Diseases/Illnesses: A Case Study for Gulf War Illness

GWI is detected in nearly one-third of Gulf War veterans in the United States (Research Advisory Committee on Gulf War Veterans’ Illness, 2008). Diagnosing and treating GWI is difficult because of its complex etiology and diverse symptoms. This challenge has led to the slow acceptance of GWI as a real clinical condition (Heng et al., 2016a-c). The general mechanism of GWI remains unknown, despite an increasing number of studies that have revealed a number of associated molecular mechanisms, including


mitochondrial dysfunction and altered immune response (Koslik et al., 2014; Craddock et al., 2015; Parihar et al., 2013; White et al., 2016; Liu et al., 2018). Interestingly, however, all these seemingly diverse observations can be summarized in the following conclusions: (1) there are diverse war-related factors that can be considered as GWI triggers (e.g., nerve gases, pesticides, insect repellents, and anti-nerve agent pills); (2) all these trigger events occurred nearly 30 years ago, suggesting that GWI represents a complex adaptive system; (3) the symptoms are highly diverse; (4) most GWI patients display genome instability, reflected as an elevated level of NCCAs, rather than specific and commonly shared chromosomal aberrations; and (5) the majority of identified contributing factors as well as phenotypes can be linked to genome instability. By synthesizing all these facts through the genome theory lens, a general model of GWI has been proposed to integrate stress, genome alteration, somatic cell evolution, and diverse symptoms (Fig. 8.2).

FIGURE 8.2 A general model of cellular evolution of GWI (modified from Heng et al., 2016c, with permission from Springer Nature).


As illustrated in this model, the GWI process can be divided into three stages: (1) the initial stage: diverse, extremely high stresses incurred during the Gulf War damage cellular systems; (2) the cellular/system evolution stage: many individuals recover from stage one, but in those who cannot recover from the initial damage, the genome will become destabilized, triggering further cellular evolution. This stage may involve a variable period of time for different individuals; and (3) the illness stage (with diverse disease phenotypes): the altered genome can impact different cellular mechanisms, leading to diverse symptoms (which is why GWI symptoms range widely). Increased stress and further genome instability represent key features of this stage and are likely linked to illness severity or progression potential. As GWI has progressed for nearly 30 years, the initial trigger factors might be less visible now, but because both genome instability and various stress pathways are elevated in GWI patients, the stress-genome interaction is still clearly evident in GWI patients. It is thus extremely important to study the ongoing stress-mediated genome evolution process and its implications for diagnosis and treatment. In addition, this model can also be used to explain some other common and complex diseases that involve stress-induced, genome alteration-mediated somatic evolution. Moreover, the three stages of the disease evolutionary model are very useful when integrated with patient-centric health care. Prevention (avoiding initial trigger factors), stabilization (stabilizing the system and slowing down the illness’s cellular evolution), and reduction of the illness’s impact by promoting systems recovery (applying systems constraint and self-healing) should play increased roles in health care.
8.3.3.4 New Model With New Explanations

The concept of stochastic genome alteration as the basis for many common/complex diseases needs to be vigorously examined, particularly through the study of the distribution patterns of the various genome types (a to d) for each disease. This will establish a new standard for validating rare genetic loci and will elucidate the relationship between genome-level alterations and stochastically involved molecular pathways. By emphasizing the importance of stochastic genome dynamics rather than specific common gene mutations, the genome era will finally arrive. This realization will also diminish our zeal for magic bullet therapy to cure common and complex diseases. New methods of medical validation and conducting clinical trials will emerge. These strategies will deliver the benefit of concurrently targeting numerous diseases whose phenotypes are very different but whose underlying cause, genomic instability, may be the same. In addition, new approaches to public health will focus on changing


lifestyles to reduce the probability of developing disease. The future of medicine will greatly benefit from the genome theory. This model has the potential to explain many issues/questions better than previous explanations. Examples are listed as follows.

1. Why are multiple steps of genome stochasticity the main reason for high levels of genomic heterogeneity? Genome stochasticity can affect at least five key transitions that are directly related to human diseases (Fig. 8.3). First, the specific type of genome alteration acquired by any individual within a population is a stochastic phenomenon. It used to be thought that most of the unfortunate genetic alterations inherited arose mainly from one’s own family tree. In fact, many new genetic alterations related to human illness are introduced each generation (McClellan and King, 2010) and are transient. The main reason that most commonly shared alterations can accumulate in populations is not

FIGURE 8.3 Genome stochasticity affects at least five key transitions that are directly related to human diseases. The five different key transitional events are illustrated. Event 1 occurs between the population and the individual. Event 2 occurs between the germline and somatic cells (early development). Event 3 occurs between the somatic cells and other somatic cells during system maintenance. Event 4 occurs in the interval between the normal physiological state and when pathological changes first appear during disease progression. Event 5 involves system alterations that occur following medical intervention. In event 1, for example, population size, genetic drift, and geography all play roles. In event 3, somatic cell maintenance is a lifelong challenge to preserve and maintain system homeostasis. Regarding events 4 and 5, drastic genome alterations might be a key (the altered genome is represented by different shapes of the genome). All five events are influenced by time and environmental interactions, as well as the fuzzy inheritance.


because they are harmful but because many rare alterations can be linked to disease under a given environment (also see the fuzzy inheritance section). Second, variation occurs from the germline level to somatic genomes in a given individual, notably during developmental processes such as T and B cell development and the high-level ploidy dynamics observed in hepatocytes (Duncan et al., 2010). Third, variation occurs during somatic cell maintenance, such as during tissue regeneration. Fourth, additional high levels of variation occur during the transition from normal physiology to pathological conditions, where genome integrity is often lost, as in cancer cells. Finally, from pretreatment conditions to the posttreatment state, pathway switching is a frequently observed phenomenon associated with genome alterations; note the loss of effectiveness of specific drugs during cancer treatment as the cancer cells adapt to the new dominant pathways. Such pathway switching is often accomplished by altering karyotypes. Together, all five steps contribute to extremely high levels of genome alteration-mediated genetic and epigenetic heterogeneity, including karyotypes, CNVs, and de novo insertions of endogenous retrotransposons. These steps reflect the important involvement of fuzzy inheritance, which represents a major challenge for precision medicine, which is based only on the characterization and targeting of parts.

2. How can the same genetic defect generate alternate disease phenotypes while different genetic alterations can lead to similar phenotypes? The same genetic defect can be linked to diverse phenotypes because of combinations of different genetic or environmental modifiers. More significantly, the same gene mutation can display a variety of functions within different karyotype-defined genomes. On the one hand, most common diseases occur at the somatic cell and tissue/organ levels, where both the developmental and aging processes are intimately involved. Environmental impacts such as geographic factors and lifestyle also contribute to the wide variation in phenotypic diversity. In addition, unstable genomes have a higher probability of becoming “abnormal” during the development and aging processes, particularly during stress caused either by the altered genomes themselves or by various environmental stresses. This explains the phenotypic variation among individuals who share similar genetic alterations. On the other hand, as the human body has a limited number of major systems (cardiovascular system, nervous system), many genetic abnormalities will ultimately impact these same major organs or systems. From a molecular and cellular biological point of view, many key cellular functions will be commonly involved, as many genes can be involved in similar pathways, creating a highly interlinked internal organization of the


cell (Albert, 2005), a package deal (Heng, 2009). For instance, common/complex diseases are often associated with abnormal metabolic regulation, increased endoplasmic reticulum stress, and abnormal cell death regulation. Notably, mutations of p53 lead to multiple effects (Vogelstein et al., 2000), while many genomic regions have been linked to various types of disease, such as four constitutional (germline) genomic disorders, an array of other somatic disorders linked to 17p11.2p12 (Carvalho et al., 2010), and the association of 16p11.2 with autism and obesity (Weiss et al., 2008; Walters et al., 2010). Similarly, common polymorphic variation at the major histocompatibility complex (MHC) loci has been linked to autoimmune and inflammatory conditions such as multiple sclerosis, type 1 diabetes, systemic lupus erythematosus, ulcerative colitis, Crohn’s disease, and rheumatoid arthritis (Fernando et al., 2008). It has recently been noticed that the transcriptional signature and common networks link cancer with diverse human diseases (Hirsch et al., 2010). Many diseases can be derived from an unstable system and can further promote the instability of that system. This explains how different common diseases share many of the same genetic alterations and similar environmental responses, as many common diseases are just varied expressions of an unstable system. It is also possible that the shared network reflects the system dynamics of the “abnormal system” in general rather than specific pathways.

3. Why is epigenetic deregulation particularly important in human disease but hard to target? For a given species, the framework of the genome cannot be drastically altered due to sexual reproduction, yet increased system complexity is essential for evolution (Heng et al., 2009; Stevens et al., 2013a). Epigenetics therefore serves as another layer of complexity (Huang et al., 2009). This situation is similar to a person changing the color of a house or rearranging the furniture but being unable to alter the architecture itself. Since epigenetic regulation is more sensitive to environmental stress, it has a profound impact on human diseases. However, despite the possibility that abnormal epigenetic regulation represents the earliest changes during the evolution of many diseases, application of epigenetic therapy is challenging. A potentially dangerous side effect is related to less predictable responses and the fact that the somatic genomes are drastically altered in many late-stage diseases. Simply targeting the epigenetic status will not reverse a changed genome (Heng et al., 2010b), yet drastically challenging the epigenetic status by treatment (especially prolonged treatment) could harm the system, as interestingly, despite generating large amounts of diversity (such as contributing to diverse disease


phenotypes), most epigenetic alterations are in flux. This is a key biological advantage of maintaining the genome system while providing phenotypic diversity and complexity in normal tissue. Targeting the epigenetic status therefore could go either way and could just as easily harm the system. In addition, this plasticity is much more subtle than gene mutation-mediated phenotypes and is harder to target with any precision or predictability.

4. Why are genome alteration-mediated common diseases difficult to study? As collectively illustrated by the above reasons, the main difficulty is the stochasticity of a complex system, which diminishes the significance of linear causative relationships. The emergent property of a disease is not simply based on the quantitative accumulation of individual loci but rather on the genome-level information package. As illustrated in Figs. 3.4 and 8.3, genetic information transfer goes through multiple steps that increase the stochasticity of genome variation. In a sense, for many sporadic cases, the genome is the nondissectible unit of information of an individual’s disease condition. Thus, the gene theory-based dissection of an individual’s genes will not adequately explain the missing heritability. Furthermore, within a patient population, diverse combinational patterns make pattern identification extremely difficult using current approaches, as the alteration of genome architecture may lead to the failure to replicate the data of a given genetic association study (Greene et al., 2009). It is likely that the combination of rare genetic alterations with ancient commonly shared polymorphisms may contribute to disease. These ancient polymorphisms are shared by all human populations and account for 90% of human variation (Tishkoff and Verrelli, 2003). Another key challenge when attempting to link individual genetic loci and disease phenotypes is the evolutionary process itself, where time and historical contingency are important. Many common diseases require years of evolution to become clinically significant, and it is difficult to identify and repeat the historically contingent events for different individuals within the patient population. This is particularly so for many sporadic diseases. In addition, an unstable genome often generates highly diverse genome populations among somatic cells, where the heterogeneous cell population further stresses the system and the stress can damage homeostasis, leading to disease. Interestingly, research has established the relationship between diverse types of molecular mechanisms and common system stresses, which explains why it is


so difficult to predict the probability of diseases based on specific molecular mechanisms (Stevens et al., 2011a-b, 2013a-c; Heng, 2015). This analysis also applies to the large number of microregulators of disease genes. The recent ENCODE project (the Encyclopedia of DNA Elements) has identified a huge number of noncoding sequences, and each might contribute to disease conditions in a very subtle way. Despite the excitement these discoveries have generated, it would be extremely challenging to apply this information to clinical situations. If the emergent properties at the genome level are hard to dissect into individual genes, it will be even more challenging to dissect them into individual microregulators of these individual genes. If we cannot make predictions based on a handful of seemingly dominant influential gene mutations, how can we realistically expect to decipher the more subtle effect of these less prominent elements?

5. If evidence to support this model is overwhelming, why was it not recognized before? There are a number of contributing factors that have led to a blind spot regarding this important link. First, often what is seen is what is recognizable, familiar, or expected. Second, the knowledge generated from infectious diseases has influenced our approach to diseases in general, as it is believed that each type of disease should have the same cause and effect, similar to infections caused by the same type of infectious agent. Third, in genetics, there is a tradition of making exceptions into general rules. Highly penetrant, widespread gene mutations (disease alleles) with a high correlation to disease phenotypes are the exception, and their uniqueness (exceptional value) is the main reason we analyze them. However, as soon as these molecular mechanisms are established, the fact that these mechanisms will likely work only for these exceptions is often overlooked. To further compound this mistake, these exceptions are used to validate general findings. If we search for the genetic basis of diseases in diverse patient populations, a link can often be found with genetic defects in some individuals. But most individuals in the general population will display different alterations representing different genetic loci. Such situations have prevented the establishment of the link between specific diseases and rare genetic alterations, as it is difficult to validate them using the concept of “common diseases being caused by common genetic loci.” Fourth, most of us believe that different levels of information can be inferred by data accumulation and synthesis. Despite the fact that there are multiple levels and nonlinear relationships involved, there is a tendency to attempt to understand genes first and then translate this information to higher levels of organization. Unfortunately, this

approach no longer works, as the emergent properties cannot be understood by classifying only the parts at a lower level. The knowledge gap between these levels is the real challenge (Heng, 2013c). Fifth, bio-heterogeneity among patients has traditionally been considered to be “noise,” which can be eliminated by analyzing large numbers of diverse samples. Sixth, many common and complex diseases are a result of somatic cell evolution where time is a key element. The issue of time is related to many other physical/pathological and environmental factors, and outcomes are rather difficult to predict on an individual basis even though prediction at the population level is powerful. Last (but not least), with a reductionist mindset and the influence of molecular biology, many are not comfortable at all if molecular targets are not identified. If it is not in the genes, then it must be the noncoding parts, including noncoding RNAs or epigenetics. Something of a molecular nature must be at the root of this problem, or so the thinking goes.

8.4 FUTURE DIRECTION

The goal of achieving precision genetics dates back to Mendel in 1866, when he introduced the method of calculating the pattern of genetic factors across generations. Sixty years later, when Morgan introduced his gene theory, he declared “... the theory of the gene, enable[s] us to handle problems of genetics on a strictly numerical basis, and allow[s] us to predict, with a great deal of precision, what will occur in any given situation” (Morgan, 1926). Based on the contributions of Ronald Fisher, Sewall Wright, and J. B. S. Haldane, population genetics has played a key role in the neo-Darwinian synthesis. With all of these statistical tools, bioscience has drastically increased its predictive power. So far, so good. Both genetics and evolutionary biology have achieved their solid frameworks, and the major work ahead is simply to fill in some details and to apply this exciting knowledge in practice, including in medicine, or so we were told. In particular, people predicted that, given the availability of large datasets (such as those of the Human Genome Project) and increased computational power, precision genetics would finally be within our reach. This was where the problem appeared. It is also the rationale behind why we search for new theories. It is thus not at all surprising that after over 150 years of triumph and failure, many geneticists are determined to achieve the dream of precision medicine, the ultimate implication of the precise genetics that Mendel had foreseen.

The fact is, however, that the gene represents only “parts inheritance,” genomic information is fuzzy, and the evolutionary emergent process is highly dynamic. This explains why it is so challenging to explain some key genomic/evolutionary phenomena and to precisely predict clinical outcomes based on genetic profiling. It comes full circle.

8.4.1 Facing Reality: The Increased Bio-Uncertainty

As described in previous chapters, bio-uncertainty has increased drastically since the birth of genomics, and especially following the start of the Human Genome Project. Initially, the immaturity of technical platforms was blamed. Gradually, it was realized that this newly revealed bio-uncertainty is real. This cumulative realization is based on the following observations:

(a) Various large-scale -omics have produced overwhelmingly heterogeneous data, both from experimental systems and from clinical samples;
(b) Cancer evolutionary studies have questioned the certainty of the cancer genes, as well as the pattern of somatic evolution;
(c) GWAS studies have delivered disappointing results despite large sample sizes, and the missing heritability issue has become obvious;
(d) There is clearly a complex correlation between different layers of molecular profiles (DNA sequencing, transcriptome, proteins);
(e) Single-cell technology has revealed the importance of “noise,” and the study of NCCAs led to the concept of fuzzy inheritance;
(f) Many biological processes seem very wasteful (a large amount of RNA transcripts are not used for protein; portions of proteins are degraded soon after their synthesis; many insulin molecules are destroyed soon after their synthesis);
(g) DNA-binding specificity can be influenced by genomic topology (different cellular locations with different substrates), gene function specificity can be affected by karyotype, and pathway specificity can be influenced by the level of stress and cellular environments;
(h) Mosaicism is common in human beings (as we all have had a “genomic touch” of disease, in different degrees of genomic impact), and genome chaos can be observed from early developmental stages;
(i) The century-long knowledge of genetics cannot be applied to common and complex diseases, given our understanding of how macroevolution really works.

Bio-uncertainty always exists. However, because of the use of many well-established linear model systems and selective data collection, heterogeneity-reflected uncertainty has been ignored, ever since Mendel. This strategy has proven ineffective when a given conclusion from basic research needs to be translated into the clinic or when the data need to be validated by different models or different researchers. Here are a few well-discussed experiments and their conclusions:

Over the past decade, before pursuing a particular line of research, scientists (including C.G.B.) in the haematology and oncology department at the biotechnology firm Amgen in Thousand Oaks, California, tried to confirm published findings related to that work. Fifty-three papers were deemed ‘landmark’ studies

(Note: these articles were published by reputable labs in top scientific journals. Thus, the selection criterion is much higher than that of average papers.)

Nevertheless, scientific findings were confirmed in only 6 (11%) cases. Even knowing the limitations of preclinical research, this was a shocking result.

Begley and Ellis, 2012

This is an extremely shocking conclusion, but it is in line with other validation experiments. Scientists from Bayer, another pharmaceutical company, have examined 67 target validation projects in molecular medicine, including cancer care and cardiovascular medicine. The reproducibility is only 21% (Mullard, 2011)! These important analyses on data irreproducibility from academic research are timely. They forcefully revealed a huge problem in molecular medicine: the vast majority of the research publications are not reliable. This represents a deep insult for any serious scientist who works hard to search for the truth. But what are the scientific explanations behind this stunning observation? While no researchers come out to dispute these numbers, many have offered their explanations of what has gone wrong with current science. There are many obvious factors that bear the blame: dishonest individuals, insufficient duplication of experiments, the publication of only positive data (and the hiding of negative data), cell line contaminations, experimental condition variability, etc., and the list goes on. One personal estimation is that cases of dishonesty should be lower than approximately 10%. The majority of cases are directly caused by bio-uncertainty and the current conceptual and technical limitations of molecular medicine. Because of bio-complexity, any given process can display different molecular features, depending on its context. Most reports have captured one of the many potential statuses

during the dynamic process, although many researchers are guilty of cherry-picking. As illustrated in Chapter 3, the observation that changed our view about molecular reproducibility occurred during the “watching evolution in action” experiments, using an immortalization model. As each run of cellular evolution can be achieved by different genomes (with unique karyotypes) coupled with different molecular pathways, each run of an experiment can be linked to some specific genes, which is good enough for publishing papers. However, if the same experiment is repeated (a task which entails more than a year of time and effort), different genes will be identified. Not only can many molecular results not be duplicated during different runs of evolution; focusing only on specific molecular parts also means fundamentally missing the stochastic nature of cancer evolution. Indeed, as is evidenced by the literature, different molecular mechanisms have often been reported by different groups using the same model system. This has been the case even for the same research group, when reporting different molecular links on different occasions. The only conclusion is that these reported certainties are based on isolated models. When different contexts are compared, the uncertainty of a given molecular event is high and universal. Interestingly, the main reason that molecular reproducibility has become a big issue now is also linked to the stage of molecular medicine itself. This topic has been briefly discussed:

In molecular biology, most researchers have focused their studies on isolated parts, such as: enzyme activities in vitro, a specific gene’s structure, the regulation of a defined pathway, a given cellular structure/response, and molecular characterization of causality in a linear experimental system. To study these individual structures and mechanisms, focusing on the average is justified and even effective. At this level of understanding, reductionist approaches might work best, as noise becomes less visible and is less important for the understanding of parts. ... our bio-knowledge can be classified into three types: (1) parts characterization: description of the parts and study how parts can potentially work. To achieve the goal, many parts are analyzed under the condition of isolation. (2) parts assembly: study the conditions to put many parts together to characterize a specific pathway or biofunction, the assembly of ribosome, the stages of development, and interactions among parts. Current systems biology holds much promise to advance this type of understanding; and (3) how the system as a whole works under evolutionary selection in a laboratory setting or nature, and especially under various stress conditions where the physiological and pathological responses might differ drastically

Heng, 2015 (with permission from World Scientific).

TABLE 8.3 A List of Some Contributing Factors That Often Give Molecular Researchers the Illusion of Bio-certainty

Concept:
Cellular evolution is stepwise and accumulative. Targeting key genes in the process should cure diseases.
Common and complex diseases share the same genetic mechanism of single-gene diseases, with the exception that there are more genes involved. The key is quantitative analysis based on large sample size.

Methods:
Creating linear models by reducing or eliminating heterogeneity
Focusing too much on nonrealistic hypotheses
Cherry-picking (i.e., only reporting data that make sense under a given hypothesis)
Using averaging-based profiling methods to wash off “noise” and increase statistical support
Assuming that the same principles apply to both normal physiological and pathological conditions

As increased attention becomes focused on the translational implication of molecular discovery, the third type of research will become dominant, as will the issue of reproducibility. That is the reason why the genome-mediated evolutionary concept has become essential for future molecular medicine (Table 8.3). To illustrate that outliers (and uncertainty), phenomena that are generally ignored by current molecular methods, are very important for the evolution of diseases, the question should be raised: how many times does one need to repeat an experiment to capture the dynamics of outlier-mediated drug resistance? Following the comparison of the short- and long-term effects of drug treatment, the maximal dosage-induced drug resistance experiment needed to be repeated 64–100 times (Heng, 2015; Horne et al., unpublished observation).
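The question of how many repeats are needed can be given a rough arithmetic feel with a minimal sketch. It assumes, purely for illustration, that each experimental run is independent and that an outlier-mediated resistance event appears in any single run with a fixed probability `p_outlier`. Neither assumption nor any of the probabilities below comes from the experiments cited above, and `runs_needed` is a hypothetical helper name.

```python
import math


def runs_needed(p_outlier: float, confidence: float = 0.95) -> int:
    """Smallest number of independent runs n such that the chance of seeing
    the outlier event at least once, 1 - (1 - p_outlier)**n, reaches
    `confidence`. Both inputs are illustrative assumptions."""
    if not 0.0 < p_outlier < 1.0:
        raise ValueError("p_outlier must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_outlier))


if __name__ == "__main__":
    # Hypothetical per-run probabilities, chosen only to show the scaling.
    for p in (0.10, 0.05, 0.03):
        print(f"p = {p:.2f}: about {runs_needed(p)} runs for 95% confidence")
```

Under these assumptions, events that occur in only a few percent of runs already call for dozens of repeats, which is the same order of magnitude as the repeat counts reported above. Real runs of cellular evolution need not be independent or share a fixed probability, so the sketch indicates scale only.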

8.4.2 Big Data, Artificial Intelligence, and Biomarkers for Adaptive Biosystems

Nowadays, “big data” and “artificial intelligence” are exciting topics in molecular medicine. The vast majority of all biomedical data have been generated in the past 2–3 years, and only an estimated 1%–2% has been analyzed. With such a massive amount of data pouring in, is molecular medicine prepared for more? Biomedical research has always been driven by data generation and analyses. Before the massive -omics data era, researchers spent most of their time generating data. Although it was tedious work, researchers thought they knew what the generated data meant, at least in the ballpark. But not anymore, not for traditional molecular researchers, many of whom have worked on one or a few key genes or cellular targets for their entire career.

In this section, genomic considerations regarding the big data approach in molecular medicine will be discussed. In particular, it will consider questions regarding the relationship between data and theory and what types of genomic data should be collected in the big data era.

8.4.2.1 The Future of Big Data in Biological Systems

The term “big data” does not simply refer to rapidly accumulated data itself but to the evolving technological platform of computer science, which enables us to extract new insights from massive datasets. For example, one simple definition stated: “Big data represents the information assets characterized by such a high volume, velocity and variety to require specific technology and analytical methods for its transformation into value” (De Mauro et al., 2016). Although many definitions exist, most have emphasized the following key elements: new characteristics (volume, velocity, and variety), specific computational technology, and the end product: valuable information. As biologists, we should focus on data generation and on how to use the information revealed by big data to establish and validate new biological theories, which can contribute to the practical benefit of humanity, including molecular medicine. The increased power of artificial intelligence was demonstrated when AlphaGo (a computer program) defeated a professional human player of the board game Go in 2015 (Silver et al., 2016). This event raised hopes that artificial intelligence will also serve as a game changer in molecular medicine. Whether a machine can win a given game depends on the rules, the algorithm (a set of unambiguous instructions that a mechanical computer can execute) (Domingos, 2015), and extensive training (playing with humans and, more importantly, with machines). To duplicate such success in molecular medicine, the correct rules, training, and a biological version of algorithms are crucial. These requirements bring forth a difficult dilemma: computers need a biological theory in the first place to capture and analyze the right types of data among so many, yet only the data can confirm the theory. To solve this issue and avoid circular arguments, we should collect data based on various theories and then allow the data to validate the theories. For example, the evolutionary concept is now integrated into artificial intelligence. However, if the evolutionary theory needs to be changed in the first place, this will of course have an impact on how artificial intelligence works. Nevertheless, biologists must play an important role in the big data business when dealing with medical issues or when algorithms are based on biological principles, in particular, to provide the correct biological concept to build and train machines. Computer technologies, no matter how powerful they become, are only helpers to biologists and physicians in treating patients and searching for the truth in biology.

8.4.2.2 Big Data Versus Theories: The End of Theories or the Beginning of Better Theories

Current genomics has collected parts data rather than system data, but the characterization of parts has so far failed to deliver an understanding of a genomic system. Not distinguishing “parts inheritance” and “system inheritance” has led to much confusion (Chapters 1–4). To resolve this confusion, different strategies have been proposed:

1. Analyze more samples to eliminate the “noise” and to identify the pattern of gene mutations, which will ultimately validate current theories of human diseases. This strategy has been used in the current cancer genome project and has not been successful, as the “noise” or heterogeneity is a key feature of the cancer system. The more samples analyzed, the more diverse mutations are discovered for the majority of cancer types (Chapter 4). The same approach was also unsuccessful for GWAS.

2. Use the big data approach to establish a correlation between genetic data and disease phenotypes. As promised by many computational experts, the big data approach will finally establish reliable biomarkers for most diseases. Although there are some great success stories about market prediction based on big and messy data, predicting a biological system within an evolutionary context will surely be more challenging. For example, in the physical world/business world, predictions can be made based on the general trend of data. For predicting evolutionary events, however, the outliers rather than the average data make the call, and macroevolution is often based on “accidents” or a “perfect storm” of emergence. Moreover, what if new system-based data, which we do not even know how to collect, are actually the key? In this sense, we do not even have the correct data for molecular medicine in the first place. Obviously, a better genomic and evolutionary theory is essential in guiding data generation and analyses.
A correct cancer theory will also convince cancer researchers to accept the idea that correlation between genomic profiles and phenotypes is good enough in cancer diagnosis (while focusing on individual molecular mechanisms is not useful when there are so many, especially for a complex adaptive system). The evolutionary mechanism of cancer predicts that knowing “what happened” is more important than knowing “why it happened,” as how one pathway works under a defined condition has little to do with predicting cancer in reality. Last, the strength of big data is identifying associations without understanding the meaning. Correlation is good enough for discovering fact, and new theory is then needed to synthesize these facts.

3. Our recommended approach is to use a correct theory to guide the data collection. Specifically, collecting the genomic data based on multiple levels of genetic organizations (including the ignored karyotype level) and applying the results of big data in molecular medicine as well as validating and improving the genome theory. As bio-scientists, our task is to establish a correct theory of genomics and evolution and use these theories to guide data generation and collection and use the results of big data to develop, falsify, or improve new bio-theories. This approach can also reconcile the opposite viewpoints regarding the value of the genomic theory in the big data era. In 2008, Chris Anderson, then the editor in chief of Wired magazine, published a provocative article “The end of theory: The data deluge makes the scientific methods obsolete” (Anderson, 2008). This piece has generated a heated debate regarding the importance of a theory in science and the rationale to study correlation or causation. Of course, many scientists disagree with Anderson’s conclusion even though they consider the importance of this debate. Most of Anderson’s observations/evidence are both true and highly significant, including the great limitation of the Mendelian theory in genomics and the speciation concept in evolution. However, these limitations only point out that the current paradigm of genetics and evolution might no longer work and science desperately needs new theories rather than end theories altogether. In fact, we have listed much more evidence than Anderson did, but with very different conclusions. In our view, quite the opposite, a way out is to search for a new theory with the help of big data. Once again, it demonstrates the importance of a scientific paradigm on individual observers, as the same facts can lead to drastically different conclusions. 
For Thomas Kuhn, perhaps, the observations Anderson mentioned would have represented a clear sign for a paradigm shift; for us, it is the time to search for new genome-based genomics and evolutionary theories. Either way, it should be the beginning of new theories with the help of new technologies.

8.4.2.3 How to Collect the Necessary Data to Create a New Generation of Biomarkers?

The triumph of precision medicine is largely contingent on the success of identifying large numbers of reliable biomarkers. Although hundreds or thousands of biomarkers are described each year, only a few will reach the clinic. Moreover, some biomarkers suffer from a lack of reproducibility. To change this situation, the National Biomarkers Development Alliance organized two think tank meetings in 2017 and 2018, aimed

to “connect the dots” among topics of “big data, artificial intelligence, and biomarkers.” There are shared concerns regarding biomarkers, including the quality of data and databases, data standardization, integration and sharing, industry’s participation, government’s role, and patients’ rights, to name a few. As for how to improve the strategy for developing biomarkers, the majority seem to favor “the more the better” approach. The suggested number of -omics platforms that should be used kept growing during discussions, as if biologists’ job is to collect all information possible, and big data analyses will consequently deliver the good biomarkers. There is limited interest in prioritizing current -omics strategies. Overall, it is a bit surprising to see that only a few presentations discussed the conceptual limitations of the current genomic approach, despite some speakers emphasizing the importance of information theory and the fundamental laws of biology (creating algorithms provides a mechanistic approach to the discovery of biomarkers). We have insisted that biology does need a new framework to collect data, and that current genomics has only collected parts data rather than the systems data that we need. As genome organization (system) is more important than genes (parts) in cellular evolution and these two levels of genomic organization follow different “laws,” different biomarkers are most definitely needed to monitor these different processes. Unfortunately, for systems with multiple levels, information from lower levels is easier to obtain (such as DNA sequence) but has little to do with system control. To further illustrate our viewpoints, the following questions were considered:

1. How can we develop biomarkers to predict emergent properties during the cellular evolutionary process (in which many genetic targets are constantly changing and there is no linear correlation between genetic markers and phenotypes)?

2. Which genetic/genomic or cellular entity (gene, chromosome, epigene, expression profile, overall genome instability, metabolic profile, or a combination) should be our priority when developing biomarkers? Which types of data should we collect and what purpose will they serve? Now that efforts are being made to study genotype-tissue expression (GTEx Consortium 2015), what else needs to be done?

3. Knowing that heterogeneity is the key driver for many diseases, how should we deal with “noise” in developing biomarkers? What if the genetic code is not precise but fuzzy in the first place? How should we predict the emergence of outliers (traditional cancer research has ignored the outliers) (Abdallah et al., 2013; Heng, 2015)?

4. Why is correlation good enough in cancer research (while focusing on individual molecular mechanisms is not useful when there are so many, especially for a complex adaptive system)? The evolutionary mechanism of cancer predicts that knowing “what” is more important than knowing “why” (as how one pathway works under a defined condition has little to do with predicting cancer in reality).

To address these issues, it was proposed that the field of molecular medicine should search for better frameworks and technical platforms by rethinking current genomics and evolution theories. The following suggestions represent some examples of how to achieve this goal.

1. When developing new biomarkers, focus on monitoring the system behavior, not specific pathways (e.g., use NCCAs to measure genome instability for different diseases, based on heterogeneity, complexity, and the outliers’ profile).

2. Record longitudinal datasets to monitor the dynamic process (e.g., when monitoring disease progression as well as drug treatment response).

3. Pay attention to phases of evolution (in the macrocellular phase, genome data are more important, whereas in the microcellular phase, gene mutation data might be more useful). Different phases need different biomarkers, and the phase transition is of clinical importance.

4. Collect quantitative data (with all positive and negative results, benefits and trade-offs, long and short term).

5. Do not treat all datasets equally: it is important to identify conflicting datasets, since combining data is known to improve predictions in some cases and degrade them in others, and phenotype data should be weighted more heavily. It is necessary for the research community to compare different types of biomarkers in identical conditions to identify the best stage-specific biomarkers.

6. Outlier data are more important than average data during crisis (drug resistance experiments need to be repeated over 60–100 times based on our in vitro experiments). Methods are needed to predict the emergent properties based on outliers.

7. While increasing the quality of data is essential (to avoid the situation of garbage in and garbage out), researchers also need to respect the fuzziness of real data when dealing with messy big data, as bio-uncertainty and nonspecificity are key features of disease conditions. The commonly used statistical platforms need to be modified to reflect such needs.
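The claim that outliers rather than averages decide the outcome during crisis can be illustrated with a minimal sketch. It assumes a deliberately simple model that is not taken from the experiments cited here: each subpopulation grows or shrinks exponentially at its own fixed rate under drug treatment, and all cell numbers and rates are hypothetical.

```python
import math


def total_cells(subpopulations, t):
    """Actual population size at time t: each (initial_count, rate) pair
    follows its own exponential trajectory, and the results are summed."""
    return sum(n0 * math.exp(rate * t) for n0, rate in subpopulations)


def average_based_prediction(subpopulations, t):
    """Prediction obtained by first averaging the rates across all cells
    and then letting the whole population follow that single mean rate."""
    n_total = sum(n0 for n0, _ in subpopulations)
    mean_rate = sum(n0 * rate for n0, rate in subpopulations) / n_total
    return n_total * math.exp(mean_rate * t)


# Hypothetical population: 999 drug-sensitive cells (rate -0.5 per day)
# and a single resistant outlier (rate +0.4 per day).
population = [(999, -0.5), (1, 0.4)]

if __name__ == "__main__":
    for day in (0, 10, 20, 30):
        print(day,
              f"average-based: {average_based_prediction(population, day):.3g}",
              f"actual: {total_cells(population, day):.3g}")
```

In this toy setting the averaged profile predicts that the population is wiped out, while the actual population is eventually rebuilt from the single outlier. Profiling methods that average away rare cells would therefore miss the one cell that determines the long-term result.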

8.4.2.4 Big Data and Phenotypes

If the big data approach is effective in identifying associations, why not take advantage of this amazing technology for developing biomarkers? If the genomic fuzziness is so high, why not focus on the phenotype rather than gene profiles, especially with the new finding that normal tissues also host a large number of gene mutations, including cancer driver mutations (Martincorena et al., 2018)? There are two new trends representing such a shift toward phenotypes rather than molecular parts in the search for biomarkers. One is focusing more on cellular phenotypes rather than genotypes to monitor disease progression and treatment responses. For example, many cellular features (cell weight, nuclear morphology, cellular differentiation status, and growth and survival rates) can result from various genetic and environmental conditions. They can all be treated as end products of a complex interactive relationship without knowing which molecular pathway is currently involved (like a “black box”). Then, combined with the big data approach, useful biomarkers can be identified based on cellular phenotypes. Encouragingly, the US Food and Drug Administration considers “black box” algorithms of complex diseases an acceptable strategy for identifying biomarkers (if the inputs and outputs are robust). Interestingly, when frequencies of various types of NCCAs are used as a biomarker, it is not based on the specific genomic change, but on the stability of the cellular population, another phenotype. In fact, these new system-related phenotypes, including the frequencies of the outliers, the complexity of the karyotypes, the evolutionary phases of transition, and new karyotype emergence, all of which have also been referred to as system behavior, should be used for establishing new biomarkers.
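How an NCCA frequency treats population stability, rather than any specific genomic change, as the readout can be sketched as follows. This is a minimal illustration and not a clinical protocol: the input is a hypothetical list with one karyotype label per scored cell (`None` for a normal karyotype), and the 4% clonal cutoff is an assumed parameter used only to separate recurrent (clonal) aberrations from nonclonal ones.

```python
from collections import Counter


def ncca_frequency(karyotypes, clonal_cutoff=0.04):
    """Fraction of scored cells carrying an aberration that is nonclonal,
    i.e. seen in fewer than `clonal_cutoff` of all cells.

    `karyotypes` holds one label per cell, with None meaning no aberration.
    The cutoff is an illustrative assumption, not a value from this text."""
    n_cells = len(karyotypes)
    if n_cells == 0:
        raise ValueError("at least one scored cell is required")
    counts = Counter(k for k in karyotypes if k is not None)
    nonclonal = {ab for ab, c in counts.items() if c / n_cells < clonal_cutoff}
    return sum(1 for k in karyotypes if k in nonclonal) / n_cells


# Hypothetical sample of 50 cells: 40 normal, 6 sharing one recurrent
# translocation, and 4 cells each with a different one-off aberration.
sample = [None] * 40 + ["t(1;2)"] * 6 + ["del(3p)", "+7", "t(4;9)", "-X"]

if __name__ == "__main__":
    print(f"NCCA frequency: {ncca_frequency(sample):.2f}")
```

The function never inspects which aberration occurred; it only counts how many cells deviate in a non-recurrent way. That is the sense in which such a frequency measures a property of the cell population as a whole.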
Similarly, the dynamic level of transcriptome and epigenetic activities rather than specific gene/epigene function can serve as a better biomarker to monitor the disease process (Stevens et al., 2011b, 2013a-b, 2014; Heng, 2015). Recently, using aneuploidy as an example, such points have been explained through the lens of system inheritance, fuzzy inheritance, and emergence of new genome systems (Ye et al., 2018b). As nonclonal aneuploidy represents a phenotype which can be used to unify diverse molecular mechanisms, its clinical predictability is much better than individual gene mutations. Other trends include using the large datasets of normal individuals and patients’ health records to study the relationship between genetic profiles and diseases and connecting different diseases to one another, in addition to looking at the interactions among different treatments, the baseline and ranges of normal individuals and patients, and their diseases along with health and survival consequences. Well-known examples

include the UK Biobank and the recent US Million Veteran Program (MVP). The UK Biobank involves 500,000 participants with links to a wide range of electronic health records (cancer, death, hospital episodes, general practice). In particular, 100,000 participants have worn a 24-hour activity monitor for a week, and 20,000 have undertaken repeat measures; 100,000 selected participants have undergone imaging scans (brain, heart, abdomen, bones, and carotid artery); all participants have had blood biochemistry and genotyping done, and many have had exome sequencing (UK Biobank). The goal of MVP is to partner with veterans receiving their care in the VA Healthcare System to study how genes affect health. Other types of information will also be available. Although very promising (Cox, 2017), these types of programs also have their limitations. First, genomic information is mainly based on the gene’s contribution. It would be more valuable if chromosomal features were also included. For example, it is now known that elevated stochastic chromosomal aberrations can be linked to GWI (Liu et al., 2018). The continued disregard of the relevance of chromosomal aberrations is puzzling. Second, although it is very important to retrieve population data based on large sample sizes, there is still a gap when applying such information to individual patients, especially when the establishment of significant linkages between disease and factors requires hundreds and thousands of individuals. In a sense, it is still difficult to use the population pattern and general trend to precisely diagnose individual patients. Efforts regarding this aspect clearly need improvement. The idea of focusing on phenotypes has generated exciting results (Bastarache et al., 2018). Similarly, different medical records have been applied to study human diseases. For example, using insurance claims for over one-third of the entire US population to create a subset of 128,989 families (481,657 unique individuals), Wang et al.
have used these data to “(i) estimate the heritability and familial environmental patterns of 149 diseases and (ii) infer the genetic and environmental correlations for disease pairs from a set of 29 complex diseases.” They found that “migraine, typically classified as a disease of the central nervous system, appeared to be most genetically similar to irritable bowel syndrome and most environmentally similar to cystitis and urethritis, all of which are inflammatory diseases” (Wang et al., 2017). Many similar types of research are actively ongoing from nearly all major medical centers and the topic of big data mining on medical records will soon become a big deal in molecular medicine.
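
The gap described above between population-level patterns and individual diagnosis can be made concrete with a small calculation. The sketch below is illustrative only: the function name `individual_risk` and the input numbers (1% prevalence, 20% carrier frequency, relative risk of 1.5) are hypothetical values chosen for the example, not figures from the UK Biobank, MVP, or Wang et al. It assumes a single binary risk factor and shows how a factor associated with disease at the population level can still leave a carrier's absolute risk close to that of a noncarrier.

```python
def individual_risk(prevalence, carrier_freq, relative_risk):
    """Split a population prevalence into absolute risks for carriers and noncarriers.

    Assumes one binary risk factor, so that
      prevalence = carrier_freq * carrier_risk + (1 - carrier_freq) * noncarrier_risk
    and
      carrier_risk = relative_risk * noncarrier_risk.
    """
    noncarrier_risk = prevalence / (carrier_freq * relative_risk + (1.0 - carrier_freq))
    carrier_risk = relative_risk * noncarrier_risk
    return carrier_risk, noncarrier_risk


# Hypothetical inputs, chosen only for illustration.
carrier_risk, noncarrier_risk = individual_risk(
    prevalence=0.01, carrier_freq=0.20, relative_risk=1.5
)
print(f"risk if carrier:    {carrier_risk:.4f}")
print(f"risk if noncarrier: {noncarrier_risk:.4f}")
```

Under these assumed inputs, the carrier's absolute risk stays below 2%, so knowing the factor says little about whether a given patient will develop the disease, even though such an association could be detected in a sufficiently large cohort.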

8.4.3 Education and the Future of Biomedical Science

Before the end of this book, the issue of how to educate and influence future biomedical scientists deserves a brief discussion. The importance of this subject is obvious: the new generation of scientists will ultimately


8. THE RATIONALE AND CHALLENGES OF MOLECULAR MEDICINE

decide what the future of molecular medicine will look like and which theories will dominate. However, today’s education systems are not likely up to the task. Current PhD training focuses more on keeping up with incremental scientific progress and on equipping students with popular technical skills. This has been particularly true since genomics became a big science, in which most trainees have become a skilled workforce. Such issues have triggered serious rethinking about treating scientists as a workforce and the damage this does to the future of science (Lazebnik, 2015). In recent years, the bioresearch landscape has changed drastically before our eyes. Traditionally, research environments were more heterogeneous. Scientists were trained and influenced by different schools of thought (reflected by academic trees going back generations), excelled in different technical skills or experimental systems, and were interested in diverse research topics. They often became experts in their own field by accumulating decades of experience. The greatest scientists who made seminal contributions all went through decades of dedication, continuously focusing on the research topics that most interested them and seized their curiosity rather than simply shifting their research direction for the sake of popularity or funding. This aspect of research has changed since the arrival of molecular genetics and the biotechnology industry, and such changes have been greatly accelerated by big science projects such as the Human Genome Project. Most laboratories now use similar popular methods, and this causes scientists’ individuality to gradually disappear. 
As large-scale technologies have become a major driving force in the research community and research funding pays for access to these technologies, success in current research environments requires high levels of funding and a large workforce, making smart research ideas and unique skills less valuable and less desired. With individual creative thinking rapidly losing its influence on science, as many big projects are goal-oriented and individuals only play their own part, scientific individuality is being drastically reduced for many, while a small group of “science managers” continues to gain more credit. The goal of many researchers is to obtain funding and publications rather than to search for the truth, which should be the main goal of conducting any kind of research in the first place. They seem to have little time for truly meaningful research, which often requires long-term thinking and the courage not to simply “go with the flow.” Becoming an exceptional scientist is difficult. It requires decades of knowledge accumulation and long hours of work and dedication (including holidays), all in a competitive environment and with moderate financial rewards. The driving force for most scientists is curiosity about nature, a passion to search for the truth, intellectual


satisfaction, and the potential contribution to humanity. The negative environments of current research communities, which have a huge negative impact on researchers, are too significant to ignore. Such situations greatly affect scientific morale and well-being. This phenomenon has been discussed often among scientists but less in public. One rare publication, which appeared nearly a decade ago in a cancer research journal, asked a profound question: where is the passion for cancer research?

“Scientific research was once considered a pinnacle profession where intellectual rigor was paired with a passion for novel discovery. Today, despite better equipment, more funding and online access to a growing reservoir of data, researchers in some of the largest cancer research centers in the country appear to be spending less time in the lab and, perhaps, less time worrying about how their work impacts people with cancer.” (Kern, 2010)

This explains (partially) why the future for graduate students in the United States is troubling, according to some:

“About 60% of graduate students said that they felt overwhelmed, exhausted, hopeless, sad, or depressed nearly all the time. One in 10 said they had contemplated suicide in the previous year.” (Arnold, 2014)

A key approach to changing this situation is to bring back the individuality and fun of science. Worthy scientists should be independent, capable thinkers who enjoy challenging the status quo and embracing improved frameworks while working toward a lifelong goal of improvement. To keep genomics an interesting field, there needs to be a balance between key fundamental aspects: between “boring” data collection and exciting data analyses as well as theoretical synthesis; between tedious daily work and continuous deep thinking on some key topic of interest, without constantly chasing fashionable research topics; between one’s own technical expertise and newly developed cutting-edge methods; and between the technical progress of one’s own projects and the conceptual progress of the entire field. The following educational strategies should be very useful in achieving these balances.

8.4.3.1 Knowledge Structure

To build a solid scientific foundation for future genomic scientists, an appreciation of the following information/concepts is essential, and many of these subjects should become mandatory courses.

1. Complexity science: Unlike traditional molecular biology, where “cause and effect” or linear approaches are the key ways of thinking, complexity science focuses more on nonlinear dynamics, which are unpredictable and multidimensional within the context of emergence. In fact, Walter


Elsasser called for a different kind of biology in the 1980s, in which molecular causal chains are no longer the main focus of study (Elsasser, 1981, 1984). With the increasing discoveries of nonspecificity and fuzzy inheritance in genomics, appreciation of complexity science will finally come into play. In medical schools, courses on “health care as a complex adaptive system” are necessary (Sturmberg, 2013). The adaptive systems way of thinking in health will have a big impact on both health care delivery and how to sustain current health care infrastructures (Sturmberg et al., 2017, 2019).

2. Evolutionary medicine: Evolutionary medicine is a newly emerged field that applies the principles of evolutionary biology to understand health and disease and uses this knowledge to aid disease prevention, diagnosis, and treatment. Since its introduction in the early 1990s, evolutionary medicine has impacted various human health issues, such as infectious diseases, immune function, and aging (Nesse et al., 2010). Particularly in cancer research, evolutionary analyses have become quite popular (Heng, 2007a, 2015, 2017a). There has been a recent call to classify evolutionary biology as a basic science in medicine (Grunspan et al., 2018). Because two-phased cancer evolution exists (see Chapter 3), caution is needed when the term “Darwinian Medicine” is used.

3. The philosophy of science: An appreciation of both the history and philosophy of science is important for understanding the key limitations of current genomics, especially when the analysis is done through the eyes of Thomas Kuhn and Karl Popper. The textbook description of the preconditions for a paradigm shift will certainly encourage scientists to push for such a shift in genomics and evolution, including through decisive experiments that scrutinize gene theory and the mechanism of natural selection. 
In addition, many keystone experiments/assumptions and their predictions must undergo serious scrutiny with both past and current data.

4. The limitations of mathematics and physics in biology: With the increasing involvement of mathematics and physics in biomedical science, especially with the integration of various computational technologies and bioinformatic platforms, an illusion will likely develop that mathematics/physics/computational analyses will solve the mystery of biology once and for all. It is thus extremely important to emphasize the difference between nonlife and life systems. As discussed in a previous chapter, the “laws” will likely differ as well. Knowing the limitations of using principles derived from nonlife systems to study biology is of great


importance, as the degree of uncertainty is quite high in dynamic biosystems, and bio-evolution is historically context-dependent. Unlike those of mathematics and physics, biological principles/theories often have many exceptions. When key predictions of a theory fail for the majority of cases, it is time to get a better theory. Furthermore, most biological issues have close practical implications (for medicine, agriculture, environment, ecology, and more). Respecting reality is one key to designing experiments if scientists would like to apply their laboratory findings to the real world.

5. Critical thinking: A course on critical thinking is urgently needed in molecular medicine and genomics. Comparing and contrasting the reductionist and holistic approaches in medical science, along with the gene and genome theories, should lead the discussions in this course, among many other topics. The limitations of using molecular mechanisms to understand diseases should also be discussed, including those of current statistical platforms, as both “parts characterization” and “average profiling” are limiting when studying cellular evolution, one important basis for an individual’s health.

8.4.3.2 Scientific Culture and Professionalism

Scientific professionalism can be judged by attitude, character, behavior, and standards of research and communication. One important attitude toward science is to consider any scientific theory as dynamic and limited. When a theory does not fit scientific reality, no matter how favored it is by the scientific community, it is time to search for a better one. Furthermore, today’s theory, by definition and historical experience, will likely be wrong or significantly limited tomorrow. This is the biggest rationale for searching for a new framework, especially when a paradigm shift seems to be around the corner. Individuals can make a difference by changing history, as today’s experiments are tomorrow’s history. 
As the field of biology has not experienced a true paradigm shift, the effort to search for a new conceptual framework for future genomics and evolution is highly significant. Another important part of scientific culture is debate. Historical publications suggest that debate was much more common in the past than it is now. The general public was also much more interested in debating evolution. Both the established side (the constraining force of current knowledge) and the challenging side (the dynamic force for change) are essential for scientific progress. Serious debate is the key to identifying paradoxes and


fundamentally defending and advancing established theories and searching for new theories when required. Debate can also recruit scientists from different disciplines and, more importantly, generate exciting opportunities for a new generation of scientists. Leading scientists should set examples for debate, especially when there is as much confusion as we have previously discussed. One suggestion is for the field of molecular medicine to systematically debate its key paradoxes, in collaboration with leading scientific journals. The entire research community, including students, should be encouraged to get involved in such activities. In the field of cancer research, for example, there is increasing debate that challenges the gene theory of cancer (Heng, 2015). To create and maintain a healthy research landscape, it is important for individual scientists to adopt high standards of scientific ethics and integrity. It is scientists’ responsibility to question mainstream concepts/platforms when observations contradict their key predictions. Persistently and forcefully challenging exaggerated promises in molecular medicine is not only essential for the self-regulation of science but can also help the public establish more realistic expectations of science and medicine. For example, acknowledging the limitations of current cancer treatment methods will help patients make more beneficial decisions about their treatment options and encourage the general population to put more effort into making healthier lifestyle choices. This may reduce disease prevalence in the first place. This is important because medicine cannot simply fix most problems, despite its ability to sequence patients’ DNA. It is equally important to support new academic societies and organizations that aim to promote new concepts/approaches. 
For example, there are new societies on evolution and complexity in health, including the International Society for Evolution, Medicine, and Public Health; the International Society for Evolution, Ecology and Cancer; and the International Society for Systems and Complexity Sciences for Health. Their members are extremely motivated to push these concepts into mainstream medical research, and their contributions will soon become obvious.

8.4.3.3 Policy Matters

Since the end of World War II, the US government has gradually become the major sponsor of biomedical research. For decades, the majority of funding for health studies in universities has come from NIH. In recent years, the sustainability of US biomedical research has become a serious issue, partially because of its own rapid expansion and the reduced overall support from the federal budget. This issue has prompted many leaders (including a former president of the National


Academy of Sciences and a former director of NIH) to call for quick action to rescue US biomedical research.

“… the remarkable outpouring of innovative research from American laboratories – high-throughput DNA sequencing, sophisticated imaging, structural biology, designer chemistry, and computational biology – has led to impressive advances in medicine and fueled a vibrant pharmaceutical and biotechnology sector. In the context of such progress, it is remarkable that even the most successful scientists and most promising trainees are increasingly pessimistic about the future of their chosen career. Based on extensive observations and discussions, we believe that these concerns are justified and that the biomedical research enterprise in the United States is on an unsustainable path.” (Alberts et al., 2014)

Discussions about uncovering the contributing factors and searching for a solution are on the rise. Many of these discussions relate to funding, how to reduce competitiveness in the environment, and regulation of workforce size. Fewer discussions shed light on the balance between the increasing big science approach and the decreasing individuality of scientists. Even with optimal financial support from the government (from 1965 to 2015), scientific progress was not optimal when compared with the scientific achievements of the previous 50 years (1915–1965), when much less money was spent on a much smaller scientific community. Overall, “Science of the past 50 years seems to be more defined by big projects than by big ideas”; “the advances are mostly incremental, and largely focused on newer and faster ways to gather and store information, communicate, or be entertained”; and “We are awash in small discoveries, most of which are essentially detections of ‘statistically significant’ patterns in big data. Usually, there is no unifying model or theory that generates predictions, testable or not. That would take too much time and thought” (Geman and Geman, 2016).

Despite the impressive development of molecular manipulation technologies and various large-scale -omics projects, there has been limited theoretical progress in molecular biology when compared with the previous 50 years, during which many important concepts and theories were established, including the discovery of DNA, the formation of the modern synthesis, the introduction of the gene theory with Mendel’s principles, the establishment of the double-helix model, the elucidation of the genetic code, the gene–protein–phenotype relationship, and the central dogma of biology. In comparison, the period of 1965–2015 represents the “normal science” phase of molecular biology, according to Kuhn’s definition, characterized by the accumulation of molecular details under the established gene theory without challenging it. 
Currently, the systematic characterization of genes’ structure, function, and alterations


dominates. Based on the accumulated anomalies described throughout this book, the period of 2015–2065 should be an exciting time to embrace new theories/technologies and perhaps the long-awaited paradigm shift in biology. Given the high proportion of financial support from the federal government, federal science policy directly impacts the behavior of individual scientists and even the culture of the scientific community. Because molecular medicine is intimately related to public health issues, many ethical issues are of the utmost importance. Scientists need to pay more attention to laws and ethical guidelines and to help establish these regulations at the federal level; this is essential because many biotechnologies are extremely powerful and could cause huge damage to humanity if misused. The following policies will play an important role in maintaining a healthy ecosystem for biomedical research:

1. Establish regular national think tanks to critically evaluate major scientific fields. Think tanks should include experts from outside the examined field, experts representing different schools of thought, and scholars of the philosophy of science. The main task of these think tanks is to critically examine key theories, conceptual frameworks, and key technical platforms; evaluate the phase of science (does the field belong to the normal science phase, where data collection is key, or to the dynamic changing phase, where introducing new concepts is key?); and identify the key predictions and paradoxes in the field. These analyses would help determine priorities for future research. For example, based on the results of the cancer genome project, there are a few hundred driver gene mutations. How many of them should be systematically studied? Should it be the top 50 or 100, based on their frequency in the patient population? How many grants should be given to study the same gene mutation? How about the different technical platforms? 
Should we prioritize them and reflect this in funding?

2. Promote different schools of thought on major theoretical issues. The currently dominant theory needs to be constantly challenged by alternative ones, especially when key paradoxes exist. This kind of challenge is crucial for improving the mainstream theory as well. Providing opportunities for different competing concepts to grow is the best policy for maintaining healthy scientific ecosystems. Thus, investment in alternative concepts is very valuable and much more important than simply repeating experiments to confirm mainstream concepts. Scientific evolution will allow only the final


winners to emerge. The current grant review process has completely ignored the importance of maintaining heterogeneous research ecosystems. A policy change is required to change the behavior of reviewers. Vannevar Bush, the first science adviser to the US president and the visionary behind the establishment of the National Science Foundation, insightfully advised:

“At their best they [universities] provide the scientific worker with a strong sense of solidarity and security, as well as a substantial degree of personal intellectual freedom. All of these factors are of great importance in the development of new knowledge, since much of new knowledge is certain to arouse opposition because of its tendency to challenge current beliefs or practice.” (Bush, 1945)

Clearly, as the majority of bioresearch funds in universities come from NIH, it is best for NIH to provide the conditions for searching for and discovering new knowledge.

3. The time is ripe to have more faculty positions for bio-theorists. With the rapid accumulation of big data, theoretical synthesis is becoming extremely important. Unlike bioinformaticians and/or computational biologists, theorists are more interested in establishing, validating, or falsifying different bio-theories. Funding mechanisms should also be created for them. In the era of molecular biology, scientists have paid more attention to data generation than to theoretical analysis. In the gene cloning era, the key was “who clones what gene first,” and there was less theoretical involvement in cloning individual genes. This is not surprising given how “normal science” works. However, it is puzzling that, unlike mathematicians and physicists, bioscientists do not get well-deserved credit for insightful analysis and reanalysis of data published by others. It is as if one must do one’s own experiments to earn the right to analyze the data and get credit. There are some exceptions, of course. In the field of evolutionary biology, scientists can often analyze others’ data to support their theories. The situation in molecular genomics is now changing drastically, led by waves of analyses using bioinformatic approaches. More bio-theorists will soon find favorable research ecosystems, when there is far more data than most scientists can handle. On an important note, some of the most important bio-discoveries/theories were based not on their authors’ own experiments but on information synthesis: from Darwin’s theory of natural selection to the modern synthesis to the model of the DNA double helix. Surely more will follow.


4. Allow individual scientists to access big science infrastructures. Big science infrastructures, including national bio-information sharing systems and national core facilities, are essential for both quality control and saving money. For example, a large portion of health researchers now use animal models. As discussed earlier, by eliminating genetic and environmental heterogeneity, these valuable mouse models become less useful in representing clinical conditions, as without heterogeneity and evolutionary selection, these models no longer relate to clinical reality (Heng, 2015). To address this issue, large animal centers are a good option. To achieve both genetic and environmental heterogeneity, animal model experiments performed by individual investigators become unrealistic. To solve this issue, major animal model centers are needed for systematic evaluation. On the surface, it sounds very expensive to have this type of comprehensive animal model center. However, when compared to the money spent on small scale experiments and their lack of medical applicability, such a collective effort would be justified. In addition, an international database is needed to deposit all animal model data including all negative data. Strong statistical and simulation teams are also required by these centers to dynamically monitor and apply the data generated from animal models. Heng, 2015 (with permission from World Scientific).

Artificial intelligence–based data analysis centers should also be established, and open access should be given to individual scientists, rather than using extra funding to establish small and less efficient labs. Together, these measures will provide huge financial benefits.

5. Push the frontier and enforce regulation. Molecular medicine now has many exciting frontiers: stem cell–mediated tissue/organ regeneration, CRISPR-Cas9–based gene editing and therapy, whole genome sequencing for individual patients, big data–based diagnosis, specific molecular targeting, in vitro fertilization, etc. While each frontier represents an exciting potential to enormously benefit patients, it also represents a high risk when things go wrong or when these powerful tools are used inappropriately or by the wrong people. Therefore, policies regarding how to push these frontiers, and particularly how to carefully regulate and control these technologies, are of great importance. Although science’s self-regulation works in most cases, clear federal laws and ethical guidelines are needed to govern all of these technologies, especially their combined use. Otherwise, news stories such as “The scientific world erupted with outrage and concern after a scientist claimed he used gene-editing to alter the DNA of a pair of twins” (Fox, 2018) will become more frequent. In addition, both federal laws


and institutional regulations should be required to punish fraud in science to ensure its integrity. The current status of molecular medicine reflects a big gap between our scientific knowledge and our technical capabilities. Despite all of our capabilities, we still do not know why we get cancer or how to cure it (if relying only on molecular knowledge). The rationale behind all our efforts is for the genome-based somatic evolutionary theory to fill this gap. Even with the ability to land on Mars and with all the exciting predictions of artificial intelligence, we have no idea how our own intelligence works, and more importantly, we still understand very little about ourselves, where we come from, and what mechanisms make us human and lead us to disease. The new genomics and evolutionary theories discussed in this book will certainly provide the necessary platform to directly debate and then address these questions. Our ultimate job is to study and answer these questions to the best of our ability, using available information and technology, and to deliver this knowledge to the public. If we fail to do this, we have failed our duty to give back to science and humanity.

Epilogue (or Why We Did What We Did)

This book was 36 years in the making. My journey to search for the genome theory started in 1982 at Sichuan University when I first observed some bizarre mitotic figures from treated frog lymphocytes. As a first-year graduate student, I never anticipated that these puzzling observations would lead to such a long journey of discovery, which has been exciting, yet difficult, and ultimately rewarding. Nearly four decades later, the research landscape has fundamentally changed. From the cloning and characterization of individual genes to the DNA sequencing of entire genomes, data generation on near-industrial scales has not only brought increasing surprises and confusion but also called into question the gene-centric concepts that initially promoted various -omics technologies. During this time, many scientists jumped on such bandwagons, switching out gene cloning for promoter analysis, jumping from studying mutation spectrums to characterizing layers of modifiers, shifting from genomics to proteomics and then metabolomics, etc. We never chased what was fashionable at the time; we never sought shortcuts. Though staying true to our beliefs despite popular swings was difficult, we thought and rethought and rethought. Now, with the genome-based theory of evolution, we have come full circle. Finally, genome-based 4-D genomics is on the horizon. With this epilogue, I hope to recount a few personal experiences, including rationales and frequently encountered rebuttals for our theories, as well as the way we practice science. In short, this is why we did what we did.

Isaac Newton famously said, “If I have seen further, it is by standing on the shoulders of giants.” At first glance, Newton seems to take a stance of deference. I credit my predecessors, he seems to say, with my new vantage point. 
(The beauty of classical Newtonian physics and mathematics, no matter how novel, was not without stepping stones laid down by Pythagoras, Kepler, and Galileo.) However, the words imply something about the nature of knowledge in science: it is not enough to simply wallow in the shadows of giants. It is not enough to play variations of the same theory and hope a framework will spin itself. New thinking comes only after emerging from behind these figures, out of their shadows, to stand on their shoulders and see past them. By climbing out of the shadows, by being honest, and by daring to question the giants, more of that great ocean of truth is revealed.



Simply put, to get out of the shadows, one must use a better paradigm, as different frameworks lead to different scientific explanations and realizations. Thomas Kuhn understood this well. He argued that scientists in different historical periods operated in psychologically different worlds. Even when looking at the same objects, they saw different things. I would suggest such psychological differences exist even among scientists within the same period, as long as they believe in different paradigms. There is a profound difference between those in the shadows of giants and those on their shoulders. In other words, the paradigm shapes the eye of the beholder. I first learned this from my high school math teacher. “Say a sleepy boy was observed with a book in his hand,” he told us. “The reasons for that could be drastically different. If your theory is that he is a good student, you could surmise that he is an incredibly hard-working boy, still trying to read, even in sleep. In contrast, if your theory is that he is a bad student, you could propose that the student is so lazy that he falls asleep as soon as he touches a book.” Knowing that paradigm searching is the best way to advance science is one thing. Deciding to do so ourselves is another thing altogether. Where does the inspiration come from? The answer lies in the history of science. Science is advanced by the nonstop replacement of frameworks. No matter how perfect today’s science is, key parts will likely be considered wrong in the future. Therefore, we not only dare to, but also have to, challenge the giants we respect the most. It is a scientist’s duty to do so, no matter how unpopular, difficult, or costly. It is an essential condition for significantly advancing science. This is not without reason, of course. Red flags began to emerge when predictions set forth by the current theory failed to fit the facts. 
Confusing research results and critical paradoxes popped up, and many scientists were anxious to sweep them under the rug. But for us, these paradoxes only signaled opportunities. We echoed the words of Niels Bohr: “How wonderful that we have met with a paradox. Now we have some hope of making progress.” By confronting paradoxes rather than avoiding them, we can make great strides, especially when key paradoxes related to the first principle can be identified. In addition, an increase in key paradoxes or anomalies allows scientists to distinguish which phase of science they are in: the normal or the crisis phase of a given paradigm. Increased paradoxes indicate the crisis phase, where efforts should focus on defining the key limitations and searching for/embracing a better paradigm. Luckily for us, there are currently many ignored major paradoxes. What is really missing is a new framework that can unify the field. Many examples have been listed in this book, and additional examples can be found in my previous book, Debating Cancer: The Paradox in Cancer Research. So, we set out to search for this new framework. Karl Popper noted the importance of scientists using the correct conceptual web with

EPILOGUE (OR WHY WE DID WHAT WE DID)

483

which to “capture” facts. The importance of theoretical contexts has had a profound influence on me, hence this theory and book. Because the genome organizes individual genes and represents the highest level of genomic organization, the new framework needs to be genome-based.

Next, assuming that we have a good theoretical web, what types of facts should we capture? In genomics, there are many layers of genetic information. On which level should we focus? Does the choice of level matter? This question reminds me of the appreciation of beauty, whether that beauty is found in a priceless gemstone or an oil painting. An optimal distance is key to fully appreciating a beautiful vase. At too great a distance, one cannot distinguish the vase from other vessels. From too close up, the magnified paint layer might reveal unattractive defects. But at the right distance, beauty can be recognized. Such common sense also applies to genomic research. To appreciate and understand biology, how much should we zoom in? Should we focus on genes, the genome, or a level below the gene or above the genome? The key is to focus on the correct system level, capture data on this system, and develop meaningful knowledge. Clearly, our answer is the chromosome and genome level.

This thinking led to another interesting question: are the various levels of a system governed by the same laws? In other words, does the understanding of each level require unique concepts and syntheses? Is the knowledge generated at the gene level essential to understanding organization at the genome level? Current understanding of the many levels of a system has been influenced by the stepwise accumulation of knowledge; it has been compared to peeling an onion, where each layer is similar to the layer above or below it.
It was therefore reasoned that collecting and accumulating enough information at a lower level (such as the gene level) would allow us to gain an understanding of a higher level (such as the genome). For the past 30 years, I have focused on genome alterations and their biological implications. By acknowledging the fundamental differences between different levels of genetic organization, and by developing an understanding that genes code for the parts and tools while the genome codes for the network architecture, I realized that there is a knowledge gap between the genes and the genome. Merely accumulating information from the lower gene level will not provide an understanding of the emergent higher genome level. The nice metaphor of peeling back an onion, though beautiful, is extremely limited when trying to understand biology, as it only represents an unusual exception: there are billions of species nowhere near the onion! That is the reason we have to search for the new principles of the genome theory, including the importance of genomic topology, system emergence, and new genomic coding systems.

Once a level is selected, how can data be collected and evaluated at that level of a system? There are two opposite issues here, both of which are problematic for current biological research.

On the one hand, how can exceptions be evaluated? My college cell biology professor warned us to distinguish between exceptions and general rules in biology. “Biology is full of exceptions,” he explained. “Imagine intelligent alien beings observing us in order to understand human behavior. If, by chance, they observed circus acrobatic performers, they would conclude that Earthlings walk on their hands. That observation, while true, represents an artistic exception.” Unfortunately, many have forgotten that much of our genetic knowledge is based on the analysis of exceptions: low-hanging fruit that is often not representative of the general rule.

On the other hand, how do we deal with genetic “noise” in an effort to identify patterns using averaging molecular methodologies? Ultimately, it is important to recognize that much of this is not really noise but rather true, unavoidable biological heterogeneity. Hence, the physicist’s practice of reducing “noise” by performing more measurements on more samples may not be as applicable to biological systems. Clearly, the average profiling of biological systems has little to do with drastic bio-evolution, which includes both cellular evolution (such as cancer formation) and organismal speciation.

To address these two issues, we have pushed the importance of non-clonal chromosome aberrations (NCCAs) and genome chaos. Fundamentally, the introduction of fuzzy inheritance provided a new understanding of genomic heterogeneity.

Moreover, it is important to consider time in genomics and evolution. Why? Time obviously is not a genetic element and cannot be inherited, so how can it be of ultimate importance to understanding genomics?
I would like to suggest that (1) inheritance is a key feature of life, where time and life are inseparable; (2) the genetic system information that emerges from individual genetic parts is time-sensitive and depends on developmental stages, types of tissue, and genome/environment interactions (including different types of internal and external stresses such as aging and infection); (3) issues such as the probability of speciation and cancer formation are dependent on time; and (4) the time windows set to conduct research and collect data are crucial when trying to understand evolutionary dynamics and overall system behavior, such as when observing epigenetic influence across a limited number of generations. Unfortunately, this key factor has long been ignored in traditional genetic research because of its complexity. We now must put time back into the genomics equation, where it belongs. Additionally, time has long been used to fill the gap between micro- and macroevolution. Such an assumed relationship no longer holds, as illustrated by our experiments tracing evolution in action. When time is involved, biological predictions in disease conditions
become much, much harder (compared with predictions in normal developmental processes).

All in all, by collectively pursuing the above questions, it is tempting to question the fundamental limitations of the reductionist modeling and experimental approaches. Is there an unfillable gap between current genomic research and clinical realities? According to modern scientific philosophical thinking, the only certainty is uncertainty, and any major advance in science redefines its limitations. It is important to define the limitations of artificial knowledge to appreciate the difficulty in translating our genomic knowledge into clinical applications, as it might not be possible to properly interpret the complexity of the real world if we change the character of nature through drastic reduction. Causative relationships can often be “proven” under specific experimental conditions where the created linear relationships dominate, but when complexity is in, causation is out. This idea is not just a trivial argument. It involves the war on cancer, evolutionary concepts, and how to conduct genomic experiments. This realization will have an impact on the design of future genomic experiments by emphasizing holistic approaches.

Despite thorough theoretical reasoning and planning, no new framework will be established without decades’ worth of long, hard work spent thinking, identifying problems, applying for funding, experimenting, testing predictions, synthesizing, publishing, and communicating. Important as it is, there is (unfortunately) no tradition within the scientific community of supporting different ways of thinking. Scientists who want to bring in new ways of thinking must be able to pay the price of being different. It is not fun at all when your grant proposals are constantly rejected and many of your publications take 4 to 6 years of review before they can be published. However, when you can see the truth most cannot, the ultimate reward is high.
After all, that is what it means to truly be called a scientist.

After we debuted the genome theory, some suggested that we reconsider and simply incorporate our ideas into the prevailing gene-centric theory. Why not modify the findings to fit? Why build an entirely new framework? Feynman said it best: “It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.” Our search, an experiment-based one, revealed that the system is based on the genome and chromosomal level. Knowing that the current theory is (incorrectly) established on the level of the gene, why should we bend the facts to fit an outdated theory? Instead, we must begin by building the fundamental theory based on genomic facts and genome-based information concepts. Then, we can incorporate matters concerning genes. Such practices have been successfully illustrated by establishing the new models of cancer evolution and speciation. Although both the gene and genome are involved, they are much better explained by the genome theory.

Others suggested that we needed to use a more balanced approach when discussing the relationship between genes and chromosomes. In short, they argued that the gene is still important and not to be neglected. We do understand the importance of the gene; however, we must acknowledge that the long-ignored chromosome must be emphasized. Furthermore, despite the gene’s value in basic research, during macroevolution (both cellular and organismal), individual gene changes are often invisible to selection mechanisms and, unlike the constrained genome, come and go. Finally, with the genome-based information framework, the overwhelmingly gene-based data, most of which are at odds with the gene theory, can be integrated into the genome theory.

Despite this outlook, we are clearly aware of our own limitations and the challenges we face. The potential resistance to any new framework, especially one like ours amidst the gene era, is titanic. I still remember being contacted in 2011 to write a book about our “fresh” genomics ideas. I accepted the challenge, as I had frequently received similar suggestions to write books to promote our “new way of thinking.” The review process was rather quick, and the contract was signed. Unexpectedly, however, on receiving my manuscript a year later, the publisher sent it out for one additional round of reviewing. One reviewer was extremely unhappy (even slightly hostile) about our viewpoints, and the project was ended when I refused to change our key concepts. Although it was disappointing, I knew that our concepts were ahead of our time. It is thus both surprising and comforting to note how much has changed in the past 6 years in the field of cancer research and genomics.
Genome chaos was confirmed by sequencing as a common mechanism of cancer (it was thought to be an artifact just a few years ago), two-phased cancer evolution was accepted, a growing number of reports pointed out that cancer evolution cannot be explained by neo-Darwinism, and many predictions of the genome theory have been confirmed. Despite all this progress, however, we still anticipate a long road before the necessary changes are brought about. Nevertheless, this book will be a significant step.

I fully understand the limitations of our own knowledge, especially when writing a book that involves many different subjects. However, the key purpose of this book is to initiate long-delayed debates, rather than to be comprehensive on each subject. We dare to promote a new way of thinking, despite its own limitations. Finally, I welcome feedback, both encouragement and candid, critical reviews. For those who agree with our theory, please validate it and pass it around. For those who strongly disagree with us, let us have meaningful debates. I also encourage readers to develop their own concepts if they
can identify better ones. When more scientists set out on their own journeys to question what they know, the health of science improves. Dare to be honest and think differently. Dare to question and challenge. Dare to make mistakes and correct them, until you emerge from the shadows of giants and stand on their shoulders. There is always more.

Bibliography Abdallah, B. Y., Horne, S. D., Kurkinen, M., et al. (2014). Ovarian cancer evolution through stochastic genome alterations: Defining the genomic role in ovarian cancer. Systems Biology in Reproductive Medicine, 60(1), 2e13. https://doi.org/10.3109/ 19396368.2013.837989. Abdallah, B. Y., Horne, S. D., Stevens, J. B., et al. (2013). Single cell heterogeneity: Why unstable genomes are incompatible with average profiles. Cell Cycle, 12(23), 3640e3649. https://doi.org/10.4161/cc.26580. Abecasis, G. R., Altshuler, D., Auton, A., et al. (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061e1073. https://doi.org/ 10.1038/nature09534. Adami, C. (2016). What is information? Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences, 374(2063). https://doi.org/10.1098/rsta.2015.0230. Ahituv, N., Zhu, Y., Visel, A., et al. (2007). Deletion of ultraconserved elements yields viable mice. PLoS Biology, 5(9), e234. https://doi.org/10.1371/journal.pbio.0050234. Akagi, T., Sasai, K., & Hanafusa, H. (2003). Refractory nature of normal human diploid fibroblasts with respect to oncogene-mediated transformation. Proceedings of the National Academy of Sciences of the United States of America, 100(23), 13567e13572. https://doi.org/ 10.1073/pnas.1834876100. Akst, J. (2010). Why sex evolved. The activity is more likely to pop up in heterogeneous environments. The Scientist. Al Achkar, W., Sabatier, L., & Dutrillaux, B. (1989). How are sticky chromosomes formed? Annales de Genetique, 32(1), 10e15. Albert, R. (2005). Scale-free networks in cell biology. Journal of Cell Science, 118(Pt 21), 4947e4957. https://doi.org/10.1242/jcs.02714. Alberts, B., Kirschner, M. W., Tilghman, S., et al. (2014). Rescuing us biomedical research from its systemic flaws. Proceedings of the National Academy of Sciences of the United States of America, 111(16), 5773e5777. https://doi.org/10.1073/pnas.1404402111. Albertson, D. 
G., Collins, C., McCormick, F., et al. (2003). Chromosome aberrations in solid tumors. Nat Genet, 34(4), 369e376. Alkan, C., Kidd, J. M., Marques-Bonet, T., et al. (2009). Personalized copy number and segmental duplication maps using next-generation sequencing. Nature Genetics, 41(10), 1061e1067. https://doi.org/10.1038/ng.437. Alkuraya, F. S. (2015). Natural human knockouts and the era of genotype to phenotype. Genome Medicine, 7(1), 48. https://doi.org/10.1186/s13073-015-0173-z. Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired. Anderson, N. D., De Borja, R., Young, M. D., et al. (2018). Rearrangement bursts generate canonical gene fusions in bone and soft tissue tumors. Science, 361(6405). https://doi.org/ 10.1126/science.aam8419. Andriani, G. A., Almeida, V. P., Faggioli, F., et al. (2016). Whole chromosome instability induces senescence and promotes sasp. Scientific Reports, 6, 35218. https://doi.org/ 10.1038/srep35218. Ao, P., Galas, D., Hood, L., et al. (2010). Towards predictive stochastic dynamical modeling of cancer genesis and progression. Interdisciplinary Sciences, 2(2), 140e144. https://doi.org/ 10.1007/s12539-010-0072-3.

489

490

BIBLIOGRAPHY

Armitage, P., & Doll, R. (1954). The age distribution of cancer and a multi-stage theory of carcinogenesis. British Journal of Cancer, 8(1), 1e12. Armitage, P., & Doll, R. (1957). A two-stage theory of carcinogenesis in relation to the age distribution of human cancer. British Journal of Cancer, 11(2), 161e169. Arnold, C. (2014). Paying graduate school’s mental toll. Science AAAS (posted Feb 4, 2014). Aspiras, A. C., Rohner, N., Martineau, B., et al. (2015). Melanocortin 4 receptor mutations contribute to the adaptation of cavefish to nutrient-poor conditions. Proceedings of the National Academy of Sciences of the United States of America, 112(31), 9668e9673. https:// doi.org/10.1073/pnas.1510802112. Assouline, S., & Lipton, J. H. (2011). Monitoring response and resistance to treatment in chronic myeloid leukemia. Current Oncology, 18(2), e71ee83. Avery, O. T., Macleod, C. M., & Mccarty, M. (1944). Studies on the chemical nature of the substance inducing transformation of pneumococcal types: Induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type iii. Journal of Experimental Medicine, 79(2), 137e158. Aylon, Y., & Oren, M. (2016). The paradox of p53: What, how, and why? Cold Spring Harbor Perspectives in Medicine, 6(10). https://doi.org/10.1101/cshperspect.a026328. Baca, S. C., Prandi, D., Lawrence, M. S., et al. (2013). Punctuated evolution of prostate cancer genomes. Cell, 153(3), 666e677. https://doi.org/10.1016/j.cell.2013.03.021. Bachtrog, D. (2003). Adaptation shapes patterns of genome evolution on sexual and asexual chromosomes in drosophila. Nature Genetics, 34(2), 215e219. https://doi.org/10.1038/ ng1164. Bachtrog, D. (2006). A dynamic view of sex chromosome evolution. Current Opinion in Genetics and Development, 16(6), 578e585. https://doi.org/10.1016/j.gde.2006.10.007. Baek, S. T., Kerjan, G., Bielas, S. L., et al. (2014). 
Off-target effect of doublecortin family shrna on neuronal migration associated with endogenous microrna dysregulation. Neuron, 82(6), 1255e1262. https://doi.org/10.1016/j.neuron.2014.04.036. Bak, P., Bak, A. L., & Zeuthen, J. (1979). Characterization of human chromosomal unit fibers. Chromosoma, 73(3), 301e315. Bakhoum, S. F., & Landau, D. A. (2017). Cancer evolution: No room for negative selection. Cell, 171(5), 987e989. https://doi.org/10.1016/j.cell.2017.10.039. Bakhoum, S. F., Ngo, B., Laughney, A. M., et al. (2018). Chromosomal instability drives metastasis through a cytosolic DNA response. Nature, 553(7689), 467e472. https:// doi.org/10.1038/nature25432. Bakker, B., Taudt, A., Belderbos, M. E., et al. (2016). Single-cell sequencing reveals karyotype heterogeneity in murine and human malignancies. Genome Biology, 17(1), 115. https:// doi.org/10.1186/s13059-016-0971-7. Balaban, N. Q., Merrin, J., Chait, R., et al. (2004). Bacterial persistence as a phenotypic switch. Science, 305(5690), 1622e1625. https://doi.org/10.1126/science.1099390. Baltimore, D. (2000). DNA is a reality beyond metaphor. Caltech and the Human Genome Project. Bastarache, L., Hughey, J. J., Hebbring, S., et al. (2018). Phenotype risk scores identify patients with unrecognized mendelian disease patterns. Science, 359(6381), 1233e1239. https:// doi.org/10.1126/science.aal4043. Bateson, W. (1909). Heredity and variation in modern lights. In A. C. Seward (Ed.), Darwin and Modern Science (pp. 85e101). Cambridge: Cambridge University Press. Bateson, W., & Saunders, E. R. (1902). Experiments [in the physiology of heredity] (Harrison). ¨ bergang von ko¨rperzellen in Bauer, K. H. (1928). Mutationstheorie der geschwulst-entstehung: U geschwulstzellen durch gen-a¨nderung. Berlin: Springer. Bayani, J., Selvarajah, S., Maire, G., et al. (2007). Genomic mechanisms and measurement of structural and numerical instability in cancer cells. Seminars in Cancer Biology, 17(1), 5e18. 
https://doi.org/10.1016/j.semcancer.2006.10.006.

BIBLIOGRAPHY

491

Bayes, J. J., & Malik, H. S. (2009). Altered heterochromatin binding by a hybrid sterility protein in drosophila sibling species. Science, 326(5959), 1538e1541. https://doi.org/ 10.1126/science.1181756. Beadle, G. W. (1933). A gene for sticky chromosomes inzea mays. Zeitschrift fu¨r Induktive Abstammungs- und Vererbungslehre, 63(1), 195e217. https://doi.org/10.1007/bf01849089. Beall, C. M. (2000). Tibetan and andean contrasts in adaptation to high-altitude hypoxia. Advances in Experimental Medicine and Biology, 475, 63e74. Becks, L., & Agrawal, A. F. (2012). The evolution of sex is favoured during adaptation to new environments. PLoS Biology, 10(5), e1001317. https://doi.org/10.1371/journal. pbio.1001317. Begley, S. (2012). In cancer science, many “discoveries” don’t hold up. Reuters (March 28, 2012). Retrieved from https://www.reuters.com/article/us-science-cancer/in-cancer-sciencemany-discoveries-dont-hold-up-idUSBRE82R12P20120328. Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531e533. https://doi.org/10.1038/483531a. Bell, G. (1982). The masterpiece of nature: The evolution and genetics of sexuality. University of California Press. Bell, G. (1988). Sex and death in protozoa: The history of an obsession. Cambridge: Cambridge University Press. Belton, J. M., Mccord, R. P., Gibcus, J. H., et al. (2012). Hi-c: A comprehensive technique to capture the conformation of genomes. Methods, 58(3), 268e276. https://doi.org/ 10.1016/j.ymeth.2012.05.001. Bennetzen, J. L., & Wang, H. (2014). The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annual Review of Plant Biology, 65, 505e530. https://doi.org/10.1146/annurev-arplant-050213-035811. Bensimon, A., Simon, A., Chiffaudel, A., et al. (1994). Alignment and sensitive detection of DNA by a moving interface. Science, 265(5181), 2096e2098. Berezney, R., & Coffey, D. S. (1974). 
Identification of a nuclear protein matrix. Biochemical and Biophysical Research Communications, 60(4), 1410e1417. Bernardi, G. (2007). The neoselectionist theory of genome evolution. Proceedings of the National Academy of Sciences of the United States of America, 104(20), 8385e8390. https:// doi.org/10.1073/pnas.0701652104. Bernstein, H., Hopf, F., & Michod, R. (1989). The evolution of sex: DNA repair hypothesis. In C. Rasa & V. E (Eds.), The sociobiology of sexual and reproductive strategies. London Chapman and Hall. Beroukhim, R., Mermel, C. H., Porter, D., et al. (2010). The landscape of somatic copy-number alteration across human cancers. Nature, 463(7283), 899e905. https://doi.org/10.1038/ nature08822. Bertelsen, B., Nazaryan-Petersen, L., Sun, W., et al. (2016). A germline chromothripsis event stably segregating in 11 individuals through three generations. Genetics in Medicine, 18(5), 494e500. https://doi.org/10.1038/gim.2015.112. Bestor, T. H., Edwards, J. R., & Boulard, M. (2015). Notes on the role of dynamic DNA methylation in mammalian development. Proceedings of the National Academy of Sciences of the United States of America, 112(22), 6796e6799. https://doi.org/10.1073/pnas.1415301111. Bielas, J. H., Loeb, K. R., Rubin, B. P., et al. (2006). Human cancers express a mutator phenotype. Proceedings of the National Academy of Sciences of the United States of America, 103, 18238e18242. Biesterfeld, S., Gerres, K., Fischer-Wein, G., et al. (1994). Polyploidy in non-neoplastic tissues. Journal of Clinical Pathology, 47(1), 38e42. Bigger, J. (1944). Treatment of staphylococcal infections with penicillin. Lancet, 244, 497e500. Birky, C. W. (2004). Bdelloid rotifers revisited. Proceedings of the National Academy of Sciences of the United States of America, 101(9), 2651e2652. https://doi.org/10.1073/pnas.0308453101.

492

BIBLIOGRAPHY

Birney, E., Stamatoyannopoulos, J. A., Dutta, A., et al. (2007). Identification and analysis of functional elements in 1% of the human genome by the encode pilot project. Nature, 447(7146), 799e816. https://doi.org/10.1038/nature05874. Bischoff, F. Z., Yim, S. O., Pathak, S., et al. (1990). Spontaneous abnormalities in normal fibroblasts from patients with Li-Fraumeni cancer syndrome: Aneuploidy and immortalization. Cancer Research, 50, 7979e7984. Bishop, J. M. (1991). Molecular themes in oncogenesis. Cell, 64(2), 235e248. Blanc, G., & Wolfe, K. H. (2004). Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. The Plant Cell Online, 16(7), 1667e1678. https://doi.org/10.1105/tpc.021345. Bloomfield, M., & Duesberg, P. (2016). Inherent variability of cancer-specific aneuploidy generates metastases. Molecular Cytogenetics, 9, 90. https://doi.org/10.1186/s13039-0160297-x. Bloomfield, M., & Duesberg, P. (2018). Is cancer progression caused by gradual or simultaneous acquisitions of new chromosomes? Molecular Cytogenetics, 11, 4. https://doi.org/ 10.1186/s13039-017-0350-4. Bloomfield, M., Mccormack, A., Mandrioli, D., et al. (2014). Karyotypic evolutions of cancer species in rats during the long latent periods after injection of nitrosourea. Molecular Cytogenetics, 7(1), 71. https://doi.org/10.1186/s13039-014-0071-x. Blount, Z. D., Barrick, J. E., Davidson, C. J., et al. (2012). Genomic analysis of a key innovation in an experimental escherichia coli population. Nature, 489(7417), 513e518. https:// doi.org/10.1038/nature11514. Bochukova, E. G., Huang, N., Keogh, J., et al. (2010). Large, rare chromosomal deletions associated with severe early-onset obesity. Nature, 463(7281), 666e670. https://doi.org/ 10.1038/nature08689. Bolzer, A., Kreth, G., Solovei, I., et al. (2005). Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes. PLoS Biology, 3(5), e157. 
https://doi.org/10.1371/journal.pbio.0030157. Bonn, D. (1998). Adverse drug reactions remain a major cause of death. Lancet, 351, 1183 (England). Bonney, M. E., Moriya, H., & Amon, A. (2015). Aneuploid proliferation defects in yeast are not driven by copy number changes of a few dosage-sensitive genes. Genes and Development, 29(9), 898e903. https://doi.org/10.1101/gad.261743.115. Bordenstein, S. R., & Theis, K. R. (2015). Host biology in light of the microbiome: Ten principles of holobionts and hologenomes. PLoS Biology, 13(8), e1002226. https://doi.org/ 10.1371/journal.pbio.1002226. Boroviak, K., Fu, B., Yang, F., et al. (2017). Revealing hidden complexities of genomic rearrangements generated with cas9. Scientific Reports, 7(1), 12867. https://doi.org/ 10.1038/s41598-017-12740-6. Borowsky, R. (2008). Restoring sight in blind cavefish. Current Biology, 18(1), R23eR24. https://doi.org/10.1016/j.cub.2007.11.023. Bouche, N., & Bouchez, D. (2001). Arabidopsis gene knockout: Phenotypes wanted. Current Opinion in Plant Biology, 4(2), 111e117. Boutanaev, A. M., Kalmykova, A. I., Shevelyov, Y. Y., et al. (2002). Large clusters of co-expressed genes in the drosophila genome. Nature, 420(6916), 666e669. https:// doi.org/10.1038/nature01216. Boveri, T. (2008). Concerning the origin of malignant tumours by theodor boveri. Translated and annotated by Henry Harris. Journal of Cell Science, 121(Suppl. 1), 1e84. https:// doi.org/10.1242/jcs.025742 (1909). Boyle, E. A., Li, Y. I., & Pritchard, J. K. (2017). An expanded view of complex traits: From polygenic to omnigenic. Cell, 169(7), 1177e1186. https://doi.org/10.1016/ j.cell.2017.05.038.

BIBLIOGRAPHY

493

Bradshaw, A. D. (1991). The croonian lecture, 1991. Genostasis and the limits to evolution. Philosophical Transactions of the Royal Society of London B Biological Sciences, 333(1267), 289e305. https://doi.org/10.1098/rstb.1991.0079. Brash, D. E. (2015). Cancer. Preprocancer. Science, 348(6237), 867e868. https://doi.org/ 10.1126/science.aac4435. Brickner, D. G., Sood, V., Tutucci, E., et al. (2016). Subnuclear positioning and interchromosomal clustering of the gal1-10 locus are controlled by separable, interdependent mechanisms. Molecular Biology of the Cell, 27(19), 2980e2993. https://doi.org/10.1091/ mbc.E16-03-0174. Brockhurst, M. A. (2011). Evolution. Sex, death, and the red queen. Science, 333(6039), 166e167. https://doi.org/10.1126/science.1209420. Brooks, M. (2010). 13 things that don’t make sense: The most intriguing scientific mysteries of our time. Profile Books Limited. Brown, T. A. (2002). Genomes. Wiley-Liss. Brown, K. M., Burk, L. M., Henagan, L. M., et al. (2004). A test of the chromosomal rearrangement model of speciation in drosophila pseudoobscura. Evolution, 58(8), 1856e1860. Bruder, C. E., Piotrowski, A., Gijsbers, A. A., et al. (2008). Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. The American Journal of Human Genetics, 82(3), 763e771. https://doi.org/10.1016/ j.ajhg.2007.12.011. Bucan, M., Abrahams, B. S., Wang, K., et al. (2009). Genome-wide analyses of exonic copy number variants in a family-based study point to novel autism susceptibility genes. PLoS Genetics, 5(6), e1000536. https://doi.org/10.1371/journal.pgen.1000536. Bush, V. (1945). Science, the endless frontier: A report to the president on a program for postwar scientific research. United States: Office of Scientific Research and Development. https:// archive.org/details/scienceendlessfr00unit. Butler, D. (2008). Translational research: Crossing the valley of death. Nature, 453, 840e842 (England). Calogero, A. 
E., De Palma, A., Grazioso, C., et al. (2001). High sperm aneuploidy rate in unselected infertile patients and its relationship with intracytoplasmic sperm injection outcome. Human Reproduction, 16(7), 1433e1439. Camazine, S. (2003). Self-organization in biological systems. Princeton, NJ: Princeton University press. Campbell, I. M., Gambin, T., Dittwald, P., et al. (2014). Human endogenous retroviral elements promote genome instability via non-allelic homologous recombination. BMC Biology, 12, 74. https://doi.org/10.1186/s12915-014-0074-4. Cancer Genome Atlas Network. (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487(7407), 330e337. https://doi.org/10.1038/ nature11252. Canil, C. M., Moore, M. J., Winquist, E., et al. (2005). Randomized phase II study of two doses of gefitinib in hormone-refractory prostate cancer: A trial of the national cancer institute of Canada-clinical trials group. Journal of Clinical Oncology, 23, 455e460. Cannella, M., Maglione, V., Martino, T., et al. (2009). DNA instability in replicating huntington’s disease lymphoblasts. BMC Medical Genetics, 10, 11. https://doi.org/10.1186/14712350-10-11. Carbone, L., Harris, R. A., Gnerre, S., et al. (2014). Gibbon genome and the fast karyotype evolution of small apes. Nature, 513(7517), 195e201. https://doi.org/10.1038/ nature13679. Carbone, L., Vessere, G. M., Ten Hallers, B. F., et al. (2006). A high-resolution map of synteny disruptions in gibbon and human genomes. PLoS Genetics, 2(12), e223. https://doi.org/ 10.1371/journal.pgen.0020223.

494

BIBLIOGRAPHY

Carroll, S. (2009). What Darwin never knew (Producer). PBS program. December 29, 2009 http://www.pbs.org/wgbh/nova/evolution/darwin-never-knew.html.
Caruso, D. (2007). A challenge to gene theory, a tougher look at biotech. The New York Times (July 1, 2007) Retrieved from https://www.nytimes.com/2007/07/01/business/yourmoney/01frame.html.
Carvalho, C. M., Zhang, F., & Lupski, J. R. (2010). Evolution in health and medicine Sackler colloquium: Genomic disorders: A window into human gene and genome evolution. Proceedings of the National Academy of Sciences of the United States of America, 107(Suppl. 1), 1765–1771. https://doi.org/10.1073/pnas.0906222107.
Castagnetti, S., Oliferenko, S., & Nurse, P. (2010). Fission yeast cells undergo nuclear division in the absence of spindle microtubules. PLoS Biology, 8(10), e1000512. https://doi.org/10.1371/journal.pbio.1000512.
Castro-Giner, F., Ratcliffe, P., & Tomlinson, I. (2015). The mini-driver model of polygenic cancer evolution. Nature Reviews Cancer, 15(11), 680–685. https://doi.org/10.1038/nrc3999.
Cavalier-Smith, T. (2002). Origins of the machinery of recombination and sex. Heredity (Edinb), 88(2), 125–141. https://doi.org/10.1038/sj.hdy.6800034.
Cavalli, G., & Misteli, T. (2013). Functional implications of genome topology. Nature Structural and Molecular Biology, 20(3), 290–299. https://doi.org/10.1038/nsmb.2474.
Celton-Morizur, S., & Desdouets, C. (2010). Polyploidization of liver cells. Advances in Experimental Medicine and Biology, 676, 123–135.
Cerulus, B., New, A. M., Pougach, K., et al. (2016). Noise and epigenetic inheritance of single-cell division times influence population fitness. Current Biology, 26(9), 1138–1147. https://doi.org/10.1016/j.cub.2016.03.010.
Chakravarti, A. (2011). Genomics is not enough. Science, 334(15) (United States).
Chan, J. D., Agbedanu, P. N., Zamanian, M., et al. (2014). ‘Death and axes’: Unexpected Ca(2+) entry phenologs predict new anti-schistosomal agents. PLoS Pathogens, 10(2), e1003942. https://doi.org/10.1371/journal.ppat.1003942.
Chandrakasan, S., Ye, C. J., Chitlur, M., et al. (2011). Malignant fibrous histiocytoma two years after autologous stem cell transplant for Hodgkin lymphoma: Evidence for genomic instability. Pediatric Blood and Cancer, 56(7), 1143–1145. https://doi.org/10.1002/pbc.22929.
Chang, S. L., Lai, H. Y., Tung, S. Y., et al. (2013). Dynamic large-scale chromosomal rearrangements fuel rapid adaptation in yeast populations. PLoS Genetics, 9(1), e1003232. https://doi.org/10.1371/journal.pgen.1003232.
Chargaff, E. (1950). Chemical specificity of nucleic acids and mechanism of their enzymatic degradation. Experientia, 6(6), 201–209.
Charlesworth, B. (1996). The good fairy godmother of evolutionary genetics. Current Biology, 6(3), 220.
Check Hayden, E. (2010). Human genome at ten: Life is complicated. Nature, 464, 664–667 (England).
Chen, S., Krinsky, B. H., & Long, M. (2013). New genes as drivers of phenotypic evolution. Nature Reviews Genetics, 14(9), 645–660. https://doi.org/10.1038/nrg3521.
Chen, W. H., Minguez, P., Lercher, M. J., et al. (2012). OGEE: An online gene essentiality database. Nucleic Acids Research, 40(Database issue), D901–D906. https://doi.org/10.1093/nar/gkr986.
Chen, R., Shi, L., Hakenberg, J., et al. (2016). Analysis of 589,306 genomes identifies individuals resilient to severe Mendelian childhood diseases. Nature Biotechnology, 34(5), 531–538. https://doi.org/10.1038/nbt.3514.
Cheng, K. K., Lee, B. S., Masuda, T., et al. (2014). Global metabolic network reorganization by adaptive mutations allows fast growth of Escherichia coli on glycerol. Nature Communications, 5, 3233. https://doi.org/10.1038/ncomms4233.


Cheow, L. F., Courtois, E. T., Tan, Y., et al. (2016). Single-cell multimodal profiling reveals cellular epigenetic heterogeneity. Nature Methods, 13(10), 833–836. https://doi.org/10.1038/nmeth.3961.
Chin, T. F., Ibrahim, K., Thirunavakarasu, T., et al. (2018). Nonclonal chromosomal aberrations in childhood leukemia survivors. Fetal and Pediatric Pathology, 1–11. https://doi.org/10.1080/15513815.2018.1492054.
Chittka, A., Wurm, Y., & Chittka, L. (2012). Epigenetics: The making of ant castes. Current Biology, 22(19), R835–R838. https://doi.org/10.1016/j.cub.2012.07.045.
Cho, R. J., Campbell, M. J., Winzeler, E. A., et al. (1998). A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell, 2(1), 65–73.
Chou, J. Y., Hung, Y. S., Lin, K. H., et al. (2010). Multiple molecular mechanisms cause reproductive isolation between three yeast species. PLoS Biology, 8(7), e1000432. https://doi.org/10.1371/journal.pbio.1000432.
Chouard, T. (2008). Darwin 200: Beneath the surface. Nature, 456(7220), 300–303. https://doi.org/10.1038/456300a.
Clark, A. G., Eisen, M. B., Smith, D. R., et al. (2007). Evolution of genes and genomes on the Drosophila phylogeny. Nature, 450(7167), 203–218. https://doi.org/10.1038/nature06341.
Cleary, A. S., Leonard, T. L., Gestl, S. A., et al. (2014). Tumour cell heterogeneity maintained by cooperating subclones in Wnt-driven mammary cancers. Nature, 508(7494), 113–117. https://doi.org/10.1038/nature13187.
Cobb, M. (2013). 1953: When genes became “information”. Cell, 153(3), 503–506. https://doi.org/10.1016/j.cell.2013.04.012.
Cobb, M. (2014). Oswald Avery, DNA, and the transformation of biology. Current Biology, 24(2), R55–R60. https://doi.org/10.1016/j.cub.2013.11.060.
Cohen, B. A., Mitra, R. D., Hughes, J. D., et al. (2000). A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nature Genetics, 26(2), 183–186. https://doi.org/10.1038/79896.
Collins, F. S. (1995). Positional cloning moves from perditional to traditional. Nature Genetics, 9(4), 347–350. https://doi.org/10.1038/ng0495-347.
Collins, F. S. (1999). Shattuck lecture–medical and societal consequences of the human genome project. New England Journal of Medicine, 341(1), 28–37. https://doi.org/10.1056/nejm199907013410106.
Collins, F. S., & Barker, A. D. (2007). Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies. Scientific American, 296(3), 50–57.
Collins, F. S., & Varmus, H. (2015). A new initiative on precision medicine. New England Journal of Medicine, 372(9), 793–795. https://doi.org/10.1056/NEJMp1500523.
Compton, D. A. (2011). Mechanisms of aneuploidy. Current Opinion in Cell Biology, 23(1), 109–113. https://doi.org/10.1016/j.ceb.2010.08.007.
Conrad, D. F., Keebler, J. E., DePristo, M. A., et al. (2011). Variation in genome-wide mutation rates within and between human families. Nature Genetics, 43(7), 712–714. https://doi.org/10.1038/ng.862.
Conrad, D. F., Pinto, D., Redon, R., et al. (2010). Origins and functional impact of copy number variation in the human genome. Nature, 464(7289), 704–712. https://doi.org/10.1038/nature08516.
Consortium, GTEx. (2015). Human genomics. The genotype-tissue expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science, 348(6235), 648–660. https://doi.org/10.1126/science.1262110.
Cox, N. J. (2017). Reaching for the next branch on the biobank tree of knowledge. Nature Genetics, 49(9), 1295–1296. https://doi.org/10.1038/ng.3946.
Coyne, J. A. (2009). Why evolution is true. Penguin Publishing Group.


Coyne, J. A., & Orr, H. A. (1998). The evolutionary genetics of speciation. Philosophical Transactions of the Royal Society of London B Biological Sciences, 353(1366), 287–305. https://doi.org/10.1098/rstb.1998.0210.
Craddock, T. J., Harvey, J. M., Nathanson, L., et al. (2015). Using gene expression signatures to identify novel treatment strategies in Gulf War illness. BMC Medical Genomics, 8, 36. https://doi.org/10.1186/s12920-015-0111-3.
Crasta, K., Ganem, N. J., Dagher, R., et al. (2012). DNA breaks and chromosome pulverization from errors in mitosis. Nature, 482(7383), 53–58. https://doi.org/10.1038/nature10802.
Creekmore, A. L., Silkworth, W. T., Cimini, D., et al. (2011). Changes in gene expression and cellular architecture in an ovarian cancer progression model. PLoS One, 6(3), e17676. https://doi.org/10.1371/journal.pone.0017676.
Cremer, T., Cremer, M., Dietzel, S., et al. (2006). Chromosome territories–a functional nuclear landscape. Current Opinion in Cell Biology, 18(3), 307–316. https://doi.org/10.1016/j.ceb.2006.04.007.
Crespi, B., Foster, K., & Ubeda, F. (2014). First principles of Hamiltonian medicine. Philosophical Transactions of the Royal Society of London B Biological Sciences, 369(1642), 20130366. https://doi.org/10.1098/rstb.2013.0366.
Croft, J. A., Bridger, J. M., Boyle, S., et al. (1999). Differences in the localization and morphology of chromosomes in the human nucleus. The Journal of Cell Biology, 145(6), 1119–1131.
Cross, W., Graham, T. A., & Wright, N. A. (2016). New paradigms in clonal evolution: Punctuated equilibrium in cancer. The Journal of Pathology, 240(2), 126–136. https://doi.org/10.1002/path.4757.
Crow, J. F., & Kimura, M. (1965). Evolution in sexual and asexual populations. Amer Natur, 99, 439–450.
Culver, D. C., & Culver, P. D. C. (1982). Cave life: Evolution and ecology. Harvard University Press.
Cuylen, S., Blaukopf, C., Politi, A. Z., et al. (2016). Ki-67 acts as a biological surfactant to disperse mitotic chromosomes. Nature, 535(7611), 308–312. https://doi.org/10.1038/nature18610.
Dallapiccola, B., Ferranti, G., Altissimi, D., et al. (1989). First-trimester prenatal diagnosis of homozygous (14;21) translocation in a fetus with 44 chromosomes. Prenatal Diagnosis, 9(8), 555–558.
Danchin, É., Charmantier, A., Champagne, F. A., et al. (2011). Beyond DNA: Integrating inclusive inheritance into an extended theory of evolution. Nature Reviews Genetics, 12(7), 475–486. https://doi.org/10.1038/nrg3028.
Dandekar, T., Snel, B., Huynen, M., et al. (1998). Conservation of gene order: A fingerprint of proteins that physically interact. Trends in Biochemical Sciences, 23(9), 324–328.
Darwin, E. (1800). Phytologia, or the philosophy of agriculture and gardening; with the theory of draining morasses and with an improved construction of the drill plough: J. Johnson.
Darwin, C. (1862). On the two forms, or dimorphic condition, in the species of Primula, and on their remarkable sexual relations. Journal of Proceedings of the Linnean Society of London (Botany), 6, 77–96.
Darwin, C., & Keynes, R. D. (2000). Charles Darwin’s zoology notes & specimen lists from H.M.S. Beagle edited by Richard Darwin Keynes. Cambridge: Cambridge University Press.
Davidoff, F. (2009). Heterogeneity is not always noise: Lessons from improvement. Journal of the American Medical Association, 302(23), 2580–2586. https://doi.org/10.1001/jama.2009.1845.
Davies, P. C., & Lineweaver, C. H. (2011). Cancer tumors as metazoa 1.0: Tapping genes of ancient ancestors. Physical Biology, 8(1), 015001. https://doi.org/10.1088/1478-3975/8/1/015001.


Davila Lopez, M., Martinez Guerra, J. J., & Samuelsson, T. (2010). Analysis of gene order conservation in eukaryotes identifies transcriptionally and functionally linked genes. PLoS One, 5(5), e10654. https://doi.org/10.1371/journal.pone.0010654.
Davis, A., Gao, R., & Navin, N. (2017). Tumor evolution: Linear, branching, neutral or punctuated? Biochimica et Biophysica Acta - Reviews on Cancer, 1867(2), 151–161. https://doi.org/10.1016/j.bbcan.2017.01.003.
Davoli, T., Uno, H., Wooten, E. C., et al. (2017). Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy. Science, 355(6322). https://doi.org/10.1126/science.aaf8399.
Dawkins, R. (1976). The selfish gene. Oxford: Oxford University Press.
Dawkins, R. (1996). The blind watchmaker: Why the evidence of evolution reveals a universe without design. Norton.
De Braekeleer, E., Loup Huret, J., Mossafa, H., et al. (2016). Cancer cytogenomics resources: Atlas of genetics and cytogenetics in oncology and haematology. Web. http://atlasgeneticsoncology.org/.
De Mauro, A., Greco, M., & Grimaldi, M. (2016). A formal definition of big data based on its essential features. Library Review, 65, 122–135. https://doi.org/10.1108/LR-06-2015-0061.
De Pagter, M. S., Van Roosmalen, M. J., Baas, A. F., et al. (2015). Chromothripsis in healthy individuals affects multiple protein-coding genes and can result in severe congenital abnormalities in offspring. The American Journal of Human Genetics, 96(4), 651–656. https://doi.org/10.1016/j.ajhg.2015.02.005.
De The, H., & Chen, Z. (2010). Acute promyelocytic leukaemia: Novel insights into the mechanisms of cure. Nature Reviews Cancer, 10(11), 775–783. https://doi.org/10.1038/nrc2943.
De Vries, H. (1889). Intracellular pangenesis. The Open Court Publishing Co. Translated from the German by C. Stuart Gager in 1910.
De Vries. (1905). Species and Varieties: Their Origin by Mutation. Chicago Open Court Pub.
Decottignies, A., Sanchez-Perez, I., & Nurse, P. (2003). Schizosaccharomyces pombe essential genes: A pilot study. Genome Research, 13(3), 399–406. https://doi.org/10.1101/gr.636103.
DeLisi, C. (2008). Meetings that changed the world: Santa Fe 1986: Human genome baby-steps. Nature, 455(7215), 876–877. https://doi.org/10.1038/455876a.
Dietrich, M. R. (2003). Richard Goldschmidt: Hopeful monsters and other ‘heresies’. Nature Reviews Genetics, 4(1), 68–74. https://doi.org/10.1038/nrg979.
Dobzhansky, T. (1936). Studies on hybrid sterility. II. Localization of sterility factors in Drosophila pseudoobscura hybrids. Genetics, 21(2), 113–135.
Dobzhansky, T. (1941). Genetics and the origin of species (2nd ed.). New York: Columbia University Press.
Dodsworth, S., Chase, M. W., & Leitch, A. R. (2016). Is post-polyploidization diploidization the key to the evolutionary success of angiosperms? Botanical Journal of the Linnean Society, 180(1), 1–5. https://doi.org/10.1111/boj.12357.
Domingos, P. (2015). The master algorithm: How the quest for the ultimate learning machine will remake our world. Basic Books.
Dover, G. (1982). Molecular drive: A cohesive mode of species evolution. Nature, 299(5879), 111–117.
Dror, A. A., & Avraham, K. B. (2009). Hearing loss: Mechanisms revealed by genetics and cell biology. Annual Review of Genetics, 43, 411–437. https://doi.org/10.1146/annurev-genet-102108-134135.
Druker, B. J., Tamura, S., Buchdunger, E., et al. (1996). Effects of a selective inhibitor of the Abl tyrosine kinase on the growth of Bcr-Abl positive cells. Nature Medicine, 2(5), 561–566.
Duelli, D. M., Padilla-Nash, H. M., Berman, D., et al. (2007). A virus causes cancer by inducing massive chromosomal instability through cell fusion. Current Biology, 17(5), 431–437. https://doi.org/10.1016/j.cub.2007.01.049.


Duesberg, P., Li, R., Sachs, R., et al. (2007). Cancer drug resistance: The central role of the karyotype. Drug Resistance Updates, 10(1–2), 51–58. https://doi.org/10.1016/j.drup.2007.02.003.
Duesberg, P., & McCormack, A. (2013). Immortality of cancers: A consequence of inherent karyotypic variations and selections for autonomy. Cell Cycle, 12(5), 783–802. https://doi.org/10.4161/cc.23720.
Duesberg, P., & Rasnick, D. (2000). Aneuploidy, the somatic mutation that makes cancer a species of its own. Cell Motility and the Cytoskeleton, 47(2), 81–107. https://doi.org/10.1002/1097-0169(200010)47:23.0.co;2-#.
Duesberg, P. H., & Vogt, P. K. (1970). Differences between the ribonucleic acids of transforming and nontransforming avian tumor viruses. Proceedings of the National Academy of Sciences of the United States of America, 67(4), 1673–1680.
Dujon, B. (2006). Yeasts illustrate the molecular mechanisms of eukaryotic genome evolution. Trends in Genetics, 22(7), 375–387. https://doi.org/10.1016/j.tig.2006.05.007.
Duncan, A. W. (2013). Aneuploidy, polyploidy and ploidy reversal in the liver. Seminars in Cell and Developmental Biology, 24(4), 347–356. https://doi.org/10.1016/j.semcdb.2013.01.003.
Duncan, A. W., Taylor, M. H., Hickey, R. D., et al. (2010). The ploidy conveyor of mature hepatocytes as a source of genetic variation. Nature, 467(7316), 707–710. https://doi.org/10.1038/nature09414.
Dutrillaux, B., Biemont, M. C., Viegas-Pequignot, E., et al. (1979). Comparison of the karyotypes of four Cercopithecoidae: Papio papio, P. anubis, Macaca mulatta, and M. fascicularis. Cytogenetics and Cell Genetics, 23(1–2), 77–83. https://doi.org/10.1159/000131305.
Easwaran, H., Tsai, H. C., & Baylin, S. B. (2014). Cancer epigenetics: Tumor heterogeneity, plasticity of stem-like states, and drug resistance. Molecular Cell, 54(5), 716–727. https://doi.org/10.1016/j.molcel.2014.05.015.
Eddy, S. R. (2012). The C-value paradox, junk DNA and ENCODE. Current Biology, 22(21), R898–R899. https://doi.org/10.1016/j.cub.2012.10.002.
EDGE Conversation. (2008). Life: A gene-centric view. Public conversation between Venter, C. and Dawkins, R. Digital Life Design (Retrieved from).
EDGE Interview. (2001). Interview with Ernst Mayr by Edge. Retrieved from http://www.edge.org/3rd_culture/mayr/mayr_print.html.
EDGE: The Third Culture. (2006). The Selfish gene: Thirty Years On. Retrieved from https://www.edge.org/3rd_culture/selfish06/selfish06_index.html.
Editorial. (2012). Must try harder. Nature, 483, 509 (England).
Eichler, E. E., Flint, J., Gibson, G., et al. (2010). Missing heritability and strategies for finding the underlying causes of complex disease. Nature Reviews Genetics, 11(6), 446–450. https://doi.org/10.1038/nrg2809.
Eklund, A., Simola, K. O., & Ryynänen, M. (1988). Translocation t(13;14) in nine generations with a case of translocation homozygosity. Clinical Genetics, 33(2), 83–86.
El-Brolosy, M. A., & Stainier, D. Y. R. (2017). Genetic compensation: A phenomenon in search of mechanisms. PLoS Genetics, 13(7), e1006780. https://doi.org/10.1371/journal.pgen.1006780.
Eldredge, N. (2004). Why we do it: Rethinking sex and the selfish gene. New York: Norton.
Eldredge, N., & Gould, S. (1972). Punctuated equilibria: An alternative to phyletic gradualism. In T. Schopf (Ed.), Models in paleobiology (pp. 82–115). San Francisco: Freeman, Cooper and Company.
Eldredge, N., Thompson, J., Brakefield, P., et al. (2005). The dynamics of evolutionary stasis (Vol. 31). SPIE.


Elenbaas, B., Spirio, L., Koerner, F., et al. (2001). Human breast cancer cells generated by oncogenic transformation of primary mammary epithelial cells. Genes and Development, 15(1), 50–65.
Ellis, G., & Silk, J. (2014). Scientific method: Defend the integrity of physics. Nature, 516(7531), 321–323. https://doi.org/10.1038/516321a.
Elowitz, M. B., Levine, A. J., Siggia, E. D., et al. (2002). Stochastic gene expression in a single cell. Science, 297(5584), 1183–1186. https://doi.org/10.1126/science.1070919.
Elsasser, W. M. (1981). Principles of a new biological theory: A summary. Journal of Theoretical Biology, 89(1), 131–150.
Elsasser, W. M. (1984). Outline of a theory of cellular heterogeneity. Proceedings of the National Academy of Sciences of the United States of America, 81(16), 5126–5129.
Erwin, D. H. (2008). Colloquium paper: Extinction as the loss of evolutionary history. Proceedings of the National Academy of Sciences of the United States of America, 105(Suppl. 1), 11520–11527. https://doi.org/10.1073/pnas.0801913105.
Escaramis, G., Docampo, E., & Rabionet, R. (2015). A decade of structural variants: Description, history and methods to detect structural variation. Briefings in Functional Genomics, 14(5), 305–314. https://doi.org/10.1093/bfgp/elv014.
Esteller, M. (2006). Epigenetics provides a new generation of oncogenes and tumor suppressor genes. British Journal of Cancer, 94, 179–183.
Esteller, M. (2008). Epigenetics in cancer. New England Journal of Medicine, 358(11), 1148–1159. https://doi.org/10.1056/NEJMra072067.
Estivill, X., & Armengol, L. (2007). Copy number variants and common disorders: Filling the gaps and exploring complexity in genome-wide association studies. PLoS Genetics, 3(10), 1787–1799. https://doi.org/10.1371/journal.pgen.0030190.
Evans, J. P., Meslin, E. M., Marteau, T. M., et al. (2011). Genomics. Deflating the genomic bubble. Science, 331(6019), 861–862. https://doi.org/10.1126/science.1198039.
Ewald, P. W. (2009). An evolutionary perspective on parasitism as a cause of cancer. Advances in Parasitology, 68, 21–43. https://doi.org/10.1016/s0065-308x(08)00602-7.
Fabarius, A., Leitner, A., Hochhaus, A., et al. (2011). Impact of additional cytogenetic aberrations at diagnosis on prognosis of CML: Long-term observation of 1151 patients from the randomized CML Study IV. Blood, 118(26), 6760–6768. https://doi.org/10.1182/blood-2011-08-373902.
Faria, R., Neto, S., Noor, M. A. F., & Navarro, A. (May 2011). Role of natural selection in chromosomal speciation. In eLS. Chichester: John Wiley & Sons Ltd. https://doi.org/10.1002/9780470015902.a0022850. http://www.els.net.
Fearon, E. R., & Vogelstein, B. (1990). A genetic model for colorectal tumorigenesis. Cell, 61(5), 759–767.
Fedoroff, N. (1984). Transposable genetic elements in maize. Scientific American, 65–74.
Fedoroff, N. V., & Botstein, D. (1992). The dynamic genome: Barbara McClintock’s ideas in the century of genetics. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press.
Feher, T., Papp, B., Pal, C., et al. (2007). Systematic genome reductions: Theoretical and experimental approaches. Chemical Reviews, 107(8), 3498–3513. https://doi.org/10.1021/cr0683111.
Feinberg, A. P. (2014). Epigenetic stochasticity, nuclear structure and cancer: The implications for medicine. Journal of Internal Medicine, 276(1), 5–11. https://doi.org/10.1111/joim.12224.
Feinberg, A. P., Ohlsson, R., & Henikoff, S. (2006). The epigenetic progenitor origin of human cancer. Nature Reviews Genetics, 7(1), 21–33. https://doi.org/10.1038/nrg1748.
Fenech, M., Chang, W. P., Kirsch-Volders, M., et al. (2003). HUMN project: Detailed description of the scoring criteria for the cytokinesis-block micronucleus assay using isolated human lymphocyte cultures. Mutation Research, 534(1–2), 65–75.
Ferguson-Smith, M. A., & Trifonov, V. (2007). Mammalian karyotype evolution. Nature Reviews Genetics, 8(12), 950–962. https://doi.org/10.1038/nrg2199.


Ferlini, A., & Fini, S. (2015). The human genome: Better to be dynamic: Book review for genetic heterogeneity and human disease edited by: H Heng. European Journal of Human Genetics, 23, 559. https://doi.org/10.1038/ejhg.2015.2.
Fernando, M. M., Stevens, C. R., Walsh, E. C., et al. (2008). Defining the role of the MHC in autoimmunity: A review and pooled analysis. PLoS Genetics, 4(4), e1000024. https://doi.org/10.1371/journal.pgen.1000024.
Ferree, P. M., & Barbash, D. A. (2009). Species-specific heterochromatin prevents mitotic chromosome segregation to cause hybrid lethality in Drosophila. PLoS Biology, 7(10), e1000234. https://doi.org/10.1371/journal.pbio.1000234.
Feuk, L., Carson, A. R., & Scherer, S. W. (2006). Structural variation in the human genome. Nature Reviews Genetics, 7(2), 85–97. https://doi.org/10.1038/nrg1767.
Feuk, L., Macdonald, J. R., Tang, T., et al. (2005). Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genetics, 1(4), e56. https://doi.org/10.1371/journal.pgen.0010056.
Fidler, I. J., & Hart, I. R. (1981). Biological and experimental consequences of the zonal composition of solid tumors. Cancer Research, 41(8), 3266–3267.
Fidlerová, H., Senger, G., Kost, M., et al. (1994). Two simple procedures for releasing chromatin from routinely fixed cells for fluorescence in situ hybridization. Cytogenetics and Cell Genetics, 65(3), 203–205.
Fisher, R. (1930). The genetical theory of natural selection. Oxford: Oxford University Press.
Fisher, R. (1936). Has Mendel’s work been rediscovered? Annals of Science, 1, 115–137.
Fisher, J. C. (1958). Multiple-mutation theory of carcinogenesis. Nature, 181, 651–652.
Flagiello, D., Bernardino-Sgherri, J., & Dutrillaux, B. (2002). Complex relationships between 5-aza-dC induced DNA demethylation and chromosome compaction at mitosis. Chromosoma, 111(1), 37–44.
Flores, M., Morales, L., Gonzaga-Jauregui, C., et al. (2007). Recurrent DNA inversion rearrangements in the human genome. Proceedings of the National Academy of Sciences of the United States of America, 104(15), 6099–6106. https://doi.org/10.1073/pnas.0701631104.
Flunkert, J., Maierhofer, A., Dittrich, M., et al. (2018). Genetic and epigenetic changes in clonal descendants of irradiated human fibroblasts. Experimental Cell Research, 370(2), 322–332. https://doi.org/10.1016/j.yexcr.2018.06.034.
Fodor, J., & Piattelli-Palmarini, M. (2011). What Darwin got wrong. Farrar, Straus and Giroux.
Folch, J., Cocero, M. J., Chesne, P., et al. (2009). First birth of an animal from an extinct subspecies (Capra pyrenaica pyrenaica) by cloning. Theriogenology, 71(6), 1026–1034. https://doi.org/10.1016/j.theriogenology.2008.11.005.
Forment, J. V., Kaidi, A., & Jackson, S. P. (2012). Chromothripsis and cancer: Causes and consequences of chromosome shattering. Nature Reviews Cancer, 12(10), 663–670. https://doi.org/10.1038/nrc3352.
Forsdyke, D. (2003). William Bateson, Richard Goldschmidt, and non-genic models of speciation. Journal of Biological Systems, 11, 341–350.
Forsdyke, D. R. (2004). Chromosomal speciation: A reply. Journal of Theoretical Biology, 230(2), 189–196. https://doi.org/10.1016/j.jtbi.2004.04.020.
Fouad, Y. A., & Aanei, C. (2017). Revisiting the hallmarks of cancer. American Journal of Cancer Research, 7(5), 1016–1036.
Fournier, D., Estoup, A., Orivel, J., et al. (2005). Clonal reproduction by males and females in the little fire ant. Nature, 435(7046), 1230–1234. https://doi.org/10.1038/nature03705.
Fox, M. (2018). Chinese scientist outrages researchers by claiming he gene-edited twins. NBC News report, 2018 Nov 26.
Fraga, M. F., Ballestar, E., Paz, M. F., et al. (2005). Epigenetic differences arise during the lifetime of monozygotic twins. Proceedings of the National Academy of Sciences of the United States of America, 102(30), 10604–10609. https://doi.org/10.1073/pnas.0500398102.


Frank, S. A., & Nowak, M. A. (2004). Problems of somatic mutation and cancer. BioEssays, 26(3), 291–299. https://doi.org/10.1002/bies.20000.
Franklin, A., Edwards, A. W. F., Fairbanks, D. J., et al. (2008). Ending the Mendel-Fisher controversy. University of Pittsburgh Press.
Frias, S., Ramos, S., Salas, C., et al. (2019). Nonclonal chromosome aberrations and genome chaos in somatic and germ cells from patients and survivors of Hodgkin lymphoma. Genes (Basel), 10(1). https://doi.org/10.3390/genes10010037.
Froenicke, L., Anderson, L. K., Wienberg, J., et al. (2002). Male mouse recombination maps for each autosome identified by chromosome painting. The American Journal of Human Genetics, 71(6), 1353–1368. https://doi.org/10.1086/344714.
Fuller, Z. L., Leonard, C. J., Young, R. E., et al. (2018). Ancestral polymorphisms explain the role of chromosomal inversions in speciation. PLoS Genetics, 14(7), e1007526. https://doi.org/10.1371/journal.pgen.1007526.
Futuyma, D. (1998). Evolutionary biology. Sinauer Associates.
Futuyma, D. J. (2010). Evolutionary constraint and ecological consequences. Evolution, 64(7), 1865–1884. https://doi.org/10.1111/j.1558-5646.2010.00960.x.
Gaiti, F., Jindrich, K., Fernandez-Valverde, S. L., et al. (2017). Landscape of histone modifications in a sponge reveals the origin of animal cis-regulatory complexity. eLife, 6. https://doi.org/10.7554/eLife.22194.
Gao, R., Davis, A., McDonald, T. O., et al. (2016). Punctuated copy number evolution and clonal stasis in triple-negative breast cancer. Nature Genetics, 48(10), 1119–1130. https://doi.org/10.1038/ng.3641.
Gao, L., Geng, Y., Li, B., et al. (2010). Genome-wide DNA methylation alterations of Alternanthera philoxeroides in natural and manipulated habitats: Implications for epigenetic regulation of rapid responses to environmental fluctuation and phenotypic variation. Plant, Cell and Environment, 33(11), 1820–1827. https://doi.org/10.1111/j.1365-3040.2010.02186.x.
Gao, C., Su, Y., Koeman, J., et al. (2016). Chromosome instability drives phenotypic switching to metastasis. Proceedings of the National Academy of Sciences of the United States of America, 113(51), 14793–14798. https://doi.org/10.1073/pnas.1618215113.
Garagna, S., Marziliano, N., Zuccotti, M., et al. (2001). Pericentromeric organization at the fusion point of mouse Robertsonian translocation chromosomes. Proceedings of the National Academy of Sciences of the United States of America, 98(1), 171–175. https://doi.org/10.1073/pnas.98.1.171.
Garber, K. (2005). Human cancer genome project moving forward despite some doubts in community. Journal of the National Cancer Institute, 97, 1322–1324 (United States).
Garrod, A. E. (1902). The incidence of alkaptonuria: A study in chemical individuality. The Lancet, 160, 1616–1620.
Gast, C. E., Silk, A. D., Zarour, L., et al. (2018). Cell fusion potentiates tumor heterogeneity and reveals circulating hybrid cells that correlate with stage and survival. Science Advances, 4(9), eaat7828. https://doi.org/10.1126/sciadv.aat7828.
Gatenby, R. A., Silva, A. S., Gillies, R. J., et al. (2009). Adaptive therapy. Cancer Research, 69(11), 4894–4903. https://doi.org/10.1158/0008-5472.can-08-3658.
Geigl, J. B., Langer, S., Barwisch, S., et al. (2004). Analysis of gene expression patterns and chromosomal changes associated with aging. Cancer Research, 64(23), 8550–8557. https://doi.org/10.1158/0008-5472.CAN-04-2151.
Geman, D., & Geman, S. (2016). Opinion: Science in the age of selfies. Proceedings of the National Academy of Sciences of the United States of America, 113(34), 9384–9387. https://doi.org/10.1073/pnas.1609793113.
Gemayel, R., Vinces, M. D., Legendre, M., et al. (2010). Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annual Review of Genetics, 44, 445–477. https://doi.org/10.1146/annurev-genet-072610-155046.


Gemayel, R., Yang, Y., Dzialo, M. C., et al. (2017). Variable repeats in the eukaryotic polyubiquitin gene UBI4 modulate proteostasis and stress survival. Nature Communications, 8(1), 397. https://doi.org/10.1038/s41467-017-00533-4.
Genome Web. (2013). US Supreme Court strikes down gene patents but allows patenting of synthetic DNA (Jun 13 2013). Retrieved from https://www.genomeweb.com/clinical-genomics/us-supreme-court-strikes-down-gene-patents-allows-patenting-synthetic-dna#.XDvlhfZFwhd.
Gerlinger, M., McGranahan, N., Dewhurst, S. M., et al. (2014). Cancer: Evolution within a lifetime. Annual Review of Genetics, 48, 215–236.
Gerrish, P. J., & Lenski, R. E. (1998). The fate of competing beneficial mutations in an asexual population. Genetica, 102–103, 127–144.
Gerstein, M. B., Bruce, C., Rozowsky, J. S., et al. (2007). What is a gene, post-ENCODE? History and updated definition. Genome Research, 17(6), 669–681. https://doi.org/10.1101/gr.6339607.
Giaever, G., Chu, A. M., Ni, L., et al. (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418(6896), 387–391. https://doi.org/10.1038/nature00935.
Gibbs, W. W. (2003). Untangling the roots of cancer. Scientific American, 289(1), 56–65.
Gibson, D. G., Glass, J. I., Lartigue, C., et al. (2010). Creation of a bacterial cell controlled by a chemically synthesized genome. Science, 329(5987), 52–56. https://doi.org/10.1126/science.1190719.
Gil, R., Silva, F. J., Pereto, J., et al. (2004). Determination of the core of a minimal bacterial gene set. Microbiology and Molecular Biology Reviews, 68(3), 518–537. https://doi.org/10.1128/mmbr.68.3.518-537.2004. Table of contents.
Glansdorff, N., Xu, Y., & Labedan, B. (2009). The conflict between horizontal gene transfer and the safeguard of identity: Origin of meiotic sexuality. Journal of Molecular Evolution, 69(5), 470–480. https://doi.org/10.1007/s00239-009-9277-7.
Glass, J. I., Assad-Garcia, N., Alperovich, N., et al. (2006). Essential genes of a minimal bacterium. Proceedings of the National Academy of Sciences of the United States of America, 103(2), 425–430. https://doi.org/10.1073/pnas.0510013103.
Glass, J. I., Hutchison, C. A., 3rd, Smith, H. O., et al. (2009). A systems biology tour de force for a near-minimal bacterium. Molecular Systems Biology, 5, 330 (England).
Gluckman, P. D., Beedle, A. S., Hanson, M. A., et al. (2013). Human growth: Evolutionary and life history perspectives. Nestlé Nutrition Institute Workshop Series, 71, 89–102. https://doi.org/10.1159/000342572.
Gluckman, P. D., Hanson, M. A., & Low, F. M. (2011). The role of developmental plasticity and epigenetics in human health. Birth Defects Research C Embryo Today, 93(1), 12–18. https://doi.org/10.1002/bdrc.20198.
Golding, I., Paulsson, J., Zawilski, S. M., et al. (2005). Real-time kinetics of gene activity in individual bacteria. Cell, 123(6), 1025–1036. https://doi.org/10.1016/j.cell.2005.09.031.
Goldschmidt, R. (1940). The material basis of evolution. Yale University Press.
Gordon, J. L., Byrne, K. P., & Wolfe, K. H. (2011). Mechanisms of chromosome number evolution in yeast. PLoS Genetics, 7(7), e1002190. https://doi.org/10.1371/journal.pgen.1002190.
Gordon, D. J., Resio, B., & Pellman, D. (2012). Causes and consequences of aneuploidy in cancer. Nature Reviews Genetics, 13(3), 189–203. https://doi.org/10.1038/nrg3123.
Gore, A. V., Tomins, K. A., Iben, J., et al. (2018). An epigenetic mechanism for cavefish eye degeneration. Nature Ecology and Evolution, 2(7), 1155–1160. https://doi.org/10.1038/s41559-018-0569-4.
Gorelick, R., & Carpinone, J. (2009). Origin and maintenance of sex: The evolutionary joys of self sex. Biological Journal of the Linnean Society, 98, 707–728.
Gorelick, R., & Heng, H. H. (2011). Sex reduces genetic variation: A multidisciplinary review. Evolution, 65(4), 1088–1098. https://doi.org/10.1111/j.1558-5646.2010.01173.x.
Gould, S. (1977). The return of the hopeful monster. Natural History, 86, 22–30.

Gould, S. J. (1982). The uses of heresy, an introduction to a reissue of Richard Goldschmidt’s the material basis of evolution. In The material basis of evolution (pp. xiii–xlii). Yale University Press.
Gould, S. (1990). Caring groups and selfish genes. In The panda’s thumb: More reflections in natural history. London: Penguin Books.
Gould, S. J. (1999). The evolution of life. In J. W. Schopf (Ed.), Evolution facts and fallacies. San Diego, USA: Academic Press.
Gould, S. J. (2002). The structure of evolutionary theory. Harvard University Press.
Grad, I., & Picard, D. (2007). The glucocorticoid responses are shaped by molecular chaperones. Molecular and Cellular Endocrinology, 275(1–2), 2–12. https://doi.org/10.1016/j.mce.2007.05.018.
Grant, P. R., & Grant, B. R. (2014). 40 years of evolution: Darwin’s finches on daphne major island. Princeton University Press.
Grantham, R., Gautier, C., Gouy, M., et al. (1980). Codon catalog usage and the genome hypothesis. Nucleic Acids Research, 8(1), r49–r62.
Graphodatsky, A. S., Trifonov, V. A., & Stanyon, R. (2011). The genome diversity and karyotype evolution of mammals. Molecular Cytogenetics, 4, 22. https://doi.org/10.1186/1755-8166-4-22.
Gravis, G., Bladou, F., Salem, N., et al. (2008). Results from a monocentric phase II trial of erlotinib in patients with metastatic prostate cancer. Annals of Oncology, 19, 1624–1628.
Greene, C. S., Penrod, N. M., Williams, S. M., et al. (2009). Failure to replicate a genetic association may provide important clues about genetic architecture. PLoS One, 4(6), e5639. https://doi.org/10.1371/journal.pone.0005639.
Greig, D. (2007). A screen for recessive speciation genes expressed in the gametes of f1 hybrid yeast. PLoS Genetics, 3(2), e21. https://doi.org/10.1371/journal.pgen.0030021.
Grunspan, D. Z., Nesse, R. M., Barnes, M. E., et al. (2018). Core principles of evolutionary medicine: A delphi study. Evolution, Medicine, and Public Health, 2018(1), 13–23. https://doi.org/10.1093/emph/eox025.
Guenet, J. L. (2005). The mouse genome. Genome Research, 15(12), 1729–1740. https://doi.org/10.1101/gr.3728305.
Guenther, C. (1906). Darwinism and the problems of life. A study of familiar animal life. London: Translated by McCabe, J. A., Owen, London.
Guerin, O., Fischel, J. L., Ferrero, J.-M., et al. (2010). EGFR targeting in hormone-refractory prostate cancer: Current appraisal and prospects for treatment. Pharmaceuticals, 3, 2238–2247.
Gulland, J. M. (1947). The structures of nucleic acids. Symposia of the Society for Experimental Biology, 1, 1–14.
Gupta, P. B., Fillmore, C. M., Jiang, G., et al. (2011). Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer cells. Cell, 146(4), 633–644. https://doi.org/10.1016/j.cell.2011.07.026.
Gurley, K. A., Rink, J. C., & Sanchez Alvarado, A. (2008). Beta-catenin defines head versus tail identity during planarian regeneration and homeostasis. Science, 319(5861), 323–327. https://doi.org/10.1126/science.1150029.
Gymrek, M., Willems, T., Guilmatre, A., et al. (2016). Abundant contribution of short tandem repeats to gene expression variation in humans. Nature Genetics, 48(1), 22–29. https://doi.org/10.1038/ng.3461.
Haaf, T., & Schmid, M. (1989). 5-azadeoxycytidine induced undercondensation in the giant x chromosomes of microtus agrestis. Chromosoma, 98(2), 93–98.
Haaf, T., & Ward, D. C. (1994). Structural analysis of alpha-satellite DNA and centromere proteins using extended chromatin and chromosomes. Human Molecular Genetics, 3(5), 697–709.

Hahn, W. C., Counter, C. M., Lundberg, A. S., et al. (1999). Creation of human tumour cells with defined genetic elements. Nature, 400(6743), 464–468. https://doi.org/10.1038/22780.
Hahn, W. C., & Weinberg, R. A. (2002). Modelling the molecular circuitry of cancer. Nature Reviews Cancer, 2(5), 331–341. https://doi.org/10.1038/nrc795.
Han, P. (2009). Scientists unlock genetic code in major cancer breakthrough. London: CNN. Dec 17, 2009.
Hanahan, D., & Weinberg, R. A. (2000). The hallmarks of cancer. Cell, 100(1), 57–70.
Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of cancer: The next generation. Cell, 144(5), 646–674. https://doi.org/10.1016/j.cell.2011.02.013.
Handberg-Thorsager, M., Fernandez, E., & Salo, E. (2008). Stem cells and regeneration in planarians. Frontiers in Bioscience, 13, 6374–6394.
Hansemann, D. (1897). Die mikroskopische diagnose der boesartigen geschwuelste. Berlin: August Hirschwald.
Harmon, K. (March 24, 2011). Is the “war on cancer” winnable? 40 years after the unofficial declaration, the disease is spreading throughout the globe. Retrieved from https://blogs.scientificamerican.com/observations/is-the-war-on-cancer-winnable-40-years-after-the-unofficial-declaration-the-disease-is-spreading-throughout-the-globe/.
Harrow, J., Frankish, A., Gonzalez, J. M., et al. (2012). Gencode: The reference human genome annotation for the encode project. Genome Research, 22(9), 1760–1774. https://doi.org/10.1101/gr.135350.111.
Hartl, D. L., & Fairbanks, D. J. (2007). Mud sticks: On the alleged falsification of mendel’s data. Genetics, 175(3), 975–979.
Harton, G. L., & Tempest, H. G. (2012). Chromosomal disorders and male infertility. Asian Journal of Andrology, 14(1), 32–39. https://doi.org/10.1038/aja.2011.66.
Hassold, T., & Hunt, P. (2001). To err (meiotically) is human: The genesis of human aneuploidy. Nature Reviews Genetics, 2(4), 280–291. https://doi.org/10.1038/35066065.
Hazen, R. M., Griffin, P. L., Carothers, J. M., et al. (2007). Functional information and the emergence of biocomplexity. Proceedings of the National Academy of Sciences of the United States of America, 104(Suppl. 1), 8574–8581. https://doi.org/10.1073/pnas.0701744104.
Heiskanen, M., Karhu, R., Hellsten, E., et al. (1994). High resolution mapping using fluorescence in situ hybridization to extended DNA fibers prepared from agarose-embedded cells. Biotechniques, 17(5), 928–929, 932–933.
Hendry, A. P., Nosil, P., & Rieseberg, L. H. (2007). The speed of ecological speciation. Functional Ecology, 21(3), 455–464. https://doi.org/10.1111/j.1365-2435.2006.01240.x.
Hendry, A. P., Wenburg, J. K., Bentzen, P., et al. (2000). Rapid evolution of reproductive isolation in the wild: Evidence from introduced salmon. Science, 290(5491), 516–519.
Heng, H. H. (2007a). Cancer genome sequencing: The challenges ahead. BioEssays, 29(8), 783–794. https://doi.org/10.1002/bies.20610.
Heng, H. H. (2007b). Elimination of altered karyotypes by sexual reproduction preserves species identity. Genome, 50(5), 517–524. https://doi.org/10.1139/g07-039.
Heng, H. H. (2007c). Karyotype chaos, a form of non-clonal chromosome aberrations, plays a key role in cancer progression and drug resistance. In FASEB summer meeting: Nuclear structure and cancer. Vermont: Saxtons River.
Heng, H. H. (2008a). The conflict between complex systems and reductionism. Journal of the American Medical Association, 300(13), 1580–1581. https://doi.org/10.1001/jama.300.13.1580.
Heng, H. H. (2008b). The gene-centric concept: A new liability? BioEssays, 30, 196–197.
Heng, H. H. (2009). The genome-centric concept: Resynthesis of evolutionary theory. BioEssays, 31(5), 512–525. https://doi.org/10.1002/bies.200800182.
Heng, H. H. (2010). Missing heritability and stochastic genome alterations. Nature Reviews Genetics, 11, 813 (England).
Heng, H. H. (2013a). The contribution of genomic heterogeneity. Preface. Cytogenetic and Genome Research, 139, 141–143.

Heng, H. H. (2013b). Genomics: Hela genome versus donor’s genome. Nature, 501, 167 (England).
Heng, H. H. (2013c). Bio-complexity: Challenging reductionism. In J. Sturmberg & C. Martin (Eds.), Handbook on systems and complexity in health (pp. 193–208). Springer.
Heng, H. H. (2014). Distinguishing constitutional and acquired nonclonal aneuploidy. Proceedings of the National Academy of Sciences of the United States of America, 111, E972 (United States).
Heng, H. H. (2015). Debating cancer: The paradox in cancer research. Singapore: World Scientific Publishing Company.
Heng, H. H. (2017a). Chapter 5 - the genomic landscape of cancers. In B. Ujvari, B. Roche & F. Thomas (Eds.), Ecology and evolution of cancer (pp. 69–86). Academic Press.
Heng, H. H. (2017b). Heterogeneity-mediated cellular adaptation and its trade-off: Searching for the general principles of diseases. Journal of Evaluation in Clinical Practice, 23(1), 233–237. https://doi.org/10.1111/jep.12598.
Heng, H. H., Bremer, S. W., Stevens, J., et al. (2006b). Cancer progression by non-clonal chromosome aberrations. Journal of Cellular Biochemistry, 98(6), 1424–1435. https://doi.org/10.1002/jcb.20964.
Heng, H. H., Bremer, S. W., Stevens, J. B., et al. (2009). Genetic and epigenetic heterogeneity in cancer: A genome-centric perspective. Journal of Cellular Physiology, 220(3), 538–547. https://doi.org/10.1002/jcp.21799.
Heng, H. H., Bremer, S. W., Stevens, J. B., et al. (2013a). Chromosomal instability (cin): What it is and why it is crucial to cancer evolution. Cancer and Metastasis Reviews, 32(3–4), 325–340. https://doi.org/10.1007/s10555-013-9427-7.
Heng, H. H., Chamberlain, J. W., Shi, X. M., et al. (1996). Regulation of meiotic chromatin loop size by chromosomal position. Proceedings of the National Academy of Sciences of the United States of America, 93(7), 2795–2800.
Heng, H. H., & Chen, W. (1985). The study of the chromatin and the chromosome structure for bufo gargarizans by the light microscope. Journal of Sichuan Normal University Natural Science, (2), 105–109.
Heng, H. H., Chen, W., & Wang, Y. (1988). Effects of pingyangmycin on chromosomes: A possible structural basis for chromosome aberration. Mutation Research, 199, 199–205.
Heng, H. H., Chen, W., & Yosida, T. (1987a). Studies on amphibian chromosome by the high resolution banding. Proceedings of the Japan Academy Series B, 62, 53–56.
Heng, H. H., Goetze, S., Ye, C. J., et al. (2004a). Chromatin loops are selectively anchored using scaffold/matrix-attachment regions. Journal of Cell Science, 117(Pt 7), 999–1008. https://doi.org/10.1242/jcs.00976.
Heng, H. H., Horne, S. D., Chaudhry, S., et al. (2018). A postgenomic perspective on molecular cytogenetics. Current Genomics, 19(3), 227–239. https://doi.org/10.2174/1389202918666170717145716.
Heng, H. H., Horne, S. D., Stevens, J. B., et al. (2016c). Heterogeneity mediated system complexity: The ultimate challenge for studying common and complex diseases. In J. P. Sturmberg (Ed.), The Value of Systems and Complexity Sciences for Healthcare (pp. 107–120). Springer International Publishing.
Heng, H. H., Krawetz, S. A., Lu, W., et al. (2001a). Re-defining the chromatin loop domain. Cytogenetics and Cell Genetics, 93(3–4), 155–161. https://doi.org/10.1159/000056977.
Heng, H. H., Lin, R., Zhao, X., et al. (1987b). Structure of the chromosome and its formation. II. Studies on the sister unit fibers. The Nucleus, 30, 2–9.
Heng, H. H., Liu, G., Alemara, S., et al. (2019). The mechanisms of how genomic heterogeneity impacts bio-emergent properties: The challenges for precision medicine. In J. P. Sturmberg (Ed.), Embracing Complexity in Health. Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-030-10940-0_6.

Heng, H. H., Liu, G., Bremer, S., et al. (2006c). Clonal and non-clonal chromosome aberrations and genome variation and aberration. Genome, 49(3), 195–204. https://doi.org/10.1139/g06-023.
Heng, H. H., Liu, G., Lu, W., et al. (2001b). Spectral karyotyping (SKY) of mouse meiotic chromosomes. Genome, 44(2), 293–298.
Heng, H. H., Liu, G., Stevens, J. B., et al. (2010b). Genetic and epigenetic heterogeneity in cancer: The ultimate challenge for drug therapy. Current Drug Targets, 11(10), 1304–1316.
Heng, H. H., Liu, G., Stevens, J. B., et al. (2011a). Decoding the genome beyond sequencing: The new phase of genomic research. Genomics, 98(4), 242–252. https://doi.org/10.1016/j.ygeno.2011.05.008.
Heng, H. H., Liu, G., Stevens, J. B., et al. (2013b). Karyotype heterogeneity and unclassified chromosomal abnormalities. Cytogenetic and Genome Research, 139(3), 144–157. https://doi.org/10.1159/000348682.
Heng, H. H., & Regan, S. (2017). A systems biology perspective on molecular cytogenetics. Current Bioinformatics, 12, 4–10. https://doi.org/10.2174/1574893611666160606163419.
Heng, H. H., Regan, S. M., Liu, G., et al. (2016a). Why it is crucial to analyze non clonal chromosome aberrations or nccas? Molecular Cytogenetics, 9, 15. https://doi.org/10.1186/s13039-016-0223-2.
Heng, H. H., Regan, S., & Ye, C. (2016b). Genotype, environment, and evolutionary mechanism of diseases. Environmental Disease, 1, 14–23.
Heng, H. H., & Shi, X. M. (1997). From free chromatin analysis to high resolution fiber fish. Cell Research, 7(1), 119–124. https://doi.org/10.1038/cr.1997.13.
Heng, H. H., Squire, J., & Tsui, L. (1991). Chromatin mapping - a strategy for physical characterization of the human genome by hybridization in situ. In Paper presented at the 8th Int Cong Hum Gen Am J Hum Gent, DC, USA.
Heng, H. H., Squire, J., & Tsui, L. C. (1992). High-resolution mapping of mammalian genes by in situ hybridization to free chromatin. Proceedings of the National Academy of Sciences of the United States of America, 89(20), 9509–9513. https://doi.org/10.1073/pnas.89.20.9509.
Heng, H. H., Stevens, J. B., Bremer, S. W., et al. (2010a). The evolutionary mechanism of cancer. Journal of Cellular Biochemistry, 109(6), 1072–1084. https://doi.org/10.1002/jcb.22497.
Heng, H. H., Stevens, J. B., Bremer, S. W., et al. (2011b). Evolutionary mechanisms and diversity in cancer. Advances in Cancer Research, 112, 217–253. https://doi.org/10.1016/b978-0-12-387688-1.00008-9.
Heng, H. H., Stevens, J. B., Lawrenson, L., et al. (2008). Patterns of genome dynamics and cancer evolution. Cellular Oncology, 30(6), 513–514.
Heng, H. H., Stevens, J. B., Liu, G., et al. (2004c). Imaging genome abnormalities in cancer research. Cell and Chromosome, 3(1), 1. https://doi.org/10.1186/1475-9268-3-1.
Heng, H. H., Stevens, J. B., Liu, G., et al. (2006a). Stochastic cancer progression driven by nonclonal chromosome aberrations. Journal of Cellular Physiology, 208(2), 461–472. https://doi.org/10.1002/jcp.20685.
Heng, H. H., Stevens, J., Yang, F., et al. (2004b). Packaging of meiotic chromosomes correlated to AC-content, loop size, and recombination rates. In Paper presented at the annual meeting of American society of human genetics, Toronto.
Heng, H. H., Tsui, L. C., & Moens, P. B. (1994). Organization of heterologous DNA inserts on the mouse meiotic chromosome core. Chromosoma, 103(6), 401–407.
Heng, H. H., Ye, C. J., Heng, K., et al. (2000). Nuclear matrix sequence is involved but not sufficient for in vivo chromatin loop formation. Molecular Biology of the Cell, 11, 2271.
Heng, H. H., Ye, C. J., Yang, F., et al. (2003). Analysis of marker or complex chromosomal rearrangements present in pre- and post-natal karyotypes utilizing a combination of G-banding, spectral karyotyping and fluorescence in situ hybridization. Clinical Genetics, 63, 358–367.

Heng, H. H., & Zhao, X. (1987). The free chromatin structure in blood cell culture. Journal of Sichuan Normal University Natural Science, 4, 480–485.
Heppner, G. H. (1984). Tumor heterogeneity. Cancer Research, 44(6), 2259–2265.
Heppner, G. H., & Miller, B. E. (1989). Therapeutic implications of tumor heterogeneity. Seminars in Oncology, 16(2), 91–105.
Heppner, G. H., & Miller, F. R. (1998). The cellular basis of tumor progression. International Review of Cytology, 177, 1–56.
Hillenmeyer, M. E., Fung, E., Wildenhain, J., et al. (2008). The chemical genomic portrait of yeast: Uncovering a phenotype for all genes. Science, 320(5874), 362–365. https://doi.org/10.1126/science.1150021.
Hintze, A., & Adami, C. (2008). Evolution of complex modular biological networks. PLoS Computational Biology, 4(2), e23. https://doi.org/10.1371/journal.pcbi.0040023.
Hirai, H., Hirai, Y., Morimoto, M., et al. (2017). Night monkey hybrids exhibit de novo genomic and karyotypic alterations: The first such case in primates. Genome Biology and Evolution, 9(4), 945–955. https://doi.org/10.1093/gbe/evx058.
Hirsch, H. A., Iliopoulos, D., Joshi, A., et al. (2010). A transcriptional signature and common gene networks link cancer with lipid metabolism and diverse human diseases. Cancer Cell, 17(4), 348–361. https://doi.org/10.1016/j.ccr.2010.01.022.
Ho, S. Y., Phillips, M. J., Cooper, A., et al. (2005). Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Molecular Biology and Evolution, 22(7), 1561–1568. https://doi.org/10.1093/molbev/msi145.
Hoeijmakers, J. H. (2001). Genome maintenance mechanisms for preventing cancer. Nature, 411(6835), 366–374. https://doi.org/10.1038/35077232.
Hoelzer, G. A., Smith, E., & Pepper, J. W. (2006). On the logical relationship between natural selection and self-organization. Journal of Evolutionary Biology, 19(6), 1785–1794. https://doi.org/10.1111/j.1420-9101.2006.01177.x.
Hoey, T. (2010). Drug resistance, epigenetics, and tumor cell heterogeneity. Science Translational Medicine, 2(28), 28ps19. https://doi.org/10.1126/scitranslmed.3001056.
Holland, A. J., & Cleveland, D. W. (2012). Chromoanagenesis and cancer: Mechanisms and consequences of localized, complex chromosomal rearrangements. Nature Medicine, 18(11), 1630–1638. https://doi.org/10.1038/nm.2988.
Hölldobler, B., Wilson, E. O., & Nelson, M. C. (2009). The superorganism: The beauty, elegance, and strangeness of insect societies. W.W. Norton.
Holman, L., Trontti, K., & Helanterä, H. (2016). Queen pheromones modulate DNA methyltransferase activity in bee and ant workers. Biology Letters, 12(1), 20151038. https://doi.org/10.1098/rsbl.2015.1038.
Hong, K. K., Vongsangnak, W., Vemuri, G. N., et al. (2011). Unravelling evolutionary strategies of yeast for improving galactose utilization through integrated systems level analysis. Proceedings of the National Academy of Sciences of the United States of America, 108(29), 12179–12184. https://doi.org/10.1073/pnas.1103219108.
Horne, S. D., Abdallah, B. Y., Stevens, J. B., et al. (2013a). Genome constraint through sexual reproduction: Application of 4d-genomics in reproductive biology. Systems Biology in Reproductive Medicine, 59(3), 124–130. https://doi.org/10.3109/19396368.2012.754969.
Horne, S. D., Chowdhury, S. K., & Heng, H. H. (2014). Stress, genomic adaptation, and the evolutionary trade-off. Frontiers in Genetics, 5, 92. https://doi.org/10.3389/fgene.2014.00092.
Horne, S. D., & Heng, H. H. (2014). Genome Chaos, Chromothripsis and Cancer Evolution. Journal of Cancer Studies and Theraphy, 1, 1–6.
Horne, S. D., Pollick, S. A., & Heng, H. H. (2015c). Evolutionary mechanism unifies the hallmarks of cancer. International Journal of Cancer, 136(9), 2012–2021. https://doi.org/10.1002/ijc.29031.

Horne, S. D., Stevens, J. B., Abdallah, B. Y., et al. (2013b). Why imatinib remains an exception of cancer research. Journal of Cellular Physiology, 228(4), 665–670. https://doi.org/10.1002/jcp.24233.
Horne, S., Ye, C., Abdallah, B., et al. (2015a). Cancer genome evolution. Translational Cancer Research, 4(3), 303–313.
Horne, S., Ye, C., & Heng, H. (2015b). Chromosomal instability (cin) in cancer. In eLS. Chichester: John Wiley & Sons Ltd.
Hosken, D. J., & Hodgson, D. J. (2014). Why do sperm carry rna? Relatedness, conflict, and control. Trends in Ecology and Evolution, 29(8), 451–455. https://doi.org/10.1016/j.tree.2014.05.006.
Hu, J., Liu, Y. F., Wu, C. F., et al. (2009). Long-term efficacy and safety of all-trans retinoic acid/arsenic trioxide-based therapy in newly diagnosed acute promyelocytic leukemia. Proceedings of the National Academy of Sciences of the United States of America, 106(9), 3342–3347. https://doi.org/10.1073/pnas.0813280106.
Huang, S. (2013). Genetic and non-genetic instability in tumor progression: Link between the fitness landscape and the epigenetic landscape of cancer cells. Cancer and Metastasis Reviews, 32(3–4), 423–448. https://doi.org/10.1007/s10555-013-9435-7.
Huang, S., Ernberg, I., & Kauffman, S. (2009). Cancer attractors: A systems view of tumors from a gene network dynamics and developmental perspective. Seminars in Cell and Developmental Biology, 20(7), 869–876. https://doi.org/10.1016/j.semcdb.2009.07.003.
Hulten, M. A., Jonasson, J., Iwarsson, E., et al. (2013). Trisomy 21 mosaicism: We may all have a touch of down syndrome. Cytogenetic and Genome Research, 139(3), 189–192. https://doi.org/10.1159/000346028.
Hunt, G. (2007). The relative importance of directional change, random walks, and stasis in the evolution of fossil lineages. Proceedings of the National Academy of Sciences of the United States of America, 104(47), 18404–18408. https://doi.org/10.1073/pnas.0704088104.
Hurst, L. D. (2009). Fundamental concepts in genetics: Genetics and the understanding of selection. Nature Reviews Genetics, 10(2), 83–93. https://doi.org/10.1038/nrg2506.
Hutchison, C. A., Chuang, R. Y., Noskov, V. N., et al. (2016). Design and synthesis of a minimal bacterial genome. Science, 351(6280), aad6253. https://doi.org/10.1126/science.aad6253.
Huxley, J. (1956). Cancer biology: Comparative and genetic. Biological Reviews, 31, 474–514.
Iafrate, A. J., Feuk, L., Rivera, M. N., et al. (2004). Detection of large-scale variation in the human genome. Nature Genetics, 36(9), 949–951. https://doi.org/10.1038/ng1416.
Ideker, T., Galitski, T., & Hood, L. (2001). A new approach to decoding life: Systems biology. Annual Review of Genomics and Human Genetics, 2, 343–372. https://doi.org/10.1146/annurev.genom.2.1.343.
Ikami, K., Nuzhat, N., & Lei, L. (2017). Organelle transport during mouse oocyte differentiation in germline cysts. Current Opinion in Cell Biology, 44, 14–19. https://doi.org/10.1016/j.ceb.2016.12.002.
Inaki, K., & Liu, E. T. (2012). Structural mutations in cancer: Mechanistic and functional insights. Trends in Genetics, 28(11), 550–559. https://doi.org/10.1016/j.tig.2012.07.002.
Insel, T. R., & Wang, P. S. (2010). Rethinking mental illness. Journal of the American Medical Association, 303(19), 1970–1971. https://doi.org/10.1001/jama.2010.555.
Institute, N. C. (October 3, 2018). NCI budget and appropriations. Retrieved from https://www.cancer.gov/about-nci/budget.
Institute, NHGR. Frequently asked questions about genetic and genomic science. http://www.genome.gov/19016904.
Iourov, I. Y., Vorsanova, S. G., Liehr, T., et al. (2009). Aneuploidy in the normal, alzheimer’s disease and ataxia-telangiectasia brain: Differential expression and pathological meaning. Neurobiology of Disease, 34(2), 212–220. https://doi.org/10.1016/j.nbd.2009.01.003.

Iourov, I. Y., Vorsanova, S. G., & Yurov, Y. B. (2008a). Chromosomal mosaicism goes global. Molecular Cytogenetics, 1, 26. https://doi.org/10.1186/1755-8166-1-26.
Iourov, I. Y., Vorsanova, S. G., & Yurov, Y. B. (2008b). Molecular cytogenetics and cytogenomics of brain diseases. Current Genomics, 9(7), 452–465. https://doi.org/10.2174/138920208786241216.
Iourov, I. Y., Vorsanova, S. G., & Yurov, Y. B. (2012). Single cell genomics of the brain: Focus on neuronal diversity and neuropsychiatric diseases. Current Genomics, 13(6), 477–488. https://doi.org/10.2174/138920212802510439.
Isalan, M., Lemerle, C., Michalodimitrakis, K., et al. (2008). Evolvability and hierarchy in rewired bacterial gene networks. Nature, 452(7189), 840–845. https://doi.org/10.1038/nature06847.
Iskow, R. C., Mccabe, M. T., Mills, R. E., et al. (2010). Natural mutagenesis of human genomes by endogenous retrotransposons. Cell, 141(7), 1253–1261. https://doi.org/10.1016/j.cell.2010.05.020.
Jabbour, E., Hochhaus, A., Cortes, J., et al. (2010). Choosing the best treatment strategy for chronic myeloid leukemia patients resistant to imatinib: Weighing the efficacy and safety of individual drugs with bcr-abl mutations and patient history. Leukemia, 24(1), 6–12. https://doi.org/10.1038/leu.2009.193.
Jablonka, E. (2012). Epigenetic variations in heredity and evolution. Clinical Pharmacology and Therapeutics, 92(6), 683–688. https://doi.org/10.1038/clpt.2012.158.
Jablonka, E. (2013). Epigenetic inheritance and plasticity: The responsive germline. Progress in Biophysics and Molecular Biology, 111(2–3), 99–107. https://doi.org/10.1016/j.pbiomolbio.2012.08.014.
Jablonka, E., & Raz, G. (2009). Transgenerational epigenetic inheritance: Prevalence, mechanisms, and implications for the study of heredity and evolution. The Quarterly Review of Biology, 84(2), 131–176.
Jablonski, D. (1987). Heritability at the species level: Analysis of geographic ranges of cretaceous mollusks. Science, 238(4825), 360–363. https://doi.org/10.1126/science.238.4825.360.
Jablonski, D., Belanger, C. L., Berke, S. K., et al. (2013). Out of the tropics, but how? Fossils, bridge species, and thermal ranges in the dynamics of the marine latitudinal diversity gradient. Proceedings of the National Academy of Sciences of the United States of America, 110(26), 10487–10494. https://doi.org/10.1073/pnas.1308997110.
Jackson, D. A., Symons, R. H., & Berg, P. (1972). Biochemical method for inserting new genetic information into DNA of simian virus 40: Circular sv40 DNA molecules containing lambda phage genes and the galactose operon of escherichia coli. Proceedings of the National Academy of Sciences of the United States of America, 69(10), 2904–2909.
Jamal-Hanjani, M., Wilson, G. A., Mcgranahan, N., et al. (2017). Tracking the evolution of non-small-cell lung cancer. New England Journal of Medicine, 376(22), 2109–2121. https://doi.org/10.1056/NEJMoa1616288.
Jeffery, W. R. (2008). Emerging model systems in evo-devo: Cavefish and microevolution of development. Evolution and Development, 10(3), 265–272. https://doi.org/10.1111/j.1525-142X.2008.00235.x.
Jenkins, S. (2008). Is horse racing breeding itself to death? The Washington Post. May 4, 2008 http://www.chai-online.org/en/campaigns/racing/media/washington-post_8belles_4may08.pdf.
Jiao, Y., Wickett, N. J., Ayyampalayam, S., et al. (2011). Ancestral polyploidy in seed plants and angiosperms. Nature, 473(7345), 97–100. https://doi.org/10.1038/nature09916.
Johannsen, W. (1909). Elemente der exakten erblichkeitslehre. Jena: Gustav Fischer.
Johnson, N. A. (2010). Hybrid incompatibility genes: Remnants of a genomic battlefield? Trends in Genetics, 26(7), 317–325. https://doi.org/10.1016/j.tig.2010.04.005.

Jolly, M. K., Jia, D., Boareto, M., et al. (2015). Coupling the modules of emt and stemness: A tunable ‘stemness window’ model. Oncotarget, 6(28), 25161–25174. https://doi.org/10.18632/oncotarget.4629.
Jolly, M. K., Somarelli, J. A., Sheth, M., et al. (2018). Hybrid epithelial/mesenchymal phenotypes promote metastasis and therapy resistance across carcinomas. Pharmacology and Therapeutics. https://doi.org/10.1016/j.pharmthera.2018.09.007.
Jones, R. N. (2005). Mcclintock’s controlling elements: The full story. Cytogenetic and Genome Research, 109(1–3), 90–103. https://doi.org/10.1159/000082387.
Jones, P. A., & Baylin, S. B. (2002). The fundamental role of epigenetic events in cancer. Nature Reviews Genetics, 3(6), 415–428. https://doi.org/10.1038/nrg816.
Jones, F. C., Grabherr, M. G., Chan, Y. F., et al. (2012). The genomic basis of adaptive evolution in threespine sticklebacks. Nature, 484(7392), 55–61. https://doi.org/10.1038/nature10944.
Jones, M. J., & Jallepalli, P. V. (2012). Chromothripsis: Chromosomes in crisis. Developmental Cell, 23(5), 908–917. https://doi.org/10.1016/j.devcel.2012.10.010.
Joyner, M. J., Paneth, N., & Ioannidis, J. P. (2016). What happens when underperforming big ideas in research become entrenched? Journal of the American Medical Association, 316(13), 1355–1356. https://doi.org/10.1001/jama.2016.11076.
Joyner, M. J., & Prendergast, F. G. (2014). Chasing mendel: Five questions for personalized medicine. The Journal of Physiology, 592(11), 2381–2388. https://doi.org/10.1113/jphysiol.2014.272336.
Juhas, M., Eberl, L., & Glass, J. I. (2011). Essence of life: Essential genes of minimal genomes. Trends in Cell Biology, 21(10), 562–568. https://doi.org/10.1016/j.tcb.2011.07.005.
Jung, Y. H., Sauria, M. E. G., Lyu, X., et al. (2017). Chromatin states in mouse sperm correlate with embryonic and adult regulatory landscapes. Cell Reports, 18(6), 1366–1382. https://doi.org/10.1016/j.celrep.2017.01.034.
Kachroo, A. H., Laurent, J. M., Yellman, C. M., et al. (2015). Evolution. Systematic humanization of yeast genes reveals conserved functions and genetic modularity. Science, 348(6237), 921–925. https://doi.org/10.1126/science.aaa0769.
Kaiser, J. (2003). War on cancer. NCI goal aims for cancer victory by 2015. Science, 299, 1297–1298 (United States).
Kalmus, H. (1950). A cybernetical aspect of genetics. Journal of Heredity, 41, 19–22.
Kampourakis, K. (2017). Making sense of genes. Cambridge University Press.
Kang, H., Jung, Y. L., Mcelroy, K. A., et al. (2017). Bivalent complexes of prc1 with orthologs of brd4 and moz/morf target developmental genes in drosophila. Genes and Development, 31(19), 1988–2002. https://doi.org/10.1101/gad.305987.117.
Kathiresan, S., Willer, C. J., Peloso, G. M., et al. (2009). Common variants at 30 loci contribute to polygenic dyslipidemia. Nature Genetics, 41(1), 56–65. https://doi.org/10.1038/ng.291.
Kauffman, S. A., & Macready, W. G. (1995). Search strategies for applied molecular evolution. Journal of Theoretical Biology, 173(4), 427–440. https://doi.org/10.1006/jtbi.1995.0074.
Keim, B. (2008). Genomics started with boozing session in Maine. Wired Science.
Keller, E. F. (1993). Rethinking the meaning of genetic determinism. For the tanner lectures on human values delivered at the University of Utah. February 18, 1993.
Keller, E. F. (2009). The century of the gene. Harvard University Press.
Keller, E. F. (2011). Towards a science of informed matter. Studies in History and Philosophy of Biological and Biomedical Sciences, 42(2), 174–179. https://doi.org/10.1016/j.shpsc.2010.11.024.
Keller, E. F. (2012). Genes, genomes and genomics. Biological Theory, 6, 132–140.
Keller, E. F., & Harel, D. (2007). Beyond the gene. PLoS One, 2(11), e1231. https://doi.org/10.1371/journal.pone.0001231.
Kern, S. E. (2010). Where’s the passion? Cancer Biology and Therapy, 10(7), 655–657. https://doi.org/10.4161/cbt.10.7.12994.

Kevles, D. J., & Hood, L. E. (1992). The code of codes: Scientific and social issues in the human genome project.
Khan, R. (2011). Everything I didn’t know about sex. Retrieved from http://blogs.discovermagazine.com/gnxp/2011/07/everything-i-didnt-know-about-sex/#.XFD73PZFz3w.
Khoury, M. J., & Galea, S. (2016). Will precision medicine improve population health? Journal of the American Medical Association, 316(13), 1357–1358. https://doi.org/10.1001/jama.2016.12260.
Kim, T., Bershteyn, M., & Wynshaw-Boris, A. (2014). Chromosome therapy. Correction of large chromosomal aberrations by inducing ring chromosomes in induced pluripotent stem cells (ipscs). Nucleus, 5(5), 391–395. https://doi.org/10.4161/nucl.36300.
Kim, T., Plona, K., & Wynshaw-Boris, A. (2017). A novel system for correcting large-scale chromosomal aberrations: Ring chromosome correction via reprogramming into induced pluripotent stem cell (ipsc). Chromosoma, 126(4), 457–463. https://doi.org/10.1007/s00412-016-0621-6.
Kimura, M. (1983). The neutral theory of molecular evolution. Cambridge University Press.
King, M. (1995). Species evolution: The role of chromosome change. Cambridge University Press.
Kingsbury, M. A., Friedman, B., Mcconnell, M. J., et al. (2005). Aneuploid neurons are functionally active and integrated into brain circuitry. Proceedings of the National Academy of Sciences of the United States of America, 102(17), 6143–6147. https://doi.org/10.1073/pnas.0408171102.
Klasson, L., & Andersson, S. G. (2004). Evolution of minimal-gene-sets in host-dependent bacteria. Trends in Microbiology, 12(1), 37–43.
Klásterská, I., & Natarajan, A. T. (1975). Stickiness in rosa meiosis induced by hybridization. Caryologia, 28, 81–88.
Kleckner, N., Storlazzi, A., & Zickler, D. (2003). Coordinate variation in meiotic pachytene SC length and total crossover/chiasma frequency under conditions of constant DNA length. Trends in Genetics, 19(11), 623–628. https://doi.org/10.1016/j.tig.2003.09.004.
Klein, C. A. (2013). Selection and adaptation during metastatic cancer progression. Nature, 501, 365–372.
Kleinjan, D. A., & Lettice, L. A. (2008). Long-range gene control and genetic disease. Advances in Genetics, 61, 339–388. https://doi.org/10.1016/s0065-2660(07)00013-2.
Knauss, S., & Klein, A. (2012). From aneuploidy to cancer: The evolution of a new species? Journal of Biosciences, 37(2), 211–220.
Knudson, A. G., Jr. (1971). Mutation and cancer: Statistical study of retinoblastoma. Proceedings of the National Academy of Sciences of the United States of America, 68(4), 820–823.
Kohn, M., Hogel, J., Vogel, W., et al. (2006). Reconstruction of a 450-my-old ancestral vertebrate protokaryotype. Trends in Genetics, 22(4), 203–210. https://doi.org/10.1016/j.tig.2006.02.008.
Kok, F. O., Shin, M., Ni, C. W., et al. (2015). Reverse genetic screening reveals poor correlation between morpholino-induced and mutant phenotypes in zebrafish. Developmental Cell, 32(1), 97–108. https://doi.org/10.1016/j.devcel.2014.11.018.
Kolodkin, A., Simeonidis, E., & Westerhoff, H. V. (2013). Computing life: Add logos to biology and bios to physics. Progress in Biophysics and Molecular Biology, 111(2–3), 69–74. https://doi.org/10.1016/j.pbiomolbio.2012.10.003.
Kondrashov, A. S. (1988). Deleterious mutations and the evolution of sexual reproduction. Nature, 336(6198), 435–440. https://doi.org/10.1038/336435a0.
Konstantinidis, K. T., Ramette, A., & Tiedje, J. M. (2006). The bacterial species definition in the genomic era. Philosophical Transactions of the Royal Society of London B Biological Sciences, 361(1475), 1929–1940. https://doi.org/10.1098/rstb.2006.1920.

Konstantinidis, K. T., & Tiedje, J. M. (2005). Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences of the United States of America, 102(7), 2567–2572. https://doi.org/10.1073/pnas.0409727102.
Koonin, E. V. (2000). How many genes can make a cell: The minimal-gene-set concept. Annual Review of Genomics and Human Genetics, 1, 99–116. https://doi.org/10.1146/annurev.genom.1.1.99.
Koonin, E. V. (2003). Comparative genomics, minimal gene-sets and the last universal common ancestor. Nature Reviews Microbiology, 1(2), 127–136. https://doi.org/10.1038/nrmicro751.
Koonin, E. V. (2016). The meaning of biological information. Philos Trans A Math Phys Eng Sci, 374. https://doi.org/10.1098/rsta.2015.0065. pii: 20150065.
Kosak, S. T., & Groudine, M. (2004). Gene order and dynamic domains. Science, 306(5696), 644–647. https://doi.org/10.1126/science.1103864.
Koslik, H. J., Hamilton, G., & Golomb, B. A. (2014). Mitochondrial dysfunction in gulf war illness revealed by 31Phosphorus magnetic resonance spectroscopy: A case-control study. PLoS One, 9(3), e92887. https://doi.org/10.1371/journal.pone.0092887.
Kowald, A., & Kirkwood, T. B. (2011). Evolution of the mitochondrial fusion-fission cycle and its role in aging. Proceedings of the National Academy of Sciences of the United States of America, 108(25), 10237–10242. https://doi.org/10.1073/pnas.1101604108.
Kozlov, A. P. (2010). The possible evolutionary role of tumors in the origin of new cell types. Medical Hypotheses, 74(1), 177–185. https://doi.org/10.1016/j.mehy.2009.07.027.
Kravats, A. N., Hoskins, J. R., Reidy, M., et al. (2018). Functional and physical interaction between yeast hsp90 and Hsp70. Proceedings of the National Academy of Sciences of the United States of America, 115(10), E2210–E2219. https://doi.org/10.1073/pnas.1719969115.
Krug, A., & Jablonski, D. (2012). Long-term origination rates are reset only at mass extinctions. Geology, 40, 731–734.
Kruiswijk, F., Labuschagne, C. F., & Vousden, K. H. (2015). P53 in survival, death and metabolic health: A lifeguard with a licence to kill. Nature Reviews Molecular Cell Biology, 16(7), 393–405. https://doi.org/10.1038/nrm4007.
Kuhn, T. (1962). The structure of scientific revolutions. Chicago: The University of Chicago Press.
Kulkarni, P., Shiraishi, T., & Kulkarni, R. V. (2013). Cancer: Tilting at windmills? Molecular Cancer, 12, 108 (England).
Kultz, D. (2003). Evolution of the cellular stress proteome: From monophyletic origin to ubiquitous function. Journal of Experimental Biology, 206(Pt 18), 3119–3124.
Kultz, D. (2005). Molecular and evolutionary basis of the cellular stress response. Annual Review of Physiology, 67, 225–257. https://doi.org/10.1146/annurev.physiol.67.040403.103635.
Kumaran, M., Cass, C. E., Graham, K., et al. (2017). Germline copy number variations are associated with breast cancer risk and prognosis. Scientific Reports, 7(1), 14621. https://doi.org/10.1038/s41598-017-14799-7.
Kundaje, A., Meuleman, W., Ernst, J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature, 518(7539), 317–330. https://doi.org/10.1038/nature14248.
Kurten, B. (1959). Rates of evolution in fossil mammals. Cold Spring Harbor Symposia on Quantitative Biology, 24, 205–215.
Laland, K., Uller, T., Feldman, M., et al. (2014). Does evolutionary theory need a rethink? Nature, 514(7521), 161–164. https://doi.org/10.1038/514161a.
Lamichhaney, S., Han, F., Webster, M. T., et al. (2018). Rapid hybrid speciation in Darwin’s finches. Science, 359(6372), 224–228. https://doi.org/10.1126/science.aao4593.
Landan, G., Cohen, N. M., Mukamel, Z., et al. (2012). Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nature Genetics, 44(11), 1207–1214. https://doi.org/10.1038/ng.2442.

Lander, E. S. (2011). Initial impact of the sequencing of the human genome. Nature, 470(7333), 187–197. https://doi.org/10.1038/nature09792.
Lane, N. (2009). Why sex is worth losing your head for. New Scientist, 13, 40–43.
Larson, E. (2002). The theory of evolution: A history of controversy. The Teaching Company, LLC (P)2002 The Great Courses.
Lathe, W. C., 3rd, Snel, B., & Bork, P. (2000). Gene context conservation of a higher order than operons. Trends in Biochemical Sciences, 25(10), 474–479.
Lathi, R. B., & Milki, A. A. (2004). Rate of aneuploidy in miscarriages following in vitro fertilization and intracytoplasmic sperm injection. Fertility and Sterility, 81(5), 1270–1272. https://doi.org/10.1016/j.fertnstert.2003.09.065.
Lawrence, M. S., Stojanov, P., Mermel, C. H., et al. (2014). Discovery and saturation analysis of cancer genes across 21 tumour types. Nature, 505(7484), 495–501. https://doi.org/10.1038/nature12912.
Lawrence, M. S., Stojanov, P., Polak, P., et al. (2013). Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature, 499(7457), 214–218. https://doi.org/10.1038/nature12213.
Lawrenson, L. (2010). Tracking profiles of genomic instability in spontaneous transformation and tumorigenesis. Detroit: Wayne state university press (Ph.D and M.D), Wayne State University.
Lazebnik, Y. (2010). What are the hallmarks of cancer? Nature Reviews Cancer, 10(4), 232–233. https://doi.org/10.1038/nrc2827.
Lazebnik, Y. (2015). Are scientists a workforce? - or, how dr. Frankenstein made biomedical research sick: A proposed plan to rescue us biomedical research from its current ‘malaise’ will not be effective as it misdiagnoses the root cause of the disease. EMBO Reports, 16(12), 1592–1600. https://doi.org/10.15252/embr.201541266.
Le Page, M. (February 29, 2016). Super-fast evolving fish splitting into two species in same lake. New Scientist, Daily News.
Lee, J. M., & Sonnhammer, E. L. (2003). Genomic gene clustering analysis of pathways in eukaryotes. Genome Research, 13(5), 875–882. https://doi.org/10.1101/gr.737703.
Lemons, D., & McGinnis, W. (2006). Genomic evolution of hox gene clusters. Science, 313(5795), 1918–1922. https://doi.org/10.1126/science.1132040.
Lercher, M. J., Urrutia, A. O., & Hurst, L. D. (2002). Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nature Genetics, 31(2), 180–183. https://doi.org/10.1038/ng887.
Levin, D. A. (1978). The origin of isolating mechanisms in flowering plants. In M. Hecht, W. Steere & B. Wallace (Eds.), Evolutionary biology (pp. 185–317).
Levin, H. L., & Moran, J. V. (2011). Dynamic interactions between transposable elements and their hosts. Nature Reviews Genetics, 12(9), 615–627. https://doi.org/10.1038/nrg3030.
Levine, A. J., & Oren, M. (2009). The first 30 years of p53: Growing ever more complex. Nature Reviews Cancer, 9, 749–758 (England).
Lewontin, R. C. (1993). The doctrine of DNA: Biology as ideology. Penguin.
Lewontin, R. C. (1974). The genetic basis of evolutionary change. Columbia University Press.
Li, R., Li, Y., Zheng, H., et al. (2010). Building the sequence map of the human pan-genome. Nature Biotechnology, 28(1), 57–63. https://doi.org/10.1038/nbt.1596.
Liao, B. Y., & Zhang, J. (2008). Null mutations in human and mouse orthologs frequently result in different phenotypes. Proceedings of the National Academy of Sciences of the United States of America, 105(19), 6987–6992. https://doi.org/10.1073/pnas.0800387105.
Liehr, T. (2016). Cytogenetically visible copy number variations (cg-cnvs) in banding and molecular cytogenetics of human; about heteromorphisms and euchromatic variants. Molecular Cytogenetics, 9, 5. https://doi.org/10.1186/s13039-016-0216-1.

Liehr, T. (2017). "Classical cytogenetics" is not equal to "banding cytogenetics". Molecular Cytogenetics, 10, 3. https://doi.org/10.1186/s13039-017-0305-9.
Ling, S., Hu, Z., Yang, Z., et al. (2015). Extremely high genetic diversity in a single tumor points to prevalence of non-darwinian cell evolution. Proceedings of the National Academy of Sciences of the United States of America, 112(47), E6496–E6505. https://doi.org/10.1073/pnas.1519556112.
Liu, Z., Cai, Y., Wang, Y., et al. (2018). Cloning of macaque monkeys by somatic cell nuclear transfer. Cell, 174(1), 245. https://doi.org/10.1016/j.cell.2018.01.036.
Liu, P., Erez, A., Nagamani, S. C., et al. (2011). Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell, 146(6), 889–903. https://doi.org/10.1016/j.cell.2011.07.042.
Liu, G., Stevens, J., Horne, S., et al. (2014). Genome chaos: Survival strategy during crisis. Cell Cycle, 13, 528–537.
Liu, G., Ye, C. J., Chowdhury, S. K., et al. (2018). Detecting chromosome condensation defects in gulf war illness patients. Current Genomics, 19, 200–206.
Locke, M. (1990). Is there somatic inheritance of intracellular patterns? Journal of Cell Science, 96, 563–567.
Lodato, M. A., Woodworth, M. B., Lee, S., et al. (2015). Somatic mutation in single human neurons tracks developmental and transcriptional history. Science, 350(6256), 94–98. https://doi.org/10.1126/science.aab1785.
Loeb, L. A., Bielas, J. H., & Beckman, R. A. (2008). Cancers exhibit a mutator phenotype: Clinical implications. Cancer Research, 68(10), 3551–3557. https://doi.org/10.1158/0008-5472.can-07-5835. discussion 3557.
Loeb, L. A., Springgate, C. F., & Battula, N. (1974). Errors in DNA replication as a basis of malignant changes. Cancer Research, 34(9), 2311–2321.
Loidl, J., Scherthan, H., Den Dunnen, J. T., et al. (1995). Morphology of a human-derived yac in yeast meiosis. Chromosoma, 104(3), 183–188.
Luo, J., Sun, X., Cormack, B. P., et al. (2018). Karyotype engineering by chromosome fusion leads to reproductive isolation in yeast. Nature, 560(7718), 392–396. https://doi.org/10.1038/s41586-018-0374-x.
Lynn, A., Koehler, K. E., Judis, L., et al. (2002). Covariation of synaptonemal complex length and mammalian meiotic exchange rates. Science, 296(5576), 2222–2225. https://doi.org/10.1126/science.1071220.
MacArthur, D. G., Balasubramanian, S., Frankish, A., et al. (2012). A systematic survey of loss-of-function variants in human protein-coding genes. Science, 335(6070), 823–828. https://doi.org/10.1126/science.1215040.
MacNeill, A. (2011). Evolutionary biology, part 1: The Darwinian revolutions. Recorded Books, Inc.
Maheshwari, S., & Barbash, D. A. (2011). The genetics of hybrid incompatibilities. Annual Review of Genetics, 45, 331–355. https://doi.org/10.1146/annurev-genet-110410-132514.
Maley, C. C., Galipeau, P. C., Finley, J. C., et al. (2006). Genetic clonal diversity predicts progression to esophageal adenocarcinoma. Nature Genetics, 38(4), 468–473. https://doi.org/10.1038/ng1768.
Maley, C. C., Galipeau, P. C., Li, X., et al. (2004). The combination of genetic instability and clonal expansion predicts progression to esophageal adenocarcinoma. Cancer Research, 64(20), 7629–7633. https://doi.org/10.1158/0008-5472.can-04-1738.
Malhotra, A., Lindberg, M., Faust, G. G., et al. (2013). Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Research, 23(5), 762–776. https://doi.org/10.1101/gr.143677.112.
Mann, A. (2010). Sponge genome goes deep. Nature, 466, 673 (England).
Manolio, T. A., Collins, F. S., Cox, N. J., et al. (2009). Finding the missing heritability of complex diseases. Nature, 461(7265), 747–753. https://doi.org/10.1038/nature08494.

Marchetti, F., & Wyrobek, A. J. (2005). Mechanisms and consequences of paternally-transmitted chromosomal abnormalities. Birth Defects Research C Embryo Today, 75(2), 112–129. https://doi.org/10.1002/bdrc.20040.
Mardis, E. R. (2010). The $1,000 genome, the $100,000 analysis? Genome Medicine, 2(11), 84. https://doi.org/10.1186/gm205.
Margulis, L., & Sagan, D. (2003). Acquiring genomes: A theory of the origin of species. Basic Books.
Mariotto, A. B., Yabroff, K. R., Shao, Y., et al. (2011). Projections of the cost of cancer care in the United States: 2010-2020. Journal of the National Cancer Institute, 103(2), 117–128. https://doi.org/10.1093/jnci/djq495.
Markowetz, F. (2016). A saltationist theory of cancer evolution. Nature Genetics, 48(10), 1102–1103. https://doi.org/10.1038/ng.3687.
Marques, D. A., Lucek, K., Meier, J. I., et al. (2016). Genomics of rapid incipient speciation in sympatric threespine stickleback. PLoS Genetics, 12(2), e1005887. https://doi.org/10.1371/journal.pgen.1005887.
Martincorena, I., Fowler, J. C., Wabik, A., et al. (2018). Somatic mutant clones colonize the human esophagus with age. Science, 362(6417), 911–917. https://doi.org/10.1126/science.aau3879.
Martincorena, I., Roshan, A., Gerstung, M., et al. (2015). Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science, 348(6237), 880–886. https://doi.org/10.1126/science.aaa6806.
Martinez-Castro, P., Ramos, M. C., Rey, J. A., et al. (1984). Homozygosity for a robertsonian translocation (13q14q) in three offspring of heterozygous parents. Cytogenetics and Cell Genetics, 38(4), 310–312. https://doi.org/10.1159/000132080.
Mattick, J. S. (2010). Rna as the substrate for epigenome-environment interactions: Rna guidance of epigenetic processes and the expansion of rna editing in animals underpins development, phenotypic plasticity, learning, and cognition. BioEssays, 32(7), 548–552. https://doi.org/10.1002/bies.201000028.
Maynard Smith, J. (1968). Mathematical Ideas in Biology. Cambridge University Press, ISBN 0521-07335-9.
Maynard Smith, J. (1978). The evolution of sex. Cambridge: Cambridge University Press.
Mayr, E. (1963). Animal species and evolution. Cambridge: Belknap Press of Harvard University Press.
Mayr, E. (1988). Toward a new philosophy of biology: Observations of an evolutionist. Harvard University Press.
Mayr, E. (1997). The objects of selection. Proceedings of the National Academy of Sciences of the United States of America, 94(6), 2091–2094.
Mayr, E. (2001). What evolution is. Basic Books.
Mazor, T., Pankov, A., Song, J. S., et al. (2016). Intratumoral heterogeneity of the epigenome. Cancer Cell, 29(4), 440–451. https://doi.org/10.1016/j.ccell.2016.03.009.
McCann, J., Choi, E., Yamasaki, E., et al. (1975). Detection of carcinogens as mutagens in the salmonella/microsome test: Assay of 300 chemicals. Proceedings of the National Academy of Sciences of the United States of America, 72(12), 5135–5139.
McClellan, J., & King, M. C. (2010). Genetic heterogeneity in human disease. Cell, 141(2), 210–217. https://doi.org/10.1016/j.cell.2010.03.032.
McClintock, B. (1980). Modified gene expressions induced by transposable elements. In W. Scott, R. Werner, R. Joseph & J. Schultz (Eds.), Miami winter symposium #17: Mobilization and reassembly of genetic information (pp. 11–19). New York: Academic Press, Inc.
McClintock, B. (1984). The significance of responses of the genome to challenge. Science, 226(4676), 792–801.
McGee, M. D., Borstein, S. R., Neches, R. Y., et al. (2015). A pharyngeal jaw evolutionary innovation facilitated extinction in lake victoria cichlids. Science, 350(6264), 1077–1079. https://doi.org/10.1126/science.aab0800.

McKusick, V. A., & Ruddle, F. H. (1987). A new discipline, a new name, a new journal. Genomics, 1(1), 1–2. https://doi.org/10.1016/0888-7543(87)90098-X.
Meaburn, K. J., Misteli, T., & Soutoglou, E. (2007). Spatial genome organization in the formation of chromosomal translocations. Seminars in Cancer Biology, 17(1), 80–90. https://doi.org/10.1016/j.semcancer.2006.10.008.
Medicine. UNLO. What is a genome? Retrieved from https://ghr.nlm.nih.gov/primer/hgp/genome.
Meier, J. I., Marques, D. A., Mwaiko, S., et al. (2017). Ancient hybridization fuels rapid cichlid fish adaptive radiations. Nature Communications, 8, 14363. https://doi.org/10.1038/ncomms14363.
Mendel, G. (1866). Experiments in plant hybridization. In J. Peters (Ed.), Classic papers in genetics (pp. 1–20). Englewood Cliffs: Prentice-Hall.
Metzger, M. J., & Goff, S. P. (2016). A sixth modality of infectious disease: Contagious cancer from devils to clams and beyond. PLoS Pathogens, 12(10), e1005904. https://doi.org/10.1371/journal.ppat.1005904.
Michor, F., Frank, S. A., May, R. M., et al. (2003). Somatic selection for and against cancer. Journal of Theoretical Biology, 225(3), 377–382.
Miescher, F. (2015). Adaptation and speciation mechanisms in sticklebacks. Retrieved from https://www.mpg.de/9269898/adaptation-speciation-mechanisms-sticklebacks.
Miklos, G. (2005). The human genome project - one more misstep in the war on cancer. Nature Biotechnology, (23), 535–537.
Milholland, B., Dong, X., Zhang, L., et al. (2017). Differences between germline and somatic mutation rates in humans and mice. Nature Communications, 8, 15183. https://doi.org/10.1038/ncomms15183.
Miller, D., Ostermeier, G. C., & Krawetz, S. A. (2005). The controversy, potential and roles of spermatozoal rna. Trends in Molecular Medicine, 11(4), 156–163. https://doi.org/10.1016/j.molmed.2005.02.006.
Milton, J. (2010). Variety sparks sexual evolution: A changeable environment encourages a move away from asexual reproduction. Nature. https://doi.org/10.1038/news.2010.535, 13 October 2010.
Mingorance, J., & Tamames, J. (2004). The bacterial dcw gene cluster: An island in the genome? In M. Vicente, J. Tamames, A. Valencia & J. Mingorance (Eds.), Molecules in time and space, bacterial shape, division and phylogeny. New York: Kluwer Academic, 2004.
Misteli, T. (2005). Concepts in nuclear architecture. BioEssays, 27(5), 477–487. https://doi.org/10.1002/bies.20226.
Mitelman, F. (2000). Recurrent chromosome aberrations in cancer. Mutation Research, 462(2–3), 247–253.
Mitelman database of chromosome aberrations and gene fusion in cancer. https://cgap.nci.nih.gov/Chromosomes/Mitelman.
Mittra, I., Khare, N. K., Raghuram, G. V., et al. (2015). Circulating nucleic acids damage DNA of healthy cells by integrating into their genomes. Journal of Biosciences, 40(1), 91–111.
Mittra, I., Samant, U., Sharma, S., et al. (2017). Cell-free chromatin from dying cancer cells integrate into genomes of bystander healthy cells to induce DNA damage and inflammation. Cell Death Discovery, 3, 17015. https://doi.org/10.1038/cddiscovery.2017.15.
Moens, P. B., & Pearlman, R. E. (1989). Satellite DNA in chromatin loops of rat pachytene chromosomes and in spermatids. Chromosoma, 98(4), 287–294.
Morgan, T. H. (1917). The theory of the gene. The American Naturalist, 51, 513–544.
Morgan, T. H. (1926). The theory of the gene. New Haven: Yale University Press.
Morokuma, J., Durant, F., Williams, K. B., et al. (2017). Planarian regeneration in space: Persistent anatomical, behavioral, and bacteriological changes induced by space travel. Regeneration (Oxf), 4(2), 85–102. https://doi.org/10.1002/reg2.79.

Morran, L. T., Schmidt, O. G., Gelarden, I. A., et al. (2011). Running with the red queen: Host-parasite coevolution selects for biparental sex. Science, 333(6039), 216–218. https://doi.org/10.1126/science.1206360.
Mullard, A. (2008). The genes that drive speciation. Nature News. https://doi.org/10.1038/news.2008.1297. Published online.
Mullard, A. (2011). Reliability of ‘new drug target’ claims called into question. Nature Reviews Drug Discovery, 10, 643–644 (England).
Muller, H. J. (1932). Some genetic aspects of sex. Amer Natur, 66, 118–138.
Muller, H. (1958). Evolution by mutation. Bulletin of the American Mathematical Society, 64, 137–160.
Müller, G. B. (2007). Evo-devo: Extending the evolutionary synthesis. Nature Reviews Genetics, 8(12), 943–949. https://doi.org/10.1038/nrg2219.
Murphy, W. J., Larkin, D. M., Everts-Van Der Wind, A., et al. (2005). Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science, 309(5734), 613–617. https://doi.org/10.1126/science.1111387.
Myles, S., Boyko, A. R., Owens, C. L., et al. (2011). Genetic structure and domestication history of the grape. Proceedings of the National Academy of Sciences of the United States of America, 108(9), 3530–3535. https://doi.org/10.1073/pnas.1009363108.
Nakayama, H., Nakayama, N., Seiki, S., et al. (2014). Regulation of the knox-ga gene module induces heterophyllic alteration in north american lake cress. The Plant Cell Online, 26(12), 4733–4748. https://doi.org/10.1105/tpc.114.130229.
Nakayama, H., Sinha, N. R., & Kimura, S. (2017). How do plants and phytohormones accomplish heterophylly, leaf phenotypic plasticity, in response to environmental cues. Frontiers of Plant Science, 8, 1717. https://doi.org/10.3389/fpls.2017.01717.
Narasimhan, V. M., Hunt, K. A., Mason, D., et al. (2016). Health and population effects of rare gene knockouts in adult humans with related parents. Science, 352(6284), 474–477. https://doi.org/10.1126/science.aac8624.
Navarro, A., & Barton, N. H. (2003). Chromosomal speciation and molecular divergence – accelerated evolution in rearranged chromosomes. Science, 300(5617), 321–324. https://doi.org/10.1126/science.1080600.
Navin, N., Kendall, J., Troge, J., et al. (2011). Tumour evolution inferred by single-cell sequencing. Nature, 472(7341), 90–94. https://doi.org/10.1038/nature09807.
Nazaryan-Petersen, L., Bertelsen, B., Bak, M., et al. (2016). Germline chromothripsis driven by l1-mediated retrotransposition and alu/alu homologous recombination. Human Mutation, 37(4), 385–395. https://doi.org/10.1002/humu.22953.
Nealson, K. H., & Venter, J. C. (2007). Metagenomics and the global ocean survey: What’s in it for us, and why should we care? The ISME Journal, 1(3), 185–187. https://doi.org/10.1038/ismej.2007.43.
Nesse, R. M., Bergstrom, C. T., Ellison, P. T., et al. (2010). Evolution in health and medicine sackler colloquium: Making evolutionary biology a basic science for medicine. Proceedings of the National Academy of Sciences of the United States of America, 107(Suppl. 1), 1800–1807. https://doi.org/10.1073/pnas.0906224106.
Network, C. G. A. (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487(7407), 330–337. https://doi.org/10.1038/nature11252.
Nevo, E., Filippucci, M. G., Redi, C., et al. (1994). Chromosomal speciation and adaptive radiation of mole rats in asia minor correlated with increased ecological stress. Proceedings of the National Academy of Sciences of the United States of America, 91(17), 8160–8164.
Nguyen, P., Sýkorová, M., Šíchová, J., et al. (2013). Neo-sex chromosomes and adaptive potential in tortricid pests. Proceedings of the National Academy of Sciences of the United States of America, 110(17), 6931–6936. https://doi.org/10.1073/pnas.1220372110. Epub 2013 Apr 8.

Nichol, S. T., Rowe, J. E., & Fitch, W. M. (1993). Punctuated equilibrium and positive Darwinian evolution in vesicular stomatitis virus. Proceedings of the National Academy of Sciences of the United States of America, 90(22), 10424–10428.
Niederwieser, C., Nicolet, D., Carroll, A. J., et al. (2016). Chromosome abnormalities at onset of complete remission are associated with worse outcome in patients with acute myeloid leukemia and an abnormal karyotype at diagnosis: Calgb 8461 (alliance). Haematologica, 101(12), 1516–1523. https://doi.org/10.3324/haematol.2016.149542.
Nikolaichik, Y. A., & Donachie, W. D. (2000). Conservation of gene order amongst cell wall and cell division genes in eubacteria, and ribosomal genes in eubacteria and eukaryotic organelles. Genetica, 108(1), 1–7.
Noble, D. (2008). Genes and causation. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 366(1878), 3001–3015. https://doi.org/10.1098/rsta.2008.0086.
Noble, D. (2013). Physiology is rocking the foundations of evolutionary biology. Experimental Physiology, 98(8), 1235–1243. https://doi.org/10.1113/expphysiol.2012.071134.
Noor, M. A., & Feder, J. L. (2006). Speciation genetics: Evolving approaches. Nature Reviews Genetics, 7(11), 851–861. https://doi.org/10.1038/nrg1968.
Noor, M. A., Grams, K. L., Bertucci, L. A., et al. (2001). Chromosomal inversions and the reproductive isolation of species. Proceedings of the National Academy of Sciences of the United States of America, 98(21), 12084–12088. https://doi.org/10.1073/pnas.221274498.
Nordling, C. O. (1953). A new theory on cancer-inducing mechanism. British Journal of Cancer, 7(1), 68–72.
Norman, T. M., Lord, N. D., Paulsson, J., et al. (2015). Stochastic switching of cell fate in microbes. Annual Review of Microbiology, 69, 381–403. https://doi.org/10.1146/annurev-micro-091213-112852.
Nosil, P., & Schluter, D. (2011). The genes underlying the process of speciation. Trends in Ecology and Evolution, 26(4), 160–167. https://doi.org/10.1016/j.tree.2011.01.001.
Nowak, M. A. (2006). Five rules for the evolution of cooperation. Science, 314(5805), 1560–1563. https://doi.org/10.1126/science.1133755.
Nowell, P. C. (1976). The clonal evolution of tumor cell populations. Science, 194(4260), 23–28.
Nowell, P. C., & Hungerford, D. A. (1960). Chromosome studies on normal and leukemic human leukocytes. Journal of the National Cancer Institute, 25, 85–109.
Ochman, H., & Davalos, L. M. (2006). The nature and dynamics of bacterial genomes. Science, 311(5768), 1730–1733. https://doi.org/10.1126/science.1119966.
Olejniczak, M., Urbanek, M. O., Jaworska, E., et al. (2016). Sequence-non-specific effects generated by various types of rna interference triggers. Biochimica et Biophysica Acta, 1859(2), 306–314. https://doi.org/10.1016/j.bbagrm.2015.11.005.
Omholt, S. W. (2013). From sequence to consequence and back. Progress in Biophysics and Molecular Biology, 111(2–3), 75–82. https://doi.org/10.1016/j.pbiomolbio.2012.09.003.
Ooi, S. K., Wolf, D., Hartung, O., et al. (2010). Dynamic instability of genomic methylation patterns in pluripotent stem cells. Epigenetics and Chromatin, 3(1), 17. https://doi.org/10.1186/1756-8935-3-17.
Orgel, L. (1973). The origins of life: Molecules and natural selection.
Ostrander, E. A., Wayne, R. K., Freedman, A. H., et al. (2017). Demographic history, selection and functional diversity of the canine genome. Nature Reviews Genetics, 18(12), 705–720. https://doi.org/10.1038/nrg.2017.67.
Ottesen, E. A., Hong, J. W., Quake, S. R., et al. (2006). Microfluidic digital pcr enables multigene analysis of individual environmental bacteria. Science, 314(5804), 1464–1467. https://doi.org/10.1126/science.1131370.
Otto, S. (2008). Sexual reproduction and the evolution of sex. Nature Education, 1, 1.
Otto, S., & Lenormand, T. (2002). Resolving the paradox of sex and recombination. Nature Reviews Genetics, 3(4), 252–261. https://doi.org/10.1038/nrg761.

Palazzo, A. F., & Lee, E. S. (2015). Non-coding rna: What is functional and what is junk? Frontiers in Genetics, 6, 2. https://doi.org/10.3389/fgene.2015.00002.
Palsson, B. (2015). Systems biology: Constraint-based reconstruction and analysis. Cambridge: Cambridge University Press.
Parada, L. A., McQueen, P. G., & Misteli, T. (2004). Tissue-specific spatial organization of genomes. Genome Biology, 5(7), R44.
Parihar, V. K., Hattiangady, B., Shuai, B., et al. (2013). Mood and memory deficits in a model of gulf war illness are linked with reduced neurogenesis, partial neuron loss, and mild inflammation in the hippocampus. Neuropsychopharmacology, 38(12), 2348–2362. https://doi.org/10.1038/npp.2013.158.
Parra, I., & Windle, B. (1993). High resolution visual mapping of stretched DNA by fluorescent hybridization. Nature Genetics, 5(1), 17–21. https://doi.org/10.1038/ng0993-17.
Pauling, L., Itano, H. A., Singer, S. J., et al. (1949). Sickle cell anemia, a molecular disease. Science, 110(2865), 543–548. https://doi.org/10.1126/science.110.2865.543.
Pavelka, N., Rancati, G., Zhu, J., et al. (2010). Aneuploidy confers quantitative proteome changes and phenotypic variation in budding yeast. Nature, 468(7321), 321–325. https://doi.org/10.1038/nature09529.
Paynter, N. P., Chasman, D. I., Paré, G., et al. (2010). Association between a literature-based genetic risk score and cardiovascular events in women. Journal of the American Medical Association, 303(7), 631–637. https://doi.org/10.1001/jama.2010.119.
Pellestor, F., Gatinois, V., Puechberty, J., et al. (2014). Chromothripsis: Potential origin in gametogenesis and preimplantation cell divisions. A review. Fertility and Sterility, 102(6), 1785–1796. https://doi.org/10.1016/j.fertnstert.2014.09.006.
Pellestor, F., & Gatinois, V. (2018). Chromoanasynthesis: Another way for the formation of complex chromosomal abnormalities in human reproduction. Human Reproduction, 33, 1381–1387. https://doi.org/10.1093/humrep/dey231.
Pennisi, E. (2010). Shining a light on the genome’s ‘dark matter’. Science, 330(6011), 1614. https://doi.org/10.1126/science.330.6011.1614.
Pennisi, E. (2012). Genomics. Encode project writes eulogy for junk DNA. Science, 337(6099), 1159–1161. https://doi.org/10.1126/science.337.6099.1159.
Pennisi, E. (2018). Human mutation rate a legacy from our past. Science, 360, 143 (United States).
Pepper, J. W. (2012). Drugs that target pathogen public goods are robust against evolved drug resistance. Evolutionary Applications, 5(7), 757–761. https://doi.org/10.1111/j.1752-4571.2012.00254.x.
Pertea, M., & Salzberg, S. L. (2010). Between a chicken and a grape: Estimating the number of human genes. Genome Biology, 11(5), 206. https://doi.org/10.1186/gb-2010-11-5-206.
Pessim, C., Pagliarini, M. S., Silva, N., et al. (2015). Chromosome stickiness impairs meiosis and influences reproductive success in panicum maximum (poaceae) hybrid plants. Genetics and Molecular Research, 14(2), 4195–4202. https://doi.org/10.4238/2015.April.28.2.
Peters, J. (1959). Classic papers in genetics. Englewood Cliffs: Prentice-Hall.
Pieau, C., Dorizzi, M., & Richard-Mercier, N. (1999). Temperature-dependent sex determination and gonadal differentiation in reptiles. Cellular and Molecular Life Sciences, 55(6–7), 887–900.
Pigliucci, M. (2007). Do we need an extended evolutionary synthesis? Evolution, 61(12), 2743–2749. https://doi.org/10.1111/j.1558-5646.2007.00246.x.
Pigliucci, M. (2010). Genotype-phenotype mapping and the end of the ‘genes as blueprint’ metaphor. Philosophical Transactions of the Royal Society of London B Biological Sciences, 365(1540), 557–566. https://doi.org/10.1098/rstb.2009.0241.
Pikor, L., Thu, K., Vucic, E., et al. (2013). The detection and implication of genome instability in cancer. Cancer and Metastasis Reviews, 32(3–4), 341–352. https://doi.org/10.1007/s10555-013-9429-5.

Pisco, A. O., Brock, A., Zhou, J., et al. (2013). Non-Darwinian dynamics in therapy-induced cancer drug resistance. Nature Communications, 4, 2467.
Ponder, R. G., Fonville, N. C., & Rosenberg, S. M. (2005). A switch from high-fidelity to error-prone DNA double-strand break repair underlies stress-induced mutation. Molecular Cell, 19(6), 791–804. https://doi.org/10.1016/j.molcel.2005.07.025.
Poot, M. (2017). Retrotransposing gremlins may disrupt our brain’s genomes. Molecular Syndromology, 8, 55–57 (Switzerland).
Poot, M., & Haaf, T. (2015). Mechanisms of origin, phenotypic effects and diagnostic implications of complex chromosome rearrangements. Molecular Syndromology, 6(3), 110–134. https://doi.org/10.1159/000438812.
Popa, O., & Dagan, T. (2011). Trends and barriers to lateral gene transfer in prokaryotes. Current Opinion in Microbiology, 14(5), 615–623. https://doi.org/10.1016/j.mib.2011.07.027.
Popa, O., Hazkani-Covo, E., Landan, G., et al. (2011). Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Research, 21(4), 599–609. https://doi.org/10.1101/gr.115592.110.
Popper, K. R. (1996). In search of a better world: Lectures and essays from thirty years. Routledge.
Poyatos, J. F., & Hurst, L. D. (2007). The determinants of gene order conservation in yeasts. Genome Biology, 8(11), R233. https://doi.org/10.1186/gb-2007-8-11-r233.
Prasad, N. G., Dey, S., Joshi, A., et al. (2015). Rethinking inheritance, yet again: Inheritomes, contextomes and dynamic phenotypes. Journal of Genetics, 94(3), 367–376.
Puig, M., Casillas, S., Villatoro, S., et al. (2015). Human inversions and their functional consequences. Briefings in Functional Genomics, 14(5), 369–379. https://doi.org/10.1093/bfgp/elv020.
Putnam, N. H., Srivastava, M., Hellsten, U., et al. (2007). Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science, 317(5834), 86–94.
Queitsch, C., Sangster, T. A., & Lindquist, S. (2002). Hsp90 as a capacitor of phenotypic variation. Nature, 417(6889), 618–624. https://doi.org/10.1038/nature749.
Rabl, C. (1885). Über Zelltheilung. In C. Gegenbaur (Ed.), Morphologisches Jahrbuch (Vol. 10, pp. 214–330).
Radich, J. P. (2007). The biology of CML blast crisis. Hematology-American Society of Hematology Education Program, 384–391.
Radick, G. (2015). History of science. Beyond the “Mendel-Fisher controversy”. Science, 350(6257), 159–160. https://doi.org/10.1126/science.aab3846.
Rajapakse, I., Perlman, M. D., Scalzo, D., et al. (2009). The emergence of lineage-specific chromosomal topologies from coordinate gene regulation. Proceedings of the National Academy of Sciences of the United States of America, 106(16), 6679–6684. https://doi.org/10.1073/pnas.0900986106.
Rakyan, V. K., Chong, S., Champ, M. E., et al. (2003). Transgenerational inheritance of epigenetic states at the murine axin(fu) allele occurs after maternal and paternal transmission. Proceedings of the National Academy of Sciences of the United States of America, 100(5), 2538–2543. https://doi.org/10.1073/pnas.0436776100.
Ramaswami, R., Bayer, R., & Galea, S. (2018). Precision medicine from a public health perspective. Annual Review of Public Health, 39, 153–168. https://doi.org/10.1146/annurev-publhealth-040617-014158.
Ramos, S., Navarrete-Meneses, P., Molina, B., et al. (2018). Genomic chaos in peripheral blood lymphocytes of Hodgkin’s lymphoma patients one year after ABVD chemotherapy/radiotherapy. Environmental and Molecular Mutagenesis, 59(8), 755–768. https://doi.org/10.1002/em.22216xu.
Rancati, G., Pavelka, N., Fleharty, B., et al. (2008). Aneuploidy underlies rapid adaptive evolution of yeast cells deprived of a conserved cytokinesis motor. Cell, 135(5), 879–893. https://doi.org/10.1016/j.cell.2008.09.039.

Rangel, N., Forero-Castro, M., & Rondon-Lagos, M. (2017). New insights in the cytogenetic practice: Karyotypic chaos, non-clonal chromosomal alterations and chromosomal instability in human cancer and therapy response. Genes (Basel), 8(6). https://doi.org/10.3390/genes8060155.
Ranz, J. M., Maurin, D., Chan, Y. S., et al. (2007). Principles of genome evolution in the Drosophila melanogaster species group. PLoS Biology, 5(6), e152. https://doi.org/10.1371/journal.pbio.0050152.
Rasnick, D. (2011). The chromosomal imbalance theory of cancer: The autocatalyzed progression of aneuploidy is carcinogenesis. CRC Press.
Raynes, Y., & Weinreich, D. M. (2018). Genomic clustering of fitness-affecting mutations favors the evolution of chromosomal instability. Evolutionary Applications, 12(2), 301–313. https://doi.org/10.1111/eva.12717.
Rebollo, R., Horard, B., Hubert, B., et al. (2010). Jumping genes and epigenetics: Towards new species. Gene, 454(1–2), 1–7. https://doi.org/10.1016/j.gene.2010.01.003.
Redpath, J. L., Bengtsson, U., Desimone, J., et al. (2003). Sticky anaphase aberrations after G2-phase arrest of gamma-irradiated human skin fibroblasts: TP53 independence of formation and TP53 dependence of consequences. Radiation Research, 159(1), 57–71.
Redpath, J. L., Short, S. C., Woodcock, M., et al. (2003). Low-dose reduction in transformation frequency compared to unirradiated controls: The role of hyper-radiosensitivity to cell death. Radiation Research, 159(3), 433–436.
Rehen, S. K., McConnell, M. J., Kaushal, D., et al. (2001). Chromosomal variation in neurons of the developing and adult mammalian nervous system. Proceedings of the National Academy of Sciences of the United States of America, 98(23), 13361–13366. https://doi.org/10.1073/pnas.231487398.
Rehen, S. K., Yung, Y. C., McCreight, M. P., et al. (2005). Constitutional aneuploidy in the normal human brain. Journal of Neuroscience, 25(9), 2176–2180. https://doi.org/10.1523/jneurosci.4560-04.2005.
Reid, J. B., & Ross, J. J. (2011). Mendel’s genes: Toward a full molecular characterization. Genetics, 189(1), 3–10. https://doi.org/10.1534/genetics.111.132118.
Reilly, M. T., Faulkner, G. J., Dubnau, J., et al. (2013). The role of transposable elements in health and diseases of the central nervous system. Journal of Neuroscience, 33(45), 17577–17586. https://doi.org/10.1523/jneurosci.3369-13.2013.
Research Advisory Committee on Gulf War Veterans’ Illnesses. (2008). Annual report. https://www.va.gov/RAC-GWVI/docs/Committee_Documents/AnnualReport_Dec2008.pdf.
Reuter, J. A., Spacek, D. V., & Snyder, M. P. (2015). High-throughput sequencing technologies. Molecular Cell, 58(4), 586–597. https://doi.org/10.1016/j.molcel.2015.05.004.
Ridley, M. (1994). The red queen: Sex and the evolution of human nature. Penguin Books Limited.
Ridley, M. (2001). The cooperative gene: How Mendel’s demon explains the evolution of complex beings. Free Press.
Rieseberg, L. H., & Blackman, B. K. (2010). Speciation genes in plants. Annals of Botany, 106(3), 439–455. https://doi.org/10.1093/aob/mcq126.
Riesgo, A., Farrar, N., Windsor, P. J., et al. (2014). The analysis of eight transcriptomes from all poriferan classes reveals surprising genetic complexity in sponges. Molecular Biology and Evolution, 31(5), 1102–1120. https://doi.org/10.1093/molbev/msu057.
Righolt, C., & Mai, S. (2012). Shattered and stitched chromosomes-chromothripsis and chromoanasynthesis-manifestations of a new chromosome crisis? Genes Chromosomes Cancer, 51(11), 975–981. https://doi.org/10.1002/gcc.21981.
Roberts, P. C., Mottillo, E. P., Baxa, A. C., et al. (2005). Sequential molecular and cellular events during neoplastic progression: A mouse syngeneic ovarian cancer model. Neoplasia, 7(10), 944–956.

Robertson, M. (1986). The proper study of mankind. Nature, 322(6074), 11. https://doi.org/10.1038/322011a0.
Rogers, M. B., Hilley, J. D., Dickens, N. J., et al. (2011). Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania. Genome Research, 21(12), 2129–2142. https://doi.org/10.1101/gr.122945.111.
Roix, J. J., McQueen, P. G., Munson, P. J., et al. (2003). Spatial proximity of translocation-prone gene loci in human lymphomas. Nature Genetics, 34(3), 287–291. https://doi.org/10.1038/ng1177.
Romanes, G. J. (1886). The world as an eject. Contemporary Review, 50, 44–59.
Rommens, J. M., Iannuzzi, M. C., Kerem, B., et al. (1989). Identification of the cystic fibrosis gene: Chromosome walking and jumping. Science, 245(4922), 1059–1065.
Rossi, A., Kontarakis, Z., Gerri, C., et al. (2015). Genetic compensation induced by deleterious mutations but not gene knockdowns. Nature, 524(7564), 230–233. https://doi.org/10.1038/nature14580.
Rowley, J. D. (1973). Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature, 243(5405), 290–293.
Rowley, J. D. (1998). The critical role of chromosome translocations in human leukemias. Annual Review of Genetics, 32, 495–519. https://doi.org/10.1146/annurev.genet.32.1.495.
Rowley, J. D. (2013). Genetics. A story of swapped ends. Science, 340(6139), 1412–1413. https://doi.org/10.1126/science.1241318.
Sansone, P., Savini, C., Kurelac, I., et al. (2017). Packaging and transfer of mitochondrial DNA via exosomes regulate escape from dormancy in hormonal therapy-resistant breast cancer. Proceedings of the National Academy of Sciences of the United States of America, 114(43), E9066–E9075. https://doi.org/10.1073/pnas.1704862114.
Schlichting, C. D., & Levin, D. A. (1986). Effects of inbreeding on phenotypic plasticity in cultivated Phlox. Theoretical and Applied Genetics, 72(1), 114–119. https://doi.org/10.1007/BF00261465.
Schlichting, C., & Pigliucci, M. (1998). Phenotypic evolution: A reaction norm perspective. Sinauer.
Schmid, M., Davison, T. S., Henz, S. R., et al. (2005). A gene expression map of Arabidopsis thaliana development. Nature Genetics, 37(5), 501–506. https://doi.org/10.1038/ng1543.
Schmitt, A. D., Hu, M., & Ren, B. (2016). Genome-wide mapping and analysis of chromosome architecture. Nature Reviews Molecular Cell Biology, 17(12), 743–755. https://doi.org/10.1038/nrm.2016.104.
Schrock, E., Du Manoir, S., Veldman, T., et al. (1996). Multicolor spectral karyotyping of human chromosomes. Science, 273(5274), 494–497.
Schukken, K. M., & Foijer, F. (2018). CIN and aneuploidy: Different concepts, different consequences. BioEssays, 40(1). https://doi.org/10.1002/bies.201700147.
Schultz, M. D., He, Y., Whitaker, J. W., et al. (2015). Human body epigenome maps reveal noncanonical DNA methylation variation. Nature, 523(7559), 212–216. https://doi.org/10.1038/nature14465.
Science Daily. (2017). Human and sponges share gene regulation. April 11, 2017. https://www.sciencedaily.com/releases/2017/04/170411104532.htm.
Science News. (2009). Parasites may have had role in evolution of sex. July 31, 2009. https://www.sciencedaily.com/releases/2009/07/090706171542.htm.
Sebat, J., Lakshmi, B., Troge, J., et al. (2004). Large-scale copy number polymorphism in the human genome. Science, 305(5683), 525–528. https://doi.org/10.1126/science.1098918.
Semsarian, C., & Seidman, C. E. (2001). Molecular medicine in the 21st century. Internal Medicine Journal, 31(1), 53–59. https://doi.org/10.1046/j.1445-5994.2001.00001.x.
Setlur, S. R., & Lee, C. (2012). Tumor archaeology reveals that mutations love company. Cell, 149(5), 959–961. https://doi.org/10.1016/j.cell.2012.05.010.

Sgaramella, V., & Astolfi, P. A. (2010). Somatic genome variations interact with environment, genome and epigenome in the determination of the phenotype: A paradigm shift in genomics? DNA Repair (Amst), 9(4), 470–473. https://doi.org/10.1016/j.dnarep.2009.11.011.
Shachaf, C. M., Kopelman, A. M., Arvanitis, C., et al. (2004). Myc inactivation uncovers pluripotent differentiation and tumour dormancy in hepatocellular cancer. Nature, 431(7012), 1112–1117. https://doi.org/10.1038/nature03043.
Shao, Y., Lu, N., Wu, Z., et al. (2018). Creating a functional single-chromosome yeast. Nature, 560(7718), 331–335. https://doi.org/10.1038/s41586-018-0382-x.
Shaw, A. T., & Solomon, B. (2011). Targeting anaplastic lymphoma kinase in lung cancer. Clinical Cancer Research, 17(8), 2081–2086. https://doi.org/10.1158/1078-0432.CCR-10-1591.
Shcherbakov, V. P. (2010). Biological species is the only possible form of existence for higher organisms: The evolutionary meaning of sexual reproduction. Biology Direct, 5, 14. https://doi.org/10.1186/1745-6150-5-14.
Sheltzer, J. M., Blank, H. M., Pfau, S. J., et al. (2011). Aneuploidy drives genomic instability in yeast. Science, 333(6045), 1026–1030. https://doi.org/10.1126/science.1206412.
Shen, K. C., Heng, H., Wang, Y., et al. (2005). Atm and p21 cooperate to suppress aneuploidy and subsequent tumor development. Cancer Research, 65(19), 8747–8753. https://doi.org/10.1158/0008-5472.can-05-1471.
Shield, W. (1988). Sex and adaptation. In R. Michod & B. Levin (Eds.), The evolution of sex: An examination of current ideas. Sunderland, MA: Sinauer.
Shields, W. (1982). Philopatry, inbreeding, and the evolution of sex. Albany: State Univ. of New York Press.
Siegel, J. J., & Amon, A. (2012). New insights into the troubles of aneuploidy. Annual Review of Cell and Developmental Biology, 28, 189–214. https://doi.org/10.1146/annurev-cellbio-101011-155807.
Silk, A. D., Zasadil, L. M., Holland, A. J., et al. (2013). Chromosome missegregation rate predicts whether aneuploidy will promote or suppress tumors. Proceedings of the National Academy of Sciences of the United States of America, 110(44), E4134–E4141. https://doi.org/10.1073/pnas.1317042110.
Silver, D., Huang, A., Maddison, C. J., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961.
Simons, B. D. (2016). Deep sequencing as a probe of normal stem cell fate and preneoplasia in human epidermis. Proceedings of the National Academy of Sciences of the United States of America, 113(1), 128–133. https://doi.org/10.1073/pnas.1516123113.
Singer, E. (2016). The legendary biologists who clocked evolution’s astonishing speed. Wired Magazine.
Singh, R. J., & Nelson, R. L. (2015). Intersubgeneric hybridization between Glycine max and G. tomentella: Production of F1, amphidiploid, BC1, BC2, BC3, and fertile soybean plants. Theoretical and Applied Genetics, 128(6), 1117–1136. https://doi.org/10.1007/s00122-015-2494-0.
Sinsheimer, R. L. (2006). To reveal the genomes. The American Journal of Human Genetics, 79(2), 194–196. https://doi.org/10.1086/505887.
Sjoblom, T., Jones, S., Wood, L. D., et al. (2006). The consensus coding sequences of human breast and colorectal cancers. Science, 314(5797), 268–274. https://doi.org/10.1126/science.1133427.
Skinner, B. M., & Griffin, D. K. (2012). Intrachromosomal rearrangements in avian genome evolution: Evidence for regions prone to breakpoints. Heredity (Edinb), 108(1), 37–41. https://doi.org/10.1038/hdy.2011.99.

Skorski, T. (2011). Chronic myeloid leukemia cells refractory/resistant to tyrosine kinase inhibitors are genetically unstable and may cause relapse and malignant progression to the terminal disease state. Leukemia and Lymphoma, 52, 23–29.
Slezak, M. (2014). Monster cancer chromosome is made from shattered DNA. New Scientist.
Smigrodzki, R. M., & Khan, S. M. (2005). Mitochondrial microheteroplasmy and a theory of aging and age-related disease. Rejuvenation Research, 8(3), 172–198. https://doi.org/10.1089/rej.2005.8.172.
Smith, L., Plug, A., & Thayer, M. (2001). Delayed replication timing leads to delayed mitotic chromosome condensation and chromosomal instability of chromosome translocations. Proceedings of the National Academy of Sciences of the United States of America, 98(23), 13300–13305. https://doi.org/10.1073/pnas.241355098.
Solé, R. V., Manrubia, S. C., Benton, M., et al. (1999). Criticality and scaling in evolutionary ecology. Trends in Ecology and Evolution, 14(4), 156–160.
Solé, R. V., Valverde, S., Rodriguez-Caso, C., et al. (2014). Can a minimal replicating construct be identified as the embodiment of cancer? BioEssays, 36(5), 503–512. https://doi.org/10.1002/bies.201300098.
Sollars, V., Lu, X., Xiao, L., et al. (2003). Evidence for an epigenetic mechanism by which Hsp90 acts as a capacitor for morphological evolution. Nature Genetics, 33(1), 70–74. https://doi.org/10.1038/ng1067.
Song, J., Li, X., Sun, L., et al. (2016). A family with Robertsonian translocation: A potential mechanism of speciation in humans. Molecular Cytogenetics, 9, 48. https://doi.org/10.1186/s13039-016-0255-7.
Sonneborn, T. (1954). The relation of autogamy to senescence and rejuvenescence in Paramecium aurelia. Journal of Protozoology, 1, 38–53.
Sonnenschein, C., & Soto, A. M. (2013). The aging of the 2000 and 2011 hallmarks of cancer reviews: A critique. Journal of Biosciences, 38(3), 651–663.
Sosnikhina, S. P., Kirillova, G. A., Mikhailova, E. I., et al. (2003). [Abnormal condensation of meiotic chromosomes caused by the mei8 mutation in rye Secale cereale L.]. Genetika, 39(3), 362–369.
Soto, A. M., & Sonnenschein, C. (2011). The tissue organization field theory of cancer: A testable replacement for the somatic mutation theory. BioEssays, 33(5), 332–340. https://doi.org/10.1002/bies.201100025.
Soto, A. M., & Sonnenschein, C. (2013). Paradoxes in carcinogenesis: There is light at the end of that tunnel! Disruptive Science and Technology, 1(3), 154–156. https://doi.org/10.1089/dst.2013.0008.
Sottoriva, A., Kang, H., Ma, Z., et al. (2015). A big bang model of human colorectal tumor growth. Nature Genetics, 47(3), 209–216. https://doi.org/10.1038/ng.3214.
Specchia, V., Piacentini, L., Tritto, P., et al. (2010). Hsp90 prevents phenotypic variation by suppressing the mutagenic activity of transposons. Nature, 463(7281), 662–665. https://doi.org/10.1038/nature08739.
Speicher, M. R., Gwyn Ballard, S., & Ward, D. C. (1996). Karyotyping human chromosomes by combinatorial multi-fluor FISH. Nature Genetics, 12(4), 368–375. https://doi.org/10.1038/ng0496-368.
Spiegel report. (2010). Spiegel interview with Craig Venter ‘we have learned nothing from the genome’. Spiegel. July 29, 2010.
Stefansson, H., Helgason, A., Thorleifsson, G., et al. (2005). A common inversion under selection in Europeans. Nature Genetics, 37(2), 129–137. https://doi.org/10.1038/ng1508.
Stein, R. (2014). Reshaping the cancer transcriptome. Genetic Engineering & Biotechnology News, 34(7), Feature Articles. Apr 1, 2014.
Stepanenko, A. A., Andreieva, S. V., Korets, K. V., et al. (2016). mTOR inhibitor temsirolimus and MEK1/2 inhibitor U0126 promote chromosomal instability and cell type-dependent phenotype changes of glioblastoma cells. Gene, 579(1), 58–68. https://doi.org/10.1016/j.gene.2015.12.064.
Stepanenko, A. A., & Dmitrenko, V. V. (2015). Pitfalls of the MTT assay: Direct and off-target effects of inhibitors can result in over/underestimation of cell viability. Gene, 574(2), 193–203. https://doi.org/10.1016/j.gene.2015.08.009.
Stepanenko, A. A., & Heng, H. H. (2017). Transient and stable vector transfection: Pitfalls, off-target effects, artifacts. Mutation Research, 773, 91–103. https://doi.org/10.1016/j.mrrev.2017.05.002.
Stepanenko, A. A., & Kavsan, V. M. (2014). Karyotypically distinct U251, U373, and SNB19 glioma cell lines are of the same origin but have different drug treatment sensitivities. Gene, 540(2), 263–265. https://doi.org/10.1016/j.gene.2014.02.053.
Stephens, P. J., Greenman, C. D., Fu, B., et al. (2011). Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell, 144(1), 27–40. https://doi.org/10.1016/j.cell.2010.11.055.
Stern, K. G. (1947). Nucleoproteins and gene structure. Yale J Biol Med, 19(6), 937–949.
Stern, C., Pertile, M., Norris, H., et al. (1999). Chromosome translocations in couples with in-vitro fertilization implantation failure. Human Reproduction, 14(8), 2097–2101.
Sterrer, W. (2016). Cancer-mutational resurrection of prokaryote endofossils. Cancer Hypotheses, 1, 1–15.
Stevens, J., Abdallah, B., Horne, S., et al. (2011a). Genetic and epigenetic heterogeneity in cancer. In eLS. Chichester: John Wiley & Sons Ltd.
Stevens, J. B., Abdallah, B. Y., Liu, G., et al. (2011b). Diverse system stresses: Common mechanisms of chromosome fragmentation. Cell Death and Disease, 2, e178. https://doi.org/10.1038/cddis.2011.60.
Stevens, J. B., Abdallah, B. Y., Liu, G., et al. (2013b). Heterogeneity of cell death. Cytogenetic and Genome Research, 139(3), 164–173. https://doi.org/10.1159/000348679.
Stevens, J. B., Abdallah, B. Y., Regan, S. M., et al. (2010). Comparison of mitotic cell death by chromosome fragmentation to premature chromosome condensation. Molecular Cytogenetics, 3, 20. https://doi.org/10.1186/1755-8166-3-20.
Stevens, J., & Heng, H. (2013c). Differentiating chromosome fragmentation and premature chromosome condensation. In V. Yurov & Iourov (Eds.), Human interphase chromosomes: The biomedical aspects (pp. 85–105). Springer.
Stevens, J. B., Horne, S. D., Abdallah, B. Y., et al. (2013a). Chromosomal instability and transcriptome dynamics in cancer. Cancer and Metastasis Reviews, 32(3–4), 391–402. https://doi.org/10.1007/s10555-013-9428-6.
Stevens, J. B., Liu, G., Abdallah, B. Y., et al. (2014). Unstable genomes elevate transcriptome dynamics. International Journal of Cancer, 134(9), 2074–2087. https://doi.org/10.1002/ijc.28531.
Stevens, J. B., Liu, G., Bremer, S. W., et al. (2007). Mitotic cell death by chromosome fragmentation. Cancer Research, 67(16), 7686–7694. https://doi.org/10.1158/0008-5472.can-07-0472.
Stiewe, T., & Haran, T. E. (2018). How mutations shape p53 interactions with the genome to promote tumorigenesis and drug resistance. Drug Resistance Updates, 38, 27–43. https://doi.org/10.1016/j.drup.2018.05.001.
Stratton, M. (2013). The genome of cancer cells: Jean Shanks lecture.
Stratton, M. R., Campbell, P. J., & Futreal, P. A. (2009). The cancer genome. Nature, 458(7239), 719–724. https://doi.org/10.1038/nature07943.
Strohman, R. C. (1997). The coming Kuhnian revolution in biology. Nature Biotechnology, 15(3), 194–200. https://doi.org/10.1038/nbt0397-194.
Strohman, R. (1999). The upcoming biological revolution. An interview with Richard Strohman. Wild Duck Review, V(2).

Sturmberg, J. (2013). Complexity in health: An introduction. In J. Sturmberg & C. M (Eds.), Handbook on systems and complexity in health (pp. 1–17). Springer.
Sturmberg, J. P., Bennett, J. M., Martin, C. M., et al. (2017). ‘Multimorbidity’ as the manifestation of network disturbances. Journal of Evaluation in Clinical Practice, 23(1), 199–208. https://doi.org/10.1111/jep.12587.
Sturmberg, J. P., Picard, M., Aron, P. C., et al. (2019). Health and Disease – Emergent States Resulting from Adaptive Social and Biological Network Interactions. A Framework for Debate. Frontiers in Medicine. https://doi.org/10.3389/fmed.2019.00059.
Sulem, P., Helgason, H., Oddson, A., et al. (2015). Identification of a large set of rare complete human knockouts. Nature Genetics, 47(5), 448–452. https://doi.org/10.1038/ng.3243.
Sullivan, K. G., Emmons-Bell, M., & Levin, M. (2016). Physiological inputs regulate species-specific anatomy during embryogenesis and regeneration. Communicative and Integrative Biology, 9(4), e1192733. https://doi.org/10.1080/19420889.2016.1192733.
Sun, R., Hu, Z., & Curtis, C. (2018). Big bang tumor growth and clonal evolution. Cold Spring Harbor Perspectives in Medicine, 8(5). https://doi.org/10.1101/cshperspect.a028381.
Szostak, J. W. (2003). Functional information: Molecular messages. Nature, 423(6941), 689. https://doi.org/10.1038/423689a.
Takahashi, A., Okada, R., Nagao, K., et al. (2017). Exosomes maintain cellular homeostasis by excreting harmful DNA from cells. Nature Communications, 8, 15287. https://doi.org/10.1038/ncomms15287.
Takizawa, T., Meaburn, K. J., & Misteli, T. (2008). The meaning of gene positioning. Cell, 135(1), 9–13. https://doi.org/10.1016/j.cell.2008.09.026.
Tam, Z. Y., Gruber, J., Halliwell, B., et al. (2015). Context-dependent role of mitochondrial fusion-fission in clonal expansion of mtDNA mutations. PLoS Computational Biology, 11(5), e1004183. https://doi.org/10.1371/journal.pcbi.1004183.
Tamames, J. (2001). Evolution of gene order conservation in prokaryotes. Genome Biology, 2(6). Research0020.
Tamames, J., Casari, G., Ouzounis, C., et al. (1997). Conserved clusters of functionally related genes in two bacterial genomes. Journal of Molecular Evolution, 44(1), 66–73.
Tamames, J., Gonzalez-Moreno, M., Mingorance, J., et al. (2001). Bringing gene order into bacterial shape. Trends in Genetics, 17(3), 124–126.
Tamas, I., Klasson, L., Canback, B., et al. (2002). 50 million years of genomic stasis in endosymbiotic bacteria. Science, 296(5577), 2376–2379. https://doi.org/10.1126/science.1071278.
Tautz, D. (2000). A genetic uncertainty problem. Trends in Genetics, 16(11), 475–477.
Telerman, A., & Amson, R. (2009). The molecular programme of tumour reversion: The steps beyond malignant transformation. Nature Reviews Cancer, 9(3), 206–216. https://doi.org/10.1038/nrc2589.
The Economist. (March 11, 2004). Fixing the drugs pipeline.
The ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.
The White House Office of the Press Secretary. (2000a). Remarks by the president (June 26, 2000).
The White House Office of the Press Secretary. (2000b). Press briefing by Dr. Neal Lane, Dr. Francis Collins, Dr. Craig Venter, and Dr. Ari Patrinos (June 26, 2000).
Theissen, G. (2000). Evolutionary developmental genetics of floral symmetry: The revealing power of Linnaeus’ monstrous flower. BioEssays, 22(3), 209–213. https://doi.org/10.1002/(sici)1521-1878(200003)22:33.0.co;2-j.
Theodoraki, M. A., & Caplan, A. J. (2012). Quality control and fate determination of Hsp90 client proteins. Biochimica et Biophysica Acta, 1823(3), 683–688. https://doi.org/10.1016/j.bbamcr.2011.08.006.

Thomas, N. S., Bryant, V., Maloney, V., et al. (2008). Investigation of the origins of human autosomal inversions. Human Genetics, 123(6), 607–616. https://doi.org/10.1007/s00439-008-0510-z.
Timp, W., & Feinberg, A. P. (2013). Cancer as a dysregulated epigenome allowing cellular growth advantage at the expense of the host. Nature Reviews Cancer, 13, 497–510 (England).
Tishkoff, S. A., & Verrelli, B. C. (2003). Patterns of human genetic diversity: Implications for human evolutionary history and disease. Annual Review of Genomics and Human Genetics, 4, 293–340. https://doi.org/10.1146/annurev.genom.4.070802.110226.
Todeschini, A. L., Georges, A., & Veitia, R. A. (2014). Transcription factors: Specific DNA binding and specific gene regulation. Trends in Genetics, 30(6), 211–219. https://doi.org/10.1016/j.tig.2014.04.002.
Tomasetti, C., & Vogelstein, B. (2015). Cancer etiology. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science, 347(6217), 78–81. https://doi.org/10.1126/science.1260825.
Tompa, P. (2012). Intrinsically disordered proteins: A 10-year recap. Trends in Biochemical Sciences, 37(12), 509–516. https://doi.org/10.1016/j.tibs.2012.08.004.
Tompa, P., & Csermely, P. (2004). The role of structural disorder in the function of RNA and protein chaperones. The FASEB Journal, 18(11), 1169–1175. https://doi.org/10.1096/fj.04-1584rev.
Trent, R. J. (2005). Molecular medicine: An introductory text. Amsterdam; Boston: Elsevier Academic Press.
Trinklein, N. D., Aldred, S. F., Hartman, S. J., et al. (2004). An abundance of bidirectional promoters in the human genome. Genome Research, 14(1), 62–66. https://doi.org/10.1101/gr.1982804.
Tubio, J. M., & Estivill, X. (2011). Cancer: When catastrophe strikes a cell. Nature, 470, 476–477 (England).
Tuller, T., Girshovich, Y., Sella, Y., et al. (2011). Association between translation efficiency and horizontal gene transfer within microbial communities. Nucleic Acids Research, 39(11), 4743–4755. https://doi.org/10.1093/nar/gkr054.
Upender, M. B., Habermann, J. K., McShane, L. M., et al. (2004). Chromosome transfer induced aneuploidy results in complex dysregulation of the cellular transcriptome in immortalized and cancer cells. Cancer Research, 64(19), 6941–6949. https://doi.org/10.1158/0008-5472.can-04-0474.
Ursell, L. K., Metcalf, J. L., Parfrey, L. W., et al. (2012). Defining the human microbiome. Nutrition Reviews, 70(Suppl. 1), S38–S44. https://doi.org/10.1111/j.1753-4887.2012.00493.x.
Uszczynska-Ratajczak, B., Lagarde, J., Frankish, A., et al. (2018). Towards a complete map of the human long non-coding RNA transcriptome. Nature Reviews Genetics, 19(9), 535–548. https://doi.org/10.1038/s41576-018-0017-y.
Valadi, H., Ekstrom, K., Bossios, A., et al. (2007). Exosome-mediated transfer of mRNAs and microRNAs is a novel mechanism of genetic exchange between cells. Nature Cell Biology, 9(6), 654–659. https://doi.org/10.1038/ncb1596.
Valdar, W., Solberg, L. C., Gauguier, D., et al. (2006). Genetic and environmental effects on complex traits in mice. Genetics, 174(2), 959–984. https://doi.org/10.1534/genetics.106.060004.
Valind, A., & Gisselsson, D. (2014). Reply to Heng: Inborn aneuploidy and chromosomal instability. Proceedings of the National Academy of Sciences of the United States of America, 111(11), E973.
Valind, A., Jin, Y., Baldetorp, B., et al. (2013). Whole chromosome gain does not in itself confer cancer-like chromosomal instability. Proceedings of the National Academy of Sciences of the United States of America, 110(52), 21119–21123. https://doi.org/10.1073/pnas.1311163110.

Van Echten-Arends, J., Mastenbroek, S., Sikkema-Raddatz, B., et al. (2011). Chromosomal mosaicism in human preimplantation embryos: A systematic review. Human Reproduction Update, 17(5), 620–627. https://doi.org/10.1093/humupd/dmr014.
Van Valen, L. (1973). A new evolutionary law. Evolutionary Theory, 1, 1–30.
Van Valen, L., & Maiorana, V. (1991). HeLa, a new microbial species (Vol. 10, pp. 71–74).
Vanneste, E., Voet, T., Le Caignec, C., et al. (2009). Chromosome instability is common in human cleavage-stage embryos. Nature Medicine, 15(5), 577–583. https://doi.org/10.1038/nm.1924.
Vargas-Rondon, N., Villegas, V. E., & Rondon-Lagos, M. (2017). The role of chromosomal instability in cancer and therapeutic responses. Cancers (Basel), 10(1). https://doi.org/10.3390/cancers10010004.
Veltman, J. A., & Brunner, H. G. (2012). De novo mutations in human genetic disease. Nature Reviews Genetics, 13(8), 565–575. https://doi.org/10.1038/nrg3241.
Venter, J. C., Adams, M. D., Myers, E. W., et al. (2001). The sequence of the human genome. Science, 291(5507), 1304–1351. https://doi.org/10.1126/science.1058040.
Vieira, C. P., Vieira, J., & Hartl, D. L. (1997). The evolution of small gene clusters: Evidence for an independent origin of the maltase gene cluster in Drosophila virilis and Drosophila melanogaster. Molecular Biology and Evolution, 14(10), 985–993. https://doi.org/10.1093/oxfordjournals.molbev.a025715.
Vijg, J., & Dollé, M. E. (2002). Large genome rearrangements as a primary cause of aging. Mechanisms of Ageing and Development, 123(8), 907–915.
Vilar, E., & Gruber, S. B. (2010). Microsatellite instability in colorectal cancer-the stable evidence. Nature Reviews Clinical Oncology, 7(3), 153–162. https://doi.org/10.1038/nrclinonc.2009.237.
Vincent, M. D. (2010). The animal within: Carcinogenesis and the clonal evolution of cancer cells are speciation events sensu stricto. Evolution, 64(4), 1173–1183. https://doi.org/10.1111/j.1558-5646.2009.00942.x.
Vincent, M. D. (2011). Cancer: Beyond speciation. Advances in Cancer Research, 112, 283–350. https://doi.org/10.1016/b978-0-12-387688-1.00010-7.
Vincent, A. E., Turnbull, D. M., Eisner, V., et al. (2017). Mitochondrial nanotunnels. Trends in Cell Biology, 27(11), 787–799. https://doi.org/10.1016/j.tcb.2017.08.009.
Vinces, M. D., Legendre, M., Caldara, M., et al. (2009). Unstable tandem repeats in promoters confer transcriptional evolvability. Science, 324(5931), 1213–1216. https://doi.org/10.1126/science.1170097.
Visscher, P. M., Hill, W. G., & Wray, N. R. (2008). Heritability in the genomics era–concepts and misconceptions. Nature Reviews Genetics, 9(4), 255–266. https://doi.org/10.1038/nrg2322.
Vitoux, D., Nasr, R., & De The, H. (2007). Acute promyelocytic leukemia: New issues on pathogenesis and treatment response. The International Journal of Biochemistry and Cell Biology, 39(6), 1063–1070. https://doi.org/10.1016/j.biocel.2007.01.028.
Vogelstein, B. (2011). Cancer genomes and their implications for curing cancer. Johns Hopkins Advanced Program. June 5.
Vogelstein, B., & Kinzler, K. W. (1993). The multistep nature of cancer. Trends in Genetics, 9(4), 138–141.
Vogelstein, B., & Kinzler, K. W. (2004). Cancer genes and the pathways they control. Nature Medicine, 10(8), 789–799. https://doi.org/10.1038/nm1087.
Vogelstein, B., Lane, D., & Levine, A. J. (2000). Surfing the p53 network. Nature, 408(6810), 307–310. https://doi.org/10.1038/35042675.
Vogelstein, B., Papadopoulos, N., Velculescu, V. E., et al. (2013). Cancer genome landscapes. Science, 339(6127), 1546–1558. https://doi.org/10.1126/science.1235122.

BIBLIOGRAPHY

529

Vojtech, L., Woo, S., Hughes, S., et al. (2014). Exosomes in human semen carry a distinctive repertoire of small non-coding rnas with potential regulatory functions. Nucleic Acids Research, 42(11), 7290e7304. https://doi.org/10.1093/nar/gku347. Vorsanova, S. G., Yurov, Y. B., Soloviev, I. V., et al. (2010). Molecular cytogenetic diagnosis and somatic genome variations. Current Genomics, 11(6), 440e446. https://doi.org/10.2174/ 138920210793176010. Wade, N. (2009). Hoopla, and disappointment, in schizophrenia research. New York Times (July 1, 2009). Retrieved from https://tierneylab.blogs.nytimes.com/2009/07/01/hoopla-anddisappointment-in-schizophrenia-research/. Wade, N. (2010). A decade later, genetic map yields few new cures. New York Times (June 12, 2010). Retrieved from https://www.nytimes.com/2010/06/13/health/research/ 13genome.html. Wagle, N., Emery, C., Berger, M. F., et al. (2011). Dissecting therapeutic resistance to RAF inhibition in melanoma by tumor genomic profiling. Journal of Clinical Oncology, 29, 3085e3096. Wagner, G. P., & Zhang, J. (2011). The pleiotropic structure of the genotype-phenotype map: The evolvability of complex organisms. Nature Reviews Genetics, 12(3), 204e213. https:// doi.org/10.1038/nrg2949. Wallace, D. (2005). You’ve come a long way, doctor! Journal of the Oklahoma State Medical Association, 98(9), 432e434. Wallace, D. C. (2012). Mitochondria and cancer. Nature Reviews Cancer, 12(10), 685e698. https://doi.org/10.1038/nrc3365. Wallace, D. C., & Chalkia, D. (2013). Mitochondrial DNA genetics and the heteroplasmy conundrum in evolution and disease. Cold Spring Harbor Perspectives in Biology, 5(11), a021220. https://doi.org/10.1101/cshperspect.a021220. Walters, R. G., Jacquemont, S., Valsesia, A., et al. (2010). A new highly penetrant form of obesity due to deletions on chromosome 16p11.2. Nature, 463(7281), 671e675. https:// doi.org/10.1038/nature08727. Wang, J., Fan, H. C., Behr, B., et al. (2012). 
Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell, 150(2), 402e412. https:// doi.org/10.1016/j.cell.2012.06.030. Wang, K., Gaitsch, H., Poon, H., et al. (2017). Classification of common human diseases derived from shared genetic and environmental determinants. Nature Genetics, 49(9), 1319e1325. https://doi.org/10.1038/ng.3931. Wang, Y., Waters, J., Leung, M. L., et al. (2014). Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature, 512(7513), 155e160. https://doi.org/ 10.1038/nature13600. Warburg, O. (1956). On respiratory impairment in cancer cells. Science, 124(3215), 269e270. Warburton, D., Kline, J., Stein, Z., et al. (1986). Cytogenetic abnormalities in spontaneous abortion of recognized conceptions. In I. Porter, N. Hatcher & N. Willey (Eds.), Perinatal genetics: Diagnosis and treatment. New York: Academic Press. Ward, P., & Kirschvink, J. (2015). A new history of life: The radical new discoveries about the origins and evolution of life on earth. Bloomsbury Publishing. Waterson, R. (2005). Initial sequence of the chimpanzee genome and comparison with the human genome. Nature, 437(7055), 69e87. https://doi.org/10.1038/nature04072. Waterston, R. H., Lindblad-Toh, K., Birney, E., et al. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature, 420(6915), 520e562. https://doi.org/10.1038/ nature01262. Watson, J. (2013). Oxidants, antioxidants and the current incurability of metastatic cancers. Open Biology, 3(1), 120144. https://doi.org/10.1098/rsob.120144. Watson, J. D., & Crick, F. H. (1953a). Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature, 171(4356), 737e738.

530

BIBLIOGRAPHY

Watson, J. D., & Crick, F. H. (1953b). Genetical implications of the structure of deoxyribonucleic acid. Nature, 171(4361), 964e967. Watts, P. C., Buley, K. R., Sanderson, S., et al. (2006). Parthenogenesis in komodo dragons. Nature, 444(7122), 1021e1022. https://doi.org/10.1038/4441021a. Wayne, L. G. (1988). International committee on systematic bacteriology: Announcement of the report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Zentralblatt fu¨r Bakteriologie, Mikrobiologie und Hygiene, 268(4), 433e434. Wayne, L. G., Dj, B., C, R., Grimont, P., et al. (1987). Matters relating to the international committee on systematic bacteriology: Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. International Journal of Systematic Bacteriology, 37, 463e464. https://doi.org/10.1099/00207713-37-4-463. Weaver, B. A., & Cleveland, D. W. (2006). Does aneuploidy cause cancer? Current Opinion in Cell Biology, 18(6), 658e667. https://doi.org/10.1016/j.ceb.2006.10.002. Weaver, B. A., Silk, A. D., Montagna, C., et al. (2007). Aneuploidy acts both oncogenically and as a tumor suppressor. Cancer Cell, 11(1), 25e36. https://doi.org/10.1016/ j.ccr.2006.12.003. Weinberg, R. A. (1982). Fewer and fewer oncogenes. Cell, 30(1), 3e4. Weinberg, R. A. (2014). Coming full circle-from endless complexity to simplicity and back again. Cell, 157(1), 267e271. https://doi.org/10.1016/j.cell.2014.03.004. Weiner, J. (1994). The beak of the finch: A story of evolution in our time. Knopf Doubleday Publishing Group. Weiner, J. (2014). In Darwin’s footsteps. The New York Times. Weinstein, I. B., & Case, K. (2008). The history of cancer research: Introducing an AACR centennial series. Cancer Research, 68(17), 6861e6862. https://doi.org/10.1158/00085472.can-08-2827. Weis, E., Galetzka, D., Herlyn, H., et al. (2008). 
Humans and chimpanzees differ in their cellular response to DNA damage and non-coding sequence elements of DNA repairassociated genes. Cytogenetic and Genome Research, 122(2), 92e102. https://doi.org/ 10.1159/000163086. Weiss, L. A., Shen, Y., Korn, J. M., et al. (2008). Association between microdeletion and microduplication at 16p11.2 and autism. New England Journal of Medicine, 358(7), 667e675. https://doi.org/10.1056/NEJMoa075974. Welch, H. G., & Black, W. C. (2010). Overdiagnosis in cancer. Journal of the National Cancer Institute, 102(9), 605e613. https://doi.org/10.1093/jnci/djq099. Welch, D. B. M., & Meselson, M. (2000). Evidence for the evolution of bdelloid rotifers without sexual reproduction or genetic exchange. Science, 288(5469), 1211e1215. 10817991. Welch, J. M. L., Welch, D. B. M., & Meselson, M. (2004). Cytogenetic evidence for asexual evolution of bdelloid rotifers. Proceedings of the National Academy of Sciences of the United States of America, 101(6), 1618e1716 21. 14747655. Weldon, W. F. R. (1902). Mendel’s laws of alternative inheritance in peas. Biometrika, 1, 228e254. Wessely, S., & Freedman, L. (2006). Reflections on gulf war illness. Philosophical Transactions of the Royal Society of London B Biological Sciences, 361(1468), 721e730. https://doi.org/ 10.1098/rstb.2006.1830. White, M. J. D. (1978). Modes of speciation. San Francisco: W.H. Freeman. White, J. K., Gerdin, A. K., Karp, N. A., et al. (2013). Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell, 154(2), 452e464. https://doi.org/10.1016/j.cell.2013.06.022. White, R. F., Steele, L., O’callaghan, J. P., et al. (2016). Recent research on gulf war illness and other health problems in veterans of the 1991 gulf war: Effects of toxicant exposures during deployment. Cortex, 74, 449e475. https://doi.org/10.1016/j.cortex.2015.08.022.

BIBLIOGRAPHY

531

Whitesell, L., Santagata, S., Mendillo, M. L., et al. (2014). Hsp90 empowers evolution of resistance to hormonal therapy in human breast cancer models. Proceedings of the National Academy of Sciences of the United States of America, 111(51), 18297e18302. https:// doi.org/10.1073/pnas.1421323111. Wiegant, J., Kalle, W., Mullenders, L., et al. (1992). High-resolution in situ hybridization using DNA halo preparations. Human Molecular Genetics, 1(8), 587e591. Wienberg, J., Jauch, A., Stanyon, R., et al. (1990). Molecular cytotaxonomy of primates by chromosomal in situ suppression hybridization. Genomics, 8(2), 347e350. Wilkins, A. S. (2007). For the biotechnology industry, the penny drops (at last): Genes are not autonomous agents but function within networks! BioEssaya, 29, 1179e1181. Wilkins, J. (2008). Some new work on speciation and species. Retrieved from https:// scienceblogs.com/evolvingthoughts/2008/12/12/some-new-work-on-speciation-an. Wilkins, A. S. (2010). The enemy within: An epigenetic role of retrotransposons in cancer initiation. BioEssays, 32(10), 856e865. https://doi.org/10.1002/bies.201000008. Wilkins, A. S., & Holliday, R. (2009). The evolution of meiosis from mitosis. Genetics, 181(1), 3e12. https://doi.org/10.1534/genetics.108.099762. Williams, G. (1975). Sex and evolution: In the monographs in population biology series. Princeton PrincetonUniversity Press. Williams, M. J., Werner, B., Barnes, C. P., et al. (2016). Identification of neutral tumor evolution across cancer types. Nature Genetics, 48(3), 238e244. https://doi.org/10.1038/ng.3489. Wilmut, I., Schnieke, A. E., Mcwhir, J., et al. (1997). Viable offspring derived from fetal and adult mammalian cells. Nature, 385(6619), 810e813. https://doi.org/10.1038/385810a0. Winkler, H. (1920). Verbreitung und ursache der parthenogenesis im pflanzen e und tierreiche. Jena: VerlagFischer. Wolfe, K. H. (2006). Comparative genomics and genome evolution in yeasts. 
Philosophical Transactions of the Royal Society of London B Biological Sciences, 361(1467), 403e412. https://doi.org/10.1098/rstb.2005.1799. Wong, K. (October 20, 2000). High-speed speciation. Scientific American. Wood, L. D., Parsons, D. W., Jones, S., et al. (2007). The genomic landscapes of human breast and colorectal cancers. Science, 318(5853), 1108e1113. https://doi.org/10.1126/ science.1145720. Woodruff, M. F. (1983). Cellular heterogeneity in tumours. British Journal of Cancer, 47(5), 589e594. Wu, C. I., & Ting, C. T. (2004). Genes and speciation. Nature Reviews Genetics, 5(2), 114e122. https://doi.org/10.1038/nrg1269. Xu, G. L., Bestor, T. H., Bourc’his, D., et al. (1999). Chromosome instability and immunodeficiency syndrome caused by mutations in a DNA methyltransferase gene. Nature, 402(6758), 187e191. https://doi.org/10.1038/46052. Yamamoto, Y. (2004). Cavefish. Current Biology, 14(22), R943. https://doi.org/10.1016/ j.cub.2004.10.035. Yang, M. Q., Koehly, L. M., & Elnitski, L. L. (2007). Comprehensive annotation of bidirectional promoters identifies co-regulation among breast and ovarian cancer genes. PLoS Computational Biology, 3(4), e72. https://doi.org/10.1371/journal.pcbi.0030072. Yang, F., O’brien, P. C., Milne, B. S., et al. (1999). A complete comparative chromosome map for the dog, red fox, and human and its integration with canine genetic maps. Genomics, 62(2), 189e202. https://doi.org/10.1006/geno.1999.5989. Ye, C. J., & Heng, H. H. (2017). High Resolution Fiber-Fluorescence In Situ Hybridization. Methods in Molecular Biology, 1541, 151e166. Ye, J. C., Liu, G., Bremer, S. W., et al. (2007). The dynamics of cancer chromosomes and genomes. Cytogenetic and Genome Research, 118(2e4), 237e246. https://doi.org/10.1159/ 000108306.

532

BIBLIOGRAPHY

Ye, J. C., Liu, G., & Heng, H. (2016). Simultaneous fluorescence immunostaining and FISH. In book: Fluorescence in situ hybridization (FISH). In T. Liehr (Ed.), Fluorescence in situ hybridization (FISH). Springer. https://doi.org/10.1007/978-3-662-52959-1_33. Ye, J. C., Liu, G., & Heng, H. H. (2018a). Experimental induction of genome chaos. Methods in Molecular Biology, 1769, 337e352. https://doi.org/10.1007/978-1-4939-7780-2_21. Ye, J. C., Lu, W., Liu, G., et al. (2001). The combination of sky and specific loci detection with fish or immunostaining. Cytogenetics and Cell Genetics, 93(3e4), 195e202. https://doi.org/ 10.1159/000056984. Ye, J. C., Regan, S., Liu, G., et al. (2018b). Understanding aneuploidy in cancer through the lens of system inheritance, fuzzy inheritance and emergence of new genome systems. Molecular Cytogenetics, 11, 31. https://doi.org/10.1186/s13039-018-0376-2. Ye, C. J., Sharpe, Z., Alemara, S., et al. (2019). Micronuclei and genome chaos: Changing the system inheritance. Genes, 10, 366. https://doi.org/10.3390/genes10050366. Ye, C. J., Stevens, J. B., Liu, G., et al. (2006). Combined multicolor-FISH and immunostaining. Cytogenetic and Genome Research, 114(3e4), 227e234. Ye, J. C., Stevens, J. B., Liu, G., et al. (2009). Genome based cell population heterogeneity promotes tumorigenicity: The evolutionary mechanism of cancer. Journal of Cellular Physiology, 219(2), 288e300. https://doi.org/10.1002/jcp.21663. Ying, A. Y., Ye, C. J., Jiang, H., et al. (2018). Simulation of karyotype evolution and biodiversity in asexual and sexual reproduction. BioRxiv. Retrieved from https://doi.org/10.1101/481275. Yosida, T. H., Kato, H., Tsuchiya, K., et al. (1974). Cytogenetical survey of black rats, rattus rattus, in southwest and central asia, with special regard to the evolutional relationship between three geographical types. Chromosoma, 45(1), 99e109. Youg, E. (2017). What If (Almost) Every Gene Affects (Almost) Everything? The Atlantic. 
Jun 16) https://www.theatlantic.com/science/archive/2017/06/its-like-all...man/530532/. Zack, T. I., Schumacher, S. E., Carter, S. L., et al. (2013). Pan-cancer patterns of somatic copy number alteration. Nature Genetics, 45(10), 1134e1140. https://doi.org/10.1038/ng.2760. Zanetti, M. (2017). Chromosomal chaos silences immune surveillance. Science, 355(6322), 249e250. https://doi.org/10.1126/science.aam5331. Zarrei, M., Macdonald, J. R., Merico, D., et al. (2015). A copy number variation map of the human genome. Nature Reviews Genetics, 16, 172e183 (England). Zeggini, E., Scott, L. J., & Saxena, R. (2008). Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nature Genetics, 40, 638e645. Zeliadt, N. (2013). Profile of david jablonski. Proceedings of the National Academy of Sciences of the United States of America, 110(26), 10467e10469. https://doi.org/10.1073/ pnas.1309893110. Zhang, S., Mercado-Uribe, I., Xing, Z., et al. (2014). Generation of cancer stem-like cells through the formation of polyploid giant cancer cells. Oncogene, 33(1), 116e128. https://doi.org/10.1038/onc.2013.96. Zhang, Y., & Rowley, J. D. (2011). Chronic myeloid leukemia: Current perspectives. Clinics in Laboratory Medicine, 31(4), 687e698. https://doi.org/10.1016/j.cll.2011.08.012. Zhang, K., Wong, H. N., Song, B., et al. (2005). The unfolded protein response sensor ire1alpha is required at 2 distinct steps in b cell lymphopoiesis. Journal of Clinical Investigation, 115(2), 268e281. https://doi.org/10.1172/jci21848. Zhao, W. W., Wu, M., Chen, F., et al. (2015). Robertsonian translocations: An overview of 872 robertsonian translocations identified in a diagnostic laboratory in China. PLoS One, 10(5), e0122647. https://doi.org/10.1371/journal.pone.0122647. Zhu, J., Pavelka, N., Bradford, W. D., et al. (2012). Karyotypic determinants of chromosome instability in aneuploid budding yeast. 
PLoS Genetics, 8(5), e1002719. https://doi.org/ 10.1371/journal.pgen.1002719. Zimmer, C. (2000). Parasite rex. New York: The Free Press.

BIBLIOGRAPHY

533

Zimmer, C. (2010). Evolution: The triumph of an idea. HarperCollins. Zimmer, C., & Emlen, D. J. (2012). Evolution: Making sense of life. Roberts Publishers. Zuk, O., Hechter, E., Sunyaev, S. R., et al. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences of the United States of America, 109(4), 1193e1198. https://doi.org/10.1073/pnas.1119675109.

Index

Note: Page numbers followed by “f” indicate figures, “t” indicate tables.

A
Abnormal
  cellular growth, 228
  genotypes, 213–214
  phenotype, 213–216
Accepting system inheritance, 197–199
Active genome alterations, 374
Acute myeloid leukemia (AML), 97–98
Acute promyelocytic leukemia (APL), 103
Adaptation, 436–444
  adaptive landscape evolutionary model, 322
  cellular, 342–343
  differs from speciation, 341–343
  gene-mediated, 332
  population, 400–401
  short-term, 321–324, 334–335, 337, 342, 369, 378
  transcriptional, 178–179
Advanced cytogenomic methods, 82
Alkaptonuria, 428
Alleles, 203
Alligator weed (Alternanthera philoxeroides), 213
AlphaGo (computer program), 463
Altered karyotypes and evolutionary certainty, 353–357
AML. See Acute myeloid leukemia (AML)
Aneuploidy, 145, 222, 468
  theory, 129
Angiogenesis, 107
Animal cloning, 194–195
Antigenetic determinism, ignored voice of, 18–21
APL. See Acute promyelocytic leukemia (APL)
Arabidopsis, 178–179, 233–237
Artificial evolutionary experiments, 391
Artificial intelligence, 441–444, 462–469
  data analysis centers, 478
Artificial mating, 381–382
Artificial selection, 316–332
Artificial species creation by shattering genome, 381–382

  artificial animals/plants creation, 381–382
  artificial laboratory species creation, 381
  new cell line creation, 381
Asexual bdelloid chromosomes, 274
Asexual cellular reproduction, 279
Asexual organismal reproduction, 268
Asexual populations/individuals, 296
Asexual reproduction, 207, 264, 288
Asexual–sexual transition, 272–273
Astyanax mexicanus, 330–331
ATGC, 196

B
Bacteria artificial chromosome (BAC), 86
Bacteria(l), 62
  artificial evolutionary experiments, 402–403
  evolution, 271, 341
  persistence, 211–212
  species, 271, 341
Bcr/Abl fusion gene, 98
Bernardi’s neoselectionist theory of genome evolution, 421
Big data, 48–50
  in biological systems, 462–469
  and phenotypes, 468–469
  theories vs., 464–465
Big Science
  infrastructures, 478
  projects, 5, 7–8
Bio-efficacy, 391–392
Bio-heterogeneity, 457–458
Bio-inheritance, 172–173
Bio-uncertainty, 459–462
  contributing factors, 462t
Biocomplexity, 177
Bioelectric signals, 214
Bioinformatics, 49–50
Bioinformation, 173, 195–197
Biological
  principles/theories, 472–473
  processes, 459
  responses, 403–405

  species, 71, 284
  systems, 67
    emergent properties, 388–389
Biomarkers for adaptive biosystems, 462–469
  collecting necessary data to creating new generation of, 465–467
Biomedical science, future of, 469–479
Bioprocesses, 436–438
Bioresearch landscape, 470
Biosystem levels, 437
“Bizarre-looking” chromosomes, 145
“Black box” algorithms of complex diseases, 468
BRCA mutations, 395–397, 445–446
Buchnera aphidicola, 271
“Business as usual” attitude, 430

C
C-Frags. See Chromosome fragmentations (C-Frags)
Cancer, 73–74, 98, 110, 119–120, 181–183
  attractor theory, 130–131
  cells, 131, 305–306
    drug resistance, 211–212
    populations, 204
  challenging concept of sequential accumulation of gene mutations, 124–125
  cytogenetic analyses, 257–258
  drug resistance, 107
  evolution, 268–269, 307
    mechanism, 155–161
    models, 304–307
    phases, 438–439
    studies, 459
  gene mutation theory, 96–97, 110–111, 113–114, 116–117, 120–124, 128–129, 133, 161
  initiation and progression, 156
  limitations of searching for hallmarks, 125–128
  model of cancer evolution, 165, 166f
  process, 268
  stem cells, 132
  susceptibility genes, 162
  theories
    by developmental biology and epigenetics, 132
    by natural history of evolution, 131
    related to genetic and environmental factors, 132–133

Cancer Genome Project, 37, 108–109, 112, 114–115
Cancer genome sequencing
  discoveries and surprises, 111–116
    chromosomal-level alterations, 114
    fewer newly identified driver genes, 112
    interesting gene mutation/genomic alteration patterns, 113
    landscape dynamics during cancer progression, 115–116
    multiple levels of genetic/genomic/epigenomic landscapes, 114–115
    validation of known cancer gene mutations, 112
  initial goal and controversy, 108–111
  project, 153
  ultimate challenge to current cancer theory, 116–119
Cancer research, 96–134
  cancer genome sequencing, 108–119
  exceptions vs. general rule, 97–108
    comparison of hematological malignancies, 101t
    exceptions of model systems and reality of cancer, 105–108
    heterogeneity and treatment response, 102–104
    population genomic structure and microenvironments, 102
    unforeseeable negative impact of CML on research community, 104–105
  increasing calls for new cancer theories, 129–134
    noted competing theories/concepts, 129–133
    search for new framework, 133–134
  somatic gene mutation theory, 119–129
Canine transmissible venereal tumor (CTVT), 307–308
Causative factor, 267
“Cause and effect” approaches, 471–472
CCA. See Clonal chromosome aberration (CCA)
cDNA, 26
Cell line creation, 381
Cell(ular)
  adaptation, 342–343
  crisis, 149
  death, 249
    heterogeneity, 209
  evolution, 269–270

    model, 402–403
  fusion, 257, 348
  heterogeneities, 208–209
  immortalization model, 149
  key genomic feature of cellular population, 205–210
  stress response, 437–438
CG-CNVs. See Cytogeneticist microscopically visible harmless CNVs (CG-CNVs)
Chimpanzee Sequencing and Analysis Consortium (2005), 272
Chromatin loop
  domains constrain gene function, 83–85
  dynamic configuration, 87f
  dynamic model, 88
  gene expression and, 85–88
Chromoplexy, 114, 419
Chromosomal instability (CIN), 127–128, 157, 441–444
  CIN-mediated genome chaos, 167–168
Chromosomal set codes “system inheritance”, 185–195
  background and rationale, 185–189
  chromosomal coding, 195–201
    accepting system inheritance, 197–199
    limitations of reductionist tradition and power of metaphor, 199–201
    topology, 195–197
  mechanism and significance of preserving chromosomal coding, 192–195
  model and prediction, 189–192, 189f
Chromosomal variations, generally accepted, 242–243, 243t
Chromosomal/chromosomes, 53–54, 170, 343, 392–394. See also Gene(s)
  aberrations, 134, 165, 354, 438–439. See also Unclassified chromosomal/nuclear aberrations
  alterations, 157
  chromosomal-level alterations, 114
  chromosome-related gene silencing effect, 186
  chromosome/chromatin territories, 186
  coding, 40–41, 56, 170–201, 387
    genes code “parts inheritance”, 175–184
    inheritance, 173–175
    rationale of searching for new types of inheritance, 170–173
  condensation process, 247

  heterogeneity, 171
  inversion, 224
  pulverization, 249
  speciation, 72–73, 82
  theory of inheritance, 53–54
  unification, 258–260, 259f
Chromosome fragmentations (C-Frags), 149, 225, 249, 250f
Chromosome-based “systemic mutations”, 343–344, 385
Chromosome-based genomics, 77
Chromosome-mediated speciation, 343–358
  altered karyotypes and evolutionary certainty, 353–357
  core genome concept, 352–353
  genome chaos, 350–352
  geographic isolation role in speciation, 357–358
  hybrid speciation, 348–350
  new species formation, 357
  spontaneous chromosome alterations mediated speciation, 345–348
Chromothripsis, 114, 419
Chronic myeloid leukemia (CML), 97–108
  unforeseeable negative impact of, 104–105
Chronic phase chronic myeloid leukemia (CML-CP), 97–98, 100–102, 104–105
CIN. See Chromosomal instability (CIN)
Classical “tumor suppressor”, 395–397
Clonal chromosome aberration (CCA), 117–118, 135–136, 136f, 191, 294, 432. See also Unclassified chromosomal/nuclear aberrations
CML. See Chronic myeloid leukemia (CML)
CML-CP. See Chronic phase chronic myeloid leukemia (CML-CP)
CNVs. See Copy number variations (CNVs)
Common ancestry, 314t–315t
Common genetic/epigenetic loci, 445–446
Common/complex diseases, 444–458
  genetic disease classification, 446t
  GWI, 450–452
  key features and types, 445–447, 445t
Competent paradigm, 386–387
Complex phenotypes, 171–172
Complexity science, 471–472
Constrained genome, 60–67

Constraint factors, 306
Context-defined inheritance, 198
Continuous selection, 312
Copernican concept, 17–18
Copy number variations (CNVs), 114–115, 176–177
  fuzzy inheritance mechanisms, 226
Core genome concept, 352–353
Correlating phenotype, 304
CRISPR, 51, 195
CRISPR/Cas9, 441–444
CRISPR–Cas9–mediated genome editing, 354–355
Critical thinking, 473
Cross Death Valley, 123
CTVT. See Canine transmissible venereal tumor (CTVT)
Culturing process, 149
Cutting-edge technologies, 73
  genomic technologies, 415
Cystic fibrosis, 173–174
Cytogenetic methodologies, 425–426
Cytogeneticist microscopically visible harmless CNVs (CG-CNVs), 226
Cytogenetics, 418
Cytogenomics, 418
Cytosolic DNA response, 157

D
Darwinian evolutionary theory, 307, 312, 314t–315t, 325
Darwinian microaccumulative model, 362f
  of natural selection, 333–334, 338–339
Darwinism, 301–302
Data collection, 465
De novo gene mutations, 211
Defective mitotic figures (DMFs), 225, 247–249, 248f, 252f
Deoxyribose nucleic acid, 195
Developmental macromutations, 77
Diploidization, 349
Disease phenotype, 439
Disease-specific genomic patterns, 430–431
Distinctive karyotypes, 165–167
Diverse biological phenomena, 416
Diverse combinational patterns, 456–457
Diverse experimental manipulations, 441–444
Diverse individual factors, 285

DMFs. See Defective mitotic figures (DMFs)
DNA, 173. See also Chromosomal/chromosomes
  coding strategy, 421
  DNA-binding sites, 233–237
    specificity, 459
  DNA/chromatin fragments, 233–237
  double-helix model, 2
  “halo” preparations, 245–246
  methylation, 212
  model, 195
  repair, 377, 403–405
    gene, 133
    hypothesis, 265–266
    model, 273
  sequencing, 73–74, 187, 335
  transfer methods, 441–444
Dobzhansky–Muller model, 70–71
DoE. See US Department of Energy (DoE)
Double-strand breaks (DSBs), 222–224
Drosophila, 233–237
  D. persimilis, 348–349
  D. pseudoobscura, 70–71, 348–349
Drug
  resistance, 181–183, 208–209, 305
  treatment, 153–155
DSBs. See Double-strand breaks (DSBs)
Duchenne muscular dystrophy, 2–3
Dynamic anchor reconciles, 86–88
Dynamic chromatin domain model of transcription regulation, 86

E
Ecological model, 72
Education of biomedical science, 469–479
Encyclopedia of DNA elements project (ENCODE project), 16, 24, 392–394, 456–457
Endomitosis, 290
Environment(al), 439–441
  environment–gene interaction, 13
  factors, 197, 202, 218–219
  interaction, 391, 403–405
  isolation, 350
  stress, 209–210, 441
Enzyme, 183–184
Epigene-mediated selection, 397–400
Epigenetic
  alterations, 441–444

  fuzzy inheritance mechanisms at epigenetic level, 230–232
  mechanisms, 174
  signatures, 403–405
Epogen, 24–25
Escherichia coli, 177, 271
  genomes, 69
Eukaryotic/eukaryotes, 271, 338, 402–403
  cells, 62
  genome reorganization in eukaryotic evolution, 365t–366t
  somatic cell evolutionary pattern, 402–403
  systems, 187
Evolution, 314t–315t, 358–381
  astonishing speed, 326–327
  genome reorganization in eukaryotic evolution, 365t–366t
  integrated model of speciation, 359–369
  reinterpretations, 369–381
    extinction issues, 378–379
    fast or sluggish evolution, 370
    genome basis of punctuated equilibrium, 372–374
    genome-based alternative mechanisms, 369–370
    invisible missing link in macroevolution, 374–376
    irrelevance of neutral theory, 371–372
    multilevel evolution and constraint, 376–377
    somatic cell dynamics and germline constraint, 377–378
    unified evolutionary theory, 379–381
Evolutionary adaptive principle, 422–423
Evolutionary constraints, 332–338, 392–394
  genome integrity and, 334–338
    factors contribute to genome constraint, 337–338
    genome-level constraint, 334–337
  importance, 333–334
Evolutionary field, 262
Evolutionary mechanism, 149
  of cancer, 107, 127–128, 138, 149, 155–161, 166f
    components, 160f
    diverse individual mechanisms, 158–161
    linking genome heterogeneity to tumorigenesis, metastasis, and drug resistance, 155–158

Evolutionary medicine, 472
Evolutionary scandal, 272
Evolutionary theory, 384
  comparing/reexamining, 313–316
Exosomes, 233–237
Extended evolutionary synthesis, 384
Extinction issues, 378–379
Extrachromosomal elements, 73

F
Falsifications of genome theory, 419–421
Fast evolution, 370
Fast reproduction isolation, 356
“Filter out” aberrations, 276
FISH. See Fluorescence in situ hybridization (FISH)
  high-resolution fiber FISH, 245–246
Fission yeast, 181–183
Fixed genomes, 400–401
Flatworms (Dugesia japonica), 215
Fluorescence in situ hybridization (FISH), 84, 86, 87f, 185, 244–245, 246f
4D genomics, 43–44, 51–52, 300, 339, 422
Free chromatin, 244–247, 244f–246f
Fullerenes, 59
Fusion
  fusion, 348
  genes, 157
  process, 150
Fuzzy inheritance, 202–242, 220f, 391–392, 405–406, 426, 453–454
  inheritance of heterogeneity in organismal systems, 210–216
  and key differences compared to traditional inheritance, 216–221
  mechanisms, 221–238, 223t
    at CNV level, 226
    at epigenetic level, 230–232
    at gene level, 227–230
    of interesting observations, 233–238
    at karyotype level, 222–226
    for mitochondrion, 232–233
  new inheritance to heterogeneity, 205–210
  potential significance and implications, 238–242
  rationales for searching, 202–205

G
G-banding, 134
GEN. See Genetic Engineering & Biotechnology News (GEN)

Gene theory, 60–83, 216–219, 458
  constrained genome, 60–67
  genomes not genes defining biosystems, 67–83
    gene or genome alterations responsible for speciation, 70–83
    minimal gene sets in nature, 67–70, 68t
  selfish gene, 60–67
Gene-based genomics
  diminishing power of, 15–37
    ignored voice of antigenetic determinism, 18–21
    reality check of “industry gene” concept, 24–28
    rising and falling of gene, 21–24
  gene-based 1D genomics, 28–37
Gene(s), 2, 53–54, 170, 376. See also Chromosomal/chromosomes
  code “parts inheritance”, 175–184
  coding, 394
  conflicting relationship between genome and, 83–92
  expression and chromatin loops, 85–88
  gene-based
    developmental macro mutations, 343–344, 385
    inheritance, 173–174
    knowledge, 170
    molecular genetics, 2–3
    research, 171
  gene-centric
    concept, 2, 66–67, 97, 204
    genetics, 238–239
    reductionist approach, 3, 55–56
    theory, 414
  gene-level diversity, 288
  gene-mediated
    adaptation, 332
    microevolution, 400–401, 402t
    selection, 397–400
  gene–environment interactions, 174, 395–397
  gene–protein–phenotype, 173
  genome context determining gene function, 92–93
  hunting movement, 3
  mutations, 171, 180–181, 226, 292, 395–397, 438–439
    in normal tissues, 227
    patterns, 113, 464
  order, 193
Generosity, 277

Genetic Engineering & Biotechnology News (GEN), 150
Genetic Engineering Technology, 24–25
Genetic(s), 2, 6–8, 38–41
  code, 196
  compensation, 178–179
  dark matter, 28–29, 414
  determinism, 18, 21–22
  diversity, 266, 281–282, 291
  earthquake, 152
  experiments, 79–80
  factors, 202
  genetic/genomic information, 198
  heterogeneity, 408–409
  identical clones, 283
  information, 195
  inheritance, 173
  loci, 395–397
  manipulation, 2–3
  models, 72
  multiple levels of genetic/epigenetic alterations, 406–408
  parts, 264
  speciation, 72
  variation, 267
Genome, 53–56, 336–337, 376
  action requirement, 93–94
  alteration–mediated speciation model, 359, 360f, 370
  basis of punctuated equilibrium, 372–374
  in biology, 3
  cluster, 334–335
  codes, 335
  conflicting relationship between gene and genome, 83–92
    chromosomal position and loop size, 83–85
    gene expression and chromatin loops, 85–88
    loops/chromosome length and AT/GC composition, 88–92
  constraints, 292, 329–330
  context, 81
    determining gene function, 92–93
  defining species, 403–405
  editing technology, 441–444
  emergent relationship with genes, 57–60
  genome-based
    alternative mechanisms for evolution, 369–370
    evolutionary theory, 314t–315t
    function of sex, 289

    genomic theory, 388, 414
    inheritance, 170
    macroevolutionary concept, 297
  genome-defined
    genetic network, 387
    system inheritance, 380–381, 391
  genome-level
    alterations, 447
    constraints, 334–337, 370
    heterogeneity, 159
    stability, 269–270
  genome-mediated
    macroevolution, 400–401, 402t
    speciation model, 73, 373
  genome–environment–time interaction, 449
  heterogeneity, 209–210, 389, 390f, 390t
  hypothesis, 421
  instability, 127–128, 158
  package–based selection strategy, 347–348
  reexamining gene theory predictions, 60–83
  replacement, 400–401
  reshuffling, 409–410
  response to stress, 406–408
  selection, 381–382
  stochasticity, 453–454, 453f
  type, 304
  variations, 242–260
    generally accepted chromosomal variations, 242–243
    unclassified chromosomal/nuclear aberrations, 243–260
Genome alterations, 444–458
  new model with new explanations, 452–458
  responsible for speciation, 70–83
  stochastic, 447–450, 450f
Genome chaos, 134–161, 254–257, 255f, 311, 350–352, 424
  evolutionary mechanism of cancer, 155–161
  genome chaos-mediated cancer drug resistance, 181–183
  genomic landscape reorganization, 144–155
  images of abnormal separation and condensation, 256f
  linking incidental NCCAs to CIN and evolutionary potential, 135–139
  nonclassified spectral karyotype, 256f

structural and/or numerical, 154f–155f Genome integrity, 332–338 evolutionary constraints, 334–338 importance, 333–334 Genome reinterpretation dethrones queen changing concepts, 277–286 facts, 296 fill in knowledge gaps, 296–297 first principle, 293 function and common mechanism of sex, 273–277 paradoxes, 293–295 purpose of sex, 262–267 reinterpretation using new framework, 289–292 simulation, 286–289 parental genome size, 287f topological changes, 287f Genome reorganization, 185, 279, 345, 378–379 in eukaryotic evolution, 365t–366t genome reorganization-mediated speciation, 358 Genome theory, 51, 178–179, 277, 334, 409–410, 418 challenges, 421–426 genome theory-based 4D genomics, 300 implications, 338–358 chromosome-mediated speciation, 343–358 concept of species, 339–341 origin of adaptation differs from speciation, 341–343 multilayer sandwich model of selection and self-organization, 412f predictions, 413–415 implications, limitations, and falsifiability, 413–421 principles, 392–412 rationale for establishing genome-based, 383–387 unique considerations, 387–392 emergent properties in biological systems, 388–389 genomic principles, 389–392 integrated information unit, 387–388 Genome-wide association studies (GWAS), 173–174, 448–449, 459 Genomic alteration patterns, 113 Genomic landscape reorganization, 144–155 causative factors, 149

implications, 152–155 mechanisms, 149–150 principal component analysis of global gene expression data, 148f SKY images, 147f transition from relatively normal to chaotic genome, 146f unusual rounds of cell division and SKY/reverse DAPI images, 151f Genomic model for cancer evolution, 161–168 emergent new genome systems, 164–168 game for outliers, 163–164 Genomic(s), 2–8 arrangement, 355–356 changes degree, 206–207 coding systems in biology, 394 diminishing power of gene-based genomics, 15–37 emergence of, 2–15 fundamental limitations of traditional genetics, 9–15 genomic/epigenetic variations, multiple levels of, 438–439 heterogeneity, 239–240, 441–444 information, 150, 197–199 new genomic science on horizon, 37–52 crisis created new opportunities, 41–50 4D genomics, 51–52 time to rethinking genetics and genomics, 38–41 new scientific expectations, 45–46 organization, 185 parts versus the whole, 57–60 topology, 188–189, 387, 392–394 Genotype, 204–205, 209–210 evolution pattern, 142–143 Genotype–environment–phenotype relationship, 174 Genotype–phenotype mapping, 222 relationship, 59–60, 175, 204, 353–357 Geographic isolation, 342 role in speciation, 357–358 Germline chromothripsis event, 152 constraint, 377–378 genomes, 449 mutations, 203 transmission, 387

Giant nucleus, 151f Goldschmidt’s theory, 77 Gradual accumulation, 312 Gradualism, 314t–315t Gulf War illness (GWI), 249, 415, 450–452 general model of GWI, 450–451 symptoms, 451f GWAS. See Genome-wide association studies (GWAS) GWI. See Gulf War illness (GWI)

H

Haemophilus influenzae, 186 Haploid yeast, 181–183 Healthy research ecosystem for biomedical research, 476–479 Heritability, 202 Heterogeneity, 42–43, 49, 69–70 heterogeneity-favoring strategy, 405–406 heterogeneous environments, 290 hidden inheritance of heterogeneity from normal genomes, 213–216 inheritance of, 209–210 karyotype, 208–209, 211, 224, 408–409 new inheritance to, 205–210 and treatment response, 102–104 Heteromorphic chromosomes, 274 Heteroplasmy, 232–233 HGT. See Horizontal gene transfer (HGT) HI. See Hybrid isolation (HI) Hi-C technology, 88, 418, 423, 425 “Hidden” phenotypes, 233–237 High-speed speciation, 368 Higher-level interactions, 441–444 Histone gene cluster, 192 “Hopeful monster” concept, 307 Horizontal gene transfer (HGT), 62–63, 271 “Housekeeping” proteins, 188 Hox gene cluster, 192 Hsp90, 233–237 HUGO. See Human Genome Organization (HUGO) huIFNB1. See Human IFN-β gene (huIFNB1) Human cell immortalization model, 153–155 Human Genome Organization (HUGO), 4–5 Human Genome Project, 4–5, 19, 25, 57, 92–93, 108–109, 134, 173–175, 246–247, 428–429, 470

Human genomics, 65 Human hologenomes, 440 Human IFN-β gene (huIFNB1), 86 Human–mouse hybrid cell line, 254 Huntington’s disease, 233–237 Hybrid isolation (HI), 75 Hybrid speciation, 348–350 Hybrid-generated polyploidy, 349 Hybridization, 348–349 Hypothesis, 289

I

ICGC. See International Cancer Genome Consortium (ICGC) IFN-β. See Interferon-β (IFN-β) Imatinib, 98–104 Immortalization, 208–209, 305 In vitro immortalization model of Li-Fraumeni fibroblast cells, 139 Individual genomes, 397–400 Individualized medicine, 16–17 “Industry gene” concept, reality checking of, 24–28 Infection theory of cancer, 132 Inheritance, 22–23, 70, 173–175, 400–401 of heterogeneity, 209–210 in organismal systems, 210–216 abnormal phenotype, 213–216 types of organisms, 210–213 rationale of searching for new types of, 170–173, 202–205 of unstable cellular population, 206–207 Interferon-β (IFN-β), 86 Internal genetic conflicts, 76 International Cancer Genome Consortium (ICGC), 111 Irrelevance of neutral theory, 371–372 Isolated genetic factor, 53–54

K

Karyotype(s), 56, 139–141, 140f, 347–348, 392–394, 403–405, 425 coding, 40–41, 56, 170–201 dynamics, 435–436 evolution, 100 fuzzy inheritance mechanisms, 222–226 genome-level diversity, 288 heterogeneity, 208–209, 211, 408–409 in somatic systems, 224 karyotype-defined genomes, 454–455 karyotype-defined system, 202 inheritance, 344

karyotype-mediated-punctuated macroevolution, 202 plus gene topology, 204 Knowledge gaps, fill in, 296–297 Knowledge structure, 471–473 Kuhn’s criteria, 60

L

Large-scale methodologies, 424 Large-scale-omics, 384, 459 Li-Fraumeni fibroblast cells, in vitro immortalization model of, 139 Limited fuzzy inheritance within populations, 352–353 Linear approaches, 471–472 Long-term evolution for speciation, 321–323, 327, 351 stability, 336

M

Macro/microevolutionary transition, 310–311 Macrocellular evolution, 115–116, 152, 164–165 Macroevolution, 78–79, 300, 309, 310f, 321–322, 370, 406–408 artificial selection, 316–332 artificial species creation by shattering genome, 381–382 cancer evolutionary models, 304–307 comparing/reexamining evolutionary theories, 313–316 creation, 359–369 evolution, 358–381 evolutionary constraints, 332–338 genome theory implications, 338–358 natural evolution, 307–309, 316–332 Maize (Zea mays), 177–178 Making Sense of Genes (Kampourakis), 38 Mammalian species, 176–177 Massive chaotic chromosomes, 254 Massive extinction, 351 Massive speciation during crisis, 350–352 Mathematical models, 419 Mathematics in biology, 472–473 Mayr’s biological concept of species, 308 MC4R gene. See Melanocortin 4 receptor gene (MC4R gene) Meiosis, 286–287, 294 Melanocortin 4 receptor gene (MC4R gene), 330–331

Mendel’s law/theory, 9–12 genetic factors, 9–10 of genetics, 58, 203 in genomics, 465 Mendel’s peas, 10, 204 Mendelian genetics, 197, 338–339 diminishing power of gene-based genomics, 15–37 emergence of genomics, 2–15 new genomic science on horizon, 37–52 Mendelian inheritance, 2 Mendelian Inheritance in Man (MIM), 6 Metabolic functions, 230 Microbiome research, 440 Microenvironments, 284–285 Microevolution, 300, 310f, 400 artificial selection, 323–332 creation, 359–369 natural selection, 323–332 selective contributions from individual genes, 337 Microevolutionary selection, 397–400 Micronuclei (MN), 257 cluster, 257–258, 257f–258f Microscopic freshwater invertebrate, 272 Million veteran program (MVP), 468–469 MIM. See Mendelian Inheritance in Man (MIM) Minimal gene sets in nature, 67–70, 68t Missing heritability, 29, 199 Missing link in macroevolution, 374–376 Mitelman Database, 98 Mitochondria’s contribution to cancer, 132–133 Mitochondrion dysfunction, 415 fuzzy inheritance mechanisms for, 232–233 MN. See Micronuclei (MN) Mobile chromatin fragments, 233–237 Modern genetics, 9, 11 Modern Synthesis (MS), 384, 389–391 Molecular biologists, 413 of DMFs, 247 genetics, 2, 175–176 magic bullet, 437–438 mechanisms, 254 medicine, 427–429, 476, 478–479. See also Precision medicine (PM) pathway, 171 Molecular reproducibility, 461

Monogonont rotifers, 290 Monster cancer chromosomes, 153 Moon Shots Program, 96 “Moonlighting” proteins, 395–397 Moral crisis, 317–318 Mosaicism, 260, 459 Mouse Genome Sequencing Consortium (2002), 272 Mouse model system, 433–434 MS. See Modern Synthesis (MS) Muller’s ratchet, 264 Multicellular organisms, 277–278 Multilayer sandwich model, 412f, 416, 417t Multilevel evolution and constraint, 376–377 Mutational spectra, 113 Mutator phenotype, 133 MVP. See Million veteran program (MVP) MYC, 112, 167 Mycoplasma genitalium, 178–179 Mycoplasmas, 68 MYO1 gene, 64

N

National Biomarkers Development Alliance, 465–466 Natural evolution, 307–309 Natural populations, 323–324 Natural selection (NS), 314t–315t, 316–332, 384, 397–400 Nature Reviews Genetics, 35–36 NCCAs. See Nonclonal chromosome aberrations (NCCAs) NCI. See US National Cancer Institute (NCI) Negative falsification, 420 Neo-Darwinians, 344–345 concepts, 309–313 evolution, 202, 313 neo-Darwinian-based concept of speciation, 342 population concept, 302 synthesis, 342, 419 theory, 327, 333–334 Neo-Darwinism, 322–323, 358–359 Network organizer, 392–394 New inheritance to heterogeneity, 205–210 karyotype heterogeneity, 208–209 single cell, 208 system inheritance, 207–208 of unstable cellular population, 206–207

New species formation, 357 Newton’s second law, 13–14 Non-Darwinian evolution dynamics, 311 Nonallelic homologous recombination, 224 Nonclonal aneuploidy, 468 Nonclonal cells, 375–376 Nonclonal chromosome aberrations (NCCAs), 102–103, 122–123, 135, 136f, 190, 294, 310–311, 345, 406–408, 432, 468. See also Unclassified chromosomal/nuclear aberrations linking incidental NCCAs to CIN and evolutionary potential, 135–139 NCCA/CCA cycle, 168 relationship between stress, hallmarks, and CIN, 128f two phases of cancer evolution, 139–144 Noncoding RNA studies, 422 Nongenetic variants, 284 Nongenic factors, 77 Nonhomologous end joining, 352 Nonlinear dynamics, 471–472 Nonlinear genome/environmental interactions, 219 NOR. See Nuclear organizer region (NOR) Normal genotype, 439 “Normal science” phase of molecular biology, 475–476 NS. See Natural selection (NS) Nuclear clusters, 150 Nuclear matrix, 85–86 Nuclear organizer region (NOR), 244 Nuclear scaffold/matrix-attachment regions (S/MARs), 85, 87f Nuclei transfer-based cloning, 195 Nucleic acids, 195 Numerical chromosomal variations, 242 “Nursing” process, 233–237

O

Obesity, 414–415 Oenothera gigas, 72–73 Oenothera lamarckiana, 72–73 “Omics” projects, 16–17 “Omnigenic” model, 174 Oncogenes, 113, 164 1D-gene framework, 422 Onions (Allium), 177–178 Open reading frame (ORF), 183–184 Organic systems, 420

Organism space, 339–340 Organismal evolution, 269–270, 269f Organismal systems, inheritance of heterogeneity in, 210–216 Organisms, 71

P

p53 gene, 395–397 characterization, 434–435 functions, 433–434 knockout cells, 437–438 mutation, 432–433 mutations, 445–446 mutome, 435 p53-related molecular pathways, 432 studying, 431–436 Pain relief medication, 441 Pangen, 53–54 Paradigm shift, 20, 41–42 Paradox(es), 293–295 of sex, 266, 276 Parthenogenesis, 262–263 Parts coding, 394 Parts inheritance, 185, 193–194, 199, 275, 305, 322–323, 394, 464 Parts-whole relationship, 176–184 Pathological condition, organismal systems in, 210–216 Pathway switching, 453–454 pd7, 139–141 pd19, 139–141 pd54, 139–141 Personal Genome Project, 30 Phantom heritability, 29 Phase transitions, 166f, 208, 360f Phenotypic/phenotypes, 204–205, 209–210, 337, 353–354 alterations, 441–444 penetration, 183–184 plasticity, 221 Philadelphia chromosome (Ph), 98, 134 Philosophy of science, 472 Physics in biology, 472–473 Physiological condition, organismal systems in, 210–216 PM. See Precision medicine (PM) Policy matters, 474–479 Polygenic model, 40 Population adaptation and survival, 400–401 diversity, 159, 288 genetics, 458

“Positive” validation of theory, 420 Precision medicine (PM), 429–430 challenges and opportunities for, 430–458 future direction, 458–479 promises, 427–430 Precision Medicine Initiative, 429, 438–439 Premature chromosome condensation. See Chromosomal/chromosomes, pulverization Preserving chromosomal coding, 192–195 Professionalism, 473–474 Prokaryotes, 338, 402–403 Prokaryotic systems, 186 Protein-coding genes, 74, 178–179 Punctuated equilibrium, 142–143 genome basis of, 372–374 Punctuated evolution, 142–143 Punctuated genome-level alteration, 198–199 “Punctuated” cancer evolution, 142–143 “Purifying gene mutation” model, 273 Purine degradation pathway, 187–188

Q

Quantitative model, 403–405, 426

R

Random mating model, 288–289 Rationale of searching for new types of inheritance, 170–173, 202–205 Reactive oxygen species (ROS), 285 Reciprocal translocation, 222–224 Recombinant DNA technology, 2 Recurrent miscarriages, 152 Red Queen hypothesis, 265, 291 Reductionism, 58–59 Reductionist approaches, 423 Reductionist tradition and power of metaphor, 199–201 Reinterpretation using new framework, 289–292 Reproductive isolation, 72, 329, 342 Research communities, 470–471 Retroviral interaction, 403–405 Rewiring the gene network, 183–184 Rice (Oryza sativa), 177–178 RNA, 173 ROS. See Reactive oxygen species (ROS)

S

S/MARs. See Nuclear scaffold/matrix-attachment regions (S/MARs) Saccharomyces cerevisiae, 82, 178–179 yeast, 272–273 Saccharomyces paradoxus, 82 Saltationist theory, 312 SC. See Synaptonemal complex (SC) Schizosaccharomyces pombe, 178–179 Scientific culture, 473–474 Scientific research, 471 SCNAs. See Somatic copy number alterations (SCNAs) Segregation principle, 203 Self-healing, 441 Self-organization, 416 Selfish DNA, 65 Selfish Gene, The (Dawkins), 2, 23–25, 61 Selfish gene concept, 60–67 Sequential accumulation of gene mutations, 124–125 Sex, 264 function, 286–289 and common mechanism, 273–277 purpose, 262–267 safeguards genomic integrity, 335–336 sex–genome relationship, 294–295 Sex-mediated genome constraints, 377–378 integrity, 336 Sexual filter, 274 Sexual populations/individuals, 296 Sexual reproduction, 207, 262–263, 288, 290, 303, 308, 337, 339–340, 344–347, 351, 356 generates diversity, 265 Short tandem repeats (STRs), 233–237 Short-term adaptation, 321–324, 334–335, 337, 342, 369, 378 Short-term falsification, 420 Sickle cell anemia, 428 Simple evolutionary principles, 300–303 Simple translocations, 145 Single cell, 208 technology, 459 Single gene-mediated phenotypes, 216 Single-molecular mechanisms, 149 Sister unit fibers, 251, 252f SKY. See Spectral karyotyping (SKY) Sluggish evolution, 333, 370 Somatic cell(s), 270 dynamics, 377–378

evolution, 198, 400–403 conflict with neo-Darwinian concepts, 309–313 model, 269–270, 269f similarities and differences with natural evolution, 307–309 genomes, 449 models, 220–221 reproduction, 268 Somatic CNV, 114–115 Somatic copy number alterations (SCNAs), 114–115, 226 Somatic evolution, 430 Somatic gene mutation theory, 119–129 challenging, 120–124 sequential accumulation of gene mutations, 124–125 clinic facts not supporting cancer gene mutation theory, 128–129 limitations of searching for hallmarks of cancer, 125–128 Somatic genomes, 440 alterations, 408–409 Somatic hybrids, 348 Somatic systems, karyotype heterogeneity in, 224 Speciation, 70–83, 314t–315t, 367, 402–403 adaptation differs from, 341–343 concept in evolution, 465 event, 328 genome alteration-mediated, 359, 360f genome reorganization-based macroevolutionary model of, 362f of gibbons, 352 integrated model of, 359–369 spontaneous chromosome alterations mediated, 345–348 Species concept of, 339–341 maintenance, 359–369 species-specific fuzzy inheritance, 213 Spectral karyotyping (SKY), 88, 89f, 134, 207 Sperm–egg interaction, 274 Sponges, 392–394 Spontaneous chromosome alterations mediated speciation, 345–348 Steroid receptors, 233–237 Sticklebacks, 331–332 Sticky chromosomes, 251–254, 253f Stochastic genomic alterations, 447–450

“Stochastic/unpredictable” cancer evolution, 142–143 Stress, 436–444 stress-induced dynamics, 406–408 stress-induced system dynamics, 159 instability, 422–423 “Stress promoted, genetic variation mediated cellular evolution” concept, 441–444 STRs. See Short tandem repeats (STRs) Structural chromosomal variations, 242 Super-fast evolving fish splitting, 368 Synaptonemal complex (SC), 89f, 91 SC length, 88–91 Syngamy, 281 Synthetic DNA, 26 System behavior, 468 biologists, 394, 413 biology, 387–388 inheritance, 185, 199, 207–208, 275, 305, 322–323, 394, 464 instability, 432, 449 mutation, 77, 79–80

T

TADs. See Topologically associated domains (TADs) TCGA project. See The Cancer Genome Atlas project (TCGA project) TDFT. See Tasmanian devil facial disease (TDFT) Telomere dysfunction, 152 TEs. See Transposable elements (TEs) Tetranucleotides, 196 The Cancer Genome Atlas project (TCGA project), 108–109 Theory-based science of genetics, 428 Three-stage model, 406–408 Time factor, 422 Tissue organization field theory (TOFT), 130 Toadflax flower (Linaria vulgaris), 213 “Toolkit or regulation” proteins, 188 Topologically associated domains (TADs), 88 Topology, 195–197 information, 197 Toxicity response, 403–405 Trade-off, 436–444 Traditional genetic analyses, 45–46 fundamental limitations of, 9–15 Traditional inheritance, 216–221 Transcriptional adaptation, 178–179 Transcriptome heterogeneity, 209 Transient nonclonal events, 267–268 Transitional karyotypes, 255 Translational genomic research, 430 Transposable elements (TEs), 65, 73, 81, 190 Tumor society, 131 Tumor suppressor gene, 113 Tumorigenicity, comparative studies of, 159 Two phases of cancer evolution, 118–119, 139–144, 141f, 166, 360, 385 dynamics, 143 implications, 143–144 terminologies, 142–143

U

UK Biobank, 468–469 Ultraconserved DNA elements, 183–184 Unclassified chromosomal/nuclear aberrations, 243–260. See also Chromosomal/chromosome; Nonclonal chromosome aberrations (NCCAs) abnormalities/variations, 258, 259f C-Frag, 249 DMFs, 247–249 free chromatin, 244–247 genome chaos, 254–257

MN cluster, 257–258 sticky chromosomes, 251–254 unification of different types of chromosomal aberrations, 258–260 unit fibers, 249–251 Unified evolutionary theory, 379–381 Unit fibers, 249–251, 251f Unstable cellular population, inheritance of, 206–207 Unstable genomes, 454–455 US Department of Energy (DoE), 4–5 US Food and Drug Administration, 468 US National Cancer Institute (NCI), 31, 96, 427–428

V

Variation, 436–444 Variety sparks sexual evolution, 290 Vertebrate β-globin locus, 192

W

Warburg effect, 132–133 “Watching karyotype evolution in action” experiment, 139 Wild-type p53 gene, 437–438 “Window of opportunity”, 449 Wine grapes, 283

X

X-linked chronic granulomatous diseases, 2–3

Y

Yeast polyubiquitin gene, 233–237