Genetic Diversity [1 ed.] 9781608765416, 9781607411765


English Pages 320 Year 2009

Copyright © 2009. Nova Science Publishers, Incorporated. All rights reserved.


Genetics – Research and Issues Series

GENETIC DIVERSITY


No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.

Genetics – Research and Issues Series

Sex Chromosomes: Genetics, Abnormalities, and Disorders
Cynthia N. Weingarten and Sally E. Jefferson (Editors)
2009. ISBN: 978-1-60741-304-2

Genetic Diversity
Conner L. Mahoney and Douglas A. Springer (Editors)
2009. ISBN: 978-1-60741-176-5

Genetics – Research and Issues Series

GENETIC DIVERSITY

CONNER L. MAHONEY AND DOUGLAS A. SPRINGER
EDITORS

Nova Science Publishers, Inc. New York

Copyright © 2009 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher.

For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175
Web Site: http://www.novapublishers.com

NOTICE TO THE READER

The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works.

Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication.

This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.


Library of Congress Cataloging-in-Publication Data Genetic diversity / [edited by] Conner L. Mahoney and Douglas A. Springer. p. ; cm. Includes bibliographical references and index. ISBN 978-1-60876-541-6 (E-Book) 1. Variation (Biology) I. Mahoney, Conner L. II. Springer, Douglas A. [DNLM: 1. Genetic Variation--genetics. 2. Adaptation, Biological--genetics. 3. Biodiversity. 4. Evolution, Molecular. QU 500 G328 2009] QH401.G46 2009 576.5'8--dc22 2009012468

Published by Nova Science Publishers, Inc.    New York

Contents

Preface

Chapter 1
Analysis of Sequence Diversity at Two Mitochondrial Genes on Different Taxonomic Levels. Applicability of DNA Based Distance Data in Genetics of Speciation and Phylogenetics
Y. Ph. Kartavtsev

Chapter 2
Chromosomal Variability and the Origin of Citrus Species
Marcelo Guerra

Chapter 3
Genetic Diversity of Mycobacterium Tuberculosis Population in Bulgaria
Violeta Valcheva, Igor Mokrousov, Olga Narvskaya, Nalin Rastogi and Nadya Markova

Chapter 4
Genetic Diversity in Switchgrass – A Potential Bioenergy Crop
B. Narasimhamoorthy, M. C. Saha, H. S. Bhandari and J. H. Bouton

Chapter 5
Genetic Variability in the Fescue-Ryegrass Complex
F. M. Kirigwi, A. A. Hopkins and M. C. Saha

Chapter 6
Genetic Diversity of the Population of Russia: Gene Pool and Genegeography
Sergei Rychkov, Oksana Naumova, Alexei Evsyukov, Irina Morozova, Yuri Shneider and Olga Zhukova

Chapter 7
Genetic Variability within Cypella fucata Ravenna in Southern Brazil
Évilin Giordana de Marco, Luana Olinda Tacuatiá, Lilian Eggers, Eliane Kaltchuk-Santos and Tatiana Teixeira de Souza-Chies

Chapter 8
Genetic and Functional Diversity of Phosphate Solubilizing Fluorescent Pseudomonads and Their Simultaneous Role in Promotion of Plant Growth and Soil Health
K. Badri Narayanan, M. Jaharamma, G. Raman and N. Sakthivel

Chapter 9
Genetic Diversity and Population Structure of Alpine Plants Endemic to Qinghai-Tibetan Plateau, with Implications for Conservation under Global Warming
Yupeng Geng, John Cram and Yang Zhong

Chapter 10
Bayesian Inference under Complex Evolutionary Scenarios Using Microsatellite Markers: Multiple Divergence and Genetic Admixture Events in the Honey Bee, Apis Mellifera
Jean-Marie Cornuet, Laurent Excoffier, Pierre Franck and Arnaud Estoup

Chapter 11
Geographic Structure of Craniometric Variation and the Estimates of Possible Dispersal Routes of Major Human Populations
Tsunehiko Hanihara

Chapter 12
Intra-Specific Genetic Variation in Mosses: A Novel Approach to Detect Environmental Changes
Valeria Spagnuolo, Stefano Terracciano and Simonetta Giordano

Index


Preface

Genetic diversity is a level of biodiversity that refers to the total number of genetic characteristics in the genetic makeup of a species. It is distinguished from genetic variability, which describes the tendency of genetic characteristics to vary. Research has found that genetic diversity and biodiversity are dependent upon each other, that diversity within a species is necessary to maintain diversity among species, and vice versa. If any one type is removed from the system, the cycle can break down, and the community may become dominated by a single species. Thus, genetic diversity plays a huge role in the survival and adaptability of a species. This book provides research on genetic diversity in plant, animal and human species. Relationships to environmental changes and global warming are also studied.

Chapter 1 - Algorithms of nucleotide diversity estimates and other measures of genetic divergence for the two genes Cyt-b (cytochrome b) and Co-1 (cytochrome oxidase 1) are analyzed. Based on the theory and algorithms of distance estimates on DNA sequences, as well as on the observed distance values retrieved from literature, it is recommended for realistic tree building to use a specific nucleotide substitution model, chosen from the at least 56 available in Modeltest 3.7 or other software, depending on the specific set of nucleotide sequences. Using a database of p-distances and similar measures gathered from published sources and GenBank (http://www.ncbi.nlm.nih.gov) sequences, genetic divergence of populations (1) and taxa of different rank, such as subspecies, semispecies and/or sibling species (2), species within a genus (3), species from different genera within a family (4), and species from separate families within an order (5) have been compared. Empirical data for 18,192 vertebrate and invertebrate species demonstrate that the data series are realistic and interpretable when p-distance and its various derivatives are used.
The focus was on vertebrates, and fish species in particular, and on the newest dataset obtained in the framework of FishBOL (http://www.fishbol.org). Distance data revealed various and increasing levels of genetic divergence of the sequences of the two genes Cyt-b and Co-1 in the five groups compared. Mean unweighted scores of p-distances for the five groups are: Cyt-b (1) 1.46±0.34, (2) 5.35±0.95, (3) 10.46±0.96, (4) 17.99±1.33, (5) 26.36±3.88 and Co-1 (1) 0.72±0.16, (2) 3.78±1.18, (3) 10.87±0.66, (4) 15.00±0.90, (5) 19.97±0.80. The estimates show good correspondence with former analyses. This testifies to the applicability of p-distance for most intraspecies and interspecies comparisons of genetic divergence up to the order level for the two genes compared. As seen from the numbers above, and from a regression analysis, there is no sign of saturation, which would usually be expected from a homoplasy effect. Differences in divergence between the genes themselves at the five hierarchical levels were also found. This conforms to the ample evidence showing different and nonuniform evolution rates of these and other genes and their various regions. The results of the analysis of nucleotide as well as allozyme divergence within species and higher taxa of animals are, firstly, in good agreement with previous results and show the stability of a general trend, and, secondly, suggest that in animals, phyletic evolution is likely to prevail at the molecular level, and speciation mainly corresponds to the geographic model (type D1). The prevalence of the D1 speciation mode does not mean that other modes are absent. There are at least seven possible modes of speciation. How we can recognize them formally with operational genetic criteria is a key question for establishing a quantifiable genetic model (theory) of speciation. An approach is suggested that allows a step forward in this direction. Research was supported by the Russian Foundation for Fundamental Research grants #07-04-00186, #08-04-91200 and the Far Eastern Branch of the Russian Academy of Sciences (RAS) grant #08-3B-06-031, RAS Board Programs, grant #09-1P23-06.

Chapter 2 - The genus Citrus includes some of the most important crop plants in the world, although its taxonomy remains one of the most controversial among angiosperms. Most species are of hybrid origin and some of them may include germplasm from other genera. Cytologically, Citrus species are characterized by a stable chromosome number and a highly variable pattern of heterochromatic bands. Most accessions display heteromorphic chromosome pairs, suggesting that they originated from cross hybridization. On the other hand, citron (C. medica), pummelo (C.
maxima), a few mandarin accessions, and most wild Citrus species and related genera exhibit chromosome pairs that are homomorphic for similar heterochromatic bands. Based on these findings, hybrid and non-hybrid accessions were identified and the possible origin and relationship among most accessions were reconsidered.

Chapter 3 - Tuberculosis remains an important public health issue for Bulgaria, a Balkan country located in a world region with a contrasting epidemiological situation for tuberculosis. Here, we present results of recent studies on the genetic diversity of the Mycobacterium tuberculosis population in Bulgaria, which was evaluated with various DNA fingerprinting methods (spoligotyping, 24-MIRU-VNTR and IS6110-RFLP typing). The spoligotype-based population structure of M. tuberculosis in Bulgaria was shown to be sufficiently heterogeneous. It is dominated by the globally distributed spoligotypes ST53 and ST47 and the Balkan-specific spoligotypes ST125 and ST41. Beijing genotype strains were not found in Bulgaria in spite of close links with Russia in the recent and historical past. Comparison with the international database SITVIT2 (Pasteur Institute of Guadeloupe) showed that spoligotype ST53 is found in similar and rather high proportions in neighboring Greece and Turkey and is almost equally distributed across different regions of Bulgaria. In contrast, ST125 is not found elsewhere and is specific to Bulgaria; furthermore, it appears to be mainly confined to the southern part of the country. The novel 15/24-locus format of MIRU-VNTR typing was found to be the most discriminatory tool compared to spoligotyping and IS6110-RFLP typing of M. tuberculosis strains in Bulgaria. Furthermore, VNTR typing was shown to be useful for resolving the ambiguous phylogeny of some spoligotypes, in particular those classified as LAM/S by a bioinformatics approach. In practical terms, a reduced “Bulgaria-specific” 5-locus set (MIRU40, Mtub04, Mtub21, QUB-11b, QUB-26) provided sufficiently high differentiation and may be preliminarily recommended for first-line typing of M. tuberculosis isolates in Bulgaria, although further studies are needed to validate this scheme. At the same time, a comprehensive secondary subtyping of the clustered isolates should target all 15 discriminatory loci. We additionally investigated the molecular basis of drug resistance of the studied strains. Three types of rpoB mutations were found in 20 of 27 RIF-resistant isolates; rpoB S531L was the most frequent. Eleven (48%) of 23 INH-resistant isolates had the katG S315T mutation. The inhA -15C>T mutation was detected in one INH-resistant isolate (that also had the katG315 mutation) and three INH-susceptible isolates. A mutation in embB306 was found in 7 of 11 EMB-resistant isolates. Consequently, rpoB and embB306 mutations may serve for rapid genotypic detection of the majority of the RIF- and EMB-resistant strains in Bulgaria; the results on INH resistance are complex and further investigation of more genes is needed. Comparison with spoligotyping and 24-VNTR locus typing data suggested that the emergence and spread of drug-resistant and MDR-TB in Bulgaria are not associated with any specific spoligotype or MIRU-VNTR genotype. Local circulation of particular clones appears to be an important factor to take into consideration in molecular epidemiological studies of tuberculosis in Bulgaria.

Chapter 4 - Switchgrass (Panicum virgatum L.) is a warm-season C4 perennial grass belonging to the family Poaceae. It is native to North America. Persistence across a wide geographical range, in addition to high biomass production with minimum inputs, makes it an excellent choice for a sustainable bioenergy crop.
Switchgrass is a highly heterozygous, self-incompatible and out-crossing species. Broad species adaptation, natural selection and photoperiodism have combined to create considerable ecotypic differentiation in switchgrass. The natural population is classified into two distinct cytotypes: upland and lowland. Upland cytotypes are mostly octaploid (2n = 8x = 72) and lowlands are tetraploid (2n = 4x = 36); however, multiple ploidy levels ranging from diploid (2n = 2x = 18) to dodecaploid (2n = 12x = 108) have been reported in switchgrass. In the USA, uplands are adapted to the mid and northern latitudes, while lowlands are in the southern parts of the country. In addition, these ecotypes differ with respect to photosynthesis, drought tolerance and N-use efficiency. Knowledge of the amount of genetic diversity and polymorphism in switchgrass is necessary to enhance the effectiveness of breeding programs and germplasm conservation efforts. In the past two decades, several studies have been conducted to evaluate the genetic variability in switchgrass populations. Molecular markers, such as RFLPs, RAPDs and SSRs, were used to find within- and among-population variation in a wide range of switchgrass cytotypes. Hybrid cultivars can be an attractive option for improving biomass production. Molecular marker and phenotypic data suggest that lowland and upland genotypes represent different heterotic groups that can potentially be used to produce F1 hybrid cultivars. This review summarizes the current understanding of the genetic diversity available in P. virgatum populations, with a focus on studies performed at the Noble Foundation, where the genetic variability and the relationships within and among switchgrass populations were determined with simple sequence repeat markers and ploidy analysis.

Chapter 5 - Fescues and ryegrasses in the Lolium genus are widely used as forage and turf, especially in temperate regions of the world. These highly productive grass species provide feed and fodder for livestock and wild animals, play a major role as turf on golf courses and lawns worldwide, and prevent soil erosion. Among these grasses, tall fescue [Lolium arundinaceum (Schreb.) Darbysh.] germplasm is classified into five botanical varieties that range from tetraploid to decaploid and into two major germplasm pools, “Continental” and “Mediterranean,” as well as into two functional groups, forage and turf types. Important species in the genus Lolium include the outcrossing Lolium perenne L. (perennial ryegrass) and the self-pollinated L. temulentum L. subsp. temulentum (darnel, darnel ryegrass). The majority of Lolium species are self-infertile, have a strong self-incompatibility system and are, therefore, highly heterogeneous. Grazing or selection may lead to the loss of rare alleles that may be useful for adaptation to extreme environments, e.g., when these cool-season grasses are grown in warmer, drier areas. Understanding the levels of genetic diversity within populations and the genetic relationships between them is therefore important not only for breeding, but also for ensuring adaptability and persistence, quality and disease resistance of germplasm accessions, breeding lines and populations. At the Noble Foundation, efforts have been concentrated on collecting tall fescue and L. temulentum germplasm, and on the development of molecular tools for these species. Molecular tools developed in-house were employed to study genetic diversity and to understand the utility of various marker tools for diversity studies. In this chapter, we review the genetic diversity work carried out in Lolium, with an emphasis on our work at the Noble Foundation.
Various marker systems have been found to be useful in the Lolium genus, with SSRs in particular being transferable across the fescue-ryegrass complex.

Chapter 6 - Genetic differentiation of the population of Russia is investigated. The work is based on data on the polymorphism of immuno-biochemical and molecular markers in about 1,500 populations from 62 ethnoses belonging to six main linguistic families and having different cultural traditions. Genetic diversity is studied by cartographic and statistical methods and is presented in the form of genegeographical maps. The position of the Russian gene pool against the Eurasian background is described. The genetic relief of Russia is investigated, and the main structural components of the gene pool are revealed. Analysis of these components from the ethno-historical point of view revealed their connection with different regions of Eurasia (West and Central Europe, Central and East Asia).

Chapter 7 - Iridaceae is a relatively large family of monocots comprising over 2,030 species in 65-75 genera. Cypella fucata Ravenna is characterized as a perennial bulbous herb with beautiful orange flowers that have ornamental value. The distribution of the species comprises Brazil, in the states of Rio Grande do Sul and Santa Catarina, and Uruguay. This study aims to compare two geographically distinct survey areas of C. fucata using molecular approaches and to contribute to the knowledge of the genetic variation of the species. Cypella fucata specimens were collected in the State of Rio Grande do Sul, Brazil, at two sites: the municipalities of Piratini (26 specimens) and Capão do Leão (28 specimens). Survey sites were located along a road and were 22 km apart. Specimens were analyzed by ISSR-PCR (Inter Simple Sequence Repeats), since ISSR markers have a high capacity to reveal polymorphism and offer great potential to determine intra- and interspecific levels of variation.
Nine primers were tested, generating 201 fragments (bands) with sizes ranging from 150 bp to 2,000 bp and an average of 22 bands per primer. A matrix of presence and absence of fragments was constructed and Jaccard’s coefficient was calculated. A dendrogram based on these values was generated to reveal the genetic structure of both populations. The patterns were highly polymorphic within each collection site, with samples aggregated into two major groups corresponding to the surveyed populations. In addition, φST was calculated and may indicate some interpopulation gene flow (φST = 0.0851) and an intermediate structure. Nei’s genetic distance showed a high identity (98%) between the two collection sites analyzed. Since the sampled areas were near each other, our data may suggest that they in fact correspond to two subpopulations derived from a single original one. These data may indicate that C. fucata is cross-pollinated and that vegetative propagation does not play an important role in the maintenance of the populations. Specimens from other sites will be analyzed to confirm the mating system. This study is the first contribution to the knowledge of evolutionary aspects of this species.

Chapter 8 - Soil microbes that solubilize insoluble phosphates play a vital role in maintaining soil fertility and plant health and in the subsequent enhancement of crop yield. The fluorescent pseudomonad group of bacteria is often predominant among bacterial species associated with the plant rhizosphere. Due to their innate capability for plant growth promotion and plant disease suppression, and their potential for biodegradation of agricultural chemical pollutants, fluorescent pseudomonads have been a major focus for investigators around the world. In recent years, rich knowledge has been generated on the diversity and functional potential of fluorescent pseudomonads.
This chapter describes the genetic and functional diversity of fluorescent pseudomonads and their role in phosphate solubilization, biological control and soil fertility.

Chapter 9 - The Qinghai-Tibetan Plateau is one of the most important centers of biodiversity for alpine species in the world and is among the areas that are most sensitive to global warming. Knowledge about population genetics is essential for understanding the dispersal ability and evolutionary potential of alpine species in a warming world. In this chapter, we review the genetic diversity and population structure of 19 alpine plant species endemic to the Qinghai-Tibetan Plateau. Generally, population genetic variation can vary greatly among different species, and the endangered species have much lower levels of genetic diversity than the co-occurring common species. Although a few species showed increased levels of genetic diversity with altitude, we detected no significant correlation between diversity and altitude in most species. In addition, the isolation-by-distance model cannot explain the spatial genetic structure in most alpine species that have been investigated, which may be partially due to the discontinuous distribution of alpine species shaped by the complex geomorphology of the Qinghai-Tibetan Plateau. The implications of these results for the conservation of alpine plants during global warming are discussed.

Chapter 10 - Making inferences from molecular data on the demographic parameters of complex evolutionary scenarios remains methodologically challenging. The approximate Bayesian computation (ABC) method has the potential to treat such scenarios (Beaumont et al., 2002). We have developed a user-friendly methodological framework based on ABC that allows one to make inferences from microsatellite data under evolutionary scenarios including any combination of admixture, divergence and (discontinuous) effective population size variation events, for any number of populations.
We illustrate here the potential of this methodological framework by making inferences on a complex scenario involving four A. mellifera populations sharing two divergence and two admixture events. Four groups of honey bee populations belonging to two genetic lineages (M and C) and genotyped at eight microsatellite loci have been analysed twice to evaluate estimation stability. In addition, mean relative bias and errors have been computed from 500 data sets simulated with known values of parameters (close to estimates on real data), showing that the order of magnitude of all parameters is correctly estimated. Time estimates of divergences between populations are compatible with previous estimates: -0.6 My for lineages M and C divergence and -0.2 My for French and Italian M lineage divergence. The estimated proportion of lineage M alleles in the subspecies ligustica, amounting to 12%, is intermediate between estimates obtained by two different methods. Furthermore, our ABC analysis allows the previous estimate of 35% of lineage M alleles in the recently admixed population to be decomposed as 23% from the local mellifera subspecies and 77% × 12% (9.2%) from the imported ligustica, making a total of 32.2%. The most unexpected result concerns the time of the admixture of lineages M and C that gave rise to the subspecies ligustica. It is estimated at -2,000 years with an approximate credibility interval of (-1,000, -7,000).

Chapter 11 - In the last decade, a near consensus has emerged supporting a single African origin of modern humans. However, the timing of dispersal out of Africa and the routes taken are far from obvious and remain the focus of debate. In the present study, possible dispersal routes taken across Eurasia and finally into the New World and the Pacific were investigated using a craniometric dataset consisting of 34 measurements.
The degree of intra-regional variation shows that sub-Saharan Africans are the most diverse and that the diversity of non-Africans is negatively correlated with geographic distance to East Africa. The relationship between regional variation and geographic distance from sub-Saharan Africa, tested by linear regression analysis, supports a possible dispersal route proposed from research on mtDNA haplotype variation: the Horn of Africa (the route across the Bab el Mandeb Strait) as a passageway in the major human migration out of Africa. The results obtained support, moreover, the multiple migration hypothesis for the peopling of the East/Northeast Asian region, mainly from central/western Asia with a minor contribution from Southeast Asia. Nonlinear regression (exponential approximation) analysis using geographic distance measured along a hypothetical dispersal route shows that phenotypic similarity between populations decreases as the geographic distance increases. Such findings suggest that geographic distance is a primary and significant determinant of not only genetic but also craniometric variation between major human population groups. The present study illustrates that modern human cranial diversity patterns fit an evolutionary model of neutral expectation and a dispersal model of iterative founder effects with an African origin.

Chapter 12 - Intra-specific genetic variation is considered an important factor for evaluating biodiversity; indeed, the higher the genetic variation within a species, the greater its ability to survive. The loss of suitable habitats for moss species involves demographic decreases and genetic impoverishment. Mosses have a short generation time compared to phanerogamic vegetation, particularly trees, and therefore may exhibit all these effects earlier, predicting the destiny of higher plant communities and the ongoing changes in natural landscapes.
Indeed, intra-specific genetic variation in moss species may represent an ideal model system for investigating species fitness in response to natural and man-driven environmental changes, both at a local level and at a large scale. At a local level these studies provide useful information for territory management, since they promptly signal local environmental changes, whereas at a large scale they highlight historical processes which have affected taxon origin, distribution and radiation in relation to the main geological events. Genetic variation and structure within moss species are influenced by reproductive strategy and dispersal, giving information about gene exchange, the occurrence of sexual reproduction, and selfing/outcrossing rates. Demographic constraints, and especially ongoing demographic fluctuations, also help to shape population genetic diversity and structure, revealing phenomena such as the relative importance of the founder effect and the occurrence of bottlenecks and genetic drift. Moss genetic variation may highlight environmental disturbance caused both by natural events and by land use and human pressure. Among disturbances, habitat fragmentation is one of the most studied, due to the increasing loss of suitable habitats for moss species. In general, it can be stated that intraspecific genetic variation in mosses reflects environmental gradients, with a high amount of variation in natural environments versus a low level of variation in threatened environments. The rapid transformation of the environment into a network of patches due to habitat fragmentation, together with increasing environmental disturbance, leads to genetic erosion in isolated populations, with a consequent increase in extinction risk. Thus, intraspecific genetic variation in mosses appears to be a suitable tracer of environmental disturbance, owing to the global ubiquity and fast generation time of these plants.
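The Chapter 7 summary earlier in this preface mentions a presence/absence matrix of ISSR bands and Jaccard's coefficient. The sketch below shows one common way such a coefficient can be computed for pairs of specimens. The specimen labels and band scores are invented for illustration and are not data from the chapter, and the function name is hypothetical.

```python
from itertools import combinations


def jaccard_similarity(bands_a, bands_b):
    """Jaccard coefficient for two 0/1 band profiles of equal length.

    Shared absences (0, 0) are ignored: the coefficient is the number of
    bands present in both profiles divided by the number present in at
    least one of them.
    """
    if len(bands_a) != len(bands_b):
        raise ValueError("band profiles must score the same set of bands")
    shared = sum(1 for a, b in zip(bands_a, bands_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(bands_a, bands_b) if a == 1 or b == 1)
    if either == 0:
        raise ValueError("no bands present in either profile")
    return shared / either


# Hypothetical band scores (1 = band present, 0 = band absent).
specimens = {
    "P01": [1, 1, 0, 1, 0, 1],
    "P02": [1, 0, 0, 1, 1, 1],
    "C01": [0, 1, 1, 1, 0, 0],
}

# Pairwise similarities; 1 - similarity is a distance that a clustering
# method could use to build a dendrogram.
pairwise = {
    (a, b): jaccard_similarity(specimens[a], specimens[b])
    for a, b in combinations(specimens, 2)
}

for (a, b), value in pairwise.items():
    print(f"{a} vs {b}: similarity {value:.3f}, distance {1 - value:.3f}")
```

Ignoring shared absences is what distinguishes the Jaccard coefficient from simple matching, which is often considered appropriate for dominant markers where a missing band is weak evidence of similarity.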


In: Genetic Diversity Editors: C. L. Mahoney and D. A. Springer

ISBN 978-1-60741-176-5 © 2009 Nova Science Publishers, Inc.

Chapter 1

Analysis of Sequence Diversity at Two Mitochondrial Genes on Different Taxonomic Levels. Applicability of DNA Based Distance Data in Genetics of Speciation and Phylogenetics

Y. Ph. Kartavtsev

A. V. Zhirmunsky Institute of Marine Biology of the Far Eastern Branch of the Russian Academy of Sciences, Vladivostok 690041, Russia


Abstract

Algorithms of nucleotide diversity estimates and other measures of genetic divergence for the two genes Cyt-b (cytochrome b) and Co-1 (cytochrome oxidase 1) are analyzed. Based on the theory and algorithms of distance estimates on DNA sequences, as well as on the observed distance values retrieved from literature, it is recommended for realistic tree building to use a specific nucleotide substitution model, chosen from the at least 56 available in Modeltest 3.7 or other software, depending on the specific set of nucleotide sequences. Using a database of p-distances and similar measures gathered from published sources and GenBank (http://www.ncbi.nlm.nih.gov) sequences, genetic divergence of populations (1) and taxa of different rank, such as subspecies, semispecies and/or sibling species (2), species within a genus (3), species from different genera within a family (4), and species from separate families within an order (5) have been compared. Empirical data for 18,192 vertebrate and invertebrate species demonstrate that the data series are realistic and interpretable when p-distance and its various derivatives are used. The focus was on vertebrates, and fish species in particular, and on the newest dataset obtained in the framework of FishBOL (http://www.fishbol.org). Distance data revealed various and increasing levels of genetic divergence of the sequences of the two genes Cyt-b and Co-1 in the five groups compared. Mean unweighted scores of p-distances for the five groups are: Cyt-b (1) 1.46±0.34, (2) 5.35±0.95, (3) 10.46±0.96, (4) 17.99±1.33, (5) 26.36±3.88 and Co-1 (1) 0.72±0.16, (2) 3.78±1.18, (3) 10.87±0.66, (4) 15.00±0.90, (5) 19.97±0.80. The estimates show good correspondence with former analyses. This testifies to the applicability of p-distance for most intraspecies and interspecies comparisons of genetic divergence up to the order level for the two genes compared. As seen from the numbers above, and from a regression analysis, there is no sign of saturation, which would usually be expected from a homoplasy effect. Differences in divergence between the genes themselves at the five hierarchical levels were also found. This conforms to the ample evidence showing different and nonuniform evolution rates of these and other genes and their various regions. The results of the analysis of nucleotide as well as allozyme divergence within species and higher taxa of animals are, firstly, in good agreement with previous results and show the stability of a general trend, and, secondly, suggest that in animals, phyletic evolution is likely to prevail at the molecular level, and speciation mainly corresponds to the geographic model (type D1). The prevalence of the D1 speciation mode does not mean that other modes are absent. There are at least seven possible modes of speciation. How we can recognize them formally with operational genetic criteria is a key question for establishing a quantifiable genetic model (theory) of speciation. An approach is suggested that allows a step forward in this direction. Research was supported by the Russian Foundation for Fundamental Research grants #07-04-00186, #08-04-91200 and the Far Eastern Branch of the Russian Academy of Sciences (RAS) grant #08-3B-06-031, RAS Board Programs, grant #09-1P23-06.

Copyright © 2009. Nova Science Publishers, Incorporated. All rights reserved.

Introduction

Molecular phylogenetics is currently a very active field, stimulated by global initiatives such as the Tree of Life Project (http://tolweb.org/tree/), CBOL (Consortium for Barcoding of Life; http://www.barcoding.si.edu/) and FishBOL (http://www.fishbol.org). Databases have grown very rapidly and many researchers are newcomers to the field. However, analysis of genetic variation and tree building is not a routine task even for those with experience. Many experimental papers, reviews, monographs and software packages (e.g. Kumar et al., 1993; Avise, 1994; Li, 1997; Avise, Wollenberg, 1997; Johns, Avise, 1998; Posada, Grandal, 1998; Nei, Kumar, 2000; Swofford et al., 1996; 2000; Hall, 2001; Hebert et al., 2002a; Felsenstein, 2004; etc.) are available for consultation. Still, little attention has been paid to general recommendations for such newcomers and for experts from other disciplines of genetics and evolutionary biology, although such recommendations are important for general biology and general genetics. Specifically, such a review is required for molecular taxonomic differentiation and the genetics of speciation. Investigation of the molecular divergence of organisms over time must take into account the basic genetic properties of the organisms and of the groups they form in nature as reproductive units, namely populations and biological species. It seems logical to combine the issues of population genetics and molecular evolution to avoid a contradiction between the Biological Species Concept (BSC) and the Phylogenetic Species Concept (PSC), a contradiction that is more apparent than real (Avise, Wollenberg, 1997).


Temporal population genetic dynamics cannot be separated from spatial population dynamics or from an understanding of the bases of intraspecies genetic differentiation. Misled by the vast possibilities of phylogenetic reconstruction from primary DNA sequences, some authors even reject the analysis of spatial divergence altogether, opposing the PSC to the BSC (Cracraft, 1983; DeQuieros, Donoghue, 1988). Fortunately, many geneticists are far from such extreme views and recognize the common nature of many intraspecies and interspecies divergence mechanisms (Altukhov, 1983; Ayala, 1984; Nei, 1987; Avise, Wollenberg, 1997; Avise, 2001). These and some other issues are considered in the current review, which is intended for geneticists of different specialties. Kartavtsev and Lee (2006) considered these issues three years ago, but many new data have appeared since then. This review both updates the earlier information and takes a somewhat different angle on molecular phylogenetics and the genetics of speciation. Further, the English edition of the earlier review was not translated satisfactorily in some places, and the current review attempts to remedy this. Since the author is a marine biologist, many examples are drawn from the literature in this field.

Here, I have mainly summarized and analyzed the evidence on the proportion of nucleotide substitutions in populations within species and in taxa of different ranks, although there have been earlier generalizations on similar topics (Avise, 1994; Li, 1997; Powell, 1997; Johns, Avise, 1998; Graur, Li, 1999). I was motivated by two reasons: first, the rapid growth of information in this field and, second, the fact that estimates for different genes had not been compared, except by Kartavtsev and Lee (2006). In the last decade, the mitochondrial genes for cytochrome b (Cyt-b) and cytochrome oxidase 1 (Co-1) have been the most frequently used for taxonomic and phylogenetic analysis at the species-to-family level.
These genes proved to be useful for estimating divergence in taxa up to the family level in many animal groups (Johns, Avise, 1998; Graur, Li, 1999; Hebert et al., 2002 a, b; Greer et al., 2003; Sazaki et al., 2007; Kartavtsev, Hanzawa, 2007). A survey of the evidence on intraspecies divergence of mitochondrial genes in 256 vertebrate, mostly sexually reproducing, species indicated that 56% of them form distinct intraspecies maternal lines, which are typically confined geographically (Avise, Walker, 1999). Thus, the polytypic nature of most species, or their subdivision into groups, is documented by sources that are independent of other ecological or demographic data and in good agreement with them.

In the present paper I do not consider problems related to the construction and analysis of phylogenetic trees and related phylogenetic issues. This is a specific topic discussed elsewhere (Li, Zarkhih, 1995; Swofford et al., 1996; Avise, 2000; Nei, Kumar, 2000; Hall, 2001; Sanderson, Shaffer, 2002; Felsenstein, 2004). The main objective of this study is to consider the levels of nucleotide diversity in animal populations and in taxa of four different ranks. For convenience, I will refer to these categories as comparison groups. In connection with the main objective, the aims of the review are as follows: (1) comparing statistical algorithms for the analysis of molecular variation and evolution; (2) comparing estimates of nucleotide divergence, or the proportion of nucleotide substitutions in a sampled pair of sequences (p-distance); and (3) briefly summarizing the views on the species in genetic terms and showing whether and how molecular genetic variability and divergence are related to speciation.
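The p-distance named in aim (2) can be illustrated with a minimal sketch. The function name `p_distance`, the handling of gaps (sites with a gap or ambiguous base are skipped) and the two short sequences are assumptions made for illustration only; they are not taken from the database analyzed in this review.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with a gap or an ambiguous base in either
        # sequence are skipped (pairwise deletion).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical 10-bp fragments, invented for illustration only.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"
print(p_distance(seq_1, seq_2))  # 2 of 10 sites differ -> 0.2
```

Multiplying the result by 100 gives the percentage scale on which distances are usually reported.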


1. Material and Methods

The primary nucleotide sequences of genes (hereafter, sequences) are the focus of this paper. The conclusions are mainly based on a database of p-distances for two genes, Cyt-b and Co-1, presented in the table (see Appendix). A considerable part of the genetic distances in this table was either obtained from sequences or taken, directly or indirectly, as the authors' estimates, both for Cyt-b (Johns, Avise, 1998) and for Co-1 (Hebert et al., 2002 a, b). Most sequences were retrieved by the authors of the cited works from GenBank (Release 103.0, 131). For Cyt-b, 2821 gene sequences were examined and for Co-1, 655 and 13320 sequences. Sequence length varied for Cyt-b from 200 bp (Johns, Avise, 1998) to a nearly complete 1200 bp (Hardman, 2004; Kartavtsev et al., 2007 a, b, etc.) and for Co-1 from 619 to 669 bp in different group comparisons (Hebert et al., 2002 a, b; Ward et al., 2005; Kartavtsev et al., 2008; 2009a, b, etc.). In each group compared, the p-distance or its derivative was estimated (see section 2.1). My analysis consisted of computing and comparing the mean values from which the database was formed, which included much data from other sources (see table in the Appendix for references).

The information was retrieved from the literature by means of the following three methods.

(1) If distance matrices were available, the arithmetic means were calculated directly, using each of the pairs once relative to the other units of comparison: e.g., 1-2 and 1-3, but not 2-3, of the three possible pairs. This principle, which avoids the restriction on random choice imposed by the matrix, was also employed earlier (Johns, Avise, 1998). Hebert et al. (2002b) compared all n(n - 1)/2 possible pairs, while in Hebert et al. (2002 a) the comparison principles differed among the taxa compared. I also made all pairwise comparisons where only a few sequences were available or when a choice within the data matrix was complicated.

(2) When distance matrices were not available, I extrapolated the distances from the scores presented on plots and dendrograms (this can be readily accomplished using the scales of the graphs and dendrograms).

(3) In many works, the p-distances between the required comparison groups were presented directly.

Note that virtually all values from Johns and Avise (1998) were computed from plots. This procedure inevitably entails some approximation. However, in view of the very high intragroup (intrataxon) distance variance, these errors were negligible for comparative group analysis. In addition to distances, some other measures of DNA marker variability were examined. The literature was screened using the Thomson Institute for Scientific Information Science Citation Index (SCI) database and other sources. Articles from 1995 through 2008 were examined. Our work also included analysis of, and derivation of analytical expressions for, the statistics used. Since this part of the work is only indirectly related to the examination of observed data on molecular variation, it is only briefly outlined here (section 2.1).

Statistical analysis was performed using the STATISTICA (1994) software package. From this package, we employed the basic module for calculating means and variances, as well as the modules for parametric analysis of variance (ANOVA and its multi-dimensional version, MANOVA) and for Kruskal-Wallis nonparametric ANOVA.
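The two ways of averaging a distance matrix described under method (1) can be contrasted in a short sketch. It follows one reading of the rule above (every unit is compared once with the first unit only, versus all n(n - 1)/2 pairs); the function names, the labels and the three distance values are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from statistics import mean


def reference_pairs_mean(dist, labels):
    """Mean over pairs that include the first unit only: 1-2, 1-3, ..."""
    ref = labels[0]
    return mean([dist[frozenset((ref, other))] for other in labels[1:]])


def all_pairs_mean(dist, labels):
    """Mean over all n(n - 1)/2 unordered pairs."""
    return mean([dist[frozenset(pair)] for pair in combinations(labels, 2)])


# Hypothetical p-distances (%) among three units of comparison.
labels = ["unit1", "unit2", "unit3"]
dist = {
    frozenset(("unit1", "unit2")): 4.0,
    frozenset(("unit1", "unit3")): 6.0,
    frozenset(("unit2", "unit3")): 8.0,
}

print(reference_pairs_mean(dist, labels))  # pairs 1-2 and 1-3 -> 5.0
print(all_pairs_mean(dist, labels))        # all three pairs -> 6.0
```

With only three units the two means already differ, which is why the pairing principle used by each source matters when means from different works are pooled.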


2. Intraspecies and Interspecies DNA Variation


The BSC implies that a species is an isolated reproductive unit. Molecular data, especially those pertaining to mitochondrial DNA (mtDNA), show that, on the one hand, natural hybridization between species may lead to introgression of genes from one gene pool into another. On the other hand, sequences of individual genes show that the variability of DNA markers increases with the rank of the taxon (Johns, Avise, 1998; Hebert et al., 2002a; Ward et al., 2005). Hence, I believe it is expedient to compare the data on nucleotide divergence for several genes, from several data sources, and, in addition, to substantiate both the variability and the distance parameters. The latter is important for understanding the essence of estimating divergence at the DNA level and its connection to speciation. Some complications, as mentioned in the Introduction, may obscure real DNA variability. There may be other hidden factors. In particular, different genes may encode different functional properties of the phenotype (primarily of macromolecules). To compare such genes we have to know some of these properties. For example, protein-coding genes have biased proportions of pyrimidines (T, C) and purines (A, G). Bias in the (T+C) : (A+G) ratio is well described in the literature for some protein-coding genes (e.g. Kim et al., 2004). However, it is frequently stated without statistical substantiation. The analysis presented for Cyt-b in flatfish (Figure 1) confirmed this bias on a firm statistical basis and also showed that taxonomic differences are discernible (Figure 1; Kartavtsev et al., 2007 b).

Figure 1. Plot of the average proportion of the four nucleotides at the Cyt-b gene in flatfish species, order Pleuronectiformes (1-2), in comparison with representatives of Perciformes (3) (from Kartavtsev et al., 2007b). Cyt-b nucleotide content is presented for all three codon positions. Results of one-factor MANOVA are given. Groups 1 to 3: 1, species from the study by Kartavtsev et al. (2007b); 2, species taken from GenBank; 3, GenBank data on Perciformes. The T+C : A+G ratio deviates significantly from 1:1. Significance of the impact of this factor and of the comparison groups is given at the top.
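A deviation of the T+C : A+G ratio from 1:1 can be checked for a single sequence with a chi-square goodness-of-fit test. The sketch below is a simplified stand-in and not the MANOVA reported in Figure 1; the function name and the test sequence are hypothetical.

```python
from collections import Counter


def pyrimidine_purine_chi2(seq: str):
    """Return (pyrimidines, purines, chi-square) for a 1:1 expectation, 1 d.f."""
    counts = Counter(seq.upper())
    pyrimidines = counts["T"] + counts["C"]
    purines = counts["A"] + counts["G"]
    total = pyrimidines + purines
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    expected = total / 2  # equal numbers expected under a 1:1 ratio
    chi2 = ((pyrimidines - expected) ** 2 + (purines - expected) ** 2) / expected
    return pyrimidines, purines, chi2


# Hypothetical sequence: 30 pyrimidines and 10 purines.
seq = "TC" * 15 + "AG" * 5
print(pyrimidine_purine_chi2(seq))  # (30, 10, 10.0)
```

A chi-square value above 3.84 (the 5% critical value for one degree of freedom) indicates a significant departure from 1:1 for that sequence.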


The same bias was substantiated for Cyt-b in catfish (Kartavtsev et al., 2007 a) and for Co-1 in flatfish and two other taxa (Kartavtsev et al., 2008 a, b, c). It is believed that the nucleotide bias reflects the hydrophobic properties of the encoded proteins (Naylor et al., 1996). However, the taxonomic differences, where observed (Figure 1), are more relevant to the evolution of taxa and reflect their separate divergence. Thus, in such cases the bias may affect distance estimates in unexpected ways.

2.1. Polymorphism of DNA Sequences. Nucleotide Diversity

Understanding DNA sequence polymorphism as a result of nucleotide substitution is of primary interest for molecular phylogenetics. Estimating the substitution rate of amino acid sequences is also important, but it is outside the scope of this paper. If the nucleotide sequence for a particular set of loci or alleles in a population sample is known, DNA polymorphism can be assessed in several ways. The best measures of DNA sequence divergence are nucleotide diversity as a per-site measure, π (Nei, 1987), and the proportion of different nucleotide sites in a pair of randomly chosen sequences, the p-distance, denoted P, or its estimate p (Nei, Kumar, 2000, p. 33). Nucleotide diversity is

$$\pi = \sum_{ij} x_i x_j \pi_{ij}, \qquad (2.1)$$

where $x_i$ and $x_j$ are the population frequencies of the ith and jth types of DNA sequences, and $\pi_{ij}$ is the proportion of different nucleotides between the ith and jth types of DNA sequences. In a panmictic population, π is usually referred to as heterozygosity at the nucleotide level. Its estimates can be found either by

$$\hat{\pi} = \frac{n}{n-1} \sum_{ij} \hat{x}_i \hat{x}_j \pi_{ij}$$

or by

$$\hat{\pi} = \sum_{i} \dots$$

[…] 0.20; mean = 11.38±0.91, sample size n = 85, standard deviation SD = 8.43, skewness = 0.80±0.26, kurtosis = 0.33±0.35; Co-1, K-S d = 0.1076, P < 0.20; mean = 9.37±0.68, n = 103, SD = 6.85, skewness = 0.26±0.24, kurtosis = 0.77±0.47. For both genes: K-S d = 0.0915, P < 0.10; mean = 10.28±0.56, n = 188, SD = 7.65, skewness = 0.66±0.18, kurtosis = 0.33±0.35.

A one-way ANOVA (model with random effects for groups of the same size) showed that mean distances in the five groups analyzed were significantly different for the two genes: Cyt-b, F = 32.50, d.f. = 4, 80; P < 0.0001; Co-1, F = 60.81, d.f. = 4, 98; P < 0.0001. Accordingly, pooling the data for the two genes in a two-way MANOVA (see scheme below) showed a statistically significant increase in the p-distances along the hierarchy of the comparison groups: F = 84.90, d.f. = 4, 178, P < 0.000001 (Figure 3, top). The interaction of factors in this data set is not statistically significant: F = 1.966, d.f. = 4, 178, P = 0.1017. However, this pooling is not quite correct for all of the DNA sequences compared, because it includes heterogeneous groups of different size. Consequently, a categorized representation of mean values, with each individual score weighted by sample size (n) for each gene, is more correct (Figure 3, bottom). Nevertheless, both approaches showed that the distance for the two genes increases with rank. Mean unweighted distances for the five groups were as follows: Cyt-b (1) 1.46±0.34, (2) 5.35±0.95, (3) 10.46±0.96, (4) 17.99±1.33, (5) 26.36±3.88; and Co-1 (1) 0.72±0.16, (2) 3.78±1.18, (3) 10.87±0.66, (4) 15.00±0.90, (5) 19.97±0.80 (Appendix).

Taking into account the variation in sample size (n) for each i-th distance measure in the comparison groups (Appendix), we performed a two-way MANOVA with p-distances weighted by n (factor 1, comparison groups: 1, populations within species; 2, sibling species, etc.; 3, morphologically distinct species within genera; 4, genera within a family; and 5, families within an order; factor 2, genes: Cyt-b and Co-1; a model with random effects of factors was again applied) (Figure 3, bottom). In this MANOVA, the effect of factor 1 (i.e., comparison group) was significant: F = 4715.42, d.f.
= 4, 18295; P < 0.000001. The effect of factor 2 (difference in mean p-distance between the two genes) also proved to be significant: F = 15.40, d.f. = 1, 18295; P = 0.00009. The interaction between factors 1 and 2 was significant too: F = 268.63, d.f. = 4, 18295; P < 0.000001. The categorized graph of the distribution of mean weighted p-distance values supported the earlier conclusion that distances increase with the rank of the groups compared. A fit of the bivariate distribution (the taxon rank, "taxa", against the distance score, "distance") showed agreement with the linear regression model, although the factorial impact is moderate, 44-72%: for Cyt-b, taxa = 1.7911+0.1021*distance (t = 96.10, d.f. = 3107, P < 0.0001, R² = 0.7247; rp = 0.77); for Co-1, taxa = 1.4332+0.1471*distance (t = 194.14, d.f. = 15194, P < 0.0001, R² = 0.4383; rp = 0.85). Thus, it is possible to conclude that there is little, if any, impact of saturation at either gene up to the order level in our data set. The lower graph in Figure 3 clearly shows the meaning of the factor interaction: the values of p-distance or its derivatives for these two genes differ among some of the five comparison groups; i.e., the substitution rates are different for Cyt-b and Co-1 in at least some of the groups of animal taxa compared. This conclusion, drawn from an extended data set, confirms the same conclusion made earlier for these two genes (Kartavtsev, Lee, 2006).

The data presented in Figures 2 and 3 demonstrate that both genes show a trend of increasing mean p-distance with increasing rank of the groups compared, from populations to orders. Because of the importance of this conclusion, the data presented in Figures 2 and 3 were additionally tested using nonparametric Kruskal-Wallis ANOVA. In this case, unweighted scores were used to obtain a more conservative estimate. For gene Cyt-b, H = 57.01, d.f. = 4, n = 85, P = 0.0001. For gene Co-1, H = 74.05, d.f. = 4, n = 103, P = 0.0001. Thus, the comparative analysis of the nucleotide sequence data for genes Cyt-b and Co-1, performed for groups of increasing rank for each of the genes separately, demonstrates (with a probability of error P < 0.0001) that in animals genetic divergence increases with taxon rank. Heterogeneity of gene evolution rates, also significant in our data for the two genes (Figure 3), is widely known in the literature (e.g. Li, 1997; Machordom, Macpherson, 2004), as noted previously. Let us take one more look at the detected differences to explain their essence.

The differences in p-distance estimates between the two genes can have the following interpretations. First, the substitution rate may in fact be different in the two genes, although obscured by other factors. For instance, the data on taxonomic groups from the most representative sources (Johns, Avise, 1998; Hebert et al., 2002 a, b), which can differ in divergence level, may be differently represented in our database. Indeed, heterogeneity of K2P values at the Cyt-b gene was found for the vertebrate groups examined: amphibians and reptiles have the highest variability and birds the lowest (Johns, Avise, 1998). Significant heterogeneity of nucleotide diversity was obtained for Co-1 among flatfish genera (Figure 4). Interspecies heterogeneity of nucleotide diversity estimates at Cyt-b can be found even within a single fish genus (Garcia-Machado et al., 2004). Second, in the two most representative works on Co-1, several different measures were used (Hebert et al., 2002 a, b). In addition, instead of K2P and other similar measures (expected distance), the uncorrected p-distance (observed distance) was employed in many studies.
In general, a shortcoming of the analysis of such a data array is the high biological heterogeneity of the material and the presence of some unknown or unidentifiable components in the estimates (some of them were mentioned above). For instance, p-distances and other distance measures can represent one and the same comparison group differently. However, unweighted distances in the most numerous comparison group (morphologically distinct species within a genus) did not differ statistically significantly between two groups of measures, (1) p-distance and (2) other distance estimates (K2P, GTR, TrN, etc.). The results of ANOVA based on the data table (see Appendix) were as follows: Cyt-b, F = 0.18; d.f. = 1, 30; P < 0.6707; Co-1, F = 0.52; d.f. = 1, 41; P < 0.8197. For both genes: F = 0.28; d.f. = 1, 73; P = 0.5981. However, as noted earlier, the unmodified p-distance must be affected by homoplasy sooner, i.e., be smaller than the expected values of K2P, GTR, TrN, etc. (see section 2.1). The differences between these groups are also non-significant when n is used as a covariate in ANOVA of the distance scores. However, the differences between the groups are significant if the distance scores are weighted by n: Cyt-b, F = 231.38; d.f. = 1, 943; P < 0.01; Co-1, F = 207.60; d.f. = 1, 13888; P < 0.01. The latter differences are apparently caused by unequal representation of taxa in the compared groups and by their different sample sizes. This effect is still obscure; e.g., no correlation was detected between the distance score and n: Spearman's correlation coefficient, rs = 0.1125, P = 0.1252. For the Cyt-b gene, all groups consist almost exclusively of vertebrates, which may on average differ in p-distances from the invertebrates that were mostly tested at Co-1 (see Appendix). For the Co-1 gene, 50% of group number 3 consists of the variable insects (see Appendix, Lepidoptera, Arthropoda).
Note, however, that the differences in mean p-distance between these two groups of distance measures run in different directions for the two genes. So, this source of variation should not matter much.
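The relation between the observed p-distance and a corrected (expected) distance can be illustrated with the Kimura two-parameter (K2P) formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The sketch below is only an illustration: the function name and the two 20-bp sequences are hypothetical, and gaps or ambiguous sites are simply skipped.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def p_and_k2p(seq_a: str, seq_b: str):
    """Return (p-distance, K2P distance) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguous bases are ignored
        n += 1
        if a == b:
            continue
        if frozenset((a, b)) in TRANSITIONS:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    P, Q = ts / n, tv / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    k2p = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return P + Q, k2p


# Hypothetical 20-bp fragments: two transitions and one transversion.
seq_1 = "A" * 10 + "C" * 10
seq_2 = "GG" + "A" * 9 + "C" * 9
p, k2p = p_and_k2p(seq_1, seq_2)
print(round(p, 4), round(k2p, 4))  # roughly 0.15 and 0.17
```

The corrected value exceeds the observed one, and the gap widens as divergence grows, which is the behaviour referred to above when p-distance is compared with K2P and similar measures.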


Figure 4. Plot of p-distances for Co-1 gene sequence data within flatfish genera (after Kartavtsev et al., 2008a). The x-axis (Genera) shows three flatfish groups: 1, Pseudopleuronectes; 2, Verasper + Hippoglossoides; 3, Cynoglossus. The y-axis shows p-distance scores for intragroup comparisons.

3. Biological Species: Genetic Variability, Divergence and Introduction of an Operational Criterion for Delimiting a Speciation Mode in Genetic Terms


In this part of the review, we briefly present a concept of the species (section 3.1), compare molecular genetic and biochemical genetics data (section 3.2), and draw conclusions from this evidence (section 3.3).

3.1. Species Examined

Let us clarify what is usually considered a species in most studies. According to the BSC, the definition of the species is as follows. “A species is a biological group, consisting of one or several crossbreeding individuals that are reproductively isolated from other such groups, are stable in nature, and occupy a particular area”. This definition was given earlier by the author of the present article (Kartavtsev, 2005), but it is very close to the one given in the monograph by Timofeev-Ressovsky, Vorontsov, and Yablokov (1977). In principle, this is a definition typical of the BSC. For instance, one of the BSC definitions is formulated by Mayr as follows: "A species is a reproductive community of individuals (reproductively isolated from others), occupying in nature a certain habitat" (Mayr, 1982, p. 273). In the text below, we will take this definition as a basis for discussing the BSC (which is largely limited to higher bisexual organisms) (Mayr, 1963; Timofeev-Ressovsky et al., 1977; Templeton, 1998). Since the BSC is the concept closest to population genetic theory, it seems convenient to use it as the basis of the discussion, despite the above limitation. Several other species concepts, with their attendant advantages and restrictions, have been critically analyzed (Krasilov, 1977; King, 1993; Altukhov, 1997; Templeton, 1998). Conceptual analysis of the BSC and its opposition to the typological species concept were provided by Altukhov (1974; 1983; 1997). Most authors, in spite of criticisms, accept the BSC as the main modern paradigm. We confine ourselves to listing the existing concepts of the species: (1) Linnaean species; (2) BSC; (3) BSC modified by Mayr (1963); (4) BSC, modification II (Mayr, 1982); (5) concept of species recognition (Paterson, 1978; 1985); (6) concept of species cohesion (Templeton, 1998); (7) evolutionary concept of the species; (8) Simpson's evolutionary concept of the species (Simpson, 1961); (9) Wiley's evolutionary concept of the species (Wiley, 1978); (10) ecological concept of the species (Van Valen, 1976); (11) phylogenetic concept of the species (Cracraft, 1983), and others (see, e.g., Howard, 1998; DeQuieros, 1998).


3.2. Brief Analysis of Biochemical Genetics Data and Their Comparison to Nucleotide Divergence

Let us briefly consider the evidence on the variability of structural protein-coding genes as assessed by electrophoretic allozyme analysis. Although such data are no longer very popular, they give quite a representative view of variability for nuclear genes. Mean heterozygosity per individual (locus) has been recognized as the best measure of variability (Lewontin, 1978; Zhivotovsky, 1983; Nei, 1987). Many statistics have been used to measure taxon divergence during evolution (Nei, 1975; Zhivotovsky, 1983; Pasekov, 1983), but the most popular among them are Nei's standard distance, Dn, and the inverse measure, normalized similarity I (Nei, 1972). To assess differentiation at the intraspecies level, minimal distance and standardized variance of allele frequencies are more convenient (these data are not considered in this review, as they have been analyzed previously; e.g. Altukhov, 1983; 1989; 1999; DeWoody, Avise, 2000). Examination of the genetic diversity of natural species requires analysis of both heterozygosity (diversity) and distances (differences), which assess different aspects of variability; this is not always taken into account. Heterozygosity (and its equivalent, nucleotide diversity) estimates are weighted variabilities of individuals in a population (species), while distance/similarity measures are the pairwise differences between populations (species) at marker genes or molecular sequences. Note, however, that p-distance and π can be used both as measures of variability and as measures of distance. Comparing individuals from one or several populations of a species, one can estimate intraspecies diversity (heterozygosity), while comparison of individuals of different species provides an estimate of their divergence (distance).

Brief results of comparing H and I. Mean heterozygosity per individual, H, varies widely in plant and animal taxa.
The total mean H = 0.076; in vertebrates, H = 0.054; in invertebrates, H = 0.100 (Nevo et al., 1984). A number of other surveys give similar data (Aronshtam et al., 1977; Hedgecock, Nelson, 1981; Nei, Koehn, 1983; Hedgecock, 1986; Ward et al., 1992; Kartavtsev, 2005, etc.). The H value underestimates the actual genetic diversity by approximately one-third, owing to technical restrictions of protein electrophoresis, which is commonly used to estimate variability at that level (Nei, 1975; 1987; Altukhov, 1989; 1999, and others). Coefficients of genetic distance or similarity at enzyme loci show, on a comparable scale, genetic divergence in taxa of various ranks, from subspecies to families (Nei, 1975; 1987; Lewontin, 1978). Comparison of higher-rank taxa at this level is hindered by the high probability of synonymous substitutions, which increases the nonlinearity of the relationship between genetic similarity (distance) and divergence time (Lewontin, 1978; Nei, 1987). Coefficients of intraspecies genetic similarity or difference have been estimated in many groups of animals. The mean genetic similarity at this level is I = 0.95 (Lewontin, 1978; Nei, 1987; Altukhov, 1983; 1989; Kartavtsev, 2005). According to our database, which comprises more than 300 populations of 80 animal species, I = 0.94±0.01 (Kartavtsev, Lee, 2006). In the hierarchy of animal taxa, subspecies have coefficients of similarity I ranging from 0.6 to 1.0, with a mode of approximately 0.9; the variation range is I = 0.5 – 1.0 (mode about 0.8) for semispecies and sibling species; the variation range is 0.5 – 1.0 (mode about 0.7) for species within a genus and 0.0 – 1.0 (mode 0.4) for genera within a family (Avise, Aquadro, 1982; Nei, 1987; Thorpe, 1983; Kartavtsev, 2005; Kartavtsev, Lee, 2006). This means that genetic similarity significantly decreases with increasing rank of the taxon and, conversely, distance increases with increasing taxon rank (Kartavtsev, 2005; Kartavtsev, Lee, 2006).
Thus, the current molecular genetic evidence (section 2.2) and the results of the analysis of protein marker genes support, first, the basic BSC idea that taxon formation necessarily requires isolation of gene pools and, second, that the geographic (divergent) speciation mode prevails in nature, implying gradual accumulation of small genetic differences after separation of gene pools. Yet there are facts that warn against simplified conclusions on modes of speciation. For instance, it has long been known that the genetic "weight" of the species, say, on the Dn scale, may be different for different animal taxa. For example, Dn is on average 1.1 in amphibians, which is an order of magnitude higher than the corresponding value in birds (Dn = 0.1) (Avise, Aquadro, 1982). Other examples of this trend can be found (Avise, 1994). The range of nucleotide diversity also shows that some animal taxa display a high divergence level among species, while others are characterized by a very low value of this measure. As already noted above, avian taxa are substantially less differentiated at Cyt-b than amphibians and reptiles (Johns, Avise, 1998; see also table in Appendix). For three main geographic phyletic groups of Oryzias latipes, the nucleotide diversity of Cyt-b was found to be comparable to within-genus divergence: p = 11.3 – 11.8% (Takehana et al., 2003). For the other gene, Co-1, species within genera of Cnidaria have p = 1%, while in crustaceans p = 15.4% (Hebert et al., 2002b). The difference among flatfish genera at Co-1 sequences was shown above (see Figure 4). The data in the table (Appendix) allow heterogeneity among animal taxa to be assessed at the genus level for both Cyt-b and Co-1. One-way parametric ANOVA and K-W ANOVA supported this conclusion: for Cyt-b, F = 265.08, d.f. = 3, 10654, P < 0.01; K-W H = 10.87, d.f. = 3, n = 32, P = 0.01, and for Co-1, F = 196.91, d.f. = 3, 13886, P < 0.01; K-W H = 12.11, d.f. = 3, n = 43, P = 0.007.
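The allozyme measures used in this section can be sketched as follows: expected heterozygosity per locus is 1 minus the sum of squared allele frequencies, Nei's (1972) normalized identity is I = Jxy / sqrt(Jx Jy), and the standard distance is Dn = -ln I. The function names and the allele frequencies below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def expected_heterozygosity(loci):
    """Mean over loci of 1 - sum(p_i^2); each locus maps allele -> frequency."""
    per_locus = [1.0 - sum(f ** 2 for f in locus.values()) for locus in loci]
    return sum(per_locus) / len(per_locus)


def nei_identity(pop_x, pop_y):
    """Nei's (1972) normalized identity I = Jxy / sqrt(Jx * Jy)."""
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(pop_x, pop_y):
        alleles = set(locus_x) | set(locus_y)
        jx += sum(locus_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(locus_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(locus_x.get(a, 0.0) * locus_y.get(a, 0.0) for a in alleles)
    return jxy / math.sqrt(jx * jy)


def nei_distance(pop_x, pop_y):
    """Nei's standard distance Dn = -ln(I); infinite if no alleles are shared."""
    identity = nei_identity(pop_x, pop_y)
    return math.inf if identity == 0 else -math.log(identity)


# Hypothetical two-locus data: locus 1 is fixed for the same allele in both
# populations, locus 2 is fixed for different alleles.
pop_1 = [{"a": 1.0}, {"a": 1.0}]
pop_2 = [{"a": 1.0}, {"b": 1.0}]
print(nei_identity(pop_1, pop_2))   # 0.5
print(nei_distance(pop_1, pop_2))   # ln 2, about 0.693
print(expected_heterozygosity([{"a": 0.5, "b": 0.5}, {"a": 1.0}]))  # 0.25
```

On this scale, the intraspecies mean of I = 0.95 quoted above corresponds to Dn of about 0.05.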


Some studies show that natural selection must be invoked to explain the joint variation of H and environmental variability (Nevo et al., 1984) and the association of individual heterozygosity at enzyme genes (Ho) with physiological, morphological, and other components of phenotypic variation along population – environment gradients (Nei, Koehn, 1983; Aronshtam et al., 1977; Koehn, 1978; Zouros et al., 1980; Ayala, 1981; Hedgecock, Nelson, 1981; Avise, Aquadro, 1982; Koehn et al., 1983; 1988; Hedgecock, 1986; Zouros, 1987; Zouros, Foltz, 1987; Powers, 1987; Altukhov, 1989; 1999; Kartavtsev, 1992; Takehana et al., 2003; Kartavtsev, Svinyna, 2003). The data on genetic similarity may be interpreted in the same way. For instance, the frequencies of genetic similarity coefficients for enzyme loci, estimated for various species, follow a U-shaped distribution, whereas neutrality implies a reverse association with the expected differentiation (Ayala, 1975), i.e., a distribution close to normal. Nevertheless, a nearly normal distribution of similarity coefficients has been found for some protein loci, e.g., the duplicated hemoglobin loci of salmonid fishes (Kartavtsev, 2005, Figure 8.3.5). In general, the observed temporal differentiation at many loci is consistent with the neutral process of drift (King, Jukes, 1969; Kimura, 1969; Kimura, 1983; Nei, 1987; Ohta, Gillespie, 1996). On the other hand, as stressed at the end of section 2.1, the role of natural selection in determining the molecular diversity of various genes and their different regions has been conclusively demonstrated. Thus, the early expectations of predominantly selective neutrality of variation in DNA sequences and other markers, including mitochondrial DNA markers, have not been supported by observations. The problems of selectivity/neutrality of mtDNA markers have been considered in special reviews (Ohta, Gillespie, 1996; Rand, Cann, 1998; Gerber et al., 2001).
In particular, it was pointed out that assessments of genotype expression in different nuclear backgrounds in many cases reveal differential fitness caused by co-evolution. Experimental manipulations also showed that particular haplotypes are selectively advantageous (Gerber et al., 2001). However, the data are generally complicated and ambiguous. First, as has been known since the early studies by Mukai and coauthors (1980), it is virtually impossible to assess experimentally the weak effects of molecular markers on fitness; second, there is a complex of factors that may disrupt stochastic processes, but these factors are not necessarily adaptive. In particular, the gene bank data show that half of the species pairs examined do not deviate substantially from neutrality expectations, while the other half exhibit a significant excess of amino acid polymorphism in structural genes (Rand, Cann, 1998). Gillespie (2001) has offered his view on the balance of stochastic and selective processes, expressed as the genetic draft model. Some novel ideas on using molecular data to prove the role of natural selection (Plotkin et al., 2004) have received strong criticism (Hahn et al., 2005; Nielsen, Hubisz, 2005).

3.3. Applicability of Molecular Evolution Data to Speciation Genetics
It is of interest to determine whether the evidence obtained is relevant to the genetic aspects of speciation. As shown in the previous sections of this review, genetic differences are acquired gradually in isolated populations or groups of populations once these have formed. Divergence then proceeds further, differentiating semispecies and sibling species, genera, and so on. The data presented on the nucleotide sequences of the genes Cyt-b and Co-1 and on protein markers conclusively demonstrate that this process operates up to the order level (see Figures 2 – 3), and other molecular markers provide good evidence that phyletic evolution is the main process of divergence for higher-rank taxa as well (Nei, 1987; Li, 1997). We cannot cover all aspects of speciation in a short paper. This issue has been addressed to different extents by a number of authors (Templeton, 1981; Ayala, Fitch, 1997; Avise, Walker, 1999). I will present my own views on these processes. It is important to emphasize that evolutionary genetics lacks a speciation theory in the strict scientific sense, that is, a formal analytic model that allows future events to be predicted. In a particular case, such a model must predict the formation of a species or at least distinguish different speciation modes on the basis of quantitative parameters and their empirical estimates. The attempts made in this direction (Templeton, 1981; 1998; DeQuieros, 1998) do not meet these requirements. As a first step, a scheme and an algorithmic approach have been developed (Kartavtsev, 2000; Kartavtsev et al., 2002; Kartavtsev, 2005) to distinguish speciation modes (models) on the basis of key population genetic parameters and their estimates available in the literature. This approach, which I will call the operation-and-genetic approach for delimiting speciation modes, may lay the foundation for a future genetic theory of speciation. The descriptions by Templeton (1981) were used as a basis for the evolutionary genetic concept of speciation. As a result, a classification scheme for seven known modes of speciation was developed (Kartavtsev et al., 2002; Kartavtsev, 2005). Here, I present for illustration the main elements of this scheme for types D1 – D3 (divergent speciation) and T1 – T4 (transformative, or transilience, speciation) (Figure 5).
This approach leads to a relatively simple experimental scheme, which allows us to (1) organize further investigation of speciation in various groups of organisms on the basis of a focused genetic approach and (2) obtain analytic expressions (equations) for each of the speciation modes (Figure 6). Using the proposed scheme (Kartavtsev et al., 2002; Kartavtsev, 2005, Figure 7.4.1), one can determine the conditions required for speciation (necessary conditions) and those sufficient for the formation of a species (sufficient, or adequacy, conditions). Importantly, in addition to the general definition of the sufficient conditions, four experimentally measured descriptors (1 – 4) are introduced (their number can be increased if necessary) to clarify how, and in what form, these conditions are manifested in a particular case of speciation or in a potential model. For instance, the divergent type of speciation D1 describes classic geographic (allopatric) speciation (see Figure 5). According to the BSC, this model implies that large populations become isolated (gene flow is disrupted) and evolve separately, accumulating mutations, while reproductive isolating barriers (RIBs) arise through pleiotropic effects. The longer the time elapsed since the isolation event, the greater the distances between the corresponding taxa. Accordingly, in my notation a descriptor is introduced: (1) DT > DS (where the subscripts T and S indicate genetic distances in the putative parental taxon and among conspecific populations, or at the higher and lower levels of the taxonomic hierarchy in an in statu nascendi situation).

DIVERGENCE SM: D1. ADAPTIVE; D2. CLINAL; D3. HABITAT

Necessary Conditions for Speciation
D1. a) Erection of extrinsic Reproductive Isolating Barriers (RIB) followed by a break in gene flow; b) Pleiotropic origin of RIB over a long time
D2. a) Selection on a cline with isolation by distance; b) Pleiotropic origin of RIB
D3. a) Selection over multiple habitats with no isolation by distance; b) Origin of RIB by disruptive selection at genes determining behavior

Sufficient Conditions for Speciation, with experimentally measurable features and possible descriptors for the model (theory), (S)
(1) Lack of efficient hybridization in the zone of contact. Descriptors: 1. DT > DS; 2. ED = EP; 3. HD = HP; 4. TM–
(2) Lack of efficient hybridization outside the zone of contact. Descriptors: 1. DT > DS; 2. ED EP; 3. HD = HP; 4. TM–
(3) Lack of efficient hybridization inside and outside the zone of contact. Descriptors: 1. DT = DS; 2. ED EP; 3. HD ≤ HP; 4. TM+

TRANSILIENCE SM: T1. GENETIC; T2. CHROMOSOMAL; T3. HYBRIDOGENIC 1; T4. HYBRIDOGENIC 2

Necessary Conditions for Speciation
T1. a) Founder event causing a rapid shift in a previously stable genetic system; b) RIB origin as a byproduct of one or a small number of gene substitutions
T2. a) Inbreeding and drift causing fixation of strongly underdominant chromosomal mutations; b) RIB origin as a cause of hybrid dysgenesis
T3. a) Hybridization of incompatible parental species followed by selection for maintenance of the hybrid state; b) RIB origin as a cause of hybrid dysgenesis
T4. a) Hybridization of incompatible parental species followed by inbreeding and selection for a stabilized recombinant; b) RIB origin as a cause of hybrid dysgenesis

Sufficient Conditions for Speciation, with experimentally measurable features and possible descriptors for the model (theory), (S)
Lack of efficient hybridization inside and outside the zone of contact. Descriptor sets, in the order shown in the scheme (labels 5, 6, 7):
1. DT = DD; 2. ED = EP; 3. HD HP; 4. TM–
1. DT > DD; 2. ED EP; 3. HD > HP; 4. TM–
1. DT > DS; 2. ED EP; 3. HD < HP; 4. TM–

D, Genetic distance at structural genes: DT, in suggested parent taxa; DS, among conspecific demes; DD, among subspecies or sibling species; HD, Mean heterozygosity/diversity in suggested daughter population; HP, Mean heterozygosity/diversity in suggested parent population; EP, Divergence in regulatory genes among suggested parent taxa; ED, Divergence in regulatory genes among suggested daughter taxa; TM+, Test for modification (positive); TM–, Test for modification (negative); RIB, Reproductive Isolation Barriers.

Figure 5. Schematic representation of the divergent speciation type (ST), based on population genetic principles (From Kartavtsev, 2005 with modifications). D1–D3, divergent speciation modes; T1–T4, transformative (transilience) speciation modes.


Figure 6. Analytic representation of seven speciation modes (From Kartavtsev, 2005 with modifications). D1–D3, divergent speciation modes; T1–T4, transformative (transilience) speciation modes. Descriptors: D, genetic distances for structural genes; DT: in putative parental taxon; DS: among conspecific demes; DD: among subspecies or sibling species; HD: mean heterozygosity/diversity in putative daughter population; HP: mean heterozygosity/diversity in putative parental population; EP: divergence at regulatory genes in putative parental taxon; ED: divergence at regulatory genes in putative daughter taxon; TM+: test for modification (positive); TM–: test for modification (negative).

Likewise, since under the D1 mode no significant differences in genetic diversity appear in either the structural genes or the regulatory part of the genome (because the initial and derived taxa are large), we introduce the descriptors (2) ED = EP and (3) HD = HP (differences in gene expression and in heterozygosity/diversity between the daughter and the parental taxon are absent). Finally, in some types of speciation, not only variability and genetic distances are important but also quantitative loci (polygenes), which cannot be distinguished at the molecular level but lead to RIB formation. Hence, we introduce TM (TM+ vs TM–; an experimental test for modification), which also makes it possible to distinguish between epigenetic variation and taxonomic differences.
Do all these data imply that speciation always corresponds to the D1 type? Apparently not. Here is an example. Two forms of trout (Salmo trutta) were known in a Swedish mountain lake, and it was unclear whether their gene pools were isolated. A genetic examination (Ryman et al., 1979) revealed two different fixed alleles in these forms, which unambiguously proved total reproductive isolation of these sympatric trout forms. The gene pools of these taxa were found to differ at five of the seven polymorphic loci examined (Ryman et al., 1979). There are other examples of bursts of fish evolution documented by molecular markers (Rutaisire et al., 2004; Duftner et al., 2005). These and other data, for instance from our database of similarity coefficients, indicate that sometimes very small differences in structural genes may result in the appearance of RIBs (and thus of reproductively isolated biological entities). In the case of the trout mentioned above, the genetic distance between the two forms is Dn = 0.02 (Ryman et al., 1979), which corresponds to the level of intraspecies genetic differentiation. There are many other examples for salmonid fishes (Kartavtsev, 2005) supporting the view that in these fishes small changes can generate biological species within a short period of time. This evidence also suggests alternative speciation modes, such as the transformative (T1) or other types (Figures 5 – 6), though in general the D1 speciation mode prevails in this group.
Thus, we can now accept that speciation does not necessarily involve large changes in structural genes; these changes can be very small (at the level typical of populations within a species). Conversely, in some cases of speciation we can expect substantial rearrangements of regulatory genes (Wilson, 1976) or chromosomal or other reorganizations of the genome. Data on regulatory changes during speciation are scarce in the literature, because exact investigation of regulatory shifts or changes in gene expression is very labor-intensive. Moreover, the classification of genes into structural and regulatory ones is rather arbitrary (Wilson, 1976; Klug, Cummings, 2002). However, even without precise estimates of differences in expression, approximate comparisons can provide very valuable information for speciation studies. In particular, considerable regulatory differences (in the expression level of enzyme genes) were found between two sibling char species, in which up to 32% of loci diverged in this respect, whereas the distance was Dn = 0.08, i.e., nearly at the level characteristic of populations within a species (Kartavtsev et al., 1983). Similar results were obtained for a group of species in statu nascendi in the family of whitefish and graylings in Lake Baikal. In this case, the genetic differences Dn between several fish forms ranged from 0.01 to 0.03, whereas the divergence in expression level reached 9 – 27% (Kartavtsev, Mamontov, 1983).
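The descriptor logic can be illustrated for the one mode whose descriptor set is fully stated in the text and in Figure 5, namely D1 (DT > DS, ED = EP, HD = HP, TM–). The sketch below is only an illustration under stated assumptions: the class and function names are hypothetical, "=" is read as "differs by no more than a tolerance" (in practice this would be a statistical test), and all numerical values are invented rather than taken from the studies cited.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    d_t: float         # DT: genetic distance in the putative parental taxon
    d_s: float         # DS: genetic distance among conspecific demes
    e_d: float         # ED: regulatory divergence, putative daughter taxon
    e_p: float         # EP: regulatory divergence, putative parental taxon
    h_d: float         # HD: mean heterozygosity, putative daughter population
    h_p: float         # HP: mean heterozygosity, putative parental population
    tm_positive: bool  # True for TM+, False for TM-


def matches_d1(x, tol=0.05):
    """Check the D1 descriptor set: DT > DS, ED = EP, HD = HP, TM-.

    The tolerance `tol` is an assumption standing in for a proper test
    of "no significant difference".
    """
    return (
        x.d_t - x.d_s > tol
        and abs(x.e_d - x.e_p) <= tol
        and abs(x.h_d - x.h_p) <= tol
        and not x.tm_positive
    )


# Invented values: a case resembling gradual geographic divergence ...
gradual = Descriptors(d_t=0.30, d_s=0.05, e_d=0.10, e_p=0.10,
                      h_d=0.07, h_p=0.07, tm_positive=False)
# ... and a case with small structural distance but a large regulatory shift.
regulatory_shift = Descriptors(d_t=0.08, d_s=0.05, e_d=0.32, e_p=0.05,
                               h_d=0.07, h_p=0.07, tm_positive=False)

print(matches_d1(gradual), matches_d1(regulatory_shift))
```

A case of the second kind fails the D1 check on both the distance and the regulatory descriptor, which is the sort of outcome that would direct attention to the other modes of the scheme.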
These and other similar data (Ferris, Whitt, 1978; 1979; Laurie-Ahlberg, 1982; Kartavtsev et al., 2002) suggest that a correct judgment on the mode of speciation (and on the critical features of a species from the genetic viewpoint) should be based not only on distances but also on heterozygosity (diversity) and the variability of other genomic elements, and should include other operational criteria (such as the TM descriptor suggested above, which tests for modification, and others). Evolutionary genetics has developed in several directions. I will touch on only a few that are close to the topic of this paper. For instance, a method of distance scaling along phyletic lines was suggested by Avise and Walker (1998). It was designed to normalize the weights of taxa, with the unification of systematics as the expected outcome. Estimating the cohesion of gene trees was suggested by Templeton (2001) as a way to decide on species boundaries. This second approach includes the notion of genetic exchangeability and/or ecological interchangeability among lineages belonging to the same species (Templeton, 2001). Both approaches are operational for species delimitation, but it seems that these techniques will hardly solve the above-mentioned problem of the “rigidity” of species and species boundaries without a formalization of the species notion. Some authors have reached similar conclusions on the basis of independent analysis of different characters and approaches for species delimitation (Ferguson, 2002; Wiens, Penkrot, 2002; Sites, Marshall, 2004). In particular, the latter authors emphasize the diffuse nature of the species concept and of species boundaries and, consequently, the necessity and applicability of several sets of operational criteria in a multiple approach to species identification (Sites, Marshall, 2004). This is also emphasized in the approach suggested here (Figures 5 – 6). The scheme presented in the current paper was originally designed to define a speciation mode.
However, it also contains the logical criteria for deciding whether species have or have not yet originated. Thus, this approach is quite suitable for species delimitation as a complex and empirically operational approach. It shares a weakness with all current methods, both non-tree-based and tree-based (Sites, Marshall, 2004): in some cases, the approach will require researchers to make qualitative judgments, because there is an infinite number of ways for species to originate. Potentially, the approach being developed is close to Population Aggregation Analysis (PAA) in Davis’s (1999) version, because it is based on population parameters such as DT, HD, etc. (see Figures 5 – 6). However, this PAA1 approach could easily be converted to the PAA2 mode as defined by Brower (1999) (his notation; see also Sites, Marshall, 2004) and may even have properties of a tree-based method (see below). As in PAA2, it is suggested to use not only genotypic scores (character states) but also other suitable descriptors (qualitative and quantitative traits: QT, QTL, etc.). These could be represented as per-individual sets of records or as vector scores for a multidimensional analysis (canonical analysis, PCA, PAA, etc.) with the aims of (a) testing the null hypothesis (H1) that the vectors form no clusters and, if it is rejected, testing the alternative hypothesis H2 of discrimination among the clusters and reaching a decision within the logical framework suggested (Figure 5), and (b) determining whether the vectors show genetic (= phylogenetic) unity, both as a distance value and as a coalescent signature, again by testing H1 and H2. To obtain a phyletic signal, it will be necessary to develop new descriptors for the Figure 5 scheme and introduce them into the set of equations D1 – T4 (and others, when developed) in Figure 6. Such special descriptors, for example the branch length or the parsimony outcomes for the current OTUs (Operational Taxonomic Units) in a strict consensus tree built from several gene sequences, could serve as operational criteria among others.
The approach is basically empirical but differs from others of this kind, reviewed for instance by Sites, Marshall (2004), in having (1) a basis in general and population genetics theory and (2) a formalization as equations of set theory. Such an approach has its own limitations and advantages. One limitation is that it is restricted to sexually reproducing species, for which the basic population genetic principles are more or less clear. The other limitation is that generalizations (deductions) are only possible within the framework of the genetic terms defined. But the individuals comprising species are phenotypes. Thus, genotype/phenotype correspondence should be defined in an appropriate form, and genotype-by-environment interaction or ecological interchangeability should also be introduced in some way. An advantage is that this approach is broader than many others suggested for species delimitation (see Sites, Marshall, 2004) in its ability to define different speciation modes (or to take into account differences in species types). Also, by weighting the members of the equations in a specific way, it is possible to develop the approach further as a framework for a future genetic theory of speciation.
To conclude this paper, I have to discuss two complications observed when comparing p-distance data. The first is the possible contradiction between gradual species formation, as evidenced by the increase of p-distance with increasing taxonomic rank, and data on environmentally caused flux in species number (Bernatchez, Wilson, 1998; Ruber, Zardoya, 2005). The second comes from recent observations on the impact of bifurcations hidden in molecular phylogenetic trees on distances (Pagel et al., 2006). As to the first, the contradiction seems more apparent than real, because environmental shifts may stimulate both an increase in species number and a decrease in genetic distance (through the reduced time available for substitutions to accumulate when the time for species origin shortens). In such circumstances D1 may not prevail, but perhaps the D2, T3 or T4 modes do (see Figures 5 – 6). That trend should create differences in mean distances between the taxa that undergo such speciation modes and those that do not, and this may be a reason for the observed heterogeneity of distance scores among taxa of the same rank. However, the tendency for genetic distances to grow with time since gene pool separation is also an innate property of modes D2, T3, and T4 (see Figures 5 – 6). That is why, over a long time span, genetic distance will increase as taxonomic rank increases, especially bearing in mind the prevalence of the D1 mode. Also, averaging distance scores across numerous taxa should “align” the proportional gradual dependence.
In considering the second complication, I analyzed my own data on p-distances and estimated whether the distance scores are indeed correlated with the branch number or OTU number in a tree. A statistically significant positive correlation was obtained for Cyt-b and Co-1 data that included sequences of flatfish and catfish genera: rs = 0.54, t = 2.51, k = 17, OTU number = 530, P = 0.0241. However, regression analysis showed that the impact of this factor is insignificant here (P = 0.4802), despite a significant intercept (P < 0.001). Less directly, the analysis of the correlation between distance score and species number, n (not OTU number), given earlier (section 2.2, last paragraph) showed the same weak impact, if any, in agreement with earlier observations (Avise, Ayala, 1979; Kartavtsev et al., 1984).
I am aware that all the facts and interpretations provided here present only one angle of view on molecular genetic data with respect to evolution and species origin. I have omitted such events as horizontal transfer through mobile elements, chromosome change, gene and genome duplications, deletions/insertions, organelle vs organism commensalism and others, and their impact. Other views are possible if different markers or time spans are considered.
More drastic effects of transformative evolution may become evident in this case. Anyway, quantitative data analyses of any kind are always welcome and here I take one step in this direction applying a statistical analysis and some formal genetic notations together with the equations of set theory.
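The rank correlation reported above (rs = 0.54, t = 2.51, k = 17) can be reproduced in outline. The sketch below assumes that the significance test was the usual t approximation for a correlation coefficient, t = rs * sqrt((k - 2) / (1 - rs^2)) with k - 2 degrees of freedom; the text does not state this explicitly. The function names and the example scores are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for pos in range(i, j):
            ranks[order[pos]] = mean_rank
        i = j
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def t_from_r(r, k):
    """t statistic for a correlation r estimated from k pairs (k - 2 d.f.)."""
    return r * math.sqrt((k - 2) / (1 - r * r))


# Hypothetical distance scores and OTU numbers for five trees.
distances = [5.3, 17.5, 16.4, 11.7, 22.6]
otu_numbers = [29, 34, 20, 31, 40]
print(spearman_rs(distances, otu_numbers))

# Rounded values from the text: rs = 0.54, k = 17.
print(t_from_r(0.54, 17))
```

With the rounded rs = 0.54 the formula gives a t value of about 2.48, close to the reported 2.51; the small difference is what one would expect if the published t was computed from the unrounded coefficient.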


Conclusion
(1) The theory and the algorithms for calculating genetic distances from nucleotide DNA sequences suggest that a suitable model should be carefully selected for the analysis of empirical data. However, the data observed for nearly 20000 species confirm the realistic character and interpretability of the data sets analyzed for p-distance or its derivatives. This testifies that this measure can be used for most interspecies and intraspecies comparisons of genetic divergence up to the order level.
(2) The data on p-distances show different levels of genetic divergence of the sequences of the compared Cyt-b and Co-1 genes in the five comparison groups examined. Differences between the genes themselves were also found. This is in good agreement with ample data on the different evolution rates of genes and their regions.
(3) The results of our analysis of nucleotide and allozyme divergence within animal species and taxa of different ranks, first, are in good agreement with other similar data, including protein gene markers, and, second, allow the generalization that phyletic evolution prevails in the animal kingdom at the molecular level, while speciation mainly follows the D1 type (the geographic mode).
(4) The prevalence of type D1 speciation does not preclude other speciation modes. There are at least seven such modes. Recognizing different speciation modes is a task that requires the construction of a quantitative genetic model (theory) of speciation. In view of the vast diversity of possible causes of RIBs and species origin, some of the newly arising questions remain unanswered, and species delimitation requires further work. Their solution is likely to lie in increasing the number of descriptors and members of the equations (D1 – T4, Figures 5 – 6) on the basis of DNA markers, other genomic characteristics, and phenotype tests.
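Point (1) contrasts the uncorrected p-distance with model-based derivatives such as the K2P distance used for many entries in the Appendix. A minimal sketch of both measures for a pair of aligned sequences is given below. It assumes the standard definitions: p is the proportion of differing sites, and K2P is the Kimura two-parameter correction d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function names and the two sequences are hypothetical, and sites with gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_differences(seq1, seq2):
    """Return (sites compared, transitions, transversions) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return n, transitions, transversions


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites (requires at least one comparable site)."""
    n, ts, tv = site_differences(seq1, seq2)
    return (ts + tv) / n


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance; undefined for saturated sequence pairs."""
    n, ts, tv = site_differences(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Two invented 10-bp fragments differing by one transition (C/T).
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTAT"
print(p_distance(seq_a, seq_b), k2p_distance(seq_a, seq_b))
```

For closely related sequences the two measures are nearly equal, while the correction grows with divergence, which is one reason the choice of model matters more at higher taxonomic ranks.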

Appendix
Average Genetic Distances within and between Species for Two mtDNA Genes (Cyt-b and Co-1) At Five Comparison Groups of the Increased Categorical (taxa) Ranks

Distance | Model of distance estimate | Species number, n | Taxa | Reference

Cyt-b
Intraspecies, among individuals of the same species (1)
1.1* | K2P | 5 | Mammalia Lepus | Halanych et al., 1999
0.4 | K2P | 7 | Mammalia Microtus | Mazurok et al., 2001
3.2 | TrN | 1 | Mammalia Martes | Stone, Cook, 2002
0.4 | TrN | 5 | Mammalia Martes | Stone, Cook, 2002
1.54 | K2P | 9 | Mammalia Apodemus | Suzuki et al., 2004
0.96 | K2P | 9 | Mammalia Apodemus | Suzuki et al., 2004
0.28* | p | 2 | Aves Cyanopica, Pica | Kryukov et al., 2004
4 | GTR | 2 | Amphibia Rana | Sumida, Ogata, 1998
0.32 | K2P | 20 | Pisces Mormiridae | Kramer et al., 2003
3.09 | p | 9 | Pisces Siluriformes | Hardman, 2004
1.61 | TVM | 2 | Pisces Molidae | Bass et al., 2005
1.59 | p | 29 | Pisces Siluriformes | Kartavtsev et al., 2007a
0.46 | p | 34 | Pisces Pleuronectiformes | Kartavtsev et al., 2007b
Mean distance = 1.46±0.34, k=13, n=134

Intragenus, among sibling species, semispecies and subspecies (2)
5.5 | K2P | 87 | Mammalia | Johns, Avise, 1998
12 | p | 2 | Mammalia Rhabdomys | Rambau et al., 2003
4.8 | HKY | 2 | Mammalia Peromiscus | Zheng et al., 2003
0.9 | K2P | 2 | Mammalia Lepus | Halanych et al., 1999
3.5 | K2P | 94 | Aves | Johns, Avise, 1998
3.8 | HKY | 12 | Aves Motacillidae | Volker, 1999
5.7* | p | 2 | Aves Cyanopica, Pica | Kryukov et al., 2004
3.5 | K2P | 96 | Pisces | Johns, Avise, 1998
8.69 | p | 2 | Pisces Urocampus | Chernoweth et al., 2002
2.5 | p | 2 | Pisces Pollimyrus | Kramer et al., 2003
8 | p | 2 | Pisces Cyprinidae | Johnson et al., 2003
Mean distance = 5.35±0.95, k=11, n=303

Intragenus, among morphologically distinct species of the same genus (3)
9.4 | K2P | 7 | Mammalia Microtus | Mazurok et al., 2001
12.5* | GTR | 6 | Mammalia Sciuridae | Piaggio, Spicer, 2001
14 | K2P | 2 | Mammalia Apodemus | Serizawa et al., 2000
11.4 | K2P | 92 | Mammalia | Johns, Avise, 1998
8.5 | GTR | 23 | Mammalia Neotamias | Piaggio, Spicer, 2001
6.2 | TrN | 5 | Mammalia Martes | Stone, Cook, 2002
22 | TrN | 2 | Mammalia Mustella | Stone, Cook, 2002
13.5 | HKY | 2 | Mammalia Peromyscus | Zheng et al., 2003
12 | p | 2 | Mammalia Rhabdomys | Rambau et al., 2003
8.95* | K2P | 11 | Mammalia Lepus | Halanych et al., 1999
14.5 | K2P | 67 | Mammalia Rodents | Rocha-Olivares et al., 1999a
7.8 | K2P | 88 | Aves | Johns, Avise, 1998
11 | K2P | 15 | Aves Pollimirus | Kimbal et al., 1999
7.4 | K2P | 7 | Aves Alectoris | Kimbal et al., 1999
12.3 | p | 2 | Aves Cyanopica, Pica | Kryukov et al., 2004
12 | K2P | 11 | Reptilia | Johns, Avise, 1998
14 | K2P | 16 | Amphibia | Johns, Avise, 1998
14.8 | K2P | 8 | Amphibia Rana | Sumida et al., 2000
26.2 | p | 2 | Amphibia Rana | Sumida, Ogata, 1998
11.8 | K2P | 81 | Pisces | Johns, Avise, 1998
1.43 | p | 15 | Pisces Sebastomus | Rocha-Olivares et al., 1999a
2.3 | p | 15 | Pisces Sebastomus | Rocha-Olivares et al., 1999b
9 | p | 45 | Pisces Sebastes | Rocha-Olivares et al., 1999b
7.89* | K2P | 285 | Pisces Several orders | Rocha-Olivares et al., 1999a
12 | p | 2 | Pisces Rhabdomys | Rambau et al., 2003
3.5 | p | 19 | Pisces Zoarcidae | Moller, Gravlund, 2003
12.5 | TrN | 6 | Pisces Clupeidae | Jerome et al., 2003
1.8 | K2P | 13 | Pisces Pollimyrus | Kramer et al., 2003
5.46 | p | 31 | Pisces Siluriformes | Hardman, 2004
5.66 | TVM | 2 | Pisces Molidae | Bass et al., 2005
5.28 | p | 29 | Pisces Siluriformes | Kartavtsev et al., 2007a
17.51 | p | 34 | Pisces Pleuronectiformes | Kartavtsev et al., 2007b
Mean distance = 10.46±0.96, k=32, n=945

Intrafamily, among genera of the same family (4)
15.5 | K2P | 48 | Mammalia | Johns, Avise, 1998
23 | K2P | 67 | Mammalia Rodents | Rocha-Olivares et al., 1999a
14.7 | K2P | 2 | Mammalia Murinae | Serizawa et al., 2000
32.8 | p | 5 | Mammalia Scuridae | Piaggio, Spicer, 2001
16.7 | TrN | 9 | Mammalia Scuridae | Stone, Cook, 2002
19.26* | K2P | 13 | Mammalia Leporidae | Halanych et al., 1999
12.5 | K2P | 37 | Aves | Johns, Avise, 1998
31.3 | K2P | 25 | Aves Phasianinae | Kimbal et al., 1999
14.5 | p | 15 | Aves Falconidae | Griffits, 1997
20.5 | K2P | 18 | Reptilia | Johns, Avise, 1998
19.5 | K2P | 3 | Amphibia | Johns, Avise, 1998
31 | K2P | 8 | Amphibia Rana/Xenopus | Sumida et al., 2000
15.3* | K2P | 285 | Pisces Several orders | Rocha-Olivares et al., 1999a
24.8 | TrN | 6 | Pisces Clupeidae | Jerome et al., 2003
13.2 | K2P | 19 | Pisces Mormiridae | Kramer et al., 2003
9.5 | p | 19 | Pisces Zoarcidae | Moller, Gravlund, 2003
6.6 | p | 32 | Pisces Cottidae | Kontula et al., 2003
16.27 | p | 861 | Perciformes Sparidae | Orrell, 2000
12.28 | p | 1 | Perciformes Lutjanidae | Orrell, 2000
18.33 | p | 1 | Perciformes Haemulidae | Orrell, 2000
17.02 | p | 1 | Perciformes Lethrinidae | Orrell, 2000
22.81 | p | 1 | Perciformes Nemipteridae | Orrell, 2000
14.22 | TVM | 2 | Pisces Molidae | Bass et al., 2005
16.37 | p | 29 | Pisces Siluriformes | Kartavtsev et al., 2007a
11.74 | p | 34 | Pisces Pleuronectiformes | Kartavtsev et al., 2007b
Mean distance = 17.99±1.33, k=25, n=1541

Intraorder, among families of the same order (5)
22.58 | p | 121 | Actinopterigii Perciformes | Orrell, 2000
37.44 | TVM | 2 | Pisces Tetraodontiformes | Bass et al., 2005
19.81 | p | 29 | Pisces Siluriformes | Kartavtsev et al., 2007a
25.60 | p | 34 | Pisces Pleuronectiformes | Kartavtsev et al., 2007b
Mean distance = 26.36±3.88, k=4, n=186

Co-1
Intraspecies, among individuals of the same species (1)
0.39 | K2P | 173 | Pisces Several orders | Ward et al., 2005
3.3 | GTR | 2 | Pisces Sphyrna | Quatro et al., 2006
0.41 | K2P | 4 | Teleostei Mugilidae | Papasotiropoulos et al., 2007
0.34* | K2P | 13 | Pisces Several orders | Ward et al., 2008
0.17 | p | 8 | Pisces Pleuronectiformes | Kartavtsev et al., 2008
0.09 | p | 5 | Pisces Pleuronectiformes | Sharina, Kartavtsev, 2008
0.11 | p | 5 | Pisces Perciformes | Kartavtsev et al., 2009b
1.00 | | 9 | Pisces Scorpaeniformes | Kartavtsev et al., 2009a
1.4 | | 2 | Agnata Letentheron | Yamazaki et al., 2003
0.49 | | 3 | Echinodermata Zoroasteridae | Howell et al., 2004
 | | 2 | Mollusca Cephalopoda | Herke, Foltz, 2002
 | | 1 | Crustacea Potamonautes | Daniels et al., 2002
 | | 13 | Lepidoptera Arctidae | Hebert et al., 2002a
 | | 30 | Lepidoptera Geometri | Hebert et al., 2002a
 | | 42 | Lepidoptera Noctuida | Hebert et al., 2002a
 | | 14 | Lepidoptera Notodontidae | Hebert et al., 2002a
 | | 8 | Lepidoptera Sphingidae | Hebert et al., 2002a
 | | 12 | Coleoptera Carabidae | Martinez-Navarro et al., 2005
 | | 6 | Arthropoda Theridiidae | Garb et al., 2004
 | | 7 | Crustacea Decapoda | Machordom, Macpherson, 2004
 | | 16 | Collembola Hexapoda | Hogg, Hebert, 2004
 | | 3 | Hymenoptera Apidae | Bertsch et al., 2005

T mutation was detected in one INH-resistant isolate (that also had katG315 mutation) and three INH-susceptible isolates. A mutation in embB306 was found in 7 of 11 EMB-resistant isolates. Consequently, rpoB and embB306 mutations may serve for rapid genotypic detection of the majority of the RIF and EMB-resistant strains in Bulgaria; the results on INH resistance are complex and further investigation of more genes is needed. Comparison with spoligotyping and 24-VNTR locus typing data suggested that emergence and spread of drug-resistant and MDR-TB in Bulgaria are not associated with any specific spoligotype or MIRU-VNTR genotype. A local circulation of the particular clones appears to be an important factor to take into consideration in the molecular epidemiological studies of tuberculosis in Bulgaria.


Introduction

Tuberculosis (TB) infects a significant proportion of the world population and constitutes a major public health problem, particularly in developing regions. A re-emergence of TB, accompanied by an increasing number of drug-resistant and multidrug-resistant (MDR, i.e. resistant to at least rifampin [RIF] and isoniazid [INH]) Mycobacterium tuberculosis strains, has been noted since the mid-1980s. The emergence of drug-resistant M. tuberculosis strains complicates the management of tuberculosis and has become a serious health problem worldwide (WHO, 2008ab).

TB remains an important public health issue for Bulgaria, yet no genotypic data on the circulating M. tuberculosis strains had been published from this Balkan country. Although the number of new cases has shown a steady decline since 2001 (48.6/100,000), the TB incidence rate in Bulgaria is still high (41/100,000 in 2006) (WHO, 2008b). Geographically, Bulgaria lies in a region with contrasting epidemiological situations for tuberculosis. Its southern neighbour, Greece, has reported gradually decreasing TB rates, with an incidence of only 6.9/100,000 in 2005. The reported TB rates for Romania and Turkey are considerably higher and have been increasing, reaching 135.2/100,000 in Romania and 28.1/100,000 in Turkey in 2005 (EuroTB, 2007). The rate of MDR-TB among newly diagnosed TB patients in Bulgaria was estimated at 10.7% (95% CLs 1.8-44.7), which is higher than in neighboring countries such as Romania (2.8% [95% CLs 1.8-4.2]), Greece (1.1% [95% CLs 0.2-7.4]) or Turkey (1.4% [95% CLs 0.2-9.0]) and is closer to the estimated rates in Ukraine (16% [95% CLs 13.7-18.4]) and Russia (13% [95% CLs 11.3-14.8]) (WHO, 2008a). However, the wide confidence limits of some of these estimates, the Bulgarian one in particular, should be kept in mind.


Mycobacterium tuberculosis in Bulgaria


Recent advances in molecular techniques have enabled the development of a variety of genotyping methods for differentiating clinical isolates of M. tuberculosis (van Embden et al., 1993; van Soolingen et al., 2001; Moström et al., 2002). In particular, repetitive and insertion sequences have proven useful for studying both the epidemiology and the phylogeography of M. tuberculosis (Supply et al., 2001; Sola et al., 2001; Brudey et al., 2006ab; Mokrousov, 2007; Zozio et al., 2005; van Soolingen et al., 2001; Moström et al., 2002; Al-Hajoj et al., 2007), and regularly updated genetic diversity databases are available for this pathogen (Filliol et al., 2003; Brudey et al., 2006b; Mokrousov et al., 2005; Mokrousov, 2007, 2008; Weniger et al., 2007; El-Sahly et al., 2004; Kremer et al., 2004).

The chromosomal locus containing a large number of direct repeats (DRs) interspersed with unique spacer sequences is the target of the spoligotyping (spacer oligonucleotide typing) technique (Kamerbeek et al., 1997) (Figure 1). This method has been widely applied to study the molecular epidemiology and evolutionary genetics of TB. Because the technique is PCR based, it requires less DNA than conventional IS6110 restriction fragment length polymorphism (RFLP) analysis, which is the most widely applied and standardized molecular typing method (van Embden et al., 1993; van Soolingen et al., 2001).

In recent years, various novel DNA typing methods have been developed that are faster and easier to perform than the IS6110-RFLP method. Among them, VNTR typing is probably the most popular approach. This method is based on the variable-number tandem repeats of mycobacterial interspersed repetitive units (MIRU-VNTR) scattered throughout the genome; each isolate is typed by the number of copies of the repeated units at each locus (Supply et al., 2001). Using a large number of loci is expected to achieve high discrimination. This relatively new method, which requires only basic PCR and agarose electrophoresis equipment, was shown with different strain samples to possess a discriminatory power higher than that of spoligotyping and only slightly below that of IS6110-RFLP typing (Supply et al., 2001), although this may vary depending on the local population structure (Zozio et al., 2005; Mokrousov et al., 2004). The apparent advantage of the VNTR approach over IS6110 typing is its portability: the generated profiles are easily digitized, which simplifies interlaboratory exchange as well as the creation and maintenance of databases.

Since 1998, VNTR typing of M. tuberculosis has undergone remarkable improvement. Whereas the initial scheme used only six exact tandem repeat loci (Frothingham, Meeker-O'Connell, 1998), the more recent and already classical MIRU set involved 12 loci (Supply et al., 2001), and the most recently proposed format for MIRU typing includes 24 loci (Supply et al., 2006) (Figure 1). The development and application of MIRU-VNTR typing for M. tuberculosis was an important methodological achievement towards a better understanding of the molecular epidemiology of tuberculosis. The first paper on the new 24-locus format dealt mainly with a cosmopolitan, geographically diverse set of strains (Supply et al., 2006). Although it made a critically important step in evaluating a wide array of loci and selecting the most appropriate ones, most field studies are carried out, by definition, in geographically limited settings with possibly biased local population structures of the circulating strains. A population-based study in Hamburg, Germany, concluded that 15- and 24-locus VNTR typing combined with spoligotyping represents the first PCR-based method with operating parameters (specificity and sensitivity) comparable to those of the


Violeta Valcheva, Igor Mokrousov, Olga Narvskaya et al.


“gold standard” IS6110 fingerprinting and thus can be used as a stand-alone approach to study TB transmission (Oelemann et al., 2007). Since then, this new 24-locus set has been evaluated in four studies (Allix-Beguec et al., 2008a; Iwamoto et al., 2007; Jiao et al., 2008; Mokrousov et al., 2008). Three of them were carried out in specific world areas (Japan, China, Russia) with “biased” population structures of M. tuberculosis, i.e., dominated by a single and homogeneous clonal group, the Beijing genotype. In fact, two of these studies focused exclusively on Beijing genotype strains (Jiao et al., 2008; Mokrousov et al., 2008). Finally, a large-scale three-year population-based study in the Brussels-Capital Region in Belgium included strains of highly diverse origins: 76% of the patients were foreign-born, coming from 69 countries, mostly in Africa (Allix-Beguec et al., 2008a).

Early detection of resistance to first-line anti-TB drugs is essential for efficient treatment and is one of the priorities in the control of MDR strains. Patients infected with drug-resistant strains are less likely to be cured, and their treatment is more toxic and expensive than the treatment of patients infected with susceptible organisms. Inadequate and/or interrupted therapy allows the selection of spontaneous mutations in favor of resistant organisms, while sequential acquisition of these mutations in different genome loci results in the development of resistance to multiple drugs. Therefore, correct and rapid detection of resistant strains is necessary for appropriate and timely anti-TB therapy and for reducing the total treatment cost. Multiple genes responsible for conferring resistance to the major anti-TB drugs have been identified in M. tuberculosis. A majority of rifampin (RIF) resistant strains harbor mutations in the 81-bp hot-spot region (rifampin resistance determining region, RRDR) of the rpoB gene encoding the β-subunit of DNA-dependent RNA polymerase, the target of the drug (Telenti et al., 1993; Ramaswamy, Musser, 1998; Martin, Portaels, 2007).

Figure 1. (a) Position of the 24 MIRU-VNTR and DR loci on the M. tuberculosis H37Rv chromosome; (b) their structure and (c) example of spoligoprofiles.


Isoniazid (INH) resistance is controlled by a complex genetic system that involves several genes: katG, inhA, ahpC, kasA, and ndh (Ramaswamy, Musser, 1998; Slayden, Barry, 2000; Lee et al., 2001; Martin, Portaels, 2007). Ethambutol (EMB) resistance has most frequently been associated with mutations in the embCAB operon, whose product, arabinosyl transferase, is involved in mycolic acids metabolism, and particularly with mutations in embB codon 306 (Telenti et al., 1997; Sreevatsan et al., 1997; Ramaswamy et al., 2000). More recently, Mokrousov et al. (2002b) highlighted the presence of embB306 mutations in EMB-susceptible strains, and Hazbon et al. (2005) suggested an association of embB306 mutations with broad drug resistance and clustering rather than with EMB resistance.

This chapter reviews our recent molecular studies of M. tuberculosis strains circulating in Bulgaria, as a necessary step towards the implementation and better understanding of the molecular epidemiology of TB in this country. We further placed our data in a global context through comparison with the international database SITVIT2. Different typing methods, including IS6110-RFLP and MIRU-VNTR, were applied to M. tuberculosis strains from Bulgaria. The objective was to assess new versus traditional molecular markers for epidemiological studies of M. tuberculosis in Bulgaria. The general interest of our study was to evaluate the performance of the newly proposed 24-locus standard (i) in a relatively heterogeneous M. tuberculosis population (ii) circulating in the setting of a single country (iii) without a significant influx of foreign-born population.

Characterization of the molecular basis of drug resistance in a survey area is a first step towards implementing methods that permit its fast detection. Here, we analyzed the molecular basis of drug resistance in M. tuberculosis strains currently circulating in Bulgaria. We also compared the distribution of drug resistance across the main genotypic clusters defined by spoligotyping and VNTR typing.

Methods


Bacterial Isolates

One hundred and thirty-three M. tuberculosis isolates were randomly selected among strains isolated from newly diagnosed, adult, HIV-negative pulmonary TB patients in different regions of Bulgaria from December 2004 to March 2006. The patients were permanent residents of Bulgaria and were proven to be unlinked on the basis of a standard epidemiological investigation. No preliminary selection of strains based on their drug resistance or patient data was made. These isolates corresponded to all newly isolated M. tuberculosis cultures available at the time of collection; hence they may be interpreted as a snapshot of the tubercle bacilli clones circulating in Bulgaria. Susceptibility testing for isoniazid (INH), rifampin (RIF), ethambutol (EMB), and streptomycin (STR) was carried out by the absolute concentration method on Löwenstein-Jensen medium as recommended (WHO, 1998). The critical concentrations for INH, RIF, EMB, and STR were 0.25, 10, 2.0, and 4 mg/l, respectively.


DNA Fingerprinting

The DNA of the studied strains was extracted from 4- to 6-week-old cultures on Löwenstein-Jensen medium using the recommended method (van Embden et al., 1993). Spoligotyping was used to analyze variation in the DR locus (absence/presence of 43 different spacers) as described previously (Kamerbeek et al., 1997) (Figure 1). The individual spoligotyping patterns were entered in an Excel spreadsheet and compared with the international database SITVIT2 (Institut Pasteur de Guadeloupe), an updated version of the published SpolDB4 database (Brudey et al., 2006b). At the time of this comparison (September 4, 2007), SITVIT2 contained a total of 2880 shared types corresponding to 66846 clinical isolates from 122 countries of isolation and 166 countries of origin. Major phylogenetic clades were assigned according to the signatures provided in SpolDB4 (Brudey et al., 2006b), which defines 62 genetic lineages/sub-lineages. These include specific signatures for various members of the M. tuberculosis complex such as M. bovis, M. microti, M. caprae, M. pinnipedii, and M. africanum, as well as rules defining major lineages/sub-lineages for M. tuberculosis sensu stricto. The latter include the Central Asian (CAS) clade (2 sub-lineages), the East African Indian (EAI) clade (9 sub-lineages), the Haarlem (H) clade (34 sub-lineages), the Latin-American-Mediterranean (LAM) clade (12 sub-lineages), the "Manu" family (3 sub-lineages), the Beijing family, the S clade, the IS6110-low banding X clade (3 sub-lineages), and an ill-defined T clade (5 sub-lineages).

IS6110-RFLP typing was performed mainly as previously described (van Embden et al., 1993). Briefly, M. tuberculosis DNA was digested with PvuII, electrophoresed, Southern-blotted and hybridized with a DIG-labeled 245-bp PCR-generated IS6110 probe. Each Southern blot included DNA of M. tuberculosis strain 14323 as an external molecular weight marker. The hybridization profiles were visualized as banding patterns on the membrane using a colorimetric reaction catalyzed by alkaline phosphatase (Roche Applied Science, USA).

Each of the 24 MIRU-VNTR loci was amplified individually with primers specific for the sequences flanking the MIRU units, as described by Supply et al. (2001, 2006) (Figure 1). The amplicons were evaluated on standard 1.5% agarose gels using a 100-bp DNA ladder (GE Healthcare). The H37Rv strain was run as an additional control of the performance of the method. Size analysis of the PCR fragments and assignment of the VNTR alleles were done using TotalLab TL100 software (Nonlinear Dynamics Ltd., UK) and by comparison with correspondence tables kindly provided by Philip Supply. Some PCR reactions were repeated, and allele scoring was done independently by two technicians.

Analysis of the IS6110 element specific for the LAM genetic family was done as described previously (Marais et al., 2006). In brief, a 205-bp band indicates a LAM strain carrying an IS6110 element at a specific site in the genome, whereas a 141-bp band indicates a non-LAM strain lacking the IS6110 element at this site.
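The band-size rule for the LAM-specific PCR can be written as a small lookup. The sketch below is illustrative only: the function name and the 10-bp tolerance are assumptions introduced here and are not part of the protocol of Marais et al. (2006); in practice the band would be read against the DNA ladder on the gel.

```python
def classify_lam_band(band_bp, tolerance_bp=10):
    """Interpret the size (in bp) of the LAM-specific IS6110 PCR product.

    205 bp -> IS6110 present at the diagnostic site (LAM strain)
    141 bp -> IS6110 absent at that site (non-LAM strain)
    tolerance_bp is a hypothetical allowance for gel sizing error.
    """
    if abs(band_bp - 205) <= tolerance_bp:
        return "LAM"
    if abs(band_bp - 141) <= tolerance_bp:
        return "non-LAM"
    return "uninterpretable"


for size in (205, 141, 300):
    print(size, classify_lam_band(size))
```

A band that matches neither expected size is reported as uninterpretable rather than forced into one of the two classes, which mirrors the usual practice of repeating an ambiguous PCR.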

Resistance Mutations Typing

Mutations were detected in the rpoB RRDR, katG315, the inhA promoter region (positions from -9 to -25), and embB306, as described previously (Morcillo et al., 2002; Mokrousov et al., 2002ab, 2004, 2006). A PCR-RFLP method was used to detect mutations in embB306 and katG315, while a reverse hybridization method was used to detect mutations in the rpoB RRDR and the -15 C>T mutation in the inhA promoter region, using membranes prepared at the St. Petersburg Pasteur Institute as described previously (Mokrousov et al., 2004, 2006).

Quality Control

To minimize the risk of laboratory cross-contamination during PCR amplification, each step (preparation of the PCR mixes, addition of the DNA, PCR amplification, and electrophoretic fractionation) was conducted in a physically separate room. Negative controls (water) were included to control for reagent contamination.

Statistical Analysis

EpiCalc 2000 version 1.02 software (Gilman, Myatt, 1998) was used to calculate odds ratios with 95% confidence intervals and p-values. The PAUP* 4.0 package (Swofford, 2002) was used to reconstruct the most parsimonious dendrogram of the VNTR digital profiles treated as categorical variables. The Hunter-Gaston index (HGI) was calculated as described previously (Hunter, Gaston, 1988) and was used to evaluate the discriminatory power of the typing methods and the allelic diversity of the VNTR loci. The HGI is the probability that two strains taken consecutively from a given population would be placed into different types by the typing method; the lower the index value, the less discriminative the typing method. The mean HGI was calculated as the mean of the HGI values of the 24 loci. The HGI was calculated using the following formula:


$$\mathrm{HGI} = 1 - \frac{1}{N(N-1)} \sum_{j=1}^{s} n_j (n_j - 1)$$

where N is the total number of strains in the typing scheme, s is the total number of distinct patterns discriminated by the typing method, and n_j is the number of strains belonging to the jth pattern (Hunter, Gaston, 1988). The TotalLab TL100 software (Nonlinear Dynamics Ltd., UK) was used to calculate the molecular weights of the fragments in the IS6110-RFLP profiles; the resulting molecular weight matrix was used by the Taxotron package (Grimont, 2002) to build a UPGMA (unweighted pair-group method of averages) dendrogram. Relationships between spoligoprofiles were evaluated as a spoligoforest graph according to a deletion model of the evolution of the DR locus (Reyes et al., 2008; Tang et al., 2008) using the SpolTools program (http://www.emi.unsw.edu.au/spolTools). The “spoligoforest” burst layout was generated in SpolTools with the Fruchterman-Reingold algorithm.
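The HGI formula above is straightforward to compute from a list of type assignments. The following is a minimal sketch, not the authors' software: the function name is an assumption, and the input counts are the "No. of strains, this study" values as they appear in Table 1 of this chapter (36 rows summing to 133 strains).

```python
from collections import Counter


def hunter_gaston_index(type_labels):
    """HGI = 1 - sum_j n_j (n_j - 1) / (N (N - 1)) for one label per strain."""
    counts = Counter(type_labels)
    n_total = sum(counts.values())
    if n_total < 2:
        raise ValueError("HGI needs at least two strains")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Strains per spoligotype, in the row order of Table 1.
table1_counts = [33, 1, 2, 7, 1, 1, 1, 1, 3, 1, 6, 1, 1, 1, 2, 2, 1, 1, 7,
                 1, 2, 1, 3, 1, 1, 1, 1, 1, 2, 1, 5, 7, 2, 5, 1, 25]

# Expand the counts into one (arbitrary integer) label per strain.
labels = [j for j, n in enumerate(table1_counts) for _ in range(n)]

hgi_spoligo = hunter_gaston_index(labels)
print(len(labels), round(hgi_spoligo, 3))  # 133 0.893
```

With these counts the sum of n_j(n_j − 1) is 1876 and N(N − 1) is 17556, which reproduces the spoligotyping HGI of 0.893 reported in the Results.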


Results and Discussion

Population structure of M. tuberculosis in Bulgaria


A study sample included 133 strains from different regions of the country. Spoligotyping was used as a primary typing tool; it subdivided these strains into 37 types, including 15 clusters and 22 singletons (Table 1; Figure 2).

Figure 2. Relationships of the M. tuberculosis spoligotypes identified in 133 strains from Bulgaria visualized as a forest graph. Circle size is proportional to the number of strains. Solid edges are unique relationships between spoligotypes. Broken-line edges are chosen among multiple edges. Dotted lines have probability 0.5.


Twenty-two spoligotypes represented single isolates; the other 111 isolates were grouped into 15 clusters (2 to 33 isolates each). The HGI was 0.893. Spoligotype designations were attributed by online comparison of the obtained profiles, presented in binary code, with those included in the international SITVIT2 database (Institut Pasteur de Guadeloupe) (Table 1). This comparison showed a noticeable presence of two globally distributed shared types, ST53 (24.8%) and ST47 (5.3%). Twenty-five strains (18.8%) belonged to ST125 (LAM/S subfamily) and six (4.5%) to ST41 (LAM7_TUR subfamily). Eight spoligoprofiles (14 strains) were not found in SITVIT2 and were designated as “new”; two of them constituted the new shared types ST2905 and ST2906, while the other six new profiles remained orphans (Table 1).


Figure 3. Regional distribution of the major spoligotypes and clades identified in 133 M. tuberculosis strains from Bulgaria. Circle size is roughly proportional to the number of strains from an area in Bulgaria. Data on Turkey and Greece are based on the SITVIT2 database at Institut Pasteur de Guadeloupe.

The distribution of the major spoligotypes was plotted on the map of the country; this also demonstrated the geographic diversity of our collection (Figure 3). The spoligotype-defined population structure of M. tuberculosis in Bulgaria appears to be both sufficiently heterogeneous (HGI=0.893) and dominated by two spoligotypes, ST125 (19%) and ST53 (25%), whose distribution patterns differ strikingly. Spoligotype ST53 is found in a similar and rather high proportion in neighboring Greece and Turkey and is almost equally distributed across the different regions of Bulgaria (Figure 3). In contrast, ST125 is virtually absent elsewhere (Valcheva et al., 2008a) and is specific for Bulgaria; furthermore, it appears to be mainly confined to the southern part of the country (Figure 3).


Table 1. Description of M. tuberculosis spoligotypes found in Bulgaria and compared to SITVIT2 database

Spoligotype, SITVIT2 (a) | No. of strains, this study
ST53 | 33
ST612 | 1
ST50 | 2
ST47 | 7
ST602 | 1
Orphan A* | 1
ST44 | 1
ST61 | 1
ST453 | 3
ST60 | 1
ST41 | 6
ST40 | 1
ST1252 | 1
ST878 | 1
ST2906* | 2
ST37 | 2
Orphan B* | 1
ST2577 | 1
ST34 | 7
ST262 | 1
ST280 | 2
ST144 | 1
ST154 | 3
Orphan C* | 1
ST2139 | 1
ST205 | 1
Orphan D* | 1

Clade, SITVIT1 (column values in source order): T1, T, H3, H1, U, T5, LAM10_CAM, T, LAM4, LAM7_TUR, T4, T, X1-VAR, U, T3, S, H1-VAR, T1_RUS2, T, T, H, U, T, U

No. of strains in Romania, Turkey, Greece, Russia (column values in source order): 1; 167; 41; 67; 1 1; 52 21 9; 10 2; 10 6 1; 1 1 1; 154 2 1 3; 3 2; 8; 2; 3 10 2; 4; 2; 4; 2 1; 4 1; 25 5 1


Table 1. (Continued)

Spoligotype, SITVIT2 (a) | No. of strains, this study | Clade, SITVIT1
ST90 | 1 | U
Orphan E* | 2 | H
Orphan F* | 1 | U
ST2905* | 5 | U
ST284 | 7 | T1
ST1588 | 2 | LAM
ST4 | 5 | LAM/S
ST1280 | 1 | T1_RUS2
ST125 | 25 | LAM/S
Total No of strains | 133 |

No. of strains in Romania, Turkey, Greece, Russia (column values in source order): 26; 4; 1; 22 2; 3; 2 5
Total No of strains by country: Romania 14; Turkey 897; Greece 172; Russia 1241

(a) Asterisk (*) designates a new profile.


A large proportion of the studied strains belonged to spoligotype ST53 (Figure 2, Table 1). This worldwide distributed spoligotype represents 6.0% of the strains in SITVIT2. In our sample it constituted a much higher proportion (24.8%). Comparison with geographical (Turkey, Greece, and Romania) and historical (Russia) neighbours revealed that this spoligotype is present in almost as high a proportion in Turkey (18.6%) and Greece (23.8%) but not in Russia (5.4%) or Romania (7.1%), although the Romanian figure may be biased by a very small sample size (Table 1; Figure 3). Consequently, in spite of the otherwise global circulation of genotype ST53, its Bulgarian strains were likely brought to Bulgaria as a result of intraregional human movement in the Balkans. In particular, a significant increase in human exchange between Bulgaria and Turkey has been noted since the end of the 1980s (Vasileva, 1992). However, the similar and high proportion of these strains not only in Bulgaria and Turkey but also in Greece leads us to hypothesize a historically more distant time for their importation, driven by the medieval expansion of the Ottoman Empire (http://www.euratlas.com/big/big1600.htm). In particular, starting with the earliest conquests in Thrace (modern Bulgaria) in the 1350s, the Ottoman state employed a policy of forced population transfers that over the next three centuries would transport thousands of subjects from Asia into Europe (Hooper, 2003).

The other three frequently found spoligotypes in our collection were ST47, ST41 and ST125 (Table 1). In particular, ST41 belongs to the LAM7_TUR subfamily and is mainly circumscribed to Turkey (17.2%), for which its phylogeographical specificity has been suggested (Zozio et al., 2005). However, only rare ST41 isolates have been described in Greece and Romania. It is possible that ST41 reached its high rate in Turkey during the course of the 20th century and has not yet penetrated the neighboring countries in significant proportions. Comparison with previously published data revealed that the characteristic two-band IS6110-RFLP profile found in ST41 strains from Sofia and Plovdiv (Figure 4) is also present in ~40% of ST41 strains in Turkey (Zozio et al., 2005). A closer look at the MIRU profiles of the same strains showed some similarity between the Turkish and Bulgarian ST41 isolates. However, all MIRU-typed Turkish strains had 1 copy in locus MIRU26 (the signature 215125113322 being prevalent and likely ancestral), while Bulgarian strains of this spoligotype had 1 or 5 copies in MIRU26. This suggests a genetic divergence between the extant geographical sublineages within ST41 in Bulgaria and Turkey.
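Comparing 12-digit MIRU signatures locus by locus, as done above for ST41, can be sketched as follows. The locus order used to decode the 12-digit code is an assumption here (the conventional order of the 12-locus set of Supply et al., 2001); it is at least consistent with the text, since it places the single copy of MIRU26 at the eighth digit of 215125113322. The second profile is hypothetical and serves only to illustrate a 5-copy MIRU26 variant.

```python
# Assumed locus order of the 12-digit MIRU code (conventional 12-locus set).
MIRU12_ORDER = ["MIRU2", "MIRU4", "MIRU10", "MIRU16", "MIRU20", "MIRU23",
                "MIRU24", "MIRU26", "MIRU27", "MIRU31", "MIRU39", "MIRU40"]


def parse_profile(code):
    """Map a 12-character MIRU code to {locus: copy-number symbol}."""
    if len(code) != len(MIRU12_ORDER):
        raise ValueError("expected a 12-character MIRU code")
    return dict(zip(MIRU12_ORDER, code))


def differing_loci(code_a, code_b):
    """Return the loci at which two MIRU codes carry different alleles."""
    a, b = parse_profile(code_a), parse_profile(code_b)
    return [locus for locus in MIRU12_ORDER if a[locus] != b[locus]]


turkish_signature = "215125113322"      # signature quoted in the text
hypothetical_variant = "215125153322"   # illustration only: 5 copies in MIRU26

print(parse_profile(turkish_signature)["MIRU26"])               # 1
print(differing_loci(turkish_signature, hypothetical_variant))  # ['MIRU26']
```

Alleles are kept as characters rather than integers because digital VNTR profiles sometimes use letters for copy numbers of ten or more.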

Figure 4. MIRU and IS6110-RFLP profiles of the spoligotype ST41 strains.


Summing up, these observations lead us to speculate that the two-band profile found in the ST41 strains from Plovdiv and Sofia may have evolved from the variant brought from Turkey and, in turn, may have become ancestral to the seven-band profile that evolved in situ and was found in both strains from Shumen (Figure 4). This would require a relatively long-term evolution and, consequently, could reflect a long-term presence of this two-band RFLP variant of ST41 in Bulgaria. On the other hand, comparison with SITVIT2 revealed a high proportion of ST125 in Bulgaria (Table 1) and a negligible presence of this spoligotype outside Bulgaria, in particular in the neighboring countries. The similarity of the IS6110-RFLP profiles confirmed a true relatedness of the ST125 strains, whereas the high diversity of the 12 MIRU loci (Figure 5) suggested a long-term evolution of this spoligotype in Bulgaria. These findings lead us to suggest a Bulgarian phylogeographic specificity of spoligotype ST125.

Figure 5. 12-MIRU-loci based minimum spanning tree of spoligotype ST125 strains. Each circle, node or tip, is described by MIRU type number (inside a circle), strain origin/number of strains (if more than one), MIRU 12-digit profile (in italic) and IS6110-RFLP one-letter profile designation (in bold). MIRU types numbering within ST125 was done only for convenience of analysis and discussion. Circle size is roughly proportional to the number of strains sharing a respective MIRU profile.


Although a detailed comparison with drug susceptibility data is presented below, we noted a high rate of multidrug-resistant (MDR) strains in the studied Bulgarian sample (12%). In some settings, a high MDR rate is linked to the spread of a particular genotype. For example, in Russia the MDR phenotype was found in 48.6% of the Beijing genotype strains versus 29.4% of the non-Beijing strains, suggesting that the current transmission of MDR-TB in Russia is greatly influenced by the ongoing dissemination of Beijing family strains (Narvskaya et al., 2005). Comparison with the global database revealed that several spoligotypes were shared by Bulgarian and Russian strains (Table 1), which is readily explained by the close links and extensive human movement between the two countries until the end of the 20th century. Nevertheless, the Beijing genotype was not identified among the studied strains from Bulgaria. Consequently, the current situation with MDR-TB in Bulgaria cannot be explained by transmission of the Beijing genotype, which apparently has not yet reached this country.

High-Resolution Typing and Comparison of Typing Methods

Seventy-three strains had a sufficient quantity of DNA for traditional IS6110-RFLP typing. Accordingly, this sub-sample served to compare all three methods used in this study: spoligotyping, IS6110-RFLP typing and the newly proposed 24-locus VNTR scheme (Supply et al., 2006). It should be noted that the reduction in sample size decreased neither the genetic diversity (spoligotyping HGI_133=0.893 versus HGI_73=0.939) nor the geographical representativeness (“city of isolation” based HGI_133=0.838 versus HGI_73=0.873) of the collection as a whole. Table 2 compares the discriminatory capacity of the different VNTR sets and IS6110-RFLP typing. IS6110 fingerprinting subdivided the 73 M. tuberculosis isolates into 39 unique types and 12 clusters. The IS6110 copy number varied between 2 and 13 copies per profile, although it was generally high (Figure 6, Table 2). Defining low copy number as fewer than 5 copies, only three strains in this study were low-banders; they formed an outgroup in the IS6110-RFLP tree (strain 46 and cluster XII in Figure 6).


Table 2. Discriminatory power of the genotyping methods evaluated with 73 strains

Method* | No. of types | No. of unique isolates | No. of clustered isolates | No. of clusters | Cluster size (range) | HGI
MIRU-VNTR 24 loci | 66 | 61 | 12 | 5 | 2-3 | 0.997
MIRU-VNTR 15 loci | 65 | 59 | 14 | 6 | 2-3 | 0.996
MIRU-VNTR 12 loci | 62 | 55 | 18 | 7 | 2-4 | 0.994
MIRU-VNTR 5 loci | 45 | 28 | 45 | 17 | 2-4 | 0.984
IS6110-RFLP | 51 | 39 | 34 | 12 | 2-7 | 0.983
Spoligotyping | 31 | 18 | 55 | 13 | 2-14 | 0.939

* MIRU-VNTR 12, 15, 24 loci: Supply et al., 2001, 2006. The 5-locus scheme: the 5 MIRU-VNTR loci found to be the most polymorphic in this study: MIRU40, Mtub04, Mtub21, QUB-11b, and QUB-26.
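As a consistency check on Table 2, the HGI of the 24-locus scheme can be recomputed from the cluster summary alone. With 12 clustered isolates in 5 clusters of 2 to 3 isolates each, the only possible composition is two clusters of three and three clusters of two; this composition is inferred from the table rather than stated by the authors. The 61 unique isolates contribute nothing to the sum, and N = 73:

```latex
\mathrm{HGI}_{24}
  = 1 - \frac{2 \cdot (3 \cdot 2) + 3 \cdot (2 \cdot 1)}{73 \cdot 72}
  = 1 - \frac{18}{5256}
  \approx 0.997
```

The same reasoning applied to the 15-locus scheme (14 clustered isolates in 6 clusters of 2 to 3, hence two clusters of three and four of two) gives 1 − 20/5256 ≈ 0.996, again matching the table.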


Figure 6. IS6110-RFLP based dendrogram of M. tuberculosis strains from Bulgaria compared to their 24-locus VNTR digital haplotypes, 43-signal spoligoprofiles. SIT, spoligotype international type. IS6110-RFLP clusters in the dendrogram are designated with Roman numerals from I to XII. “A” designates 11 repeat units in a VNTR locus. VNTR profiles of the strains included in the IS6110-RFLP clusters are in boxes; minor variable alleles within these clusters are in bold.

The 24 published MIRU-VNTR loci (Supply et al., 2006) were further analyzed in this study (Table 2). Examples of different alleles for the most polymorphic loci are shown in Figure 7. The allelic diversity differed significantly among VNTR loci (Table 3). The highest allelic diversity among all strains was observed for QUB-26 (0.827), and the null allelic diversity was found for the monomorphic MIRU24. The lowest diversity (HGI~0.1) was found for six loci MIRU20, MIRU27, MIRU31, MIRU39, ETR-B, Mtub34.


Figure 7. Examples of the VNTR alleles of the most variable loci in this study: QUB11b (a), QUB26 (b) Mtub21 (c). M, molecular weights marker “100 bp DNA ladder” (GE Healthcare).


A comparison of different combinations of the VNTR loci revealed that, among the three published sets, the “old/classical” 12-locus combination was the least discriminatory; it identified 55 unique and 18 clustered strains (HGI=0.994). A better resolution, with 59 unique and 14 clustered strains (HGI=0.996), was obtained with the 15-locus MIRU-VNTR system. Finally, the full set of 24 loci permitted us to identify 61 unique and 12 clustered isolates (HGI=0.997). Compared to the 15-locus scheme, the 24-locus scheme split the cluster of strains 17 and 25 owing to a difference in the Mtub34 locus; these two strains were identical in all other VNTR loci as well as in their IS6110-RFLP profiles and spoligoprofiles (Figure 6). Apart from this example, the 9 moderately polymorphic (Table 3) auxiliary loci of the 24-locus scheme did not contribute to additional differentiation of strains compared to the 15-locus scheme.

Table 3. Allelic diversity of 24 VNTR loci in M. tuberculosis strains from Bulgaria and other locations

VNTR locus a  No. of   Repeats  HGI,      HGI,        HGI, Japan,  HGI, Japan,  HGI, China,
              alleles  (range)  Bulgaria  global set  non-Beijing  Beijing      Beijing
MIRU4         5        0-4      0.557     0.38        0.469        0.086        0.120
MIRU10        4        2-5      0.532     0.74        0.794        0.419        0.144
MIRU16        4        1-4      0.524     0.53        0.610        0.310        0.068
MIRU26        7        1-7      0.422     0.75        0.739        0.383        0.353
MIRU31        3        2-4      0.106     0.72        0.537        0.322        0.169
MIRU40*       6        1-6      0.806     0.73        0.752        0.327        0.194
Mtub04*       4        1-4      0.655     0.71        0.471        0.459        0.306
Mtub21*       5        1-5      0.669     0.76        0.599        0.393        0.556
Mtub30        3        1-4      0.430     0.62        0.580        0.403        0.068
Mtub39        5        2-6      0.465     0.69        0.735        0.186        0.171
ETR-A         4        1-4      0.554     0.75        0.554        0.147        0.232
ETR-C         4        2-5      0.419     0.69        0.230        0.022        0.094
QUB-11b*      7        1-7      0.773     0.82        0.748        0.772        0.651
QUB-26*       9        2-11     0.827     0.84        0.798        0.741        0.518
QUB-4156      5        0-4      0.447     0.67        0.665        0.611        0.395
MIRU2         3        1-3      0.249     0.16        0.105        0            0
MIRU20        2        1-2      0.129     0.30        0.226        0.022        0.014
MIRU23        5        3-8      0.399     0.65        0.597        0.176        0.014
MIRU24        1        1        0         0.35        0.105        0            0
MIRU27        3        1-4      0.106     0.25        0.036        0.115        0.014
MIRU39        2        1-2      0.153     0.45        0.497        0.221        0.119
ETR-B         3        1-3      0.106     0.44        0.370        0.033        0.014
Mtub29        4        2-5      0.204     0.48        0.262        0.043        0.119
Mtub34        3        2-4      0.154     0.27        0.036        0.065        0.014
Mean                            0.404     0.573       0.480        0.260        0.181

References: global set, Supply et al., 2006; Japan (non-Beijing types and Beijing genotype), Iwamoto et al., 2007; China (Beijing genotype), Jiao et al., 2008.

a The five most polymorphic loci in this study are marked with an asterisk (*).


We further tested various combinations of VNTR loci to find a reduced set that approaches the discrimination of 24-locus typing. The selection criteria were the number of alleles and the individual diversity of each locus assessed as HGI (not shown). The most obvious combination, the five most polymorphic loci (HGI>0.6), achieved good discrimination: lower than that of the 15-locus scheme but still higher than that of IS6110-RFLP typing (Table 2).

Of the three methods used in the present study, spoligotyping showed, not unexpectedly, the lowest discrimination. At the same time, as in the German study (Oelemann et al., 2007), spoligotyping in our setting, albeit the least discriminatory method, contributed to the subdivision of two of five 24-locus VNTR clusters (clusters A and C in Figure 8). Conversely, spoligotype ST41 is the most apparent example of the slower evolution of the DR locus compared with the VNTR haplotypes or IS6110-RFLP profiles: the ST41 strains differ in 7 of 24 loci, although they remain weakly related and are located in the same part of the 24-VNTR dendrogram (Figure 8).

An interesting finding of this study is that the “gold standard” IS6110-RFLP proved to be an even less variable marker than the classical 12-locus MIRU scheme (Figure 6, Table 2). Most of the IS6110 clusters in Figure 6 were completely or partially differentiated by the 24-VNTR set. On the other hand, all three VNTR clusters included strains with identical RFLP profiles (not shown). The remarkable evolutionary stability of some IS6110-RFLP profiles is especially evident in the ST125/ST4 cluster, the largest cluster in this study (see cluster I in Figure 6). Mapping the IS6110 insertions in the genomes of these strains might help to explain this intriguing situation.
It may also be noted that adding VNTR typing to IS6110-RFLP allowed more precise tracing of local clones at the city level. For example, IS6110 cluster IV (spoligotype ST2905) was further subdivided by VNTR typing: three strains from Pleven remained identical and differed in three loci from a strain from Shumen (Figure 6). Previous studies of the 24-locus format showed general congruence between IS6110 and 24-VNTR results, with the latter suggested to be more accurate overall for cluster analysis (Oelemann et al., 2007; Supply et al., 2006). In the Belgian study, 20 of the 23 IS6110-RFLP clusters with high copy numbers were completely identical by MIRU-VNTR typing. Of the three remaining IS6110-RFLP clusters, two were fully subdivided, each by 4 to 7 MIRU-VNTR loci (Allix-Beguec et al., 2008a). In this sense, the superior discrimination achieved in our study by the 24-locus VNTR scheme compared with IS6110 fingerprinting is not so surprising.

Various sets of MIRU-VNTR loci demonstrated different levels of discriminatory power (Table 2). Only one locus, MIRU24, was monomorphic, which agrees with the previous observation that this locus is phylogenetically conserved and discriminates between the large ancestral and modern M. tuberculosis lineages, i.e., those with and without the TbD1 genome region (Sun et al., 2004). Compared with IS6110-RFLP typing, the 12-locus MIRU scheme already showed good discrimination, and the addition of the new VNTR loci, mainly those from the discriminatory 15-locus set, improved the discrimination further by reducing the number of clusters and clustered isolates (Table 2). A closer look at the individual diversities of the loci (Table 3) showed that the loci found to be the most polymorphic in Bulgaria were also among the most polymorphic loci in the global set of strains (Supply et al., 2006). At the same time, some globally variable loci, e.g., MIRU31, showed low polymorphism in the Bulgarian collection.
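Statements such as "differed in three loci" or "differ in 7 out of 24 loci" amount to counting the positions at which two VNTR haplotypes carry different alleles. The sketch below illustrates that count under the assumption that each haplotype is written as a 24-character digital string with one character per locus, following the convention that "A" designates 11 repeat units. The two profiles are hypothetical and do not correspond to any strain in this study.

```python
def count_locus_differences(profile_a, profile_b):
    """Count the loci at which two VNTR haplotypes carry different alleles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same number of loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


# Hypothetical 24-locus digital haplotype (one character per locus).
strain_x = list("2243251533234A2143322512")

# A second hypothetical strain that differs from the first at three loci.
strain_y = strain_x.copy()
for locus_index, allele in ((0, "3"), (13, "9"), (23, "1")):
    strain_y[locus_index] = allele

print(count_locus_differences(strain_x, strain_y))
```

This simple categorical count treats every allele change equally, regardless of how many repeat units separate the two alleles; whether the dendrograms in this chapter were built on such a distance is not stated here.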

Figure 8. The 24-VNTR-loci based dendrogram of M. tuberculosis strains from Bulgaria compared to their 43-signal spoligoprofiles. Spoligotype number and family were attributed based on comparison with global SITVIT2 database at Institut Pasteur de Guadeloupe.

Overall, and not unexpectedly, the mean per-locus diversity was higher for the global set of strains (Table 3). Comparison with available data from published studies in Japan and China (Table 3) reconfirmed the strong phylogeographical structure of M. tuberculosis, which appears to have a direct impact on the observed diversity of the VNTR loci. Both China and Japan are dominated by strains of the Beijing genotype (Iwamoto et al., 2007; Jiao et al., 2008; Millet et al., 2007; van Soolingen et al., 1995), a closely related clonal group; this leads to a low mean per-locus diversity as well as low diversity of most VNTR loci, including those from the 15-locus “discriminatory” set (Table 3). The lower mean HGI for the Beijing genotype samples (Japan and China) versus the non-Beijing genotype samples (global set, Japan, Bulgaria) may result from stronger clonality of the Beijing genotype strains compared with the much more diverse strains of other genotypes. On the other hand, the lower mean HGI in the Beijing genotype sample from China versus that from Japan may be explained by the much smaller sample size in the Chinese study.

Further, although non-Beijing strains are likely to be of diverse origins, these origins differ and depend on the area of isolation. Nevertheless, individual and mean HGI values in the Bulgarian collection were similar to the respective values for the non-Beijing sub-sample from Japan and the global collection (Table 3). This observation concerns geographically very distant locations, such as Bulgaria and Japan, and apparently lends additional support to the new 24-locus format of M. tuberculosis genotyping. At the same time, the mean HGI value in the non-Beijing sample from Japan is higher than in Bulgaria, mainly because loci MIRU31, MIRU39, and ETR-B showed low polymorphism in our collection. A general, albeit speculative, explanation may lie in different levels of clonality, more or less recent dissemination of the strains in a survey area, or more diverse origins of the circulating strains. These factors may additionally be influenced by human population size and the level of urbanization.

An increase in the number of targeted VNTR loci is expected to increase discrimination. However, it also makes such multi-locus schemes rather time-consuming and expensive in settings with relatively limited resources. It appears that primary typing may reasonably be limited to a few loci if they still achieve sufficiently high discrimination. As the population structure of the circulating M. tuberculosis strains varies across world regions, these first-line typing schemes may be country-dependent and could include different loci. The five loci that were most polymorphic in our study, used together, achieved an HGI higher than that of IS6110-RFLP typing (Table 2). In this view, the utility of the newly proposed 24-locus format (Supply et al., 2006) is evident from the fact that 4 of the 5 loci of the “Bulgaria-specific” reduced set are new loci (Mtub04, Mtub21, QUB11b, QUB26), while only one locus (MIRU40) was retained from the earlier 12-locus scheme. Accordingly, we preliminarily suggest these five loci (Table 2) for first-line typing of M. tuberculosis strains in Bulgaria, although further studies are undoubtedly required to test this provisional scheme.
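The selection of the reduced set can be retraced from the Bulgarian HGI column of Table 3. The sketch below applies the HGI>0.6 criterion mentioned in the text and also recomputes the mean per-locus diversity. The HGI values are those listed in Table 3; the dictionary layout, the function name, and the default threshold argument are illustrative assumptions rather than the authors' procedure.

```python
# Per-locus HGI values for the Bulgarian strains, as listed in Table 3.
BULGARIAN_HGI = {
    "MIRU4": 0.557, "MIRU10": 0.532, "MIRU16": 0.524, "MIRU26": 0.422,
    "MIRU31": 0.106, "MIRU40": 0.806, "Mtub04": 0.655, "Mtub21": 0.669,
    "Mtub30": 0.430, "Mtub39": 0.465, "ETR-A": 0.554, "ETR-C": 0.419,
    "QUB-11b": 0.773, "QUB-26": 0.827, "QUB-4156": 0.447, "MIRU2": 0.249,
    "MIRU20": 0.129, "MIRU23": 0.399, "MIRU24": 0.0, "MIRU27": 0.106,
    "MIRU39": 0.153, "ETR-B": 0.106, "Mtub29": 0.204, "Mtub34": 0.154,
}


def select_loci(hgi_by_locus, threshold=0.6):
    """Return the loci whose HGI exceeds the threshold, most diverse first."""
    chosen = [locus for locus, hgi in hgi_by_locus.items() if hgi > threshold]
    return sorted(chosen, key=lambda locus: hgi_by_locus[locus], reverse=True)


selected = select_loci(BULGARIAN_HGI)
mean_hgi = sum(BULGARIAN_HGI.values()) / len(BULGARIAN_HGI)

print(selected)            # ['QUB-26', 'MIRU40', 'QUB-11b', 'Mtub21', 'Mtub04']
print(round(mean_hgi, 3))  # 0.404
```

Note that ranking loci by individual HGI does not by itself guarantee the best combined discrimination, because loci may carry redundant information; the combined discrimination of the selected set is the value reported in Table 2.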

Figure 9. Visual presentation of the decision rules for definition of the LAM and S spoligotype families (Filliol et al., 2002) and their application for ST125 and related “ambiguous” spoligotypes.

24-VNTR format: Phylogenetic Utility


On the basis of spoligotyping, the 133 strains were subdivided into 37 distinct spoligotypes (Table 1; Figure 2). Applying the published rules for the definition of the major spoligotype clades (Brudey et al., 2006; Filliol et al., 2002) and comparing the profiles with the SITVIT2 global database permitted us to assign most of the 133 strains to known spoligotype families (Table 1). At the same time, spoligotypes ST4, ST125, and ST1280 were classified as LAM/S, since the absence of spacers 21-24 and 33-36 is specific for the LAM family, whereas the absence of spacers 9-10 and 33-36 is specific for the S family (Filliol et al., 2002) (Figure 9). We additionally used a recently proposed PCR approach for defining the LAM family (Marais et al., 2006) and found that the ST4, ST125, and ST1280 strains did not harbor the LAM-specific IS6110 insertion (Figure 10).
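The two decision rules quoted above can be written as a small check on a 43-spacer binary spoligoprofile. This is a simplified sketch that encodes only the two rules stated in the text (absence of spacers 21-24 and 33-36 for LAM; absence of spacers 9-10 and 33-36 for S); the published rules may include further conditions that are not reproduced here. The function names and the test profiles are hypothetical and are not the actual ST4, ST125, or ST1280 patterns.

```python
def spacers_absent(profile, first, last):
    """True if every spacer from first to last (1-based, inclusive) is absent."""
    return all(profile[i - 1] == "0" for i in range(first, last + 1))


def classify_lam_s(profile):
    """Apply the two quoted rules to a 43-character binary spoligoprofile."""
    if len(profile) != 43 or set(profile) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0s and 1s")
    is_lam = spacers_absent(profile, 21, 24) and spacers_absent(profile, 33, 36)
    is_s = spacers_absent(profile, 9, 10) and spacers_absent(profile, 33, 36)
    if is_lam and is_s:
        return "LAM/S"  # both rules are satisfied, so the assignment is ambiguous
    if is_lam:
        return "LAM"
    if is_s:
        return "S"
    return "other"


def make_profile(absent_spacers):
    """Build a hypothetical profile with all spacers present except those listed."""
    return "".join("0" if i in absent_spacers else "1" for i in range(1, 44))


lam_like = make_profile({21, 22, 23, 24, 33, 34, 35, 36})
s_like = make_profile({9, 10, 33, 34, 35, 36})
ambiguous = make_profile({9, 10, 21, 22, 23, 24, 33, 34, 35, 36})

print(classify_lam_s(lam_like), classify_lam_s(s_like), classify_lam_s(ambiguous))
```

A profile that lacks all three spacer blocks satisfies both rules at once, which is one way an ambiguous LAM/S label can arise and why an independent marker is needed to resolve it.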

Figure 10. LAM-specific PCR: (a) schematic view of the genome region harboring the LAM-specific IS6110 insertion in strain F11. Three primers (arrows) are used in one reaction: LAM-F, LAM-R, and XhoI. In a non-LAM strain (without this IS6110 insertion), primers LAM-R and LAM-F amplify a 141 bp fragment. In a LAM strain (with this IS6110 insertion), primers LAM-R and XhoI amplify a 205 bp fragment. (b) Gel electrophoresis. Lanes 1, 2, 4, 5: LAM strain. M, molecular weight marker “100 bp DNA ladder” (Amersham Bioscience).


The phylogenetic position of these strains (ST4, ST125, and ST1280) was further investigated in the light of the VNTR data. Interestingly, strains of these three spoligotypes grouped closely in the 24-locus VNTR dendrogram, together with ST34, the prototype of the S family (the cluster marked by * in Figure 8). It appears that spoligotypes ST125, ST4, and ST1280 may indeed belong to the S family, although further studies targeting VNTR loci in strains of these spoligotypes from other world regions are needed to clarify their phylogenetic position.

Molecular Basis of Drug Resistance

The study collection included 37 drug-resistant and 96 susceptible M. tuberculosis strains (Table 4). Monoresistance was identified in 15 of the 37 drug-resistant strains, mostly to RIF (7/15) or INH (5/15) (Table 4). Sixteen strains (12.0%) were resistant to both RIF and INH and were thus classified as multidrug resistant (MDR). It should be noted that in Bulgaria a total of 1360 TB cases (42% of all new TB cases) were confirmed by culture in 2006; 1108 of them were subjected to DST, and 24 (2.2%) of the DST-screened cultures were found to be multidrug resistant (Euro TB, 2008). A total of 22 MDR M. tuberculosis strains were identified in Bulgaria in 2005 (4.6% of all DST-screened cultures from all new TB cases) (Euro TB, 2007). The corresponding data for 2007 are not yet


Table 4. Resistance patterns of 133 M. tuberculosis strains isolated in different regions of Bulgaria

Resistance type           Phenotypic resistance   No. of          rpoB
                          profile a               isolates (%)    wild type
MDR                       HR c                    9 (6.8)         3
                          HRE                     5 (3.7)         1
                          HRS                     3 (2.2)         1
Polyresistant (non-MDR)   RE                      3 (2.2)         1
                          HE                      1 (0.8)         1
                          ES                      1 (0.8)         1
Monoresistant             R                       7 (5.3)         1
                          H                       5 (3.7)         5
                          E                       1 (0.8)         1
                          S                       2 (1.5)         2
Fully susceptible                                 96 (72.2)       96

Remaining genotype columns: rpoB mutant (rpoB531 TCG>TTG; rpoB526 CAC>TAC; del wt5 b); katG315 wild type; katG315 AGC>ACC; inhA -9 … -25 (wild type); inhA -15C>T; embB306 wild type; embB306 mutant

6 3 2 2

4 3 2

4

4

1

2

5 2 1 3 1 1 7 3 1 2 96

2

6 3 3 2 1 1 6 5 1 2 96

1 2

1

6 1 3 1 1 1 7 5

1 2 96

a One-letter abbreviations of drug resistance: H, INH; R, RIF; E, EMB; S, STR.
b Absence of hybridization with wild type probe #5, i.e., a mutation in rpoB codons 530-534 (Morcillo et al., 2002; Mokrousov et al., 2006).
c No information on inhA and embB306 mutations was available for three HR strains.

2


available. Nevertheless, extrapolation of the published information (Euro TB, 2007, 2008) allows us to estimate the total number of MDR M. tuberculosis strains identified in Bulgaria between January 2005 and June 2007 (the survey period of this study) at 58. Consequently, regarding the representativeness of our study panel, we note that the studied sub-sample of MDR isolates represents 29% (17/58) of the MDR M. tuberculosis cultures isolated from newly diagnosed TB patients in Bulgaria within the survey period.

This study found high specificity and sufficiently good sensitivity of the molecular methods for detecting RIF- and EMB-resistant strains; the results for INH resistance are more complex.

Three types of rpoB RRDR mutations were found in 20 of 27 RIF-resistant strains, with rpoB S531L (TCG>TTG) being the most frequent (Table 4). The remaining 7 RIF-resistant strains and all 106 RIF-susceptible strains had no mutation in the targeted rpoB hot-spot region. Interestingly, 62.5% (10/16) of MDR strains harbored the S531L mutation in the rpoB hot-spot region. The sensitivity and specificity of the genotypic method for detecting RIF resistance were 74.1% and 100%, respectively.

Regarding RIF resistance, the high rate of the rpoB S531L (TCG>TTG) mutation compared with the very low rate of the other rpoB mutations found in this study is striking (Table 4). A similar situation was described, e.g., for Russia and Kazakhstan, but there it was associated with the Beijing genotype (Mokrousov et al., 2003; Hillemann et al., 2005). In other studies, the rpoB 531TTG allele was found at a similar rate of ~50% in Beijing and non-Beijing RIF-resistant strains from East Asia (Mokrousov et al., 2006; Qian et al., 2002), Taiwan (Jou et al., 2005), and Latvia (Tracevska et al., 2003), and it was even less represented in Beijing genotype RIF-resistant strains from Korea (41% vs 66% [Park et al., 2005]). Variation in the prevalence of the rpoB S531L mutation among Beijing strains in different countries may reflect not only the increased capacity of Beijing family strains to readily acquire the most frequently observed rpoB mutation but also specific features of the national TB control programs in different countries (Balabanova et al., 2004; Samarina et al., 2007). In our study, the Beijing genotype was not found among the 133 clinical isolates from Bulgaria; hence, the current MDR-TB situation in Bulgaria cannot be explained by global dissemination of the Beijing genotype, which apparently has not yet reached this country. In summary, whether the very high rate of the rpoB S531L mutation is a surrogate marker of failure of the national TB control program or is hypothetically linked to another molecular mechanism of RIF resistance acquisition needs to be addressed in further investigations in different settings.

A mutation in embB306 was found in 7 of 11 EMB-resistant strains. No such mutation was detected in EMB-susceptible strains (Table 4). The sensitivity and specificity of the genotypic method for detecting EMB resistance were 63.6% and 100%, respectively. The results on embB306 variation obtained in this study and in a recent German study (Plinke et al., 2006) are in line with earlier findings that correlated mutations in embB306 with EMB resistance (Sreevatsan et al., 1997; Ramaswamy et al., 2000; van Rie et al., 2001). They contrast with more recently reported discrepancies between genotypic and phenotypic EMB resistance (Mokrousov et al., 2002b; Tracevska et al., 2004; Lee et al. 2004; Hazbon et al., 2005). A number of explanations for these contradictory findings have been proposed. Plinke et al. (2006) suggested that there is a small difference between the critical


concentration used for EMB susceptibility testing and the MIC, which makes susceptibility testing more problematic. Mokrousov et al. (2002b) hypothesized an unknown mechanism in MDR M. tuberculosis strains that leads to susceptibility to EMB. Hazbon et al. (2005) suggested that the clear association between mutations in embB306 and EMB resistance found in several earlier studies might be due to the use of pansusceptible strains as control groups. Regarding this latter point, our study in the Bulgarian setting found an embB306 mutation in 7 strains, of which 6 were resistant to more than one drug. Indeed, no embB306 mutation was found in fully susceptible or monoresistant strains (except for one EMB-monoresistant strain). However, all EMB-susceptible MDR strains in this study had the embB306 wild type allele. Accordingly, it appears that the hypothesis of Hazbon et al. (2005) about embB306 as a marker of multidrug resistance is not completely supported by our data.

The molecular investigation of the genetic basis of INH resistance in M. tuberculosis strains in Bulgaria targeted the two most frequently reported mutations related to INH resistance, katG 315AGC>ACC and inhA -15C>T (Baker et al., 2005; Guo et al. 2006; Nikolayevskyy et al., 2007, and references therein). The katG S315T (AGC>ACC) mutation was detected in 10 (45%) of 22 INH-resistant isolates and in none of 110 INH-susceptible isolates (Table 4). Additional analysis of the inhA promoter region revealed four strains that harbored the inhA -15C>T mutation: one INH-resistant strain that also had the katG315 mutation and three INH-susceptible strains. Of the latter, two strains were RIF- and EMB-resistant (rpoB531 and embB306 mutations), and one strain was RIF-resistant (rpoB531 mutation). The sensitivity and specificity of the genotypic method for detecting INH resistance were 45.5% and 99.1%, respectively.
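The sensitivity and specificity figures quoted for the genotypic methods follow from simple ratios, with phenotypic susceptibility testing taken as the reference. The sketch below recomputes the values that can be derived directly from counts stated in the text (20 of 27 RIF-resistant strains with an rpoB mutation and none among 106 RIF-susceptible strains; 7 of 11 EMB-resistant strains with an embB306 mutation; 10 of 22 INH-resistant strains detected). The function and variable names are illustrative. INH specificity is not recomputed because the exact false-positive count is not stated unambiguously in the text.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of phenotypically resistant strains detected by the genotypic test."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of phenotypically susceptible strains with a wild type result."""
    return true_negatives / (true_negatives + false_positives)


# RIF: 20 of 27 resistant strains carried an rpoB hot-spot mutation;
# none of the 106 susceptible strains did.
rif_sensitivity = sensitivity(20, 27 - 20)
rif_specificity = specificity(106, 0)

# EMB: 7 of 11 resistant strains carried an embB306 mutation.
emb_sensitivity = sensitivity(7, 11 - 7)

# INH: 10 of 22 resistant strains were detected by the targeted mutations.
inh_sensitivity = sensitivity(10, 22 - 10)

for label, value in (
    ("RIF sensitivity", rif_sensitivity),
    ("RIF specificity", rif_specificity),
    ("EMB sensitivity", emb_sensitivity),
    ("INH sensitivity", inh_sensitivity),
):
    print(f"{label}: {100 * value:.1f}%")
```

The recomputed percentages (74.1%, 100%, 63.6%, and 45.5%) agree with the values given in the text.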
The global prevalence of the katG S315T substitution in INH-resistant strains highlights the selective advantage conferred by this mutation, which provides the optimal balance between decreased catalase activity and a sufficiently high level of peroxidase activity in KatG. Mutations in the inhA promoter region are thought to increase InhA protein expression, thereby elevating the drug target levels and producing INH resistance by a drug titration mechanism (Mdluli et al., 1996). A large-scale study (Hazbon et al., 2006) showed that mutations in katG315 were significantly more common in MDR isolates, while mutations in the inhA promoter were significantly more common in INH-monoresistant isolates. The prevalence of the katG315 AGC>ACC mutation among INH-resistant M. tuberculosis strains varies worldwide but remains high, e.g., 47% in Finland (Marttila et al., 2008), 61% in China (Jiao et al., 2007), 64% in India (Nusrath Unissa et al., 2007), 71% in Vietnam (Caws et al., 2006), and 92-94% in Russia (Mokrousov et al., 2002a; Voronina et al., 2004; Afanas’ev et al., 2007). Accordingly, it appears that the katG S315T mutation alone can be used to reliably predict a high proportion of INH-resistant strains in many world regions. This is not the case for Bulgaria: surprisingly, only 10 of 22 INH-resistant strains would be detected genotypically through analysis of the two targeted mutations. Furthermore, the inhA mutation was found in only one INH-resistant strain, which also harbored a katG315 mutation. The three other strains with the inhA mutation were INH-susceptible. It has been suggested that the inhA -15C>T mutation can be present by itself and is associated with low-level INH resistance, 0.2 mg/l (Guo et al., 2006). Hazbon et al. (2006) even observed a strong negative association between mutations in katG315 and mutations in the inhA promoter region (p