An introduction to systems biology - design principles of biological circuits


English Pages 162


Chapman & Hall/CRC Mathematical and Computational Biology Series

AN INTRODUCTION TO SYSTEMS BIOLOGY

DESIGN PRINCIPLES OF BIOLOGICAL CIRCUITS

CHAPMAN & HALL/CRC Mathematical and Computational Biology Series

Aims and scope: This series aims to capture new developments and summarize what is known over the whole spectrum of mathematical and computational biology and medicine. It seeks to encourage the integration of mathematical, statistical and computational methods into biology by publishing a broad range of textbooks, reference works and handbooks. The titles included in the series are meant to appeal to students, researchers and professionals in the mathematical, statistical and computational sciences, fundamental biology and bioengineering, as well as interdisciplinary researchers involved in the field. The inclusion of concrete examples and applications, and programming techniques and examples, is highly encouraged.

Series Editors

Alison M. Etheridge, Department of Statistics, University of Oxford
Louis J. Gross, Department of Ecology and Evolutionary Biology, University of Tennessee
Suzanne Lenhart, Department of Mathematics, University of Tennessee
Philip K. Maini, Mathematical Institute, University of Oxford
Shoba Ranganathan, Research Institute of Biotechnology, Macquarie University
Hershel M. Safer, Weizmann Institute of Science, Bioinformatics & Bio Computing
Eberhard O. Voit, The Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University

Proposals for the series should be submitted to one of the series editors above or directly to: CRC Press, Taylor & Francis Group, 24-25 Blades Court, Deodar Road, London SW15 2NU, UK

Published Titles

Cancer Modeling and Simulation, Luigi Preziosi
Computational Biology: A Statistical Mechanics Perspective, Ralf Blossey
Computational Neuroscience: A Comprehensive Approach, Jianfeng Feng
Data Analysis Tools for DNA Microarrays, Sorin Draghici
Differential Equations and Mathematical Biology, D.S. Jones and B.D. Sleeman
Exactly Solvable Models of Biological Invasion, Sergei V. Petrovskii and Lian-Bai Li
An Introduction to Systems Biology: Design Principles of Biological Circuits, Uri Alon
Knowledge Discovery in Proteomics, Igor Jurisica and Dennis Wigle
Modeling and Simulation of Capsules and Biological Cells, C. Pozrikidis
Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems, Qiang Cui and Ivet Bahar
Stochastic Modelling for Systems Biology, Darren J. Wilkinson
The Ten Most Wanted Solutions in Protein Bioinformatics, Anna Tramontano


URI ALON


CHAPTER

1

Introduction

When I first read a biology textbook, it was like reading a thriller. Every page brought a new shock. As a physicist, I was used to studying matter that obeys precise mathematical laws. But cells are matter that dances. Structures spontaneously assemble, perform elaborate biochemical functions, and vanish effortlessly when their work is done. Molecules encode and process information virtually without errors, despite the fact that they are under strong thermal noise and embedded in a dense molecular soup. How could this be? Are there special laws of nature that apply to biological systems that can help us to understand why they are so different from nonliving matter?

We yearn for laws of nature and simplifying principles, but biology is astoundingly complex. Every biochemical interaction is exquisitely crafted, and cells contain networks of thousands of such interactions. These networks are the result of evolution, which works by making random changes and selecting the organisms that survive. Therefore, the structures found by evolution are, to some degree, dependent on historical chance and are laden with biochemical detail that requires special description in every case.

Despite this complexity, scientists have attempted to discern generalizable principles throughout the history of biology. The search for these principles is ongoing and far from complete. It is made possible by advances in experimental technology that provide detailed and comprehensive information about networks of biological interactions. Such studies led to the discovery that one can, in fact, formulate general laws that apply to biological networks. Because it has evolved to perform functions, biological circuitry is far from random or haphazard. It has a defined style, the style of systems that must function. Although evolution works by random tinkering, it converges again and again onto a defined set of circuit elements that obey general design principles.

The goal of this book is to highlight some of the design principles of biological systems, and to provide a mathematical framework in which these principles can be used to understand biological networks. The main message is that biological systems contain an inherent simplicity. Although cells evolved to function and did not evolve to be comprehensible, simplifying principles make biological design understandable to us.

This book is written for students who have had a basic course in mathematics. Specialist terms and gene names are avoided, although detailed descriptions of several well-studied biological systems are presented in order to demonstrate key principles. This book presents one path into systems biology based on mathematical principles, with less emphasis on experimental technology. The examples are those most familiar to the author. Other directions can be found in the sources listed at the end of this chapter, and in the extended bibliography at the end of this book.

The aim of the mathematical models in the book is not to precisely reproduce experimental data, but rather to allow intuitive understanding of general principles. This is the art of "toy models" in physics: the belief that a few simple equations can capture some essence of a natural phenomenon. The mathematical descriptions in the book are therefore simplified, so that each can be solved on the blackboard or on a small piece of paper. We will see that it can be very useful to ask, "Why is the system designed in such a way?" and to try to answer with simplified models.

We conclude this introduction with an overview of the chapters. The first part of the book deals with transcription regulation networks. Elements of networks and their dynamics are described. We will see that these networks are made of repeating occurrences of simple patterns called network motifs. Each network motif performs a defined information processing function within the network. These building block circuits were rediscovered by evolution again and again in different systems. Network motifs in other biological networks, including signal transduction and neuronal networks, are also discussed. The main point is that biological systems show an inherent simplicity, by employing and combining a rather small set of basic building-block circuits, each for specific computational tasks.

The second part of the book focuses on the principle of robustness: biological circuits are designed so that their essential function is insensitive to the naturally occurring fluctuations in the components of the circuit. Whereas many circuit designs can perform a given function on paper, we will see that very few can work robustly in the cell. These few robust circuit designs are nongeneric and particular, and are often aesthetically pleasing. We will use the robustness principle to understand the detailed design of well-studied systems, including bacterial chemotaxis and patterning in fruit fly development.

The final chapters describe how constrained evolutionary optimization can be used to understand optimal circuit design, and how kinetic proofreading can minimize errors made in biological information processing.

These features of biological systems, reuse of a small set of network motifs, robustness to component tolerances, and constrained optimal design, are also found in a completely different context: systems designed by human engineers. Biological systems have additional features in common with engineered systems, such as modularity and hierarchical design. These similarities hint at a deeper theory that can unify our understanding of evolved and designed systems.

This is it for the introduction. A glossary of terms is provided at the end of the book, and some of the solved exercises after each chapter provide more detail on topics not discussed in the main text. I wish you enjoyable reading.

FURTHER READING

Fall, C., Marland, E., Wagner, J., and Tyson, J. (2005). Computational Cell Biology. Springer.
Fell, D. (1996). Understanding the Control of Metabolism. Portland Press.
Heinrich, R. and Schuster, S. (1996). The Regulation of Cellular Systems. Kluwer Academic Publishers.
Klipp, E., Herwig, R., Kowald, A., Wierling, C., and Lehrach, H. (2005). Systems Biology in Practice: Concepts, Implementation and Application. Wiley.
Kriete, A. and Eils, R. (2005). Computational Systems Biology. Academic Press.
Palsson, B.O. (2006). Systems Biology: Properties of Reconstructed Networks. Cambridge University Press.
Savageau, M.A. (1976). Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology. Addison Wesley.

CHAPTER

2

Transcription Networks: Basic Concepts

2.1 INTRODUCTION

The cell is an integrated device made of several thousand types of interacting proteins. Each protein is a nanometer-size molecular machine that carries out a specific task with exquisite precision. For example, the micron-long bacterium Escherichia coli is a cell that contains a few million proteins, of about 4000 different types (typical numbers, lengths, and timescales can be found in Table 2.1).

Cells encounter different situations that require different proteins. For example, when sugar is sensed, the cell begins to produce proteins that can transport the sugar into the cell and utilize it. When damaged, the cell produces repair proteins. The cell therefore continuously monitors its environment and calculates the amount at which each type of protein is needed. This information-processing function, which determines the rate of production of each protein, is largely carried out by transcription networks.

The first few chapters in this book will discuss transcription networks. The present chapter defines the elements of transcription networks and examines their dynamics.

2.2 THE COGNITIVE PROBLEM OF THE CELL

Cells live in a complex environment and can sense many different signals, including physical parameters such as temperature and osmotic pressure, biological signaling molecules from other cells, beneficial nutrients, and harmful chemicals. Information about the internal state of the cell, such as the level of key metabolites and internal damage (e.g., damage to DNA, membrane, or proteins), is also important. Cells respond to these signals by producing appropriate proteins that act upon the internal or external environment.

TABLE 2.1 Typical Parameter Values for the Bacterial E. coli Cell, the Single-Celled Eukaryote Saccharomyces cerevisiae (Yeast), and a Mammalian Cell (Human Fibroblast)

Property | E. coli | Yeast (S. cerevisiae) | Mammalian (Human Fibroblast)
Cell volume | ~1 µm^3 | ~1000 µm^3 | ~10,000 µm^3
Proteins/cell | ~4 x 10^6 | ~4 x 10^9 | ~4 x 10^10
Mean size of protein | 5 nm | |
Size of genome | 4.6 x 10^6 bp, 4500 genes | 1.3 x 10^7 bp, 6600 genes | 3 x 10^9 bp, ~30,000 genes
Size of regulator binding site | ~10 bp | ~10 bp | ~10 bp
Size of promoter | ~100 bp | ~1000 bp | ~10^4 to 10^5 bp
Size of gene | ~1000 bp | ~1000 bp | ~10^4 to 10^6 bp (with introns)
Concentration of one protein/cell | ~1 nM | ~1 pM | ~0.1 pM
Diffusion time of protein across cell | ~0.1 sec (D ≈ 10 µm^2/sec) | ~10 sec | ~100 sec
Diffusion time of small molecule across cell | ~1 msec (D ≈ 1000 µm^2/sec) | ~10 msec | ~0.1 sec
Time to transcribe a gene | ~1 min (80 bp/sec) | ~1 min | ~30 min (including mRNA processing)
Time to translate a protein | ~2 min (40 aa/sec) | ~2 min | ~30 min (including mRNA nuclear export)
Typical mRNA lifetime | 2-5 min | ~10 min to over 1 h | ~10 min to over 10 h
Cell generation time | ~30 min (rich medium) to several hours | ~2 h (rich medium) to several hours | 20 h to nondividing
Ribosomes/cell | ~10^4 | ~10^7 | ~10^8
Transitions between protein states (active/inactive) | 1-100 µsec | 1-100 µsec | 1-100 µsec
Timescale for equilibrium binding of small molecule to protein (diffusion limited) | ~1 msec (1 µM affinity) | ~1 sec (1 nM affinity) | ~1 sec (1 nM affinity)
Timescale of transcription factor binding to DNA site | ~1 sec | |
Mutation rate | ~10^-9 /bp/generation | ~10^-10 /bp/generation | ~10^-8 /bp/year

bp: base-pair (DNA letter).

To represent these environmental states, the cell uses special proteins called transcription factors as symbols. Transcription factors are usually designed to transit rapidly between active and inactive molecular states, at a rate that is modulated by a specific environmental signal (input). Each active transcription factor can bind the DNA to regulate the rate at which specific target genes are read (Figure 2.1). The genes are read (transcribed) into mRNA, which is then translated into protein, which can act on the environment. The activities of the transcription factors in a cell therefore can be considered an internal representation of the environment. For example, the bacterium E. coli has an internal representation with about 300 degrees of freedom (transcription factors). These regulate the rates of production of E. coli's 4000 proteins.

The internal representation by a set of transcription factors is a very compact description of the myriad factors in the environment. It seems that evolution selected internal representations that symbolize states that are most important for cell survival and growth. Many different situations are summarized by a particular transcription factor activity that signifies "I am starving." Many other situations are summarized by a different transcription factor activity that signifies "My DNA is damaged." These transcription factors regulate their target genes to mobilize the appropriate protein responses in each case.

FIGURE 2.1 The mapping between environmental signals, transcription factors inside the cell, and the genes that they regulate. The environmental signals activate specific transcription factor proteins. The transcription factors, when active, bind DNA to change the transcription rate of specific target genes, the rate at which mRNA is produced. The mRNA is then translated into protein. Hence, transcription factors regulate the rate at which the proteins encoded by the genes are produced. These proteins affect the environment (internal and external). Some proteins are themselves transcription factors that can activate or repress other genes.

2.3 ELEMENTS OF TRANSCRIPTION NETWORKS

The interaction between transcription factors and genes is described by transcription networks. Let us begin by briefly describing the elements of the network: genes and transcription factors. Each gene is a stretch of DNA whose sequence encodes the information


The rate at which the gene is transcribed, the number of mRNA produced per unit time, is controlled by the promoter, a regulatory region of DNA that precedes the gene (Figure 2.2a). RNA polymerase (RNAp) binds a defined site (a specific DNA sequence) at the promoter (Figure 2.2a). The quality of this site specifies the transcription rate of the gene.1

[FIGURE 2.2: schematic of the transcription and translation of gene Y. Recoverable labels: DNA, RNA polymerase, Gene Y, transcription, mRNA, translation, activator X, X binding site.]

Transcription factor proteins are themselves encoded by genes, which are regulated by other transcription factors, which in turn may be regulated by yet other transcription factors, and so on. This set of interactions forms a transcription network (Figure 2.3). The transcription network describes all of the regulatory transcription interactions in a cell (or at least those that are known). In the network, the nodes are genes and edges represent transcriptional regulation of one gene by the protein product of another gene. A directed edge X → Y means that the product of gene X is a transcription factor protein that binds the promoter of gene Y to control the rate at which gene Y is transcribed.

1 The sequence of the site determines the chemical affinity of RNAp to the site.
2 When RNAp binds the promoter, it can transit into an open conformation. Once RNAp is in an open conformation, it initiates transcription: RNAp races down the DNA and transcribes one mRNA at a rate of tens of DNA letters (base pairs) per second (Table 2.1). Transcription factors affect the probability

FIGURE 2.4 (a) Input functions for activator X described by Hill functions with Hill coefficient n = 1, 2, and 4. Promoter activity is plotted as a function of the concentration of X in its active form (X*). Also shown is a step function, also called a logic input function. The maximal promoter activity is β, and K is the threshold for activation of a target gene (the concentration of X* needed for 50% maximal activation). (b) Input functions for repressor X described by Hill functions with Hill coefficient n = 1, 2, and 4, plotted against repressor concentration X*/K. Also shown is the corresponding logic input function (step function). The maximal unrepressed promoter activity is β, and K is the threshold for repression of a target gene (the concentration of X* needed for 50% maximal repression).
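The curves described in the caption can be sketched numerically. The snippet below is a minimal illustration that assumes the standard Hill forms, β·X*^n/(K^n + X*^n) for an activator and β/(1 + (X*/K)^n) for a repressor; the function names and the parameter values are hypothetical choices made only for this example.

```python
def hill_activator(x_active, beta, K, n):
    """Assumed standard Hill form for an activator.

    Rises from 0 toward beta and equals beta/2 when x_active == K.
    """
    return beta * x_active**n / (K**n + x_active**n)


def hill_repressor(x_active, beta, K, n):
    """Assumed standard Hill form for a repressor.

    Falls from beta toward 0 and equals beta/2 when x_active == K.
    """
    return beta / (1.0 + (x_active / K) ** n)


if __name__ == "__main__":
    beta, K = 1.0, 1.0  # illustrative values, not taken from the text
    for n in (1, 2, 4):
        activ = [round(hill_activator(x, beta, K, n), 3) for x in (0.5, 1.0, 2.0)]
        repr_ = [round(hill_repressor(x, beta, K, n), 3) for x in (0.5, 1.0, 2.0)]
        print(f"n={n}  activator={activ}  repressor={repr_}")
```

In this sketch both functions pass through β/2 at X* = K for every n, which matches the caption's definition of K as the concentration needed for 50% maximal activation or repression; larger n only makes the transition around K sharper.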

2.3.4 Logic Input Functions: A Simple Framework for Understanding Network Dynamics

Hill input functions are useful for detailed models. For mathematical clarity, however, it is often useful to use even simpler functions that capture the essential behavior of these input functions. The essence of input functions is transition between low and high values, with a characteristic threshold K. In the coming chapters, we will often approximate input functions in transcription networks using the logic approximation (Figure 2.4) (Glass and Kauffman, 1973; Thieffry and Thomas, 1998). In this approximation, the gene is either OFF, f(X*) = 0, or maximally ON, f(X*) = β. The threshold for activation is K. Hence, logic input functions are step-like approximations for the smoother Hill functions. For activators, the logic input function can be described using a step-function θ that makes a step when X* exceeds the threshold K:

$$f(X^*) = \beta\,\theta(X^* > K) \qquad \text{logic approximation for activator} \tag{2.3.4}$$

where θ is equal to 0 or 1 according to the logic statement in the parentheses. The logic approximation is equivalent to a very steep Hill function with Hill coefficient n → ∞ (Figure 2.4a).




Similarly, for repressors, a decreasing step function is appropriate:

$$f(X^*) = \beta\,\theta(X^* < K) \qquad \text{logic approximation for repressor} \tag{2.3.5}$$

We will see in the next chapters that by using a logic input function, dynamic equations become easy to solve graphically.
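As a rough illustration of the logic approximation, the sketch below implements the two step functions of Equations 2.3.4 and 2.3.5 and compares the activator step with a steep Hill function. The Hill form used for the comparison is the standard one and is an assumption here, as are the function names and the choice n = 50.

```python
def logic_activator(x_active, beta, K):
    """Step input function: beta if X* exceeds the threshold K, else 0 (Eq. 2.3.4)."""
    return beta if x_active > K else 0.0


def logic_repressor(x_active, beta, K):
    """Decreasing step: beta while X* is below the threshold K, else 0 (Eq. 2.3.5)."""
    return beta if x_active < K else 0.0


def hill_activator(x_active, beta, K, n):
    """Assumed standard Hill form, used only to compare against the step."""
    return beta * x_active**n / (K**n + x_active**n)


if __name__ == "__main__":
    beta, K, steep_n = 1.0, 1.0, 50  # illustrative values
    for x in (0.8, 1.25):
        step = logic_activator(x, beta, K)
        hill = hill_activator(x, beta, K, steep_n)
        print(f"X*={x}: step={step}, Hill(n={steep_n})={hill:.5f}")
```

Away from the threshold the steep Hill function and the step agree closely; they differ mainly in a narrow window around X* = K, which shrinks as n grows.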

2.3.5 Multi-Dimensional Input Functions Govern Genes with Several Inputs

We just saw how Hill functions and logic functions can describe input from a single transcription factor. Many genes, however, are regulated by multiple transcription factors. In other words, many nodes in the network have two or more incoming edges. Their promoter activity is thus a multi-dimensional input function of the different input transcription factors (Yuh et al., 1998; Pilpel et al., 2001; Buchler et al., 2003; Setty et al., 2003). Appendix B describes how input functions can be modeled by equilibrium binding of multiple transcription factors to the promoter.

Often, multi-dimensional input functions can be usefully approximated by logic functions, just as in the case of single-input functions. For example, consider genes regulated by two activators. Many genes require binding of both activator proteins to the promoter in order to show significant expression. This is similar to an AND gate:

$$f(X^*, Y^*) = \beta\,\theta(X^* > K_x)\,\theta(Y^* > K_y) \sim X^* \text{ AND } Y^* \tag{2.3.6}$$

For other genes, binding of either activator is sufficient. This resembles an OR gate:

$$f(X^*, Y^*) = \beta\,\theta(X^* > K_x \text{ OR } Y^* > K_y) \sim X^* \text{ OR } Y^* \tag{2.3.7}$$

Not all genes have Boolean-like input functions. For example, some genes display a SUM input function, in which the inputs are additive (Kalir and Alon, 2001):

$$f(X^*, Y^*) = \beta_x X^* + \beta_y Y^* \tag{2.3.8}$$

Other functions are also possible. For example, a function with several plateaus and thresholds was found in the lac system of E. coli (Figure 2.5) (See color insert following page 112). Genes in multi-cellular organisms often display input functions that can calculate elaborate functions of a dozen or more inputs (Yuh et al., 1998; Davidson et al., 2002; Beer and Tavazoie, 2004).

The functional form of input functions can be readily changed by means of mutations in the promoter of the regulated gene. For example, the lac input function of Figure 2.5 can be changed to resemble pure AND or OR gates with a few mutations in the lac promoter (Mayo et al., 2006). It appears that the precise form of the input function of each gene is under selection pressure during evolution.
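The two-input functions discussed in this section (AND, OR, and SUM) can be written down directly. The following is a minimal sketch; the helper `theta`, the function names, and all numerical values are hypothetical and serve only to illustrate the three functional forms.

```python
def theta(condition):
    """Step function: 1.0 if the logic statement is true, otherwise 0.0."""
    return 1.0 if condition else 0.0


def and_input(x, y, beta, Kx, Ky):
    """AND-like input: both activators must exceed their thresholds."""
    return beta * theta(x > Kx) * theta(y > Ky)


def or_input(x, y, beta, Kx, Ky):
    """OR-like input: either activator above threshold is sufficient."""
    return beta * theta(x > Kx or y > Ky)


def sum_input(x, y, beta_x, beta_y):
    """SUM input: the two contributions are additive."""
    return beta_x * x + beta_y * y


if __name__ == "__main__":
    beta, Kx, Ky = 1.0, 0.5, 0.5  # illustrative values
    for x, y in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
        print(
            f"X*={x}, Y*={y}: AND={and_input(x, y, beta, Kx, Ky)}, "
            f"OR={or_input(x, y, beta, Kx, Ky)}, "
            f"SUM={sum_input(x, y, 0.5, 0.5)}"
        )
```

The AND and OR gates differ only when exactly one input is above threshold, which is the regime where measured input functions such as the lac function can be distinguished from either pure gate.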

FIGURE 2.5 (See color insert following page 112) Two-dimensional input functions. (a) Input function measured in the lac promoter of E. coli, as a function of two input signals, the inducers cAMP and IPTG. (b) An AND-like input function, which shows high promoter activity only if both inputs are present. (c) An OR-like input function that shows high promoter activity if either input is present. (From Setty et al., 2003.)

2.3.6 Interim Summary

Transcription networks describe the transcription regulation of genes. Each node represents a gene.1 Edges denoted X → Y mean that gene X encodes for a transcription factor protein that binds the promoter of gene Y and modulates its rate of transcription. Thus, the protein encoded by gene X changes the rate of production of the protein encoded by gene Y. Protein Y, in turn, might be a transcription factor that changes the rate of production of Z, and so on, forming an interaction network. Most nodes in the network stand for genes that encode proteins that are not transcription factors. These proteins carry out the various functions of the cell. The inputs to the network are signals that carry information from the environment and change the activity of specific transcription factors.

The active transcription factors bind specific DNA sites in the promoters of their target genes to control the rate of transcription. This is quantitatively described by input functions: the rate of production of gene product Y is a function of the concentration of active transcription factor X*. Genes regulated by multiple transcription factors have multi-dimensional input functions. The input functions are often rather sharp and can be approximated by Hill functions or logic gates.

DYNAMICS AND RESPONSE TIME OF SIMPLE GENE REGULATION

Let us focus on the dynamics of a single edge in the network. Consider a gene that is regulated by a single regulator, with no additional inputs (or with all other inputs and post-transcriptional modes of regulation held constant over time). This transcription interaction is described in the network by

X → Y

which reads "transcription factor X regulates gene Y." Once X becomes activated by a signal, Y concentration begins to change. Let us calculate the dynamics of the concentration of the gene product, the protein Y, and its response time. In the absence of its input signal, X is inactive and Y is not produced (Figure 2.2b). When the signal Sx appears, X rapidly transits to its active form X* and binds the promoter of gene Y. Gene Y begins to be transcribed, and the mRNA is translated, resulting in accumulation

1 In bacteria, each node represents an operon;

4.2 THE NUMBER OF APPEARANCES OF A SUBGRAPH IN RANDOM NETWORKS

In the previous chapter we discussed the simplest network motif, self-regulation, a pattern that had one node. Let us now consider larger patterns of nodes and edges. Such patterns are also called subgraphs. Two examples of three-node subgraphs are shown in Figure 4.1a: the three-node feedback loop and the three-node feed-forward loop. In total there are 13 possible ways to connect three nodes with directed edges, shown in Figure 4.1b. There are 199 possible directed four-node subgraphs (Figure 5.5), 9364 five-node subgraphs, etc. To find which of these subgraphs are significant, we need to compare the subgraphs in the real network to those in randomized networks.

The rest of this section is for readers interested in mathematical analysis of random networks. Other readers can safely skip to Section 4.3.

We begin by calculating the number of times that a given subgraph G appears in a random Erdos-Renyi (ER) network (ER networks were defined in Section 3.2). The subgraph G that we are interested in has n nodes and g edges. The feed-forward loop, for example, has n = 3 nodes and g = 3 edges (Figure 4.1a). Other three-node patterns have between two and six edges (Figure 4.1b). Recall that in the ER random network model, E edges are placed randomly between N nodes (Section 3.2). Since there are N^2 possible places to put a directed edge (Equation 3.2.1), the probability of an edge in a given direction between a given pair of nodes is:

$$p = E/N^2 \tag{4.2.1}$$

It is important to note that most biological networks are sparse, which is to say that only a tiny fraction of the possible edges actually occur. Sparse networks are defined by p « 1. For example, in the Escherichia coli network we use as an example, there are about 400 nodes and 500 edges, so that p ~ 0.002. One reason that biological networks are sparse is that each interaction in the network is selected by evolution against mutations that would rapidly abolish the interaction. Thus, only useful interactions are maintained.

We want to calculate the mean number of times that subgraph G occurs in the random network. To generate an instance of subgraph G in the random network, we need to choose n nodes and place g edges in the proper places. Thus, the average number of occurrences of subgraph G in the network, denoted ⟨N_G⟩, is approximately equal to the number of ways of choosing a set of n nodes out of N: about N^n for large networks (because there are N ways of choosing the first node, times N - 1 ≈ N ways of choosing the second node, etc.), multiplied by the probability to get the g edges in the appropriate places (each with probability p):

$$\langle N_G \rangle \approx N^n p^g$$
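The estimate described above, N^n ways of choosing the nodes times p^g for the edges, is easy to evaluate. The sketch below is a back-of-the-envelope calculation that ignores combinatorial prefactors (such as symmetry factors of the subgraph); the function names are hypothetical, and the network size is the approximate E. coli figure quoted in the text.

```python
def edge_probability(num_nodes, num_edges):
    """Probability of a directed edge between a given ordered pair: p = E / N^2."""
    return num_edges / num_nodes**2


def expected_subgraph_count(num_nodes, num_edges, n, g):
    """Approximate mean count of an n-node, g-edge subgraph in an ER network.

    Uses N^n * p^g and deliberately ignores combinatorial prefactors.
    """
    p = edge_probability(num_nodes, num_edges)
    return num_nodes**n * p**g


if __name__ == "__main__":
    N, E = 400, 500  # approximate E. coli network size quoted in the text
    print("p =", edge_probability(N, E))
    # Feed-forward loop: n = 3 nodes, g = 3 edges
    print("expected FFL count ~", expected_subgraph_count(N, E, n=3, g=3))
```

With these rounded numbers the edge probability comes out at about 0.003, the same order of magnitude as the value quoted in the text, and the expected number of feed-forward loops is of order one. For a subgraph with n = g the estimate reduces to (E/N)^g, so it depends only on the mean number of edges per node.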

FIGURE 4.1 (a) The feed-forward loop and the feedback loop, two examples of three-node subgraphs. (b) The 13 connected three-node directed subgraphs.

The dependence of ⟨N_G⟩ on network size N is described by a scaling relation. This scaling relation describes the way that the number of subgraphs in Equation 4.2.4 depends on the size of the network (ignoring prefactors): (4.2.5) The scaling relation tells us that the scaling of subgraph numbers in ER networks depends

Network | Feed-forward loop | Three-node feedback loop
E. coli | 42 | 0
ER random nets | 1.7 ± 1.3 (Z = 31) | 0.6 ± 0.8
Degree-preserving random nets | 7 ± 5 (Z = 7) | 0.2 ± 0.6

The parameter Z is the number of standard deviations by which the real network exceeds the randomized networks. An algorithm called Mfinder, which generates randomized networks, counts subgraphs, and
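The Z values quoted with the randomized-network counts follow from the definition given in the note: the difference between the real count and the random mean, in units of the random standard deviation. A minimal sketch, with a hypothetical function name, reproduces them from the feed-forward loop numbers:

```python
def z_score(real_count, random_mean, random_std):
    """Number of standard deviations by which the real count exceeds the random mean."""
    return (real_count - random_mean) / random_std


if __name__ == "__main__":
    ffl_in_e_coli = 42
    # ER random networks: 1.7 +/- 1.3 feed-forward loops
    print("Z (ER random nets):", round(z_score(ffl_in_e_coli, 1.7, 1.3)))
    # Degree-preserving random networks: 7 +/- 5 feed-forward loops
    print("Z (degree-preserving nets):", round(z_score(ffl_in_e_coli, 7, 5)))
```

The degree-preserving ensemble gives a smaller Z because it already contains more feed-forward loops on average and has a larger spread, so it is the more stringent comparison.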

FIGURE 4.3 The eight sign combinations (types) of feed-forward loops. Arrows denote activation and ⊣ symbols denote repression.

Each of the three edges in the FFL can correspond to activation (plus sign) or repression (minus sign). There are therefore 2^3 = 8 possible types of FFLs (Figure 4.3).

FIGURE 4.2 Feed-forward loops in the E. coli transcription network. Black nodes participate in FFLs.
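The count of 2^3 = 8 sign combinations can be checked by enumeration. The sketch below lists every assignment of a plus or minus sign to the three edges of the feed-forward loop; the edge labels and the function name are hypothetical conventions chosen for this example.

```python
from itertools import product

# The three edges of the feed-forward loop: X regulates Y, X regulates Z, Y regulates Z.
FFL_EDGES = ("X->Y", "X->Z", "Y->Z")


def ffl_sign_combinations():
    """Return every assignment of '+' (activation) or '-' (repression) to the three edges."""
    return [dict(zip(FFL_EDGES, signs)) for signs in product("+-", repeat=len(FFL_EDGES))]


if __name__ == "__main__":
    combos = ffl_sign_combinations()
    print(len(combos), "sign combinations")
    for combo in combos:
        print(combo)
```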

The massive overabundance of feed-forward loops raises the question: Why are they selected against randomizing forces? Do they perform a function that confers an advantage to the organism? To address this question, let us now analyze the structure and function of the feed-forward loop network motif.

4.4 THE STRUCTURE OF THE FEED-FORWARD LOOP GENE CIRCUIT

The feed-forward loop is composed of transcription factor X that regulates a second transcription factor, Y, and both X and Y regulate gene Z (Figure 4.1a). Thus, the feed-forward loop has two parallel regulation paths, a direct path from X to Z and an indirect path that goes through Y. The direct path consists of a single edge, and the indirect path is a cascade of two edges.

outgoing edges than the average node; these are global regulators that regulate many genes in response to key environmental stimuli. To include this property in the random network model, one can compare the real network to random networks that not only preserve the total number of nodes N and edges E, but also preserve the number of incoming and outgoing edges for each node in the network. Despite the fact that the degree sequence is the same, the identity of which transcription factor regulates which gene is randomized. These

(Figure 4.5). In some systems the signals are molecules that directly bind the transcription factors, and in other systems the signals are modifications of the transcription factor caused by signal transduction pathways activated by the external stimuli. The effect of the signals, which carry information from the external world, usually operates on a much faster timescale than the transcriptional interactions in the FFL. When Sx appears, transcription factor X rapidly becomes active, X*, binds to specific DNA sites in the promoters of genes Y and Z in a matter of seconds, and changes the transcription rate so that the concentration of the protein Z changes on the timescale of minutes to hours. We will next

$$\text{production of } Z = \beta_z\,\theta(X^* > K_{xz})\,\theta(Y^* > K_{yz}) \tag{4.5.3}$$

Thus, the C1-FFL gene circuit has three activation thresholds (numbers on the arrows in Figure 4.6). In the case of strong step-like stimulation, X* rapidly crosses the two thresholds K_xy and K_xz. The delay in the production of Z is due to the time it takes Y* to accumulate and cross its threshold K_yz. Only after Y* crosses the threshold can Z production proceed at rate β_z. The dynamics of Z are governed by a degradation/dilution term and a production term with an AND input function:

$$\frac{dZ}{dt} = \beta_z\,\theta(X^* > K_{xz})\,\theta(Y^* > K_{yz}) - \alpha_z Z \tag{4.5.4}$$
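The delay produced by the AND logic can be seen in a small numerical experiment. The sketch below integrates the Z equation above with a simple forward-Euler scheme after a step in which X* is switched on. It assumes that Y obeys an analogous equation, dY/dt = β_y θ(X* > K_xy) - α_y Y, and that all of Y is active (Y* = Y); the function name, the integration scheme, and every parameter value are illustrative choices rather than values from the text.

```python
def simulate_c1_ffl(t_end=10.0, dt=0.001, x_active=1.0,
                    beta_y=1.0, beta_z=1.0, alpha_y=1.0, alpha_z=1.0,
                    K_xy=0.5, K_xz=0.5, K_yz=0.5):
    """Forward-Euler integration of a C1-FFL with AND logic after an ON step of X*.

    Returns the final Y, the final Z, and the time at which Z production first
    switched on (None if it never did).
    """
    y = 0.0
    z = 0.0
    z_onset_time = None
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        # Assumed Y dynamics: produced while X* is above K_xy, removed at rate alpha_y.
        production_y = beta_y if x_active > K_xy else 0.0
        # AND input function for Z: needs X* above K_xz and Y* above K_yz.
        production_z = beta_z if (x_active > K_xz and y > K_yz) else 0.0
        if z_onset_time is None and production_z > 0.0:
            z_onset_time = step * dt
        y += dt * (production_y - alpha_y * y)
        z += dt * (production_z - alpha_z * z)
    return y, z, z_onset_time


if __name__ == "__main__":
    y_final, z_final, onset = simulate_c1_ffl()
    print(f"Z production starts at t = {onset:.3f}")
    print(f"final Y = {y_final:.4f}, final Z = {z_final:.4f}")
```

In this sketch Z stays at zero until Y has accumulated past K_yz, even though X* is active from the first step; the onset time therefore depends only on the Y parameters (α_y, β_y, K_yz).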

We now have the equations needed to analyze the dynamics of the C1-FFL. We next analyze its

can be found using Equation 4.6.2:

$$Y^*(T_{ON}) = Y_{st}\,(1 - e^{-\alpha_y T_{ON}}) = K_{yz} \tag{4.6.4}$$
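The threshold condition above can be inverted for the delay by taking logarithms, giving T_ON = (1/α_y) ln[1/(1 - K_yz/Y_st)]. The sketch below evaluates this expression; the function name and the sample parameter values are hypothetical, and the natural logarithm is assumed.

```python
import math


def delay_time(alpha_y, K_yz, Y_st):
    """Delay T_ON at which Y*(t) = Y_st * (1 - exp(-alpha_y * t)) reaches K_yz.

    A finite delay exists only if the threshold lies below the steady-state level.
    """
    if not 0.0 < K_yz < Y_st:
        raise ValueError("a finite delay requires 0 < K_yz < Y_st")
    return math.log(1.0 / (1.0 - K_yz / Y_st)) / alpha_y


if __name__ == "__main__":
    alpha_y, Y_st = 1.0, 1.0  # illustrative values
    for K_yz in (0.1, 0.5, 0.9):
        print(f"K_yz/Y_st = {K_yz}: T_ON = {delay_time(alpha_y, K_yz, Y_st):.3f}")
```

The delay grows without bound as K_yz approaches Y_st, so in this model a long delay requires a threshold close to the steady-state level of Y, and a threshold at or above Y_st would never be crossed.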


This equation can be solved for T_ON, yielding:

$$T_{ON} = \frac{1}{\alpha_y}\,\log\!\left[\frac{1}{1 - K_{yz}/Y_{st}}\right]$$

This equation describes how the duration of the delay depends on the biochemical parameters of the protein Y (Figure 4.8a). These parameters are the lifetime of the protein,

Sign-Sensitive Delay in the Arabinose System of E. coli

Our discussion of the function of the FFL has dealt with this gene circuit in isolation. In reality, this network motif is always embedded within a network of additional interactions. It is therefore crucial to perform experiments on the FFL within living cells, to see whether it actually performs the expected dynamical functions. Experiments have demonstrated that sign-sensitive delays are carried out by the C1-FFL in living cells. For example, dynamic behavior of an FFL was experimentally studied in a well-characterized gene system in E. coli, the system that allows the cells to grow on the sugar arabinose. The arabinose system consists of proteins that transport the sugar arabinose into the cell and break it down for use as an energy and carbon source. Arabinose is only used by the cells when the sugar glucose is not present, because glucose is a superior energy source and is used in preference to most other sugars. Thus, the arabinose system needs to make a decision based on two inputs: the sugars arabinose and glucose. The proteins in this system are only made when the following condition is met by the sugars in the environment of the cell: arabinose AND NOT glucose. The absence of glucose is symbolized within the cell by the production of a small molecule called cAMP. To make its decision, the arabinose system has two transcription activators, one called CRP that senses cAMP, and the other called araC that senses arabinose.

[Figure: time-course plots; panel labels include lacZYA; horizontal axes T (min) and Time (min).]

FIGURE 4.10 (a) The C1-FFL with OR logic in the flagella system of E. coli. The output genes, such as fliLMNOPQR, make up the flagella motor. The input signals $S_x$ are environmental factors such as glucose limitation, osmotic pressure, and temperature that affect the promoter of the activator FlhDC. The input signal $S_y$ to the second activator, FliA, is a checkpoint that is triggered when the first motors are completed (a protein inhibitor of FliA called FlgM is exported through the motors out of the cells). (b) Experiments on the promoter activity of the output genes, measured by means of a green-fluorescent protein expressed as a reporter from the fliL promoter, after an ON step of $S_x$. (c) Promoter dynamics after an OFF step of $S_x$, in the presence of $S_y$. The results are shown for the wild-type bacterium, and for a bacterium in which the gene for FliA was deleted from the genome. The FFL generates a delay after an OFF step of $S_x$. (From Kalir et al., 2005.)

4.7

We have seen that of the 13 possible three-node patterns, only one is a significant network motif in sensory transcription networks that need to respond to external stimuli. This network motif is the feed-forward loop. The FFL has eight possible types, each corresponding to a specific combination of positive and negative regulations. Two of the FFL types are far more common than others in transcription networks. The most common form, called coherent type-1 FFL, is a sign-sensitive delay element that can protect against unwanted responses to fluctuating inputs. The magnitude of the delay in the FFL can be tuned over evolutionary timescales by varying the biochemical parameters of regulator protein Y, such as its lifetime, maximal level, and activation threshold.
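To make the dependence of the delay on the parameters of Y concrete, the delay expression derived earlier in the chapter, $T_{ON} = (1/\alpha_y)\log[1/(1 - K_{yz}/Y_{st})]$, can be evaluated for a few parameter choices. The sketch below is illustrative only: the function name `delay_on` and the numerical values are assumptions, not measured quantities.

```python
import math


def delay_on(alpha_y, k_yz, y_st):
    """Delay T_ON = (1/alpha_y) * log(1 / (1 - K_yz / Y_st)).

    alpha_y: degradation/dilution rate of Y (inverse of its lifetime scale)
    k_yz:    activation threshold of Y on the Z promoter
    y_st:    maximal (steady-state) level of Y
    The formula only applies when 0 <= K_yz < Y_st; otherwise Y* never
    crosses the threshold and no finite delay exists.
    """
    if alpha_y <= 0:
        raise ValueError("alpha_y must be positive")
    if not 0 <= k_yz < y_st:
        raise ValueError("need 0 <= k_yz < y_st for a finite delay")
    return math.log(1.0 / (1.0 - k_yz / y_st)) / alpha_y


if __name__ == "__main__":
    # Hypothetical values: the delay is reported in units of 1/alpha_y.
    for ratio in (0.1, 0.5, 0.9):
        print("K_yz/Y_st =", ratio, "-> T_ON =", delay_on(1.0, ratio, 1.0))
    # A longer-lived Y (smaller alpha_y) lengthens the delay proportionally.
    print("alpha_y halved:", delay_on(0.5, 0.5, 1.0))
```

The delay grows as the threshold approaches the maximal level of Y and scales inversely with $\alpha_y$, so each of the three parameters named above offers a separate way to tune it.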

... positive autoregulation loop acts to enhance the production of the transcription factor once it is present in sufficient levels. This further stabilizes the ON steady states of the transcription factors. The bi-stable nature of these motifs allows cells to make irreversible decisions and assume different fates in which specific sets of genes are expressed and others are silent (Demongeot et al., 2000).

An example of a double-negative feedback loop appears in phage lambda, a virus that infects E. coli ... The phage is composed of a protein container that houses a short DNA genome, which the phage can inject into the bacterium ... (called cro and cI), to control the ...

... cystine > isoleucine > methionine.

A comparison of the mode of control of inducible systems that degrade nutrients is shown in Table 11.1. For example, the sugar galactose is seldom present at high concentrations in the environment of E. coli, which corresponds to low demand for the galactose genes that degrade and utilize this sugar. According to the demand ...

... their functions even in the presence of additional interactions. This property is due to the particular ways that the motifs are wired together. In many systems, network motifs appear to be connected to each other in ways that do not spoil the independent functionality of each motif, allowing us to understand the network, at least partially, based on the functions of individual motifs. Simple examples include the way that three-node FFLs are connected to each other to form a multi-output FFL. This pattern preserves the functionality of each three-node FFL (sign-sensitive filtering, etc.). In addition, the multi-output FFL can generate rather elaborate programs of expression timing between output genes, as we saw in Chapter 5. As a result of the way the motifs are embedded into the network, they can, at least in many cases, be considered as elementary circuit elements.

[Figure: panel (a) Fixed goal evolution; panel (b); goal function (X XOR Y) AND (Z XOR W).]

In addition to the simple ways in which motifs are wired together, motifs can act as elementary circuit elements due to the separation of timescales of different interactions. The strong separation of timescales between different biological processes is a general principle that is found in virtually all of the networks in the cell. It allows us to understand the dynamics on the slow timescale by using steady-state approximations for the interactions on fast timescales. For example, transcriptional motifs that carry out their computations on a slow timescale of minutes to hours can be understood, at least schematically, as if they acted in isolation, despite the fact that they are embedded in additional feedback loops on the level of protein-protein interactions that function on the timescale of seconds. In short, biological networks can be understood, to a first approximation, in terms of a rather limited set of recurring circuit patterns, each carrying out computations on a different timescale.
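The steady-state approximation for fast interactions can be illustrated with a toy two-variable model. This is a sketch under stated assumptions, not a model taken from the text: a transcription factor is activated with fast first-order kinetics (rates of about one per second, written here per minute), and its active fraction drives a slow output Z that decays on a timescale of about an hour. All names and rate values are hypothetical.

```python
import math


def simulate_full(signal=1.0, k_on=60.0, k_off=60.0, beta=1.0, alpha=0.02,
                  t_end=200.0, dt=1e-3):
    """Forward-Euler integration of the full fast + slow system (time in minutes).

    a: active fraction of the transcription factor (fast variable)
    z: output protein level (slow variable)
    """
    a, z = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        da = k_on * signal * (1.0 - a) - k_off * a  # fast activation/deactivation
        dz = beta * a - alpha * z                   # slow production and decay
        a += dt * da
        z += dt * dz
    return a, z


def quasi_steady_state(signal=1.0, k_on=60.0, k_off=60.0, beta=1.0, alpha=0.02,
                       t_end=200.0):
    """Replace the fast variable by its steady state and solve the slow equation exactly."""
    a_qss = k_on * signal / (k_on * signal + k_off)
    z = (beta * a_qss / alpha) * (1.0 - math.exp(-alpha * t_end))
    return a_qss, z


if __name__ == "__main__":
    print("full simulation:     ", simulate_full())
    print("steady-state approx.:", quasi_steady_state())
```

Because the fast variable settles within a few seconds, the slow output computed from the approximation is nearly indistinguishable from the full simulation in this toy example, which is the sense in which the slow dynamics can be analyzed as if the fast interactions were always at steady state.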

In addition to the reuse of network motifs, biological networks have an additional important structural feature: modularity (Hartwell et al., 1999; Ihmels et al., 2002; Segal et al., 2003; Wolf and Arkin, 2003; Schlosser and Wagner, 2004). Most biological functions are carried out by specific groups of genes and proteins, so that ...
