BioSystems 103 (2011) 18–26

Contents lists available at ScienceDirect

BioSystems

journal homepage: www.elsevier.com/locate/biosystems

Structure and dynamics of the ‘protein folding code’ inferred using Tlusty’s topological rate distortion approach

Rodrick Wallace

Division of Epidemiology, The New York State Psychiatric Institute, Box 47, 1051 Riverside Dr., New York, NY 10032, United States

Article info

Article history:

Received 9 July 2010

Received in revised form 31 August 2010

Accepted 11 September 2010

Keywords:

Amyloid

Catalysis

Groupoid

Information theory

Prion

Rate distortion

Symmetry

Abstract

Tlusty’s topological rate distortion analysis of the genetic code is applied to protein symmetries and protein folding rates. Unlike the genetic case, numerous thermodynamically accessible ‘protein folding codes’ can be identified from empirical classifications. Folding rates follow from a topologically driven rate distortion argument, a model that can, in principle, be extended to intrinsically disordered proteins. The elaborate cellular regulatory machinery of the endoplasmic reticulum and heat shock proteins is needed to prevent transition between the various thermodynamically ‘natural’ sets of hydrophobic-core protein conformations, and its corrosion by aging would account for the subsequent onset of many protein folding disorders. These results imply markedly different evolutionary trajectories for the genetic and protein folding codes, and suggest that the ‘protein folding code’ is really a complicated composite, distributed across protein production and a cellular, or higher, regulatory apparatus acting as a canalizing catalyst that drives the system to converge on particular transitive components within a significantly larger ‘protein folding groupoid’.

© 2010 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

It is obvious, from the great spectrum of protein folding disorders, that the amyloid fibril and other, less well-characterized geometric forms, must compete thermodynamically and kinetically with three-dimensional globular, and unfolded monomeric states. This suggests, we will show, the existence of numerous underlying ‘protein folding codes’ whose ultimate structures are, perhaps, a somewhat debatable matter of formal taxonomy. Fig. 1, from Hartl and Hayer-Hartl (2009), schematically expands the spectrum of folded final conformations according to an in vivo ‘folding funnel’ model dispersed across a measure of intra- vs. inter-molecular contact for hydrophobic-core proteins forming tertiary structure. Intra-molecular conformations involve three-dimensional assemblages of α-helices and β-sheets, while the most densely packed inter-molecular form is, perhaps, the semicrystalline amyloid fibril.

At a later stage we will examine intrinsically disordered proteins (IDP) that lack the hydrophobic core, do not have inherent tertiary structures, and reach a conformation only in partnership with another chemical species (e.g., Uversky et al., 2008; Serdyuk, 2007).

The basic spectrum of Fig. 1 for proteins having a hydrophobic core, in general, explains the necessity of the elaborate regulatory structures associated with the endoplasmic reticulum and its attendant spectrum of chaperone proteins (e.g., Scheuner and Kaufman, 2008), and the evolutionary pattern of protein sequences inferred by Goldschmidt et al. (2010). The inevitable corrosion of the cellular regulatory apparatus with age would then explain the subsequent onset of amyloid fibril and other aggregation disorders.

E-mail address: wallace@pi.cpmc.columbia.edu.

Most particularly, the spectrum of valleys in Fig. 1 characterizes a set of equivalence classes that defines a ‘protein folding groupoid’, in the sense of Weinstein (1996). As we will argue below, both the native state and amyloid fibril have structured subdivisions, internal equivalence classes, that define a nested set of groupoids. See the Mathematical Appendix for a summary of standard material on groupoids.

With regard to the disjunction between ‘native’ and ‘amyloid’ protein forms, very early on, Astbury (1935) conjectured that globular proteins could also have a linear state, based on pioneering X-ray studies. Chiti et al. (1999) argue that

...[P]rovided appropriate conditions are maintained over prolonged periods of time, the formation of ordered amyloid protofilaments and fibrils could be an intrinsic property of many polypeptide chains, rather than being a phenomenon limited to a very few aberrant sequences.

Wang et al. (2008), in an elegant series of experiments on bacterial inclusion bodies, conclude that

...[A]myloid aggregation appears to be a common property of protein segments and consequently is observed in both eukaryotes and prokaryotes... [Thus] there must be evolved strategies against amyloid formation, which include both quality control mechanisms through molecular chaperones as well as sequence-based [evolutionary] prevention of amyloid aggregation...

...[E]ach protein may exist, not only in an unfolded or folded state, but, by containing at least one amino acid segment that is capable of participating in a sequence-specific, ordered, cross-β-sheet aggregated state, may also exist in an amyloid-like aggregate. The process of protein aggregation can thus be viewed as a primitive folding mechanism, resulting in a defined, aggregated conformation with each aggregated protein having its own distinctive properties.

Fig. 1. From Hartl and Hayer-Hartl (2009). Energy landscape spectrum of protein folding and aggregation, parsed according to the degree of intra- vs. inter-molecular contact. Each energy valley defines an equivalence class, and the set of such classes defines the ‘protein folding groupoid’, in the sense of Weinstein (1996). Four basic classifications can be seen: native state, amorphous aggregates, semi-structured oligomers, and quasi-crystalline amyloid fibrils. Within the native state and the amyloid fibrils, systematic subclasses can be identified, leading to a fine structure for protein coding.

0303-2647/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.biosystems.2010.09.007

Krebs et al. (2009), however, in a paper tellingly titled ‘Protein aggregation: more than just fibrils’, find that the amyloid fibril is not the only structure that aggregating proteins of widely different types may adopt. For example, the occurrence of spherulites, which have been found in vivo as well as in vitro, appears to be generic, although the factors that determine the equilibrium between free fibril and spherulite are not as yet clear. That is, we have not fully explained the spectrum implied in Fig. 1. Nevertheless, here we will use Tlusty’s (2007a,b, 2008a,b,c, 2010a,b) arguments on the evolution of the genetic code to explore something of that spectrum.

The first papers in this series (Wallace, 2010a,b) applied Tlusty’s rate distortion analysis of the genetic code to protein folding dynamics, and made a pilot application to the simplest ‘protein folding code’. Here we greatly expand the topological methods from that work, focusing on normal three-dimensional globular proteins and the eightfold symmetry of the steric zipper associated with amyloid fibrils (Sawaya et al., 2007), but extending that work significantly, to empirical studies of protein folding rates and to intrinsically disordered proteins.

As Kamtekar et al. (1993) point out, experimental studies of natural proteins show how their structures are remarkably tolerant to amino acid substitution, but that tolerance is limited by a need to maintain the hydrophobicity of interior side chains. Thus, while the information needed to encode a particular protein fold is highly degenerate, this degeneracy is constrained by a requirement to control the locations of polar and nonpolar residues. This is the precise protein folding analog to Tlusty’s error network analysis of the genetic code, and his graph coloring arguments should thus apply, in some measure, to protein folding as well, allowing inference on the underlying structure of the ‘protein folding codes’ to be associated with the horizontal axis of Fig. 1.

Tycko (2006), likewise, argues that the amyloid fibril is a generically stable structural state of a polypeptide chain, competing thermodynamically and kinetically with globular monomeric states and unfolded monomeric states. Peptides and proteins that are known to form amyloid fibrils have widely diverse amino acid sequences and molecular weights. He particularly finds that

The near sequence independence of amyloid formation represents a challenge to our understanding of the physical chemistry of peptides and proteins.

Such sequence independence is, again, very precisely the degeneracy associated with Tlusty’s error network approach.

Intermediate forms in Fig. 1 remain to be studied from this perspective.

Some of these matters have, of course, already been the subject of considerable attention. A series of elegant experiments by the Hecht group (e.g., Hecht et al., 2004), extending the Kamtekar et al. (1993) work, has focused on a basic understanding of protein folding through substitution of different polar and nonpolar amino acids in the construction of normal and fibril proteins. α-Helices are found to be natural outcomes of amino acid sequences having a 3.6 residue/turn pattern, i.e., a digital signal of the form 101100100110, where 1 indicates a polar, and 0 a nonpolar amino acid. The resulting three-dimensional structures are formed by the propensity of the different residues to interact with an aqueous environment.

β-sheets, on the other hand, emerge from a simpler period 2 code, e.g., 1010101, matching the structural repeat of the sheets. More recent work (Kim and Hecht, 2006) finds that generic hydrophobic residues of this form are sufficient to promote aggregation of the Alzheimer’s Aβ42 peptide. However, while the positioning of hydrophobic residues is more important than the exact identities of the hydrophobic side chains for determining overall geometry, reaction kinetics, that is, the rate of fibril formation, was profoundly affected by those identities. This suggests that the ‘protein folding code’ may be, in no small part, contextual, that is, determined as much by in vivo cellular regulatory machinery as by in vitro hydrophobic/hydrophilic physical interactions. This, we will suggest below, likely involves the operation of something like the catalytic mechanisms that Wallace and Wallace (2009) and Wallace (2010a) describe.
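The two binary patterns quoted above can be told apart by their dominant periodicity. The following is a minimal sketch, not the Hecht group's method: it assumes a mean-centred Fourier sum evaluated at a single period, and the function name `periodicity_power` is hypothetical.

```python
import cmath
import math


def periodicity_power(pattern: str, period: float) -> float:
    """Squared magnitude of the mean-centred Fourier sum of a 0/1 pattern.

    A large value means the polar (1) / nonpolar (0) positions repeat
    with roughly the given period, measured in residues.
    """
    bits = [int(ch) for ch in pattern]
    mean = sum(bits) / len(bits)
    omega = 2.0 * math.pi / period  # angular step per residue
    total = sum((b - mean) * cmath.exp(1j * omega * k) for k, b in enumerate(bits))
    return abs(total) ** 2


helix_like = "101100100110"  # pattern quoted in the text for alpha-helices
sheet_like = "1010101"       # pattern quoted in the text for beta-sheets

for name, pattern in (("helix-like", helix_like), ("sheet-like", sheet_like)):
    p_helix = periodicity_power(pattern, 3.6)
    p_sheet = periodicity_power(pattern, 2.0)
    print(f"{name}: power at period 3.6 = {p_helix:.2f}, at period 2 = {p_sheet:.2f}")
```

On these two strings the helix-like pattern carries more power at period 3.6 and the sheet-like pattern more at period 2, which is all the sketch is meant to show; it says nothing about folding kinetics.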

Before beginning the formal explorations, some comment is necessary regarding a ‘biological’ explanation of the relation between the number of network holes and tertiary protein symmetries, according to Tlusty’s treatment. Following the argument of Tlusty (2010b), the genetic code is a mapping of one codon to one amino acid. By contrast, the ‘protein folding code’ is a mapping of genes to folded amino acid chains, and the complexity gap between the two codes is very great indeed (e.g., Mirny and Shakhnovich, 2001). The strategy that allows adaptation of Tlusty’s methods to protein folding is a coarse-graining of protein structure into a matrix of larger building blocks, e.g., α-helices and β-sheets. At this lower resolution a ‘code’ is a mapping between short DNA stretches, analogous to codons, and the convoluted motifs of proteins, playing the role of amino acids. As a consequence of the great tolerance to amino acid substitutions described above, as long as charge and polarity are conserved, it is possible to cluster all the sequences that encode the same structural motif. This greatly reduces the size of the resulting DNA sequence graph and thus limits the number of possible building blocks.


2. Large scale structure

Broadly, Fig. 1 embraces a fourfold classification:

1. The ‘native state’ determined, at low concentrations, entirely by the amino acid sequence in the classic sense of Anfinsen (1973).

2. Amorphous aggregates.

3. Semi-structured oligomers, as explored by Krebs et al. (2009).

4. Amyloid/amyloid-like one-dimensional fibrils.

Generalizing Table 1 of Tlusty (2007a,b, 2010a) according to the genus γ of the underlying graph, that is, the number of holes in the error network associated with the proposed code, we can apply Heawood’s graph genus formula for the coloring number that identifies the maximal number of first excited modes of the coding graph Laplacian,

$$\mathrm{chr}(\gamma) = \mathrm{Int}\left[\tfrac{1}{2}\left(7 + \sqrt{1 + 48\gamma}\right)\right], \tag{1}$$

where Int is the integer value of the enclosed expression and γ itself is defined from Euler’s formula (Tlusty, 2007a,b, 2010a) as

$$\gamma = 1 - \tfrac{1}{2}(V - E + F), \tag{2}$$

where V is the number of code network vertices, E the number of network edges, and F the number of enclosed faces. Eq. (1) produces the table

(# network holes)    chr(γ) (# prot. syms.)
0                    4
1                    7
2                    8
3                    9
4                    10
5                    11
6, 7                 12
8, 9                 13

In Tlusty’s scheme, the second column represents the maximal possible number of product classes that can be reliably produced by error-prone codes having γ holes in the underlying coding error network. As stated, in this context we have coarse-grained the relation between DNA sequence space and tertiary protein structures.
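The tabulated values follow directly from Eqs. (1) and (2). As a small sketch, with the function names `genus_from_euler` and `heawood_number` introduced here only for illustration, the table can be regenerated with exact integer arithmetic:

```python
import math


def genus_from_euler(vertices: int, edges: int, faces: int) -> int:
    """Genus (number of holes) from Eq. (2): gamma = 1 - (V - E + F) / 2."""
    euler_characteristic = vertices - edges + faces
    if euler_characteristic % 2 != 0:
        raise ValueError("V - E + F must be even for a closed orientable surface")
    return 1 - euler_characteristic // 2


def heawood_number(genus: int) -> int:
    """Coloring number from Eq. (1): Int[(7 + sqrt(1 + 48 * gamma)) / 2].

    math.isqrt keeps the computation in integers, so there is no
    floating-point rounding at perfect squares such as gamma = 1 or 6.
    """
    return (7 + math.isqrt(1 + 48 * genus)) // 2


for gamma in range(10):
    print(gamma, heawood_number(gamma))

# Two familiar embeddings as a check on Eq. (2):
# a tetrahedron on the sphere, and the complete graph K7 on the torus.
print(genus_from_euler(4, 6, 4), genus_from_euler(7, 21, 14))
```

The loop reproduces the second column of the table, with genus 6 and 7 both giving 12 and genus 8 and 9 both giving 13.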

From Tlusty’s perspective, then, our fourfold classification for Fig. 1 produces the simplest possible large-scale ‘protein folding code’, a sphere limited by the four-color problem, and the simplest cognitive cellular regulatory system would thus be constrained to pass/fail on four basic flavors, as it were, of folded proteins.

Within the funnel leading to the native state, however, chaperone processes would face far more difficult choices.

This suggests a possible twofold cellular regulatory structure, and next we consider the two most fully characterized geometric structures in more detail, the normal and amyloid forms.

3. Normal globular proteins

Normal irregular protein symmetries were first classified by Levitt and Chothia (1976), following a visual study of polypeptide chain topologies in a limited dataset of globular proteins. Four major classes emerged: all α-helices; all β-sheets; α/β; and α+β, as illustrated in Fig. 2.

While this scheme strongly dominates observed irregular protein forms, Chou and Maggiora (1998), using a much larger data set, recognize three more ‘minor’ symmetry equivalence classes: μ (multi-domain); σ (small protein); and ρ (peptide), and a possible three more ‘subminor’ groupings.

We infer that the normal globular ‘protein folding code error network’ is, essentially, a large connected ‘sphere’ – producing the four dominant structural modes of Fig. 2 – having one minor, and possibly as many as three more ‘subminor’ attachment handles, in the Morse Theory sense (Matsumoto, 2002), a matter opening up other analytic approaches.

Fig. 2. From Chou and Zhang (1995). Standard equivalence classes for inexact protein symmetries according to Levitt and Chothia (1976): (a) all-α helices; (b) all-β sheets; (c) α+β; (d) α/β. More recent work identifies a minimum of seven, and possibly as many as ten, such classes (Chou and Maggiora, 1998).

4. Amyloid fibrils

As described above, Kim and Hecht (2006) suggest that overall amyloid fibril geometry is very much driven by the underlying β-sheet coding 1010101, although the rate of fibril formation may be determined by exact chemical constitution. Work by Sawaya et al. (2007) parses some of those subtleties: they identify an eightfold ‘steric zipper’ symmetry necessarily associated with the linear amyloid fibrils that characterize a vast spectrum of protein folding disorders. Fig. 3, adapted from their work, shows those symmetries. In essence, two identical sheets can be classified by the orientation of their faces (face-to-face/face-to-back), the orientation of their strands (with both sheets having the same edge of the strand up or one up and the other down), and whether the strands within the sheets are parallel or antiparallel. Five of the eight symmetry possibilities have been observed. This suggests, from the text table above, that the ‘amyloid folding code error network’ is a double donut, that is, has two, different sized, interior holes, resembling, perhaps, a toroid with a smaller attachment handle.

5. Amyloid self-replication

Maury (2009) has recently proposed an ‘amyloid world’ model for the emergence of prebiotic informational entities, based on the extraordinary stability of amyloid structures in the face of the harsh conditions of the prebiotic world. From this perspective, the synthesis of RNA, and the evolution of the RNA-protein world, were later, but necessary events for further biomolecular evolution. Maury further argues that, in the contemporary DNA → RNA → protein world, the primordial β-conformation-based information system is preserved in the form of a cytoplasmic epigenetic memory.


Fig. 3. From Sawaya et al. (2007). The eight possible steric zipper symmetry classifications for amyloid fibrils.

Falsig et al. (2008) examine the many different strains of prions, finding that differences in kinetics of the elementary steps of prion growth underlie the differential proliferation of prion strains, based on differential frangibility of prion fibrils. They argue that an important factor is the size of the stabilizing cross-β amyloid core that appears to define the physical properties of the resulting structures, including their propensity to fragment, with small core sizes leading to enhanced frangibility. In terms of the protein folding funnel approach, they find that intrinsic frustration implies that several distinct arrangements favoring a certain subset of globally incompatible interactions are possible, reflecting the observed strain-dependent differences in the parts of the sequence incorporated into the fibril core.

In addition, they argue, there are unexplored similarities between Alzheimer’s and prion diseases, that is, the analogies between prion and Aβ aggregates could be broader than initially suspected.

Given the eightfold symmetry of the amyloid fiber, say versions A→H, then the simplest ‘frangibility code’ is the set of identical pairings: {AA, BB, ..., GG, HH}, producing eight different possible structures and their reproduction by fragmentation. More complex prion symmetries, or the possibility of combinatorial recombination, would allow a much richer structure, producing quasi-species, in the sense of Collinge and Clarke (2007). Permitting different sequence lengths or explicitly identifying different sequence orders would vastly enlarge what Collinge has characterized as a ‘cloud’ of possibilities, in the case of prion diseases. Indeed, classic studies by Bruce and Dickinson (1987) found 15 or more different prion strains in a mouse model.
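The counting behind the eightfold classification and the simplest ‘frangibility code’ can be made concrete. The sketch below is illustrative only: the assignment of labels A–H to particular zipper classes is arbitrary, and treating ‘combinatorial recombination’ as unordered mixed pairings is an assumption made here, not a claim from Sawaya et al. (2007) or Collinge and Clarke (2007).

```python
from itertools import combinations_with_replacement, product

# Three binary choices described in the text for two identical sheets.
FACE_ORIENTATION = ("face-to-face", "face-to-back")
STRAND_EDGE = ("same edge up", "one up, one down")
STRAND_DIRECTION = ("parallel", "antiparallel")

# 2 x 2 x 2 = 8 steric zipper symmetry classes.
zipper_classes = list(product(FACE_ORIENTATION, STRAND_EDGE, STRAND_DIRECTION))

# Arbitrary labels A..H for the eight versions (labelling is an assumption).
labels = "ABCDEFGH"
class_by_label = dict(zip(labels, zipper_classes))

# Simplest code: identical pairings {AA, BB, ..., HH}.
identical_pairings = [label * 2 for label in labels]

# Hypothetical enlargement: unordered pairings that allow mixed versions.
mixed_pairings = ["".join(pair) for pair in combinations_with_replacement(labels, 2)]

print(len(zipper_classes), len(identical_pairings), len(mixed_pairings))
```

Even this modest relaxation raises the count from 8 to 36 candidate structures, which gives some feel for how quickly the ‘cloud’ of possibilities could grow once sequence length and order are also allowed to vary.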

Recent work on prions appears to support something of Maury’s hypothesis. Li et al. (2010) find that infectious prions, mainly what is called PrP^Sc, a spectrum of β-sheet-rich conformers of the normal host protein PrP^C, undergo Darwinian evolution in cell culture. In that work, prions show the evolutionary hallmarks: they are subject to mutation, as evidenced by heritable changes of their phenotypes, and to selective amplification, as found by the emergence of distinct populations in different environments. Fig. 4, from Li et al. (2010), shows a prion energy landscape similar to Fig. 1. This suggests the possibility of characterizing the underlying topology of a ‘prion reproduction code’, in the sense of the sections above.

Fig. 4. From Li et al. (2010), Fig. S10. Schematic energy landscape for prion strains and substrains. The energy landscape diagram suggests that substrains are distinguishable collectives of prions that interconvert reproducibly and readily because they are separated by low activation energy barriers. The properties of a strain may vary depending on the environment in which it replicates, as the proportions of component substrains may change to favor that replicating most rapidly, indicated by the [...]. Comparison with Fig. 1, and the subsequent argument, suggests an underlying topological structure for a ‘prion reproduction code’.

One might speculate that prions and prion diseases represent fossilized remains of Maury’s prebiotic amyloid world.

6. Topology and protein folding rate

Rate distortion arguments similar to those of Tlusty (2007a,b, 2010a), in conjunction with the topological approach, enable a direct analysis of protein folding rates, expanding on the treatment of Wallace (2010a).

Consider a generalized reaction pathway that, in a series of steps, takes an amino acid string S_0 at time 0 to a final folded conformation S_f at time t in a long series of distinct, sequential, intermediate configurations S_i.

Let N(n) be the number of possible paths having n steps that lead from S_0 to S_f.

This is assumed to be a systematic process governed by a ‘grammar’ and ‘syntax’ driven by the underlying ‘protein folding funnel’, so that it is possible to divide all possible paths x^n = {S_0, S_1, ..., S_n} into two sets, a small, high probability subset that conforms to the demands of the folding funnel topology, and a much larger ‘nonsense’ subset having vanishingly small probability.

If N(n) is the number of high probability paths of length n, then the ‘ergodic’ limit

$$H = \lim_{n \to \infty} \frac{\log[N(n)]}{n}, \tag{3}$$

is assumed both to exist and be independent of the path x, a restatement of the Shannon–McMillan Theorem (Khinchin, 1957).

That is, the folding of a particular protein, from its amino acid string to its final form, is not a random event, but represents a highly – evolutionarily – structured ‘statement’ by an information source having source uncertainty H. Details of this argument can be found in Wallace (2010a).
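The limit in Eq. (3) can be illustrated on a toy ‘grammar’. The sketch below is not a folding model: it assumes a hypothetical two-state system in which state 1 may never follow itself, counts the allowed n-step paths by dynamic programming, and watches log N(n)/n settle down. The names `count_paths`, `source_uncertainty_estimate` and `GRAMMAR` are introduced here only for illustration.

```python
import math


def count_paths(adjacency, start, end, n_steps):
    """Number of allowed n-step paths from start to end.

    adjacency[i][j] is 1 when the step i -> j is 'grammatical', else 0.
    Python integers are unbounded, so large n does not overflow.
    """
    counts = [0] * len(adjacency)
    counts[start] = 1
    for _ in range(n_steps):
        nxt = [0] * len(adjacency)
        for i, ways in enumerate(counts):
            if ways:
                for j, allowed in enumerate(adjacency[i]):
                    if allowed:
                        nxt[j] += ways
        counts = nxt
    return counts[end]


def source_uncertainty_estimate(adjacency, start, end, n_steps):
    """Finite-n estimate of H = lim log[N(n)] / n (natural logarithm)."""
    return math.log(count_paths(adjacency, start, end, n_steps)) / n_steps


# Toy grammar (an assumption for illustration): state 1 cannot repeat.
GRAMMAR = [[1, 1],
           [1, 0]]

for n in (10, 50, 200):
    print(n, source_uncertainty_estimate(GRAMMAR, 0, 0, n))
```

For this grammar the path counts grow like Fibonacci numbers, so the estimate approaches the logarithm of the golden ratio, about 0.481, far below the log 2 that unconstrained paths would give. That gap is the sense in which a grammar confines the process to a small high probability subset of paths.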

An equivalence class algebra can be constructed by choosing different origin and end points S_0, S_f and defining equivalence of two states by the existence of a high probability meaningful path connecting them with the same origin and end. Disjoint partition by equivalence class, analogous to orbit equivalence classes for dynamical systems, defines the vertices of a network of developmental protein ‘languages’, a network of metanetworks. Each vertex then represents a different equivalence class of developmental information sources. This is an abstract set of metanetwork ‘languages’.

This structure generates a groupoid, in the sense of the Mathematical Appendix. States a_j, a_k in a set A are related by the groupoid morphism if and only if there exists a high probability grammatical path connecting them to the same base and end points, and tuning across the various possible ways in which that can happen – the different developmental languages – parameterizes the set of equivalence relations and creates the (very large) groupoid.

There is an implicit hierarchy. First, there is structure within the system having the same base and end points. Second, there is a complicated groupoid defined by sets of dual information sources surrounding the variation of base and end points.

Consider the simple case, the set of dual information sources associated with a fixed pair of beginning and end states.

High probability meaningful paths from S_0 to S_f are structured by the uncertainty of the associated dual information source, which, following standard arguments (e.g., Wallace, 2010a; Feynman, 2000), has a homological relation with free energy density.

Index possible information sources connecting base and end points by some set A = ∪α. The minimum channel capacity needed to produce average distortion less than D is, according to the Rate Distortion Theorem, as discussed in Wallace (2010a), the rate distortion function R(D). We take the probability of an information source, H_β, associated with a particular folding geometry, as determined by the standard expression

$$P[H_\beta] = \frac{\exp[-H_\beta/\kappa R]}{\sum_{\alpha} \exp[-H_\alpha/\kappa R]}, \tag{4}$$

where the sum may be an abstract integral. Following Wallace (2010a,b), we have identified Tlusty’s rate distortion function as a kind of temperature equivalent affecting folding dynamics, in this information dynamics formulation for which information source uncertainty is seen as simply another form of free energy, adopting the perspective of Feynman (2000).

It is necessary that the sum/integral always converge.
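For a finite index set, Eq. (4) is an ordinary Boltzmann-type weighting with κR playing the role of temperature. A minimal sketch follows, in which the function name `source_probabilities`, the scaling constant `kappa`, and the sample uncertainties are all assumptions chosen only to show the shape of the calculation:

```python
import math


def source_probabilities(uncertainties, kappa, rate):
    """P[H_beta] = exp(-H_beta / (kappa * R)) / sum_alpha exp(-H_alpha / (kappa * R)).

    The minimum uncertainty is subtracted before exponentiating; this
    cancels in the ratio and only guards against numerical underflow.
    """
    if kappa <= 0 or rate <= 0:
        raise ValueError("kappa and rate must be positive")
    scale = kappa * rate
    h_min = min(uncertainties)
    weights = [math.exp(-(h - h_min) / scale) for h in uncertainties]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


# Hypothetical source uncertainties for three folding geometries.
H = [1.0, 2.0, 4.0]

for R in (0.5, 2.0, 10.0):
    probs = source_probabilities(H, kappa=1.0, rate=R)
    print(R, [round(p, 3) for p in probs])
```

At small R nearly all the probability sits on the lowest-uncertainty source, while at large R the distribution flattens, which is the ‘temperature’ behavior the text attributes to the rate distortion function. With a finite list the convergence requirement is automatic.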

There is, then, structure within a (cross-sectional) connected component in the base configuration space, determined by R. Some dual information sources will be ‘richer’/smarter than others, but, conversely, must use more available channel capacity for their completion.

This leads to a direct analysis of protein folding speeds, adapting the results of Wallace (2010a,b).

Dill et al. (2007) describe protein folding speeds as follows:

...[P]rotein folding speeds – now known to vary over more than eight orders of magnitude – correlate with the topology of the native protein: fast folders usually have mostly local structure, such as helices and tight turns, whereas slow folders usually have more non-local structure, such as β-sheets (Plaxco et al., 1998)...

A simple groupoid probability argument reproduces something of this result. Assume that protein structure can be characterized by some groupoid representing, at least, the disjoint union of the groups describing the symmetries of component secondary structures – e.g., helices and sheets. Then, in Eq. (4), we take the set A = ∪α as fixed, with increasing α representing increased structural complexity. If channel capacity is also capped by some mechanism, so that R is also fixed, then the log of the folding rate will be given as

$$\log[P[H_\beta]] = \log\left[\frac{\exp[-H_\beta/\kappa R]}{\sum_{\alpha} \exp[-H_\alpha/\kappa R]}\right] = \frac{C(R) - H_\beta}{\kappa R}, \tag{5}$$

where C(R) is positive. β indexes increasing topological complexity, using some appropriate measure.

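The simplest assumption is that H_β ∝ β, that is, H_β = mβ for some constant m. Then, using an integral approximation,

$$P[\beta] = \frac{\exp[-m\beta/\kappa R]}{\int_{\alpha=0}^{\infty} \exp[-m\alpha/\kappa R]\, d\alpha} = \frac{m}{\kappa R} \exp\left[-\frac{m\beta}{\kappa R}\right], \tag{6}$$

and

$$\log[P[\beta]] = \log[m/\kappa R] - \frac{m\beta}{\kappa R}. \tag{7}$$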

Thus one expects, at a fixed value of R defining a maximum channel capacity, that

$$\log[\text{folding rate}] = C - k\beta, \tag{8}$$

with C and k constant and all values positive.
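The step from Eq. (6) to the linear relation of Eq. (8) can be checked numerically. The following sketch assumes H_β = mβ, truncates the integral at a large finite upper limit, and uses a plain trapezoid rule; the parameter values and function names are hypothetical.

```python
import math


def p_beta_closed_form(beta, m, kappa, rate):
    """Right-hand side of Eq. (6): (m / kappa R) * exp(-m beta / kappa R)."""
    scale = kappa * rate
    return (m / scale) * math.exp(-m * beta / scale)


def p_beta_numeric(beta, m, kappa, rate, upper=200.0, steps=20000):
    """Left-hand side of Eq. (6) with the integral done by the trapezoid rule.

    The infinite upper limit is truncated at `upper`, which is adequate
    when m * upper / (kappa * rate) is large.
    """
    scale = kappa * rate
    width = upper / steps
    interior = sum(math.exp(-m * (i * width) / scale) for i in range(1, steps))
    integral = width * (0.5 + interior + 0.5 * math.exp(-m * upper / scale))
    return math.exp(-m * beta / scale) / integral


m, kappa, rate = 1.0, 1.0, 2.0  # illustrative values only
for beta in (0.0, 1.0, 2.0, 3.0):
    exact = p_beta_closed_form(beta, m, kappa, rate)
    approx = p_beta_numeric(beta, m, kappa, rate)
    print(beta, round(math.log(exact), 4), round(math.log(approx), 4))
```

Successive values of log P[β] differ by the constant −m/(κR), so in the notation of Eq. (8) the slope is k = m/(κR) and the intercept is C = log[m/κR]. The intercept is positive only when m > κR, a restriction on the parameter range that this sketch does not enforce.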

As Ivankov et al. (2003) discuss at some length, one standard index of protein complexity is the absolute contact order (Plaxco et al., 1998):

$$\mathrm{ACO} = \frac{1}{N} \sum^{N} L_{ij}, \tag{9}$$

where N is the number of contacts within 6 Å between nonhydrogen atoms in the protein, and L_ij is the number of residues separating the interacting pair of nonhydrogen atoms.

Adjacent residues are assumed to be separated by one residue.
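Eq. (9) is a simple average once the contacts are known. The sketch below assumes the contact list has already been extracted from a structure (the 6 Å distance test between nonhydrogen atoms is not implemented here), with each contact given as a pair of residue indices; the example contact lists are invented for illustration.

```python
def absolute_contact_order(contacts):
    """ACO = (1 / N) * sum of sequence separations L_ij over N contacts.

    `contacts` is an iterable of (i, j) residue-index pairs, one entry per
    atom-atom contact. Adjacent residues count as separated by one residue,
    so the separation is |i - j|.
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    return sum(abs(j - i) for i, j in contacts) / len(contacts)


# Hypothetical contact lists: mostly local (helix-like) vs. non-local (sheet-like).
local_contacts = [(1, 4), (2, 5), (3, 7), (10, 14)]
nonlocal_contacts = [(1, 30), (2, 29), (5, 40), (12, 55)]

print(absolute_contact_order(local_contacts))
print(absolute_contact_order(nonlocal_contacts))
```

The local list gives a small ACO and the non-local list a large one, which is the distinction between fast and slow folders drawn in the Dill et al. (2007) passage quoted earlier.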

Fig. 5, adapted from Gruebele (2005), reexpresses data from Ivankov et al. (2003), showing the correlation of the log of the folding rate with fold complexity, measured by the ACO. The upper line estimates folding speed limited only by fold complexity, following Yang and Gruebele (2004), and seems clearly to represent a maximum possible rate distortion function/channel capacity, according to Eq. (8). The molecular species along the lower curve are assumed to be ‘frustrated’ by an irregular folding funnel, and appear to follow a narrow spectrum of relations like Eq. (7), necessarily below the line defined by maximum channel capacity, and necessarily somewhat scattered, according to the variation in R.

It is possible to reproduce something like Fig. 5 by describing ‘smooth’ and ‘rough’ folding funnels in terms of a Gaussian channel, that is, one in which the signal transmission S_0 → S_f is perturbed by Gaussian noise having a squared-error distortion, so that the rate distortion function has the standard form (Wallace, 2010a,b):

$$R(D) = \frac{1}{2} \log\left[\frac{\sigma^2}{D}\right]. \tag{10}$$

Fig. 5. From Gruebele (2005). Correlation of the log of the protein folding rate with fold complexity. The upper line indicates folding speeds that are limited only by fold complexity, without the ‘frustration’ effects of a rough folding funnel. Frustration, in this model, constrains channel capacity, and hence drives R irregularly lower than the value implied by the relation for the fastest folders.

R(D) is the rate distortion function at average distortion D, and σ² represents the amplitude of the imposed random noise. A smooth folding funnel would have little noise.

Plugging Eq. (10) into Eq. (7) gives, over an appropriate range of parameters, the spectrum of linear relations for log folding rate shown in Fig. 6. D, m and κ are fixed, and β and σ² increase, as indicated.
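The substitution of Eq. (10) into Eq. (7) can be sketched as follows. Natural logarithms are assumed throughout, and the values of D, m and κ are hypothetical choices made only so that the numbers are easy to follow; no attempt is made to fit the data of Fig. 5.

```python
import math


def gaussian_rate_distortion(sigma_sq, distortion):
    """Eq. (10): R(D) = (1/2) * log(sigma^2 / D), valid for 0 < D < sigma^2."""
    if not 0.0 < distortion < sigma_sq:
        raise ValueError("require 0 < D < sigma^2 so that R(D) is positive")
    return 0.5 * math.log(sigma_sq / distortion)


def log_folding_rate(beta, sigma_sq, distortion, m, kappa):
    """Eq. (7) with R replaced by the Gaussian-channel R(D) of Eq. (10)."""
    scale = kappa * gaussian_rate_distortion(sigma_sq, distortion)
    return math.log(m / scale) - m * beta / scale


D, m, kappa = 1.0, 1.0, 1.0  # held fixed, as in the text; values are illustrative
for sigma_sq in (math.e ** 2, math.e ** 4, math.e ** 6):
    line = [round(log_folding_rate(b, sigma_sq, D, m, kappa), 3) for b in (0, 1, 2, 3)]
    print(round(gaussian_rate_distortion(sigma_sq, D), 3), line)
```

Each choice of σ² gives one straight line in β. With these parameter values the intercept log[m/κR] falls as σ² grows, so lines for larger noise start lower, matching the ordering described in the caption of Fig. 6.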

7. Extending the model

As Serdyuk (2007) discusses, many proteins in the cell have no unique tertiary structure in isolation, although they have a distinct function under physiological conditions, that is, in partnership. Thus their conformation is determined not only by their amino acid sequence, but also by the interacting partner. Essentially, they are without a hydrophobic core.

Fig. 6. Spectrum of linear relations between log folding rate and increasing topological complexity for increasing ‘roughness’ of the folding funnel, as measured by noise σ² for a Gaussian channel. β increases to the right and σ² increases downward.

Serdyuk (2007), in fact, proposes four basic protein forms:

In addition to the axiomatic native state – a rigid tertiary structure – it is proposed to consider three more states: molten globule, completely disordered chain, and the structure comprising domains connected with long enough linkers and containing, as a rule, rather long disordered regions at the ends.

These four states cover all conformations that are now known based on the physical understanding of protein structure.

Thus the spectrum of Fig. 1 becomes an expansion, via subgroupoids, of a single class within this larger taxonomy, having the magic number four, implying a larger, embedding, ‘spherical’ error code topology, in Tlusty’s sense.

It is possible, using the previous section, to say something about IDP rates of binding to their targets that, according to Huang and Liu (2009), are greater than for ordered proteins, via a highly flexible ‘fly-casting’ mechanism, in contrast with ordered proteins that must dock to their targets. IDPs have, then, greater effective capture radius and can weakly bind to targets from a larger distance, and then ‘reel themselves in’ to the final configuration. Huang and Liu conclude that both fractions of the native interchain contacts and the distance between mass centers, quantities widely used in protein binding problems, only partially describe the features of the binding process, so that better coordinates will be required.

Taking the perspective of Section 6 above, we assume that IDPs are transferred hand-to-hand, to avoid cellular clean-up processes. Thus, for a given IDP, symbolized by I, there is an initial partner, i, and a final partner f, defining larger-scale structures. Then we are interested in the number of possible paths, N(n), having n steps leading from an initial partnership S_i*I to a final partnership S_f*I. This instantiates a larger version of the metanetwork of ‘languages’ discussed above.

Collapsing the argument via Eqs. (7) and (10), the ‘fly-casting’ mechanism might better be described as a snake slithering down a rocky hillside. That is, an IDP ‘falling’ down a noisy folding funnel undergoes a self-lubricating catalysis that decreases σ² in something like Fig. 6, increasing the rate of reaction above what would be expected from a rigid molecule having fixed tertiary structures. Indeed, the child’s toy slinky-spring walking down a staircase comes to mind, and is probably not a bad model.

8.Discussion

Tlusty (2007, 2010a, Table 1) constructs a relatively smooth error network-symmetry based taxonomy for the evolution of the genetic code, and suggests that future evolutionary process could well expand that code’s expression from 20 to as many as 25 amino acids. The globular/amyloid disjunction explored here, in conjunction with the spectrum of possible geometries identified in the horizontal dispersion of Fig. 1 – the nested protein folding groupoid – suggests a markedly different evolutionary trajectory for the ‘protein folding code’, such as it is. If amyloid proteins were indeed the primitive form, as Wang et al. (2008) conjecture, then a remarkable evolutionary change occurred, reducing the topological complexity of the code from a double toroid to a sphere with a few small attachment handles. This suggests a very highly punctuated equilibrium transition, in the sense of Eldredge and Gould (1972), a shift from one to three-dimensional protein structures in which the ‘protein folding code’ became topologically simplified while the overall protein topology became more complex. This is a strikingly different evolutionary pathway from that inferred by Tlusty for the genetic code. More likely, amyloid fibrils in particular, and perhaps most of the other geometric forms across Fig. 1, have largely been evolutionary dead ends since prebiotic times, and cellular or other mechanisms against their production have been with us a very long time.

Maury’s ‘amyloid world’ hypothesis is one possible explanation, i.e., that amyloid frangibility was, itself, the first reproductive code, characterized by the quasi-species effects consequent on segment length variation and combinatorial effects, and ultimately leading to the later RNA/DNA worlds in which prion diseases remain as fossilized remnants of that earlier time.

Cellular regulatory machinery assisting protein folding – chaperone processes associated with the endoplasmic reticulum – must inevitably be cognitive in the sense Atlan and Cohen (1998) attribute to the immune system. It should thus be possible to develop a ‘cognitive paradigm’ for active protein folding regulation, in the sense of Wallace and Wallace (2008, 2009, in press), and begin to incorporate the catalytic effects of epigenetic factors, including patterns of culture and psychosocial stress, into the etiology of protein folding disorders. Such catalytic effects emerge directly from the necessary association of a basic information source with a broad class of cognitive processes, and from the role of an information source as a kind of free energy. Thus patterned external signals from an information source can serve to catalyze cognitive regulatory phenomena via a form of canalization. In the case of human protein folding disorders, this effect seems to emerge as a kind of premature aging of the cellular regulatory apparatus by a life course trajectory of psychosocial, cultural, and other environmental stressors.

Thus Serdyuk’s fourfold classification, the set of equivalence classes across Fig. 1, and the nested sets within the valleys of the native state, amyloid fibrils, and possible other structures such as the spherulites of Krebs et al. (2009), imply a complex nested groupoid structure for protein folding geometry. This shows there can be many ‘protein folding codes’, in Tlusty’s sense. That is, each of Serdyuk’s classifications, and each of the valleys in Fig. 1, can probably be identified with its own code, as could, perhaps, the valleys in Fig. 4.

To reiterate, the codes associated with normal folded or intrinsically disordered proteins must be enforced by elaborate cellular – and perhaps higher order – regulatory apparatus that is strongly cognitive and has its own internal coding, via the dual information source associated with that cognitive process (Wallace and Wallace, 2008, 2009, in press; Wallace, 2010a). Thus the in vivo ‘protein folding code’ is, following the models of Serdyuk, Fig. 1, and possibly Fig. 4, an exceedingly complicated composite between protein production and regulation – possibly analogous to the phenomena of distributed cognition described by Wallace and Fullilove (2008) – in which a regulatory information source acts as a form of catalyst in the sense of Section 5.3 of Wallace (2010a) to canalize final protein production.

These results are, in all, at variance with the relatively straightforward internal structures of the genetic code: following, in some measure, the arguments of Thomas (1992), a ‘code’ implies that local sequences specify local structures simply and uniquely. This is clearly not the case for an in vivo protein folding that can be understood only in the context of an elaborate, and apparently delicate, cellular, and perhaps even higher level, regulatory system.

Finally, adaptation of Tlusty’s rate distortion topological approach to protein folding dynamics appears possible, leading to the development of Eqs. (4)–(10), which seem to account, in some measure, for the elegant results of Fig. 5, and may perhaps be extended to intrinsically disordered proteins.

Acknowledgments

The author thanks D. Eisenberg, M. Gruebele, and M. Hecht for important references, D.N. Wallace for useful discussions, and a reviewer for comments helpful in revision.


Appendix A. Mathematical Appendix

A.1. Basic ideas about groupoids

Following Weinstein (1996) closely, a groupoid, G, is defined by a base set A upon which some mapping – a morphism – can be defined. Note that not all possible pairs of states $(a_j, a_k)$ in the base set A can be connected by such a morphism. Those that can define the groupoid element, a morphism $g = (a_j, a_k)$ having the natural inverse $g^{-1} = (a_k, a_j)$. Given such a pairing, it is possible to define ‘natural’ end-point maps $\alpha(g) = a_j$, $\beta(g) = a_k$ from the set of morphisms G into A, and a formally associative product in the groupoid $g_1 g_2$ provided $\alpha(g_1 g_2) = \alpha(g_1)$, $\beta(g_1 g_2) = \beta(g_2)$, and $\beta(g_1) = \alpha(g_2)$. Then the product is defined, and associative, $(g_1 g_2) g_3 = g_1 (g_2 g_3)$.

In addition, there are natural left and right identity elements $\lambda_g$, $\rho_g$ such that $\lambda_g g = g = g \rho_g$ (Weinstein, 1996).
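The partial product, inverse, and identities just described can be sketched in a few lines. This is a minimal illustration that represents each morphism as an ordered pair of base-set elements, as in the text; the function names are assumptions made for the example, and which pairs actually belong to a given groupoid is left to the caller.

```python
from typing import Optional, Tuple

Morphism = Tuple[str, str]  # g = (a_j, a_k)


def alpha(g: Morphism) -> str:
    """Source end-point map: alpha(g) = a_j."""
    return g[0]


def beta(g: Morphism) -> str:
    """Target end-point map: beta(g) = a_k."""
    return g[1]


def inverse(g: Morphism) -> Morphism:
    """Natural inverse: (a_j, a_k) -> (a_k, a_j)."""
    return (g[1], g[0])


def product(g1: Morphism, g2: Morphism) -> Optional[Morphism]:
    """Partial product: defined only when beta(g1) == alpha(g2)."""
    if beta(g1) != alpha(g2):
        return None
    return (alpha(g1), beta(g2))


def left_identity(g: Morphism) -> Morphism:
    """Identity at the source of g."""
    return (alpha(g), alpha(g))


def right_identity(g: Morphism) -> Morphism:
    """Identity at the target of g."""
    return (beta(g), beta(g))


g = ("a", "b")
h = ("b", "c")
gh = product(g, h)
```

Returning `None` for an undefined product is one simple way to model partiality; raising an exception would serve equally well.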

An orbit of the groupoid G over A is an equivalence class for the relation $a_j \sim_G a_k$ if and only if there is a groupoid element g with $\alpha(g) = a_j$ and $\beta(g) = a_k$. Following Cannas Da Silva and Weinstein (1999), we note that a groupoid is called transitive if it has just one orbit. The transitive groupoids are the building blocks of groupoids in that there is a natural decomposition of the base space of a general groupoid into orbits. Over each orbit there is a transitive groupoid, and the disjoint union of these transitive groupoids is the original groupoid. Conversely, the disjoint union of groupoids is itself a groupoid.
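The decomposition of the base set into orbits can be illustrated as follows. This is a sketch under the assumption that the groupoid is supplied as a list of generating morphisms, each an ordered pair of base-set elements; because a groupoid is closed under inverses and products, the orbits are then the connected components of the base set. The base set and morphisms shown are hypothetical.

```python
def orbits(base_set, morphisms):
    """Partition base_set into orbits of the groupoid generated by morphisms."""
    parent = {a: a for a in base_set}

    def find(a):
        # Follow parent links to the class representative, halving paths.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a_j, a_k in morphisms:
        parent[find(a_j)] = find(a_k)

    classes = {}
    for a in base_set:
        classes.setdefault(find(a), set()).add(a)
    return sorted(sorted(c) for c in classes.values())


def is_transitive(base_set, morphisms):
    """A groupoid is transitive when it has exactly one orbit."""
    return len(orbits(base_set, morphisms)) == 1


base = ["a", "b", "c", "d", "e"]
generators = [("a", "b"), ("b", "c"), ("d", "e")]
found = orbits(base, generators)
```

Here the groupoid splits into two transitive components, one over each orbit.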

The isotropy group of $a \in A$ consists of those g in G with $\alpha(g) = a = \beta(g)$. These groups prove fundamental to classifying groupoids.

If G is any groupoid over A, the map $(\alpha, \beta): G \to A \times A$ is a morphism from G to the pair groupoid of A. The image of $(\alpha, \beta)$ is the orbit equivalence relation $\sim_G$, and the functional kernel is the union of the isotropy groups. If $f: X \to Y$ is a function, then the kernel of f, $\ker(f) = \{(x_1, x_2) \in X \times X : f(x_1) = f(x_2)\}$, defines an equivalence relation.
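The kernel of a function, viewed as a set of pairs, is easy to compute on a finite domain. The following sketch uses a parity map on four integers purely as an assumed example.

```python
def kernel(f, domain):
    """Return ker(f) = {(x1, x2) : f(x1) == f(x2)} over a finite domain."""
    items = list(domain)
    return {(x1, x2) for x1 in items for x2 in items if f(x1) == f(x2)}


# Example: parity on {0, 1, 2, 3} identifies 0 with 2 and 1 with 3.
parity_kernel = kernel(lambda x: x % 2, range(4))
```

The resulting relation is reflexive, symmetric, and transitive, so it is an equivalence relation and hence a groupoid in the sense of Example 2 of Brown (1987), quoted later in this appendix.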

Groupoids may have additional structure. As Weinstein (1996) explains, a groupoid G is a topological groupoid over a base space X if G and X are topological spaces and $\alpha$, $\beta$ and multiplication are continuous maps. A criticism sometimes applied to groupoid theory is that their classification up to isomorphism is nothing other than the classification of equivalence relations via the orbit equivalence relation and groups via the isotropy groups. The imposition of a compatible topological structure produces a nontrivial interaction between the two structures. Below we will introduce a metric structure on manifolds of related information sources, producing such interaction.

In essence, a groupoid is a category in which all morphisms have an inverse, here defined in terms of connection to a base point by a meaningful path of an information source dual to a cognitive process.

As Weinstein (1996) points out, the morphism $(\alpha, \beta)$ suggests another way of looking at groupoids. A groupoid over A identifies not only which elements of A are equivalent to one another (isomorphic), but it also parametrizes the different ways (isomorphisms) in which two elements can be equivalent, i.e., all possible information sources dual to some cognitive process. Given the information theoretic characterization of cognition presented above, this produces a full modular cognitive network in a highly natural manner.

Brown (1987) describes the fundamental structure as follows:

A groupoid should be thought of as a group with many objects, or with many identities... A groupoid with one object is essentially just a group. So the notion of groupoid is an extension of that of groups. It gives an additional convenience, flexibility and range of applications...

Example 1. A disjoint union [of groups] $G = \cup_\lambda G_\lambda$, $\lambda \in \Lambda$, is a groupoid: the product ab is defined if and only if a, b belong to the same $G_\lambda$, and ab is then just the product in the group $G_\lambda$. There is an identity $1_\lambda$ for each $\lambda \in \Lambda$. The maps $\alpha$, $\beta$ coincide and map $G_\lambda$ to $\lambda$, $\lambda \in \Lambda$.

Example 2. An equivalence relation R on [a set] X becomes a groupoid with $\alpha, \beta: R \to X$ the two projections, and product $(x, y)(y, z) = (x, z)$ whenever $(x, y), (y, z) \in R$. There is an identity, namely $(x, x)$, for each $x \in X$...
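Brown’s first example can be sketched with two small cyclic groups. The choice of Z2 and Z3, the labels `p` and `q`, and the function names are assumptions made for illustration only; each element is stored as a pair (label, residue).

```python
# Hypothetical index set: label -> order of the cyclic group Z_n with that label.
ORDERS = {"p": 2, "q": 3}


def end_point(a):
    """alpha and beta coincide here: both send an element to its group label."""
    return a[0]


def product(a, b):
    """Product is defined only when a and b lie in the same component group."""
    if end_point(a) != end_point(b):
        return None
    label = end_point(a)
    return (label, (a[1] + b[1]) % ORDERS[label])


def identity(label):
    """One identity element per label."""
    return (label, 0)


same_group = product(("q", 2), ("q", 2))
mixed = product(("p", 1), ("q", 1))
```

Every orbit of this groupoid is a single label, and the isotropy group at each label is the whole component group, which is the opposite extreme from the equivalence relation of Example 2.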

Weinstein (1996) makes the following fundamental point:

Almost every interesting equivalence relation on a space B arises in a natural way as the orbit equivalence relation of some groupoid G over B. Instead of dealing directly with the orbit space B/G as an object in the category $S_{map}$ of sets and mappings, one should consider instead the groupoid G itself as an object in the category $G_{htp}$ of groupoids and homotopy classes of morphisms.

The groupoid approach has become quite popular in the study of networks of coupled dynamical systems which can be defined by differential equation models (e.g., Golubitsky and Stewart, 2006).

A.2. Global and local symmetry groupoids

Here we follow Weinstein (1996) fairly closely, using his example of a finite tiling.

Consider a tiling of the euclidean plane $\mathbf{R}^2$ by identical 2 by 1 rectangles, specified by the set X (one-dimensional), where the grout between tiles is $X = H \cup V$, having $H = \mathbf{R} \times \mathbf{Z}$ and $V = 2\mathbf{Z} \times \mathbf{R}$, where R is the set of real numbers and Z the integers. Call each connected component of $\mathbf{R}^2 \setminus X$, that is, the complement of X in the two-dimensional real plane, a tile.

Let $\Gamma$ be the group of those rigid motions of $\mathbf{R}^2$ which leave X invariant, i.e., the normal subgroup of translations by elements of the lattice $\Lambda = H \cap V = 2\mathbf{Z} \times \mathbf{Z}$ (corresponding to corner points of the tiles), together with reflections through each of the points of $\frac{1}{2}\Lambda = \mathbf{Z} \times \frac{1}{2}\mathbf{Z}$, and across the horizontal and vertical lines through those points. As noted by Weinstein (1996), much is lost in this coarse-graining; in particular, the same symmetry group would arise if we replaced X entirely by the lattice $\Lambda$ of corner points. $\Gamma$ retains no information about the local structure of the tiled plane. In the case of a real tiling, restricted to the finite set $B = [0, 2m] \times [0, n]$, the symmetry group shrinks drastically: the subgroup leaving $X \cap B$ invariant contains just four elements even though a repetitive pattern is clearly visible. A two-stage groupoid approach recovers the lost structure.

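We define the transformation groupoid of the action of $\Gamma$ on $\mathbf{R}^2$ to be the set

$$G(\Gamma, \mathbf{R}^2) = \{(x, \gamma, y) \mid x \in \mathbf{R}^2,\ y \in \mathbf{R}^2,\ \gamma \in \Gamma,\ x = \gamma y\},$$

with the partially defined binary operation

$$(x, \gamma, y)(y, \nu, z) = (x, \gamma\nu, z).$$

Here $\alpha(x, \gamma, y) = x$, and $\beta(x, \gamma, y) = y$, and the inverses are natural.

We can form the restriction of G to B (or any other subset of $\mathbf{R}^2$) by defining

$$G(\Gamma, \mathbf{R}^2)|_B = \{g \in G(\Gamma, \mathbf{R}^2) \mid \alpha(g), \beta(g) \in B\}.$$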

(1) An orbit of the groupoid G over B is an equivalence class for the relation $x \sim_G y$ if and only if there is a groupoid element g with $\alpha(g) = x$ and $\beta(g) = y$.


Two points are in the same orbit if they are similarly placed within their tiles or within the grout pattern.
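The phrase ‘similarly placed’ can be made computational. The sketch below assumes the group described in the text: translations by the lattice 2Z × Z together with reflections through the points of Z × ½Z and across the horizontal and vertical lines through them. Under that assumption the x coordinate is determined up to x → x + 2 and x → −x, and the y coordinate up to y → y + 1 and y → −y, so each coordinate can be folded to a canonical representative. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding problems.

```python
from fractions import Fraction


def fold(t, period):
    """Canonical representative of t under t -> t + period and t -> -t."""
    r = t % period
    return min(r, period - r)


def orbit_key(point):
    """Label of the orbit of a point of the plane under the tiling symmetries."""
    x, y = point
    return (fold(Fraction(x), 2), fold(Fraction(y), 1))


def same_orbit(p, q):
    """True when p and q are similarly placed within their tiles or the grout."""
    return orbit_key(p) == orbit_key(q)


p = (Fraction(1, 2), Fraction(1, 4))
q = (Fraction(3, 2), Fraction(3, 4))  # image of p under reflection through (1, 1/2)
r = (Fraction(1, 4), Fraction(1, 4))
```

For points restricted to B the same test applies, since the restricted groupoid keeps exactly those elements whose two end points both lie in B.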

(2) The isotropy group of $x \in B$ consists of those g in G with $\alpha(g) = x = \beta(g)$. It is trivial for every point except those in $\frac{1}{2}\Lambda \cap B$, for which it is $\mathbf{Z}_2 \times \mathbf{Z}_2$, the direct product of integers modulo two with itself.

By contrast, embedding the tiled structure within a larger context permits definition of a much richer structure, i.e., the identification of local symmetries.

We construct a second groupoid as follows. Consider the plane $\mathbf{R}^2$ as being decomposed as the disjoint union of $P_1 = B \cap X$ (the grout), $P_2 = B \setminus P_1$ (the complement of $P_1$ in B, which is the tiles), and $P_3 = \mathbf{R}^2 \setminus B$ (the exterior of the tiled room). Let E be the group of all euclidean motions of the plane, and define the local symmetry groupoid $G_{loc}$ as the set of triples $(x, \gamma, y)$ in $B \times E \times B$ for which $x = \gamma y$, and for which y has a neighborhood u in $\mathbf{R}^2$ such that $\gamma(u \cap P_i) \subseteq P_i$ for i = 1, 2, 3. The composition is given by the same formula as for $G(\Gamma, \mathbf{R}^2)$.

For this groupoid-in-context there are only a finite number of orbits:

O1=interior points of the tiles.

O2=interior edges of the tiles.

O3=interior crossing points of the grout.

O4=exterior boundary edge points of the tile grout.

O5=boundary‘T’points.

O6=boundary corner points.

The isotropy group structure is, however, now very rich indeed: The isotropy group of a point in O1 is now isomorphic to the entire rotation group O(2).

It is Z2 × Z2 for O2.

For O3 it is the eight-element dihedral group D4.

For O4, O5 and O6 it is simply Z2.
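The six orbits can be told apart by a short case analysis on where a point sits relative to the grout and to the boundary of B. The sketch below assumes B = [0, 2m] × [0, n] with grout lines at integer y and at even integer x, as defined earlier in this appendix. The isotropy labels in the table are copied from the statements above rather than computed, and the function name is hypothetical.

```python
from fractions import Fraction

# Isotropy groups as stated in the text; "O(2)" denotes the full rotation group.
ISOTROPY = {
    "O1": "O(2)",
    "O2": "Z2 x Z2",
    "O3": "D4",
    "O4": "Z2",
    "O5": "Z2",
    "O6": "Z2",
}


def classify(x, y, m, n):
    """Return the orbit label O1..O6 of the point (x, y) of B = [0, 2m] x [0, n]."""
    x, y = Fraction(x), Fraction(y)
    if not (0 <= x <= 2 * m and 0 <= y <= n):
        raise ValueError("point lies outside B")
    on_h = y.denominator == 1                           # horizontal grout, R x Z
    on_v = x.denominator == 1 and x.numerator % 2 == 0  # vertical grout, 2Z x R
    if not (on_h or on_v):
        return "O1"                                     # interior of a tile
    edge_x = x in (0, 2 * m)
    edge_y = y in (0, n)
    if edge_x and edge_y:
        return "O6"                                     # corner of the room
    if edge_x or edge_y:
        # A boundary point where an interior grout line ends is a 'T' point.
        return "O5" if (on_h and on_v) else "O4"
    return "O3" if (on_h and on_v) else "O2"


label = classify(2, 1, m=2, n=2)
group = ISOTROPY[label]
```

With m = 2 and n = 2 the point (2, 1) is an interior crossing of the grout, so it falls in O3.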

These are the ‘local symmetries’ of the tile-in-context.

References

Anfinsen, C., 1973. Principles that govern the folding of protein chains. Science 181, 223–230.

Astbury, W., 1935. The X-ray interpretation of the denaturation and the structure of the seed globulins. Biochemistry 29, 2351–2360.

Atlan, H., Cohen, I., 1998. Immune information, self-organization, and meaning. International Immunology 10, 711–717.

Brown, R., 1987. From groups to groupoids: a brief survey. Bulletin of the London Mathematical Society 19, 113–134.

Bruce, M., Dickinson, A., 1987. Biological evidence that scrapie agent has an independent genome. Journal of General Virology 68, 79–89.

Cannas Da Silva, A., Weinstein, A., 1999. Geometric Models for Noncommutative Algebras. American Mathematical Society, Providence, RI.

Chiti, F., Webster, P., Taddei, N., Clark, A., Stefani, M., Ramponi, G., Dobson, C., 1999. Designing conditions for in vitro formation of amyloid protofilaments and fibrils. Proceedings of the National Academy of Sciences of America 96, 3590–3594.

Chou, K., Zhang, C., 1995. Prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology 30, 275–349.

Chou, K., Maggiora, G., 1998. Domain structural class prediction. Protein Engineering 11, 523–538.

Collinge, J., Clarke, A., 2007. A general model of prion strains and their pathogenicity. Science 318, 930–936.

Dill, K., Banu Ozkan, S., Weikl, T., Chodera, J., Voelz, V., 2007. The protein folding problem: when will it be solved? Current Opinion in Structural Biology 17, 342–346.

Eldredge, N., Gould, S., 1972. Punctuated equilibrium: an alternative to phyletic gradualism. In: Schopf, T. (Ed.), Models in Paleobiology, pp. 82–115.

Falsig, J., Nilsson, K., Knowles, T., Aguzzi, A., 2008. Chemical and biophysical insights into the propagation of prion strains. HFSP Journal 2, 332–341.

Feynman, R., 2000. Lectures on Computation. Westview, New York.

Goldschmidt, L., Teng, P., Riek, R., Eisenberg, D., 2010. Identifying the amylome, proteins capable of forming amyloid-like fibrils. Proceedings of the National Academy of Sciences of America 107, 3487–3492.

Golubitsky, M., Stewart, I., 2006. Nonlinear dynamics and networks: the groupoid formalism. Bulletin of the American Mathematical Society 43, 305–364.

Gruebele, M., 2005. Downhill protein folding: evolution meets physics. Comptes Rendus Biologies 328, 701–712.

Hartl, F., Hayer-Hartl, M., 2009. Converging concepts of protein folding in vitro and in vivo. Nature Structural and Molecular Biology 16, 574–581.

Hecht, M., Das, A., Go, A., Bradley, L., Wei, Y., 2004. De novo proteins from designed combinatorial libraries. Protein Science 13, 1711–2173.

Huang, Y., Liu, Z., 2009. Kinetic advantage of intrinsically disordered proteins in coupled folding-binding process: a critical assessment of the ‘fly-casting’ mechanism. Journal of Molecular Biology 393, 1143–1159.

Ivankov, D., Garbuzynskiy, S., Alm, E., Plaxco, K., Baker, D., Finkelstein, A., 2003. Contact order revisited: influence of protein size on the folding rate. Protein Science 12, 2057–2062.

Kamtekar, S., Schiffer, J., Xiong, H., Babik, J., Hecht, M., 1993. Protein design by patterning of polar and nonpolar amino acids. Science 262, 1680–1685.

Khinchin, A., 1957. Mathematical Foundations of Information Theory. Dover, New York.

Kim, W., Hecht, M., 2006. Generic hydrophobic residues are sufficient to promote aggregation of the Alzheimer’s Aβ42 peptide. Proceedings of the National Academy of Sciences of America 103, 15824–15829.

Krebs, M., Domike, K., Donald, A., 2009. Protein aggregation: more than just fibrils. Biochemical Society Transactions 37 (part 4), 682–686.

Levitt, M., Chothia, C., 1976. Structural patterns in globular proteins. Nature 261, 552–557.

Li, J., Browning, S., Mahal, S., Oelschlegel, A., Weissman, C., 2010. Darwinian evolution of prions in cell culture. Science 327, 869–872.

Matsumoto, Y., 2002. An Introduction to Morse Theory. Translations of the American Mathematical Society 208, Providence, RI.

Maury, C., 2009. Self-propagating β-sheet polypeptide structures as prebiotic informational molecular entities: the amyloid world. Origins of Life and Evolution of Biospheres 39, 141–150.

Mirny, L., Shakhnovich, E., 2001. Protein folding theory: from lattice to all-atom models. Annual Reviews of Biophysics and Biomolecular Structure 30, 361–396.

Plaxco, K., Simons, K., Baker, D., 1998. Contact order, transition state placement and the refolding rates of single domain proteins. Journal of Molecular Biology 277, 985–994.

Scheuner, D., Kaufman, R., 2008. The unfolded protein response: a pathway that links insulin demand with β-cell failure and diabetes. Endocrinology Reviews 29, 317–333.

Sawaya, M., Sambashivan, S., Nelson, R., Ivanova, M., 2007. Atomic structures of amyloid cross-β spines reveal varied steric zippers. Nature 447, 453–457.

Serdyuk, I., 2007. Structured proteins and proteins with intrinsic disorder. Molecular Biology 41, 297–313.

Thomas, D., 1992. Concepts in protein folding. Federation of European Biochemical Societies 307, 10–13.

Tlusty, T., 2007a. A model for the emergence of the genetic code as a transition in a noisy information channel. Journal of Theoretical Biology 249, 331–342.

Tlusty, T., 2007b. A relation between the multiplicity of the second eigenvalue of a graph Laplacian, Courant’s nodal line theorem and the substantial dimension of tight polyhedral surfaces. Electronic Journal of Linear Algebra 16, 315–324.

Tlusty, T., 2008a. Rate-distortion scenario for the emergence and evolution of noisy molecular codes. Physical Review Letters 100, 048101–048104.

Tlusty, T., 2008b. A simple model for the evolution of molecular codes driven by the interplay of accuracy, diversity and cost. Physical Biology 5, 016001.

Tlusty, T., 2008c. Casting polymer nets to optimize noisy molecular codes. Proceedings of the National Academy of Sciences of America 105, 8238–8243.

Tlusty, T., 2010a. A colorful origin for the genetic code: information theory, statistical mechanics and the emergence of molecular codes. Physics of Life Reviews, doi:10.1016/j.plrev.2010.06.002.

Tlusty, T., 2010b. Reply to comment. Physics of Life Reviews 7, 381–384.

Tycko, R., 2006. Molecular structure of amyloid fibrils: insights from solid-state NMR. Quarterly Reviews of Biophysics 39, 1–55.

Uversky, V., Oldfield, C., Dunker, K., 2008. Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annual Reviews of Biophysics 37, 215–246.

Wallace, R., Fullilove, M., 2008. Collective Consciousness and its Discontents: Institutional Distributed Cognition, Racial Policy and Public Health in the United States. Springer, New York.

Wallace, R., Wallace, D., 2008. Punctuated equilibrium in statistical models of generalized coevolutionary resilience: how sudden ecosystem transitions can entrain both phenotype expression and Darwinian selection. Transactions on Computational Systems Biology IX, LNBI 5121, 23–85.

Wallace, R., Wallace, D., 2009. Code, context, and epigenetic catalysis in gene expression. Transactions on Computational Systems Biology XI, LNBI 5750, 283–334.

Wallace, R., Wallace, D., in press. Cultural epigenetics: on the heritability of complex diseases. Transactions on Computational Systems Biology.

Wallace, R., 2010a. A rate distortion approach to protein symmetry. BioSystems 101, 97–108.

Wallace, R., 2010b. A scientific open season. Physics of Life Reviews 7, 377–378.

Wang, L., Maji, S., Sawaya, M., Eisenberg, D., Riek, R., 2008. Bacterial inclusion bodies contain amyloid-like structure. PLOS Biology 6, e195.

Weinstein, A., 1996. Groupoids: unifying internal and external symmetry. Notices of the American Mathematical Society 43, 744–752.

Yang, W., Gruebele, M., 2004. Folding λ-repressor at its speed limit. Biophysical Journal 87, 596–608.
