Nature - The International Journal of Science / 15 February 2024


Table of contents:
Absence of female partners explains dawn chorus.
An early look at birth cohort genetics in China.
Calling all engineers: Nature wants to publish your research.
EU climate policy is dangerously reliant on untested carbon-capture technology.
CROWNING GLORY: RARE FOSSIL REVEALS TREE’S STRUCTURE.
WHAT THE BRAINS OF PRICKLY PEOPLE HAVE IN COMMON.
FIRST PASSAGES OF ROLLED-UP HERCULANEUM SCROLL REVEALED.
HOW SOCIAL MEDIA COULD SWAY ELECTIONS IN 2024.
JWST IS THE MOST IN-DEMAND TELESCOPE EVER.
AI MODEL LEARNT LANGUAGE BY SEEING THE WORLD LIKE A BABY.
WHY AUTOIMMUNE DISEASE IS MORE COMMON IN WOMEN.
EU UNVEILS CLIMATE TARGET: WHAT SCIENTISTS THINK.
CERN’S $17-BILLION SUPERCOLLIDER WOULD DWARF LHC.
THE TRY-IT-ALL APPROACH TO PRECISION MEDICINE.
HOW CULTURE WARS ARE AFFECTING US UNIVERSITIES.
Do billionaire philanthropists skew global health research?
New type of magnetism splits from convention.
Unravelling how plant cells divide and differ.
How the brain produces and perceives speech.
Layered ferroelectric materials make waves.
Optimally generate policy-based evidence before scaling.
Rapid spin changes around a magnetar fast radio burst.
Non-Abelian topological order and anyons on a trapped-ion processor.
Observation of plaid-like spin splitting in a noncoplanar antiferromagnet.
A 2D ferroelectric vortex pattern in twisted BaTiO3 freestanding layers.
Single-photon superradiance in individual caesium lead halide quantum dots.
Designer phospholipid capping ligands for soft metal halide nanocrystals.
Global population profile of tropical cyclone exposure from 2002 to 2019.
Critical transitions in the Amazon forest system.
The Born in Guangzhou Cohort Study enables generational genetic discoveries.
A molecular switch for neuroprotective astrocyte reactivity.
Dynamic behaviour restructuring mediates dopamine-dependent credit assignment.
Large-scale single-neuron speech sound encoding across the depth of human cortex.
Single-neuronal elements of speech production in humans.
SHR and SCR coordinate root patterning and growth early in the cell cycle.
SlyB encapsulates outer membrane proteins in stress-induced lipid nanodomains.
Naturally occurring T cell mutations enhance engineered T cell therapies.
Matrix viscoelasticity promotes liver cancer progression in the pre-cirrhotic liver.
The energetic and allosteric landscape for KRAS inhibition.
Coordination of cohesin and DNA replication observed with purified proteins.
Transcription–replication interactions reveal bacterial genome regulation.
Oxygen-evolving photosystem II structures during S1–S2–S3 transitions.
HOW TO LAND A JOB WHEN THE ECONOMY IS STRUGGLING.
Corrections.




News & views Evolution

Absence of female partners explains dawn chorus

Diego Gil

Why birds sing intensely in a dawn chorus during the early morning has long been debated. Evidence gathered from observing birds in the wild offers a fresh perspective on what might drive this phenomenon.

Although it is widely recognized that solitude can boost artistic production, few people would have guessed that this might apply to animals, too. Writing in Proceedings of the Royal Society B, Schlicht et al.1 support this idea by showing that the intense dawn chorus of male birds can be explained by the absence of female partners. The dawn chorus refers to a period in the morning, well known to early risers, when birds seem to engage in a frenzy of singing. Some time before sunrise, particularly in spring, many species sing loudly simultaneously, and the result is an intense chorus that wanes when the Sun rises. A similar, but less intense, peak of singing, known as the dusk chorus, occurs before the Sun sets. As common as it is, the dawn chorus of birds has puzzled people for centuries, and there is still no consensus regarding its cause. Ornithologists have proposed at least nine hypotheses, none of which seems to accommodate all the existing data2,3. The hypotheses with greatest support so far are those that propose that male birds sing at dawn to warn off competitors or to guard their mates at the peak of their fertility4, or that birds sing to re-establish territorial boundaries after the night5. Other hypotheses provide more mechanistic explanations — for example, that dawn singing enables birds to use up the energy surplus that was stored for their needs at night6 or that birds take the opportunity to sing at a time when predation levels are lower than usual3. Although some of these hypotheses do a good job of explaining particular patterns for various species, none of them seems to be applicable to all dawn-chorus phenomena2. Schlicht et al. propose a hypothesis that aims to explain dawn and dusk choruses in all bird species. According to their female-absence

hypothesis, an intense chorus will happen whenever two conditions are met. First, that females and males separate briefly at dawn and dusk (if females leave their roost later than males in the morning and go to roost earlier than males in the evening); and second, that the males are likely to sing more when their mate is absent. To test this hypothesis, the authors recorded the song rates of male blue tits (Cyanistes caeruleus; Fig. 1) in southern Germany, and related these to the presence or absence of their female partners. Schlicht and colleagues first tagged the birds with radio-frequency-identification (RFID) tags to enable the location of the animals to be determined. They then placed an automatic RFID recorder near the nest box, and used this system to determine whether the female was inside or outside the nest box. After filtering the recordings to make sure that the bird of interest was the one singing in the recordings, Schlicht et al. found that the males sang at high rates while their female partners were still roosting in the nest box at dawn, and stopped singing as soon as the females left the nest box to join them. Similarly, males were more likely to sing when the females went to roost in the evening, and song rates also increased whenever the females entered the nest box during the day. Interestingly, male song rates increased in a linear manner as separation time from the female increased during the day. However, this linear relationship predicts an impossibly high dawn song rate after overnight separation, which was not observed. To check for any extra support for the generality of their hypothesis, the authors searched published work and found some circumstantial evidence that males and females of most species show a mismatch in their daily activity, leading to separation periods at dawn and dusk. Furthermore, several studies (but not all) indicate that male singing activity increases when a female partner is separated from the male in the context of experimental manipulations.
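The extrapolation argument can be made concrete with a toy calculation. The sketch below uses entirely hypothetical numbers (an invented daytime slope and intercept, not values from Schlicht and colleagues' paper) simply to illustrate why a linear daytime relationship between separation time and song rate would over-predict singing after a separation lasting all night.

```python
# Hypothetical illustration only: the slope, intercept and separation times
# below are invented for this example and are not data from the study.

def predicted_song_rate(separation_min, slope=0.5, intercept=2.0):
    """Linear model: songs per minute as a function of minutes of separation."""
    return intercept + slope * separation_min

# Typical daytime separations might last minutes to tens of minutes...
for sep in (5, 15, 30, 60):
    print(f"{sep:>4} min apart -> {predicted_song_rate(sep):6.1f} songs/min")

# ...whereas overnight separation lasts several hours. Extrapolating the same
# line to an 8-hour (480-minute) separation predicts an implausibly high rate,
# which is the pattern the authors note was not observed at dawn.
print(f" 480 min apart -> {predicted_song_rate(480):6.1f} songs/min")
```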

Figure 1 | A singing blue tit (Cyanistes caeruleus).



The patterns found for blue tits are exciting for this field, and do convincingly fit this female-absence hypothesis. They also provide support to observations that show similar patterns in other species. However, further work will be necessary to determine whether Schlicht and colleagues’ hypothesis describes a universally applicable mechanism for the dawn chorus. Birdsong in males has a dual function. It is used to defend territories against other males and to attract females, and which of these two roles is dominant varies between species and can depend on the breeding stage for a given species7. Schlicht et al. conducted their study during the ten days around egg laying. This is the fertile period of the blue tit, one of the species in which male dawn singing reaches a maximum at this precise stage of the breeding season when the female is inside the nest. Thus, the change in male singing activity in relation to the presence of a fertile female about to mate makes sense from a mate-guarding perspective. The male cannot guard the female from copulation attempts by other males when she leaves the nest if he sings when she is outside the nest2,8. However, not all bird species sing during

the female’s fertile period: some even become completely silent at this time9. In addition, dawn choruses can happen throughout the breeding season, by males engaging in feeding chicks and even by unpaired males. It remains to be determined whether, in situations such as those and when females are no longer fertile, the singing patterns found in this study also occur. The female-absence hypothesis offers a mechanistic explanation for the dawn chorus, but the authors do not propose an evolutionary function. Rather, Schlicht and colleagues speculate that, possibly after an origin not related to evolutionary processes of natural selection or mate choice by sexual selection, a full range of functions mediated by this singing might have been acquired over time, such as strengthening the pair bond, manipulating female behaviour or displaying male quality. This phenomenon nevertheless begs for a functional explanation. Although the first of the two conditions of the hypothesis (separation between males and females at dawn or dusk) could be explained by an origin due to inherent sex-specific differences under natural selection, the second condition (less singing when the female is present) strongly



suggests that this pattern of singing has a function. And even if the patterns predicted by the female-absence hypothesis can be applied to other species, we need to address the following key question. Why do males sing more when females are absent or less when females are present?

Diego Gil is in the Department of Evolutionary Ecology, National Museum of Natural Sciences, 28006 Madrid, Spain. e-mail: [email protected]

1. Schlicht, L., Schlicht, E., Santema, P. & Kempenaers, B. Proc. R. Soc. B 290, 20232266 (2023).
2. Gil, D. & Llusia, D. in Coding Strategies in Vertebrate Acoustic Communication (eds Aubin, T. & Mathevon, N.) 45–90 (Springer, 2020).
3. Staicer, C. E., Spector, D. A. & Horn, A. G. in Ecology and Evolution of Acoustic Communication in Birds (eds Kroodsma, D. E. & Miller, E. H.) 426–453 (Cornell Univ. Press, 1996).
4. Mace, R. Nature 330, 745–746 (1987).
5. Kacelnik, A. & Krebs, J. R. Behaviour 83, 287–308 (1983).
6. McNamara, J. M., Mace, R. H. & Houston, A. I. Behav. Ecol. Sociobiol. 20, 399–405 (1987).
7. Catchpole, C. K. & Slater, P. J. B. Bird Song: Biological Themes and Variations 2nd edn (Cambridge Univ. Press, 2008).
8. Birkhead, T. R. & Møller, A. P. Sperm Competition and Sexual Selection (Academic, 1998).
9. Liu, W.-C. & Kroodsma, D. E. Auk 124, 44–52 (2007).

The author declares no competing interests.

https://doi.org/10.1038/d41586-024-00079-8

News & views Population health science

An early look at birth cohort genetics in China

Nicholas John Timpson

Genetic sequencing data from more than 4,000 Chinese participants in the Born in Guangzhou Cohort Study provide insights into the population, and a snapshot of what is to come in future phases of the project.

The charting of genetic variation in populations is one of the standout achievements of human scientific endeavour and an area that has exploded since the early 2000s. Moving on from the first sequencing of the human genome, a combination of technology and ambitious sampling has changed the way people think about genetic variation, the nature of the relationship between genetic variants and measured characteristics (phenotypes) and the culture of trusting researchers to access our genomes. Writing in Nature, Huang et al.1 report on the first phase of the prospective and longitudinal Born in Guangzhou Cohort Study (BIGCS), which has been tracking a huge birth cohort in China since 2012 (ref. 2). The authors describe findings from a large-scale genomic study containing whole-genome sequencing data from 4,053 Chinese participants — mainly mother–infant duos (Fig. 1), or mother–father–infant trios. The study provides insight into the population genetics of the sample, the parameters of the study itself and the data available within it, and is a precursor to expansion of the work, which could involve up to 50,000 participants. The BIGCS joins a growing number of large-scale genetic data collections. Stemming from early investigations of populations and disease3–5, genome-wide sequencing data sets now encompass population-level analyses, such as those in Estonia (https://genomics.ut.ee/en), Finland (www.finngen.fi/en), Iceland (www.decode.com) and elsewhere, and focused clinical applications, such as Genomics England (www.genomicsengland.co.uk). With these substantial collections, the move towards representative sampling of population-level genetic variation is becoming a reality. The work presented by Huang et al. charts another step of the journey towards

a large and well-characterized cohort that includes genetic sequencing data. The sample itself is fascinating, with the potential to reveal information about the population genetics and characteristics of people from southern China. More broadly, this study presents an opportunity to expand the field of human genetics beyond existing data sets. There is a worrying absence of diversity in genetic data catalogues — particularly with respect to humanity’s shared ancestral genome6. However, even with the modest sample size of the first phase of the BIGCS genetics collection, Huang and colleagues have already started to address this lack of representation. They have identified new genetic variants, generated a panel of

reference genomes specific to individuals of Chinese descent, and examined patterns of genetic variation alongside demographic and linguistic variation. The resulting dissection of genetic ancestry groups in the population and the connection to local dialects is reminiscent of work done in the 1990s, which looked at the interplay between biological, demographical and cultural factors in determining the genetic composition of populations7. The applications of this study of genetic variation are limited by the size of this phase of BIGCS data collection, but the authors do report a series of associations between genetic variants and biomedical traits that seem to be specific to people from East Asia, and might be informative in understanding the causes of disease. Arguably, the work — at least at this stage — is better focused on the task of describing the potential of this data set, which includes detailed phenotypic and physical characteristics, weight changes from early life, and molecular profiles, such as levels of cholesterol, at different ages. Taken together with a growing genetics collection, this record of biomedical traits should become a key resource as sample sizes increase. We should also take a guarded interest in the applied genetic epidemiology used so far. This looks to explore the relationships between measures available in the cohort, such as the phenotypes of mothers (for example, height, blood glucose levels and lipid profile) and their children (for example, birth weight and

Figure 1 | A mother and her newborn infant in a maternity hospital in China.



length). Although one would almost expect to see genetic epidemiology8 and its applications9 in this type of work, the findings so far are of interest and will prompt follow-up studies that can confirm or challenge the suggested intergenerational effects in this specific population. Probably the most important aspect of this contribution to the global collection of genetic data is that it shows more evidence that data generation and sharing are broadening, and that there is a shift towards a culture of acceptance and openness as members of the public become more comfortable with sharing their genetic information. The collection and study of such data call for cultural sensitivity and a

combination of ethical and scientific rigour, so it is encouraging to see progress in this area. We should also be excited about having a deeper understanding of populations, sociodemographic histories and fresh biological insights, as well as about the willingness of BIGCS participants to take part. Indeed, although ancestry, genetic association and applied research are illuminating, it is incumbent on the research community to remember the commitment of those who make this type of work possible.



Nicholas John Timpson is in the MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK, and in Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK. e-mail: [email protected]

1. Huang, S. et al. Nature https://doi.org/10.1038/s41586-023-06988-4 (2024).
2. Qiu, X. et al. Eur. J. Epidemiol. 32, 337–346 (2017).
3. Fuchsberger, C. et al. Nature 536, 41–47 (2016).
4. The Genome of the Netherlands Consortium. Nature Genet. 46, 818–825 (2014).
5. Walter, K. et al. Nature 526, 82–90 (2015).
6. Ramsay, M. Patterns 3, 100412 (2022).
7. Ward, R. H., Redd, A., Valencia, D., Frazier, B. & Pääbo, S. Proc. Natl Acad. Sci. USA 90, 10663–10667 (1993).
8. Davey Smith, G. & Ebrahim, S. Int. J. Epidemiol. 32, 1–22 (2003).
9. Zhang, G. et al. PLoS Med. 12, e1001865 (2015).

The author declares no competing interests.

Editorials

Calling all engineers: Nature wants to publish your research

Papers in engineering are under-represented, even neglected, in the journal. We want to change that.

Last month, materials scientist Matic Jovičević-Klug and his colleagues reported how ‘red mud’, an iron oxide waste product generated during aluminium manufacturing, can be repurposed for ‘green’ steelmaking. Their findings1 have the potential to reduce carbon dioxide emissions from steelmaking by using a circular-economy approach. Had an article reported the implementation of this same process on a larger, even industrial scale, many readers might have been surprised to see it in Nature. Well, we want to change this perception. We want the world of engineering to know that its research, whether as a proof of concept or at the implementation stage, will be considered by Nature’s editors and reviewers, as it is already by colleagues at other Nature Portfolio journals. The most recent of these, Nature Chemical Engineering and Nature Reviews Electrical Engineering, were launched in January. We are proud to have already included some notable examples in Nature’s pages. On 31 January, for example, Zhixun Wang at Nanyang Technological University in Singapore and his colleagues described a method to produce flexible semiconductor fibres without defects or cracks that could be used in wearable devices2. One advantage of this technology, write Xiaoting Jia and Alex Parrott in an accompanying News and Views article3, is its industrial readiness, because the semiconductor fibres can be woven into fabrics using existing methods. So why emphasize our willingness to consider more such studies now? Last summer, Nature published a series of editorials on the Sustainable Development Goals (SDGs), the world’s plan to end poverty and achieve environmental sustainability. The plan isn’t going well — most of the goals and associated targets will not be met by the United Nations’ self-imposed 2030 deadline. The series brought home the realization that SDG-related research is not yet a priority for many researchers, especially for those in high-income countries, compared with their colleagues in low- and middle-income countries. Partly in response, more than 40 Nature Portfolio journals put out a collective call for papers on topics relevant to the SDGs as part of a drive to get researchers thinking about how their work might move the world closer to meeting the goals. In this context, studies that show how discoveries and inventions can be applied in real-world settings, including by testing and evaluating products and processes on large scales, are often highly relevant to the Nature Portfolio journals. Nature’s publishing criteria require that papers report original research that is of outstanding scientific importance. The journal also expects that a study reaches “a conclusion of interest to an interdisciplinary readership”. Our message is loud and clear — that readership includes engineers, as well as scientists from all disciplines.

Back to the future


By putting out this call for more engineering research, we are restoring a connection with engineers and the field of engineering that is rooted deep in Nature’s history. In Nature’s first issue, published on 4 November 1869, readers will find a discussion on the likelihood of silting in the Suez Canal4, one of the largest engineering projects of the nineteenth century. The canal was a hot news topic, because it was due to open two weeks later, on 17 November. There was much public debate, and a degree of anxiety about such geoengineering feats. A correspondent to Nature, Thomas Login, had worked on the 437-kilometre Ganges Canal, which had opened 15 years earlier to connect the Ganges and Yamuna rivers in India. The Ganges Canal’s waterways were intended to irrigate massive stretches of farmland, thereby reducing the risk of famine in a region where people had previously experienced hunger when the rains failed. I have no doubt there are many who will say the Suez Canal “is a total failure”, Login wrote. He was confident that the canal would succeed. This is not an isolated or rare example. Subsequent editions of Nature include engineering conversations and critiques. The journal also published regular reports of meetings of professional engineering societies — just as those of other scientific societies were discussed. The late nineteenth century was an age of ambitious, and controversial, imperial-era projects. It was also a time when scientists and engineers wanted to read about each other’s work in the same journal. As editors and publishers, we accept our share of responsibility for how things have turned out. Our responsibility now is to renew this connection.

Creating by collaborating

We want to recognize engineering in other ways, too. By highlighting the profession’s approach to collaboration, for example. Last week, the Queen Elizabeth Prize for Engineering, equivalent in recognition to the Nobel prizes, was awarded to two engineering researchers for their contributions to the field of modern wind-turbine technology. Unlike recipients of some of the more well-known science prizes, Andrew Garrad and Henrik Stiesdal were not rewarded for a single landmark achievement, but for their 40-year partnership in designing, testing and improving wind turbines that are now built on an industrial scale around the world. The prize recognizes decades of painstaking, sometimes incremental, and, yes, collaborative achievements. Their work also brought together researchers from other



fields, such as mathematics, fluid physics, electronics and materials science. Such an approach to problem-solving needs to become the norm if the world is to succeed in addressing global challenges, Stiesdal, a former chief technology officer at Siemens Wind Power, told Nature. We wholeheartedly agree. Engineering and science are like two ships that have set sail close together, but in many ways have gradually drifted apart. We can’t let that continue. Having engineers back in Nature’s pages is long overdue, not least for the health of our planet and the well-being of all people.

1. Jovičević-Klug, M., Souza Filho, I. R., Springer, H., Adam, C. & Raabe, D. Nature 625, 703–709 (2024).
2. Wang, Z. et al. Nature 626, 72–78 (2024).
3. Jia, X. & Parrott, A. Nature 626, 38–39 (2024).
4. Login, T. Nature 1, 24 (1869).

EU climate policy is dangerously reliant on untested carbon-capture technology

Europe’s ambition for emissions reductions is to be welcomed — but look at the detail, and significant hazards emerge.

Last week, the European Commission published its long-awaited recommendations for climate targets for 2040. The commission, which is the executive arm of the European Union, is recommending that EU member states cut greenhouse-gas emissions by 90% by 2040, compared with 1990 levels. If countries agree, this would be an interim milestone, ahead of the European Climate Law, which sets out a legally binding target for net-zero emissions by 2050. A target cut of 90% is not as ambitious as some headlines suggest. The EU’s existing policies could reduce emissions by 88% by 2040, according to its own projections. This would be achieved mainly through phasing out coal, converting most fossil-fuel power to renewable sources such as solar, wind and tidal energy, and electrifying transport. There will still be emissions from some vehicles on the road, from shipping and from aviation. Some oil and gas power will also be in use. The commission is, therefore, proposing to accelerate technologies, such as carbon capture and storage (CCS), that can take up some of those remaining emissions and store the gases, possibly underground. The 2040 interim target was proposed by independent climate-science advisers to the EU, and it’s good to see their proposal being implemented. But the advisers also cautioned


that getting to 90% by including CCS technologies will be challenging. The biggest obstacle is that the technology is not ready — a point reiterated by climate scientists who Nature spoke to in our news reporting of the announcement. At present, there is not a single fully operational CCS plant in Europe, nor a system for governing and regulating the technology. So far, ten CCS projects are planned in the EU, according to the commission’s science advisers. Assuming they all function, their combined carbon-capture capacity is expected to be less than the expected contribution from CCS to achieve the EU’s 2040 climate target. It is worth quoting the science advisers’ views on the risks versus the benefits of emphasizing CCS: “This presents a dilemma for policymakers who need to find ways to incentivise sustainable carbon removal scale-up, while avoiding the risk of disincentivising greenhouse-gas emission reductions in different sectors by more conventional means and a suitable governance system.” Emissions reductions “by more conventional means” includes efforts to cease the burning of fossil fuels; the advisers are keen to ensure that these are not sidelined by policymakers. The interim climate target will now be discussed by member states, and could face some opposition. European countries have historically set more-ambitious climate targets than other high-income countries, but some of the bloc’s largest nations, such as France and Germany, are now facing pressure to weaken climate commitments, or are actually reneging on climate pledges — as the United Kingdom is also doing. This pressure is coming from many sides, including political parties and specific sectors, such as farmers. People have legitimate fears about the loss of jobs in carbon-heavy industries and over who will pay the costs of converting to electric vehicles or decarbonizing home-heating systems. In this respect, the commission’s proposal could have been more persuasive. The document advocates for protecting the most vulnerable in the coming energy transition, as well as ensuring that EU industry stays competitive. But it is light on the specifics of how this should happen. There is a body of research on how to achieve an equitable and just climate transition. There are also lessons from other countries, notably the United States, on how at-risk communities should be supported. The European Commission should ask its science advisers to synthesize this knowledge in the same way they have synthesized research on scenarios for the climate target itself. Much of the current public discussion around climate policies presumes that of the options available, business as usual is a better, or neutral option, against which other choices are necessarily worse. But, as the commission emphasizes, “the costs and human impacts of a changing climate are large and growing”. Delaying action will itself be costly. That must be emphasized with more vigour and urgency over the next few months as the commission seeks to get agreement on its interim targets. It rightly has based its target on the consensus of scientific advice. It should consult its advisers as it begins its period of public engagement. Researchers can help by advising on not just how its targets could be achieved, but also the costs of not doing so.


Research highlights
Selections from the scientific literature

FULMINATING GOLD’S NANOPARTICLES SMOKED OUT AT LAST

Experiments on a highly explosive substance called fulminating gold reveal why its detonation produces unusual purple smoke. More than 400 years ago, alchemists discovered fulminating gold, a chemical compound that readily explodes into a cloud of purple smoke. It has long been thought that the smoke is composed of nanometre-sized gold particles that give it its distinctive colour. This previously unproven theory has been put to the test by Jan Maurycy Uszko and his colleagues at the University of Bristol, UK. The researchers used heat to detonate samples of fulminating gold and captured the emitted smoke on copper meshes. They then analysed this smoke using high-resolution electron microscopy and found that it contained nanometre-sized gold particles. The detected particles had two striking properties: they spanned a broad range of sizes, from 5 nanometres to more than 300 nanometres, and they were spherical. The team says that the explosion of fulminating gold could offer a way to create such particles that is faster than conventional approaches.

Nanoscale Adv. https://doi.org/mf8b (2024)

UNLIKELY HERO RUSHES IN TO SAVE DROWNING SEAL PUP

The dramatic rescue of an elephant seal pup swept out to sea could be the first recorded example of male altruism in the species. Altruism is found throughout the animal world. But the cost of helping others at the expense of an animal’s own survival or reproduction means that plenty of individuals — and even entire species — don’t dedicate much time to the practice. Seals, for instance, rarely take part in altruistic acts. But rarely doesn’t mean never. Sarah Allen, a retired ecologist in Inverness, California, and her colleagues were watching a colony of northern elephant seals (Mirounga angustirostris, pictured) at Point Reyes National Seashore in California when a pup less than two weeks old was caught in an undertow at sea. The pup was soon deep enough that it was struggling to keep its head above the water. Seemingly in response to the cries of the pup’s mother, an alpha male in the colony left his harem and charged into the sea. Adult males seldom engage with pups. But this bull gently pushed the pup back to shore. Mission complete, he lifted his nose and bellowed.

Mar. Mamm. Sci. https://doi.org/gtg8cj (2024)

WHAT THE BRAINS OF PRICKLY PEOPLE HAVE IN COMMON

Some people interpret bumping shoulders with a stranger in a crowd as innocuous; others see it as a hostile act by the other party. Scientists have worked out how the brain reflects that tendency to assume malevolence in others’ intentions, called hostile attribution bias. Yizhou Lyu at the University of Chicago in Illinois and her colleagues studied 58 people who were presented with hypothetical scenarios that had negative outcomes; for example, a former employer forgetting to submit a letter of recommendation. In each case, participants ranked how hostile they found such actions. Sensors measured participants’ brain activity during the experiment. The authors then averaged each participant’s scores to get a measure of hostile attribution bias. They found that those who scored similarly showed similar brain activation in an area called the left ventromedial prefrontal cortex, which is crucial for decision-making and social evaluation. Participants who attributed their own and others’ behaviour to complex causes had lower hostile attribution bias, the authors found. These findings might inform training to promote healthier social interactions, they conclude.

J. Neurosci. https://doi.org/mf8c (2024)

CROWNING GLORY: RARE FOSSIL REVEALS TREE’S STRUCTURE

Fossils of early trees are scarce, and specimens that include the tree’s crown are even scarcer. But Robert Gastaldo at Colby College in Waterville, Maine, and his colleagues analysed just such a fossil, which had been discovered in New Brunswick, Canada, and found that the crown of leaves had a surprising structure. The tree, which lived roughly 350 million to 352 million years ago, looked a bit like a fuzzy umbrella. A slender trunk just 16 centimetres wide and at least 2.85 metres tall was mostly bare. The top 0.75 metres, however, was covered by a dense mop of long, thin leaves, which grew directly from the trunk and extended horizontally outwards for a startling 1.75 metres. The team estimates that a full-grown specimen would have a canopy with a diameter greater than 6 metres. The ancient tree, which the authors named Sanfordiacaulis densifolia (artist’s impression pictured), was probably one of the first subcanopy trees. Its shape possibly maximized photosynthesis in the dappled shade of taller trees — which might have been just as peculiar. Most tall trees from this time are known only as fossilized trunks.

Curr. Biol. https://doi.org/mf8d (2024)



News in focus

Text from the Herculaneum scroll, which has been unseen for 2,000 years.

FIRST PASSAGES OF ROLLED-UP HERCULANEUM SCROLL REVEALED

Researchers used artificial intelligence to decipher the text of 2,000-year-old charred papyrus scripts, unveiling musings on music and capers.

By Jo Marchant

A team of student researchers has made a gigantic contribution to solving one of the biggest mysteries in archaeology by revealing the content of Greek writing inside a charred scroll buried 2,000 years ago by the eruption of Mount Vesuvius. The winners of a contest called the Vesuvius Challenge trained their machine-learning algorithms on scans of the rolled-up papyrus, unveiling a previously unknown philosophical work that discusses the senses and pleasure. The feat

paves the way for artificial intelligence (AI) techniques to decipher the rest of the scrolls in their entirety, something that researchers say could have revolutionary implications for the understanding of the ancient world. The achievement has ignited the usually slow-moving world of ancient studies. It’s “what I always thought was a pipe dream coming true”, says Kenneth Lapatin, curator of antiquities at the J. Paul Getty Museum in Los Angeles, California. The revealed text discusses sources of pleasure, including music, the taste of capers and the colour purple. “It’s an historic moment,” says classicist Bob Fowler

at the University of Bristol, UK, one of the prize judges. The three students, from Egypt, Switzerland and the United States, who revealed the text share a US$700,000 grand prize. The scroll is one of hundreds of intact papyri excavated in the eighteenth century from a luxury Roman villa in Herculaneum, Italy. These lumps of carbonized ash — known as the Herculaneum scrolls — constitute the only library that survives from the ancient world, but are too fragile to open. The winning entry, announced on 5 February, reveals hundreds of words across 15 columns of text, corresponding to around 5% of a scroll. “The contest has cleared the air on all the people saying will this even work,” says Brent Seales, a computer scientist at the University of Kentucky, Lexington, and co-founder of the prize. “Nobody doubts that anymore.”


Twenty-year mission

In the centuries since the scrolls were discovered, many people have attempted to open them, destroying some and leaving others in pieces. Papyrologists are still working to decipher and stitch together the resulting, horribly fragmented, texts. But the chunks with the worst charring — the most hopeless cases, adding up to perhaps 280 entire scrolls — were left intact. Most are held in the National Library in Naples, Italy, with a few in Paris, London and Oxford, UK. Seales has been trying to read these concealed texts for nearly 20 years. His team developed software to “virtually unwrap” the surfaces of rolled-up papyri using 3D computed tomography (CT) images. In 2019, he took two of the scrolls from the Institut de France in Paris to the Diamond Light Source particle accelerator near Oxford to make high-resolution scans. Mapping the surfaces was time consuming, however, and the carbon-based ink used to write the scrolls has the same density as papyrus, so it was impossible to differentiate between the two in CT scans. Seales and his colleagues wondered whether machine-learning models might be trained to ‘unwrap’ the scrolls and distinguish the ink. But making sense of all the data was a gigantic task for his small team. Seales was approached by Silicon Valley entrepreneur Nat Friedman, who had become intrigued by the Herculaneum scrolls after watching a talk by Seales online. Friedman suggested opening up the challenge to contestants. He donated $125,000 to launch

the effort and raised hundreds of thousands more on Twitter (now X), and Seales released his software, along with the high-resolution scans. The team launched the Vesuvius Challenge in March 2023, setting a grand prize for reading 4 passages, of at least 140 characters each, before the end of the year. Key to the contest’s success was its “blend of competition and cooperation”, says Friedman. Smaller prizes were awarded along the way to incentivize progress, with the winning machine-learning code released at each stage to “level up” the community and allow contestants to build on each other’s advances.
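The core technical problem described above is that carbon ink has the same density as papyrus in a CT scan, so a model has to learn subtler textural cues from the volume. The code below is a deliberately simplified illustration, not the contestants' pipeline: it assumes a hypothetical set of small 2D patches cut from unwrapped scan surfaces, labelled as ink or bare papyrus by some independent annotation (such as the 'crackle' texture described in the next section), and trains a tiny convolutional classifier on them.

```python
# Simplified, hypothetical sketch of ink detection on CT patches (not the
# actual Vesuvius Challenge code). Random arrays stand in for real data.
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in data: 512 greyscale 32x32 patches from an unwrapped scroll surface,
# each labelled 1 (ink) or 0 (bare papyrus) by an independent annotation.
patches = torch.rand(512, 1, 32, 32)
labels = torch.randint(0, 2, (512,)).float()

# A tiny convolutional classifier: texture cues, not raw density, drive the call.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 1),
)

optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(5):
    optimiser.zero_grad()
    logits = model(patches).squeeze(1)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimiser.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")

# At inference, per-patch ink probabilities would be stitched back onto the
# unwrapped surface to render letter shapes.
print(torch.sigmoid(model(patches[:4]).squeeze(1)))
```

The real entries worked with far larger models and more surrounding 3D context, but the division of labour is the same: estimate a per-location ink probability, then render those probabilities on the unwrapped surface so that letter shapes become legible.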

The colour purple

A key innovation came in the middle of last year, when US entrepreneur and former physicist Casey Handmer noticed a faint texture in the scans, similar to cracked mud — he called it “crackle” — that seemed to form the shapes of Greek letters. Luke Farritor, an undergraduate studying computer science at the University of Nebraska–Lincoln, used the crackle to train a machine-learning algorithm, revealing the word porphyras, or ‘purple’, which won him the prize for unveiling the first letters in October. Youssef Nader, an Egyptian computer-science PhD student at the Free University of Berlin, followed with even clearer images of the text and came second. Their code was released with less than three months left for contestants to scale up their reads before the 31 December deadline for the final prize. “We were biting our nails,” says Friedman. But in the final week, the competition received 18 submissions. A technical jury checked entrants’ code, then passed 12 submissions to a committee of papyrologists who transcribed the text and assessed each entry for legibility. Only one fully met the prize criteria: a


team formed by Farritor and Nader, along with Julian Schilliger, a robotics student at the Swiss Federal Institute of Technology in Zurich. The results are “incredible”, says judge Federica Nicolardi, a papyrologist at the University of Naples Federico II. “We were all completely amazed by the images they were showing.” She and her colleagues are now racing to analyse the text that has been revealed.

The Herculaneum scrolls were burnt and buried by an eruption of Mount Vesuvius in AD 79.

Music, pleasure and capers

The content of most of the previously opened Herculaneum scrolls relates to the Epicurean school of philosophy, founded by the Athenian philosopher Epicurus, who lived from 341 to 270 BC. The scrolls seem to have formed the working library of a follower of Epicurus named Philodemus. The new text doesn’t name the author, but from a rough first read, say Fowler and Nicolardi, it is probably also by Philodemus. As well as pleasurable tastes and sights, it refers to a figure called Xenophantus, possibly the flute player of that name mentioned by the ancient authors Seneca and Plutarch, whose evocative playing apparently caused Alexander the Great to reach for his weapons. Lapatin says the topics discussed by Philodemus and Epicurus are still relevant: “The basic questions Epicurus was asking are the ones that face us all as humans. How do we live a good life? How do we avoid pain?” But “the real gains are still ahead of us”, he says. “What’s so exciting to me is less what this scroll says, but that the decipherment of this scroll bodes well for the decipherment of the hundreds of scrolls that we had previously given up on.” There is likely to be more Greek philosophy in the scrolls: “I’d love it if he had some works by Aristotle,” says papyrologist and prize judge Richard Janko at the University of Michigan in Ann Arbor. Meanwhile, some of the opened scrolls, written in Latin, cover a broader subject area, raising the possibility of lost poetry and literature by writers from Homer to Sappho. The scrolls “will yield who knows what kinds of new secrets”, says Fowler. “We’re all very excited.” The achievement is also likely to fuel debate over whether further investigations should be conducted at the Herculaneum villa, entire levels of which have never been excavated. Janko and Fowler are convinced that the villa’s main library was never found, and that thousands more scrolls could still be underground. More broadly, the machine-learning techniques pioneered by Seales and the Vesuvius Challenge contestants could now be used to study other types of hidden text, such as that in cartonnage, recycled papyri often used to wrap Egyptian mummies. The next step is to decipher an entire work. Friedman has announced a new set of Vesuvius Challenge prizes for 2024, with the aim of reading 90% of a scroll by the end of the year. But in the meantime, just getting this far “feels like a miracle”, he says. “I can’t believe it worked.”




Social-media studies

The United States is among the countries holding national elections this year.

HOW SOCIAL MEDIA COULD SWAY ELECTIONS IN 2024

Researchers are mapping out fresh approaches to studying social media’s political reach.

By Heidi Ledford

The hum of the buzzers can be deafening when an election draws near. In Indonesia, which will hold a general election on 14 February, a swarm of buzzers — people paid to post large volumes of material on social media — is in full swing. Their aim: to sway the electorate. Amid the digital noise, Ika Idris, a social scientist at Monash University’s Jakarta campus, and her colleagues are trying to track changes in hate speech, as well as the influence of misinformation. An example of this is an artificial intelligence (AI)-generated ‘deepfake’ video that shows a presidential candidate speaking Chinese, which would suggest a close alliance with China. Previously, Idris had free access to data from X (formerly Twitter), but last year, the social-media platform ended its policy of free data access for academic researchers, and she cannot afford the fees. Now, Idris must ask collaborators in wealthier countries to share their data with her during the run-up to the election, giving her less room to experiment with search parameters. Some 13,000 kilometres away in Seattle,

Washington, computer scientist Kate Starbird and her colleagues at the University of Washington are studying how rumours spread on social media, as the United States moves towards its own presidential election in November. In the last election cycle in 2020, her e-mail inbox teemed with requests for collaborations and advice. This year it is much quieter, she says.

2024 is the year of the election: nearly half of the world’s population lives in countries with elections this year. Meanwhile, social media’s reach continues to grow, and generative AI tools capable of creating deepfakes are becoming more accessible and more powerful than before. Yet, researchers say that they are in the worst position they’ve been in for years in monitoring the impact of these tools on elections. “When we close the book on 2024, there is a very good chance that we are going to know much less about what happened in 2024 than what happened in 2020,” says Joshua Tucker, a computational social scientist at New York University. But, he adds, others are finding ways to work around the limitations, as Idris and Starbird are doing. “Researchers are creative.”

In Europe, where nine countries as well as the European Union are expected to hold parliamentary elections this year, there is more optimism. The EU’s Digital Services Act (DSA) — sweeping legislation that aims, in part, to limit the spread of disinformation — is due to come into effect for social-media platforms on 17 February. Included in that act are provisions for giving vetted researchers access to data from social-media platforms to study systemic risks posed by social media in Europe. “I’m putting a lot of hope in the DSA,” says Philipp Lorenz-Spreen, a computational social scientist at the Max Planck Institute for Human Development in Berlin. For now, researchers do not yet know how these provisions will be implemented, including what kind of data will be provided, what kind of research will be deemed eligible for access and whether the data will be useful for those hoping to monitor the 2024 European elections. In countries outside the EU, researchers are anxious to see whether they will be eligible to use the DSA’s provisions to access data at all. Some platforms, including Facebook and X, have released early versions of interfaces for extracting large amounts of data in compliance with the DSA. When Lorenz-Spreen applied for access, X asked him to explain how his research would affect systemic risks to, among other things, public health, as well as the spread of illegal content and factors endangering fundamental rights in the EU. He is still awaiting a decision on his application. Even so, researchers abroad are hopeful that the DSA will provide them with an option for obtaining data — or, at the very least, that the DSA will inspire other countries to introduce similar legislation. “This is a door that’s opening,” says Maria Elizabeth Grabe, who studies misinformation and disinformation at Boston University in Massachusetts. “There is quite a bit of excitement.” But she can also feel the effects of political pressure in the United States on the field, and she worries that funders are shying away from research that mentions the word ‘disinformation’ to avoid drawing criticism — or even legal action — from technology companies and other groups. This is a worrying possibility, says Daniel Kreiss, who studies communication at the University of North Carolina at Chapel Hill. “We’re a pretty robust crew,” he says. “But what I most worry about is the future of the field and the people coming up without the protections


of tenure.” Despite ongoing challenges, the community of researchers trying to assess the impacts of social media on society has continued to grow, says Rebekah Tromble, a political-communication researcher at George Washington University in Washington DC. And behind the scenes, researchers are exploring different ways of working, says Starbird, such as developing methods to analyse videos shared online and to work around difficulties in accessing data. “We have to learn how to get insights from more limited sets of data,” she says. “And that offers the opportunity for creativity.”

Donated data

Some researchers are using qualitative methods such as conducting targeted interviews to study the effects of social media on political behaviour, says Kreiss. Others are asking social-media users to voluntarily donate their data, sometimes using browser extensions. Tucker has conducted experiments in which he pays volunteers a small fee to agree to stop using a particular social-media platform for a period, then uses surveys to determine how that affected their exposure to misinformation and the ability to tell truth from fiction. Tucker has conducted such experiments in Bosnia, Cyprus and Brazil, and plans to extend them to South Africa, India and Mexico, all of which will hold elections this year. Most research on social media’s political influence has been done in the United States, and research in one country doesn’t necessarily apply to another, says Philip Howard, a social scientist and head of the International Panel on the Information Environment, a non-profit organization based in Zurich, Switzerland, with researchers from 55 countries. “We know much more about the effects of social media on US voters than elsewhere,” he says. That bias can distort the view of what’s happening in different regions, says Ross Tapsell, who studies digital technologies with a focus on Southeast Asia at Australian National University in Canberra. For example, researchers and funders in the West often focus on foreign influence on social media in southeast Asia. But Tapsell says that researchers in southeast Asia are more concerned about local sources of misinformation, such as those that are amplified by buzzers. The buzzers of Indonesia have counterparts in the Philippines, where they are called trolls, and Malaysia, where they are called cybertroopers. In the absence of relevant and comprehensive data about the influence and sources of misinformation during elections, conflicting narratives built on anecdotes can take centre stage, says Paul Resnick, a computational social scientist at the University of Michigan in Ann Arbor. “Anecdotes can be misleading,” he says. “It’s just going to be a fog.”

JWST has captured images of spiral galaxies in unprecedented detail.

JWST IS THE MOST IN-DEMAND TELESCOPE EVER

Only one in nine research proposals is likely to be approved in latest application cycle.

By Rahul Rao

Astronomers from around the world met in early February to review the latest crop of research proposals for the James Webb Space Telescope (JWST). They sifted through 1,931 submissions — the most ever received for any telescope in history — and ranked them. By the time the reviewers begin releasing their decisions late this month, only one in every nine proposals will have been allotted time to collect data with JWST. The huge demand is an indicator of the space observatory’s immense success: it has wowed astronomers by spotting some of the earliest galaxies ever seen and has uncovered more black holes in the distant Universe than was predicted. Launched in December 2021, it is the hottest property in astronomy. But oversubscription leaves many sound research projects in limbo. “The overwhelming majority of submitted JWST proposals are very good, totally worth doing, absolutely should be done if time allows,” says Grant Tremblay, an astronomer at the Harvard–Smithsonian Center for Astrophysics in Cambridge, Massachusetts. “But most of them will be rejected.” Using JWST can take anywhere from a few minutes for a simple project to hundreds of hours for a major survey. When researchers apply for observing time, they are competing for limited slots — some of which are automatically earmarked for scientists who helped to develop the telescope, including those at the European Space Agency and the Canadian Space Agency. This is JWST’s third proposal submission-and-review cycle. During the first, the Space Telescope Science Institute (STScI) in Baltimore, Maryland, which operates JWST, received 1,084 submissions; reviewers gave the green light to one out of every five. During the second review cycle, submissions rose by about 35%, and the acceptance rate dropped to one in seven. For the first cycle, applications were due before the telescope had even lifted off from Earth. Many astronomers were reluctant to put their energy into writing proposals for an instrument that might not succeed, says Christine Chen, leader of the group at the STScI that issues calls for proposals. “As time has gone on, Webb has just performed so beautifully that people are having



an easier and easier time envisioning how it’s going to advance their science,” she says. “It’s natural that the community is excited.” Still, demand for JWST is unprecedented. It has surpassed that for the 33-year-old Hubble Space Telescope, its predecessor flagship observatory. For most of Hubble’s lifetime, reviewers have approved between one in four and one in six of the proposals submitted. One reason for JWST’s popularity is that it has capabilities that other telescopes don’t. It is the most powerful infrared space telescope ever built, so it can observe objects in the very distant Universe and can scan the atmospheres of exoplanets for molecules that other instruments can’t see. In fact, a proposal’s specificity to JWST is one of the reviewers’ criteria. If an experiment can be done with another telescope, it will almost certainly not receive JWST time, Chen says. “We want to execute projects that you can do no other way.”
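As a rough consistency check, the oversubscription figures quoted in this article can be turned into approximate numbers of accepted proposals per cycle. The sketch below uses only the numbers reported above (1,084 submissions and roughly one-in-five acceptance in cycle 1; about 35% more submissions and one-in-seven acceptance in cycle 2; 1,931 submissions and one-in-nine acceptance in cycle 3); the resulting counts are back-of-the-envelope estimates, not official STScI statistics.

```python
# Back-of-the-envelope estimate from the figures quoted in the article;
# the derived accepted-proposal counts are illustrative, not official numbers.
cycles = {
    "Cycle 1": {"submissions": 1084, "acceptance": 1 / 5},
    "Cycle 2": {"submissions": round(1084 * 1.35), "acceptance": 1 / 7},
    "Cycle 3": {"submissions": 1931, "acceptance": 1 / 9},
}

for name, cycle in cycles.items():
    accepted = cycle["submissions"] * cycle["acceptance"]
    print(f"{name}: ~{cycle['submissions']} submissions, "
          f"~{accepted:.0f} proposals likely to get time")
```

Read this way, the number of proposals that can be accommodated stays roughly constant at a little over 200 per cycle, which is what one would expect if the available observing time, rather than demand, is the limiting factor.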


Pain points

A large portion of the JWST proposals that get rejected are resubmitted during the next review cycle. Reviewers encourage researchers to fine-tune their submissions — usually to clarify their scientific justification for a project — and try again. Tremblay, for example, had one proposal rejected during JWST’s first cycle but accepted, with some edits, in the second. “High oversubscription is horrible, but it does drive rigour in the preparation [of proposals] and ensure the science is strong,” says Thomas Haworth, an astrophysicist at Queen Mary University of London. JWST cost a lot — more than US$10 billion to develop — so “we want to make sure it does the best science it can”, he adds. Would-be users are not the only ones feeling the pain of JWST’s oversubscription rate. Tremblay says that the ballooning number of proposals is placing an increasing burden on those volunteering their time to be on review panels. “It’s a lot of work. I don’t think the process as it exists now can scale up much further,” he adds. This is not a JWST-specific problem. The holder of the previous record for most proposals — the Atacama Large Millimeter/submillimeter Array (ALMA) in northern Chile — received 1,838 submissions during a review cycle that began in 2018. By 2021, ALMA, an internationally funded radio observatory studying how stars and planets form, among other things, had mostly switched to a distributed peer-review system. In this approach, a researcher who submits a proposal is required to review a certain number of their peers’ proposals in the same cycle. If they do not, their own proposals might face disqualification. Whether or not JWST retains its current review system, astronomers’ desire to use it is likely to remain high for years to come — at least until another instrument of the same calibre opens its aperture.

AI MODEL LEARNT LANGUAGE BY SEEING THE WORLD LIKE A BABY

A neural network taught itself to recognize objects using the filmed experiences of a single infant.

By Elizabeth Gibney

An artificial intelligence (AI) model has learnt to recognize words such as 'crib' and 'ball' by studying headcam recordings of a tiny fraction of a single baby's life. The results indicate that AI can help us to understand how humans learn, says Wai Keen Vong, a researcher in AI at New York University. This has previously been unclear, because other language-learning models such as ChatGPT learn from billions of data points, which is not comparable to the real-world experiences of an infant, says Vong. "We don't get given the Internet when we're born."

The authors hope that the research, reported in Science on 1 February, will feed into long-standing debates about how children learn language (W. K. Vong et al. Science 383, 504–511; 2024). The AI learnt only by building associations between the images and words it saw together; it was not programmed with any other prior knowledge about language. That challenges some cognitive-science theories that, to attach meaning to words, babies need some innate knowledge about how language works, says Vong.

The study is “a fascinating approach” to understanding early language acquisition in children, says Heather Bortfeld, a cognitive scientist at the University of California, Merced.

Baby's-eye view
Vong and his colleagues used 61 hours of recordings from a camera worn by a baby boy named Sam to gather experiences from the infant's perspective. Sam, who lives near Adelaide in Australia, wore the camera for around one hour twice a week, from the age of six months to around two years.

The researchers trained their neural network — an AI inspired by the structure of the brain — on frames from the video and words spoken to Sam, transcribed from the recording. The model was exposed to 250,000 words and corresponding images, captured during activities such as playing, reading and eating. The model used a technique called contrastive learning to learn which images and text tend to go together and which do not, building up information that can be used to predict which images certain words, such as 'ball' and 'bowl', refer to.

To test the AI, the researchers asked the model to match a word with one of four candidate images, a test that is also used to evaluate children's language abilities. It successfully classified the object 62% of the time — much better than the 25% expected by chance, and comparable to a similar AI model that was trained on 400 million image–text pairs from outside this data set. For some words, such as 'apple' and 'dog', the model was able to correctly identify previously unseen examples — something humans generally find relatively easy. On average, it did so successfully 35% of the time. The AI was best at identifying objects that vary little in their appearance, says Vong. Words that can refer to a variety of items — such as 'toy' — were harder to learn.
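To illustrate the general idea of contrastive image–text learning and the four-way matching test described above, here is a minimal sketch. It is not the authors' actual model or code; the encoder architectures, dimensions and names are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyContrastiveModel(nn.Module):
    """Minimal sketch of contrastive image-text learning (CLIP-style).
    Placeholder encoders; not the model used in the Science study."""
    def __init__(self, image_dim=512, vocab_size=1000, embed_dim=128):
        super().__init__()
        self.image_encoder = nn.Linear(image_dim, embed_dim)         # stand-in for a vision network
        self.text_encoder = nn.EmbeddingBag(vocab_size, embed_dim)   # stand-in for an utterance encoder
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, image_features, token_ids):
        img = F.normalize(self.image_encoder(image_features), dim=-1)
        txt = F.normalize(self.text_encoder(token_ids), dim=-1)
        # Similarity of every image to every utterance in the batch.
        return self.logit_scale.exp() * img @ txt.t()

def contrastive_loss(logits):
    # Matched image-utterance pairs lie on the diagonal; all other pairs act as negatives.
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def four_way_choice(model, word_ids, candidate_image_features):
    """Evaluation in the style described: pick which of four candidate
    images a word refers to. word_ids has shape (1, n_tokens)."""
    logits = model(candidate_image_features, word_ids.repeat(4, 1))
    return logits.diag().argmax().item()   # index of the best-matching candidate

if __name__ == "__main__":
    model = TinyContrastiveModel()
    images = torch.randn(8, 512)                  # stand-in image features
    utterances = torch.randint(0, 1000, (8, 6))   # stand-in token ids for 8 utterances
    loss = contrastive_loss(model(images, utterances))
    print("toy contrastive loss:", float(loss))
```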

Sam — here aged 18 months — wore a camera whose recordings trained an AI model.



Lessons about learning
The study's reliance on data from a single child might raise questions about the generalizability of its findings, because children's experiences and environments vary greatly, says Bortfeld. But the exercise revealed that a lot can be learnt in the infant's earliest days by forming associations only between different sensory sources, she adds.

The findings also challenge scientists — such as US linguist Noam Chomsky — who say that language is too complex, and the input of information too sparse, for language acquisition to happen through general learning processes. "These are among the strongest data I've seen showing that such 'special' mechanisms are not necessary," says Bortfeld.

Real-world language learning is much richer and more varied than the AI experienced. The researchers say that, because the AI is limited to training on still images and written text, it could not experience interactions that are inherent to a real baby's life. The AI struggled to learn the word 'hand', for example, which is usually learnt early in an infant's life, says Vong. "Babies have their own hands, they have a lot of experience with them. That's definitely a missing component of our model."

WHY AUTOIMMUNE DISEASE IS MORE COMMON IN WOMEN Rogue antibodies are drawn to the protein–RNA coating on one of the X chromosomes in an XX cell. By Elie Dolgin

Why are women so much more susceptible to autoimmune diseases than men? A new explanation for the discrepancy has emerged: a molecular coating typically found on half of a woman's X chromosomes — but not usually in men's cells — might be provoking unwanted immune responses (D. R. Dou et al. Cell 187, 733–749; 2024).

The coating, a mix of RNA and proteins, is central to a developmental process called X-chromosome inactivation. Researchers had previously suspected flawed gene regulation on the X chromosome of being a driver of the autoimmune disparity. But the discovery that proteins central to X-chromosome inactivation can themselves set off immunological alarm bells adds yet another layer of complexity — and could point to new therapies. "This really adds a new mechanistic twist," says Laura Carrel, a geneticist at the Pennsylvania State College of Medicine in Hershey.

Medical mystery
Women account for around 80% of all cases of autoimmune disease, a category that includes conditions such as lupus and rheumatoid arthritis. What causes this sex bias has long been a mystery. A prime suspect is the X chromosome: in most mammals, including humans, a male's cells typically include one copy, and a female's cells typically carry two. (This article uses 'women' and 'female' to describe people with two X chromosomes and no Y chromosome, reflecting the language of the study, while acknowledging that gender identity and chromosomal make-up do not always align.)

X-chromosome inactivation muffles the activity of one X chromosome in most XX cells, making their 'dose' of X-linked genes equal to that of the XY cells typical in males. The process is highly physical: long strands of RNA known as XIST (pronounced 'exist') coil around the chromosome, attracting dozens of proteins to form complexes that effectively muzzle the genes inside. Not all genes stay quiet, however, and those that escape X inactivation are thought to underpin some autoimmune conditions. But that is not the whole story.

XISTential questions
Almost a decade ago, Howard Chang, a dermatologist and molecular geneticist at Stanford University School of Medicine in California and a co-author of the current study, noticed that many of the proteins that interact with XIST are targets of misguided immune molecules called autoantibodies. These rogue actors can attack tissues and organs, leading to damage characteristic of autoimmune diseases. Because XIST is normally expressed only in XX cells, it seemed logical that the autoantibodies that attack XIST-associated proteins might be a bigger problem for women than for men.

Chang and his colleagues tested their idea using male mice, which don't usually express XIST. The team bioengineered the mice to produce a form of XIST that did not silence gene expression but did form the characteristic RNA–protein complexes. The researchers induced a lupus-like disease in the mice and found that animals that expressed XIST had higher autoantibody levels than those that didn't. Their immune cells were also on higher alert, a sign of predisposition to autoimmune attacks, and the animals showed more extensive tissue damage.

Immune-system overdrive


The X chromosome (artificially coloured).



Notably, the same autoantibodies were also identified in blood samples from people with lupus or the autoimmune disorders scleroderma and dermatomyositis. Montserrat Anguera, a geneticist at the University of Pennsylvania in Philadelphia, points to the human data as validation that the XIST-related mechanisms observed in mice have direct relevance to human autoimmune conditions, with implications for disease management. For example, diagnostics targeting these autoantibodies could assist clinicians in detecting various autoimmune disorders. “This is a cool start,” she says. “If we could use this information to expedite the diagnosis, it would be amazing.”




Coal-fired power: the European Commission wants to phase out the energy source by 2040.

EU UNVEILS CLIMATE TARGET: WHAT SCIENTISTS THINK The goal leans heavily on the largely unproven approach of carbon removal, concerning researchers. By Katharine Sanderson & Carissa Wong

The European Commission has unveiled an ambitious climate target for 2040 — aiming to cut net greenhouse-gas emissions by 90% compared with 1990 levels. Researchers say that the goal, although admirable, risks relying too much on technologies such as carbon removal — which is largely unproven — rather than prioritizing the cessation of fossil-fuel use. Political shifts to the right, with many European Union member states electing governments that are unlikely to prioritize climate policy, might also make the goal difficult to achieve.

"It's going to be very difficult to reach a 90% or 95% emissions reduction without cutting very strongly on fossil fuels," says Richard Klein, a climate researcher at the Stockholm Environment Institute. "Carbon capture and storage is great if it works," says Klein. "But it simply hasn't been shown to work at the scale that would be needed — it remains a pipe dream."

The target was revealed in a report on 6 February. It is not yet legally binding, but it will form the basis of legislation designed to take the EU beyond its existing targets for 2030, and on to its goal for 2050. The commission's current targets, which were set in 2021 and are legally binding, are to reduce net greenhouse-gas emissions by at least 55% from 1990 levels by 2030. The other goal commits the bloc to achieving climate neutrality by 2050. That means ensuring that greenhouse-gas emissions are equal to or less than the emissions absorbed from the atmosphere by natural processes. By 2022, the EU had decreased emissions by 32.5% from 1990 levels.

The 2040 target focuses on a 'net cut', meaning that the goal can be met by reductions to emissions, alongside technologies such as carbon capture and storage (CCS) that lock emissions underground. The commission also wants to phase out coal-fired power by 2040, as well as fossil-fuel subsidies, which it says "do not address energy poverty or just transition".

The latest target incorporates scientists' recommendations, says Joeri Rogelj, a climate scientist at Imperial College London who is a member of the European Scientific Advisory Board on Climate Change, which includes climate scientists from across the EU and which advises the commission. Rogelj says that the board recommended a target of slashing greenhouse-gas emissions by 90% to 95% by 2040. "With this communication, of aiming for a reduction of 90% — it definitely falls in that range," Rogelj says. "It's positive to see that the advice was taken up."

But the finer details of the strategy — in particular the inclusion of targets for carbon removal — have attracted criticism. CCS has not yet been proved to be effective on a large scale. "It'd be very dangerous to rely strongly on carbon capture and storage, because it would give the signal that you can basically continue to invest in fossil fuels, and that will go very much against the idea of what was agreed in Dubai at COP28," says Klein, referring to last year's United Nations climate summit in the United Arab Emirates. The focus on removal echoes a bold target proposed last year by the administration of US President Joe Biden, which also focused on CCS technologies.
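To put the percentages quoted above on one scale, here is a rough, back-of-the-envelope calculation that normalizes 1990 net emissions to 100; it uses only the figures in this story and makes no claims about the EU's detailed accounting.

```python
# Back-of-the-envelope check using only the percentages quoted in the article.
baseline_1990 = 100.0                           # normalize 1990 net emissions to 100
achieved_2022 = baseline_1990 * (1 - 0.325)     # 32.5% below 1990 by 2022 -> 67.5
target_2030   = baseline_1990 * (1 - 0.55)      # at least 55% below 1990   -> 45.0
target_2040   = baseline_1990 * (1 - 0.90)      # proposed 90% below 1990   -> 10.0

print(f"2022 level:  {achieved_2022:.1f}% of 1990 emissions")
print(f"2030 target: {target_2030:.1f}% of 1990 emissions")
print(f"2040 target: {target_2040:.1f}% of 1990 emissions")

# Further cut needed between 2022 and 2040, expressed relative to the 2022 level (~85%).
further_cut = 100 * (achieved_2022 - target_2040) / achieved_2022
print(f"Further cut needed after 2022: {further_cut:.0f}%")
```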

Falling behind
Rogelj is pleased that the communication explicitly separates out emissions reductions from carbon removal — so member states can't just rely on removing carbon; they must also reduce emissions in parallel. "Carbon dioxide removal definitely comes with this kind of risk of obfuscating what actually needs to happen, by expecting that indeed carbon dioxide will be removed," he says.

Although the commission's 2040 target provides a more detailed plan towards achieving net zero — when greenhouse-gas emissions are zero or completely balanced by removal mechanisms — it will be important to ensure that it doesn't detract from efforts to meet the 2030 goals, says Klein. Several countries already aren't on track to meet the 2030 targets, and a political shift to the right in many EU nations makes it less likely that the bloc will meet the existing goal, says Klein. "We've got several countries with governments either just installed or in the making, like in the Netherlands, where the governments are likely to be led by parties who either don't believe in climate change or don't consider climate policy to be particularly the priority," he says.

Researchers also say that, although reducing carbon emissions is crucial, there needs to be more focus on adaptation — lessening the current or future effects of climate change, such as by building flood barriers. "We can't effectively mitigate climate change without more-ambitious finance for adapting to the impacts it's already having," says Mikael Allan Mikaelsson, a climate-policy researcher at the Stockholm Environment Institute.



The new collider would smash together electrons and positrons (artist’s impression).

CERN’S $17-BILLION SUPERCOLLIDER WOULD DWARF LHC A feasibility study on the Future Circular Collider identifies where and how the machine could be built. By Elizabeth Gibney & Davide Castelvecchi

Europe is pushing forward with plans to build a 91-kilometre-long, 15-billion-Swiss-franc (US$17-billion) supercollider under the French and Swiss countryside. The machine would allow researchers to study the Higgs boson in detail. But scientists are under pressure to convince funders that such an enormous investment is worth it, given the lack of new physics revealed by the Large Hadron Collider (LHC).

Details of CERN's plan emerged from a mid-term report studying the feasibility of the Future Circular Collider (FCC), which would dwarf its predecessor, the 27-kilometre LHC at CERN, Europe's particle-physics laboratory near Geneva, Switzerland. The first phase of the study — which focused on identifying where and how such a machine could be built in the CERN region — revealed "no technical or scientific showstoppers" that would prevent its construction, said Eliezer Rabinovici, president of the CERN Council, the organization's governing body, at a press briefing on 5 February.

Construction of the machine, which will require boring a circular tunnel around 200 metres underground, could begin as early as 2033. The 91-kilometre tunnel, which the design suggests should be interrupted by four experimental halls, would encircle an area bigger than Chicago in Illinois. The council reviewed the report on 2 February, but the document itself has not been made public. The full study will be published next year, with a go or no-go decision on the project expected before 2028. Former CERN director-general Chris Llewellyn Smith says he was puzzled by the decision not to publish the mid-term report, but he thinks that CERN is on the right track.

Smashing particles
The planned machine would collide electrons with their antimatter partners, positrons, from around 2045, with the aim of generating and studying in precise detail around one million Higgs bosons. Many physicists think that studying the particle, which was discovered in 2012 and interacts like no other, represents physics' strongest chance of finding cracks in the standard model, a wildly successful but incomplete model of particles and forces.

Physicists called for a study into the feasibility of the FCC in 2020 as part of a prioritization exercise known as the European Strategy for Particle Physics. Fabiola Gianotti, CERN's director-general, told journalists that the strategy found the FCC to be "the most compelling scientific instrument" of those it considered.

Building the FCC is far from a done deal. A large part of the price tag will be covered by the existing CERN budget, added Gianotti. But the project will still require financial contributions from the countries that are full members of CERN, and from others such as the United States and Japan. The briefing did not provide any information on what these costs could be. "They seemed to be dodging giving specific numbers, like cost, and what might be shared by non-member states," says Michael Riordan, a physics historian based in Eastsound, Washington.

More mega-colliders
Meanwhile, other 'Higgs factory' designs are in the works around the world. The Japanese government has shown interest in hosting the long-planned International Linear Collider, while China is designing a ring-shaped machine called the Circular Electron Positron Collider. Gianotti said that the European Strategy for Particle Physics had found that the FCC had greater physics potential than a linear collider, because it could produce Higgs bosons at a higher rate and because the same tunnels could later be used for a much higher-energy machine that collides protons.

Not everyone in the particle-physics community is in favour of CERN's proposed machine. Donatella Lucchesi, a particle physicist at the University of Padua in Italy, disagrees with the organization's focus on the FCC. "I don't believe this is good for our community, for scientific and other reasons." Lucchesi is part of a team studying an alternative technology for future colliders based on colliding beams of muons.

Gianotti said that building the FCC would not prevent CERN from contributing to a muon collider, a facility that an influential panel of US scientists said in December should be explored. Muons are much more massive than electrons, allowing for higher-energy collisions. But no one knows yet whether building a muon collider is even possible. "Of course we will now work with our US colleagues if they plan to build a new collider in the United States, but it's on a timescale which is totally different from the timescale of the FCC," she said.

Some scientists argue that the cost of building such mega-colliders outweighs their benefits, especially when theory gives no clear steer on what could be discovered. "It's true that at the moment, we do not have a clear theoretical guidance on what we should look for," said Gianotti, but she said this was an argument in favour of building a new machine. "The instruments will allow us to make a big step forward towards addressing the question, also telling us what are the right questions," she said.



THE TRY-IT-ALL APPROACH TO PRECISION MEDICINE

Researchers are blasting patients' cancer cells with dozens of drugs in the hope of finding the right treatment. By Elie Dolgin




The blood cancer had returned, and Kevin Sander was running out of treatment options. A stem-cell transplant would offer the best chance for long-term survival, but to qualify for the procedure he would first need to reduce the extent of his tumour — a seemingly insurmountable goal, because successive treatments had all failed to keep the disease in check. As a last throw of the dice, he joined a landmark clinical trial.

Led by haematologist Philipp Staber at the Medical University of Vienna, the study is exploring an innovative treatment strategy in which drugs are tested on the patient's own cancer cells, cultured outside the body. In February 2022, researchers tried 130 compounds on cells grown from Sander's cancer — essentially trying everything at their disposal to see what might work. One option looked promising. It was a type of kinase inhibitor that is approved to treat thyroid cancer, but it is seldom, if ever, used for the rare subtype of lymphoma that Sander had. Physicians prescribed him a treatment regimen that included the drug, and it worked. The cancer receded, enabling him to undergo the stem-cell transplant. He has been in remission ever since. "I'm a bit more free now," says Sander, a 38-year-old procurement manager living in Podersdorf am See, Austria. "I do not fear death any more," he adds. "I try to enjoy my life."

His story is a testament to this kind of intensive and highly personalized drug-screening method, referred to as functional precision medicine. Like all precision medicine, it aims to match treatments to patients, but it differs from the genomics-guided paradigm that has come to dominate the field. Instead of relying on genetic data and the best available understanding of tumour biology to select a treatment, clinicians throw everything they've got at cancer cells in the laboratory and see what sticks. But what it sometimes lacks in elegance, it could make up for in results: in pilot studies, Staber and his colleagues found that more than half of people with blood cancer whose treatment was guided by functional drug testing enjoyed longer periods of remission compared with their experiences of standard treatments1,2. Large-scale testing of genome-directed approaches suggests that the techniques are very effective against some cancers, yet they benefit, at most, only around 10% of patients overall3.

Staber and his group's latest trial is the first to compare functional and genome-guided approaches head-to-head alongside treatments directed by standard pathology and physician intuition. "That'll be a very powerful study, and it will probably vindicate the utility of these functional assays," says Anthony Letai, a haematologist at the Dana-Farber Cancer Institute in Boston, Massachusetts, and president of the Society for Functional Precision Medicine, a professional organization founded in 2017 to advance the field. And, if anecdotal reports serve as any indication, the try-everything tactic seems to bring about meaningful improvements, even when the genetic sequence of a tumour provides no actionable information, as was the case for Sander.

Companies around the world are already offering these kinds of personalized drug-testing service. But proponents of the strategy still have much to prove. Although the concept of screening a bunch of drugs seems simple, the methods used to culture cancer cells outside the body can be technically demanding, time-consuming and costly. The challenges are particularly acute for solid tumours, which live in complex environments inside the body; replicating those conditions is no easy feat. Researchers are trying wildly differing methods that range from growing tumour samples in mice and chicken embryos to cultivating carefully engineered organoids, and even delivering infinitesimal amounts of various medicines to a tumour while it's still in a patient. Figuring out what works and what is practical, with regard to cost and scale, won't be easy. But momentum is growing, says Christopher Kemp, a cancer biologist at the Fred Hutchinson Cancer Center in Seattle, Washington. "This is a revolution. Patients are demanding this approach."

Behind the screen
Down a long corridor, beyond a set of tangerine-coloured doors, lies the Vivi-Bank at the Medical University of Vienna. Short for 'Viable Biobank', the room is brimming with liquid-nitrogen dewars, each containing frozen lymphoma samples. When surgeons extract biopsies from cancerous lymph nodes, they usually immerse the tissue in formaldehyde to prepare for standard pathology analyses. That kills the cells, however, rendering them useless for functional testing. So, to enable drug screens, Staber and haematopathologist Ingrid Simonitsch-Klupp, who jointly oversee the Vivi-Bank, had to convince their surgical colleagues to change their ways, keeping the tissue alive and sending it quickly for processing and storage. "Fresh tissue is the most important thing," Simonitsch-Klupp says.

Some of that tissue arrives in Staber's lab, where researchers break up the cells using a knife, forceps and a nylon strainer, creating a slurry to distribute across a 384-well plate. In each well, they test a different drug compound — chemotherapy agents, enzyme-targeted drugs, immune-modulating therapies and more. After a night of incubation, lab testing reveals which drugs are active against the cancer and which ones are not. A team of clinicians, known as a molecular tumour board, then uses this information to determine the most appropriate course of treatment for each patient.

Several groups have reported success with this general approach. In a trial from the University of Helsinki, for example, researchers found that individualized drug screening of leukaemia cells provided informative results substantially faster than did genomic profiling, yielding impressive clinical responses as well4. Of 29 people with treatment-resistant acute myeloid leukaemia (AML), 17 responded to drug-screening-informed therapies and entered remission. Likewise, Candace Howard, a radiologist at the University of Mississippi Medical Center in Jackson, and her colleagues published a study last year showing that people with aggressive brain tumours live longer when their chemotherapy regimens are guided by lab testing than when their treatment is directed by a physician's intuition alone5 — with lower annual health-care costs to boot6. "It's cheaper and it's more effective," says Jagan Valluri, a cell biologist at Marshall University in Huntington, West Virginia, who co-founded a company called Cordgenics, also based in Huntington, to commercialize the assay used in Howard's trial.

Functional drug testing is not a new idea. It was embraced by cancer researchers in the late twentieth century, but soon fell out of favour — largely owing to the limitations of assays at the time and a restricted repertoire of anti-cancer drugs. Technological improvements and an expanded pharmacopoeia have changed the picture. Yet, as with most lab-based testing systems, the necessary equipment can be expensive and requires trained personnel to operate it. That's a big limitation, according to Joan Montero, a biochemist at the University of Barcelona in Spain, because it hinders the broad implementation of functional precision drug testing, especially in low-resource settings. To address these challenges, Montero and his colleagues have been developing inexpensive and portable microfluidic devices for rapid, on-site testing of cancer cells7.
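As a rough illustration of the readout step only — not the Vienna group's actual analysis pipeline — the sketch below ranks compounds by how strongly they reduce a measured viability signal relative to untreated control wells. The function name, threshold and example values are assumptions for illustration.

```python
import statistics

def rank_active_drugs(well_readings, control_readings, activity_threshold=0.5):
    """Hypothetical post-screen analysis: flag compounds whose wells show
    markedly lower viability signal than untreated control wells.

    well_readings: dict mapping drug name -> list of viability measurements
    control_readings: list of viability measurements from untreated wells
    """
    control_mean = statistics.mean(control_readings)
    ranking = []
    for drug, readings in well_readings.items():
        relative_viability = statistics.mean(readings) / control_mean
        ranking.append((drug, relative_viability))
    # Most active first: lowest surviving fraction of cancer cells.
    ranking.sort(key=lambda item: item[1])
    hits = [(drug, v) for drug, v in ranking if v <= activity_threshold]
    return hits, ranking

if __name__ == "__main__":
    hits, full_ranking = rank_active_drugs(
        {"kinase inhibitor A": [0.21, 0.18],
         "chemo agent B": [0.74, 0.81],
         "drug C": [0.43, 0.39]},
        control_readings=[1.00, 0.95, 1.05],
    )
    print("candidate hits:", hits)
```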



Multi-well plates can be used to test the effectiveness of many cancer drugs at once.

Their microfluidic platform remains years away from practical use, however. And it might guide treatment only for certain types of cancer. That’s because protocols developed for tailoring therapies against blood cancers do not always work in solid tumours of the breast, lung, liver and other organ systems. Biopsies from solid tumours often yield lower cell counts, requiring extra steps to culture the cells before drug screening. Moreover, solid tumours have complex interactions with healthy cells in their surroundings, meaning that models should be more sophisticated.

Growing pains
The first challenge remains growing enough tumour tissue to test. David Ziegler, a paediatric neuro-oncologist at Sydney Children's Hospital in Australia, had set out to perform individualized drug screens for around 1,000 children with high-risk cancers as part of the country's Zero Childhood Cancer Program. But in pilot testing, he and his team discovered that, after several days under lab conditions, up to one-fifth of the patient samples either contained no cancer cells at all, or the cancer cells were being outcompeted by normal, healthy cells8. The researchers quickly learnt to check cultures for tumour cells — through imaging, cellular analysis or genetic profiling — before testing them against drugs.

Cell cultures from solid tumours can, in principle, be subjected to the same kind of testing used for blood cancers. But an increasing number of research teams are crafting elaborate structures, known as organoids, to test. These patient-derived 3D tissue models — made by growing tumour samples in specialized scaffolds over the course of several weeks — are designed to replicate the intricate tissue architecture of a tumour, thereby offering a more accurate representation of the cancer that physicians are looking to treat. "We want to put the tumour cells in an environment that's as close [as possible] to how they were growing in the body," says Alice Soragni, a cancer biologist at the University of California, Los Angeles.

The process can add weeks to the timeline for obtaining drug-sensitivity data. But the extra effort and time investment is worth it, says Carla Grandori, co-founder and chief executive of SEngine Precision Medicine in Bothell, Washington. In clinical validation studies, Grandori and her SEngine colleagues found that the drug-screening results using organoids aligned with patient outcomes with around 80% accuracy. Those findings are not yet published, but the company — which counts Kemp among its founders — has put out case reports over the past year describing people with difficult-to-treat cancers who, after seemingly running out of treatment options, found unexpectedly effective remedies through organoid drug testing9,10.

Heidi Gray, a gynaecological oncologist at the University of Washington Medical Center in Seattle, treated one of these patients, a woman with ovarian cancer. "Her response was definitely one of the best I've seen," she says. The drug they tried is generally used to treat leukaemia, but it helped to beat back the woman's ovarian tumour for more than a year, allowing her to travel and enjoy precious time with loved ones before ultimately succumbing to the disease. "We profoundly improved her quality of life," Gray says, "and that would not have happened without the knowledge provided by this test."

Model of efficiency
In the hope of testing drugs against even more realistic cancer systems, some researchers have opted to study mice implanted with fresh tumour specimens, a model system known as a patient-derived xenograft. These personalized 'avatars' were once heralded as the next big thing in cancer care. But it soon became evident that many tumours do not grow in mice, that drug screening in xenografts takes too long to provide timely recommendations and that the cost of the approach — often exceeding US$50,000 — is more than most patients and health-care systems can bear. "It was too slow, too expensive and not robust enough," says David Sidransky, an oncologist at Johns Hopkins University School of Medicine in Baltimore, Maryland, and a co-founder of Champions Oncology, a leading developer of xenograft models, based in Hackensack, New Jersey. Although some drug companies continue to use xenografts for research, and some oncologists think that there are certain situations in which they can inform patient care, for the most part, researchers have moved away from mice for functional testing in the clinic.

Some have moved on to other living systems. One such alternative comes from cancer biologist Hon Leong and his colleagues at Sunnybrook Hospital in Toronto, Canada, who devised a system for screening drugs on tumour biopsy samples cultivated on developing chicken embryos. The approach is both rapid and inexpensive, says Leong, allowing researchers to assess different drug options in a matter of weeks rather than the months required for mice. In ongoing trials focused on advanced breast and kidney cancers that have spread to other parts of the body, Leong and his team have successfully used the chicken-embryo system to identify individuals who would benefit from immune therapies. These are among the most effective cancer treatments today, and a drug class that few other avatar systems can accurately assess, says Leong.

Another approach comes from Ross Cagan, a developmental biologist at the University of Glasgow, UK, who uses genomic sequencing and genetic engineering to recreate the unique characteristics of a patient's tumour in a custom-made fruit fly. This involves introducing mutated forms of cancer-promoting genes or incorporating sequences that restrict cancer-suppressing genes — generally between 5 and 16 alterations in total. Feeding the flies with food containing various medications can then reveal therapeutic regimens that suppress cancer growth, either by acting directly on tumour cells or by influencing the animal's biology in ways that indirectly impede cancer progression. This is how Cagan and his colleagues identified a new three-drug cocktail — consisting of a lymphoma treatment, a blood-pressure medicine and an arthritis therapy — that, when administered to a man with a rare tumour of the salivary glands, helped to stabilize the cancer for a year11. In another case, involving a man with an aggressive form of colon cancer, the use of fly avatars guided the team to administer a melanoma drug alongside a bone-strengthening agent, resulting in notable tumour shrinkage and a clinical response that lasted for nearly a year12. A biotech start-up in London called Vivan Therapeutics now offers this bespoke fly-making and drug-screening service for $15,000 per patient.

Any model invariably has biological limitations, however, and so some researchers have elected to do away with animal stand-ins or cellular replicas entirely. Instead, they have developed implantable devices that allow clinicians to test drugs directly on patient tumours — and to do so while the cancer is still inside the body. Last year, bioengineer Oliver Jonas at Brigham and Women's Hospital in Boston, and his colleagues demonstrated the feasibility of this strategy in people with lung13 and brain14 cancers. In small trials, surgeons inserted tiny drug-releasing devices, each loaded with nanodoses of up to 12 drugs, into tumours as people underwent cancer-removal surgery. Over the course of the operation, drugs flowed into the surrounding tissue from separate reservoirs in a device the size of a grain of rice. Those tissues, along with the device itself, were then removed at the end of the procedure, and subsequently inspected for molecular indicators of drug action. So far, the data collected haven't been used to guide treatments, but retrospective analyses hinted at potential benefits if they had. Two companies — Boston-based Kibur Medical, co-founded by Jonas, and Presage Biosciences, headquartered in Seattle — are now developing these kinds of in situ drug-testing platform.

A choice opportunity
An assay's treatment predictions are only as good as a patient's ability to access the recommended drugs — and, when those are expensive cancer agents that have not been approved for the desired use, costs and insurance reimbursement can be impediments.

Pamela Becker, a haematologist at City of Hope cancer centre in Duarte, California, has encountered some of these problems when trying to prescribe drugs that were identified during assay-guided treatment trials for people with multiple myeloma and other blood cancers. "I couldn't get my top choice," she says. Becker had to go down the list of recommendations, eventually finding drugs that would be covered by insurance.

Another financial obstacle remains reimbursement for the functional tests themselves. In the United States, an official policy enacted in 1996 classifies drug-sensitivity assays as 'experimental', making them ineligible for coverage under Medicare, the federal government's giant health insurance programme for older people. Changing reimbursement rules will thus require reversing that decades-old decision, says Bruce Yeager, an independent consultant in functional precision diagnostics based in Johns Creek, Georgia — an extra hurdle that means "we're not starting from a point of neutrality", he says. "We're starting from negativity."

Combating such policies and entrenched practices hinges on the availability of compelling clinical data. But accumulating such data can be challenging when the medical establishment is not geared towards enabling functional drug testing. It's something of a catch-22, says Letai. "But that cycle is going to break in the next couple of years," he says, "and then I think you're going to see a sort of non-linear adoption of these strategies, because the power and the need for them is so great."

Functional testing strategies might even work for conditions outside the cancer arena. In cystic fibrosis, for example, organoid models made from rectal or intestinal tissue are beginning to help clinicians to find effective drug regimens for people with rare disease-causing mutations who are not eligible to receive any approved treatments. "It just makes a lot of sense," says Jeffrey Beekman, a cystic-fibrosis researcher at the University Medical Center Utrecht in the Netherlands, who has pioneered the approach.

Many cancer researchers feel the same way, and now they just need to prove it to the wider medical community. All eyes are therefore on Staber and his randomized trial, which researchers anticipate will go a long way towards convincing clinicians that genomics is not the be-all and end-all of personalized care. "Paradigm shifts can be very threatening to people," says Howard, the University of Mississippi radiologist, "but it shouldn't be threatening. It's just another tool in our arsenal against disease."


Elie Dolgin is a science journalist in Somerville, Massachusetts.

Joan Montero (standing) and his colleagues are developing a low-cost microfluidic device.

1. Kornauth, C. et al. Cancer Discov. 12, 372–387 (2022).
2. Snijder, B. et al. Lancet Haematol. 4, e595–e606 (2017).
3. O'Dwyer, P. J. et al. Nature Med. 29, 1349–1357 (2023).
4. Malani, D. et al. Cancer Discov. 12, 388–401 (2022).
5. Ranjan, T. et al. Cell Rep. Med. 4, 101025 (2023).
6. Ranjan, T. et al. Neurooncol. Adv. 5, vdad055 (2023).
7. Manzano-Muñoz, A. et al. npj Precis. Oncol. 6, 90 (2022).
8. Mayoh, C. et al. Cancer Res. 83, 2716–2732 (2023).
9. Al-Aloosi, M. et al. Front. Oncol. 13, 1267650 (2024).
10. Gray, H. J. et al. npj Precis. Oncol. 7, 45 (2023).
11. Bangi, E. et al. iScience 24, 102212 (2021).
12. Bangi, E. et al. Sci. Adv. 5, eaav6528 (2019).
13. Tsai, L. L. et al. Ann. Surg. 277, e1143–e1149 (2023).
14. Peruzzi, P. et al. Sci. Transl. Med. 15, eadi0069 (2023).



Students protest against the actions of Florida governor Ron DeSantis that affect New College of Florida, in Sarasota.

HOW CULTURE WARS ARE AFFECTING US UNIVERSITIES

Scientists and other academics worry that political pressure on universities is growing and could limit research and teaching. By Emma Marris

It's been a tumultuous time for higher education in the United States. Since early December, the presidents of two high-profile universities have resigned, both following comments they made during a congressional hearing about the Israel–Hamas war. The resignations are part of a growing politicization of higher education in the country — one that is having an impact on science and could lead to upheavals in the US research community.

In the past few years, conservatives at think tanks and in government, especially in right-leaning states, have pushed through laws and political appointments that they say are intended to reform universities. Ilya Shapiro, a senior fellow at the conservative Manhattan Institute in New York City and a trustee of Florida Polytechnic University in Lakeland, told Nature: "For higher ed to survive, for science to thrive, we must restore academic freedom and colour-blind meritocracy in place of identitarian social-justice activism." But the interventions have left some scientists looking to move to less conservative states, while others worry that their research and funding could get caught in the crossfire.



Claudine Gay's resignation as president of Harvard University in Cambridge, Massachusetts, in January and Elizabeth Magill's resignation from the University of Pennsylvania in Philadelphia in December came after they both appeared during a congressional hearing on student protests relating to the Israel–Hamas war. Student protesters chanted pro-Palestinian slogans that are regarded by some as antisemitic. Some politicians, principally right-wing, sharply criticized the university leaders for not unequivocally denouncing such chants, which spurred campaigns for the presidents to step down. Gay also faces charges of plagiarism.

Elise Stefanik, a Republican member of Congress who called for Gay and Magill's resignations after the hearing, had criticized Harvard in the past and decried "the Ivory Tower's march toward a monoculture of like-minded, intolerant liberal views".

The concerns over antisemitism on campuses join a series of other issues that have drawn scrutiny — including diversity, equity and inclusion (DEI) initiatives, transgender rights and an academic framework for studying patterns of bias in society known as critical race theory. These issues have rallied conservatives and lent momentum to the movement to wrestle higher education away from what they see as liberal control.

Conservative critics argue that campus antisemitism has grown out of an environment at US universities that focuses on DEI and where social issues are seen through the lens of identity and diversity, oppressor and oppressed. For conservatives, shutting down DEI efforts in higher education is viewed as a way to protect academic freedom at universities where liberal thinking has become compulsory. Many in academia, however, see measures to restrict DEI efforts as political interference that is itself a threat to academic freedom.

"What we're seeing is an attempt by the right to convince the public that higher education is broken," says Irene Mulvey, president of the American Association of University Professors (AAUP), headquartered in Washington DC. "And they need to fix it by squashing academic freedom." DEI staff are not activists, Mulvey insists, and DEI is not a leftist ideology that is being forced on faculty members and students. "DEI is there to help and support students from under-represented groups, students of colour, first-generation students, veteran students with disabilities, all sorts of students," she says. Responding to critics of DEI, Mulvey says, "I don't see any evidence of indoctrination in the classroom."



Divided over diversity
Universities have invested heavily in DEI offices and programmes, especially since the summer of 2020, when protests against the killing of George Floyd spread across the United States. The expansion of DEI has prompted some backlash from the left as well as the right. Leftist critiques tend to focus on whether DEI efforts are effective in achieving their stated goals or whether such programmes have become co-opted by those in power and used as box-checking exercises that deflect calls for more meaningful change. However, by and large, DEI has been broadly embraced by the scientific community. Many universities and companies across the globe have come out in support of DEI efforts and antiracism initiatives, including Nature (see go.nature.com/4bxjb5m).



Claudine Gay is a former president of Harvard University in Cambridge, Massachusetts.

Research leaders have argued for DEI to be used as a tool to counteract pre-existing structural biases that have limited the diversity of science, and thus limited the questions that science asks and the hypotheses that science generates. Inclusion, in this view, is pragmatically good for science as well as a moral imperative.

Florida has gone further than any other state in intervening in public higher education. Early last year, Florida governor Ron DeSantis introduced legislation, which came into effect last July, aimed at stopping "the tactics of liberal elites who suppress free thought in the name of identity politics and indoctrination", according to a statement by his office. Florida banned public-university spending on DEI and directed the state board of governors to report on "any curriculum … that is based on theories that systemic racism, sexism, oppression, and privilege are inherent in the institutions of the United States and were created to maintain social, political, and economic inequities". That directive could affect science courses that touch on topics such as racial disparities in public health or the history of science.

The legislation prohibits public universities from investing in programmes or campus activities that "promote or engage in political or social activism". Activism is left undefined in the text, but a draft regulation defines activism as "any activity organized with a purpose of effecting or preventing change to a government policy, action, or function, or any activity intended to achieve a desired result related to social issues". Interpreted broadly, the law could rule out any activities or even research efforts that seek to mitigate climate change, make birth control more accessible or increase vaccination rates. "The language is vague," says Mulvey. "It's deliberately vague, so that people will overcompensate and self-censor, so they won't get into trouble." DeSantis's office did not respond to a request for comment.

In January, Florida removed the course 'principles of sociology' from the list of options that students can take to fulfil general graduation requirements. At the board of governors meeting where the vote to remove the course was held, Florida's education commissioner Manny Diaz said, "While that field was very scientific, at one point, it has moved away from that." In December, on the social-media platform X (formerly Twitter), Diaz wrote: "Sociology has been hijacked by left-wing activists and no longer serves its intended purpose as a general knowledge course for students."

In December, the AAUP issued a report chronicling political interference in Florida's public university system, including anti-DEI legislation, the appointment of political allies of DeSantis to university leadership positions, and the installation of a post-tenure review system that makes it relatively easy for universities to get rid of faculty members.

Some faculty members have left Florida in response to the changes. There are many individual anecdotes, but as yet there are no clear data that show a major exodus — which could take some time to emerge because of the time it takes to fill academic appointments and the difficulty of finding available university positions. An informal survey conducted by organizations that represent faculty members in southern states found that many people are interested in moving.

Neuroscientist Elizabeth Leininger has already left. She once taught at New College of Florida in Sarasota, a small public institution with a left-wing reputation, where more than 10% of its bachelor-of-science graduates went on to earn doctorates, the 13th-highest rate in the country.


Leininger attributes the high rate in part to a curriculum that focuses on undergraduate research and independent study. "There's a lot about the structure of New College that is a little bit hippie," Leininger says. "But it turns out that structure was really excellent for training scientists."

New College of Florida was thrown into upheaval in January last year when DeSantis appointed several members to the board of trustees, who proceeded to give the university an ideological overhaul in what one trustee described as "the opening move in a conservative counter-revolution". Immediately, all DEI initiatives at New College ceased. Soon after, the college's president was fired and five faculty members were denied tenure owing in part to "a renewed focus on ensuring the college is moving towards a more traditional liberal arts institution", according to a memo from Richard Corcoran, who was appointed as New College's president. The faculty members who were denied tenure included two chemists and an oceanographer. They were all applying one year early, so they could reapply next year, if they choose.

"Science thrives if we make sure that everybody has a place in it, and that everyone feels like they can be a scientist," Leininger says. "I didn't want to work at a place that wouldn't allow me to reach all of my students and teach inclusively." Leininger had begun looking for a new position as soon as the new trustees were appointed in January. By July, she was gone.

Troubles in Texas
Although other states have not adopted as many changes as Florida, similar stories are playing out in other conservative-leaning states. On 1 January, a law came into effect in Texas prohibiting public universities from maintaining DEI offices or using DEI statements in hiring processes. In a statement, the bill's sponsor, state senator Brandon Creighton, said: "The days of political oaths, compelled speech, and racial profiling in university hiring are behind us." Anti-DEI laws have also been signed in North Dakota, North Carolina, South Dakota and Tennessee.

The Texas bill, SB 17, does specifically state that the ban is not intended to apply to "academic course instruction" or "scholarly research", but uncertainty about the laws is leading to self-censorship, as some had feared. When the law came into effect, psychologist Idia Binitie Thurston, who was then at Texas A&M University in College Station, was working on an internal grant application with colleagues involved in diversity studies. Her proposed research project would have followed families with adolescents in Texas and looked at how a large number of factors — including the experience of racial discrimination — affect adolescent health.

She says her team asked Gerianne Alexander, the university's associate vice-president for research, whether their research focus would be a problem, given the new policy. When they received what she describes as a "non-specific, non-reassuring" response, the researchers decided to scrap the proposal. "Our concern was: can we mention inequities?" Thurston says. "Can we talk about these kinds of issues?" Alexander said she did not recall her communication with the team, adding that "the university administration has communicated to faculty that SB 17 does not pose restrictions on research. There would be no reason to not seek internal or external support for research on any topic".

Not long after that interaction, Thurston left Texas to take a position in Boston, Massachusetts. She says she is committed to continuing her work, in part so her data can inform debates on whether specific interventions taken to reduce social and racial inequities are effective. "We have to find places where we can do it, and do it," she says.

Another target of right-wing activists has been diversity statements, in which job candidates explain their approach to integrating diversity, equity and inclusion in their classrooms and laboratories. The use of diversity statements in hiring is seen by many conservatives as an ideological litmus test — a kind of leftist loyalty oath. Heather Mac Donald, a fellow at the Manhattan Institute who opposes DEI policies, told Nature that "many schools screen STEM [science, technology, engineering and mathematics] faculty applicants based on the enthusiasm evinced for diversity, equity and inclusion in their mandatory DEI statements. Such enthusiasm has no relation whatsoever to scientific breakthroughs and is a form of thought control."

That's not a perception shared by Leininger. "Our job as professors and scientists at public colleges is to serve the public," she says, and that means helping students who meet the university's admittance criteria to "realize their academic potential". Diversity statements help to identify teachers who can do that, she says, by assisting hiring committees to select candidates who are "aware that not all students have the same academic opportunity" and have some ideas about how their teaching could connect with students from various backgrounds. "That's not indoctrination," Leininger says. "That's just being a good teacher."


More recent moves, stemming from the controversy over antisemitism, go beyond dismantling DEI programmes. In December, New York representative Michael Lawler added an addendum to a budget bill, which must be passed to fund the government's operations. The Lawler amendment would remove federal funding from public institutions of higher education "that authorize, facilitate, provide funding for, or otherwise support any event promoting antisemitism". Lawler says his bill is not political interference. A spokesperson for Lawler said, "This legislation is not about political oversight of campus activities. It is about ensuring the safety of students on campus."

Lawler's office told Nature that this bill would apply only to funds from the US education department and not from agencies such as the National Institutes of Health (NIH). But Tobin Smith, a policy specialist at the Association of American Universities (AAU), says the bill's language could also be read as applying to grant funding from other federal agencies, such as the NIH, a crucial funder of university research grants.

Barbara Snyder, president of the AAU and a former president of Case Western Reserve University in Cleveland, Ohio, says Republicans are putting the reputation of the US research enterprise at risk. "It would be incredibly short-sighted — with long-term negative consequences for all Americans — if policymakers were to put these cutting-edge, life-saving research efforts in jeopardy simply to make a political point," Snyder says.

Although antisemitism is currently the issue around which right-wing activists are organizing their efforts, the next focal issue could be scientific, according to Isaac Kamola, a political scientist at Trinity College in Hartford, Connecticut, who studies conservative campaigns to reshape higher education. "Next year, it could be an issue of climate change, the science around electric vehicles, medicines, COVID," Kamola says. In fact, Ohio legislators have proposed a bill that could limit the teaching of "controversial beliefs or policies", which includes climate policies.

Mulvey says that the overall campaign to shape what is studied and taught is likely to affect some scientists directly. As a researcher herself, who studies "completely abstract mathematics", she says all scientists should be concerned. "Political interference in higher education is simply disastrous to the academic mission of the university, and the mission of higher ed as a public good in a democracy."

Emma Marris is a freelance journalist in Portland, Oregon.


Books & arts



Bill Gates and other wealthy individuals who spend vast sums on research often back some types of solution over others.

Do billionaire philanthropists skew global health research? Personal priorities too often shape where charitable funding goes. By Andy Stirling

G

lobal wealth, power and privilege are increasingly concentrated in the hands of a few hyper-billionaires. Some, including Microsoft founder Bill Gates, come across as generous philanthropists. But, as investigative journalist Tim Schwab shows in his latest book, charitable foundations led by billionaires that direct vast amounts of money towards a narrow range of selective ‘solutions’ might aggravate global health and other societal issues as much as they might alleviate them. In The Bill Gates Problem, Schwab explores this concern compellingly with a focus on Gates, who co-founded the technology giant Microsoft in 1975 and set up the William H. Gates Foundation (now the Bill & Melinda Gates Foundation) in 1994. The foundation spends

billions of dollars each year (US$7 billion in 2022) on global projects aimed at a range of challenges, from improving health outcomes to reducing poverty — with pledges totalling almost $80 billion since its inception. Schwab offers a counterpoint to the prevailing popular narrative, pointing out how much of the ostensible generosity of philanthropists is effectively underwritten by taxpayers. In the United States, for example, 100,000 private foundations together control close to $1 trillion in assets. Yet up to three-quarters of these funds are offset against tax. US laws also require only sparse scrutiny of how charities spend this money. Had that tax been retained, Schwab reasons, the government might have invested it in more diverse and accountable ways. Instead, the dispersal of these funds is being driven mainly by the personal interests of a handful of superrich individuals. By entrenching particular pathways and sidelining others, philanthropy is restricting progress towards the global Sustainable Development Goals by limiting options (see also strings.org.uk). Many Gates foundation programmes are




Transparency is scarce on whether charitable investments in vaccine companies might benefit philanthropists or their contacts.

shaped and evaluated using data from the US Institute for Health Metrics and Evaluation (IHME), which was founded — and is lavishly funded — by the foundation. Schwab suggests that such arrangements could be considered conflicts of interest, because in-house ‘evaluations’ often tend to justify current projects. In the case of malaria, for instance, the numbers of bed nets distributed in tropical countries — a metric tracked by the IHME — can become a proxy for lives saved. Such circularity risks exaggerating the efficiency of programmes that aim to tackle high-profile diseases, including HIV/AIDS, potentially at the expense of other treatable conditions for which solutions might remain unexplored (see also Philip Stevens’s 2008 book Fighting the Diseases of Poverty).

Limited scope

Similarly restricted views exist in other areas, too. In the energy sector, for instance, Gates flouts comparative performance trends to back exorbitantly expensive nuclear power instead of much more affordable, reliable and rapidly improving renewable sources and energy storage. In agriculture, grants tend to support corporate-controlled gene-modification programmes instead of promoting farmer-driven ecological farming, the use of open-source seeds or land reform. African expertise in many locally adapted staples is sidelined in favour of a few supposedly optimized transnational commodity crops. Furthermore, the Gates foundation’s support for treatments that offer the best chances of accumulating returns on intellectual property risks eclipsing the development of preventive public-health solutions, Schwab notes. For example, the foundation promotes contraceptive implants that control women’s fertility, instead of methods that empower women to take control over their own bodies. Similarly, the foundation often backs for-profit, Internet-based education strategies rather than teacher-led initiatives that are guided by local communities.

Throughout its history, the Gates foundation’s emphasis on ‘accelerating’ innovations and ‘scaling up’ technologies, as noted on its website (gatesfoundation.org), obscures real-world uncertainties and complexities, and ignores the costs of lost opportunities. For example, Gates’s aim to eradicate polio is laudable. But pharma-based actions are slow — and can come at the expense of practical solutions for less ‘glamorous’ yet serious scourges, such as dirty water, air pollution or poor housing conditions. Thus, by promoting interventions associated with the technological processes of extraction, concentration and accumulation that underpinned his own corporate success, Gates helps to tilt the playing field. His foundation tends to neglect strategies built on economic redistribution, institutional reform,

cultural change or democratic renewal. Yet in areas such as public health, disaster resilience and education, respect for diverse strategies, multifaceted views, collective action and open accountability could be more effective than the type of technology-intensive, profit-oriented, competitive individualism that Gates favours. Schwab traces the origins of this ‘Gates problem’ to the 1990s. At that time, he writes, Gates faced hearings in the US Congress that challenged anti-competitive practices at Microsoft and was lampooned as a “monopoly nerd” in the animated sitcom The Simpsons for his proclivity to buy out competitors. By setting up the Gates foundation, he pulled off a huge communications coup — rebranding himself from an archetypal acquisitive capitalist to an iconic planetary saviour by promoting stories of the foundation’s positive impact in the media. Yet since then, Schwab shows, Gates has pursued a charitable monopoly similar to the one he built in the corporate world. He has shown that in philanthropy — just as in business — concentrated power can manufacture ‘success’ by skewing news coverage, absorbing peers and neutralizing oversight. For instance, Schwab documents how the voices of some non-governmental organizations, academia and news media have been muted because they depend on Gates’s money. While dismissing “unhinged conspiracy theories” about Gates, he describes a phenomenon that concerned activists and researchers call the “Bill chill”. By micromanaging research and dictating methods of analysis, the foundation effectively forces scientists to go down one path — towards the results and conclusions

that the charity might prefer. These issues are exacerbated by Gates applying the same energy that he used in business to coax huge sums from other celebrity donors, which further concentrates the kinds of innovation that benefit from such funding. But Schwab has found that transparency is scarce on whether or how Gates’s private investments or those of his contacts might benefit from his philanthropy. Questions arise over the presence of people with personal ties to Gates or the foundation on the board of start-up companies funded by the charity, for example.

Bigger picture

One minor gripe with the book is that although Schwab excels in forensically recounting the specific circumstances of Gates’s charitable empire, he is less clear on the wider political forces at work or the alternative directions for transformation that have been potentially overlooked. Schwab often implies that Gates’s altruism is insincere and rightly critiques the entrepreneur’s self-serving “colonial mindset” (see, for example, S. Arora and A. Stirling Environ. Innov. Soc. Transit. 48, 100733; 2023). But in this, Gates is a product of his circumstances. As Schwab writes, “the world needs Bill Gates’s money. But it doesn’t need Bill Gates”. Yet maybe the real problem lies less in the man than in the conditions that produced him. A similar ‘tech bro’ could easily replace Gates. Perhaps what is most at issue here is not the romanticized intentions of a particular individual, but the general lack of recognition for more distributed and collective political agency. And more than any single person’s overblown ego, perhaps it is the global forces of appropriation, extraction and accumulation that drive the current hyper-billionaire surge that must be curbed (see also A. Stirling Energy Res. Soc. Sci. 58, 101239; 2019).

Resolution of the Bill Gates problem might need a cultural transformation. Emphasis on equality, for instance, could be more enabling than billionaire-inspired idealizations of superiority. Respect for diversity might be preferable to philanthropic monopolies that dictate which options and values count. Precautionary humility can be more valuable than science-based technocratic hubris about ‘what works’. Flourishing could serve as a better guiding aim than corporate-shaped obsessions with growth. Caring actions towards fellow beings and Earth can be more progressive than urges to control. If so, Schwab’s excellent exposé of hyper-billionaire ‘myths’ could yet help to catalyse political murmurations towards these more collective ends.

Andy Stirling is a professor of science and technology policy at the Science Policy Research Unit at the University of Sussex, UK.
e-mail: [email protected]

Books in brief Handwritten Lesley Smith Bodleian Library Publishing (2023) In 1833, a seasick Charles Darwin wrote from Peru that he looked forward to the end of his 1831–36 global circumnavigation “with more interest, than the whole of the voyage”. This astonishing note appears in historian Lesley Smith’s compelling collection of handwritten documents held at the Bodleian Library in Oxford, UK. Others include Albert Einstein’s comic poem about Oxford and Dorothy Hodgkin’s sketch of penicillin’s molecular structure. “The handwritten text is the closest we can get to meeting the author,” comments Smith.

Sea Mammals Annalisa Berta Princeton Univ. Press (2023) There are 137 living species of sea mammal worldwide. The majority are cetaceans: whales, dolphins and porpoises. The blue whale is perhaps the largest animal that has ever lived, “rivaled only by a few dinosaurs”, notes palaeontologist Annalisa Berta in her illustrated survey of sea mammals based on a lifetime’s study. The reason is its diet: shrimp-like crustaceans known as krill, tiny but abundant in some oceans. To survive, it needs an enormous mouth, which can swallow a gulp of water equivalent to its body mass.

Over the Seawall Stephen Robert Miller Island (2023) Forest fires in the American West are today exacerbated by the US Forest Service’s attempts to stamp out fires in the 1930s — creating tracts of unbroken forest unnaturally abundant in fuel. Academics call such fixes “maladaptation”, writes science journalist Stephen Robert Miller. He prefers “solutions that backfire”. His book examines three examples: a sea wall in Japan, location of the 2011 tsunami disaster; tidal management in the Ganges River Delta in often-flooded Bangladesh; and artificial watercourses in parched Arizona.

Breaking Through Katalin Karikó Crown (2023) Biochemist Katalin Karikó, daughter of a butcher, was born in 1950s Hungary in a cramped earth-brick house without running water. In 2023, she shared the Nobel Prize in Physiology or Medicine for discoveries that enabled the development of vaccines against COVID-19. Her autobiography describes her vital, sometimes moving, personal and scientific struggle for success. As she writes of her father’s surprising mathematical gifts, “A person may lack prestige or a diploma but nevertheless have a swift mind.”

Lost Cities of the Ancient World Philip Matyszak Thames & Hudson (2023) The earliest cities — dating from the eighth millennium BC — were once regarded as defensive strongholds. But their accessible locations beside major rivers suggest they were created “for the purposes of government, religion, education and trade”, writes historian Philip Matyszak. His readable, well-illustrated book covers 37 “lost” cities in Europe, the Middle East and Asia, including Troy, Thebes and Persepolis. Inexplicably, it omits Indus Civilization cities such as Harappa, lost until the 1920s. Andrew Robinson


Readers respond

Correspondence

Replace Norway as ocean-panel co-chair

Open-access citation advantage still unproven

Build collaborations to protect marine ecological corridors

As scientists who advised the High Level Panel for a Sustainable Ocean Economy, we think the panel is worth saving (see Nature 625, 424; 2024). Within five years, it forged a coalition of states committed to ocean leadership, articulated a vision of ocean stewardship and set out a plan that was based on the advice of more than 250 scientists. We are proud of the panel’s achievements, but are deeply concerned by developments in Norway, the panel’s co-chair. Norway is moving to expand its offshore oil and gas industry and to permit deep-sea mining. These steps are undermining the panel, eroding trust in the scientific community and creating an uncertain environment for private-sector investment and innovation. We respectfully call on the panel members to urgently establish monitoring, compliance and governance mechanisms, including clear structures for transitioning of the leadership. These actions would provide a basis for celebrating progress and leadership, and a means of distancing the panel from Norway’s efforts to science-wash extractivist agendas that are mired in the past. While these mechanisms are established, we call for replacement of Norway as co-chair with a member that holds a firmer commitment to ocean leadership.

Your feature on sharing laboratory materials (Nature 625, 841–843; 2024) cites evidence for an open-access citation advantage dating back to 2006 (G. Eysenbach PLoS Biol. 4, e157; 2006). There might be many compelling arguments for open-access publishing for the benefit of science and scholarship, but there is currently little firm evidence that a citation advantage is one of them. The hypothesis that articles with an open-access status are more highly cited than those without has been a matter of active research for decades. The 2006 publication quoted in the article was one of 134 included in a 2021 systematic review of studies of citation rates of open-access and non-open-access articles (A. Langham-Putrow et al. PLoS ONE 16, e0253129; 2021). Although almost half of these confirmed the existence of an advantage, around one-quarter did not and the remaining one-quarter found one only in subsets of their study sample. However, only 3 of the 134 studies were found to have a low risk of bias: one supported the existence of the open-access advantage, one did not and the other found one only in subsets. The study quoted was not among these three studies, calling into question the veracity of the trend you highlighted.

Migrations of marine species such as whales, eels and sea turtles are some of the largest in the world. Identifying, monitoring and maintaining ecological corridors is one focus of the Kunming–Montreal Global Biodiversity Framework, which was adopted in 2022 at the United Nations COP15 biodiversity summit, chaired by China. China sees protection of these habitats as a priority. In 2020, maintenance of coastal corridors was integrated into the national master plan for ecological protection and restoration. In January, the revised Marine Environmental Protection Law was implemented, legislating for the conservation and restoration of crucial marine ecological corridors. The Ministry of Natural Resources is developing technical guidelines to identify ecological corridors for fishes, mammals, reptiles, water birds and corals. Global cooperation beyond the efforts of individual countries is also crucial. The COP14 Convention on the Conservation of Migratory Species of Wild Animals, which meets in Samarkand, Uzbekistan, this week, has adopted “Nature knows no borders” as its slogan. We appeal to nations to build these collaborative frameworks and global technical guidelines to preserve marine ecological corridors.

Diva J. Amon, Douglas J. McCauley University of California, Santa Barbara, Santa Barbara, California, USA. [email protected]

Norway is only being real on deep-sea mining

As a member of the United Nations International Resource Panel, I have been reviewing a range of literature comparing the impacts of terrestrial and oceanic mining. The sharp tone of your Editorial (Nature 625, 424; 2024) rebuking Norway for allowing minerals exploration in its territorial waters was surprising. Deep-sea mining clearly has risks, but a wider view needs to be taken on the inherent trade-offs between various sources of minerals. There is no such thing as a free lunch when it comes to clean energy — a point made by US biologist and environmentalist Barry Commoner that I have previously highlighted in relation to US policy on mining critical minerals for green energy sources (Nature 615, 563; 2023). Marine ecologists need to team up with terrestrial ecologists and use tools such as life-cycle analysis to assess comparative impacts, as my colleagues and I have tried to do in earlier publications (D. Paulikas et al. J. Ind. Ecol. 26, 2154–2177; 2022). Nature Editorials should advocate such a systems approach, which includes a circular economy for metals, rather than a parochial advocacy of the views of one particular ecosystem panel.

Saleem H. Ali University of Delaware, Newark, Delaware. [email protected]

Andrew Plume Elsevier, Oxford, UK. [email protected] The author declares competing interests; see go.nature.com/3udfan4 for details.

Robert Blasiak, Henrik Österblom Stockholm University, Stockholm, Sweden.



Jianguo Du, Bin Chen, Feng Cai, Wenjia Hu Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China. [email protected]

(Cyanistes caeruleus; Fig. 1) in southern Germany, and related these to the presence or absence of their female partners. Schlicht and colleagues first tagged the birds with radio-frequency-identification (RFID) tags to enable the location of the animals to be determined. They then placed an automatic RFID recorder near the nest box, and used this system to determine whether the female was inside or outside the nest box. After filtering the recordings to make sure that the bird of interest was the one singing in the recordings, Schlicht et al. found that the males sang at high rates while their female partners were still roosting in the nest box at dawn, and stopped singing as soon as the females left the nest box to join them. Similarly, males were more likely to sing when the females went to roost in the evening, and song rates also increased whenever the females entered the nest box during the day. Interestingly, male song rates increased in a linear manner as separation time from the female increased during the day. However, this linear relationship predicts an impossibly high dawn song rate after overnight separation, which was not observed.

To check for any extra support for the generality of their hypothesis, the authors searched published work and found some circumstantial evidence that males and females of most species show a mismatch in their daily activity, leading to separation periods at dawn and dusk. Furthermore, several studies (but not all) indicate that male singing activity increases when a female partner is separated from the male in the context of experimental manipulations.

The patterns found for blue tits are exciting for this field, and do convincingly fit this female-absence hypothesis. They also provide support to observations that show similar patterns in other species. However, further work will be necessary to determine whether Schlicht and colleagues’ hypothesis describes a universally applicable mechanism for the dawn chorus. Birdsong in males has a dual function. It is used to defend territories against other males and to attract females, and which of these two roles is dominant varies between species and can depend on the breeding stage for a given species7. Schlicht et al. conducted their study during the ten days around egg laying. This is the fertile period of the blue tit, one of the species in which male dawn singing reaches a maximum at this precise stage of the breeding season when the female is inside the nest. Thus, the change in male singing activity in relation to the presence of a fertile female about to mate makes sense from a mate-guarding perspective. The male cannot guard the female from copulation attempts by other males when she leaves the nest if he sings when she is outside the nest2,8.

However, not all bird species sing during the female’s fertile period: some even become completely silent at this time9. In addition, dawn choruses can happen throughout the breeding season, by males that are busy feeding chicks and even by unpaired males. It remains to be determined whether, in situations such as those and when females are no longer fertile, the singing patterns found in this study also occur. The female-absence hypothesis offers a mechanistic explanation for the dawn chorus, but the authors do not propose an evolutionary function. Rather, Schlicht and colleagues speculate that, possibly after an origin not related to evolutionary processes of natural selection or mate choice by sexual selection, a full range of functions mediated by this singing might have been acquired over time, such as strengthening the pair bond, manipulating female behaviour or displaying male quality. This phenomenon nevertheless begs for a functional explanation. Although the first of the two conditions of the hypothesis (separation between males and females at dawn or dusk) could be explained by an origin due to inherent sex-specific differences under natural selection, the second condition (less singing when the female is present) strongly

suggests that this pattern of singing has a function. And even if the patterns predicted by the female-absence hypothesis can be applied to other species, we need to address the following key question. Why do males sing more when females are absent or less when females are present?

Diego Gil is in the Department of Evolutionary Ecology, National Museum of Natural Sciences, 28006 Madrid, Spain.
e-mail: [email protected]

1. Schlicht, L., Schlicht, E., Santema, P. & Kempenaers, B. Proc. R. Soc. B 290, 20232266 (2023).
2. Gil, D. & Llusia, D. in Coding Strategies in Vertebrate Acoustic Communication (eds Aubin, T. & Mathevon, N.) 45–90 (Springer, 2020).
3. Staicer, C. E., Spector, D. A. & Horn, A. G. in Ecology and Evolution of Acoustic Communication in Birds (eds Kroodsma, D. E. & Miller, E. H.) 426–453 (Cornell Univ. Press, 1996).
4. Mace, R. Nature 330, 745–746 (1987).
5. Kacelnik, A. & Krebs, J. R. Behaviour 83, 287–308 (1983).
6. McNamara, J. M., Mace, R. H. & Houston, A. I. Behav. Ecol. Sociobiol. 20, 399–405 (1987).
7. Catchpole, C. K. & Slater, P. J. B. Bird Song: Biological Themes and Variations 2nd edn (Cambridge Univ. Press, 2008).
8. Birkhead, T. R. & Møller, A. P. Sperm Competition and Sexual Selection (Academic, 1998).
9. Liu, W.-C. & Kroodsma, D. E. Auk 124, 44–52 (2007).

The author declares no competing interests. This article was published online on 6 February 2024.
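The article above notes that the linear daytime relationship between song rate and separation time would, if extended across the night, predict an impossibly high dawn song rate. As a purely illustrative way to see why, suppose the daytime data followed

\[ r(t) \approx r_0 + b\,t, \]

where r is song rate, t is the time since the pair separated and r_0 and b are fitted constants (the symbols and numbers here are hypothetical; no values from the study are implied). Daytime separations last minutes to tens of minutes, so the slope b is constrained only over that range. Extrapolating the same line over an overnight separation of roughly eight hours (about 480 minutes) would predict a dawn rate of r_0 + 480b, far above the dawn rates that were actually observed, which is why the linear daytime relationship cannot simply be extended across the night.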

Condensed-matter physics

New type of magnetism splits from convention Carmine Autieri

Magnetic materials with zero net magnetization fall into two classes: conventional antiferromagnets and altermagnets. Physicists have identified a property in altermagnets that widens the divide between the two groups. See p.517 & p.523 The energies of electrons in a material are confined to specific levels. These levels can split into bands that correspond to possible configurations of the electrons’ intrinsic angular momentum — their ‘spin’. Such spin splitting underlies the existence of ferromagnetism: the type of magnetism found in iron. But it has also been predicted to arise in materials showing a newly discovered type of magnetism, known as altermagnetism, and such systems could prove more useful than ferromagnets for some technological applications. In two papers in Nature, Krempaský et al.1 (page 517) and Zhu et al.2 (page 523) report experimental evidence of spin splitting in materials classed as altermagnets. Energy levels are said to be degenerate if they correspond to two or more quantum

states. In 1930, the Dutch physicist Hans Kramers found that a particular type of degeneracy — now known as Kramers degeneracy — exists in non-magnetic systems that show time-reversal symmetry (those that adhere to the same laws of physics whether time runs forwards or backwards)3. Kramers degeneracy was subsequently found to exist in all non-magnetic systems. But what about magnetic systems? Kramers degeneracy is thought to extend to antiferromagnetic systems, the name of which refers to their relationship with ferromagnetism. The spins in ferromagnetic materials all point the same way, but in antiferromagnets, adjacent spins are oriented in opposite directions, so the net magnetization of the material is zero. However, Kramers degeneracy has never been


Figure 1 | Configurations of altermagnets. Antiferromagnets are materials in which the intrinsic angular momenta (spins) of electrons are oriented in opposite directions, so the net magnetization of the material is zero. Altermagnets form a class of antiferromagnet, and have been predicted to show a phenomenon called spin splitting. a, Krempaský et al.1 found evidence of spin splitting in the altermagnetic compound manganese telluride (MnTe; Te atoms not shown because they are non-magnetic), which is collinear, meaning its manganese spins are aligned along a single axis. b, Zhu et al.2 showed that manganese ditelluride (MnTe2) also displays spin splitting. MnTe2 is non-collinear, but can be decomposed into three collinear altermagnets, each representing a spatial component of the manganese spins, leading to complex spin-splitting properties.

verified rigorously for antiferromagnets. In the past few years, physicists have begun forming theories about the possibility that there are specific limits to the extent to which Kramers degeneracy applies to antiferromagnetic systems4,5. This led to a categorization of antiferromagnets — those that show Kramers degeneracy are now called conventional antiferromagnets, whereas those that lack it are termed altermagnets. The definition of an altermagnetic system is tied to specific crystallographic symmetries, and these symmetries are surprisingly widespread, so a considerable portion of antiferromagnetic systems are classified as altermagnets under this revised definition. The key to obtaining crystallographic symmetries that host altermagnetism lies with the non-magnetic atoms in a material, and the way that they affect its magnetic symmetries. The role of these atoms has long been disregarded in solid-state physics, and is only now being given full consideration. The absence of Kramers degeneracy is the reason that the system’s energy levels split. This spin splitting occurs in both ferromagnets and altermagnets, and is of comparable magnitude in the two. It imbues altermagnets with some properties that are similar to those of ferromagnets, while maintaining zero net magnetization. These similarities are technologically promising, because antiferromagnetism occurs in a broader range of materials than does ferromagnetism. It also tends to appear at higher temperatures than ferromagnetism, which often requires cryogenic cooling. And there

are other benefits: the magnetization of ferromagnets can induce stray magnetic fields that interfere with the material’s performance, but these fields do not arise in antiferromagnets. Finally, oscillations in magnetization can be used to produce high-frequency signals, and antiferromagnets can achieve higher-frequency oscillations than can ferromagnets6. These attributes underline the potential importance of antiferromagnetic systems in various applications7. Krempaský et al. used a technique called X-ray angle-resolved photoemission spectroscopy (ARPES) to investigate the electronic and magnetic characteristics in manganese telluride (MnTe). The surfaces of materials often host altermagnetic surface states that can differ from their bulk counterparts. By using X-ray ARPES, the authors were able to bypass these surface effects and characterize the bulk of the compound. Their study provides robust evidence for spin splitting in MnTe, suggesting that it is present in a large share of antiferromagnetic compounds. MnTe is a ‘collinear’ altermagnet, which means that the spins of manganese atoms point in opposite directions but are oriented along the same set of axes (Fig. 1a). However, altermagnets can also be non-collinear. In this case, their spins are not parallel, because of factors such as interactions or the geometry of their crystal structure, but they still show the zero net magnetization that is indicative of antiferromagnetism. Zhu et al. studied manganese ditelluride

(MnTe2), which is non-collinear, using a method known as spin-resolved ARPES. The authors showed that spin splitting in this system results in manganese spins assuming a plaid-like pattern. This non-collinear altermagnet can be decomposed into three collinear altermagnets, each representing a spatial component of the manganese spins (Fig. 1b) and each giving the material a different spin-splitting property. Although the two groups used different experimental approaches and methods of analysis, both studies contribute key advances to the understanding of spin splitting in altermagnetic compounds — they shed light on the complexities inherent in the magnetic structures of these materials. The authors’ work will no doubt serve as a catalyst for accelerating research on this topic. One direction for future investigation is the effect that electric fields have on altermagnetism. Another is the surfaces and interfaces at which altermagnetic properties can be tuned. A third possibility involves the appearance of ferromagnetism in altermagnetic compounds. Although altermagnets have zero net magnetization, ferromagnetism can be induced in these materials through a phenomenon called spin–orbit coupling, which involves an electron’s spin interacting with its orbital motion. This effect is often weaker than conventional ferromagnetism, and for this reason it is termed weak ferromagnetism8. Improved understanding of altermagnets could lead to technologies for engineering and manipulating weak ferromagnetism9,10. These emerging concepts are poised to become integral components of future physics textbooks. Although current research on altermagnets is mainly fundamental in nature, it is likely that the insights garnered by these two papers will pave the way for technological applications in the coming decades. The enticing results will undoubtedly form a cornerstone of exciting developments.

Carmine Autieri is in the International Centre for Interfacing Magnetism and Superconductivity with Topological Matter (MagTop), Institute of Physics, Polish Academy of Sciences, PL-02668 Warsaw, Poland.
e-mail: [email protected]

1. Krempaský, J. et al. Nature 626, 517–522 (2024).
2. Zhu, Y.-P. et al. Nature 626, 523–528 (2024).
3. Kramers, H. A. Proc. Koninkl. Akad. Wetenschap. 33, 959–972 (1930).
4. Šmejkal, L., Sinova, J. & Jungwirth, T. Phys. Rev. X 12, 031042 (2022).
5. Šmejkal, L., Sinova, J. & Jungwirth, T. Phys. Rev. X 12, 040501 (2022).
6. Hortensius, J. R. et al. Nature Phys. 17, 1001–1006 (2021).
7. Jungwirth, T., Marti, X., Wadley, P. & Wunderlich, J. Nature Nanotechnol. 11, 231–241 (2016).
8. Dzyaloshinsky, I. J. Phys. Chem. Solids 4, 241–255 (1958).
9. Feng, Z. et al. Nature Electron. 5, 735–743 (2022).
10. Kluczyk, K. P. et al. Preprint at https://arxiv.org/abs/2310.09134 (2023).

The author declares no competing interests.
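As a schematic way to see what separates the two classes, the symmetry argument can be summarized in a minimal band picture. The expressions below are a simplified, illustrative sketch based on the general description of altermagnets, not formulas taken from either of the two papers. Time-reversal symmetry alone relates opposite spins at opposite momenta,

\[ E_{\uparrow}(\mathbf{k}) = E_{\downarrow}(-\mathbf{k}). \]

In a conventional antiferromagnet, the two spin sublattices are additionally related by a lattice translation or by inversion; combining that operation with time reversal enforces

\[ E_{\uparrow}(\mathbf{k}) = E_{\downarrow}(\mathbf{k}) \quad \text{for all } \mathbf{k}, \]

so the bands remain spin degenerate. In an altermagnet, the sublattices are related only by a rotation of the crystal, the constraint is lifted, and the bands can split with a sign that alternates between momentum directions, for example in a d-wave-like pattern

\[ \Delta E(\mathbf{k}) = E_{\uparrow}(\mathbf{k}) - E_{\downarrow}(\mathbf{k}) \propto k_x k_y, \]

which averages to zero over the Brillouin zone, so the net magnetization stays zero even though individual electronic states are spin split.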


Developmental biology

Unravelling how plant cells divide and differ Ikram Blilou

In a multicellular organism, normal growth requires control of cell division to generate cells that are similar to or different from their parents. Analysis of this process in plant roots reveals how this mechanism is regulated. See p.611 In a developing tissue, asymmetric (also known as formative) cell division is essential to generate specialized cells with distinct functions — something that is crucial for forming diverse tissues and organs. Symmetric (also called proliferative) cell division, producing identical cell types, is required for cells to proliferate and contribute to growth. On page 611, Winter et al.1 provide insights into how these two types of cell division that drive patterning and growth are coordinated. The authors studied this process in the roots of the plant Arabidopsis thaliana, focusing on two proteins, SHR and SCR, which are core plant-specific developmental regulators. These two transcription factors are key determinants for asymmetric cell division. Stem cells divide and can give rise to different types of cell, and SHR and SCR associate physically to form a protein complex that promotes asymmetric cell division of stem cells by controlling the activity of the cell-cycle regulator protein encoded by the gene CYCD6 (refs 2–5). The action of the SHR–SCR complex is regulated by the protein RBR (ref. 3). In differentiated cells, RBR binds to SHR–SCR and disrupts its function3. In stem cells, RBR is phosphorylated (has phosphate groups attached to it) by the cell-cycle regulator proteins CYCD6 and CDKB1, which prevents the protein from binding to SHR–SCR. This means that the asymmetric cell division controlled by SHR–SCR occurs only in stem cells3. This regulatory network acts as a ‘bistable’ switch in which distinct states of SHR–SCR activity control whether asymmetric cell division occurs3. These states are regulated by the gradients in concentration of the molecule auxin and SHR. There is a ‘longitudinal’ gradient along the direction of the root established by the distribution of auxin (Fig. 1a)6. There is a ‘radial’ gradient running from the centre of the roots to the outer layers; it is put in place by the movement of SHR from the inner vasculature tissue to the outer layer of the root5. The two gradients converge to drive the action of SHR–SCR on CYCD6. SHR–SCR activates

CYCD6, and auxin increases its expression in stem cells, which leads to RBR phosphorylation and asymmetric cell division. Although the convergence of the two gradients in a specific cell triggers cell division, protein degradation immediately after the division turns the switch off (generating a low state of SHR–SCR) to prevent further divisions. The bistable switch explains why asymmetric cell

division occurs at a specific time and in a specific place to generate distinct cell types and hence specific tissue lineages, a concept that is well established in studies of animal cells7. Winter and colleagues’ work provides fresh insights into the importance of SHR–SCR in cell-cycle control and highlights its contribution to determining how cells orient the way in which they divide (the location of their division plane in the cell) to produce cells with fates that either differ from (Fig. 1b), or are the same as, that of the original dividing cell (Fig. 1c). Using a custom-made device — a light-sheet confocal microscope — the authors obtained images providing detailed spatial and temporal information regarding SHR–SCR expression during root growth. These high-speed 3D images were acquired with minimal loss of fluorescent signals (a problem known as photobleaching) and enabled the researchers to view protein dynamics in space and time. This would have been tedious to achieve using conventional microscopy methods of confocal imaging. The authors imaged three proteins, SHR, SCR and the nuclear protein H2B, each tagged with a different fluorescent molecule, and evaluated the dynamics of their expression. Winter


Figure 1 | Control of asymmetric cell division in the root. a, In the plant Arabidopsis thaliana, gradients of the molecules auxin and SHR aid processes that govern cell division. Auxin forms a longitudinal gradient that is highest at the root tip6, and SHR is highest at the root’s centre and runs outwards in a radial gradient5. For a given cell, such as the one shown in the layer called the endodermis, the orientation of cell division determines whether the cell divides asymmetrically to give rise to two different cell types and two distinct tissue types, or symmetrically to form identical cells in the same layer. b, Winter et al.1 present microscopy data that shed light on cell division. In asymmetric (also called formative) division, a complex of the transcription factors SHR and SCR is active during cell-cycle stages called G1 and S. The proteins CDKB1 and CYCD6 phosphorylate the protein RBR (that is, they add a phosphate (P) group to it). The gene CYCD6 is expressed through the action of the SHR–SCR complex and by auxin bound to an auxin-binding protein3. c, During symmetric (also termed proliferative) division, the SHR–SCR complex is active at the cell-cycle stages G2 and M. RBR does not contain a phosphate group, and it binds to the SHR–SCR complex, thereby preventing CYCD6 expression.


and colleagues then used this information to determine how often cells divide asymmetrically compared with symmetrically. The authors found that SHR-mediated asymmetric division occurs only during a limited window of the cell cycle. The authors used mathematical models that revealed that bistability is not a prerequisite for SHR–SCR action. This outcome might seem inconsistent with the findings described previously3. However, it can also be considered as an alternative model for bistability — especially given that the authors also observed an increase in the level of SHR that remained constant until division took place and then decreased, which is consistent with previous findings. The authors found that the absence of SHR from a cell during a specific stage of the cell cycle affects its commitment to divide asymmetrically or symmetrically. They demonstrated this through a mathematical approach and confirmed it experimentally by synchronizing cells at particular stages of the cell cycle, using cell-cycle inhibitors. The induction of SHR expression after the cells were released from inhibition of the transition between the G1 and S stages of the cell cycle triggered a higher frequency of asymmetric cell divisions than was observed after the release from inhibition of the transition between the G2 and M stages of the cell cycle. In the region of the root called the meristem, cells have the potential to undergo both types of division. Another interesting observation made by the authors was the inability of SHR to initiate asymmetric cell divisions outside the meristem, indicating that other factors, including the auxin gradient necessary for SHR–SCR action, as well as all the components of the signalling network needed for asymmetric division, are probably expressed exclusively in the meristem. Examining these components experimentally will provide more insights into the requirement for SHR in triggering divisions in a differentiated cell.

The authors worked in the laboratory of Philip Benfey, who died in 2023. When those of us who knew him think about Benfey, some of the attributes that come to mind include vision, leadership, intelligence, generosity, kindness, optimism and courage. The plant developmental biology community has lost an outstanding scientist, a fantastic person, a great mentor and leader. His passion, dedication, innovation in research and his support for the young generation, especially female researchers, have inspired us all. His optimism and courage were contagious and gave us all hope for the future. He will always be in our hearts, and his legacy will live on.

Ikram Blilou is in the Division of Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia.

e-mail: [email protected]

1. Winter, C. M. et al. Nature 626, 611–616 (2024).
2. Sozzani, R. et al. Nature 466, 128–132 (2010).
3. Cruz-Ramírez, A. et al. Cell 150, 1002–1015 (2012).
4. Di Laurenzio, L. et al. Cell 86, 423–433 (1996).
5. Helariutta, Y. et al. Cell 101, 555–567 (2000).
6. Grieneisen, V. A., Xu, J., Marée, A. F. M., Hogeweg, P. & Scheres, B. Nature 449, 1008–1013 (2007).
7. Yao, G., Lee, T. J., Mori, S., Nevins, J. R. & You, L. Nature Cell Biol. 10, 476–482 (2008).

The author declares no competing interests. This article was published online on 31 January 2024.
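The ‘bistable switch’ described above (two self-maintaining states of SHR–SCR activity, flipped when converging inputs push the system over a threshold) can be illustrated with a generic toy model of positive feedback. The Python sketch below is a minimal illustration of bistability in general; its variable names, equations and parameter values are invented for this purpose and are not the mathematical model analysed by Winter and colleagues.

```python
# Toy illustration of a bistable switch: one activity variable x (a stand-in
# for SHR-SCR output) with cooperative positive feedback and linear decay.
# All names, equations and parameter values here are invented for
# illustration; this is not the model used in the study.

def dxdt(x, basal, fmax=1.0, K=0.5, n=4, decay=1.0):
    """Rate of change: basal input + Hill-type positive feedback - decay."""
    return basal + fmax * x**n / (K**n + x**n) - decay * x

def steady_state(x0, basal, steps=20000, dt=0.01):
    """Integrate with simple Euler steps and return the final value."""
    x = x0
    for _ in range(steps):
        x += dt * dxdt(x, basal)
    return x

# With a weak basal input, the final state depends on the starting point:
# a low start stays low and a high start stays high -- two stable states.
low_state = steady_state(x0=0.05, basal=0.05)
high_state = steady_state(x0=1.50, basal=0.05)
print(f"low start  -> {low_state:.2f}")    # settles near the 'off' state
print(f"high start -> {high_state:.2f}")   # settles near the 'on' state

# A transient pulse of extra input (loosely analogous to the converging
# auxin and SHR signals) flips the switch, and the 'on' state persists
# after the pulse is withdrawn.
pulsed = steady_state(x0=low_state, basal=0.60)
after_pulse = steady_state(x0=pulsed, basal=0.05)
print(f"after a transient pulse -> {after_pulse:.2f}")  # remains 'on'
```

The point of the toy model is simply that, with sufficiently cooperative feedback, the same input level supports two different stable outputs, so a transient signal can leave a lasting change of state.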

Neuroscience

How the brain produces and perceives speech Yves Boubenec

A neural probe has been used to capture the activity of large populations of single neurons as people are speaking or listening, providing detailed insights into how the brain encodes specific features of speech. See p.593 & p.603 In the human brain, the perception and production of speech requires the tightly coordinated activity of neurons across diverse regions of the cerebral cortex. On pages 593 and 603, respectively, Leonard et al.1 and Khanna et al.2 report their use of a neural probe consisting of an array of microelectrodes, called Neuropixels, to measure the electrical activity of individual neurons in regions of the human cortex involved in speech processing. Speech has a sophisticated structure that is characterized by the hierarchical organization of sounds across various timescales. Phonemes, the smallest units of speech, underpin spoken language and contribute to the differentiation of words and syllables. For instance, the three-phoneme words ‘dig’, ‘dug’, ‘dog’ and ‘god’ differ only by the alteration of a single phoneme (/dɪg/ versus /dʌg/ versus /dɒg/) or the rearrangement of phonemes (/dɒg/ versus /gɒd/). Despite advances in scientists’ understanding of the intricate neural computations involved in parsing and recognizing phonemes, it is still not clear how the brain represents the identity and sequence of phonemes at the level of single neurons. Are single neurons tuned to single phonemes (/ɪ/ versus /ʌ/ versus /ɒ/) by showing distinct responses to each? Or, instead, are neurons selective for groups of phonemes, much as neurons in the visual cortex are tuned to classes of object, such as faces3? And do neurons encode sequences of phonemes (such as /dɒg/ and /gɒd/)? To address these questions, intracranial neural recordings can be made in people who are performing speech tasks4,5. Researchers in the same groups as Leonard et al. and Khanna et al. demonstrated in 2022 that it is possible to perform single-neuron recordings in people

undergoing brain surgery while awake using Neuropixels electrodes6,7 — a method that had previously been used only in non-human animals8. In their latest studies, the authors have captured the stable, simultaneous activity of tens of single cortical neurons while participants were either listening to speech1,2 or speaking2 (Fig. 1). Their groundbreaking work represents the first applications of Neuropixels to address meaningful research questions that can be answered only in humans. The authors’ detailed insight into the single-neuron encoding of speech perception and production yields two key findings. First, they show that single neurons are selectively tuned to groups of phonemes that are articulated in a similar way. This mirrors findings obtained with a more conventional intracranial electrophysiology method, called electrocorticography, in which electrical activity is averaged from hundreds of cells5. Second, these studies show how the coordinated activity of neuronal populations encodes emergent properties of speech perception and production. Leonard and colleagues recorded neural activity from a region of the brain’s auditory cortex called the superior temporal gyrus. This cortical region is specialized for high-level processing of speech sounds before the meanings of words are processed in other brain regions. Khanna and colleagues focused on a part of the brain’s prefrontal cortex that is involved in word planning and sentence construction. When participants were listening to speech, single neurons in both the auditory cortex1 and the prefrontal cortex2 were tuned to classes of phoneme (defined by their similar articulation) rather than specifically to single phonemes. Neurons that were spatially close to each other tended to show correlated functional


Figure 1 | Recording the activity of neurons involved in speech processing. a, Leonard et al.1 used an intracranial probe called Neuropixels to measure the activity of single neurons in the superior temporal gyrus, a region of the brain’s auditory cortex that is involved in processing speech sounds, while participants listened to speech. b, Khanna et al.2 used the same approach to measure neuronal activity in the prefrontal cortex, a brain region that is involved in word planning, while participants were speaking or listening to speech. Both teams found that single neurons are tuned to particular features of speech, including the sounds or the positions of phonemes (the smallest units of speech) in a word. For example, the different phonemes in the word ‘dog’ — either heard or said — activate different populations of neurons.

properties in the auditory cortex1. Consequently, single neurons in the same vicinity were not good at discriminating between words composed of different phonemes but of the same phonetic group (for example, ‘dog’ and ‘dug’ with the vowels /ɒ/ and /ʌ/). By contrast, words formed by phonemes from different phonetic groups (for example, /ɒ/ and /ɪ/ in ‘dog’ and ‘dig’) activated distinct populations of neurons. Furthermore, neurons in the auditory cortex displayed diverse responses, even to non-linguistic cues, such as the beginnings of sentences. The spatial clustering of neurons that mediate responses to similar cues is suggestive of organization into ‘columns’ that span several layers of the cortex. Together, these observations indicate that local populations of neurons are essential units of computation for speech processing9 that integrate information about speech features, to which they are preferentially tuned, with other sound cues. Such integration could facilitate high-level functional properties, such as the ability to recognize the same phonemes spoken by different speakers or to track changes in the speaker’s pitch. Khanna et al. observed that, when participants performed a speech-production task, neurons in the prefrontal cortex were tuned to the classes of phoneme that were about to be spoken, but neurons were also sensitive to the position of phonemes in upcoming words. An analysis of changes in the coordinated patterns of activity of neuronal populations over time revealed that features of a word are coded sequentially during word planning — for example, the neuronal activity that relates to phonemes peaks before that relating to syllables. Distinct patterns of activity during listening and speech production paralleled findings

from studies of the motor cortex of macaque monkeys (Macaca mulatta) during movement preparation and execution10, suggesting that such patterns are a general principle of motor production. In some cases, the two teams used different approaches to analysis. Leonard and colleagues used Neuropixels to focus on specific layers in the auditory cortex, whereas Khanna and colleagues used decoding techniques to quantify information at the level of the neuronal population in the prefrontal cortex. Integrating these complementary analyses in future studies could enrich scientists’ overall understanding of the similarities and differences between the functional properties of the two cortical regions. Both studies lay the groundwork for forthcoming investigations to determine how the internal loop between the auditory and motor centres is closed. Although the two teams focused on mapping cortical activity to auditory inputs or motor outputs, researchers still lack an understanding of the link between these processing stages. A key question arises: how does the brain’s representation of the way a word sounds (an internal auditory target) translate into a sequence of coordinated neuronal activity that results in the movement of muscles required to say that word correctly? In other words, how does the auditory cortex convey auditory information to motor centres to enable accurate speech production? Simultaneous recordings from the auditory and prefrontal cortices will help neuroscientists to understand how the production and perception of speech converge, and how information flows from the auditory cortex to the prefrontal cortex (the ascending pathway) and vice versa (the descending pathway)11. The ascending pathway transforms an internal



auditory target into preparation for a movement in the prefrontal and motor areas. Conversely, the descending pathway informs the auditory cortex of anticipated sound inputs, such as spoken words. Notably, the neuronal projections belonging to the descending pathway — the circuit between the motor and auditory cortices — have been identified in the mouse brain12. Ultimately, a comprehensive understanding of how this bidirectional flow of information is coordinated during infant development will shed light on how internal representations of speech are constructed. Yves Boubenec is in the Perceptual Systems Laboratory, Department of Cognitive Studies, École Normale Supérieure, PSL Research University, CNRS, Paris 75005, France. e-mail: [email protected]

1. Leonard, M. K. et al. Nature 626, 593–602 (2024).
2. Khanna, A. R. et al. Nature 626, 603–610 (2024).
3. Quian Quiroga, R. et al. Nature Commun. 14, 5661 (2023).
4. Bouchard, K. E., Mesgarani, N., Johnson, K. & Chang, E. F. Nature 495, 327–332 (2013).
5. Mesgarani, N., Cheung, C., Johnson, K. & Chang, E. F. Science 343, 1006–1010 (2014).
6. Paulk, A. C. et al. Nature Neurosci. 25, 252–263 (2022).
7. Chung, J. E. et al. Neuron 110, 2409–2421 (2022).
8. Jun, J. J. et al. Nature 551, 232–236 (2017).
9. Saxena, S. & Cunningham, J. P. Curr. Opin. Neurobiol. 55, 103–111 (2019).
10. Kaufman, M. T., Churchland, M. M., Ryu, S. I. & Shenoy, K. V. Nature Neurosci. 17, 440–448 (2014).
11. Keller, G. B. & Mrsic-Flogel, T. D. Neuron 100, 424–435 (2018).
12. Schneider, D. M., Sundararajan, J. & Mooney, R. Nature 561, 391–395 (2018).

The author declares no competing interests. This article was published online on 31 January 2024.
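To give a concrete sense of what ‘decoding at the level of the neuronal population’ can involve, the following Python sketch applies a standard cross-validated linear classifier to synthetic firing-rate data. It is illustrative only: the array sizes, class labels and classifier choice are hypothetical and are not the analysis pipeline of Khanna et al. or Leonard et al.

```python
# Generic sketch of 'population decoding': predict which phoneme class was
# heard (or was about to be spoken) from the firing rates of many
# simultaneously recorded neurons. The data below are synthetic and the
# sizes, labels and classifier are hypothetical -- this is not the
# analysis pipeline used in either study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_neurons, n_classes = 300, 60, 4   # hypothetical sizes

# Synthetic ground truth: each phoneme class has its own mean firing-rate
# pattern across the population, plus trial-to-trial noise.
labels = rng.integers(0, n_classes, size=n_trials)
class_means = rng.normal(0.0, 1.0, size=(n_classes, n_neurons))
rates = class_means[labels] + rng.normal(0.0, 2.0, size=(n_trials, n_neurons))

# A linear decoder trained on the population vector of firing rates.
decoder = LogisticRegression(max_iter=1000)

# Cross-validated accuracy of the whole population (chance level is 1/4).
population_acc = cross_val_score(decoder, rates, labels, cv=5).mean()
print(f"population decoding accuracy: {population_acc:.2f}")

# A single neuron on its own is typically far less informative.
single_acc = cross_val_score(decoder, rates[:, [0]], labels, cv=5).mean()
print(f"single-neuron decoding accuracy: {single_acc:.2f}")
```

Comparing the population score with the single-neuron score captures, in miniature, why simultaneous recordings from tens of neurons can reveal structure that no individual cell shows on its own.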


From the archive River monitoring needed to address pollution, and the case for having a minister for science in government.

100 years ago The difficulty in deciding a status of pollution ... is well illustrated by ... the lack of critical and co-ordinated information regarding ... conditions in fresh-water streams, rivers, and in estuaries. The absence of this kind of information must render much work on the conditions in polluted waters inconclusive or even futile. The present letter is therefore written to demonstrate ... the necessity for organised continuous work on the biological, physical, and chemical conditions in streams, rivers, and estuaries, whether polluted or not. From Nature 16 February 1924

150 years ago We are glad to see that the Times has at last opened its pages to the question of the propriety of appointing a responsible Minister, whose duty it shall be to look after the interests of Science and of scientific research and education, and take charge of the scientific institutions of the country ... The whole question could not be better stated than in Colonel Strange’s letter which ... runs as follows: ... “[S]cientific research must be made a national business ... [T]he point at which Science, in most of its leading branches, has now arrived and the problems presented for solution are such as to need for their adequate treatment, permanent wellequipped establishments with competent staffs ... [W]e are being rapidly outstripped by nations who, though they encourage private exertion, are wise enough not to rely on it, but to establish a system free from the caprice, the incompleteness, the liability to interruption and cessation incident to all individual labour in whatever field ... [T]here must be a Minister for Science ... Let this be done, and we should cease to witness the farce of consulting the Chancellor of the Exchequer about observing eclipses of the sun, the Prime Minister about scientific Arctic expeditions, and the Treasury about tidal reductions.” From Nature 12 February 1874


data is that it shows more evidence that data generation and sharing are broadening, and that there is a shift towards a culture of acceptance and openness as members of the public become more comfortable with sharing their genetic information. The collection and study of such data call for cultural sensitivity and a combination of ethical and scientific rigour, so it is encouraging to see progress in this area. We should also be excited about having a deeper understanding of populations, sociodemographic histories and fresh biological insights, as well as about the willingness of BIGCS participants to take part. Indeed, although ancestry, genetic association and applied research are illuminating, it is incumbent on the research community to remember the commitment of those who make this type of work possible.

Nicholas John Timpson is in the MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK, and in Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK.
e-mail: [email protected]

1. Huang, S. et al. Nature 626, 565–573 (2024).
2. Qiu, X. et al. Eur. J. Epidemiol. 32, 337–346 (2017).
3. Fuchsberger, C. et al. Nature 536, 41–47 (2016).
4. The Genome of the Netherlands Consortium. Nature Genet. 46, 818–825 (2014).
5. Walter, K. et al. Nature 526, 82–90 (2015).
6. Ramsay, M. Patterns 3, 100412 (2022).
7. Ward, R. H., Redd, A., Valencia, D., Frazier, B. & Pääbo, S. Proc. Natl Acad. Sci. USA 90, 10663–10667 (1993).
8. Davey Smith, G. & Ebrahim, S. Int. J. Epidemiol. 32, 1–22 (2003).
9. Zhang, G. et al. PLoS Med. 12, e1001865 (2015).

The author declares no competing interests. This article was published online on 31 January 2024.

Materials science

Layered ferroelectric materials make waves Berit H. Goodge

By combining materials-synthesis techniques, researchers have come up with a way of building layered structures that display intriguing wave-like patterns of electric polarization, and could be useful for next-generation electronics. See p.529 A household refrigerator magnet has an inherent polarity, a characteristic it shares with other magnets of its kind, known as ferromagnets. Ferroelectric materials are also polarized — electrically rather than magnetically — with positive and negative charges instead of north and south poles. This polarity can be flipped by strong electric fields, a property that makes ferroelectrics attractive materials for computing, memory and sensing devices, especially if they can be manipulated at the nanometre scale. It also allows complex patterns of ferroelectric polarization to be generated through careful materials design and synthesis1. On page 529, Sánchez-Santolino et al.2 describe a technique that combines strategies from several areas of materials research to generate and stabilize ferroelectric patterns. A simple model of electric polarization holds that a charged ion that is offset from the charge centre of its neighbours will shift the distribution of electric charge to create a local polarization (Fig. 1a). Ferroelectricity is therefore closely tied to the arrangement of atoms in the bulk of a material, to the surfaces and interfaces that the atoms form, and to how these geometrical features impart mechanical


stress, all of which affect the interactions between atoms. Careful engineering of these ‘boundary conditions’ — by growing structures with several layers, for example — has proved to be a powerful way of generating ferroelectric textures such as waves, vortices and other twisting configurations3–5. Now, Sánchez-Santolino and colleagues have added a ‘twist’ to this approach by stacking two existing layers so that they are rotated relative to each other. When a layer that is a single atom or a few atoms thick is placed on top of another such layer and rotated by a small angle, the overlap between the two crystal lattices creates a third distinct pattern. This pattern is termed a moiré lattice — named after a French method of fabric pressing — and the periodicity of this lattice is considerably larger than those of the original layers (Fig. 1b). The interactions between the two layers can impart entirely new properties to the overall twisted structure, and this ‘twist and stack’ approach has therefore garnered immense interest in physics and materials research6. Moiré structures are typically fabricated by peeling apart larger crystals of each material to isolate a thin layer, which can then be stamped

Figure 1 | Engineering electric-polarization textures in ferroelectric materials. a, The displacement of an atom in a ferroelectric material (such as a titanium atom in barium titanate, BaTiO3) can give rise to a local electric polarization. b, Sánchez-Santolino et al.2 induced this effect by stacking thin layers of barium titanate together and twisting them relative to each other so that the two crystal lattices overlapped. c, The authors showed that the structural and electronic interactions (not shown) between these two layers gave rise to complex polarization textures, including vortices and waves. (Adapted from Extended Data Fig. 4a of ref. 2.)

on top of another layer at a precisely specified twist angle. So far, this approach has been limited mostly to naturally layered materials — stacks of sheets of single-atom thickness that are only weakly bonded to each other, and can therefore be easily separated. Yet some researchers have focused on extending this approach to more strongly bound compounds7,8. Sánchez-Santolino et al. have now generalized twist engineering to materials that are not easily peeled apart. To do so, the authors used a method of growing thin films of a target compound on top of a sacrificial layer, which is later dissolved to release a free-standing membrane9. Different membrane layers can then be carefully manipulated — twisted and stacked — in analogy to conventional moiré systems comprising layers that have been peeled off larger structures. Combining these strategies, the authors created thin membrane layers of the common ferroelectric compound barium titanate (BaTiO3), in which titanium ions that are offset from the centre of their atomic neighbours give rise to local ferroelectric polarization. These atomic displacements can be directly measured with advanced electron-microscopy imaging to reveal ferroelectric patterns3,10. When they stacked two layers of barium titanate with relative twist angles of several degrees, Sánchez-Santolino and colleagues observed ferroelectric wave and vortex patterns (Fig. 1c) that they ascribe to the specific boundary conditions formed by the moiré pattern of the interface between membrane layers. Future investigations of the switching behaviour of these intriguing mesoscopic ferroelectric textures might reveal promising functional properties that could be used in high-density data storage or other technological applications. For example, in another system showing polarization waves, applying


an electric field does more than change the polarization — it also changes both the electrical and the optical conductivity considerably. This ability could be harnessed to build devices with several functions11. The free-standing-membrane platform built by Sánchez-Santolino and colleagues might also be suitable for integrating into flexible electronics, replacing the stiffer materials typically used in electronic devices. Further details of the many multiscale interactions that give rise to these textures could be investigated using a newly devised method of imaging atomic structure with 3D resolution12. Using this technique, researchers could compare the arrangement of

atoms near the twisted interface with those at the exposed surfaces. Sánchez-Santolino and colleagues focused on characterizing the ferroelectric patterns in the plane of the twisted membranes, but further analysis of a cross-section of the interface could provide useful insights about the structural and electronic interactions between the two layers. The authors observed that the details of the ferroelectric pattern depend not only on the twist angle, which sets the length scale of the repeated moiré pattern, but also on the thicknesses of the membranes. Intuitively, the effect of the interface diminishes as membrane thickness increases, because the atoms at the interface comprise a smaller fraction of the total system. Systematic examination of the atomic structure at, near to, and far from the interface in samples with different membrane thicknesses could shed light on how these competing effects are balanced in the composite structure. These experiments could also be carried out while the structure is subjected to an external electric field, so that the switching characteristics of these ferroelectric waves and vortices can be directly observed in situ. More broadly, Sánchez-Santolino and colleagues’ work shows how combining strategies developed in different fields — atomic-resolution imaging and analysis, twist-and-stack engineering and the use of free-standing membranes — can be leveraged to develop platforms for experimental research and, hopefully, technological advances.

Berit H. Goodge is at the Max Planck Institute for Chemical Physics of Solids, 01187 Dresden, Germany. e-mail: [email protected]

1. Schlom, D. G. et al. Annu. Rev. Mater. Res. 37, 589–626 (2007).
2. Sánchez-Santolino, G. et al. Nature 626, 529–534 (2024).
3. Jia, C.-L., Urban, K. W., Alexe, M., Hesse, D. & Vrejoiu, I. Science 331, 1420–1423 (2011).
4. Yadav, A. K. et al. Nature 530, 198–201 (2016).
5. Rusu, D. et al. Nature 602, 240–244 (2022).
6. Andrei, E. Y. et al. Nature Rev. Mater. 6, 201–206 (2021).
7. Zhu, Y. et al. Phys. Rev. X 11, 031011 (2021).
8. Zhao, S. Y. F. et al. Science 382, 1422–1427 (2023).
9. Lu, D. et al. Nature Mater. 15, 1255–1260 (2016).
10. Nelson, C. T. et al. Nano Lett. 11, 828–834 (2011).
11. Caretta, L. et al. Nature Mater. 22, 207–215 (2023).
12. Chen, Z. et al. Science 372, 826–831 (2021).

The author declares no competing interests.


Perspective

Optimally generate policy-based evidence before scaling https://doi.org/10.1038/s41586-023-06972-y

John A. List1,2,3 ✉

Received: 27 February 2023 Accepted: 11 December 2023 Published online: 14 February 2024

Social scientists have increasingly turned to the experimental method to understand human behaviour. One critical issue that makes solving social problems difficult is scaling up the idea from a small group to a larger group in more diverse situations. The urgency of scaling policies impacts us every day, whether it is protecting the health and safety of a community or enhancing the opportunities of future generations. Yet, a common result is that, when we scale up ideas, most experience a ‘voltage drop’ — that is, on scaling, the cost–benefit profile depreciates considerably. Here I argue that, to reduce voltage drops, we must optimally generate policy-based evidence. Optimality requires answering two crucial questions: what information should be generated and in what sequence. The economics underlying the science of scaling provides insights into these questions, which are in some cases at odds with conventional approaches. For example, there are important situations in which I advocate flipping the traditional social science research model to an approach that, from the beginning, produces the type of policy-based evidence that the science of scaling demands. To do so, I propose augmenting efficacy trials by including relevant tests of scale in the original discovery process, which forces the scientist to naturally start with a recognition of the big picture: what information do I need to have scaling confidence?

Over the past four centuries, the experimental method has produced a steady stream of deep insights, helping us to understand the world around us. From the work of Galileo, who tested his theories of falling bodies using quantitative experiments in the early seventeenth century, to Rosalind Franklin’s X-ray diffraction experiment, which was central in the construction of a theory of the chemical structure of DNA, the scientific approach has been a key engine to knowledge creation, economic growth and enhanced societal well-being. Social scientists have increasingly turned to the experimental method to understand human behaviour and how the world around us affects that behaviour. Field experimentation in the social sciences has uncovered insights from the theoretical to the practical. For example, at odds with his more philosophical contemporaries, William McCall insisted on quantitative measures to test the validity of education programmes1. For his efforts, McCall is credited as an early proponent of using randomization rather than matching to exclude rival hypotheses, and his work continues to influence field experiments conducted in education today. In political science, Harold Gosnell and Charles Merriam are often credited with conducting the first social ‘megaproject’ when they explored techniques to enhance voter turnout. One example is Gosnell2, who found that the use of cartoons and informational reminders increased both voter turnout and votes cast by around 10%. In a similar spirit, Ronald Fisher used the concept of randomization in his field experiments to understand the agricultural production function3. Likewise, Kurt Lewin4 directly studied questions of where, how, why and under what conditions an effect does or does not appear, anticipating contemporary discussions of generalizability and scaling.

Within economics, early experimental work took the form of laboratory experiments5. However, over the past few decades, economists have begun to depart systematically from these roots; they are recruiting participants in the field rather than in the classroom, using field context rather than abstract terminology in the instructions and striving to carry out randomization in naturally occurring settings, often without the research participants being aware that they are part of an experiment, which has been denoted a natural field experiment6,7. Such studies have spanned broad areas of our economy, including lending insights into the underpinnings of markets and how we can achieve more efficient and equitable exchange8; examining crucial factors driving unemployment9; exploring how simple social science principles can improve public health interventions10; identifying the roots of generosity, reciprocity and altruism; and investigating how we can use the private provision of public goods to overcome market failures and collective action to solve the free-rider problem11–15. Each of these areas, and several others, has been advanced by modern field experimentation in economics16.

One crucial question for the experimental research agenda in the social sciences relates to the scale-up problem: can this idea work at scale? In the physical sciences, in which it is assumed that the same physical laws prevail everywhere, Galileo could fluidly extrapolate his experimental results to the heavenly bodies. Yet, when experimenting with humans, the scale-up problem takes centre stage. In its simplest form, this question relates to the proliferation of an idea or policy from a small group—for example, students at a certain school—to a larger group in more diverse situations17,18. The urgency of scaling important

1The University of Chicago, Chicago, IL, USA. 2ANU, Canberra, Australian Capital Territory, Australia. 3NBER, Cambridge, MA, USA. ✉e-mail: [email protected]


policies impacts us every day, whether it is protecting the health and safety of a community, improving the sustainability of development or enhancing the educational opportunities of future generations. Scaling was once viewed as important only in the domain of business startups, yet it underlies all social and technological progress, as the innovations that have the best chance to change the world are those that reach the largest number of people. Innovation is crucial, but diffusion is its perfect complement.

Although its importance is undeniable, the scaling process is not simple, with pitfalls present every step of the way, running from the seed of an idea to well after policy launch. One consequence of having multiple fault lines is that ideas tend to experience a voltage drop: the cost–benefit profile depreciates considerably when moving from the small to the large19–22. Indeed, across disciplines ranging from software development to medicine to education and beyond, between 50% and 90% of programmes lose voltage at scale, with many of the scaled programmes having a weak effect or no effect at scale even though they showed great promise in early trials18,23. Elegant ideas to fight climate change work in the Petri dish but fall apart at scale24. An educational reform works in pilot studies but, when scaled to the whole country, effect sizes considerably diminish25. In my own research, I have found that certain ideas are predictably scalable, while others are predictably unscalable, and the reasons for each are visible through a scientific lens18.

Today, policymakers focus on developing evidence-based policies, which, in most circles, means advancing public policies, programmes and practices that are grounded in some type of empirical evidence, with little attention paid to how far removed that evidence is from what, for whom, where and how they want to scale. To reduce voltage drops, I argue in this Perspective that we must optimally generate policy-based evidence before the decision-maker attempts to scale up the project (and even throughout the life cycle of the policy). Optimality requires a recognition of two key aspects of the problem: what information to generate and in what sequence. The economics of data generation and the behavioural economics of the decision-maker and scientist provide insights into these two crucial areas.

To sharpen our understanding of these two key aspects, I argue that the nature in which policy-based evidence is generated demands an appreciation of a scaling continuum: there are some instances in which the researcher should adopt a learning-by-doing approach, whereby first tests are purely for efficacy and future explorations tinker with populations of people, treatment types and testing boundary conditions incrementally to produce a scalable policy. Although this approach is the current status quo, there are other important cases in which the researcher should start by imagining what a successful, fully implemented intervention looks like, applied to the entire population with their varying situations, sustained over a long period of time. I denote this class of ideas high fixed cost with impatient decision-makers (HFIDs). In the case of HFIDs, I advocate flipping the traditional research and policy-development models.
To accomplish this goal, original experimental designs and prototyping must address potential stumbling blocks from the beginning: we must engage in backward induction and understand the big picture rather than focus on the immediate incentives that we face as scholars. In the traditional model (learning by doing), scholars consider how to scale up ideas and interventions only (if at all) after they are shown to work in efficacy tests. In the experimental community, such efficacy tests are commonly called A/B tests. This leads to a natural question: if classic A/B tests are insufficient in such cases, what data should be generated? I argue that we must begin with an understanding of the factors that cause voltage drops and a discernment of the mechanisms underlying why the policy works. This includes identifying important mediators, heterogeneity and causal moderators as well as whether the idea remains promising in the face of crucial real-world constraints it will face at scale. Accordingly,

it is often economically and scientifically efficient to do not only the efficacy test, but also relevant tests of scale and mechanisms within the original discovery process. As a mnemonic, I denote this approach as option C thinking, which effectively asks: if I want to scale up this idea, what extra information do I need beyond an A/B test? To answer this question, I argue that we must include a scalable version of the studied programme alongside the best-case A/B test. Leveraging option C thinking in our initial designs is not simply adding a new treatment arm; rather, it augments the traditional approach with an experimental enhancement that produces the type of policy-based evidence that the science of scaling demands from the beginning.

What data to generate?
Policy-based evidence
A hallmark of public policy decision-making is a comparison of the benefits and costs associated with proposed programmes or regulations. Likewise, firms, both profit and non-profit, along with many individuals follow a basic rule of comparing benefits and costs of a proposed action. In this spirit, as knowledge creators interested in scaling, we must answer the following question: after a programme has been claimed to pass a cost–benefit test in an initial study, what is the probability that it will pass a cost–benefit test in the target setting of interest? A crucial piece of information that is necessary to answer this question relates to the transportability of information from the experimental setting to the scaled setting. The difficulty arises because the object of interest in social science experiments is humans; thus, in most (or potentially all) cases, attempting to develop behavioural laws that parallel those from the natural sciences is a fool’s errand. Humans are creatures of habit, until we are not. We are animals that respond to stimuli in predictable ways, until something imperceptible to the scientist in the background changes and causes a seemingly irrational reaction. In brief, behavioural principles observed in one environment are not always shared broadly. Humans, who make difficult experimental participants, make the social sciences the ‘harder’ sciences.

By contrast, for physical laws and processes, evidence to date supports the idea that what happens in the laboratory is equally valid in the broader world. Shapley (page 43 of ref. 26), for example, noted that “as far as we can tell, the same physical laws prevail everywhere”. Likewise, Newton (page 398 of ref. 27) scribed that “the qualities … which are found to belong to all bodies within the reach of our experiments, are to be esteemed the universal qualities of all bodies whatsoever”. With humans, the heterogeneity in populations and situations yields different experimental results quantitatively, and even sometimes qualitatively across scaling domains.

Within economics and the broader social sciences, generalizability models naturally arise from Mill’s assumption of the lawfulness of nature in that they are based on the ‘distance’ in time, space, population and the decision environment between the study setting and the setting of ultimate interest28–31. Generalization revolves around the degree of stickiness of nature: the models assume that, as two decisions become closer in distance, the congruency in response effects will heighten because the two environments tend to follow similar laws. While theoretically this might be pleasing, it leaves open the crucial question: what are the key factors that cause behavioural differences across settings or, likewise, what features underlie the voltage effect?

Leveraging a set of economic models and the voluminous empirical literature, in previous work18, I outlined five threats that can cause voltage drops and prevent an idea from having its promised impact when scaled. I denote these as the five vital signs, and I summarize them in Table 1. In effect, these five threats inform us of what information is necessary to generate policy-based evidence.


Table 1 | Causes of voltage drops: the five vital signs

Cause of voltage drop | Definition | Example

False positives | It did not actually work in the first place. | Initial tests of D.A.R.E. in 1985 showed great efficacy, leading to it being scaled broadly; later analyses showed that D.A.R.E. was a false positive18.

Representativeness of the sampled population | It worked for the sampled population, but that population is too different from the population at scale. | First tests of energy conservation programmes work well but, as a programme expands, the initial findings are overly optimistic because the early experimental participants are more responsive to treatment than later participants84.

Spillovers | If an intervention affects groups other than those sampled, then the impact at scale will not be like the impact in the initial test. | Uber attempts to raise driver pay with base fare increases or by adding tipping but spillovers cause each effort to be thwarted102,103.

Supply side | Even if the benefits persist at scale, if expansion of the programme causes costs to rise disproportionately, then ‘diseconomies of scale’ cause a voltage drop. | As expansion occurs at a cheese packing plant, management teams get spread too thin and duplication of tasks occurs, leading to average total cost increases104.

Representativeness of the sampled situation | The intervention worked in a particular situation that was too different from the world at scale. | A ‘special moment in time’: nudges for vaccine uptake work well early in campaigns, but the efficacy decays66.

To create policy-based evidence, we must understand the various threats to scaling. These are factors that can cause voltage drops—when the idea is scaled up, the cost–benefit promise is not realized. There are five key threats that can cause voltage drops. I denote these as the five vital signs, and they include false positives, representativeness of the population, spillovers, the supply side and representativeness of the situation. Any one of the five threats, or a combination of them, can frustrate scaling, causing our ideas to not have the impact that was promised from the original evidence. D.A.R.E., Drug Abuse Resistance Education.

Vital sign 1: false positives
An active debate has emerged in the social sciences that claims there is a credibility crisis, whereby the foundation of the experimental approach and the credibility of the received results are called into question32–35. The debate has evolved into several streams of inquiry, but the channel connecting them is false positives, with a lack of replication often carrying the water. False positives fall into one of three buckets: statistical error (alpha), human error (how we generate, evaluate and interpret data) and fraud (less rare than we hope36). The literature has addressed these concerns by changing both the incentives around creating replications and examining data to control the false-positive rate37–40. To put this inferential issue into perspective, an often-misunderstood consideration is that there is a crucial distinction between the probability that a reported significant finding in the literature represents a real relationship and the probability that an individual experiment has uncovered a real relationship41,42. For example, even if a fully powered experiment reveals that a programme works at conventional significance levels, if the result was a surprise, the likelihood of the programme working again, especially at scale, remains a long shot43–45. The power of replication to overcome false positives thus becomes abundantly clear when building scientific knowledge around scaling18,42,46.
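To make this distinction concrete, a rough back-of-the-envelope calculation (my illustration, not part of the original analysis) can translate a prior probability that a programme truly works, the significance threshold and the statistical power into the probability that a single significant result reflects a real effect. The short Python sketch below uses invented numbers purely for exposition; the function name and defaults are assumptions, not anything reported in the paper.

    # Illustrative sketch: post-study probability that a single statistically
    # significant finding reflects a real effect, given a prior belief.
    def post_study_probability(prior: float, alpha: float = 0.05, power: float = 0.8) -> float:
        """P(effect is real | significant result), by Bayes' rule.

        prior: probability, before the experiment, that the programme truly works.
        alpha: false-positive rate of the statistical test.
        power: probability of detecting the effect if it is real.
        """
        true_positives = power * prior
        false_positives = alpha * (1.0 - prior)
        return true_positives / (true_positives + false_positives)

    # A 'surprising' result corresponds to a low prior: even a well-powered,
    # significant finding then has only a modest chance of reflecting a real effect.
    for prior in (0.05, 0.25, 0.50):
        print(f"prior={prior:.2f} -> P(real | significant)={post_study_probability(prior):.2f}")

Under these illustrative numbers, a surprising finding (prior of 0.05) has less than a 50% chance of being real even when significant, whereas the same finding with a prior of 0.5 is real with probability above 0.9, which is one way to read the distinction drawn above.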

Vital sign 2: representativeness of the population
The second vital sign is representativeness of the experimental sample. In this case, the voltage effect arises when the decision-maker assumes that the small subset of people who are affected by the policy in the Petri dish are more representative of the general population than they are at scale. There are two key aspects of the sample and the population that must be considered before scaling. The first concerns selection into the experiment: after choosing the population, is the experimental sample truly representative of that population? The second question relates to the population itself: is the experimental population representative of the target population to which the programme will be scaled?

The first question concerns whether inference drawn from an experiment can be extended to non-participants from the same population. Before taking part, individuals must decide whether to participate in the experiment. One study investigating this question examined participation in a laboratory experiment47. The results show that some of the characteristics of participants versus non-participants are significantly different. In particular, the experimental participants have significantly less income and more leisure time, are more likely to major in economics, and are more pro-social. The study even finds some support for key outcome differences. More empirical evidence is necessary before broad conclusions can be drawn, however. This insight is not unique to economic experiments as, decades ago, it was noted that volunteers in human research “typically have more education, higher occupational status, earlier birth position, lower chronological age, higher need for approval and lower authoritarianism than non-volunteers”48. Indeed, a previous study49 concluded that social experimentation is largely the science of “punctual college sophomore” volunteers and further argued that participants are more likely to be “scientific do-gooders”, interested in the research, or students who readily cooperate with the experimenter and seek social approval (see also ref. 50).

One method to create a representative sample from the experimental subsample is to use propensity score methods to adjust the sample for the population features. This approach is effective if selection into the experiment occurs only on observables that the analyst can use in the statistical analysis, or if the assumption of conditional independence is met. Another approach to achieving representativeness is to conduct a natural field experiment6,7,16, which handles the selection decision by covertness: the participants do not know that they are being randomized into treatment–control or that their behaviour is being scrutinized. Of course, this might not be applicable in all situations.

The second question concerns whether the experimental population is representative of the target population. This area has been well explored in the extant social science literature51–55. One line of thought is that, with multiple potential locations, if the researcher chooses locations at random in an initial stage of the experimental design, then this will lead to generalizable results across all potential locations. This approach helps to tackle our inferential problem only if the participants themselves are randomly assigned across locations, which of course is dubious56.
A more credible identification approach is to gather information on covariates in the experimental sample and estimate conditional average treatment effects (CATEs). Then, with a good assortment of individuals in the experimental sample, the researcher can average these CATEs over the distribution of covariates in the target population, ‘adjusting’ treatment effects accordingly. This approach is often used when the researcher constructs a random sample that is useful for examining heterogeneous treatment effects using multisite trials and then relies on probability reweighting and effect estimation using functional form assumptions to transport insights56–62. Once again, this approach identifies the relevant treatment effect if selection into the experiment occurs only on observable characteristics. To avoid


such selection issues, conducting a natural field experiment might be necessary6,7,42.
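One way to picture the covariate-adjustment logic described above is simple post-stratification: estimate the treatment effect within covariate strata in the experimental sample, then average those stratum-level CATEs using the covariate distribution of the target population. The Python sketch below is a minimal illustration under strong assumptions (selection on a single observable stratum variable); the records, stratum names and target shares are invented for exposition only and are not data from any study cited here.

    # Minimal illustrative sketch: transport an experimental treatment effect to a
    # target population by reweighting stratum-level CATEs (invented data).
    from collections import defaultdict

    # Invented experimental records: (stratum, treated flag, outcome).
    experiment = [
        ("low_income", 1, 0.60), ("low_income", 0, 0.30),
        ("low_income", 1, 0.55), ("low_income", 0, 0.35),
        ("high_income", 1, 0.50), ("high_income", 0, 0.45),
        ("high_income", 1, 0.52), ("high_income", 0, 0.47),
    ]

    # Share of each stratum in the population we actually want to scale to.
    target_shares = {"low_income": 0.2, "high_income": 0.8}

    def stratum_cates(records):
        """Difference in mean outcomes (treated minus control) within each stratum."""
        sums = defaultdict(lambda: {1: [0.0, 0], 0: [0.0, 0]})
        for stratum, treated, outcome in records:
            sums[stratum][treated][0] += outcome
            sums[stratum][treated][1] += 1
        return {s: g[1][0] / g[1][1] - g[0][0] / g[0][1] for s, g in sums.items()}

    cates = stratum_cates(experiment)
    sample_ate = sum(cates.values()) / len(cates)                      # unweighted experimental estimate (equal-sized strata here)
    target_ate = sum(cates[s] * w for s, w in target_shares.items())   # CATEs reweighted to the target population

    print("stratum CATEs:", {s: round(v, 3) for s, v in cates.items()})
    print(f"naive experimental ATE: {sample_ate:.3f}, transported ATE: {target_ate:.3f}")

In this toy example the experimental sample over-represents the responsive stratum, so the naive estimate (0.15) overstates the effect that would be expected in the target population (0.09); the validity of the adjustment still rests entirely on selection occurring only on the observed stratum variable, as the text emphasizes.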

Vital sign 3: spillovers
As the above discussion intimates, there are factors beyond the sampled individuals that can affect scaling. The third vital sign is an understanding that the implementation of the policy can have unintended consequences, or spillovers, that work against (or in favour of) the desired outcome. Such ‘general equilibrium’ effects, or changes to the overall market or system outside a programme or policy evaluation, lurk everywhere, from labour market policies to hot-spot policing in criminology to the effects of housing and education voucher programmes. This is because, if a policy alters the value of a certain programme outcome, the benefit that an individual gains from that programme may be relatively small if many more people are also benefiting from the programme. For example, more people benefiting from certain educational credentials may decrease the individual benefit of that credential if a programme leads to many more people earning that credential. Much research has been completed that measures such spillovers63,64.

Vital sign 4: supply side
The fourth vital sign is the supply side of scaling—if a policy has diseconomies of scale, it becomes costlier to sustain as it expands. The supply side of scaling a policy is well studied theoretically and empirically. Economies and diseconomies of scale have been important to economists since the beginning, and have a central role in Adam Smith’s Wealth of Nations65. Below, we learn that the supply side can also have a crucial role in determining the optimal sequencing of our information acquisition.

Vital sign 5: representativeness of the situation
Our last key component of scaling is representativeness of the situation. This is a broad and rich category that includes any situational feature that impacts the science around scaling. For example, interventions that worked well to accelerate COVID-19 vaccination when vaccines were initially rolled out were less effective later when that initial high demand was largely quenched; it seems to be situational features rather than representativeness of the population that caused such voltage drops66. Likewise, the Becoming a Man Program67 worked well initially but, when the programme was expanded, it lost voltage, and situational features probably had a role, as discussed previously68. Finally, an early study showed that FAFSA (‘Free Application for Federal Student Aid’) simplification worked quite well69; however, when the programme was scaled up using a variation on the original idea, it failed70. As these examples make clear, this final vital sign demands that information across a rich assortment of situational features must be generated before having scaling confidence. In its most basic sense, such information can then be used by the researcher to average these types of CATEs over the distribution of situational covariates in the scaled setting, ‘adjusting’ treatment effects accordingly. This approach can be performed using multisite trials and relying on spatial variation to re-weight in a similar spirit to that done with individual covariates56–62. For example, an intervention designed for schools should sample schools that vary in location/income, teacher quality, scholastic outcomes, size and other variables that are known to be related to key educational outcomes the scholar is attempting to affect. This is because sometimes programmes do not scale up well due to changes in such details or even in who implements the programme18.
Work across the social sciences has grappled with providing meaningfully representative samples and situations for decades18,28–30,71–75. A general lesson from the literature in this area is that understanding whether a programme works with the constraints that it will face at scale is invaluable before scaling. The examples show that most discussions

focus on ensuring that the effects of causes measured in the initial study will remain at scale. A complementary exercise that importantly informs the scaling exercise is to understand the causes of effects76,77. In this spirit, understanding the mediation path(s) and creating a path that will exist in environments in which scaling will occur is vital. Consider recent work that explores the effect that changing parental beliefs has on early childhood investment and ultimately on childhood outcomes78. For this idea to be scalable, it is important to recognize that the approach relies on (1) the treatment changing parental beliefs; (2) parents, whose beliefs change, having adequate resources to invest; (3) those investing parents understanding how to invest; (4) the parents investing accordingly; and (5) those investments moving the child outcome of interest within the experimental time-frame. If this entire chain holds in the target setting of interest with the fluidity that it held in the experimental setting, then the programme passes the mediation path test. A key lesson that mediation and any relevant moderators in an experimental situation teach us about scaling is that great caution should be taken when drawing conclusions from a localized experiment about a policy implemented at scale. In many quarters, splitting a complex problem into several distinct smaller parts has been celebrated. Although this approach certainly has merits—I have done the same in some of my field experimental work79—care must be taken to understand the complete set of mechanisms before generalizing the insights from a part of the puzzle to the whole. Indeed, without a keen sense of how the whole puzzle fits together (understanding both the causes of effects and effects of causes), one should take great care in generalizing and scaling ideas. The overarching message is that creating ideas that have simple mediation paths is crucial. Greater trust should be given to a policy that is well understood theoretically and one that has a simple mediation path.
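As a stylized illustration of why simple mediation paths matter (my own sketch, not an analysis from ref. 78), one can think of the scaled effect as surviving only to the extent that every link in the chain holds in the new setting; if each link weakens a little, the compounded attenuation can be severe. The probabilities and effect size below are invented purely for exposition.

    # Stylized sketch with invented numbers: how an effect attenuates when each link
    # of a mediation chain holds less well at scale than in the original experiment.
    from math import prod

    # Assumed probability that each link of the parental-beliefs chain operates as intended.
    links_in_experiment = {
        "treatment changes parental beliefs": 0.9,
        "parents have resources to invest": 0.9,
        "parents know how to invest": 0.9,
        "parents actually invest": 0.9,
        "investment moves the child outcome in time": 0.9,
    }
    # At scale, suppose every link is somewhat weaker.
    links_at_scale = {name: 0.7 for name in links_in_experiment}

    efficacy_effect = 0.30  # assumed effect size when the whole chain holds perfectly

    effect_experiment = efficacy_effect * prod(links_in_experiment.values())
    effect_at_scale = efficacy_effect * prod(links_at_scale.values())

    print(f"effect with experimental chain: {effect_experiment:.3f}")
    print(f"effect with weakened chain at scale: {effect_at_scale:.3f}")
    # Five links at 0.9 retain about 59% of the effect; at 0.7 only about 17% survives.

The multiplicative structure is the point of the sketch: a long chain does not need any single link to fail outright for the cost–benefit promise to evaporate, which is why shorter, well-understood mediation paths deserve greater trust.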

What data generation sequence is optimal?
Once the five vital signs to create policy-based evidence are understood, the next piece of the puzzle is how to generate such evidence in an optimal manner, where optimality in this case is defined as arriving at the correct conclusion using the fewest resources. The traditional research approach to scaling is one of learning by doing: first run a pilot under idealized (not scalable) conditions, and then, once we find promise under those conditions, run scaling trials under more realistic conditions with real-world scaling complexities increasingly introduced. Beyond medical trials, one area in which this approach has been carried out is the exploration by social scientists of the effects of growth mindset and social belonging using multisite trials58,60,74,80. For example, the growth mindset work advanced from early studies that showed effects of growth mindset interventions in certain settings81 and evolved to develop more scalable insights by systematically testing interventions and then disseminating treatments widely only after those conditions were rigorously tested58,60. Within economics, this approach can be found in the learning-by-doing experiments conducted by Pratham, an Indian nongovernmental organization. To augment human capital accumulation among its students, Pratham designed a programme called ‘teaching at the right level’ (LEVEL). The basic idea of LEVEL is to group children according to their knowledge rather than their age. The partnership between researchers and Pratham started with a trial field experiment in the cities of Vadodara and Mumbai82,83. Over time, Pratham tinkered with aspects of the programme and eventually scaled it to over 33 million children.

Although the learning-by-doing approach has merits, there is an important class of problems for which we should flip the current approach. Under this new model, scholars and policymakers should start by imagining what a successful, fully implemented intervention looks like, applied to the entire participant population with their


varying situations, sustained over a long period. In such cases, we must engage in backward induction rather than focus solely on efficacy tests in the beginning. My reasoning for flipping the research agenda follows from basic economics, which points to three interrelated drivers. First, short-term incentives encourage researchers to create a Petri dish that provides results that are overly optimistic: consumers of scientific journals demand studies that report important insights21. They reward journals through the purchase of subscriptions and by citing a journal’s papers. In turn, a well-documented bias emerges: professional academics, potential funders and laypeople all regard reporting of large treatment effects as more noteworthy than smaller effects84. The result is that we are essentially performing efficacy tests on steroids without telling outsiders. Thus, the first insights given to decision-makers about an idea’s prospects will be of an overly optimistic nature.

The second reason relates to the behavioural economics of the problem: impatient decision-makers. In the business world, this time-horizon mismatch problem has been denoted as short-termism, and various estimates point to the issue being widespread and important. For example, 78% of executives surveyed self-report sacrificing long-term value to smooth quarterly earnings85, and the qualitative evidence suggests that the social cost of short-termism is up to 20% of potential output86. Various unresolved issues and proposals to combat short-termism have been advanced87,88. A corollary of short-termism discussed in relation to policymakers revolves around choices that involve discounting. The tension between social and private discount rates, particularly for long-term problems such as climate change, and especially how that interacts with uncertainty in eventual damages, reveals the importance of time discounting89–92. Likewise, there are recent policy choices in which impatience has had a key role. Consider the case of anaemia, which affects 1.6 billion people worldwide. To provide solutions to the problem, researchers in India ran pilot studies measuring the benefits of consuming iron-fortified salt on anaemia (double-fortified salt; salt fortified with iron and iodine). Although the early pilots showed some efficacy, surprisingly, in 2012, Indian officials did a nationwide scale-up of double-fortified salt despite the lack of large-scale trials. It turned out that the fortified salt had no effect on the policy goal of reducing general anaemia in the broader population93. Why did this happen? The original studies had specifically sought out adolescent women. While their unique physiology benefited, these health gains did not manifest at scale. Similar failures at scale have occurred elsewhere with initiatives to decrease the rates of transmission of sexually transmitted diseases and promote safe sex, such as providing condoms to a community. Such practices have produced weak results after expansion due to variations in community mores related to sex18.

In summary, the first two sequencing considerations suggest that, to achieve optimality in some cases, we must ‘tie’ the researchers’ and decision-makers’ hands. In effect, by committing the researcher to provide relevant information on what the programme will involve at scale, the decision-maker will find it more difficult to ignore that such problems might exist before scaling.
This approach has some antecedents in medical trials: a new drug cannot be taken to market until the relevant signposts have been passed. The third reason for reconsidering sequencing of data generation stems from the economics of the idea testing itself: for interventions with a high start-up cost, backward inducting from the beginning makes economic sense. Consider a parochial example that takes a minor step in this direction. I led the creation and launch of the Chicago Heights Early Childhood Center (CHECC) in 2008. From 2007 to 2010, I directed the building of two separate pre-schools (from scratch) in Chicago Heights, Illinois, a suburb south of Chicago. Building the programmes/schools represented large start-up costs, as securing relevant licences, contracts, curricula, buildings, buses, lunches, teachers, administrators,

community and schoolboard buy-in represented a considerable resource cost. We opened the schools in 2010 and conducted the field experiment from 2010 to 2014, including nearly 1,500 students annually. One key feature of our field experiment was a full-day pre-school programme that emphasized non-cognitive skills, such as socialization, active listening and delaying gratification, compared with a programme that focused on cognitive skill development. Moreover, the experiment included a set of treatments involving parent academies, whereby the parents were taught rather than their children64,94–99. At this point, one might ask: if I want to create a programme that scales, how should I design the CHECC experiment at the outset? After some introspection, it became clear that the traditional learning-by-doing model does not exactly fit CHECC. This is because, if the researcher had to continuously tinker over time with new programmes, ideas, teachers, administrators, populations of people, treatment types and testing boundary conditions incrementally, the cost to build each of the new pre-schools to generate the necessary data would be exorbitant. The fixed cost to begin data collection differentiates this idea from a drug trial, the ‘wise interventions’ and Pratham’s tutoring programme. In the CHECC case, even continuing data collection at the same site might feature a high fixed cost if considerable programmatic changes are made. When time costs are added, the tinkering model becomes untenable. We decided to take a different route: we introduced ‘option C thinking’ to the standard A/B testing approach.

Introducing option C thinking to A/B testing
The cornerstone of the experimental approach in the social sciences is A/B testing. For example, if an early education scholar is interested in testing whether an early education programme works, they will gather a group of children and split them into two groups: a control ‘A’ group that does not receive the intervention and the ‘B’ group that receives the early education programme. If the experiment satisfies the classical identification assumptions42, then a causal effect can be recovered. The causal effect is traditionally measured by comparing sample means of particular outcome variables across the two groups, as displayed in Fig. 1. In the A/B experimental test summarized in the top panel of Fig. 1, the programme is found to triple pre-school readiness from 17% to 51%. One might view this result as extraordinary, and immediately want to scale up the programme. To understand why that choice is not prudent, consider exactly what we have learned from this research. If it is a typical social science experiment, then it has probably been conducted as an efficacy test: the best-case test of the programme is arm B versus the control, arm A. To understand why more information is necessary, we must consider the incentives that the researchers faced. Those incentives are set up to create a Petri dish that provides results that give the intervention its best shot or, likewise, the greatest treatment effects. In this manner, we are answering the wrong question if we are attempting to provide policy advice. We are asking whether this idea can work in the Petri dish under the best-case situation, rather than whether this idea will work at scale. At this point, the astute critic might respond: of course, this is just a ‘phase 1’ test rather than a ‘phase 3’ test; it is meant for efficacy, not generalizability. While that reasoning makes sense for medical trials, in many programme explorations in the social sciences, there is considerable fixed cost to start-up. Accordingly, it is economically and scientifically efficient to do not only the efficacy test, but also relevant tests of scale within the original discovery process. The economics of many situations demand such an approach. I refer to this approach as option C thinking. This is not because it represents a simple treatment arm to augment standard A/B testing. Rather, leveraging option C thinking in our initial designs flips the traditional research model from efficacy trials to an approach that produces the type of policy-based evidence that the science of scaling demands.
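To make the arithmetic behind Fig. 1 concrete, the sketch below (my illustration, not code or data from the paper) compares pre-school-readiness rates across a control arm A, an efficacy arm B and a scale-constrained arm C using a standard two-proportion z-test; the sample size of 200 children per arm is an assumption chosen only for exposition. With these numbers, the A-versus-B contrast (17% versus 51%) is overwhelming, while the A-versus-C contrast (17% versus 18%) is statistically indistinguishable from the control.

    # Illustrative A/B/C comparison: readiness rates taken from Fig. 1, sample sizes assumed.
    from math import sqrt, erf

    def two_proportion_z(successes_1, n_1, successes_2, n_2):
        """z statistic and two-sided p-value for a difference in proportions."""
        pooled = (successes_1 + successes_2) / (n_1 + n_2)
        se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
        z = (successes_2 / n_2 - successes_1 / n_1) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
        return z, p_value

    n = 200  # assumed children per arm
    arms = {"A (control)": 34, "B (efficacy test)": 102, "C (with scale constraints)": 36}
    # 34/200 = 17%, 102/200 = 51%, 36/200 = 18%, matching the rates quoted in Fig. 1.

    for label in ("B (efficacy test)", "C (with scale constraints)"):
        z, p = two_proportion_z(arms["A (control)"], n, arms[label], n)
        print(f"A vs {label}: z = {z:.2f}, two-sided p = {p:.3f}")

The design point is that arm C is not an afterthought bolted onto a finished efficacy trial: it is run within the same discovery process, so the decision-maker sees the scale-constrained estimate at the same time as the best-case one.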


Fig. 1 (flow diagram): the problem (how to prepare children for pre-school) is first addressed with an A/B test under ideal circumstances; children in group A (no intervention) are 17% pre-school ready and children in group B (early education programme) are 51% pre-school ready, so group B wins and is scaled up, but it does not work at scale. Flipping the approach means thinking at scale first (what resources are there? who are the people we need to help?), assessing the constraints at scale, designing the intervention with these parameters in mind and testing at scale with an added group C (early education programme with scaling constraints), which is only 18% pre-school ready, a tiny effect above control, so the programme will not scale well without key changes.

Fig. 1 | Adding option C thinking to A/B testing. The cornerstone of the experimental approach in the social sciences is A/B testing. For example, to test whether an education programme works, the researcher takes a group of children and splits them into two groups: a control A group that does not receive the intervention and a B group that receives the early education programme. This represents the two arms in the top panel of the diagram. If the experiment satisfies the classical identification assumptions42, then a causal effect can be recovered. In this example, the programme is found to move pre-school readiness from 17% (option A: control) to 51% (option B: treatment). However, if this is a typical social science experiment, then the A/B test has probably been conducted as an efficacy test. In this manner, we are answering the wrong question if we are attempting to give policy advice with the A/B test.

In such a case, we are asking whether this idea can work in the best-case situation, rather than whether this idea will work at scale. There are many situations in which the economics and science demand a new approach. I refer to this approach as option C thinking: create the evidence that provides greater scaling confidence in the original design alongside the efficacy test. To complete our thought experiment, in this case, after doing so, we find, in the lower panel, that the scaled programme accounting for necessary constraints raises pre-school readiness to 18%—only a 1 percentage point increase relative to the control. Accordingly, when the programme is conducted with the constraints that it will face at scale, the programme probably does not pass a cost–benefit test, even though it looked incredible in the efficacy test.

The option C analogy is meant to take the initial discovery process from one of focusing purely on the details of an efficacy test to engaging in a bigger picture view, including examining what constraints the idea will face at scale, what key factors can impact scaling, and whether the mediation paths and moderators are in place at scale. Such evidence should be generated in the original design alongside the efficacy test. To complete our thought experiment, in the lower panel of Fig. 1, when following this approach, we add group C, which tests a programme that includes the constraints or issues that the idea will face at scale. After doing so, we find that the programme that will be scaled minimally increases the pre-school readiness compared to the control group—from 17% to 18%—a result that is not statistically significant relative to the control and certainly will not pass a cost–benefit test. Returning to our CHECC example, in this case, we augmented traditional A/B testing by using option C thinking as follows. If we backward-induct from the reality that, at scale in thousands of schools, our programme would not have its dream budget or a dream applicant pool of teachers to choose from, then several potential issues emerge. We therefore designed our experiment to examine whether our curriculum could work with teachers who have varying abilities. That is, we employed teachers who would typically come and work in a school district like Chicago Heights. As we prepared to open CHECC, we hired our 30 teachers and administrators the same way the Chicago Heights public schools would, from the same candidate pool and with the same salary caps. This choice provided the A/B efficacy test because we had several stellar teachers, but we populated option C too, which ensured that the situation was representative, at least on the dimension of teacher quality. While it may sound counterintuitive to not search out and hire only the best talent in the early stages of an endeavour, it does provide insights necessary to scale from the outset. Our approach provided a teacher pool that would allow us to perform heterogeneity testing on teacher value-added metrics; or as discussed

above, permit us to measure relevant CATEs. Of course, this example focuses on the teacher input for a pre-school readiness programme but, beyond teachers, a wealth of other considerations could be examined, as discussed above with the five vital signs. For example, at scale, we also must convince many principals and superintendents to implement our experiment with our new way of doing things. Fidelity concerns might yield voltage effects at scale18. The larger point here is that there are key cases in which the researcher should backward induct and explore crucial elements and constraints that the idea will face at scale. But how can we identify those key cases in which we should perform such tests?


Introducing HFIDs
Combining the three interrelated drivers discussed above—researcher incentives, impatient decision-makers and programme fixed costs—provides a strong impetus for flipping the scaling research approach for certain idea types. Specifically, uniting the notions of high-fixed-cost programmes and impatient decision-makers creates the class of potentially scalable ideas that I denote as HFIDs: high fixed cost with impatient decision-makers. I provide Fig. 2 to categorize such ideas in a digestible format. For example, region I of Fig. 2 includes true HFIDs. These are programmes that are characterized as high-fixed-cost programmes to generate information and are in topic areas in which decision-makers are impatient. Region II of Fig. 2 relaxes the impatient requirement but continues with high-fixed-cost programmes. Regions III and IV include low-fixed-cost programmes, such as many of those discussed above that use the learning-by-doing approach. CHECC falls into region I or II, and it is possible for it to fall into both, as policymakers around the globe hold their own views. Making my point anecdotally, we can examine how practitioners have used CHECC results. In one example, policymakers scaled the parental component of CHECC in


Fig. 2 (schematic) arrays ideas along two axes, fixed cost to generate data (horizontal) and decision-maker impatience (vertical), defining four regions.
Region I: high fixed cost to generate data and impatient decision-makers. For example, the best combination of incentives and face-to-face community health outreach to use to encourage vaccination against a deadly new virus.
Region II: high fixed cost to generate data and patient decision-makers. For example, the best all-day pre-school programme to augment skill formation at a young age (CHECC).
Region III: low fixed cost to generate data and impatient decision-makers. For example, the best text messages to send to encourage vaccination against a deadly new virus.
Region IV: low fixed cost to generate data and patient decision-makers. For example, the best e-mail that a fundraiser can send to maximize charitable contributions.

Fig. 2 | Two dimensions that affect optimal data generation sequencing. What types of ideas are most important for the researcher to backward induct? All other things being equal, the strongest case can be made for ideas in region I. These are programmes that are characterized as high-fixed-cost programmes to generate information and are in topic areas in which decision-makers are impatient. Regions II, III and IV relax one or both elements. If the idea falls into one of these regions, then I view the prudent action as considering the benefits and costs of flipping the traditional research model. In many such instances, introducing option C thinking in the beginning will be optimal. One example is quadrant IV: with adequate sample sizes, it would be quite simple and cost-effective to explore mechanisms and relevant tests of scale within the original charitable-giving discovery process. A useful analogy naturally arises. In engineering, a common approach is that you reverse engineer the testing from the desired use cases, which usually involve the product working reliably under many different conditions. For example, when engineering a plane, it would not make sense to follow the typical A/B testing method of starting with a test in a wind tunnel (paired with analogous computer simulations), then attempting a short journey—say from Chicago to Indianapolis—and then, if that works, assuming that the plane can fly around the world. Instead, you would engineer the plane for all conditions and backwards map the wind tunnel experiments onto how you would want the plane to perform under all possible conditions. You would then go out and test fly the plane, eventually pushing the engineered envelope of performance (which, by design, has been set beyond the parameters of expected use). After necessary tweaking, you release to the real world. While this is not the current state of the art in the social sciences, I argue that it should feature prominently in our toolkit.

London without incentivizing parents, ignoring that parent incentives were used in CHECC18. Although we did not have evidence that showed conclusively that parental incentives were necessary, we advised local officials that they were probably quite important. Their programme did not scale well. Had we shown experimentally that a non-negotiable for the programme to work was parental incentives, perhaps the officials would have been less likely to scale up the programme without them. Alternatively, in another distinct case, scaling the full-day pre-school CHECC programme has worked well in Bangladesh100. This is congruent with the fact that CHECC worked well with a variety of teachers in the initial experiment; we therefore had scaling confidence in that input variety. My perspective is that, for those cases that fall into region I of Fig. 2, an approach of flipping the traditional research and policy-development models should be used. That is, in the original experimental designs and prototyping, option C thinking should be included to understand potential stumbling blocks from the beginning. In this sense, using backward induction to create policy-based evidence begins by identifying features that can cause voltage drops and then testing those

factors. If the idea falls into regions II or III, I view the prudent action as considering the benefits and costs of flipping the traditional research and policy-development models. In many such instances, introducing option C thinking in the beginning will make sense. Although, in theory, region IV might yield the least benefit from introducing option C thinking, I argue that the researcher should still consider doing so if there are economies of scale in data collection. For example, one question that falls into quadrant IV relates to non-profits: what is the best e-mail that a fundraiser can send to maximize charitable contributions? With adequate sample sizes, it is quite simple and cost-effective to explore mechanisms and relevant tests of scale within the original discovery process when answering this question.
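The two dimensions of Fig. 2 can be read as a simple decision rule. The sketch below is not part of the Perspective; it is a toy illustration, with invented field names and cut-offs, of how one might place a candidate idea into the four regions given a fixed cost of generating the evidence and the patience of the relevant decision-makers (the specific assignment of regions II and III is arbitrary here).

```python
# Hypothetical illustration only: toy classifier for the Fig. 2 regions.
# Thresholds, field names and the II/III labelling are assumptions, not from the text.

from dataclasses import dataclass

@dataclass
class Idea:
    name: str
    fixed_cost_usd: float           # cost of producing the evidence (e.g. a CHECC-style programme)
    decision_horizon_years: float   # how long decision-makers are willing to wait before acting

def classify_region(idea: Idea,
                    cost_cutoff: float = 1e6,
                    patience_cutoff: float = 3.0) -> str:
    """Map an idea onto regions I-IV of Fig. 2 using illustrative cut-offs."""
    high_cost = idea.fixed_cost_usd >= cost_cutoff
    impatient = idea.decision_horizon_years < patience_cutoff
    if high_cost and impatient:
        return "I"    # strongest case for building option C (scaling) arms into the first design
    if high_cost and not impatient:
        return "II"   # II/III assignment is arbitrary in this sketch
    if not high_cost and impatient:
        return "III"
    return "IV"       # e.g. fundraising e-mail tests: cheap data, patient decision-makers

if __name__ == "__main__":
    preschool = Idea("all-day pre-school programme", fixed_cost_usd=5e6, decision_horizon_years=1.0)
    email = Idea("fundraising e-mail test", fixed_cost_usd=5e4, decision_horizon_years=10.0)
    for idea in (preschool, email):
        print(idea.name, "-> region", classify_region(idea))
```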

A path forward and its potholes

I suspect that most academics and practitioners agree with my policy-based evidence argument. The key is determining whether, and to what extent, the five vital signs curb our scaling enthusiasm. Yet there is probably more consternation around my sequencing proposal, which remains largely a theoretical construct. This is because it is difficult to find social science experiments that imagine, at the outset, what a successful, fully implemented intervention looks like (applied to the entire participant population with their varying situations and sustained over a long period of time) and test accordingly. In this manner, the CHECC example takes a small step using backward induction from the beginning, but many key pieces were not tested because of resource constraints, such as whether incentives are necessary in the parent academy or the effect of administrators.

A useful path forward is to develop an understanding of where and when high-fixed-start-up-cost programmes that deliver insights to impatient decision-makers arise. Those are ripe cases in which to run not only the efficacy test but also relevant tests of scale within the original discovery process by adding option C thinking. Likewise, even when an idea falls into regions II, III or IV of Fig. 2, it is important to explore the economics of the research process to determine whether option C thinking should be tested alongside the best-case programme. Policymakers deserve such an approach, and the economics of many programme and idea evaluations that have a chance to scale demand such an approach.

However, my perspective on timing is not without issues. First, there are potentially wasted resources: exploring whether an idea works in a compromised situation might not be worthwhile if it cannot even succeed in its best light. Alternatively, we may also want to avoid the risk of dropping too quickly valuable ideas that could work in some settings, only because our initial effort at implementing them in certain settings was sub-par. In other words, there are good reasons why we might take the easy piloting steps first, and slowly grow to tackle the more challenging situations. Thus, a useful trade-off emerges: if an idea is in region I of Fig. 2, do we gamble by putting efficacy estimates in impatient hands when we know that it is questionable whether it will work in realistic states of the world?

This consideration leads to a related second issue. Some academics might view my scaling concerns as missing the real point, which, in their opinion, is that academics' research is not used enough for policy purposes. My view is that, if that statement is true, it might have arisen because our ideas have experienced such substantial voltage effects in the past, leading policymakers to lose confidence in our results. To regain trust, we must be sure to advance scalable policies that satisfy the five vital signs. Under my approach, we are not slowing down progress and putting up new barriers for scientists; we are creating a systematic research agenda that, in the long run, will lead to more science being used by policymakers because it is more likely to work at scale.

A third area of concern is potential inequities. Demanding greater information from researchers in early stages might lead to an


inequitable distribution of scientific discoveries, journal pages and subsequent grant funding if this approach is feasible only for well-funded researchers. This concern represents a key resource call to funders. Such a call could be answered in many forms. For example, it might yield critical investments in shared research and development infrastructure to move ideas from regions I and II in Fig. 2 to regions III and IV. Eliminating the need for researchers to produce their own CHECCs by allowing them to share infrastructure can considerably change the economics of knowledge creation. Alternatively, funders, such as the National Science Foundation, National Institutes of Health and private foundations, could recognize that current data generation is not optimal to produce scaling insights. They could create funds that recognize HFIDs, and the need for them to generate data through scaling treatment arms from the beginning. Likewise, research organizations such as the Abdul Latif Jameel Poverty Action Lab, Resources for the Future and other well-funded organizations might have the ability to do this within their strategic agenda.

At this point, a useful consideration is where medical trials fit in Fig. 2. I trust that impatience served as a key impetus for why regulators created safeguards to avoid decision-makers releasing new drugs too early. Thus, by design, regulators have restricted decision-makers in a manner that has been absent for policymakers enacting social rulemakings (although cost–benefit analysis is completed on every major economic rulemaking in the USA (ref. 101)). Furthermore, for drug testing, once the high fixed costs have been absorbed in the innovation stage to create novel compound recognition, generating new data to explore dosage levels, frequencies or treatment heterogeneities in the diffusion stage does not have a high fixed cost. Thus, by regulatory fiat and the economics of the problem, one can argue that the current landscape dictates that the modal medical trial is in region IV, where my approach has the least bite.

What my Perspective makes clear is that the promise of the scientific method within the social sciences will not be realized until we understand that grasping the interrelationships of factors in field settings is only the beginning. We must then seek to understand the mechanisms underlying those relationships, and whether more distant phenomena have the same underlying structure. Together, such information provides insights into whether the causal impacts of treatments implemented in one environment transfer to other environments, be they spatially, temporally or scale differentiated. Until policy-based evidence is gathered in an optimal manner, we will not reap the true rewards of experimentation.

1. McCall, W. A. How to Measure in Education (Macmillan, 1922).
2. Gosnell, H. F. Getting Out the Vote (Univ. Chicago Press, 1927).
3. Fisher, R. A. The Design of Experiments (Oliver and Boyd, 1935). Represented an early formal treatment of the experimental method and created a methodological tripod that remains in use today.
4. Lewin, K. Field theory and experiment in social psychology: concepts and methods. Am. J. Sociol. 44, 868–896 (1939).
5. Smith, V. L. An experimental study of competitive market behavior. J. Polit. Econ. 70, 111–137 (1962). Helped to establish laboratory experiments as a tool for modern empirical economics and showcased its power using market experiments.
6. Harrison, G. W. & List, J. A. Field experiments. J. Econ. Lit. 42, 1009–1055 (2004). Helped to establish field experiments as a useful tool for social scientists and created a typology for field experimental approaches.
7. List, J. A. Homo experimentalis evolves. Science 321, 207–209 (2008).
8. List, J. A. The nature and extent of discrimination in the marketplace: evidence from the field. Q. J. Econ. 119, 49–89 (2004).
9. Al-Ubaydli, O. & List, J. A. How natural field experiments have enhanced our understanding of unemployment. Nat. Hum. Behav. 3, 33–39 (2019).
10. Banerjee, A. V., Duflo, E., Glennerster, R. & Kothari, D. Improving immunisation coverage in rural India: clustered randomised controlled evaluation of immunisation campaigns with and without incentives. Brit. Med. J. 340, c2220 (2010).
11. Ostrom, E. Governing the Commons: The Evolution of Institutions for Collective Action (Cambridge Univ. Press, 1990).
12. List, J. A. The market for charitable giving. J. Econ. Perspect. 25, 157–180 (2011).
13. DellaVigna, S., List, J. A. & Malmendier, U. Testing for altruism and social pressure in charitable giving. Q. J. Econ. 127, 1–56 (2012).
14. DellaVigna, S., List, J. A., Malmendier, U. & Rao, G. Estimating social preferences and gift exchange at work. Am. Econ. Rev. 112, 1038–1074 (2022).


15. Halperin, B., Ho, B., List, J. A. & Muir, I. Toward an understanding of the economics of apologies: evidence from a large-scale natural field experiment. Econ. J. 132, 273–298 (2022).
16. Levitt, S. D. & List, J. A. Field experiments in economics: the past, the present, and the future. Eur. Econ. Rev. 53, 1–18 (2009).
17. Mobarak, A. M. Assessing social aid: the scale-up process needs evidence, too. Nature 609, 892–894 (2022). Provided a useful and meaningful scientific discussion of the science of scaling in development economics.
18. List, J. A. The Voltage Effect: How to Make Good Ideas Great and Great Ideas Scale (Currency, 2022).
19. Al-Ubaydli, O., List, J. A. & Suskind, D. L. What can we learn from experiments? Understanding the threats to the scalability of experimental results. Am. Econ. Rev. 107, 282–286 (2017).
20. Al-Ubaydli, O., List, J. A., LoRe, D. & Suskind, D. Scaling for economists: lessons from the non-adherence problem in the medical literature. J. Econ. Perspect. 31, 125–144 (2017).
21. Al-Ubaydli, O., List, J. A. & Suskind, D. 2017 Klein Lecture: the science of using science: toward an understanding of the threats to scalability. Int. Econ. Rev. 61, 1387–1409 (2020). Provided a theoretical structure to understand the science of using science and generated insights that led to the five vital signs discussed here.
22. Al-Ubaydli, O., Lee, M. S., List, J. A., Mackevicius, C. L. & Suskind, D. How can experiments play a greater role in public policy? Twelve proposals from an economic model of scaling. Behav. Publ. Pol. 5, 2–49 (2021).
23. How to Solve U.S. Social Problems When Most Rigorous Program Evaluations Find Disappointing Effects (Part One in a Series) (Straight Talk on Evidence, 2018); www.straighttalkonevidence.org/2018/03/21/how-to-solve-u-s-social-problems-when-most-rigorous-program-evaluations-find-disappointing-effects-part-one-in-a-series/.
24. Brandon, A., Clapp, C. M., List, J. A., Metcalfe, R. D. & Price, M. The Human Perils of Scaling Smart Technologies: Evidence from Field Experiments Working Paper Series No. 30482 (National Bureau of Economic Research, 2022).
25. Raikes, H. et al. Involvement in early head start home visiting services: demographic predictors and relations to child and parent outcomes. Early Child. Res. Q. 21, 2–24 (2006).
26. Shapley, H. Of Stars and Men: the Human Response to an Expanding Universe (Washington Square Press, 1964).
27. Newton, I. Philosophiæ Naturalis Principia Mathematica (London, 1687) (Harvard Univ. Press, 1966).
28. Brunswik, E. Perception and the Representative Design of Psychological Experiments 2nd edn (Univ. California Press, 1956).
29. Campbell, D. T. & Stanley, J. C. Experimental and Quasi-Experimental Designs for Research (Rand McNally & Company, 1963).
30. Al-Ubaydli, O. & List, J. A. in Methods of Modern Experimental Economics (eds Frechette, G. & Schotter, A.) Ch. 20, 420–462 (Oxford Univ. Press, 2013).
31. List, J. A. Non Est Disputandum de Generalizability? A Glimpse into the External Validity Trial Working Paper 27535 (National Bureau of Economic Research, 2020).
32. Nosek, B. A., Spies, J. R. & Motyl, M. Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci. 7, 615–631 (2012).
33. Jennions, M. D. & Møller, A. P. A survey of the statistical power of research in behavioral ecology and animal behavior. Behav. Ecol. 14, 438–445 (2003).
34. Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016).
35. Camerer, C. F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat. Hum. Behav. 2, 637–644 (2018).
36. List, J. A., Bailey, C. D., Euzent, P. J. & Martin, T. L. Academic economists behaving badly? A survey on three areas of unethical behavior. Econ. Inq. 39, 162–170 (2001).
37. Dreber, A. et al. Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl Acad. Sci. USA 112, 15343 (2015).
38. Benjamin, D. J. et al. Redefine statistical significance. Nat. Hum. Behav. 2, 6–10 (2018).
39. Butera, L. & List, J. A. An Economic Approach to Alleviate the Crises of Confidence in Science: With an Application to the Public Goods Game (National Bureau of Economic Research, 2017).
40. Buck, S. Policy-Based Evidence Doesn't Always Get it Backward. www.arnoldventures.org/stories/when-policy-based-evidence-is-exactly-what-is-needed (Arnold Ventures, 2019).
41. Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).
42. List, J. A. Experimental Economics: Theory and Practice (Univ. Chicago Press, 2024).
43. Maniadis, Z., Tufano, F. & List, J. A. One swallow doesn't make a summer: new evidence on anchoring effects. Am. Econ. Rev. 104, 277–290 (2014).
44. Reed, W. R. A primer on the 'reproducibility crisis' and ways to fix it. Aust. Econ. Rev. 51, 286–300 (2018).
45. Butera, L., Grossman, P. J., Houser, D., List, J. A. & Villeval, M.-C. A New Mechanism to Alleviate the Crises of Confidence in Science—With An Application to the Public Goods Game (National Bureau of Economic Research, 2020).
46. Maniadis, Z., Tufano, F. & List, J. A. To replicate or not to replicate? Exploring reproducibility in economics through the lens of a model and a pilot study. Econ. J. 127, F209–F235 (2017).
47. Cleave, B. L., Nikiforakis, N. & Slonim, R. Is there selection bias in laboratory experiments? The case of social and risk preferences. Exp. Econ. 16, 372–382 (2013).
48. Doty, R. L. & Silverthorne, C. Influence of menstrual cycle on volunteering behaviour. Nature 254, 139–140 (1975).
49. Rosenthal, R. & Rosnow, R. L. Artifacts in Behavioral Research: Robert Rosenthal and Ralph L. Rosnow's Classic Books (Oxford Univ. Press, 2009).
50. Orne, M. T. On the social psychology of the psychological experiment: with particular reference to demand characteristics and their implications. Am. Psychol. 17, 776–783 (1962).
51. Henrich, J. et al. In search of homo economicus: behavioral experiments in 15 small-scale societies. Am. Econ. Rev. 91, 73–78 (2001).


52. Henrich, J., Heine, S. J. & Norenzayan, A. The weirdest people in the world? Behav. Brain Sci. 33, 61–83 (2010). Called attention to, and created a useful discussion of, the importance of participant pools in social science experiments.
53. Henrich, J., Heine, S. J. & Norenzayan, A. Most people are not WEIRD. Nature 466, 29 (2010).
54. Fehr, E. & List, J. A. The hidden costs and returns of incentives—trust and trustworthiness among CEOs. J. Eur. Econ. Assoc. 2, 743–771 (2004).
55. Levitt, S. D. & List, J. A. What do laboratory experiments measuring social preferences reveal about the real world? J. Econ. Perspect. 21, 153–174 (2007). Called attention to, and created a useful discussion of, the importance of both the population of experimental participants and the population of situations in economic experiments.
56. Hotz, J. V., Imbens, G. W. & Mortimer, J. H. Predicting the efficacy of future training programs using past experiences at other locations. J. Econom. 125, 241–270 (2005).
57. Kern, H. L., Stuart, E. A., Hill, J. & Green, D. P. Assessing methods for generalizing experimental impact estimates to target populations. J. Res. Educ. Effect. 9, 103–127 (2016).
58. Yeager, D. S. et al. A national experiment reveals where a growth mindset improves achievement. Nature 573, 364–369 (2019).
59. Yeager, D. S., Krosnick, J. A., Visser, P. S., Holbrook, A. L. & Tahk, A. M. Moderation of classic social psychological effects by demographics in the U.S. adult population: new opportunities for theoretical advancement. J. Person. Soc. Psychol. 117, e84–e99 (2019).
60. Yeager, D. S. et al. Teacher mindsets help explain where a growth-mindset intervention does and doesn't work. Psychol. Sci. 33, 18–32 (2022).
61. Tipton, E. Y. et al. Sample selection in randomized experiments: a new method using propensity score stratified sampling. J. Res. Educ. Effect. 7, 114–135 (2014).
62. Rudolph, K. E. et al. Composition or context: using transportability to understand drivers of site differences in a large-scale housing experiment. Epidemiology 29, 199–206 (2018).
63. Miguel, E. & Kremer, M. Worms: identifying impacts on education and health in the presence of treatment externalities. Econometrica 72, 159–217 (2004). An early field experiment in development economics that showed the impact of understanding spillover effects in economic experiments.
64. List, J. A., Momeni, F. & Zenou, Y. Are Measures of Early Education Programs Too Pessimistic? Evidence from a Large-Scale Field Experiment Working Paper (National Bureau of Economic Research, 2019).
65. Smith, A. An Inquiry Into the Nature and Causes of the Wealth of Nations (A. Strahan & T. Cadell, 1776).
66. Rabb, N. et al. Evidence from a statewide vaccination RCT shows the limits of nudges. Nature 604, E1–E7 (2022).
67. Heller, S. B. et al. Thinking, fast and slow? Some field experiments to reduce crime and dropout in Chicago. Q. J. Econ. 132, 1–54 (2017).
68. Bhatt, M. P., Guryan, J., Ludwig, J. & Shah, A. K. Scope Challenges to Social Impact Working Paper 28406 (National Bureau of Economic Research, 2021).
69. Bettinger, E. P., Long, B. T., Oreopoulos, P. & Sanbonmatsu, L. The role of application assistance and information in college decisions: results from the H&R Block FAFSA experiment. Q. J. Econ. 127, 1205–1242 (2012).
70. Bird, K. A. et al. Nudging at scale: experimental evidence from FAFSA completion campaigns. J. Econ. Behav. Organ. 183, 105–128 (2021).
71. Bryan, C. J., Tipton, E. & Yeager, D. S. Behavioural science is unlikely to change the world without a heterogeneity revolution. Nat. Hum. Behav. 5, 980–989 (2021).
72. List, J. A. On the interpretation of giving in dictator games. J. Polit. Econ. 115, 482–493 (2007).
73. List, J. A. The behavioralist meets the market: measuring social preferences and reputation effects in actual transactions. J. Polit. Econ. 114, 1–37 (2006).
74. Walton, G. M. & Yeager, D. S. Seed and soil: psychological affordances in contexts help to explain where wise interventions succeed or fail. Curr. Dir. Psychol. Sci. 29, 219–226 (2020).
75. Szaszi, B. et al. No reason to expect large and consistent effects of nudge interventions. Proc. Natl Acad. Sci. USA 119, e2200732119 (2022).
76. Holland, P. W. Statistics and causal inference. J. Am. Stat. Assoc. 81, 945–960 (1986).
77. Deaton, A. & Cartwright, N. Understanding and misunderstanding randomized controlled trials. Soc. Sci. Med. 210, 2–21 (2018).
78. List, J. A., Pernaudet, J. & Suskind, D. L. Shifting parental beliefs about child development to foster parental investments and improve school readiness outcomes. Nat. Commun. 12, 5765 (2021).
79. List, J. A. & Shogren, J. F. Calibration of the difference between actual and hypothetical valuations in a field experiment. J. Econ. Behav. Organ. 37, 193–205 (1998).
80. Walton, G. M. et al. Where and with whom does a brief social-belonging intervention promote progress in college? Science 380, 499–505 (2023).

81. Blackwell, L. S., Trzesniewski, K. H. & Dweck, C. S. Implicit theories of intelligence predict achievement across an adolescent transition: a longitudinal study and an intervention. Child Dev. 78, 246–263 (2007).
82. Banerjee, A. V., Cole, S., Duflo, E. & Linden, L. Remedying education: evidence from two randomized experiments in India. Q. J. Econ. 122, 1235–1264 (2007).
83. Banerjee, A. et al. From proof of concept to scalable policies: challenges and solutions, with an application. J. Econ. Perspect. 31, 73–102 (2017).
84. Allcott, H. Site selection bias in program evaluation. Q. J. Econ. 130, 1117–1165 (2015).
85. Graham, J. R., Harvey, C. R. & Rajgopal, S. The economic implications of corporate financial reporting. J. Account. Econ. 40, 3–73 (2005).
86. Davies, R., Haldane, A. G., Nielsen, M. & Pezzini, S. Measuring the costs of short-termism. J. Financ. Stab. 12, 16–25 (2014).
87. Laverty, K. J. Economic "short-termism": the debate, the unresolved issues, and the implications for management practice and research. AMR 21, 825–860 (1996).
88. Marginson, D. & McAulay, L. Exploring the debate on short-termism: a theoretical and empirical analysis. Strateg. Manag. J. 29, 273–292 (2008).
89. Caplin, A. & Leahy, J. The social discount rate. J. Polit. Econ. 112, 1257–1268 (2004).
90. Stern, N. The Economics of Climate Change: The Stern Review (Cambridge Univ. Press, 2006).
91. Dasgupta, P. Discounting climate change. J. Risk Uncertain. 37, 141–169 (2008).
92. Weitzman, M. L. On modeling and interpreting the economics of catastrophic climate change. Rev. Econ. Stat. 91, 1–19 (2009).
93. Banerjee, A., Barnhardt, S. & Duflo, E. Can iron-fortified salt control anemia? Evidence from two experiments in rural Bihar. J. Dev. Econ. 133, 127–146 (2018).
94. Fryer, R. G., Levitt, S. D., List, J. A. & Samek, A. Towards an Understanding of What Works in Preschool Education Working Paper (Univ. Chicago, 2017).
95. Fryer, R. G. Jr, Levitt, S. D., List, J. A. & Samek, A. Introducing CogX: A New Preschool Education Program Combining Parent and Child Interventions Working Paper (National Bureau of Economic Research, 2020).
96. Charness, G., List, J. A., Rustichini, A., Samek, A. & Van De Ven, J. Theory of mind among disadvantaged children: evidence from a field experiment. J. Econ. Behav. Organ. 166, 174–194 (2019).
97. Andreoni, J. et al. Toward an understanding of the development of time preferences: evidence from field experiments. J. Publ. Econ. 177, 104039 (2019).
98. Andreoni, J., Di Girolamo, A., List, J. A., Mackevicius, C. & Samek, A. Risk preferences of children and adolescents in relation to gender, cognitive skills, soft skills, and executive functions. J. Econ. Behav. Organ. 179, 729–742 (2020).
99. Cappelen, A., List, J., Samek, A. & Tungodden, B. The effect of early-childhood education on social preferences. J. Polit. Econ. 128, 2739–2758 (2020).
100. Islam, A., List, J. A., Vlassopoulos, M. & Zenou, Y. Early Childhood Education, Parent Social Networks, and Child Development Working Paper (Univ. Chicago, 2023).
101. List, J. A. Field experiments: a bridge between lab and naturally-occurring data. BE J. Econ. Anal. Pol. 5(2), 1–47 (2007).
102. Hall, J. V., Horton, J. J. & Knoepfle, D. T. Pricing in Designed Markets: The Case of Ride-Sharing Working Paper (National Bureau of Economic Research, 2021).
103. Chandar, B., Gneezy, U., List, J. A. & Muir, I. The Drivers of Social Preferences: Evidence from a Nationwide Tipping Field Experiment Working Paper 26380 (National Bureau of Economic Research, 2019).
104. Acemoglu, D., Laibson, D. I. & List, J. A. Economics (Pearson, 2017).

Acknowledgements Many thanks to K. Milkman, A. Mobarak and D. Yeager for comments that markedly improved the message of this study. F. Fatchen and D. Franks provided research assistance.

Competing interests The author declares no competing interests.

Additional information Correspondence and requests for materials should be addressed to John A. List. Peer review information Nature thanks Katherine Milkman, Ahmed Mobarak and David Yeager for their contribution to the peer review of this work. Reprints and permissions information is available at http://www.nature.com/reprints.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. © Springer Nature Limited 2024


Article

Rapid spin changes around a magnetar fast radio burst
https://doi.org/10.1038/s41586-023-07012-5
Received: 19 May 2023
Accepted: 21 December 2023

Chin-Ping Hu1,2,16 ✉, Takuto Narita3, Teruaki Enoto2,3,16 ✉, George Younes4,16 ✉, Zorawar Wadiasingh4,5,6,16 ✉, Matthew G. Baring7,16, Wynn C. G. Ho8, Sebastien Guillot9, Paul S. Ray10, Tolga Güver11,12, Kaustubh Rajwade13, Zaven Arzoumanian4, Chryssa Kouveliotou14, Alice K. Harding15 & Keith C. Gendreau4

Published online: 14 February 2024

Magnetars are neutron stars with extremely high magnetic fields (≳10^14 gauss) that exhibit various X-ray phenomena such as sporadic subsecond bursts, long-term persistent flux enhancements and variable rotation-period derivative1,2. In 2020, a fast radio burst (FRB), akin to cosmological millisecond-duration radio bursts, was detected from the Galactic magnetar SGR 1935+2154 (refs. 3–5), confirming the long-suspected association between some FRBs and magnetars. However, the mechanism for FRB generation in magnetars remains unclear. Here we report the X-ray observation of two glitches in SGR 1935+2154 within a time interval of approximately nine hours, bracketing an FRB that occurred on 14 October 2022 (refs. 6,7). Each glitch involved a significant increase in the magnetar's spin frequency, being among the largest abrupt changes in neutron-star rotation8–10 observed so far. Between the glitches, the magnetar exhibited a rapid spin-down phase, accompanied by an increase and subsequent decline in its persistent X-ray emission and burst rate. We postulate that a strong, ephemeral, magnetospheric wind11 provides the torque that rapidly slows the star's rotation. The trigger for the first glitch couples the star's crust to its magnetosphere, enhances the various X-ray signals and spawns the wind that alters magnetospheric conditions that might produce the FRB.
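Field strengths such as the ~4 × 10^14 G quoted below for SGR 1935+2154 come from the standard vacuum-dipole spin-down estimate, B_polar ≈ 6.4 × 10^19 √(P Ṗ) gauss, with P the spin period in seconds and Ṗ the period derivative. The sketch below is a worked illustration of that textbook formula, not a calculation from the paper; the Ṗ value used is an assumed, illustrative number.

```python
# Minimal sketch of the standard dipole spin-down estimate (illustrative only).

import math

def polar_dipole_field(period_s: float, period_derivative: float) -> float:
    """Polar surface dipole field in gauss from B_polar ~ 6.4e19 * sqrt(P * Pdot)."""
    return 6.4e19 * math.sqrt(period_s * period_derivative)

if __name__ == "__main__":
    P = 3.25        # spin period of SGR 1935+2154 (s), as quoted in the text
    Pdot = 1.4e-11  # assumed long-term period derivative (s s^-1); illustrative, not from this article
    print(f"B_polar ~ {polar_dipole_field(P, Pdot):.1e} G")  # roughly 4e14 G
```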

SGR 1935+2154, the most active Galactic magnetar of the past decade with a spin period of 3.25 s and a dipole (polar) magnetic-field strength of 4 × 10^14 G (ref. 12), has shown several major X-ray and radio outbursts13,14. The most recent outburst episode of SGR 1935+2154 occurred in October 2022 and lasted several days, during which it emitted hundreds of short X-ray bursts15–17. At Coordinated Universal Time (UTC) 19:21:47 (topocentric time of the Canadian Hydrogen Intensity Mapping Experiment, hereafter CHIME) on 14 October 2022, this magnetar emitted a fast radio burst (FRB) with multiple radio peaks measured with CHIME and the Robert C. Byrd Green Bank Telescope (GBT)6,7. Alerted before this activity15, we initiated a series of Neutron Star Interior Composition Explorer (NICER) observations with 75-ks exposure and Nuclear Spectroscopic Telescope Array (NuSTAR) X-ray observations with 96-ks exposure from all scans starting from 17:32:40 UTC on 12 October until 6 November 2022 (Extended Data Table 1). Hereafter, we use the elapsed time t since the FRB detection at barycentric-corrected modified Julian date (MJD) 59,866.80817034 at infinite frequency. Our high-cadence X-ray observations achieved

67% temporal coverage bracketing the FRB, −17 h 1 1 (green circles) and


Fig. 3 | DA mediates retrospective reinforcement of freely moving behaviour. a,b, ChR2-dependent reinforcement decreases inter-action intervals for action A (a; n = 15 ChR2–YFP) and B (b; n = 13 of 15 ChR2–YFP). n = 10 (YFP). n values indicate animals (biological replicates). For a and b, data are mean ± s.e.m. Significant differences across time and ChR2–YFP/YFP are indicated (mixed-effects model: action A: F3,69 = 72.26, P = 3.0 × 10^−21; action B: F3,62 = 33.78, P = 4.6 × 10^−13). For b, post hoc two-tailed Tukey's test was applied for multiple-comparison analysis of the data shown in a. c–e, The distribution of action dynamic types (n = 464 (non-target actions), 15 (target actions) and 15 (ChR2–YFP mice)) according to target similarity and median time to target (c), target similarity (d) and median time to target (e). For d and e, the violin plots show the median and quartiles. Statistical analysis was performed using two-tailed permutation tests; Bonferroni-adjusted P values are shown. f,g, Multinomial logistic regression of all factor combinations in real data (200 independent models) versus shuffled data (10,000 independent models, 50 independently shuffled datasets). Baseline, 200 independent models. f, The two-factor model fits data better than one-factor models. Groups differ across combinations (repeated-measures two-way ANOVA; F2,30,594 = 1,082, P = 0.0 × 10^0). Two-tailed post hoc Dunnett multiple-comparisons test was applied. Data are mean ± s.d. g, The performance of the double-factor regression model was determined using the area under the precision-recall curve criterion. Statistical analysis was performed using two-tailed permutation tests; Bonferroni-adjusted P values are shown (P = 5.9 × 10^−4, all comparisons). Data are mean ± s.e.m. h, The pipeline for identifying sliding-window-enriched action transitions. D→E, arbitrary action transition. i, ChR2-dependent reinforcement for action A increases sliding-window-enriched action transitions before and during stimulation. The average normalized frequency of action transitions enriched within specific sliding windows was plotted over sessions 1–3. Top, the percentage of transitions occurring during stimulation in each sliding window. Middle and bottom, the mean ± s.e.m. normalized frequency of action transitions. j, Quantification of i. Data are mean ± s.e.m. Significant differences across time and retrospective/forward reinforcement directions are indicated (mixed-effects modelling; ChR2–YFP session 1: F6,168 = 114.8, P = 8.7 × 10^−57; ChR2–YFP session 3: F6,168 = 46.62, P = 2.5 × 10^−33; YFP session 1: F6,108 = 10.52, P = 3.6 × 10^−9; YFP session 3: F6,168 = 0.8992, P = 0.49). Post hoc two-sided Šidák multiple-comparison test was applied. ****P < 0.0001, ***P < 0.001, **P < 0.01, *P < 0.05; NS, not significant. Statistical and sample details are provided in the Supplementary Information.

that occurred leading up to, during and after DA stimulation (Methods). We identified baseline-occurring action transitions enriched within specific sliding windows centred around the target action and tracked their average frequencies per window over the course of closed-loop reinforcement. Action transitions that were enriched in windows up to 1.2 s before stimulation onset, as well as during stimulation, were reinforced early on (Fig. 3i). However, action transitions that were enriched in windows after stimulation were not reinforced, suggesting an asymmetric process. Indeed, action transitions enriched in windows leading into stimulation were also preferentially reinforced over those enriched



Fig. 4 | The relationship between pre-reinforcement inter-action intervals and learning of a two-action sequence. a, Experiment schematic. b, ChR2-dependent increase in T1→T2 triggers (no laser during open field/baseline). c, Open-field inter-action intervals of the T1/T2 pairs chosen. The same colour codes are used for ChR2–YFP mice in d–e and g–l. d, Individual learning curves labelled according to the colours in c. For c, d and h, a log2-scale x axis was used. e, Frequency changes over conditions. Statistical analysis was performed using repeated-measures one-way ANOVA (F1.911,24.85 = 51.02, P = 2.2 × 10^−7). f,g, Extinction of the T1→T2 sequence (ChR2–YFP). f, Data are mean (black line) ± s.e.m. (orange shading), and individuals (grey lines). g, Frequency changes over extinction conditions. Statistical analysis was performed using repeated-measures one-way ANOVA (F1.073,12.87 = 52.96, P = 9.8 × 10^−6). h,i, ChR2-dependent decrease in T1→T2 intervals in ChR2–YFP (h) and YFP (i) mice. Statistical analysis was performed using repeated-measures one-way ANOVA (F1.377,17.90 = 35.95, P = 1.5 × 10^−5) (i). The log2-scale y axis was used to help to visualize interval changes in animals starting with lower initial values. j, T2:T1 frequency ratios (ChR2–YFP). k, Target refinement shown by median target-action-normalized frequencies of related actions. Statistical analysis was performed using repeated-measures one-way ANOVA (T1: F1.237,16.08 = 43.38; T2: F1.171,15.22 = 48.74; both: P = 4.4 × 10^−6). l, A sigmoidal relationship between the open-field T1→T2 interval and the number of sessions to the criterion frequency. The log10-scale y axis was used to capture the relationship between large values and smaller values in the same visualization. For b–l, n = 15 (b, d and h), 14 (e and i–l) or 13 (f and g) ChR2–YFP mice and 6 YFP mice (biological replicates). For e, g, i and k, statistical analysis was performed using repeated-measures one-way ANOVA with post hoc Šidák test. For b and d–l, plots of individual mice are shown. ****P < 0.0001, **P < 0.01, *P < 0.05; NS, not significant.

in windows after stimulation (Fig. 3j). Thus, DA stimulation promotes the reinforcement of behaviours occurring during stimulation and a few seconds before stimulation.
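To make the sliding-window logic described above concrete, the sketch below shows one possible way to count how often a given action transition (for example, the arbitrary D→E of Fig. 3h) falls in windows at different lags relative to stimulation onset and to normalize the counts against a baseline rate. This is a schematic under assumptions (window width, toy data and the normalization are invented), not the authors' pipeline.

```python
# Schematic sketch of sliding-window transition enrichment around stimulation onsets.
# Window width, lags and toy data are assumptions for illustration only.

from typing import List, Tuple

def window_enrichment(transition_times: List[float],
                      stim_onsets: List[float],
                      lags: List[float],
                      width: float = 0.4,
                      baseline_rate: float = 1.0) -> List[Tuple[float, float]]:
    """For each lag (s, relative to stimulation onset) return (lag, normalized count)."""
    out = []
    for lag in lags:
        count = 0
        for onset in stim_onsets:
            lo, hi = onset + lag - width / 2, onset + lag + width / 2
            count += sum(lo <= t < hi for t in transition_times)
        # rate per stimulation and per second, normalized by a baseline transition rate
        rate = count / (len(stim_onsets) * width)
        out.append((lag, rate / baseline_rate))
    return out

if __name__ == "__main__":
    stim = [10.0, 25.0, 42.0]              # stimulation onsets (s), toy data
    d_to_e = [9.1, 9.6, 24.3, 41.5, 50.0]  # occurrence times of one transition type (s), toy data
    for lag, score in window_enrichment(d_to_e, stim, lags=[-1.2, -0.6, 0.0, 0.6, 1.2]):
        print(f"lag {lag:+.1f} s: enrichment {score:.2f}")
```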

Credit assignment for action sequences

In the real world, when animals are spontaneously shifting between actions in their repertoire, outcomes are often not the result of a single action but, rather, are the result of a sequence of actions performed at variable time intervals, and with other actions interleaved. We therefore investigated the dynamics of reinforcement when the release of DA was contingent on performance of a sequence of two target actions, target actions 1 (T1) and 2 (T2), whereby variations in the time interval between the two target actions, as well as interleaving actions, were allowed. We applied closed-loop optogenetics to examine whether naive animals can learn a T1→T2 reinforcement rule, whereby the delays between T1 and T2 are governed by the spontaneous behaviour of the animals and are not experimentally controlled (n = 15 (ChR2–YFP) and n = 10 (YFP) mice; Fig. 4a and Extended Data Figs. 2b, 3a,d,e and 10–14). Various T1–T2 pairs were sampled, with a focus on sequences sharing general commonalities in movement order across animals (Methods and Extended Data Fig. 1d,f,g). Overall, the mice learned to increasingly perform these sequences to obtain DA stimulation. Some animals showed a ChR2-dependent increase in reinforcement within 5 sessions, but others experienced a lag in learning (Fig. 4b). We hypothesized that this relates to the initial time distance between the T2 trigger and

the closest distal T1 (T1→T2 interval). Indeed, animals reinforced for action pairs with initially long interval values tended to learn more slowly (Fig. 4c,d). To capture a timepoint at which individuals reach a similar rising phase in their respective learning curves, a criterion frequency was set (Methods). In total, 14 out of 15 trained mice eventually reached the criterion (Fig. 4e and Extended Data Fig. 10). Sequence performance depended on continuing DA pairings (Fig. 4f,g). Learning was also revealed by decreases in the median T1→T2 time intervals (Fig. 4h,i) and convergence of the T1-to-T2 frequency ratio towards 1 (Fig. 4j). To quantify the specific credit assignment of T1 and T2, we used a refinement index that compares the median frequency of actions that are uniquely similar to T1 with those that are uniquely similar to T2, with the frequencies normalized to either that of T1 or T2 (Methods). This index is based on the observation that actions that are most similar to the target action decrease in relative performance over time (Fig. 1k (inset)). Values of less than 1 indicate greater refinement. By the end of learning, T1 and T2 became credited as the reward-producing actions relative to their similar counterparts (Fig. 4k). YFP controls did not show learning trends (Fig. 4d,e,h,i). Thus, closed-loop reinforcement promoted the learning of a two-action sequence rule in freely moving mice starting from a naive state.

Importantly, the initial median T1→T2 interval of action pairs was inversely related to the eventual number of sessions required for each ChR2–YFP animal to reach the criterion frequency (Fig. 4l). A sigmoidal curve was fit to the data, indicating that initial intervals that were longer than the sigmoidal midpoint were associated with slower learning (Fig. 4l). ChR2–YFP animals were divided according to the half-maximum point of the sigmoidal curve into 'fast learners' and 'slow learners'. Fast learners quickly reached the criterion frequency and low T1→T2 time intervals, whereas slow learners were delayed in reaching the criterion frequency and low T1→T2 intervals. Slow learners tended to suddenly increase sequence frequency in sessions that showed a decrease in the median T1→T2 interval to below 2–4 s (Fig. 4d,h). By contrast, there was no stable sigmoidal relationship between T1–T2 action similarities and the number of sessions to the criterion frequency (Extended Data Fig. 11b). Furthermore, there was no relationship between the baseline frequency or initial inter-trigger intervals and the number of sessions to the criterion frequency (Extended Data Fig. 11c,d). Importantly, the observed patterns held when we analysed learning by matching the number of reinforcements (Extended Data Fig. 11e–g), indicating that they were not caused by fast learners having more stimulations/reinforcers.

Lastly, we evaluated whether differential conditionability40–43 accounts for sequence learning differences (Extended Data Fig. 12). Target actions that showed less conditionability in single-action reinforcement did not differ in initial baseline frequencies but tended to have more action types transitioning into and out of them at the baseline (Extended Data Fig. 12d–f). Thus, differential conditionability among target actions relates to greater variation in the behavioural environment surrounding the target action. However, the same parameters do not account for variation in the learning rate across animals in the action sequence reinforcement experiment (Extended Data Fig. 12g–l).
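The sigmoidal relationship reported above can be expressed as a simple curve fit. The sketch below uses toy per-animal values (not the study's dataset) and an assumed four-parameter logistic to relate each animal's open-field median T1→T2 interval to its sessions-to-criterion, then splits animals at the fitted midpoint into 'fast' and 'slow' learners.

```python
# Hedged sketch: sigmoid fit of sessions-to-criterion versus initial T1->T2 interval.
# Data values and the logistic parameterization are invented for illustration.

import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, bottom, top, midpoint, slope):
    """Four-parameter logistic curve."""
    return bottom + (top - bottom) / (1.0 + np.exp(-(x - midpoint) / slope))

# toy per-animal values: (open-field median T1->T2 interval in s, sessions to criterion)
intervals = np.array([1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.5, 8.0, 10.0])
sessions = np.array([3, 3, 4, 4, 6, 12, 20, 24, 26], dtype=float)

params, _ = curve_fit(sigmoid, intervals, sessions, p0=[3.0, 25.0, 5.0, 1.0], maxfev=10000)
midpoint = params[2]  # interval at which the fitted curve reaches its half-maximum

fast = intervals < midpoint
print(f"fitted midpoint ~ {midpoint:.1f} s; "
      f"{fast.sum()} fast learners, {(~fast).sum()} slow learners")
```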
These results support the idea that the initial median time distances between distal action T1 and proximal action T2 (which produced DA stimulation) modulated how fast animals learned to effectively perform the reinforced action sequence. If DA retrospectively reinforces actions performed earlier in time, the action most proximal to reinforcement, T2, should experience earlier refinement relative to the distal action, T1. We again used the median target-normalized frequencies of actions uniquely related to T1 or T2 as refinement indices (Methods). T2 clearly refines towards its most refined level earlier than T1, at least in some of the animals (Fig. 5a). We calculated differential refinement between the two actions by subtracting the area under the T1 refinement curve from that of T2. Positive values indicate differential refinement favouring T2, and

vice versa. The open-field median T1→T2 interval was linearly related to the differential refinement between T1 and T2 (Fig. 5b). This trend holds even when accounting for within-session refinement (Methods and Extended Data Fig. 13a). Thus, for longer T1→T2 median intervals, T2 spends more sessions being relatively more refined than T1, and this pattern cannot be explained by other potential covariates: (1) initial intervals between the proximal action and the next initiation of sequence (T2→T1); and (2) similarity between T1 and T2 (Fig. 5b (right) and Extended Data Fig. 13b). The increased differential refinement favouring the proximal T2 could reflect increased refinement of T2 or reflect reduced refinement of distal T1 without refinement for T2. To distinguish between these interpretations, we analysed changes in T1–T2 refinement curves relative to the ‘starting points’ at which the refinement indices of T1 and T2 are most similar or are biased towards T1 rather than T2 (Methods). Slow learners initially showed differential refinement favouring T2 from these starting points and, after reaching a maximum differential refinement favouring T2 (called the turning point), refinement begins to turn towards favouring T1 (Fig. 5c). By the turning points, the median intervals of T1→T2, but not T2→T1, events decreased significantly relative to the initial values (Fig. 5d and Extended Data Fig. 13c). Thus, the median T1→T2 interval decrease occurred before a decrease in the interval to perform the next sequence (T2→T1) (Fig. 5e). Using these learning landmarks, we investigated more rigorously how animals homed in on T1 versus T2 over time (Fig. 5f,g). Animals initially refined the action proximal to stimulation (T2), whereas T1 refinement occurred later, after the turning point (Fig. 5f,g). By contrast, fast learners show relatively little differential refinement over learning (Fig. 5c and Extended Data Fig. 14b,c). Finer temporal analyses revealed that T1 was increasingly likely within the seconds preceding T2 reinforcement events by the turning point (Fig. 5h,i and Methods), even though T1 refinement was not yet apparent (Fig. 5f). After the turning point, T1 refinement and increased sequence performance coincide with T1 becoming significantly more probable within seconds after T2 reinforcement events (Fig. 5h,i), indicating increased sequence reinitiation. These results demonstrate how animals can assign credit to sequences of temporally distant target actions that lead to reinforcement, following retrospective dynamics predicted by single-action credit assignment. Specifically, actions that are most proximal to reinforcement are refined early on and the actions that are more distal to reinforcement become refined later, when they probabilistically start to occur within a few seconds of DA release.
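The differential-refinement measure described above can be summarized in a few lines: integrate each target's refinement-index trajectory across sessions and take the difference. The sketch below uses invented session-by-session values, and its sign convention (lower index = more refined, so a positive difference favours T2) is one plausible reading of the text rather than the authors' exact definition.

```python
# Minimal sketch of differential refinement as a difference of areas under
# the T1 and T2 refinement-index curves. Values and sign convention are assumptions.

import numpy as np

def differential_refinement(refine_t1: np.ndarray, refine_t2: np.ndarray) -> float:
    """Signed difference of areas under the refinement curves across sessions.
    Here lower index values mean more refined, so a positive result indicates
    that T2 was the more refined action over learning (an assumed convention)."""
    sessions = np.arange(len(refine_t1))
    auc_t1 = np.trapz(refine_t1, sessions)
    auc_t2 = np.trapz(refine_t2, sessions)
    return auc_t1 - auc_t2

if __name__ == "__main__":
    # toy refinement indices over six sessions (values < 1 indicate greater refinement)
    t1 = np.array([1.0, 1.0, 0.9, 0.8, 0.6, 0.5])    # distal action refines late
    t2 = np.array([1.0, 0.7, 0.5, 0.45, 0.4, 0.4])   # proximal action refines early
    print(f"differential refinement (T2 favoured if > 0): {differential_refinement(t1, t2):.2f}")
```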

Discussion

Our results show that DA promotes credit assignment to specific actions and action sequences from a naive state through a dynamic process whereby the entire behavioural repertoire is restructured and refined. During initial reinforcements, there is a rapid increase in the frequency of not only the target action, but also of actions in the repertoire that are similar to the target action. However, dissimilar actions decrease in frequency. This rapid restructuring of the entire behavioural repertoire based on similarity to the target action facilitates the credit-assignment process. There is also an increase in actions that occur within a precise time window of a few seconds before and during, but not after, VTA DA neuron stimulation. With repeated reinforcement, gradual refinement unfolds to home in on the action that produces DA release. In the case of action sequences, both target actions in the sequence gradually become credited relative to their most similar actions. However, there is an interaction between the dynamics of refinement of the different target actions in the sequence and the temporal proximity to DA release. When sequences naturally varying in the temporal separation between the two targets were reinforced, sequences with a naturally short temporal distance between the two targets tended to refine together. However, credit assignment for sequences with naturally long


Fig. 5 | The behavioural process underlying the learning of a two-action sequence. a, T1/T2 refinements in two ChR2–YFP individuals. b, The linear relationship between the initial T1→T2 interval and differential T1–T2 refinement (F test, non-zero slope significance: T1→T2, P = 0.0004; T2→T1, P = 0.7063). c, Progression of differential T1–T2 refinement from the starting point in individual learners. d, The T1→T2 interval significantly decreased by the turning point in slow learners. Repeated-measures two-way ANOVA was used to analyse the time-specific difference (slow learners, F2.184,26.20 = 54.21, P = 5.3 × 10^−10; fast learners, F1.700,20.40 = 92.12, P = 6.3 × 10^−9). Post hoc two-tailed Tukey's multiple-comparison test was applied. e, The odds ratio of T1→T2/T2→T1 interval changes in slow learners. Statistical analysis was performed using two-tailed paired Wilcoxon tests (P = 0.0312). f, Preferential refinement of T2 relative to T1 by the turning point in slow learners. Raw scaled refinement indices are shown. A repeated-measures mixed-effects model was used to analyse the significant main effects (time: F2.184,26.20 = 54.21, P
