Some Complaints About Lepore's If Then, Which I Otherwise Enjoyed

I had initially set out to write a full set of thoughts on Jill Lepore’s If Then, but it is a book which touches on so many topics that it is difficult to produce the sort of short-form capsule review I like to write. Ostensibly a book about the early data science firm Simulmatics Corporation, it is also a book about American political history from the 1950s to the 1970s, about the end of older styles of politics based on dignified speeches and the rise of advertising, TV, and polling, about the moral compromise of mid-century American liberalism in the Vietnam War, about the Cambridge Analytica scandal, and about contemporary anxieties surrounding political data science.

Some reviewers have complained about the extent to which the book is not solely about the Simulmatics Corporation. They have a point. I had bought the book expecting a history of the firm, its methods, their relationship to today’s methods, what it actually pioneered, and what impacts it had. To be sure, the book includes some of this. Much of the additional history and context serves a clear purpose, which is to contextualise the firm in the culture and politics of its time. But if one stripped the 328 pages down to the core sections on Simulmatics, a rather larger chunk than is probably reasonable would be removed. More so still if much of Lepore’s needless exposition were also removed.

It suffers in some ways from a problem familiar to readers of philosophical biographies. We pick up these books because of our interest in their subject’s intellectual contributions. But these contributions are often dense and technical in nature, and need a talented author to present them in an understandable way. Some books therefore fail as secondary readings because they are too dense and poorly explained for a lay reader to understand. Others fail in the opposite direction: they focus too much on context, too much on biographical details we would not be interested in were we not interested in the subject’s body of work. If Then perhaps suffers from this second flaw.

This is not to say that much of the additional context is uninteresting or irrelevant: only that too much time is spent on it. One interesting fact covered briefly is that in response to the speed at which TV can deliver news, newspaper reporting shifted from being descriptive to interpretative. Lepore is enthusiastic about this change (“better than it had ever been before”), and sees it as a turn away from deference to the government and towards a more adversarial and critical relationship. If it is possible to judge solely from If Then, then even without this explicit praise one might say that Lepore has consciously modelled her writing on this tradition.

I say so because Lepore has approached the history of the Simulmatics Corporation with a determination to take a stance. It is, in my view, to the detriment of the book. The book’s great strength and interest is Lepore’s keen eye for an interesting historical topic neglected by others, but she combines it with a surprising incuriosity about many aspects of the story she is telling us. Her interest is primarily in making the story of Simulmatics a story of the present, regardless of how tenuous the connection sometimes is. She does not really tell us what the methods of Simulmatics were, or whether and how they might have understood the terms ‘simulation’ and ‘prediction’ differently to how we do today, or, if their understanding was shared, how different their methods were. She is in fact a determined sceptic of quantitative modelling of political behaviour, and summarises her position in the epilogue thus:

But the study of the human condition is not the same as the study of the spread of viruses and the density of clouds and the movement of the stars. Human nature does not follow laws like the law of gravity, and to believe that it does is to take an oath to a new religion […] The profit-motivated collection and use of data about human behaviour, unregulated by any government body, has wreaked havoc on human societies, especially on the spheres in which Simulmatics engaged: politics, advertising, journalism, counterinsurgency, and race relations. Its rise marked the near abandonment of humanistic knowledge.

A person who asserts that human behaviour follows gravity-like laws is of course a fool. You would, however, be hard-pressed to find someone who believed it. Serious practitioners of quantitative political science – of the wider quantitative social and behavioural sciences – do not believe this. We make use of statistical models precisely because human behaviour is so variable. In causal modelling we speak of the average effect of some X on Y (such as education on vote choice), precisely because we know that’s all we can speak of. We know that our results may not generalise to other contexts, whether over space (e.g. to another country) or time (the same country several years before or after), and so we emphasise the need for comparative studies and for tests of generalisation.
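To make the point about average effects concrete, here is a minimal simulation sketch. Everything in it is an assumption of mine for illustration: the effect sizes, the sample size, and above all the randomly assigned 'treatment', which is a luxury observational questions such as education and vote choice do not offer. It shows that individual effects can vary enormously, with a large share even pointing the wrong way, while the average effect remains well estimated.

```python
import random
import statistics

random.seed(1)

n = 20000
# Hypothetical individual-level effects of X on Y: they vary widely by person,
# and a sizeable share are negative even though the average is positive.
effects = [random.gauss(0.5, 2.0) for _ in range(n)]
baseline = [random.gauss(0.0, 1.0) for _ in range(n)]

# Assumption: X is assigned at random, as in an experiment.
treated = [random.random() < 0.5 for _ in range(n)]
outcome = [b + (e if t else 0.0) for b, e, t in zip(baseline, effects, treated)]

# Difference in group means estimates the average effect only.
mean_treated = statistics.fmean(y for y, t in zip(outcome, treated) if t)
mean_control = statistics.fmean(y for y, t in zip(outcome, treated) if not t)
estimated_average_effect = mean_treated - mean_control

true_average_effect = statistics.fmean(effects)
share_negative = sum(e < 0 for e in effects) / n

print(f"estimated average effect: {estimated_average_effect:.2f}")
print(f"true average effect:      {true_average_effect:.2f}")
print(f"share with negative individual effect: {share_negative:.0%}")
```

The estimate says nothing about any one person's effect, which is exactly the sense in which the average is all we can speak of.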

In case it seems as though I am exaggerating Lepore’s position from a single quote, let me provide two more on this point. Here is another from before the epilogue, but still relatively late in the book:

Simulmatics died. The fantasy of predicting human behaviour by way of machines did not.

Or here is another from earlier on in the book, wherein Lepore makes a surprising and largely unwarranted extrapolation from the founding statement of the company:

“The Company proposes to engage principally in estimating probable human behaviour by the use of computer technology.” The company proposes to predict the future.

The confusion, I think, lies in the word ‘predict’. In colloquial usage, predict is indeed often used to mean forecast. But in data science and in quantitative political science, to predict something is to estimate an unknown quantity. There is nothing necessarily temporal about this. Consider one type of contemporary predictive model widely used by polling companies: multilevel regression with poststratification, or MRP. MRP is a predictive model which takes in a nationally representative survey, and uses it to generate estimates of vote share in smaller geographies such as Westminster constituencies or US states[1]. MRP has become prominent in polling practice because it works well for predicting these small area outcomes. An important feature of what makes it work is that it does not need to make accurate predictions at the individual level. What it needs is predictions whose errors cancel one another out in aggregate[2].
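The idea that errors cancel out in aggregate can be sketched in a few lines. This is only the poststratification half of MRP, and a toy version of it: the cell names, support probabilities, and census counts are all hypothetical, and I simply assume the cell-level probabilities are what a fitted multilevel model would have given us. Votes are then simulated from those same probabilities, so the model is correct by construction.

```python
import random

random.seed(0)

# Hypothetical cell-level support probabilities, standing in for the output
# of a fitted multilevel regression (the "MR" step is not shown here).
cell_support = {
    "young_urban": 0.62,
    "young_rural": 0.48,
    "old_urban": 0.45,
    "old_rural": 0.35,
}

# Hypothetical census counts of each cell in one small area.
area_counts = {
    "young_urban": 4000,
    "young_rural": 1000,
    "old_urban": 3000,
    "old_rural": 2000,
}


def poststratify(support, counts):
    """Weight each cell's predicted support by its share of the area population."""
    total = sum(counts.values())
    return sum(support[cell] * counts[cell] for cell in counts) / total


# Simulate individual votes and score the best possible individual-level guess.
votes_for, correct, n = 0, 0, 0
for cell, count in area_counts.items():
    p = cell_support[cell]
    guess = p > 0.5  # predict each person's vote from their cell alone
    for _ in range(count):
        vote = random.random() < p
        votes_for += vote
        correct += vote == guess
        n += 1

estimate = poststratify(cell_support, area_counts)
actual = votes_for / n
individual_accuracy = correct / n

print(f"poststratified estimate of vote share: {estimate:.3f}")
print(f"simulated actual vote share:           {actual:.3f}")
print(f"individual-level accuracy:             {individual_accuracy:.3f}")
```

Under these assumptions the individual guesses are only modestly better than a coin flip, yet the area-level estimate lands very close to the simulated result, because the individual misses in each direction offset one another.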

Now, Simulmatics predates both the credible causal revolution in the social sciences and statistical modelling and the widespread use of MRP by pollsters, so a reader of Lepore might challenge me that I should focus specifically on the beliefs and methods of the Simulmatics firm. I do not know in which way they might have understood the word ‘predict’, though this is in part because Lepore has not told me. I have two responses. First, Lepore herself invites these arguments by making blanket assertions about the ‘fantasy of predicting human behaviour’ or extrapolating ‘predict the future’ from ‘estimating probable human behaviour’. She invites it by drawing a direct line from Simulmatics to Cambridge Analytica. She invites it by seeing fit to condemn prediction of human behaviour as a fool’s errand in itself, rather than merely critiquing the methods and ethics of a particular firm in a particular moment in history.

My second response, however, is that having read If Then I don’t really know what the methods of Simulmatics were. I am possibly being too demanding here, in that I am a trained political methodologist and data scientist, and what is appropriate for me is not necessarily appropriate for a lay reader. But I do think that, at the very least, a book about a firm like Simulmatics ought to explain in some detail what they were actually doing and what methods they were using, rather than focusing solely on the controversial things they were involved in at a high level and the relatively small ways in which they touched big moments in American history.

We know that an initial step was the collection of about one hundred thousand survey responses[3], followed by the segmentation of survey respondents into 480 voter types based on simple combinations of demographics (e.g. Midwestern, rural, Protestant, lower income, female). We know that responses to 50 or so issue questions were tallied up, as were election returns from the years in which the surveys were conducted (1952, 1954, 1956, 1958). We don’t know, from If Then, how this data was actually analysed, or what the firm took ‘computer simulation’ to mean as compared to how we understand it today.

We do know that based on analysis of historical data, the firm’s first report to the Democratic [Party] Advisory Council argued black voters had been essential to the party’s coalition under FDR. They had subsequently been lost by the party because of its weaker stance on civil rights than the GOP’s. Simulmatics pointed out that just 8 states with high African American voter turnout accounted for 210 electoral college votes out of 537, and that any route to the presidency required these voters. Now, from this, I think I can infer that what the ‘use of computer technology’ actually entailed here is rather rudimentary descriptive statistics. I’m not against this: it’s a sensible way to analyse an electorate. But advanced prediction as we understand the notion today it is not.
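To underline how simple the arithmetic behind that argument is, here is a short sketch using only the two figures quoted above. The one thing I add is the rule that a candidate needs a simple majority of electoral votes to win.

```python
# Figures as quoted from the Simulmatics report via If Then.
TOTAL_ELECTORAL_VOTES = 537
VOTES_IN_EIGHT_STATES = 210

# A candidate needs a simple majority of electoral votes.
majority_needed = TOTAL_ELECTORAL_VOTES // 2 + 1

share_of_total = VOTES_IN_EIGHT_STATES / TOTAL_ELECTORAL_VOTES
share_of_majority = VOTES_IN_EIGHT_STATES / majority_needed

print(f"majority needed: {majority_needed}")
print(f"eight states as a share of all electoral votes: {share_of_total:.1%}")
print(f"eight states as a share of a winning majority:  {share_of_majority:.1%}")
```

On these figures the eight states hold roughly two fifths of all electoral votes and over three quarters of what is needed to win, which is the whole of the argument and requires no simulation to make.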

Given the liberal commitments of the Simulmatics staff (while the recommendation was based on empirics, it also reflected their politics at the time), it is disappointing that towards the end of the book Lepore saw fit to write the following:

By 1965, a century after the end of the Civil War, a century of lynchings and bombings and beatings, civil rights had been won, voting rights had been guaranteed. But the closer African Americans had come to being able to vote, the more furiously political consultants had labored to divide and segment the electorate by ideology and race.

The implication is that including a race variable in a public opinion dataset – which enables us to understand if and how voters of different races vote as compared to each other, which enabled Simulmatics to make an empirical case for the Democratic party to stop ignoring black voters – is the same as segregation, the same as lynching, the same as systematic discrimination. The comparison is as offensive as it is unwarranted. It is doubly strange to choose that comparison when other parts of Simulmatics’ work really were deserving of being characterised as racist.

At the point in the book where she makes said comparison, we have already encountered much from the firm that is ethically dubious. The decision of the firm to perform research in Vietnam to support the war effort, and the research it performed there, especially the way in which it performed that research, is one such example. The firm conducted fieldwork (opinion surveys) which, from the start, was bedevilled by astounding methodological and ethical flaws. The men of Simulmatics arrived in military jeeps which probably made the Vietnamese they surveyed suspect them of being CIA, and hence likely led to false responses. Incredibly (or perhaps not), they failed to consult Vietnamese interpreters on the contents of their surveys, and interpreted ambiguous responses such as “we wish for peace” in the way that best suited them. They treated their interpreters awfully. They, in other words, committed almost every methodological and ethical sin in the rulebook of fieldwork and surveys.

While the results of these surveys were compiled into computers, I am not sure what analysis, if any, was really performed on them. Much of the time the consensus seems to have been that they were useless, and it’s surprising that Simulmatics was funded for the work for so long. Lepore rightly criticises the role of Simulmatics in the war, but an interesting discussion absent from the book is the question of how a scientist ought to think about providing support to an unjust government in a democracy. There is a rich literature on this question, and indeed more broadly on the question of whether or not scientists should be neutral. Du Bois, for instance, argued that science ought to be value-neutral, so as to enable trust in science, and so that those best placed to make use of scientific information could do so[4]. While Ithiel de Sola Pool (one of Simulmatics’ leaders and perhaps its most prominent political scientist) was a supporter of the war and later one of the many former Trotskyists turned Neocon, many of the more junior staff quickly turned against it.

In a democratic society, how should a scientist navigate their obligations to democracy as Du Bois characterises them in a situation such as this? I don’t think Lepore should have an answer, or change her view. I do think the question ought to have been raised and left open, so that the reader is encouraged to confront it for themselves. One might add that, as Lepore tells us, the RAND corporation also performed fieldwork of a similar nature in Vietnam – albeit more competently – and concluded that the Vietcong had been severely underestimated, and that deep commitment of US forces in a land war would be a ‘catastrophic error’. They were not listened to, but a great deal of loss of life might have been avoided had they been. Perhaps the best chance at avoiding catastrophe would have been missed had the RAND scientists not performed this research. The question of how scientists should or should not engage with controversial government policies is a difficult and nuanced one, and deserves to be treated as such.

Other work taken on by Simulmatics was similarly ethically concerning, such as work on riot prediction, which drew partly on its methodologically flawed work in Vietnam and was just as bumbling and incompetent. One team sent to Rochester made a firm prediction of a riot, without making use of computer technology or ‘simulation’. Police were sent to the area and no riot occurred, so the team chalked up their prediction as a success, reasoning that the intervention had prevented it. The episode is an amusing one, and relevant to the history of the firm, but rather less relevant to today’s controversies around the role of predictive modelling in policing and criminal justice settings. It is also not clear how the episode is relevant to Lepore’s discussion of the possibility of predictive modelling of human behaviour, or indeed to the idea that Simulmatics was a precursor to Cambridge Analytica.

As E.H. Carr advises us in What Is History?, a historian must choose what facts are relevant to their history. Their choices of course are determined in part by their values and preferences[5]. The same is true of a journalist doing history. And as one might expect of a journalist in the interpretative tradition, Lepore is keen to provide many examples of contemporary moral revulsion at the work of Simulmatics and indeed at computing more widely. For instance, one political scientist and author who was roughly adjacent to the firm saw dire consequences emerging from its work:

“This may or may not result in evil,” Burdick would warn. “Certainly it will result in the end of politics as Americans have known it.”

Newton Minow, an advisor to repeat Democratic presidential candidate Adlai Stevenson II, wrote a request for advice on an early Simulmatics proposal as follows:

“Without prejudicing your judgement, my own opinion is that such a thing (a) cannot work, (b) is immoral, (c) should be declared illegal. Please advise.”

Lepore presents such quotes without comment. She is of course not beyond making judgements, and makes many in the book. This includes a judgement on Pool’s defence of his decision to do government work as angry and overheated. Yet she does not seem to consider two obvious critiques of responses such as these. Bearing in mind that they are discussing the proposal of modelling and ‘predicting’ voter choice, as these comments predate the later ethically questionable decisions of the Simulmatics scientists, these reactions seem from the perspective of today to be largely the kind of moral panic which arises in response to technological and social change.

Another example of this kind of intense reaction was to the introduction of standardised time. This was a highly disruptive technology and a poor fit for many people’s lives when first introduced, but was essential for enabling things like predictable rail travel and a modern economy. There is no doubt that the negative reaction of many who experienced the transition was genuine, but with hindsight, we recognise the value of the clock. One particularly extreme reaction against its introduction was that of an anarchist who attempted in 1894 to blow up the Royal Observatory in Greenwich, which was the centre of globally standardised time[6]. Quantification and computerisation are no strangers to this process.

My point, therefore, is not that these negative reactions did not come from genuine concerns. Rather, it is that to quote such contemporary negative reactions without further consideration of how justified they were in hindsight is not neutral history. It is a deliberate act of selection, provided without qualification or discussion because it aligns with the view of the author. A central tension in Lepore and Minow’s view is the simultaneous claim, on the one hand, that prediction of human behaviour does not work and, on the other, that it is evil or immoral. It is difficult to do evil with a tool that doesn’t work. A view that is both more interesting and closer to the truth is that predictive modelling works well enough for certain purposes conditional on the data and models used, but abstracts away from a great amount of local detail[7], and encodes and replicates the biases of the data used to produce the model[8].

The history of Simulmatics is fascinating, and I am glad I read If Then, despite the negative tone of my review, focused as it is on some frustrations with a book I wanted to love. But I regret that Lepore, who in some respects has a great deal of curiosity and fair-mindedness (see for instance some of the sections on Pool, for whom one gets the feeling Lepore felt a degree of admiration), fails to exhibit the same qualities when attempting to drive at her chosen goal of making Simulmatics relevant to the present day. The core of her thesis is that the firm is a precursor to Cambridge Analytica, but it strikes me at most as a rather incompetent precursor to modern-day polling firms like YouGov in its domestic work, or to certain kinds of data science or global intelligence firm in other parts of its work. Perhaps all that can be said is that it covered a wide range of work precisely because the market for quantitative social and behavioural science was not yet well-defined.

Cambridge Analytica, and social media more generally, are different to polling in the vital respect of both how they obtain their data and how much of it they obtain. Polling firms gain the consent of those they survey, and typically pay them. Respondents can choose to refuse to answer: it is a choice whether they provide any information. Social media firms technically gain consent, through their terms and conditions, but it is not clear how well understood these terms and conditions are, or, in a society where social media is so important, how much choice there necessarily is in the matter.

It remains widely debated whether they should be allowed to collect as much data as they do. Cambridge Analytica of course not only used this data but broke the terms of its agreement with Facebook by doing so for the purpose of predictive modelling. The ethical issues raised by the modern phenomenon of social media, while still about data, go much further than those raised by the place of data analysis in political campaigning. There is nothing wrong with using empirical information in campaigning. There is a lot wrong with using data, for profit, for purposes not consented to.

Readers interested in the Simulmatics firm and the history of quantitative modelling of political behaviour should read If Then despite its flaws. One reason is that no one else has bothered to write a history of the firm. It is a credit to Lepore that she has made its history available to the rest of us, and a testament to her ability to identify an interesting historical episode for contemporary readers. Of course, one reason no one had bothered in the past might be that, as she acknowledges throughout the book, it is not clear at all that Simulmatics had any impact whatsoever through its work. But the firm is interesting for many reasons, one of which is that it really is an ancestor of the quantitative political science of today.

For all of these reasons the book is genuinely interesting, and an enjoyable read. But readers who follow my advice should do so cognisant of the unfortunate fact that it is a book written with a clear agenda from the very start, and an agenda which takes the book’s analysis and interpretation away from some of the reasons why they might be interested in the story of Simulmatics. The moral failures of the firm and its founders are important to its story and should not be hidden. But so are its methods and technology, and to avoid providing real detail on these, or to draw tenuous connections with contemporary concerns around data harvesting and microtargeting, is to fail in telling a vitally important part of the same story.


Footnotes

  1. Park, D.K., Gelman, A. and Bafumi, J. (2004) Bayesian Multilevel Estimation with Poststratification: State-Level Estimates from National Polls, Political Analysis 12 (4). DOI: https://doi.org/10.1093/pan/mph024

    Hanretty, C. (2020) An Introduction to Multilevel Regression and Post-Stratification for Estimating Constituency Opinion, Political Studies Review 18 (4). DOI: https://doi.org/10.1177/1478929919864773

    Lauderdale, B.E., Bailey, D., Blumenau, J. and Rivers, D. (2020) Model-based pre-election polling for national and sub-national outcomes in the US and UK, International Journal of Forecasting 36 (2). DOI: https://doi.org/10.1016/j.ijforecast.2019.05.012 

  2. Kuh, S., Kennedy, L., Chen, Q. and Gelman, A. (2023) Using leave-one-out cross validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale, Statistics in Medicine 43 (5). DOI: https://doi.org/10.1002/sim.9964

    Kennedy, L., Vehtari, A. and Gelman, A. (2023) Scoring multilevel regression and poststratification based population and subpopulation estimates, arXiv preprint. DOI: https://doi.org/10.48550/arXiv.2312.06334 

  3. Lepore writes ‘Pool and McPhee began by collecting punch cards from one hundred thousand surveys […]’. I took this to mean individual survey responses, rather than one hundred thousand surveys, as on the scale of surveys the number is not plausible. It’s a small confusion in language if I’m right, but potentially a revealing one. 

  4. Bright, L.K. (2018) Du Bois’ democratic defence of the value free ideal, Synthese 195 (5). DOI: https://doi.org/10.1007/s11229-017-1333-z 

  5. I am more sanguine than Carr on the possibility of a sort of idealised, rigorous history, in that I think a historian’s values could be rooted in honest enquiry and in having criteria other than their pre-existing beliefs for choosing what is relevant for selection. But I do not think this is true of Lepore, whatever other merits her writing has. 

  6. Zadeh, J. (2021) The Tyranny Of Time, Noema. URL: https://www.noemamag.com/the-tyranny-of-time/ [accessed 18/03/2026] 

  7. Farrell, H. and Fourcade, M. (2023) The Moral Economy of High-Tech Modernism, Daedalus 152 (1). DOI: https://doi.org/10.1162/daed_a_01982

    Farrell, H., Gopnik, A., Shalizi, C. and Evans, J. (2025) Large AI models are cultural and social technologies, Science 387 (6739). DOI: https://doi.org/10.1126/science.adt9819

  8. For instance, on more recent types of predictive model:

    Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021) On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, FAccT ‘21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. DOI: https://doi.org/10.1145/3442188.3445922

    Weidinger, L. et al. (2022) Taxonomy of Risks posed by Language Models, FAccT ‘22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. DOI: https://doi.org/10.1145/3531146.3533088

    Birhane, A., Prabhu, V., Han, S. and Boddeti, V.N. (2023) On Hate Scaling Laws For Data-Swamps, arXiv preprint. DOI: https://doi.org/10.48550/arXiv.2306.13141