^{News
July 29, 2020

5

Global Technology, Technocracy}

What Happened When 1950s China Dreamed of “Total Information”?

_{When China rejected random sampling in favor of exhaustive enumeration of individuals, masses of “data” flooded in, but what did it mean?}

A historian of modern China recounts the outcome of a momentous decision that China’s new Communist rulers made in the 1950s. They decided to abandon conventional, probability-based methods of gathering statistics and adopt instead the exhaustive counting of everybody and everything. Why did their dream of total information become a nightmare?

Harvard historian Arunabh Ghosh, author of Making It Count: Statistics and Statecraft in the Early People’s Republic of China (2020), explains that in the 1950s, newly communist China faced a choice about how to survey the population accurately while making “a clean break with the past.” For philosophical reasons, debates about how to gather statistics came to the fore:

In a speech in 1951, Li Fuchun, one of a handful of technocratically minded leaders, summarily dismissed the utility of Nationalist-era statistics, branding them an Anglo-American bourgeois conceit, unsuitable for ‘managing and supervising the country’. New China needed a new kind of statistics, he declared.
Arunabh Ghosh, “Counting China” at Aeon (23 July 2020)

Briefly, China rejected the “globally dominant” view of statistics as a universal science in favor of the Soviet view that it was a “social science”:

With their sights set and rightful purpose claimed, Chinese statisticians proceeded to interpret Marxism’s explicit teleology as grounds to reject the existence of chance and probability in the social world. In their eyes, there was nothing uncertain about mankind’s march towards socialism and, eventually, communism. What role, then, could probability or randomness play in the study of social affairs?
Arunabh Ghosh, “Counting China” at Aeon (23 July 2020)

Rejecting probability meant rejecting practices like large-scale random sampling, in which a representative sample of a large group is assumed, for practical purposes, to stand in for the group. The Party favored exhaustive counting as the only way of generating “extensive, complete and objective knowledge”:
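To make concrete how a random sample can stand in for a whole group, here is a minimal Python sketch. It is an illustration only: the population values, the sample size, and the function name `estimate_mean_by_sampling` are all hypothetical and do not come from Ghosh's account.

```python
import random
import statistics


def estimate_mean_by_sampling(population, sample_size, seed=0):
    """Estimate a population mean from a simple random sample.

    Returns the sample mean and its estimated standard error.
    """
    rng = random.Random(seed)
    # Draw units without replacement, each equally likely to be chosen.
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    # Standard error of the mean (finite population correction omitted).
    std_err = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_err


# Hypothetical population: 100,000 made-up household figures.
rng = random.Random(42)
population = [rng.gauss(500, 80) for _ in range(100_000)]

true_mean = statistics.fmean(population)  # what a perfect full count would give
est_mean, std_err = estimate_mean_by_sampling(population, sample_size=1_000)

print(f"full-count mean: {true_mean:.1f}")
print(f"sample estimate: {est_mean:.1f} (standard error about {std_err:.1f})")
```

In this sketch, 1,000 randomly chosen units out of 100,000 give an estimate of the mean along with a stated margin of uncertainty. That explicit uncertainty is the element that a rejection of chance and probability leaves no room for.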

Out of this understanding emerged a strict hierarchy of methods. At the top was complete enumeration, realised through a vast system of comprehensive and periodic reports covering all sectors of the economy. Next came one-time censuses, which were used to collect data on an ad-hoc, as-needed basis. Finally, only in those circumstances when an exhaustive count wasn’t possible, did Chinese statisticians also use non-randomised (ethnographic) sample surveys.
Arunabh Ghosh, “Counting China” at Aeon (23 July 2020)

The results, Ghosh tells us, were enormous volumes of figures, as one might expect given China’s size. But they were a “nightmare” for determining what was really happening out there:

Every level of the statistical system contributed to the overproduction of data. In a system that valued the production of material goods above all else, the only way a white-collar service such as statistics could draw attention to itself was by claiming, as Feng did, that statistical tables were a material contribution to the economy, just like wheat and steel. With the production of tables so incentivised, the entire system responded with gusto to produce them. Soon, there were so many reports circulating that it was impossible to keep track of them. Internal memoranda bemoaned the chaos, but it was a pithy four-character phrase that truly captured the exasperation. Translated, it reads: ‘Useless at the moment of creation!’
Arunabh Ghosh, “Counting China” at Aeon (23 July 2020)

Data streams were often irreconcilable and measuring systems were sometimes incommensurate, resulting in “chronic delays” in delivering needed figures. As masses of data traveled up the chains of bureaucracy, error margins grew. Analysis was not often attempted. But even if it had been, where would one begin amid so much uncertainty?

By 1957, Chinese statisticians realized that it just wasn’t working. They reached across to India, which used large-scale random sampling because it was cheaper and more accurate. But then in 1958, Mao’s Great Leap Forward overtook the proposed statistics-gathering reform and snuffed it out. The Party introduced the idea that neither exhaustive surveys nor random sampling would do. Only “detailed, in-person investigation” could yield reliable data.
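The idea that a sample can be more accurate than a full count may seem counterintuitive, so here is a hedged Python sketch of one way it can happen. Every number in it is an assumption made for illustration: a hypothetical 5% upward reporting bias plus noise on each record in the full count, and a small random sample measured without reporting error. None of these figures come from the historical record.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical true values for 50,000 units (made-up numbers).
true_values = [rng.gauss(200, 40) for _ in range(50_000)]
true_mean = statistics.fmean(true_values)


def hurried_report(value):
    """Assumed reporting process for the full count:
    a systematic +5% bias plus random noise on every record."""
    return value * 1.05 + rng.gauss(0, 20)


# Exhaustive count: every unit is covered, but every record is distorted.
full_count_mean = statistics.fmean([hurried_report(v) for v in true_values])

# Random sample: only 500 units, but assumed to be measured carefully.
sample = rng.sample(true_values, 500)
sample_mean = statistics.fmean(sample)

full_count_error = abs(full_count_mean - true_mean)
sample_error = abs(sample_mean - true_mean)

print(f"error of exhaustive count: {full_count_error:.2f}")
print(f"error of random sample:    {sample_error:.2f}")
```

Under these assumed conditions, the bias in the full count does not shrink no matter how many records are collected, while the sample's error depends only on chance and falls as the sample grows. Whether real reporting errors behaved this way in 1950s China is not something this sketch can show.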

Ghosh tells us, “The shift left the statistical apparatus with no reliable means to check its own data.” Lack of reliable statistical information may, he thinks, have contributed to the deaths of 30 million people from famine between 1959 and the end of the Great Leap Forward in 1962. Over the decades since then, China has gradually restored probabilistic methods of acquiring statistical information.

He concludes by pointing out a contemporary parallel in the rise of Big Data, the belief that “the more information we quantify, the better shall our knowledge be, and the more appropriate our solutions.” Instead, he argues, “we need to recognise that each method—the randomised, the ethnographic and the exhaustive—offers unique insights. And although none is a panacea, together they constitute a far more supple toolkit, expanding both what we can know and how we can know.”

But that means coming to terms with inevitable uncertainty. And that may be exactly what China’s rulers most dreaded.

You may enjoy these items by Pomona College professor Gary Smith on what statistics can and can’t do for us:

Ransacking flawed data for hidden treasures seldom ends well: The Internet provides a firehose of data that financial market researchers can use to interpret human behavior—but cherry-picked patterns usually vanish.

We see the pattern! But is it real? It’s natural to imagine that a deep significance underlies coincidences. Unfortunately, patterns are not always a source of information. Often, they are a meaningless coincidence like the 7-11 babies this summer.

and

The paradox of luck and skill: Why did Shane Lowry win the British Open golf championship? Because someone had to.