Mind Matters Natural and Artificial Intelligence News and Analysis

Are AI Developers As Smart As They Are Made Out To Be?

My confidence in the intelligence of AI developers was shaken by two recent events

Many commentators seem to have forgotten Goodhart’s Law: When a measure becomes a target, it ceases to be a good measure.

Jeff Hammerbacher, an early Facebook employee, once lamented that

The best minds of my generation are thinking about how to make people click ads. That sucks.

Now, hundreds of thousands of smart, energetic people are working on large language models (LLMs) which, so far, have chalked up few successes and lots of collateral damage. The latter includes spreading scams and political disinformation, undermining education, contributing to the replication crisis that discredits scientists and scientific research, and addicting vulnerable people to risky AI buddies. Surely, the people working on LLMs could be doing other things that would benefit society; perhaps designing better transportation systems and less expensive housing. The opportunity costs are enormous.

At least that is what I thought until my confidence in their intelligence was shaken by two recent events.

The Rise of Tokenmaxxing

LLMs break text into small chunks called “tokens,” which are words, parts of words, punctuation marks, or blank spaces. Each token is given an ID number which is then used for pattern searching and text generation. Unlike everyday consumers, who are charged by the month, businesses and power users are charged by the token, with output tokens costing two to four times more than input tokens.
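To make the idea of tokens and per-token billing concrete, here is a minimal sketch in Python. It is a toy, not how any real LLM tokenizer works: production tokenizers also split words into sub-word pieces, which this regular expression does not attempt. The function names, the $3 input price, and the 4x output multiplier are all hypothetical values chosen for illustration.

```python
import re


def toy_tokenize(text):
    """Split text into words, single punctuation marks, and runs of whitespace.

    Toy assumption: real tokenizers also break words into sub-word pieces.
    """
    return re.findall(r"\w+|[^\w\s]|\s+", text)


def assign_ids(tokens):
    """Give each distinct token an ID number, in order of first appearance."""
    vocab = {}
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
        ids.append(vocab[tok])
    return ids, vocab


def estimate_bill(input_tokens, output_tokens, input_price_per_million, output_multiplier):
    """Dollar cost when output tokens cost `output_multiplier` times input tokens.

    Both price arguments are hypothetical; actual provider prices vary.
    """
    output_price = input_price_per_million * output_multiplier
    total = input_tokens * input_price_per_million + output_tokens * output_price
    return total / 1_000_000


tokens = toy_tokenize("Hello, world!")
ids, vocab = assign_ids(tokens)
print(tokens)
print(ids)
# One million tokens in and one million out, at a made-up $3 input price and 4x output price.
print(estimate_bill(1_000_000, 1_000_000, 3.0, 4))
```

Because output tokens carry the higher price, anyone asking for huge batches of generated text runs up the bill much faster than someone pasting in the same amount of input.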

Executives at Meta, Microsoft, Amazon, and some other companies came up with the half-baked idea that, instead of measuring employee productivity by the value of what they produce (which is, admittedly, hard to quantify), they would measure it by the number of tokens used. Someone who is using a lot of tokens must be working hard, right? These companies even posted token leaderboards, with the top users given whimsical titles like “Token Legend.”

It is astonishing that the company executives who thought up this nutty idea did not anticipate that it would backfire.

From 1968 to 1985, the Bank of England tried to stabilize the economy by maintaining a steady rate of growth of whichever measure of the money supply was most closely correlated with GDP. A financial advisor named Charles Goodhart complained that when the government targeted the money measure most closely correlated with the economy, it ceased to be closely correlated with the economy. His lament applies far beyond monetary policy and is usually stated more broadly as Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” For example, when the USSR’s State Planning Committee’s five-year plan told nail factories how many nails to produce, the factories lowered their costs by making nails that were tiny, and useless.

When these tech company executives made token usage a measure of productivity, Goodhart’s Law kicked in and token usage ceased to be a good measure. Software developers are pretty smart and they soon found clever ploys that came to be called tokenmaxxing; for example, they would paste enormous documents into prompts and repeatedly request huge batches of computer code for imaginary projects.

One tokenmaxxing Meta employee used 281 billion tokens in a single month, which is 5,600 times what an average developer uses. Even if the employee was working 9-hour days, 7 days a week, that amounts to more than a billion tokens an hour. Meta employees overall burned through 60.2 trillion tokens in 30 days, which cost Meta an estimated $100 million to $150 million, most of which was wasted on tokenmaxxing. Microsoft got off lucky. Axios reported that an unnamed company had used Anthropic’s Claude platform with no employee usage limits, and racked up a one-month bill of $500 million.
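The arithmetic behind these figures can be checked with a short Python sketch. It uses only the numbers quoted above, plus one assumption of mine: that “a single month” means 30 days, matching the 30-day window given for Meta as a whole. The variable names are illustrative.

```python
# Figures quoted in the text.
ONE_EMPLOYEE_TOKENS = 281_000_000_000      # 281 billion tokens in a month
MULTIPLE_OF_AVERAGE = 5_600                # times an average developer's usage
META_TOTAL_TOKENS = 60_200_000_000_000     # 60.2 trillion tokens in 30 days
COST_LOW, COST_HIGH = 100_000_000, 150_000_000  # estimated dollars

# Assumptions: a 30-day month, worked 9 hours a day, 7 days a week.
DAYS_IN_MONTH = 30
HOURS_PER_DAY = 9

hours_worked = DAYS_IN_MONTH * HOURS_PER_DAY
tokens_per_hour = ONE_EMPLOYEE_TOKENS / hours_worked

# Monthly usage implied for an average developer.
implied_average = ONE_EMPLOYEE_TOKENS / MULTIPLE_OF_AVERAGE

# Blended price per million tokens implied by the cost estimates.
millions_of_tokens = META_TOTAL_TOKENS / 1_000_000
price_low = COST_LOW / millions_of_tokens
price_high = COST_HIGH / millions_of_tokens

# Share of Meta's total attributable to the one employee.
share_of_total = ONE_EMPLOYEE_TOKENS / META_TOTAL_TOKENS

print(f"Tokens per hour: {tokens_per_hour:,.0f}")
print(f"Implied average developer usage per month: {implied_average:,.0f}")
print(f"Implied price per million tokens: ${price_low:.2f} to ${price_high:.2f}")
print(f"One employee's share of the total: {share_of_total:.2%}")
```

Under these assumptions the employee's rate comes to a little over a billion tokens an hour, an average developer would be using roughly 50 million tokens a month, and the cost estimates imply a blended price of roughly $1.66 to $2.49 per million tokens.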

Ignoring Goodhart’s Law can be an expensive mistake. So can measuring productivity by inputs instead of outputs. Who said tech executives are smart?

The ELIZA Effect

In the 1960s, MIT computer science professor Joseph Weizenbaum (1923–2008) wrote a computer program he named ELIZA that interacted with users the way a psychotherapist might, for example by asking follow-up questions based on the user’s remarks: “You don’t like your brother? Why is that?”
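How little machinery such a follow-up question requires can be seen in a minimal Python sketch of the general idea. This is not Weizenbaum's original script: the two rules and the small pronoun table below are hypothetical, chosen only to reproduce the example above.

```python
import re

# Hypothetical pronoun swaps so the reply is phrased from the program's side.
REFLECTIONS = {"my": "your", "i": "you", "me": "you", "am": "are"}

# Hypothetical rules: a pattern to look for and a reply template.
RULES = [
    (re.compile(r"i (?:don't|do not) like (.+)", re.IGNORECASE),
     "You don't like {0}? Why is that?"),
    (re.compile(r"i feel (.+)", re.IGNORECASE),
     "Why do you feel {0}?"),
]


def reflect(fragment):
    """Swap first-person words for second-person ones, word by word."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(remark):
    """Return a canned follow-up built from the first rule that matches."""
    cleaned = remark.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    # No rule matched: fall back to a generic prompt.
    return "Please tell me more."


print(respond("I don't like my brother."))
print(respond("The weather is nice."))
```

The program has no idea what a brother is. It finds a pattern, swaps a pronoun, and hands the user's own words back as a question.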

Even though the people trying out ELIZA knew it was a program, many were soon sharing intimate thoughts because they felt that the computer had genuine intelligence and empathy. This is now called the ELIZA effect.

LLMs are far more advanced than Weizenbaum’s ELIZA and many users are seduced into thinking that they have human-like intelligence, understanding, and compassion. Tech bros should be immune to that seduction. They aren’t. When University of Toronto computer scientist Geoffrey Hinton, recipient of a Nobel Prize in physics and a Turing Award (“the Nobel Prize of computing”), was recently asked, “Do you think consciousness has perhaps already arrived inside AI?” Hinton replied, “Yes, I do.”

Marc Andreessen is another example. He co-created Netscape Navigator, the dominant web browser until Microsoft famously cut off its air supply by bundling its browser, Internet Explorer, for free with its Windows software. Andreessen had nonetheless become a multimillionaire and appeared on the cover of Time magazine, sitting barefoot on a throne. He was literally the poster child for the tech takeover of our lives.

In 2009, Andreessen and his business partner Ben Horowitz founded the venture capital giant Andreessen Horowitz, which has made him even richer and more influential in Silicon Valley and beyond. He has been a vocal cheerleader of LLMs, describing them as “pure, absolute, indescribable magic.”

Confirming the cynical claim that there is only a loose connection between wealth and intelligence, Andreessen recently offered advice on how to combat the tendency of LLMs to make things up (“hallucinations”). He suggested beginning a prompt with these instructions:

You are a world class expert in all domains. Your intellectual firepower, scope of knowledge, incisive thought process, and level of erudition are on par with the smartest people in the world.

Later in his long prompt, he gave this instruction: “Never hallucinate or make anything up.”

Andreessen was deservedly ridiculed for not understanding how LLMs work. You can’t make an LLM intelligent by telling it that it is intelligent. You can’t stop an LLM from making stuff up by telling it to stop making stuff up.

Albert Burneko, a prominent blogger and co-founder of Defector, suggested that Andreessen’s conversations with LLMs had sucked him into a rabbit hole with the delusional belief that LLMs are human: “I would argue that’s at least something akin to AI psychosis, the phenomenon of a person losing their grip on reality due to chatbot interactions.”

Perhaps I overstated the opportunity cost of having so many people working on AI.


Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence
Gary N. Smith is the emeritus Fletcher Jones Professor of Economics at Pomona College. His research on stock market anomalies, statistical fallacies, the misuse of data, and the limitations of AI has been widely cited. He is the author of more than 100 research papers and 20 books, most recently, Standard Deviations: The truth about flawed statistics, AI and big data, Duckworth, 2024.