‘One recent study, by Mohammad Atari and his colleagues, found that L.L.M.s tend to parrot values that many readers of this article would share, because their training data come from societies that are Western, Educated, Industrialized, Rich, and Democratic—weird. (The study has not yet been peer-reviewed.) Relative to the rest of the world, weirdos are “more individualistic, independent, and impersonally prosocial (e.g., trusting of strangers) while being less morally parochial, less respectful toward authorities, less conforming, and less loyal to their local groups,” the authors write. The less weird a society, the worse its misalignment with L.L.M.s.
L.L.M.s don’t even represent the values of the United States. The views of elderly and religious people, for example, are underrepresented, both in datasets and on the teams that create A.I. systems. The psychologist Geoffrey Miller, who is very worried about A.I., wrote on Twitter, “Funny how ‘AI alignment with human values’ often seems to boil down to ‘AI alignment with Lefty Bay Area transhumanist atheist values.’ ” And when a former graduate student at the University of British Columbia, Brent Stewart, asked Bing to fill out a psychological test known as the Moral Foundations Questionnaire, he found that it seemed to care less than most people about authority, purity, and loyalty. Delphi says that it’s morally O.K. for a woman to have an abortion; whether this is a successful alignment with human values depends on the human who is asking. Aligning an A.I. with one set of values could knock it out of alignment with everyone else’s values…
It’s possible to view human values as part of the problem, not the solution. Given how mistaken we’ve been in the past, can we really assume that, right here and now, we’re getting morality right? “Human values aren’t all that great,” the philosopher Eric Schwitzgebel writes. “We seem happy to destroy our environment for short-term gain. We are full of jingoism, prejudice, and angry pride. . . . Superintelligent AI with human-like values could constitute a pretty rotten bunch with immense power to destroy each other and the world for petty, vengeful, spiteful, or nihilistic ends.”’ (from The New Yorker)

