Wittgenstein & Google Translate: killing the excitement

by Peter Winslow, published on 18 February 2019

As a professional translator who has studied Wittgenstein (and even translated his Family Letters) and is – unfortunately – familiar with translation results rendered by neural networks such as Google Translate, I was initially excited to learn of Olivia Goldhill’s article entitled “Google Translate is a manifestation of Wittgenstein’s theory of language.” It is rare that my interests in Wittgenstein and translation converge in a single online post. And Ms. Goldhill doesn’t disappoint. Her piece contains some big claims right at the outset: Google Translate provides “a practical example of Wittgenstein’s hypotheses” and is a “very literal representation of Wittgenstein’s work.” It turns out that Ms. Goldhill means a practical example and a very literal representation of the later Wittgenstein – a distinction of some importance in understanding Wittgenstein and his contributions to philosophy in general and to the philosophy of language in particular. But hairsplitting aside, her article leads to other trouble – namely, as Donald Davidson once put it, that “it is hard to improve intelligibility while retaining the excitement.”*

The trouble starts long before Ms. Goldhill’s click-bait hyperbole that “George Boole and Gottlob Frege first created computer code;” rather, it starts with Ms. Goldhill’s characterization of Wittgenstein’s (later) work as containing “hypotheses.” This characterization sets off alarm bells for anyone familiar with the early or the later Wittgenstein. He famously read poetry aloud to the Vienna Circle, a group of logical empiricists who believed that only propositions verifiable by the empirical method are meaningful, and, throughout his life, he was adamant in his belief that he was doing not science, but philosophy; as his biographer, Ray Monk, has put it, for Wittgenstein:

Philosophy cannot be transformed into a science, because it has nothing to find out [= nothing to hypothesize about]. Its puzzles are the consequence of a misuse, a misunderstanding, of grammar, and require, not solution, but dissolution. And the method of dissolving these problems consists not in constructing new theories, but in assembling reminders of things we already know (298)

It is unclear how much stock Ms. Goldhill puts in her suggestion that Wittgenstein’s work contains hypotheses confirmable by means of the empirical method. Maybe she doesn’t place much stock in it at all. Maybe she isn’t interested in whether her characterization is consistent with Wittgenstein’s understanding of his own work; ignoring an author’s intentions for interpretive purposes would hardly be a novelty. Maybe she just made an unfortunate choice of words to convey the view that Google Translate constitutes a real-world manifestation of Wittgenstein’s ideas. But maybe she believes that Google Translate and not humans ought to make “sense of words in their context.” After all, very little of the later Wittgenstein is recognizable in her article. Let me try to explain.

Ms. Goldhill would like her readers to believe that (1) Wittgenstein’s imputed belief that “meanings lie in their use” (apparently, a paraphrase of § 43 of the Philosophical Investigations) and (2) Wittgenstein’s idea that the meanings of the words of a language constitute “a complicated network of similarities overlapping and criss-crossing” (§ 66) have been put to the test and confirmed by Google Translate. Ms. Goldhill writes:

For the translations to work, programmers have to then create a “neural network,” a form of machine learning, that’s trained to understand how these words relate to each other. Most words have several meanings (“trunk,” for example, can refer to part of an elephant, tree, luggage, or car […]), and so Google Translate has to understand the context. The neural network will read millions of texts, focusing on the two words preceding and following on from any one word, so as to be able to predict a word based on the words surrounding it.

A fair and charitable reading of this quote is something along the following lines: language is a complicated network in which the meaning of each individual word not only consists in its interrelations with other words, but can also be determined by its statistical probability of occurrence in any given context within that network. Ms. Goldhill even goes on to cite one Mr. Hebron, who more or less says as much:

There’s a very literal connection between these two ideas because the ways we’re coming up with the representations of words within word2vec is that we are basically finding a place for them in space by looking at their surrounding words and pinpointing them as defined by the sum of all of their in-context uses.

Yet, there is no literal connection between Google Translate and these two ideas or hypotheses of Wittgenstein’s … or whatever one would like to call them. What Ms. Goldhill reports and what Mr. Hebron says is closer to Frege than it is to the later Wittgenstein. As great as the differences between the two are, several philosophers on both sides of the Atlantic have formulated them quite succinctly: for Frege, words have meaning insofar as they are part of a sentence; for the later Wittgenstein, words have meaning insofar as they are part of a language.

As for Google Translate, it seems to have taken Wittgenstein’s ideas and repurposed them. It seems inspired by one sentence and one partial sentence, both of which are quoted by Ms. Goldhill either incompletely – in the case of “meanings lie in their use” – or out of context – in the case of “a complicated network …” (see below). Both are presented void of anything resembling Wittgenstein interpretation.

Google Translate is imputed to have taken Wittgenstein’s slogan-like sentence “the meaning of a word is its use in the language” (§ 43 of the Philosophical Investigations) and reinterpreted it to mean that the meaning of a word is its use in strings of at least five words. We are told that Google Translate’s neural network focuses “on the two words preceding and following on from any one word.” Whatever else Wittgenstein meant by language, it strains credulity to suggest that he meant strings of at least five words.

One might object that Wittgenstein himself claims that primitive languages (primitive being Wittgenstein’s characterization) consisting of only a handful of words can be complete. The builders’ game introduced at § 2 of the Philosophical Investigations is a case in point. And this is right as far as it goes. But there looms a controversy around the completeness of the builders’ game – involving the Wittgenstein scholars Rhees, Malcolm, Cavell, and Stern, for instance. Roughly, they think either that Wittgenstein’s builders’ game cannot be complete or that it can be with certain restrictions or that this absence of clarity belongs to Wittgenstein’s mode of presentation. My personal view is that Wittgenstein’s builders’ game is used as an object of comparison (in this case: to be compared with the view of language Wittgenstein presents at § 1 of his Philosophical Investigations), and his use of complete has its roots in Adolf Loos’s work. Wittgenstein’s builders’ game is complete in the same sense that Loos’s objects of use are complete – that is, in the sense not only that nothing essential to them is absent, but also, conversely, that nothing inessential to them is present. There is almost a one-to-one convergence between the builders’ game and the view of language presented by Wittgenstein at § 1.

Wittgenstein was surely far too sensitive a thinker and too much of a Krausian to believe that the slogan-like sentence “the meaning of a word …” could serve as anything other than a useful generality. He even appends to it caveats and misgivings: “[f]or a large class of cases,” “though not for all,” “‘meaning’ […] can be explained” [= it need not be so explained] etc. (cf. § 43 of the Philosophical Investigations). His formulation comes at a juncture where it can be read only as a kind of interim, even tentative, conclusion to a long and complicated philosophical argument.

Wittgenstein was also far too sensitive a thinker to believe that his idea of words constituting “a complicated network of similarities …” is straightforward or literal. Wittgenstein presents this idea in response to an objection raised by his imagined interlocutor at § 65. The interlocutor objects that Wittgenstein is forgoing any attempt at explaining the essence of language and is, by extension, letting himself off “the very part of the investigation that once gave [him] the most headache.” Wittgenstein agrees. And he attempts to give a response to his interlocutor by asking him or her to consider what we call games: the various kinds of activities, their various similarities, and their various differences. He ends these considerations by saying not that the meanings of words constitute a complicated network of similarities, but:

The upshot of these considerations is: we see a complicated network of similarities overlapping and criss-crossing: similarities in the large and in the small (§ 66).

The “complicated network” is clearly a metaphor intended to shake the idea that there has to be some essence of language. Wittgenstein’s words are part and parcel of an assembly of reminders – of things we already know about games and the concept of game. The metaphor is not, and it is not intended to be, part of a theory of language; instead, it is, and it is intended to be, part of a dissolution of a theory of language. There’s a world of difference between the two. Wittgenstein might say: they’re different language games.

To be clear: I don’t doubt that Wittgenstein’s ideas inform Google Translate’s view of words as constituting a complicated network. In fact, due to the imputed reinterpretation of Wittgenstein’s words, it is rather easy to see how they might inform that view. Still and all, being informed by someone’s ideas does not, of necessity, engender a manifestation of them. Words matter. Their context matters. Wittgenstein does not understand use to be determined by five-word strings. He clearly posits that a word’s meaning(s) has something to do with that word’s use in the language – not in a sentence or even sentential fragment. Wittgenstein’s “complicated network” is a metaphor intended not to construct a theory, but to dissolve one. What is more, it seems clear enough that, at this juncture in his argument, he is interested more in the idea of similarities and differences than in the idea that they constitute a network. And the reason for this seems clear enough: positing meanings, concepts, or language as a network would be giving his interlocutor an explanation of the essence of language; such positing would be tantamount, or at least dangerously close, to the view that language has an essence. It would commit him to the very view he wishes to dissolve. Accordingly, Wittgenstein does not conceptualize networks in the sense in which Google Translate is a network, contrary to what Ms. Goldhill suggests or at least seems to suggest. The view of language imputed to Google Translate by Ms. Goldhill is exactly the sort of limited view of language with which Wittgenstein took issue in his later philosophizing.

Be that as it may, the later Wittgenstein would readily admit that such a limited view of language is both useful and a powerful problem-solving tool whose utility can, but need not, be made dependent on statistical determinations of words’ interrelations within five-word strings along the lines imputed to Google Translate. But the later Wittgenstein took issue with this sort of limited view of language as a view of language as such, because such a view fails to account for the innumerable characteristics of human language; that limited view of language is, to use Wittgenstein’s own words, an accurate description of “a system of communication; only not everything we call language is this system” (§ 3 of the Philosophical Investigations). Put another way: if the later Wittgenstein can at all be said to have a theory of language, then his theory would encompass, but not be limited to, the view of language imputed to Google Translate by Ms. Goldhill and Mr. Hebron. Whatever the case, it is most certainly not Wittgenstein’s primary interest or concern.

Two additional points

First, if Google Translate does in fact believe what Ms. Goldhill and Mr. Hebron impute to it, then we have a clear understanding of why machine translation still fails as often as it does – even when the machine is a neural network. The idea that the meaning of words can be determined by looking at five-word strings – the word itself, the two words preceding it, and the two words subsequent to it – is a version of the logical fallacy of composition, the error of assuming that what is true of the parts is true of the whole. The idea that an accurate translation of five-word strings will result in an accurate translation of either sentences or entire texts is as clear a statement of the fallacy of composition as one might wish for. The notion that what is true of the parts is true of the whole has been recognized as fallacious at least since Aristotle. It is laughable to expect fallacious underpinnings to produce accurate translations.

Second, here is a case in point. We are asked to believe that Google Translate has had its neural network read “millions of texts,” establishing how words relate to each other. Go to Google Translate and enter the sentence: “The pig is in the pen and plays in the mud.” If one has it translated into German, then the result is: “Das Schwein ist in der Feder und spielt im Schlamm” (the pig is in the quill [sic] and plays in the mud). Correct would be something along the lines of “Das Schwein ist im Schweinestall …” Are we really supposed to believe that the “millions of texts” read by the machine lead to some preponderance of probability that, in this instance, the word Feder (quill) is correct or that the two words preceding it (in der) and the two words subsequent to it (und spielt) are determinative for meaning? And, for that matter, why should Feder (quill) be right? Why doesn’t the machine use Kugelschreiber or Kuli? After reading millions of texts, the machine somehow establishes a connection between medieval writing utensils and pigs. … This involuntary humor shows that neither use nor meaning is determined by the relations contained in strings of words – i.e., neither is determined by “how these words relate to each other.”
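For readers who would like to see what such a five-word string amounts to, here is a minimal sketch in Python. It illustrates only the windowing described in Ms. Goldhill's article (a word plus the two words on either side of it); it is not Google Translate's actual implementation, and the function name context_windows and the parameter radius are hypothetical names chosen for this illustration.

```python
from typing import List, Tuple


def context_windows(tokens: List[str], radius: int = 2) -> List[Tuple[str, List[str]]]:
    """Pair each token with up to `radius` neighbours on either side.

    Hypothetical helper: it mimics the "two words preceding and following"
    description quoted above, nothing more.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - radius):i]       # up to two words before
        right = tokens[i + 1:i + 1 + radius]      # up to two words after
        pairs.append((target, left + right))
    return pairs


sentence = "the pig is in the pen and plays in the mud".split()
windows = context_windows(sentence)

# Look at the five-word string centred on "pen".
pen_target, pen_context = windows[sentence.index("pen")]
print(pen_target, pen_context)
```

On this sketch, the window around "pen" contains only "in", "the", "and", and "plays". The words a human reader would rely on to pick the right sense, "pig" and "mud", fall outside the five-word string altogether.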

Whatever else language use and meaning are, they require human beings; human language without humans is not a tenable view. … And this – I think – is precisely the later Wittgenstein’s point in discussing the ideas of language, language games, and forms of life. He saw language as inextricably connected to human activities, not solely to syntactical constructs. Machine translation will continue to be inadequate until machine translation providers take seriously this small, but powerful idea of Wittgenstein’s.

Endnote

* Quoted from p. 5 of: Davidson, D. “On the very idea of a conceptual scheme.” Proceedings and Addresses of the American Philosophical Association, Vol. 47. 1973–74. pp. 5–20. Print.