Symbolic and sub-symbolic NLP are founded on Western epistemologies of language, in particular on language-as-lexicogrammatical-code and on language-as-data. However, beyond the world's ~500 institutional languages lie a further 6,500 languages with primary orality. In this space, many NLP/AI researchers and large technology companies seek to support the 'next thousand languages' by delivering the standard suite of technologies centred on written language, such as speech-to-text and machine translation, extracting substantial quantities of primary linguistic data in the process. I observe that such practices typically do not meet the requirements for prior informed consent or for the self-determination of Indigenous peoples, and that an ethical approach needs to take seriously local purposes linked to cultural survival, along with local epistemologies, including language-as-situated-and-embodied-communication. In this talk, I report on a five-year study conducted in an Australian Aboriginal community, and on how it gave rise to non-extractive designs for language technologies and an agency-enhancing design pattern for language technology. This proposal represents a return to an older understanding of AI: not replicating, but augmenting, human information-processing capabilities.
For more than three decades, Steven Bird has been working with minoritised communities, developing ways to keep oral languages and cultures strong, including fieldwork in Africa, Melanesia, Amazonia, and Australia. He has held academic appointments at Edinburgh, UPenn, Berkeley, and Melbourne. Since 2017, Steven has been a research professor at Charles Darwin University, where he directs the Top End Language Lab (http://language-lab.cdu.edu.au). He pursues other language-related projects at http://aikuma.org.