Friday, November 24, 2023

Data and the Genome

The word data comes from the Latin for that which is given. So one might think it entirely appropriate to use the word for our DNA, given to us by our parents, thanks to millions of years of evolution. DNA is often described as a genetic code. Here the word code refers either to the way biological information is represented in the molecular structure of chromosomes, or to the way these chromosomes can be understood as a set of instructions for building a biological entity. Watson and Crick used the word code in their 1953 Nature article.

However, when people talk about the human genome, they are often referring to a non-biological representation in some artificial datastore. In other words, something given by biology to data science.

Shannon E French objects to talking about data stored on DNA like it’s some kind of memory stick. Abeba Birhane sees this as part of a current trend so determined to present AI as human-like at all costs that describing humans in machinic terms has become normalised.

Elsewhere, Abeba Birhane is known for her strong critique of AI. As well as addressing important ethical issues (algorithmic bias, digital colonialism, accountability, exploitation/expropriation), she has raised concerns about the false promise of AI hype.

But describing humans (or other biological entities) in machinic terms, or treating them as instruments, is far older than AI. When we replace animals with technical devices (canaries, carrier pigeons, horses), the substitution implies that the animals had been treated as devices. The replacement is often justified by the argument that technical devices are cheaper, more efficient, or more reliable, or don't require regular breaks - or are simply more modern. Conversely, when scientists try to repurpose DNA as a data storage mechanism, this also seems to mean treating biology in instrumental terms.

But arguably what is stored or encoded in the DNA - whether in its original biological manifestation or more recent exercises in bioengineering - is still data, regardless of how or for whom it is used.



Abeba Birhane, Atoosa Kasirzadeh, David Leslie and Sandra Wachter, Science in the age of large language models (Nature Reviews Physics, Volume 5, May 2023, 277–280)

Abeba Birhane and Deborah Raji, ChatGPT, Galactica and the Progress Trap (Wired, 9 December 2022)

Grace Browne, AI is steeped in Big Tech's 'Digital Colonialism' (Wired, 25 May 2023)

J.D. Watson and F.H.C. Crick, Genetical Implications of the Structure of Deoxyribonucleic Acid (Nature, 30 May 1953)

Related posts: Naive Epistemology (July 2020), Limitations of Machine Learning (July 2020), Mapping out the entire world of objects (July 2020), Lie Detectors at Airports (April 2022), Algorithmic Intuition (November 2023)
