Let's let the AIs summarize our research. What could possibly go wrong?
A Nature News piece came out this week on an AI project to write short summaries of scientific papers:
The creators of a scientific search engine have unveiled software that automatically generates one-sentence summaries of research papers, which they say could help scientists to skim-read papers faster.
“I am amazed it has taken this long to see it in practice,” says Jevin West, an information scientist at the University of Washington in Seattle who tested the tool at Nature’s request.
I am also amazed that no thought appears to have gone into how such a tool will be used, or to what ends.
I informally tested the tool using abstracts from my papers (a very convenient sample, let it be known) and some others. For papers intended for computer science conferences, it does a pretty good job, although I would argue the summaries are rephrasings of the title (which it was not shown). But for other papers, such as those intended for applied math or physics venues, it generally just duplicates the “Here we show […]” sentence, sometimes with one or two synonyms swapped in. I think most trained readers are already well practiced at jumping right to that sentence.
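That habit of jumping straight to the "Here we show" sentence is easy to mechanize. As an illustration (a hypothetical baseline of my own, not the tool's actual model), a few lines of Python suffice:

```python
import re

def here_we_show_baseline(abstract: str) -> str:
    """Naive 'summary': return the sentence opening with the key-claim
    phrase, if any; otherwise fall back to the last sentence."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    for s in sentences:
        # Look for the conventional claim-announcing opener.
        if re.match(r"Here,?\s+we\s+(show|demonstrate|report)", s):
            return s
    return sentences[-1]

# Made-up abstract, purely for illustration.
abstract = (
    "Quantum widgets are poorly understood. "
    "We measured 40 widgets under cryogenic conditions. "
    "Here we show that widget coherence doubles below 1 K."
)
print(here_we_show_baseline(abstract))
# -> Here we show that widget coherence doubles below 1 K.
```

If a trained reader can locate that sentence in a second, and a regex can too, the marginal value of a neural summarizer for this class of papers is small.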
I worry a lot about information overload, misinformation, and the tsunami of scientific papers. Could AI help us? Yes. But can it harm us? Absolutely.
TLDR: do vaccines cause autism?
As another test, I ran the abstract of the retracted Andrew Wakefield paper through their tool. Here is its TLDR:
We identified a consecutive series of children with chronic enterocolitis and regressive developmental disorder, associated with measles, mumps, and rubella vaccination.
Is that a successful summary? It’s well written and clear, and seems to capture, rather forcefully, the intent of the paper.
Let’s look at the original text, so we can also see what their method did. I’ve boldfaced two sentences I wish to discuss:
**We investigated a consecutive series of children with chronic enterocolitis and regressive developmental disorder.**
12 children (mean age 6 years [range 3–10], 11 boys) were referred to a paediatric gastroenterology unit with a history of normal development followed by loss of acquired skills, including language, together with diarrhoea and abdominal pain. Children underwent gastroenterological, neurological, and developmental assessment and review of developmental records. Ileocolonoscopy and biopsy sampling, magnetic-resonance imaging (MRI), electroencephalography (EEG), and lumbar puncture were done under sedation. Barium follow-through radiography was done where possible. Biochemical, haematological, and immunological profiles were examined.
Onset of behavioural symptoms was associated, by the parents, with measles, mumps, and rubella vaccination in eight of the 12 children, with measles infection in one child, and otitis media in another. All 12 children had intestinal abnormalities, ranging from lymphoid nodular hyperplasia to aphthoid ulceration. Histology showed patchy chronic inflammation in the colon in 11 children and reactive ileal lymphoid hyperplasia in seven, but no granulomas. Behavioural disorders included autism (nine), disintegrative psychosis (one), and possible postviral or vaccinal encephalitis (two). There were no focal neurological abnormalities and MRI and EEG tests were normal. Abnormal laboratory results were significantly raised urinary methylmalonic acid compared with agematched controls (p=0·003), low haemoglobin in four children, and a low serum IgA in four children.
**We identified associated gastrointestinal disease and developmental regression in a group of previously normal children, which was generally associated in time with possible environmental triggers.**
We see that the method has taken an early sentence (boldfaced sentence 1) and glued it onto a later sentence (boldfaced sentence 2). It replaced a word with a synonym (‘investigated’ became ‘identified’, which is stronger) and removed the qualifying attribution (“by the parents”).
In other words, the summarization actively strips out the little bit of nuance from this—famously retracted—paper. Overcompression is dangerous.
Imagine what happens when someone starts tweeting that out and it jumps over to Facebook. Or worse, when it is not a person doing the tweeting but a bot!
I should also say, I am unpleasantly surprised at the upbeat coverage from Nature News. Normally, I expect an expert “who was not involved in the study” to provide at least some context or nuance about the work, whether it be remaining obstacles or unanswered questions. Why didn’t someone mention how dangerous a tool like this could be?
From Nature News again:
But Weld says the team is working on generating summaries for non-expert audiences.
Many people are concerned about AI and misinformation; just look at the conversation around tools like GPT-2 and GPT-3. It does not appear, at least from their papers, that Weld’s team put much thought into these concerns. What does the Allen Institute for AI (whose tagline is “AI for the Common Good”) think of their project?