Wednesday 1 September 2021

The urge to correct

Scientists who analyse large amounts of data in their research often store it in Excel spreadsheets. Excel has for decades been a standard component of the Microsoft Office suite, but, despite regular upgrades, it has not lost its irritating urge to correct what it often dumbly supposes is one of your errors. Although it's not too difficult to turn off autocorrect, not everyone knows how, or can be bothered to find out. And in normal circumstances it can be a helpful tool, provided that you carefully read through what you thought you had written before you launch it into the public domain. (And let's be grateful that Clippy was put back in the box.)

It seems that autocorrect is a particular problem for geneticists. A gene called Membrane Associated Ring-CH-type finger 1, MARCH1 for short, is, for instance, frequently converted into the date March 1st. Something similar happens to genes known as SEPT1 and DEC2, and there are many other examples, both in English and in other languages.

Although the problem was first noticed in 2004, it wasn't until 2016 that Mark Ziemann drew wider attention to this hazard. Then, last July, he and some colleagues published a paper entitled “Gene name errors: Lessons not learned” in the open-access journal PLOS (Public Library of Science) Computational Biology. He and his co-authors surveyed 166,000 genomics-related papers published between 2014 and 2020, and found that the number of papers using Excel had steadily increased, while the proportion plagued with autocorrect errors still hovered around 30%. Various remedies have been proposed: a change in the official names of genes to make them less tempting targets for correction, the use of bespoke scientific software for data processing, ... and (my naive idea) making "autocorrect OFF" the Excel default. (My thanks to The Economist of 1st September 2021 for telling me about this.)
