A colleague recently pointed out a preprint on arXiv, Mitchener et al. 2025, “Kosmos: An AI Scientist for Autonomous Discovery”. Part of its abstract, emphasis mine:
Here we present Kosmos, an AI scientist that automates data-driven discovery. Given an open-ended objective and a dataset, Kosmos runs for up to 12 hours performing cycles of parallel data analysis, literature search, and hypothesis generation before synthesizing discoveries into scientific reports. Unlike prior systems, Kosmos uses a structured world model to share information between a data analysis agent and a literature search agent. The world model enables Kosmos to coherently pursue the specified objective over 200 agent rollouts, collectively executing an average of 42,000 lines of code and reading 1,500 papers per run. Kosmos cites all statements in its reports with code or primary literature, ensuring its reasoning is traceable. Independent scientists found 79.4% of statements in Kosmos reports to be accurate, and collaborators reported that a single 20-cycle Kosmos run performed the equivalent of 6 months of their own research time on average.
I’m not planning to use this (my accuracy is better than a C+, I’d hope), but I imagine colleagues will, and this is just its first generation. And researchers and others are already using ChatGPT and related approaches to help with papers, despite the known issues with made-up citations. Popular app Grammarly now offers a service that lets users write scientific statements (their example is from geology) and then automatically finds and cites relevant papers – no need to actually read the literature, just decorate your text with the funny names and numbers in parentheses that make it look credible. Grammarly claims it is both a “partner for better grades” and will help you “turn claims into supported arguments.”
There’s a whole discussion to have about the ethics of using such services and the wisdom of the many colleges paying for their communities to have access to them. But I want to talk about something different: which papers these services will choose.
Search Engine Optimization (SEO) has long been a concern for website owners: how do we make it so that when people search “vacation in Aruba” my hotel site is the top hit? That matters for people in science, too (people on the job market, make sure you have a good website!), but it’s less of an emphasis. For writing papers, there’s some optimization (like the endless debate over the wisdom of including colons in titles), but it’s a messy, human process. And journals have various tactics for bumping up their impact factor (“Hey, want to publish version 1.2.1 of your popular software package in our journal?”; “Has there been a really good review of this popular subject published this week? Let’s commission one and get those citations!”). But increasingly, the entities “reading” our papers and deciding what to cite won’t be humans: they’ll be AI recommendation engines telling people what is most relevant. Even once the rumored AI economic bubble bursts, I suspect tools like this are unlikely to go away, unless there is enough pushback against their use that providing them becomes unprofitable.
Consequences of AIO (artificial intelligence optimization) for scientific publishing
Pressure to get in the corpus: Initially, AI was trained on huge chunks of the internet, books, and other sources, under a legal theory that it’s transformative fair use (various lawsuits by authors and publishers beg to differ). There are now some licensing deals arising for AI training, like Reddit’s deal with Google. In the search engine space, such deals are rare: for most websites (especially under the standard ad-supported model), views are what matter, so unless one has a truly unique resource (which Reddit arguably is), one doesn’t block any search engine, and in fact web publishers take steps to make searching easier (like creating site maps). The same pressures will come for publishing. A publisher might cut a deal allowing only AI company 1 to learn from its articles, with the publisher getting some money from that. If AI company 2 is what Grammarly uses to generate citations, then none of the publisher’s articles get cited by people using Grammarly to do their work, dropping impact factors across the board for that publisher’s journals. This also increases the pressure for work to be open access so there’s no paywall blocking indexing (not that paywalls have stopped scraping in the past).
Formatting for bots: If most “readers” aren’t humans, one wants articles structured in such a way that the bots can parse them easily, to make them more likely to be recommended. This could lead to nice things like accessible metadata, but it could also lead to things like very inflexible article structure. My guess is that publishers will try to optimize for this as it’s pretty easy for them to test and control.
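To make that concrete, here is a rough sketch of the kind of machine-readable metadata a publisher could embed alongside an article so bots don’t have to parse a PDF. It uses schema.org’s ScholarlyArticle vocabulary via Python; the specific fields and values are purely illustrative, not any publisher’s actual markup.

```python
# Illustrative only: one way a publisher might expose machine-readable article
# metadata (JSON-LD using schema.org's ScholarlyArticle type) for bots to parse.
import json

article_metadata = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "headline": "A hypothetical paper title",
    "author": [{"@type": "Person", "name": "A. Hypothetical Author"}],
    "datePublished": "2025-01-01",
    "abstract": "A short summary a bot can ingest without parsing the PDF.",
    "keywords": ["phylogenetics", "software"],
}

print(json.dumps(article_metadata, indent=2))
```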
Writing for bots: Every recipe online starts with a multi-paragraph essay before getting into the actual recipe (with a few exceptions – huzzah for King Arthur!). This supposedly helps with ranking in search engines; it also helps with copyright. It annoys the actual humans trying to get to the recipe (thus the addition of “Jump to recipe” buttons on pages), but this wall of text helps get views in the first place. I don’t think that’s happening yet in science, but my guess is that making writing work best for AIO is coming. For example, Nature Publishing Group (NPG) now has a manuscript advisor that will give advice on style, alternative titles, abstracts, etc. (as well as suggest references). It would be easy to tune this so that the advice reflects what works best for getting a paper recommended later. I don’t know what specific advice that would be yet: maybe shorter sentences, maybe bulleted text (like BuzzFeed listicles), maybe long introductions so the paper is enriched for the words that signal its relevance to a field. Many journals now include abstracts in two languages: the standard one for that journal and the language of the region where the work was done. If AIs deprioritize work with two languages, will there be pressure to discontinue two-language abstracts?
Figures for bots: Right now we create figures to appeal to people (“I’m limited to two figures – fine, see panel K of figure 1”). Recommendation engines focus on text at the moment, but there are enough video and image generation AI efforts that we can expect generators to start suggesting existing figures (with citations) as well, especially for student work. This will prioritize figures that each make a single point rather than multipanel figures, figures with insets, etc. This honestly is probably a good thing (if the number of figures is allowed to go up), especially for accessibility, but I expect this to start happening.
Reviewing by bots: “Is this article a good fit for this journal?” is a question human associate editors are often faced with, and part of “fit” is often “important enough.” It’s a very fuzzy concept and subject to all sorts of issues. I doubt we’re far off from being able to predict the number of citations for a manuscript using a machine learning tool (based on factors like the subject of the research, the nature of the discovery, the connectedness of the authors, and more). For example, in my field, a paper describing new species is likely to have far fewer citations over the next few years than a paper describing a new software package, but their importance doesn’t necessarily match this. What will happen when editors, in addition to seeing “major revision, minor revision, minor revision” from human reviewers, see “this manuscript is expected to have only 48-57% of the citations of the median paper in your journal”? With pressures to improve impact factors (which are flawed but still used a lot), this may start having an effect. Some solutions for improving a potentially low-impact paper might be beneficial for science (“write this in a more accessible manner, making sure to define your terms”); some might not be (“have Bob write a few sentences and become a coauthor – everyone always cites Bob’s papers a lot, including him when he writes invited reviews”).
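To sketch what such a tool might look like under the hood (purely hypothetical on my part; I have no idea what any journal actually runs), the predictor could be as mundane as a regression over manuscript features. The features, training data, and choice of scikit-learn’s GradientBoostingRegressor here are all assumptions for illustration.

```python
# Toy sketch of a citation-count predictor an editorial dashboard might use;
# the features and "training data" here are entirely made up for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)

# Hypothetical per-manuscript features: [subject-area popularity, author
# connectedness, number of references, describes software (1) vs. new species (0)].
X = rng.random((500, 4))
# Fake "citations after three years," loosely tied to those features plus noise.
y = 5 + 20 * X[:, 0] + 10 * X[:, 1] + 15 * X[:, 3] + rng.normal(0, 3, 500)

model = GradientBoostingRegressor().fit(X, y)

new_manuscript = np.array([[0.4, 0.2, 0.6, 0.0]])  # e.g., a species description
predicted = model.predict(new_manuscript)[0]
median = np.median(model.predict(X))
print(f"Predicted citations: {predicted:.0f} "
      f"({100 * predicted / median:.0f}% of the median paper in this journal)")
```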
Biases: We know that AIs and other algorithms trained on data can reflect human biases (see Cathy O’Neil’s Weapons of Math Destruction, which came out nearly a decade ago, as well as the work of Timnit Gebru and many others). It’s not like humans picking papers to read and cite are bias-free, but the AIs will have some of the same biases, plus ones unique to them. For example, perhaps a recommendation engine prefers using fewer words (reduced use of tokens, maybe), so articles from Evolution might be recommended more than articles from Proceedings of the Royal Society B: Biological Sciences, which requires more words to cite (maybe until it launches its rebranded title, SirBio).
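To illustrate just the token-count piece of that (the bias itself is pure speculation on my part), here is a quick comparison of what those two journal names cost under one common tokenizer, using OpenAI’s tiktoken package:

```python
# Speculative illustration: journal names differ a lot in how many tokens they
# cost a language model. Requires the tiktoken package (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used OpenAI tokenizer

for name in ["Evolution",
             "Proceedings of the Royal Society B: Biological Sciences"]:
    print(f"{len(enc.encode(name)):2d} tokens: {name}")
```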
Spiraling: Mad cow and other prion diseases are what you get when you feed an animal on tissue from a closely related animal: a malformed protein is similar enough to the animal’s own proteins to convert them into more malformed proteins, and this gets worse the more cycles of cannibalism occur. There is a lot of work on the issues that can come from AI being trained on AI-generated material, and this is definitely going to happen, especially as there will be incentives to hide whether material in a paper is AI-created. To take a simple made-up example: if 10% of papers cited are from the Nature Publishing Group (a guess), and their manuscript advisor tends to recommend papers from their journals for legit reasons (“avoid super spammy journals from that bad publisher over there”), then perhaps the next year of papers has 11% NPG citations, and that then becomes the corpus for future years, and so forth.
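Here is a minimal sketch of how that made-up feedback loop compounds, assuming (purely for illustration) that the advisor gives the publisher’s own journals a constant 10% relative boost each year:

```python
# Minimal sketch of the made-up feedback loop above: a small, consistent boost
# to one publisher's citation share compounds as each year's papers become
# the next year's training corpus. All numbers are invented.
share = 0.10   # starting fraction of citations going to this publisher (a guess)
boost = 0.10   # hypothetical relative boost the advisor gives its own journals

for year in range(1, 11):
    favored = share * (1 + boost)                # recommended a bit more often
    share = favored / (favored + (1 - share))    # renormalize against everyone else
    print(f"Year {year:2d}: {share:.1%} of citations")
```

After a decade of this, the invented 10% share has crept past 20%, even though no single year’s nudge looked dramatic.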
Stasis: “What software should I use to make a phylogenetic tree?” The best software twenty years ago is not the best software today, and there could be some amazing new thing released tomorrow. Trained on a corpus of fifty years of papers on making trees, what will a recommendation engine suggest? More papers will have used the older software, though perhaps the engine will be biased toward what the most recent papers use (if that’s programmed in). But nothing in the training corpus will have used the great new tool. How long will it take to start recommending the newest, best thing? Probably longer than it would take a trusted colleague who keeps up with the literature to recommend the new software. This is especially true if one wants to prioritize work done pre-AI to prevent the spiraling issue.
I’m not sure how to make this actionable for people at the moment – we don’t know enough about how AIO will work (maybe someone will launch a consulting company to start doing this). My guess is that it will lead to more harms than benefits. I do think it’s going to be the case that, whether or not we individually want to use AI to help write papers, our professional incentives will lead us to make sure our work is present in the training corpus used by such tools and is optimized for their use. I also believe publishers are going to start acting to make the AIs happy. We can steer what happens to try to avoid the pitfalls, but it’s worth considering the structural incentives that will encourage certain changes.
Note: after writing this, I noticed that the R package pkgdown, used to make websites, now has a new function that “automatically creates files that make it easier for LLMs to read your documentation.” So optimizing one’s output (in this case, the help files for R packages one writes) for use in a training corpus has already come for software.